Python multiprocessing.Pool with processes that crash
Well, they're not supposed to crash, but they do anyway. Is there a way to get multiprocessing.Pool, or any other multiprocessing tool, to restart a process that dies? How would I do this otherwise?
Thanks!
Edit: Some background. The process does several things with geometry in Autodesk Maya, which it does totally fine. The problem is that every once in a while I'll have a file that decides, once it's finished and a new scene is being opened, to completely exit Maya (or mayapy) with no Python warnings or errors, or critical process errors from Windows. It just dies. There's not really anything I can do about the crashing, unfortunately.
What I'm hoping for is a way to re-start any processes that have died from a crash.
Indeed, the error handling is better in Python 3.3, as masida said. Here I check for a timeout when a child process has died silently.
This workaround is for Python < 3.3 and multiprocessing.Pool; of course, managing your own processes is a good alternative.
Use pool.map_async to run the jobs asynchronously; you can then check whether the jobs are done and how long they are taking. If they take too long (for instance, when one process died and will never return), kill all pool processes with pool.terminate() and start over. In code:
import time
import multiprocessing

pool = multiprocessing.Pool(num_procs)            # worker pool (num_procs: user def.)
done = False                                      # not finished yet
while not done:
    job_start = time.time()                       # start time
    Jobs = pool.map_async(func, args)             # asynchronous pool call (func, args: user def.)
    redo = False                                  # no redo yet
    while not Jobs.ready():                       # while jobs are not finished
        if (time.time() - job_start) > maxWait:   # check maximum time (maxWait: user def.)
            pool.terminate()                      # kill old pool
            pool = multiprocessing.Pool(num_procs)  # create new pool
            redo = True                           # redo computation
            break                                 # break loop (not finished)
        time.sleep(0.5)                           # avoid busy-waiting
    if not redo:                                  # computation was successful
        result = Jobs.get()                       # get results
        done = True                               # exit outer while
Another option is to use a timeout on the iterator returned by pool.imap; the timeout can be passed to the iterator's next() method, next(timeout). If a result takes longer than the timeout, multiprocessing.TimeoutError is raised in the main process, and similar actions as explained above can follow within the except block, although I have not tested this thoroughly.
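A minimal sketch of that idea, assuming a hypothetical worker function work() and task list (these names are placeholders, not from the original answer):

import multiprocessing

def work(task):                                   # hypothetical worker; stands in for the real Maya job
    return task * task

if __name__ == '__main__':
    tasks = range(100)
    pool = multiprocessing.Pool(4)
    it = pool.imap(work, tasks)                   # lazy iterator over results
    results = []
    try:
        while True:
            results.append(it.next(timeout=60))   # raises TimeoutError if a result is late
    except StopIteration:
        pass                                      # all results collected
    except multiprocessing.TimeoutError:
        pool.terminate()                          # a worker probably died silently; kill the pool
        pool.join()
        # start over here, as in the map_async example above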
Apparently, they've recently changed the behaviour in Python 3.3 to raise an exception in this case: http://hg.python.org/cpython/rev/6d6099f7fe89
The defect that led to this change is: http://bugs.python.org/issue9205
However, if you manually spawn the workers (which I usually do when I use multiprocessing), you can use the Process.is_alive() method to detect dead workers: http://docs.python.org/dev/library/multiprocessing#multiprocessing.Process.is_alive
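A minimal sketch of that approach, assuming a hypothetical worker function work() and a shared result queue (these names are placeholders, not from the original answer):

import time
import multiprocessing

def work(task, results):                          # hypothetical worker; stands in for the real Maya job
    results.put(task * task)

if __name__ == '__main__':
    results = multiprocessing.Queue()
    tasks = list(range(8))
    procs = {}
    for t in tasks:                               # spawn one process per task
        p = multiprocessing.Process(target=work, args=(t, results))
        p.start()
        procs[t] = p

    while procs:
        for t, p in list(procs.items()):
            if not p.is_alive():                  # worker has exited
                if p.exitcode == 0:               # finished cleanly
                    del procs[t]
                else:                             # crashed: restart the same task
                    p = multiprocessing.Process(target=work, args=(t, results))
                    p.start()
                    procs[t] = p
        time.sleep(0.5)                           # poll periodically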