Python's multiprocessing Does Not Play Nicely With threading.local?
I have two processes (see sample code) that each attempt to access a threading.local object. I would expect the code below to print "a" and "b" (in either order). Instead, I get "a" and "a". How can I elegantly and robustly reset the threading.local object when I start up whole new processes?
import threading
import multiprocessing

l = threading.local()
l.x = 'a'

def f():
    print(getattr(l, 'x', 'b'))

multiprocessing.Process(target=f).start()
f()
edit: For reference, when I use threading.Thread instead of multiprocessing.Process, it works as expected.
Both operating systems you mentioned are Unix/Linux based and therefore implement the same fork() API.
A fork() duplicates the calling process: its memory, loaded code and open file descriptors. Only the thread that called fork() continues to exist in the child. The kernel usually lets parent and child share the same physical memory pages until one of them writes to a page (copy-on-write), but logically the child starts with a full copy. This means that the local data structures are copied into the new process as well, including the thread-local variables of the forking thread. Thus, you still have the same data structures and l.x is still defined.
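A minimal sketch of this effect, assuming a Unix-like system where os.fork() is available. The helper name value_seen_in_forked_child is made up for illustration; it forks directly and sends whatever the child sees in l.x back to the parent through a pipe.

```python
import os
import threading

l = threading.local()
l.x = 'a'  # set in the parent, before forking


def value_seen_in_forked_child():
    """Fork, and return what getattr(l, 'x', 'b') evaluates to in the child.

    Hypothetical helper for illustration; requires os.fork() (Unix only).
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: report the value through the pipe and exit immediately.
        try:
            os.close(read_fd)
            os.write(write_fd, getattr(l, 'x', 'b').encode())
        finally:
            os._exit(0)
    # Parent: read the child's answer and reap the child.
    os.close(write_fd)
    data = os.read(read_fd, 16)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return data.decode()
```

On a fork-based platform, calling value_seen_in_forked_child() should return 'a' rather than 'b', because the child inherits the forking thread's copy of the thread-local data.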
To reset the data structures for the new process, I'd recommend having the function that the new process starts in call a clearing method first. You could, for example, store the parent process's PID with process_id = os.getpid() and use

if process_id != os.getpid():
    clear_local_data()

at the start of the child process's main function.
Because threading.local does the trick for threads, not for processes, as clearly described in its documentation:
The instance’s values will be different for separate threads.
Nothing about processes.
And a quote from the multiprocessing documentation:
Note
multiprocessing contains no analogues of threading.active_count(), threading.enumerate(), threading.settrace(), threading.setprofile(), threading.Timer, or threading.local.
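For comparison, here is a small sketch of the per-thread behaviour that the threading.local documentation does promise. The helper read_x and the results list are only there to collect what each thread sees.

```python
import threading

l = threading.local()
l.x = 'a'  # set in the main thread only


def read_x(results):
    # Each thread sees its own attributes on `l`; fall back to 'b' if unset.
    results.append(getattr(l, 'x', 'b'))


results = []
read_x(results)  # main thread: sees 'a'

t = threading.Thread(target=read_x, args=(results,))
t.start()
t.join()         # new thread: never set l.x, so it falls back to 'b'

print(results)   # ['a', 'b']
```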
There is now a multiprocessing-utils (GitHub) library on PyPI with a multiprocessing-safe version of threading.local(), which can be installed with pip.
It works by wrapping a standard threading.local() and checking that the PID has not changed since it was last used (as per the answer here from @immortal).
Use it exactly like threading.local():
import threading
import multiprocessing
import multiprocessing_utils

l = multiprocessing_utils.local()
l.x = 'a'

def f():
    print(getattr(l, 'x', 'b'))

f()  # prints "a"
threading.Thread(target=f).start()  # prints "b"
multiprocessing.Process(target=f).start()  # prints "b"
Full disclosure: I just created this module
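For anyone curious about the general approach, here is a rough sketch of the wrap-and-check-PID idea. It is not the library's actual code: the class name PidCheckedLocal and all of its internals are made up for illustration, and it ignores races between threads at the moment of replacement.

```python
import os
import threading


class PidCheckedLocal(object):
    """Hypothetical wrapper: a threading.local that starts empty in a new process."""

    def __init__(self):
        # Bypass our own __setattr__ so these land on the wrapper itself.
        object.__setattr__(self, '_pid', os.getpid())
        object.__setattr__(self, '_local', threading.local())

    def _current(self):
        # If the PID changed since last use, swap in a fresh threading.local.
        if self._pid != os.getpid():
            object.__setattr__(self, '_pid', os.getpid())
            object.__setattr__(self, '_local', threading.local())
        return self._local

    def __getattr__(self, name):
        # Only called for names not found on the wrapper itself.
        return getattr(self._current(), name)

    def __setattr__(self, name, value):
        setattr(self._current(), name, value)

    def __delattr__(self, name):
        delattr(self._current(), name)
```

Every attribute access goes through _current(), so the check happens on use and no explicit reset call is needed in the child.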