Better multithreaded use of Python subprocess.Popen & communicate()?
I'm running multiple commands which may take some time, in parallel, on a Linux machine running Python 2.6.
So I used the subprocess.Popen class and the process.communicate() method to parallelize execution of multiple command groups and capture each group's output all at once after execution.
    def run_commands(commands, print_lock):
        # this part runs in parallel.
        outputs = []
        for command in commands:
            proc = subprocess.Popen(shlex.split(command), stdout=subprocess.PIPE,
                                    stderr=subprocess.STDOUT, close_fds=True)
            output, unused_err = proc.communicate()  # buffers the output
            retcode = proc.poll()  # ensures subprocess termination
            outputs.append(output)
        with print_lock:  # print them at once (synchronized)
            for output in outputs:
                for line in output.splitlines():
                    print(line)
Elsewhere, it is called like this:
    processes = []
    print_lock = Lock()
    for ...:
        commands = ...  # a group of commands is generated, which takes some time.
        processes.append(Thread(target=run_commands, args=(commands, print_lock)))
        processes[-1].start()
    for p in processes:
        p.join()
    print('done.')
The expected result is that the output of each command group is displayed all at once, while the groups themselves execute in parallel.
But starting with the second output group (of course, which thread ends up second varies due to scheduling indeterminism), the output begins to print without newlines, each line is padded with as many spaces as the number of characters printed on the previous line, and input echo is turned off -- the terminal state becomes "garbled" or "crashed". (If I issue the reset shell command, the terminal returns to normal.)
At first I suspected the handling of '\r', but that was not the cause. As you can see in my code, I handle it properly with splitlines(), and I confirmed that by applying repr() to the output.
I think the cause is the concurrent use of pipes in Popen and communicate() for stdout/stderr. I also tried the check_output shortcut method from Python 2.7, with no success. Of course, the problem described above does not occur if I serialize all command executions and prints.
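For reference, here is the kind of variation I have been experimenting with. This is only a sketch: the os.devnull redirection of stdin (to keep the children away from the shared terminal) and the single write per group are my own guesses at a mitigation, not a confirmed fix.

    import os
    import shlex
    import subprocess
    import sys

    def run_commands_sketch(commands, print_lock):
        # Same structure as run_commands above, but the children get
        # os.devnull as stdin so they cannot read from (or reconfigure)
        # the shared terminal, and each group's output is written in a
        # single call while holding the lock.
        outputs = []
        devnull = open(os.devnull, 'rb')
        try:
            for command in commands:
                proc = subprocess.Popen(shlex.split(command),
                                        stdin=devnull,
                                        stdout=subprocess.PIPE,
                                        stderr=subprocess.STDOUT,
                                        close_fds=True)
                output, _ = proc.communicate()  # waits for the child to exit
                outputs.append(output)
        finally:
            devnull.close()
        with print_lock:
            sys.stdout.write(''.join(outputs))
            sys.stdout.flush()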
Is there any better way to handle Popen and communicate() in parallel?
A final result, inspired by the comment from J.F. Sebastian:
http://bitbucket.org/daybreaker/kaist-cs443/src/247f9ecf3cee/tools/manage.py
It seems to be a Python bug.
I am not sure it is clear what run_commands actually needs to be doing, but it seems to simply poll a subprocess, ignore the return code, and continue in the loop. When you get to the part where you print the output, how do you know the subprocesses have completed?
In your example code I noticed your use of

    for line in output.splitlines():

to partially address the '\r' issue; using

    for line in output.splitlines(True):

would have been helpful (passing True keeps the line endings).
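As a rough sketch of what I mean about the return code (untested against your actual commands, and run_one is just an illustrative name): communicate() already waits for the child to exit, so you can check the code it leaves on the Popen object instead of relying on a bare poll().

    import shlex
    import subprocess
    import sys

    def run_one(command):
        # communicate() blocks until the child exits, so returncode is
        # guaranteed to be set afterwards.
        proc = subprocess.Popen(shlex.split(command),
                                stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT)
        output, _ = proc.communicate()
        if proc.returncode != 0:
            sys.stderr.write('%r exited with %d\n' % (command, proc.returncode))
        return output

Your run_commands could then collect run_one(command) for each command in a group and still print the group's output under the lock.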