
Why do I have to use .wait() with Python's subprocess module?

I'm running a Perl script through the subprocess module in Python on Linux. The function that runs the script is called several times with variable input.

import subprocess

def script_runner(variable_input):

    out_file = open('out_' + variable_input, 'wt')
    error_file = open('error_' + variable_input, 'wt')

    process = subprocess.Popen(['perl', 'script', 'options'], shell=False,
                           stdout=out_file, stderr=error_file)

However, if I run this function, say, twice, the execution of the first process will stop when the second process starts. I can get my desired behavior by adding

process.wait()

after calling the script, so I'm not really stuck. However, I want to find out why I cannot run the script through subprocess as many times as I want, and have these computations run in parallel, without having to wait for each run to finish before starting the next.

UPDATE

The culprit was not so exciting: the Perl script used a common file that was rewritten on each execution.

However, the lesson I learned from this was that the garbage collector does not kill the process once it is running: letting the Popen object go out of scope had no influence on my script once I had the file conflict sorted out.
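In case it helps anyone else, here is a minimal sketch of how each run can get its own scratch file so that parallel runs no longer clash. The --workfile option is hypothetical: it assumes the Perl script can be told where to write its working file, so adapt it to whatever your script actually accepts.

import subprocess
import tempfile

def script_runner(variable_input):
    out_file = open('out_' + variable_input, 'wt')
    error_file = open('error_' + variable_input, 'wt')
    # Give every run its own working file instead of the shared one.
    scratch = tempfile.NamedTemporaryFile(prefix='scratch_' + variable_input + '_',
                                          delete=False)
    # '--workfile' is made up for this sketch; use whatever option your
    # Perl script actually understands for its working-file path.
    return subprocess.Popen(['perl', 'script', '--workfile', scratch.name],
                            stdout=out_file, stderr=error_file)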


If you are using Unix, and wish to run many processes in the background, you could use subprocess.Popen this way:

x_fork_many.py:

import subprocess
import os
import sys
import time
import random
import gc  # Just to test the hypothesis that garbage collection of p = Popen() is causing the problem.

# This spawns many (3) children in quick succession
# and then reports as each child finishes.
if __name__=='__main__':
    N=3
    if len(sys.argv)>1:
        x=random.randint(1,10)
        print('{p} sleeping for {x} sec'.format(p=os.getpid(),x=x))
        time.sleep(x)
    else:
        for _ in range(N):
            # Re-launch this same script with an extra argument so the child sleeps.
            args = [sys.executable, sys.argv[0], 'sleep']
            p = subprocess.Popen(args)
        gc.collect()
        for i in range(N):
            pid,retval=os.wait()
            print('{p} finished'.format(p=pid))

The output looks something like this:

% x_fork_many.py 
15562 sleeping for 10 sec
15563 sleeping for 5 sec
15564 sleeping for 6 sec
15563 finished
15564 finished
15562 finished

I'm not sure why you are getting the strange behavior when not calling .wait(). However, the script above suggests (at least on Unix) that saving subprocess.Popen(...) processes in a list or set is not necessary. Whatever the problem is, I don't think it has to do with garbage collection.

P.S. Maybe your Perl scripts are conflicting in some way, causing one to end with an error while another is running. Have you tried starting multiple invocations of the Perl script from the command line?


You have to call wait() in order to "wait" for your Popen to finish.

Since Popen runs the Perl script in the background, if you do not wait(), it will be stopped at the end of the "process" object's life... that is, at the end of script_runner.
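A minimal sketch, if you want to check that behaviour on Linux yourself: start a child without keeping the Popen object, let the reference go out of scope, and then test whether the child is still alive ('sleep 10' just stands in for the long-running Perl script):

import os
import subprocess
import time

def start_child():
    p = subprocess.Popen(['sleep', '10'])
    return p.pid  # only the pid escapes; the Popen object is dropped here

pid = start_child()
time.sleep(1)
os.kill(pid, 0)  # raises OSError if the process is already gone
print('child {0} is still alive'.format(pid))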


As said by ericdupo, the task is killed because you overwrite your process variable with a new Popen object, and since there are no more references to your previous Popen object, it is destroyed by the garbage collector. You can prevent this by keeping a reference to your objects somewhere, like a list:

processes = []
def script_runner(variable_input):

    out_file = open('out_' + variable_input, 'wt')
    error_file = open('error_' + variable_input, 'wt')

    process = subprocess.Popen(['perl', 'script', 'options'], shell=False,
                           stdout=out_file, stderr=error_file)
    processes.append(process)

This should be enough to prevent your previous Popen object from being destroyed.


I think you want to do

list_process = []
def script_runner(variable_input):

    out_file = open('out_' + variable_input, 'wt')
    error_file = open('error_' + variable_input, 'wt')

    process = subprocess.Popen(['perl', 'script', 'options'], shell=False,
                           stdout=out_file, stderr=error_file)
    list_process.append(process)
# call script_runner several times here, then wait for all of them:
for process in list_process:
    process.wait()

so your processes will run in parallel.
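A slightly fuller sketch of the same pattern that also closes the output files once every run has finished; the command and the out_/error_ file names mirror the question, and the 'a'/'b'/'c' inputs are only examples:

import subprocess

list_process = []
open_files = []

def script_runner(variable_input):
    out_file = open('out_' + variable_input, 'wt')
    error_file = open('error_' + variable_input, 'wt')
    open_files.extend([out_file, error_file])
    process = subprocess.Popen(['perl', 'script', 'options'],
                               stdout=out_file, stderr=error_file)
    list_process.append(process)

# launch several runs in parallel
for tag in ['a', 'b', 'c']:
    script_runner(tag)

# wait for every run to finish, then release the file handles
for process in list_process:
    process.wait()
for f in open_files:
    f.close()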
