
Multi processing subprocess

I'm new to the subprocess module in Python; currently my implementation is not multi-processed.

import subprocess, shlex

def forcedParsing(fname):
    cmd = 'strings "%s"' % (fname)
    #print cmd
    args = shlex.split(cmd)
    try:
        sp = subprocess.Popen(args, shell=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = sp.communicate()
    except OSError as e:
        print "Error no %s  Message %s" % (e.errno, e.strerror)
        return None

    if sp.returncode == 0:
        #print "Processed %s" % fname
        return out

res=[]
for f in file_list: res.append(forcedParsing(f))

my questions:

  1. Is sp.communicate a good way to go? Should I use poll?

    If I use poll, I need a separate process which monitors whether the process has finished, right?

  2. Should I fork at the for loop?


1) subprocess.communicate() seems the right option for what you are trying to do, and you don't need to poll the process: communicate() returns only when it has finished.

2) You mean forking to parallelize the work? Take a look at multiprocessing (Python >= 2.6). Running parallel processes using subprocess is of course possible, but it's quite a bit of work: you cannot just call communicate(), which is blocking.
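For instance, a minimal sketch of that approach (not code from the original answer) could look like this, assuming forcedParsing and file_list are defined as in the question:

from multiprocessing import Pool

if __name__ == "__main__":
    # One worker process per CPU core by default; map() blocks until
    # forcedParsing() has been run on every file in file_list.
    pool = Pool()
    res = pool.map(forcedParsing, file_list)
    pool.close()
    pool.join()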

About your code:

cmd = 'strings "%s"' % (fname)
args = shlex.split(cmd)

Why not simply this?

args = ["strings", fname]

As for this ugly pattern:

res=[]
for f in file_list: res.append(forcedParsing(f))

You should use list comprehensions whenever possible:

res = [forcedParsing(f) for f in file_list]


About question 2: forking at the for loop will mostly speed things up if the script is supposed to run on a system with multiple cores/processors. It will consume more memory, though, and will stress I/O harder. There will be a sweet spot somewhere that depends on the number of files in file_list, but only benchmarking on a realistic target system can tell you where it is. If you find that number, you could add an if len(file_list) > <your number>: check [Edit: rather, as @tokland says, via multiprocessing if it's available on your Python version (2.6+)] that chooses the most efficient strategy on a per-job basis.
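One hedged way to express that per-job choice is sketched below; PARALLEL_THRESHOLD is purely a placeholder value you would determine by benchmarking:

from multiprocessing import Pool

PARALLEL_THRESHOLD = 50  # placeholder; find the real sweet spot by benchmarking

def parse_all(file_list):
    # Parallelize only when there are enough files to make the extra
    # memory and I/O pressure worthwhile; otherwise stay serial.
    if len(file_list) > PARALLEL_THRESHOLD:
        pool = Pool()
        try:
            return pool.map(forcedParsing, file_list)
        finally:
            pool.close()
            pool.join()
    return [forcedParsing(f) for f in file_list]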

Read about Python profiling here: http://docs.python.org/library/profile.html
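For a quick start, the standard library's cProfile can also be driven from inside the script; a tiny illustrative example (assuming forcedParsing and file_list live in your main module):

import cProfile

# Profiles the whole serial run and prints per-function timing statistics.
cProfile.run('[forcedParsing(f) for f in file_list]')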

If you're on Linux, you can also run time: http://linuxmanpages.com/man1/time.1.php


There are several warnings in the subprocess documentation that advise you to use communicate() to avoid problems with processes blocking, so it would be a good idea to use that.

