
How do I run lots of subprocesses efficiently in python?

Basic setup:

I am using a python script for automatic testing of a programming project that I am working on. In the test, I run my executable with lots of different options and compare the result with previous runs. The testing takes quite a lot of time since I have roughly 600k different tests to run.

At the moment, I have split my script into two parts: a test module that grabs tests from a job queue and places results in a result queue, and a main module that creates the job queue and then checks the results. This lets me experiment with several test processes/threads, which so far has not improved testing speed (I am running this on a dual-core computer; I would expect more test processes to help on a quad-core).

In the test module, I create a command string that I then execute using

subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)

I then read the results from the pipe and place it in the result-queue.

Question:

Is this the most efficient way of running lots and lots of command strings on a multi-core system? Every Popen I do creates a new process, which seems like it might create quite a bit of overhead, but I can't really think of a better way to do it.

(I am currently using Python 2.7, in case this matters.)

EDIT:

OS is Linux

The subprocesses that I spawn are commandline C-executables with arguments.


You could have a look at the multiprocessing module, especially the Pool part.

It will allow you to launch as many processes as you want (by default, as many as there are CPU cores).
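A minimal sketch of that approach; `run_test` is a hypothetical stand-in for whatever runs one test command:

```python
import subprocess
from multiprocessing import Pool

def run_test(cmd):
    """Run one test command; return (cmd, exit code, captured output)."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    output, _ = proc.communicate()
    return cmd, proc.returncode, output

if __name__ == "__main__":
    commands = [["echo", str(i)] for i in range(8)]  # stand-in for real tests
    pool = Pool()               # defaults to one worker per CPU core
    results = pool.map(run_test, commands)
    pool.close()
    pool.join()
```

`Pool.map` handles the queueing for you, so the hand-rolled job/result queues become unnecessary.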


First, try measuring the testing script/scheme with a null-executable. That way you can see how much overhead the process spawning has w.r.t. actual testing time. Then we have some real data to act on.
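One way to measure that overhead (a sketch, assuming a Linux system where `/bin/true` exists as a do-nothing executable):

```python
import subprocess
import time

def time_spawns(cmd, n=100):
    """Time n sequential spawns of a command; return seconds per spawn."""
    start = time.time()
    for _ in range(n):
        subprocess.Popen(cmd, stdout=subprocess.PIPE).communicate()
    return (time.time() - start) / n

per_spawn = time_spawns(["/bin/true"])
print("overhead per process: %.1f ms" % (per_spawn * 1000.0))
```

Multiplying the per-spawn figure by 600k tells you how much of the total run is pure process-launch cost.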

Adding a batch mode to your exe (that reads command lines off a file and does that work) is probably a good idea if the amount of work is small compared to the time it takes to load and shut down your process. Plus, it will help you find memory leaks. :)

By moving stuff out of main(), this isn't so hard to do.
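On the Python side, driving such a batch mode might look like the sketch below; the executable name and its batch flag (e.g. `["./mytool", "--batch"]`) are hypothetical and depend on what you add to your program:

```python
import subprocess
import tempfile

def run_batch(commands, exe_argv):
    """Write one command line per row to a temp file, then launch the
    executable once with the file appended to its argument list."""
    with tempfile.NamedTemporaryFile(mode="w", suffix=".txt",
                                     delete=False) as f:
        for cmd in commands:
            f.write(" ".join(cmd) + "\n")
        batch_file = f.name
    proc = subprocess.Popen(exe_argv + [batch_file], stdout=subprocess.PIPE)
    output, _ = proc.communicate()
    return proc.returncode, output
```

This amortizes one process launch over thousands of tests instead of paying it per test.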


In the end I created python C-bindings (with SWIG) directly to the code that I wanted to test. It turned out to be several hundred times faster than starting subprocesses.

