python queue concurrency process management

The use case is as follows: I have a script that runs a series of non-Python executables to reduce (pulsar) data. Right now I use subprocess.Popen(..., shell=True) and then subprocess's communicate function to capture the standard output and standard error from the non-Python executables, and I log the captured output with the Python logging module.

The problem is: only one of the 8 available cores gets used most of the time. I want to spawn multiple processes, each working on a part of the data set in parallel, and I want to keep track of progress. It is a script / program to analyze data from a low-frequency radio telescope (LOFAR). The easier it is to install, manage and test, the better. I was about to build code to manage all this, but I'm sure it must already exist in some easy library form.
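Simplified, the flow for one piece of data looks roughly like this (reduce_tool is a placeholder for the actual executables):

```python
import logging
import subprocess

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("reduction")

def reduce_chunk(input_path):
    # One reduction step; "reduce_tool" stands in for the real executables.
    proc = subprocess.Popen("reduce_tool %s" % input_path, shell=True,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    out, err = proc.communicate()
    if out:
        log.info(out.strip())
    if err:
        log.error(err.strip())
    return proc.returncode
```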


The subprocess module can start multiple processes for you just fine, and keep track of them. The problem, though, is reading the output from each process without blocking any of the others. Depending on the platform there are several ways of doing this: using the select module to see which process has data to be read, setting the output pipes non-blocking with the fcntl module, or using threads to read each process's data (which is what subprocess.Popen.communicate itself does on Windows, because the other two options aren't available there). In each case the devil is in the details, though.
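For instance, here is a minimal, Unix-only sketch of the select approach, watching the stdout of several children at once (reduce_tool and the chunk names are placeholders):

```python
import os
import select
import subprocess

# Hypothetical commands; replace with your actual reduction invocations.
commands = [["reduce_tool", "chunk0.dat"], ["reduce_tool", "chunk1.dat"]]

procs = [subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
         for cmd in commands]
pipes = {p.stdout.fileno(): p for p in procs}

while pipes:
    # Block until at least one child has output ready (Unix only).
    ready, _, _ = select.select(list(pipes), [], [])
    for fd in ready:
        data = os.read(fd, 4096)
        if data:
            print(data.decode("utf-8", "replace"), end="")
        else:
            # EOF: the child closed its stdout; reap it and stop watching.
            pipes.pop(fd).wait()
```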

Something that handles all of this for you is Twisted, which can spawn as many processes as you want and call your callbacks with the data they produce (as well as on other events, such as process exit).
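A rough sketch of what that looks like with Twisted's ProcessProtocol, again with a placeholder reduce_tool executable and placeholder chunk names:

```python
from twisted.internet import protocol, reactor

# Hypothetical chunks and executable; replace with the real reduction jobs.
chunks = ["chunk0.dat", "chunk1.dat"]
remaining = [len(chunks)]

class ReductionProtocol(protocol.ProcessProtocol):
    """Collects output from one child process as it arrives."""

    def __init__(self, name):
        self.name = name

    def outReceived(self, data):
        print("%s stdout: %r" % (self.name, data))

    def errReceived(self, data):
        print("%s stderr: %r" % (self.name, data))

    def processEnded(self, reason):
        print("%s finished: %s" % (self.name, reason.value))
        remaining[0] -= 1
        if remaining[0] == 0:
            reactor.stop()

for i, chunk in enumerate(chunks):
    reactor.spawnProcess(ReductionProtocol("job%d" % i),
                         "reduce_tool", ["reduce_tool", chunk],
                         env=None)  # env=None inherits the parent environment

reactor.run()
```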


Maybe Celery will serve your needs.
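A minimal sketch of what a Celery task wrapping the external call could look like (hypothetical module name, broker URL and executable; Celery also needs a message broker such as RabbitMQ or Redis running):

```python
# tasks.py -- a hypothetical sketch of a Celery task wrapping one reduction job.
import subprocess
from celery import Celery

app = Celery("reduction", broker="amqp://localhost")

@app.task
def reduce_chunk(input_path):
    """Run one reduction job and return its exit code and combined output."""
    proc = subprocess.Popen(["reduce_tool", input_path],  # hypothetical executable
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    out, _ = proc.communicate()
    return proc.returncode, out.decode("utf-8", "replace")
```

You would start one or more workers with `celery -A tasks worker` and queue jobs from your script with `reduce_chunk.delay("chunk0.dat")`; fetching the return values requires configuring a result backend.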


If I understand what you are doing correctly, I might suggest a slightly different approach: establish a single unit of work as a function and then layer the parallel processing on top of that. For example:

  1. Wrap the current functionality (calling subprocess and capturing output) into a single function. Have the function create a result object that can be returned; alternatively, the function could write out to files as you see fit.
  2. Create an iterable (list, etc.) that contains an input for each chunk of data for step 1.
  3. Create a multiprocessing Pool and use its map() function to execute your function from step 1 for each of the items in step 2. See the Python multiprocessing docs for details.

You could also use a worker/Queue model. The key, I think, is to encapsulate the current subprocess/output-capture code in a function that does the work for a single chunk of data (whatever that is). Layering the parallel processing on top is then quite straightforward using any of several techniques, only a couple of which are described here.
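A minimal sketch of steps 1 through 3 with multiprocessing.Pool, using a placeholder reduce_tool executable and imap_unordered instead of map so progress can be logged as each chunk finishes:

```python
import logging
import subprocess
from multiprocessing import Pool

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("reduction")

def reduce_chunk(input_path):
    """Step 1: do the work for a single chunk and return a result."""
    proc = subprocess.Popen(["reduce_tool", input_path],  # hypothetical executable
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    return input_path, proc.returncode, out, err

if __name__ == "__main__":
    # Step 2: one input item per chunk of data.
    chunks = ["chunk%d.dat" % i for i in range(16)]

    # Step 3: run up to 8 chunks at a time; imap_unordered yields results
    # as they finish, which makes it easy to log progress.
    pool = Pool(processes=8)
    for path, code, out, err in pool.imap_unordered(reduce_chunk, chunks):
        log.info("%s exited with code %d", path, code)
        if err:
            log.error(err)
    pool.close()
    pool.join()
```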
