Calling Python from .NET in a loop is just too slow
We've got a number of Python libraries that need to be called from inside some loops in .NET, and the calls are just too slow. We're using Process.Start, which takes about 1/3 of a second per call, which means that a few dialogs take 30 seconds to load (on an 8-core machine; our customers will most likely have much slower computers).
For various reasons, we can't use IronPython (such as some of the files using the csv module which has known issues with IronPython).
What can I do to speed things up? Even though I'm a Python newb, I can see that some of these functions are simple if elif else blocks, and some profiling shows the main "cost" is starting python.exe a few dozen times. Are there some secret options for starting a single Python process and streaming things in and out?
A similar question is this one.
You can rewrite your scripts to accept commands from the standard input instead and not exit when they're finished.
Then start up a process (or a pool of them) and feed them commands, instead of starting new processes. For example, instead of creating a new some_script.py args1, some_script.py args2, ... only spawn some_script_wrapper.py and feed it:
args1
args2
\end
You'll save the startup time this way and can even enable multiprocessing in a pool if needed.
Of course you'll have to make sure input / output is processed completely between data parts. To make it easier, you can wrap your arguments / output into a known structured format (json?), make sure you know the "message size", or even just use magic markers to find out when you get the end of a data chunk.
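As a rough illustration of the wrapper idea, here is a minimal sketch of what some_script_wrapper.py could look like if you go with one JSON value per line in each direction. The names handle and serve are made up for this example, and handle is only a placeholder for whatever your existing script does with its arguments. The \end line is treated as the stop marker, as in the example input above.

```python
import json


def handle(args):
    """Hypothetical stand-in for the real work the script does per call."""
    return {"echo": args}


def serve(stdin, stdout):
    """Read one JSON request per line and write one JSON reply per line."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue              # ignore blank lines
        if line == "\\end":
            break                 # stop marker: exit the loop cleanly
        try:
            reply = {"ok": True, "result": handle(json.loads(line))}
        except Exception as exc:  # report the failure instead of dying
            reply = {"ok": False, "error": str(exc)}
        stdout.write(json.dumps(reply) + "\n")
        stdout.flush()            # the caller is blocked waiting for this line


# In the actual wrapper script, finish with:
#     import sys
#     serve(sys.stdin, sys.stdout)
```

Because every reply is exactly one line, the .NET side can write a line to the redirected standard input and then read a single line back from the redirected standard output (UseShellExecute = false with RedirectStandardInput and RedirectStandardOutput on the ProcessStartInfo). The flush after each reply matters: without it the output may sit in a buffer and the caller will hang waiting for it.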
You could try to embed python into your .NET application with pythonnet (http://pythonnet.sourceforge.net/readme.html#embedding) but unfortunately there is hardly any documentation on this.
Some pointers are http://mail.python.org/pipermail/pythondotnet/2007-June/000620.html and http://mail.python.org/pipermail/pythondotnet/2010-August/000994.html.
Don't call a Python function on every loop iteration. If you need to call a function 100 times, modify said function to accept 100 inputs and return 100 outputs. In .NET, construct what I'll call your batch query, send it off in one huge function call, then move on to whatever else needs to be done.
One could call my suggestion buffering, though it may not be possible depending on what you're actually doing.
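To make the batching idea concrete, here is a small sketch. Everything in it is invented for illustration: classify stands in for one of your simple if elif else functions, classify_batch is the "100 inputs, 100 outputs" version, and run_batch assumes the batch travels as a JSON array in both directions.

```python
import json


def classify(value):
    """Hypothetical per-item function, standing in for an if/elif/else block."""
    if value < 0:
        return "negative"
    elif value == 0:
        return "zero"
    else:
        return "positive"


def classify_batch(values):
    """Batched variant: one call handles every input; outputs keep input order."""
    return [classify(v) for v in values]


def run_batch(payload):
    """Take a JSON array of inputs as text and return a JSON array of outputs."""
    return json.dumps(classify_batch(json.loads(payload)))
```

With this shape, the .NET loop only collects its inputs, and Python is started once per dialog rather than once per item. Results come back in the same order as the inputs, so they can be matched up by index.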