开发者

what is the neat way to divide huge nested loops to 8(or more) processes using Python?

this time i'm facing a "design" problem. Using Python, I have a implement a mathematical algorithm which uses 5 parameters. To find the best combination of these 5 parameters, i used 5-layer nested loop to enumerate all possible combinations in a given range. The time it takes to finish appeared to be beyond my expectation. So I think it's the time to use multithreading...

The task in the core of nested loops are calculation and saving. In current code开发者_如何转开发, result from every calculation is appended to a list and the list will be written to a file at the end of program.

since I don't have too much experience of multithreading in any language, not to mention Python, I would like to ask for some hints on what should the structure be for this problem. Namely, how should the calculations be assigned to the threads dynamically and how should the threads save results and later combine all results into one file. I hope the number of threads can be adjustable.

Any illustration with code will be very helpful.

thank you very much for your time, I appreciate it.

#

update of 2nd Day: thanks for all helpful answers, now I know that it is multiprocessing instead of multithreading. I always confuse with these two concepts because I think if it is multithreaded then the OS will automatically use multiple processor to run it when available. I will find time to have some hands-on with multiprocessing tonight.


You can try using jug, a library I wrote for very similar problems. Your code would then look something like

from jug import TaskGenerator
evaluate = TaskGenerator(evaluate)

for p0 in [1,2,3]:
    for p1 in xrange(10):
        for p2 in xrange(10,20):
             for p3 in [True, False]:
                for p4 in xrange(100):
                    results.append(evaluate(p0,p1,p2,p3,p4))

Now you could run as many processes as you'd like (even across a network if you have access to a computer cluster).


Multithreading in Python won't win you anything in this kind of problem, since Python doesn't execute threads in parallel (it uses them for I/O concurrency, mostly).

You want multiprocessing instead, or a friendly wrapper for it such as joblib:

from joblib import Parallel, delayed

# -1 == use all available processors
results = Parallel(n_jobs=-1)(delayed(evaluate)(x) for x in enum_combinations())
print best_of(results)

Where enum_combinations would enumerate all combinations of your five parameters; you can likely implement it by putting a yield at the bottom of your nested loop.

joblib distributes the combinations over multiple worker processes, taking care of some load balancing.


Assuming this is a calculation-heavy problem (and thus CPU-bound), multi-threading won't help you much in Python due to the GIL.

What you can, however, do is split the calculation across multiple processes to take advantage of extra CPU cores. The easiest way to do this is with the multiprocessing library.

There are a number of examples for how to use multiprocessing on the docs page for it.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜