Implementing Load Balancing Using Python

2023-01-08 09:11 问答作者：

I am doing some research this summer and working on parallelizing pre-existing code. The main focus right now is a way to load balance the code so that it will run more efficient on the cluster. T开发者_如何转开发he current task is to make a proof of concept that creates several processes with each one having their own stack available and when the process is finished processing the stack it queries the two closest processes to see if they have any more work available in their stack.

I am having difficulties conceptualizing this in python, but was hoping someone could point me in the right direction or have some sort of example that is similar in either mpi4py or ParallelPython. Also if anyone knows of a better or easier module then that would be great to know.

Thanks.

Here's a simple way to do this.

Create a single common shared queue of work to do. This application will fill this queue with work to do.
Create an application which gets one item from the queue, and does the work.

This is the single-producer-multiple-consumer design. It works well and can swamp your machine with parallel processes.

To use the built-in queue class, you need to wrap the queue in some kind of multi-processing API. http://docs.python.org/library/queue.html. Personally, I like to create a small HTTP-based web-server that handles the queue. Each application does a GET to fetch the next piece of work.

You can use tools like RabbitMQ to create a very nice shared queue. http://nathanborror.com/posts/2009/may/20/working-django-and-rabbitmq/

You might be able to use http://hjb.python-hosting.com/ to make use of JMS queues.

You'll need a small application to create and fill the queue with work.

Create as many copies of the application as you like. For example:

for i in 1 2 3 4 5 6 7 8 9 10
do
    python myapp.py &
done

This will run 10 concurrent copies of your application. All 10 are trying to get work from a single queue. They will use all available CPU resources and the OS will nicely schedule them for you.

Peer, node-to-node synchronization means you have O(n*(n-1)/2) communication paths among all nodes.

The "two-adjacent nodes" means you still have 2*n communication paths and work has to "somehow" trickle among the nodes. If the nodes are all initially seeded with work, then someone did a lot of planning to balance the workload. If you're going to do that much planning, why ask the nodes to synchronize at all?

If the queues are not carefully balanced to begin with than every even node could be slow. Every odd node could be fast. The odd nodes finish first, check for work from two even nodes, and those nodes are (a) not done and (b) don't have more work to do, either. What now? Half the nodes are working, half are idle. All due to poor planning in the initial distribution of work.

Master-slave means you have n communication paths. Further, the balancing is automatic since all idle nodes have equal access to work. There's no such thing as a biased initial distribution that leads to poor overall performance.

use multiprocessing module

继续阅读：module parallel-processing python

Implementing Load Balancing Using Python

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？