开发者

Poor division of labor using multiprocessing in python?

Consider the following program:

import Queue
from multiprocessing import Queue as Q
from multiprocessing import Process, Value, Array
from time import time
from numpy import sum, zeros
from numpy.random import normal

def enche(queue, cnts, tims, t0, proc):
    while True:
        try:
            queue.get(False)
            x = sum(normal(size = 1000))
            cnts[proc] = cnts[proc] + 1
        except Queue.Empty:
            tims[proc] = time() - t0.value
            break

def main(m, nth):
    queue = Q()
    tmp = [queue.put(0) for i in xrange(m)]
    cnts = Array('i', zeros(nth, dtype = "int"))
    tims = Array('d', zeros(nth))
    t0  = Value('d', time())
    procs = [ Process(target = enche , args = (queue,cnts, tims, t0,i)) for i in xrange(nth) ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    for k in xrange(nth):
        print "%2.0f/100, %4.1f ms,"%(cnts[k]/float(m) * 100, tims[k]*1000), 
    print "" 


if __name__ == "__main__":
    for k in xrange(100):
        main(100, 4)

In essence, the function main(m,nth) should do a task - sum a thousand normal random numbers m times - and divide this task in nth processes.

I then count and print what percentage of the task was done by each process and how long each process lasted. This the typical results I get:

 4/100,  1.6 ms,  3/100,  2.2 ms, 88/100, 10.9 ms,  5/100,  2.7 ms, 
23/100,  3.6 ms,  1/100,  1.5 ms,  3/100,  2.0 ms, 73/100, 14.9 ms, 
 8/100,  3.9 ms,  1/100,  3.4 ms,  8/100,  5.3 ms, 83/100, 21.0 ms, 
 6/100,  4.1 ms,  0/100,  3.2 ms, 76/100, 19.0 ms, 18/100,  8.8 ms, 
 5/100,  1.7 ms, 90/100,  9.2 ms,  5/100,  2.4 ms,  0/100,  2.6 ms, 
 2/100,  1.5 ms,  8/100,  2.4 ms, 89/100,  8.9 ms,  1/100,  2.6 ms, 
 8/100,  2.3 ms, 11/100,  2.7 ms,  3/100,  2.4 ms, 78/100,  8.5 ms, 
52/100,  5.8 ms, 11/100,  3.2 ms,  4/100,  2.6 ms, 33/100,  5.9 ms, 
 6/100,  2.3 ms,  8/100,  2.9 ms, 24/100,  4.3 ms, 62/100,  7.7 ms, 
13/100,  3.0 ms, 11/100,  2.6 ms,  1/100,  2.5 ms, 75/100,  8.5 ms, 
 6/100,  2.1 ms,  3/100,  2.1 ms,  6/100,  2.7 ms, 85/100,  9.0 ms, 
 8/100,  2.3 ms,  3/100,  2.0 ms, 27/100,  4.4 ms, 62/100,  7.6 ms, 
18/100,  3.2 ms,  0/100,  1.9 ms,  6/100,  3.1 ms, 76/100,  9.2 ms, 
 3/100,  1.9 ms, 19/100,  3.4 ms, 30/100,  6.0 ms, 48/100,  7.9 ms, 
10/100,  2.4 ms,  0/100,  1.5 ms,  8/100,  2.8 ms, 82/100,  8.8 ms, 
 6/100,  1.8 ms, 36/100,  4.7 ms, 32/100,  4.7 ms, 26/100,  4.7 ms, 
 8/100,  2.0 ms, 46/100,  5.9 ms, 46/100,  5.9 ms,  0/100,  2.6 ms, 
 3/100,  1.6 ms, 54/100,  6.2 ms,  1/100,  2.4 ms, 42/100,  6.1 ms, 
12/100,  2.6 ms,  8/100,  2.4 ms, 34/100,  5.3 ms, 46/100,  6.3 ms, 
 1/100,  1.4 ms,  9/100,  2.3 ms,  1/100,  2.3 ms, 89/100,  9.2 ms, 
13/100,  2.6 ms,  5/100,  2.2 ms,  3/100,  2.4 ms, 79/100,  8.7 ms, 
11/100,  2.3 ms,  7/100,  2.6 ms,  8/100,  2.9 ms, 74/100,  8.8 ms, 
 8/100,  1.8 ms,  9/100,  2.7 ms,  0/100,  1.8 ms, 83/100,  9.3 ms, 
13/100,  2.4 ms,  4/100,  2.0 ms,  3/100,  2.5 ms, 80/100,  9.9 ms, 
10/100,  2.3 ms, 72/100,  8.0 ms,  6/100,  2.6 ms, 12/100,  4.1 ms, 
27/100,  4.3 ms,  0/100,  1.5 ms,  5/100,  2.5 ms, 68/100,  8.4 ms, 
 3/100,  1.5 ms,  8/100,  2.3 ms, 86/100,  8.3 ms,  3/100,  2.7 ms, 
 7/100,  1.9 ms, 10/100,  2.5 ms,  5/100,  2.5 ms, 78/100,  8.2 ms, 
14/100,  2.5 ms,  3/100,  2.0 ms,  5/100,  2.5 ms, 78/100,  8.2 ms, 
53/100,  7.4 ms,  3/100,  1.9 ms,  1/100,  2.4 ms, 43/100,  7.5 ms, 
 3/100,  1.7 ms,  9/100,  2.5 ms,开发者_如何学Go 47/100,  6.0 ms, 41/100,  6.0 ms, 
 3/100,  1.5 ms, 37/100,  4.7 ms, 33/100,  4.7 ms, 27/100,  4.7 ms, 
 7/100,  1.8 ms, 36/100,  4.5 ms, 32/100,  4.5 ms, 25/100,  4.5 ms, 
 3/100,  1.4 ms, 51/100,  5.6 ms, 46/100,  5.6 ms,  0/100,  2.3 ms, 
 5/100,  1.8 ms,  9/100,  2.5 ms, 46/100,  5.8 ms, 40/100,  5.9 ms, 
 9/100,  2.0 ms,  3/100,  1.8 ms,  5/100,  2.3 ms, 83/100,  8.8 ms, 
 7/100,  2.1 ms,  3/100,  2.1 ms,  8/100,  2.9 ms, 82/100,  8.6 ms, 
11/100,  2.3 ms,  4/100,  2.0 ms,  3/100,  2.5 ms, 82/100,  9.3 ms, 
19/100,  2.9 ms,  9/100,  2.6 ms, 10/100,  3.0 ms, 62/100,  7.3 ms, 
18/100,  2.8 ms, 16/100,  2.9 ms,  2/100,  2.5 ms, 64/100,  7.9 ms, 
 2/100,  1.5 ms, 51/100,  5.8 ms, 47/100,  5.9 ms,  0/100,  2.4 ms, 
26/100,  4.0 ms,  8/100,  2.3 ms, 11/100,  2.9 ms, 55/100,  7.6 ms, 
 8/100,  2.1 ms,  3/100,  1.9 ms, 79/100,  9.2 ms, 10/100,  3.8 ms, 
14/100,  2.5 ms,  4/100,  2.1 ms, 45/100,  5.6 ms, 37/100,  5.6 ms, 
11/100,  2.2 ms,  9/100,  2.8 ms,  4/100,  2.4 ms, 76/100,  8.9 ms, 
 3/100,  1.4 ms,  4/100,  2.1 ms,  5/100,  2.4 ms, 88/100,  9.5 ms, 
 3/100,  1.4 ms,  7/100,  2.4 ms, 49/100,  5.9 ms, 41/100,  5.9 ms, 
 3/100,  1.4 ms,  3/100,  1.8 ms, 92/100,  9.3 ms,  2/100,  2.6 ms, 
 9/100,  2.3 ms, 16/100,  3.1 ms, 41/100,  5.5 ms, 34/100,  5.5 ms, 
 5/100,  1.7 ms,  8/100,  2.3 ms, 47/100,  5.8 ms, 40/100,  5.7 ms, 
 6/100,  1.7 ms,  5/100,  2.2 ms,  5/100,  2.4 ms, 84/100,  8.6 ms, 
 7/100,  2.0 ms,  6/100,  2.2 ms, 47/100,  5.9 ms, 40/100,  5.8 ms, 
 5/100,  1.9 ms, 32/100,  5.1 ms,  6/100,  2.7 ms, 57/100,  7.7 ms, 
11/100,  2.4 ms, 14/100,  2.7 ms, 42/100,  5.3 ms, 33/100,  5.3 ms, 
50/100,  6.6 ms, 12/100,  2.8 ms,  0/100,  1.9 ms, 38/100,  6.6 ms, 
55/100,  7.0 ms,  1/100,  1.7 ms,  2/100,  2.2 ms, 42/100,  6.9 ms, 
 8/100,  1.9 ms, 48/100,  6.7 ms,  5/100,  2.4 ms, 39/100,  6.7 ms, 
12/100,  2.3 ms, 18/100,  3.0 ms, 30/100,  5.6 ms, 40/100,  6.8 ms, 
 5/100,  1.7 ms,  3/100,  1.9 ms,  5/100,  2.5 ms, 87/100,  8.8 ms, 
50/100,  6.7 ms, 13/100,  2.7 ms,  0/100,  2.0 ms, 37/100,  6.7 ms, 

Why do I get such a poor division of labor? Why sometimes a process last for short times and do nothing and others do almost 90% of the job?

What should I do to improve this?

edit: extra info - I run this in a i7 CPU.


The task it does in parallel is rather small compared to process start overhead, so that one process can do many iterations before other processes get a chance to start running.

If the line:

x = sum(normal(size = 1000))

is changed to:

x = sum(normal(size = 100000))

This is what I get:

25/100, 366.2 ms, 25/100, 364.4 ms, 25/100, 364.1 ms, 25/100, 361.7 ms, 
34/100, 362.1 ms,  0/100,  2.6 ms, 25/100, 357.6 ms, 41/100, 357.8 ms, 
25/100, 365.2 ms, 26/100, 362.7 ms, 25/100, 366.0 ms, 24/100, 355.7 ms, 
21/100, 360.3 ms, 21/100, 355.9 ms, 22/100, 362.3 ms, 36/100, 358.0 ms, 


As Maxim mentionned, your tasks are too small for the OS to swap each process in turn and adjust allocated CPU time for each process. The longer the task, the more each process will have access to the processor.

You might also check the real CPU time that has been allocated to current process by using os.times()

It allows you to know how many real CPU time has been allocated to each process. Of course the process can't do any work if it is not allocated any CPU time :)

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜