python prime crunching: processing pool is slower?

2023-03-30 21:13 问答作者：

So I've been messing around with python's multiprocessing lib for the last few days and I really like the processing pool. It's easy to implement and I can visualize a lot of uses. I've done a couple of projects I've heard about before to familiarize myself with it and recently finished a program that brute forces games of hangman.

Anywho, I was doing an execution time compairison of summing all the prime numbers between 1 million and 2 million both single threaded and through a processing pool. Now, for the hangman cruncher, putting the games in a processing pool improved execution time by about 8 times (i7 with 8 cores), but when grinding out these primes, it actually increased processing time by almost a factor of 4.

Can anyone tell me why this is? Here is the code for anyone interested in looking at or testing it:

#!/user/bin/python.exe
import math
from multiprocessing import Pool

global primes
primes = []

def log(result):
    global primes

    if result:
        primes.append(result[1])

def isPrime( n ):
    if n < 2:
        return False
    if n == 2:
        return True, n

    max = int(m开发者_开发知识库ath.ceil(math.sqrt(n)))
    i = 2
    while i <= max:
        if n % i == 0:
            return False
        i += 1
    return True, n


def main():

   global primes

   #pool = Pool()

   for i in range(1000000, 2000000):
       #pool.apply_async(isPrime,(i,), callback = log)
       temp = isPrime(i)
       log(temp)

   #pool.close()
   #pool.join()

   print sum(primes)

   return

if __name__ == "__main__":
    main()

It'll currently run in a single thread, to run through the processing pool, uncomment the pool statements and comment out the other lines in the main for loop.

the most efficient way to use multiprocessing is to divide the work into n equal sized chunks, with n the size of the pool, which should be approximately the number of cores on your system. The reason for this is that the work of starting subprocesses and communicating between them is quite large. If the size of the work is small compared to the number of work chunks, then the overhead of IPC becomes significant.

In your case, you're asking multiprocessing to process each prime individually. A better way to deal with the problem is to pass each worker a range of values, (probably just a start and end value) and have it return all of the primes in that range it found.

In the case of identifying large-ish primes, the work done grows with the starting value, and so You probably don't want to divide the total range into exactly n chunks, but rather n*k equal chunks, with k some reasonable, small number, say 10 - 100. that way, when some workers finish before others, there's more work left to do and it can be balanced efficiently across all workers.

Edit: Here's an improved example to show what that solution might look like. I've changed as little as possible so you can compare apples to apples.

#!/user/bin/python.exe
import math
from multiprocessing import Pool

global primes
primes = set()

def log(result):
    global primes

    if result:
        # since the result is a batch of primes, we have to use 
        # update instead of add (or for a list, extend instead of append)
        primes.update(result)

def isPrime( n ):
    if n < 2:
        return False
    if n == 2:
        return True, n

    max = int(math.ceil(math.sqrt(n)))
    i = 2
    while i <= max:
        if n % i == 0:
            return False
        i += 1
    return True, n

def isPrimeWorker(start, stop):
    """
    find a batch of primes
    """
    primes = set()
    for i in xrange(start, stop):
        if isPrime(i):
            primes.add(i)

    return primes



def main():

    global primes

    pool = Pool()

    # pick an arbitrary chunk size, this will give us 100 different 
    # chunks, but another value might be optimal
    step = 10000

    # use xrange instead of range, we don't actually need a list, just
    # the values in that range.
    for i in xrange(1000000, 2000000, step):
        # call the *worker* function with start and stop values.
        pool.apply_async(isPrimeWorker,(i, i+step,), callback = log)

    pool.close()
    pool.join()

    print sum(primes)

    return

if __name__ == "__main__":
    main()

继续阅读：multiprocessing pool python

python prime crunching: processing pool is slower?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？