Multiprocessing a function with several inputs

In Python, the multiprocessing module can be used to run a function over a range of values in parallel. For example, this produces a list of the first 100000 evaluations of f:

def f(i):
    return i * i

def main():
    import multiprocessing
    pool = multiprocessing.Pool(2)
    ans = pool.map(f, range(100000))

    return ans

Can a similar thing be done when f takes multiple inputs but only one variable is varied? For example, how would you parallelize this:

def f(i, n):
    return i * i + 2*n

def main():
    ans = []
    for i in range(100000):
        ans.append(f(i, 20))

    return ans


You can use functools.partial():

def f(i, n):
    return i * i + 2*n

def main():
    import multiprocessing
    import functools
    pool = multiprocessing.Pool(2)
    ans = pool.map(functools.partial(f, n=20), range(100000))

    return ans
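One caveat worth noting: on platforms that use the spawn start method (Windows, and macOS on recent Python versions), the pool must be created under an `if __name__ == "__main__":` guard or the workers will re-execute the module. A minimal complete script using the partial approach:

```python
import functools
import multiprocessing

def f(i, n):
    return i * i + 2 * n

if __name__ == "__main__":
    # partial pins n=20, leaving a one-argument callable for map()
    with multiprocessing.Pool(2) as pool:
        ans = pool.map(functools.partial(f, n=20), range(100000))
    print(ans[:5])  # [40, 41, 44, 49, 56]
```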


There are several ways to do this. In the example given in the question, you could just define a wrapper function

def g(i):
    return f(i, 20)

and pass this wrapper to map(). A more general approach is a wrapper that takes a single tuple argument and unpacks it into multiple arguments:

def g(tup):
    return f(*tup)

or use an equivalent lambda expression: lambda tup: f(*tup).
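In Python 3, `multiprocessing.Pool.starmap` does this tuple unpacking for you. A sketch showing the wrapper and starmap side by side:

```python
import multiprocessing

def f(i, n):
    return i * i + 2 * n

def g(tup):
    # unpack a single (i, n) tuple into f's two arguments
    return f(*tup)

if __name__ == "__main__":
    args = [(i, 20) for i in range(10)]
    with multiprocessing.Pool(2) as pool:
        via_wrapper = pool.map(g, args)      # wrapper unpacks each tuple
        via_starmap = pool.starmap(f, args)  # starmap unpacks for you
    print(via_wrapper == via_starmap)  # True
```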


If you use my fork of multiprocessing, called pathos, you can get pools that take multiple arguments… and also take lambda functions. The nice thing about it is that you don't have to alter your programming constructs to fit working in parallel.

>>> def f(i, n):
...   return i * i + 2*n
... 
>>> from itertools import repeat
>>> from pathos.pools import ProcessPool as Pool
>>> pool = Pool()
>>>
>>> ans = pool.map(f, range(1000), repeat(20))
>>> ans[:10]
[40, 41, 44, 49, 56, 65, 76, 89, 104, 121]
>>>
>>> # this also works
>>> ans = pool.map(lambda x: f(x, 20), range(1000))
>>> ans[:10]
[40, 41, 44, 49, 56, 65, 76, 89, 104, 121]


Strictly speaking, this technique is partial application, though it is often loosely called currying: https://en.wikipedia.org/wiki/Currying

Another way to do it, without functools.partial, is to use the classic map built-in inside pool.map:

import multiprocessing

def f(args):
    x, fixed = args           # unpack the (varying, fixed) pair
    return x * x + 2 * fixed  # the question's f(i, n) with n held fixed

fixed = 20
arguments = range(100000)
pool = multiprocessing.Pool(multiprocessing.cpu_count() - 1)
ans = pool.map(f, map(lambda x: (x, fixed), arguments))


You can use poor man's currying (i.e., wrap it):

new_f = lambda x: f(x, 20)

then call new_f(i). Note, however, that a lambda cannot be pickled, so this will not work with multiprocessing.Pool.map directly; for a pool, define the wrapper as a module-level function instead (or use pathos, as above).
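A minimal sketch of such a module-level wrapper, which is picklable and therefore safe to pass to a pool:

```python
import multiprocessing

def f(i, n):
    return i * i + 2 * n

def new_f(i):
    # a module-level def (unlike a lambda) can be pickled and sent to workers
    return f(i, 20)

if __name__ == "__main__":
    with multiprocessing.Pool(2) as pool:
        ans = pool.map(new_f, range(10))
    print(ans)  # [40, 41, 44, 49, 56, 65, 76, 89, 104, 121]
```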
