Multiprocessing a function with several inputs
In Python the multiprocessing module can be used to run a function over a range of values in parallel. For example, this produces a list of the first 100000 evaluations of f.
def f(i):
    return i * i

def main():
    import multiprocessing
    pool = multiprocessing.Pool(2)
    ans = pool.map(f, range(100000))
    return ans
Can a similar thing be done when f takes multiple inputs but only one variable is varied? For example, how would you parallelize this:
def f(i, n):
    return i * i + 2*n

def main():
    ans = []
    for i in range(100000):
        ans.append(f(i, 20))
    return ans
You can use functools.partial():
import functools

def f(i, n):
    return i * i + 2*n

def main():
    import multiprocessing
    pool = multiprocessing.Pool(2)
    # partial(f, n=20) fixes n, leaving a one-argument callable for map
    ans = pool.map(functools.partial(f, n=20), range(100000))
    return ans
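For reference, a self-contained version of the snippet above might look like the following. It is a sketch, assuming Python 3; the `compute` helper name, the `if __name__ == "__main__":` guard and the `with` block are additions that are not in the original answer. The guard matters on platforms that start workers by spawning a fresh interpreter, since the module is then re-imported in every worker.

```python
import functools
import multiprocessing


def f(i, n):
    return i * i + 2 * n


def compute(count, n=20, processes=2):
    """Evaluate f(i, n) for i in range(count) using a pool of worker processes.

    `compute` is a hypothetical helper name used only for this sketch.
    """
    # A partial object pickles cleanly as long as f is a module-level function.
    g = functools.partial(f, n=n)
    with multiprocessing.Pool(processes) as pool:
        return pool.map(g, range(count))


if __name__ == "__main__":
    ans = compute(100000)
    print(ans[:5])  # [40, 41, 44, 49, 56]
```

Passing `n` as a keyword keeps the varying argument `i` in the first position, which is the one `pool.map` fills in.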
There are several ways to do this. In the example given in the question, you could just define a wrapper function
def g(i):
    return f(i, 20)
and pass this wrapper to map(). A more general approach is to have a wrapper that takes a single tuple argument and unpacks the tuple to multiple arguments
def g(tup):
    return f(*tup)
or use an equivalent lambda expression: lambda tup: f(*tup) (with the standard multiprocessing.Pool, prefer the named wrapper, since lambdas cannot be pickled).
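On Python 3.3 and later, the standard library's `Pool.starmap` does the tuple unpacking itself, so the tuple wrapper is not strictly required. A minimal sketch, where `build_args` is a hypothetical helper name introduced only for illustration:

```python
import itertools
import multiprocessing


def f(i, n):
    return i * i + 2 * n


def build_args(values, n):
    """Pair every varying value with the fixed argument n (hypothetical helper)."""
    return list(zip(values, itertools.repeat(n)))


if __name__ == "__main__":
    args = build_args(range(100000), 20)
    with multiprocessing.Pool(2) as pool:
        # starmap calls f(*tup) for each tuple in args
        ans = pool.starmap(f, args)
    print(ans[:5])  # [40, 41, 44, 49, 56]
```

This form also generalizes to the case where more than one argument varies, since each tuple can carry different values.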
If you use my fork of multiprocessing, called pathos, you can get pools that take multiple arguments, and that also take lambda functions. The nice thing about it is that you don't have to alter your programming constructs to fit working in parallel.
>>> def f(i, n):
...     return i * i + 2*n
...
>>> from itertools import repeat
>>> N = 10000
>>>
>>> from pathos.pools import ProcessPool as Pool
>>> pool = Pool()
>>>
>>> ans = pool.map(f, range(1000), repeat(20))
>>> ans[:10]
[40, 41, 44, 49, 56, 65, 76, 89, 104, 121]
>>>
>>> # this also works
>>> ans = pool.map(lambda x: f(x, 20), range(1000))
>>> ans[:10]
[40, 41, 44, 49, 56, 65, 76, 89, 104, 121]
This technique is known as currying (strictly speaking, fixing one argument of a function like this is partial application): https://en.wikipedia.org/wiki/Currying
Another way to do it without using functools.partial is to use the classical map command inside pool.map:
import multiprocessing

def f(args):
    # each item is an (x, fixed) tuple built in the parent process
    x, fixed = args
    # FUNCTIONALITY HERE

pool = multiprocessing.Pool(multiprocessing.cpu_count() - 1)
pool.map(f, map(lambda x: (x, fixed), arguments))
You can use poor man's currying (aka wrap it):
new_f = lambda x: f(x, 20)
then call new_f(i).
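One caveat, sketched below: the standard `multiprocessing.Pool` sends the function to its workers by pickling it, and the standard `pickle` module cannot serialize a lambda. So `new_f` works for direct calls, but it is not suitable as the first argument to the standard `pool.map`. The `is_picklable` helper is a hypothetical name used only for this check.

```python
import pickle


def f(i, n):
    return i * i + 2 * n


new_f = lambda x: f(x, 20)


def is_picklable(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


if __name__ == "__main__":
    print(new_f(3))             # 49
    print(is_picklable(new_f))  # False
```

A module-level `def` wrapper or `functools.partial` avoids the problem, because both can be pickled by reference to a named function.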