开发者

Summing with a for loop faster than with reduce?

I wanted to see how much faster reduce was than using a for loop for simple numerical operations. Here's what I found (using the standard timeit library):

In [54]: print(setup)
from operator import add, iadd
r = range(100)

In [55]: print(stmt1)    
c = 0
for i in r:
    c+=i        

In [56]: timeit(stmt1, setup)
Out[56]: 8.948904991149902
In [58]: print(stmt3)    
reduce(add, r)    

In [59]: timeit(stmt3, setup)
Out[59]: 13.316915035247803

Looking a little more:

In [68]: timeit("1+2", setup)
Out[68]: 0.04145693778991699

In [69]: timeit("add(1,2)", setup)
Out[69]: 0.22807812690734863

What's going on here? Obviously, reduce does loop faster than for, but the function call seems to dominate. Shouldn't the reduce version run almost entirely in C? Using iadd(c,i) in the for loop version makes it run in ~24 seconds. Why开发者_运维知识库 would using operator.add be so much slower than +? I was under the impression + and operator.add run the same C code (I checked to make sure operator.add wasn't just calling + in python or anything).

BTW, just using sum runs in ~2.3 seconds.

In [70]: print(sys.version)
2.7.1 (r271:86882M, Nov 30 2010, 09:39:13) 
[GCC 4.0.1 (Apple Inc. build 5494)]


The reduce(add, r) must invoke the add() function 100 times, so the overhead of the function calls adds up -- reduce uses PyEval_CallObject to invoke add on each iteration:

for (;;) {
    ...
    if (result == NULL)
        result = op2;
    else {
        # here it is creating a tuple to pass the previous result and the next
        # value from range(100) into func add():
        PyTuple_SetItem(args, 0, result);
        PyTuple_SetItem(args, 1, op2);
        if ((result = PyEval_CallObject(func, args)) == NULL)
            goto Fail;
    }

Updated: Response to question in comments.

When you type 1 + 2 in Python source code, the bytecode compiler performs the addition in place and replaces that expression with 3:

f1 = lambda: 1 + 2
c1 = byteplay.Code.from_code(f1.func_code)
print c1.code

1           1 LOAD_CONST           3
            2 RETURN_VALUE         

If you add two variables a + b the compiler will generate bytecode which loads the two variables and performs a BINARY_ADD, which is far faster than calling a function to perform the addition:

f2 = lambda a, b: a + b
c2 = byteplay.Code.from_code(f2.func_code)
print c2.code

1           1 LOAD_FAST            a
            2 LOAD_FAST            b
            3 BINARY_ADD           
            4 RETURN_VALUE         


It could be the overhead of copying args and return values (i.e. "add(1, 2)"), opposed to simply operating on numeric literals


edit: Switching out zeroes instead of array multiply closes the gap big time.

from functools import reduce
from numpy import array, arange, zeros
from time import time

def add(x, y):
    return x + y

def sum_columns(x):
    if x.any():
        width = len(x[0])
        total = zeros(width)
    for row in x:
        total += array(row)
    return total

l = arange(3000000)
l = array([l, l, l])

start = time()
print(reduce(add, l))
print('Reduce took {}'.format(time() - start))

start = time()
print(sum_columns(l))
print('For loop took took {}'.format(time() - start))

Gets you down almost no difference at all.

Reduce took 0.03230619430541992 For loop took took 0.058577775955200195

old: If reduce is used for adding together NumPy arrays by index, it can be faster than a for loop.

from functools import reduce
from numpy import array, arange
from time import time

def add(x, y):
    return x + y

def sum_columns(x):
    if x.any():
        width = len(x[0])
        total = array([0] * width)
    for row in x:
        total += array(row)
    return total

l = arange(3000000)
l = array([l, l, l])

start = time()
print(reduce(add, l))
print('Reduce took {}'.format(time() - start))

start = time()
print(sum_columns(l))
print('For loop took took {}'.format(time() - start))

With the result of

[      0       3       6 ..., 8999991 8999994 8999997]
Reduce took 0.024930953979492188
[      0       3       6 ..., 8999991 8999994 8999997]
For loop took took 0.3731539249420166
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜