Functional assignment in numpy

2023-03-08 11:05 问答作者：

Suppose I have two arrays

A = [ 6, 4, 5, 7, 9 ]
ind = [ 0, 0, 2, 1, 2 ]

and a function f.

I want to build a new array B of size the numbe开发者_如何学运维r of distinct elements in ind with B[i] the result of f with parameter the subarray of A indexed by i.

For this example, if I take f = sum, then

B = [10, 7, 14]

or f = max

B = [6, 7, 9]

Is there a more efficient way than a for loop in numpy ?

Thanks

For the special case of f = sum:

In [32]: np.bincount(ind,A)
Out[32]: array([ 10.,   7.,  14.])

Assuming:

f is a ufunc
You have enough memory to make a 2D array of shape len(A) x len(A)

You could make a 2D array B:

B=np.zeros((len(A),max(ind)+1))

and fill in various locations in B with values from A, such that the first column of B only gets values from A when ind == 0, and the second column of B only gets values from A when ind == 1, etc:

B[zip(*enumerate(ind))]=A

you'd end up with an array like

[[ 6.  0.  0.]
 [ 4.  0.  0.]
 [ 0.  0.  5.]
 [ 0.  7.  0.]
 [ 0.  0.  9.]]

You could then apply f along axis=0 to obtain your desired result. There is a third assumption used here:

The extra zeros in B do not affect the desired result.

If you can stomach these assumptions then:

import numpy as np

A = np.array([ 6, 4, 5, 7, 9 ])
ind = np.array([ 0, 0, 2, 1, 2 ])

N=100
M=10
A2 = np.array([np.random.randint(M) for i in range(N)])
ind2 = np.array([np.random.randint(M) for i in range(N)])

def use_extra_axis(A,ind,f):
    B=np.zeros((len(A),max(ind)+1))
    B[zip(*enumerate(ind))]=A
    return f(B)

def use_loop(A,ind,f):
    n=max(ind)+1
    B=np.empty(n)
    for i in range(n):
        B[i]=f(A[ind==i])
    return B

def fmax(arr):
    return np.max(arr,axis=0)

if __name__=='__main__':
    print(use_extra_axis(A,ind,fmax))
    print(use_loop(A,ind,fmax))

For certain values of M and N (e.g. M=10, N=100), using an extra axis may be faster than using a loop:

% python -mtimeit -s'import test,numpy' 'test.use_extra_axis(test.A2,test.ind2,test.fmax)'
10000 loops, best of 3: 162 usec per loop

% python -mtimeit -s'import test,numpy' 'test.use_loop(test.A2,test.ind2,test.fmax)'
1000 loops, best of 3: 222 usec per loop

However, as N grows larger (say M=10, N=10000), using a loop may be faster:

% python -mtimeit -s'import test,numpy' 'test.use_extra_axis(test.A2,test.ind2,test.fmax)'
100 loops, best of 3: 13.9 msec per loop
% python -mtimeit -s'import test,numpy' 'test.use_loop(test.A2,test.ind2,test.fmax)'
100 loops, best of 3: 4.4 msec per loop

Incorporating thouis's excellent idea of using a sparse matrix:

def use_sparse_extra_axis(A,ind,f):
    B=scipy.sparse.coo_matrix((A, (range(len(A)), ind))).toarray()
    return f(B)

def use_sparse(A,ind,f):
    return [f(v) for v in scipy.sparse.coo_matrix((A, (ind, range(len(A))))).tolil().data]

Which implementation is best depends on the parameters N and M:

N=1000, M=100
·───────────────────────·────────────────────·
│ use_sparse_extra_axis │ 1.15 msec per loop │
│        use_extra_axis │ 2.79 msec per loop │
│              use_loop │ 3.47 msec per loop │
│            use_sparse │ 5.25 msec per loop │
·───────────────────────·────────────────────·

N=100000, M=10
·───────────────────────·────────────────────·
│ use_sparse_extra_axis │ 35.6 msec per loop │
│              use_loop │ 43.3 msec per loop │
│            use_sparse │ 91.5 msec per loop │
│        use_extra_axis │  150 msec per loop │
·───────────────────────·────────────────────·

N=100000, M=50
·───────────────────────·────────────────────·
│            use_sparse │ 94.1 msec per loop │
│              use_loop │  107 msec per loop │
│ use_sparse_extra_axis │  170 msec per loop │
│        use_extra_axis │  272 msec per loop │
·───────────────────────·────────────────────·

N=10000, M=50
·───────────────────────·────────────────────·
│              use_loop │ 10.9 msec per loop │
│            use_sparse │ 11.7 msec per loop │
│ use_sparse_extra_axis │ 15.1 msec per loop │
│        use_extra_axis │ 25.4 msec per loop │
·───────────────────────·────────────────────·

I don't think you can get away from the loop, but perhaps making use of scipy's sparse matrices will be more efficient.

[f(v) for v in scipy.sparse.coo_matrix((A, (ind, range(len(A))))).tolil().data]

Another possibility,

from operator import itemgetter
from itertools import groupby

A = [ 6, 4, 5, 7, 9 ]
ind = [ 0, 0, 2, 1, 2 ]

z = zip(ind,A)
z.sort()

fst,snd = itemgetter(0), itemgetter(1)
g = groupby(z,fst)

f = sum
# or 
# f = max

for i in g:
    print i[0],f(snd(j) for j in i[1])

At least for adding, this works

import numpy as np

def op_at(f, ind, vals):
    base = np.zeros(np.max(ind)+1)
    f.at(base, ind, vals)
    return base

print op_at(np.add, [ 0, 0, 2, 1, 2], [ 6, 4, 5, 7, 9])

> [ 10.   7.  14.]

Unfortunately it doesn't appear to work for max.

继续阅读：numpy python

Functional assignment in numpy

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？