Averaging matrix columns efficiently

In Python, given an n x p matrix, e.g. 4 x 4, how can I return a 4 x 2 matrix that simply averages the first two columns and the last two columns for all 4 rows of the matrix?

e.g. given:

a = array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])

I want to return a matrix that has the average of a[:, 0] and a[:, 1] and the average of a[:, 2] and a[:, 3]. I want this to work for an arbitrary n x p matrix, assuming that the number of columns p is evenly divisible by the number of columns being averaged in each group.

Let me clarify: for each row, I want to take the average of the first two columns, then the average of the last two columns. So it would be:

(1 + 2) / 2, (3 + 4) / 2 <- row 1 of new matrix
(5 + 6) / 2, (7 + 8) / 2 <- row 2 of new matrix, etc.

which should yield a 4 by 2 matrix rather than 4 x 4.

Thanks.


How about using some math? You can define a matrix M = [[0.5,0],[0.5,0],[0,0.5],[0,0.5]] so that the matrix product A @ M is what you want.

import numpy as np

A = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12],
              [13, 14, 15, 16]])
M = np.array([[0.5, 0],
              [0.5, 0],
              [0, 0.5],
              [0, 0.5]])
# Matrix product: each column of M averages one pair of columns of A
print(A @ M)

Generating M for other sizes is pretty simple too: each entry is either 1/k, where k is the number of columns averaged per group, or zero.
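As a rough sketch of how M could be generated for other sizes: the helper name averaging_matrix and its group_size parameter are made up here for illustration, and it assumes the columns are averaged in consecutive, equal-width groups. It uses np.kron to expand each entry of an identity matrix into a column block of 1/group_size values.

```python
import numpy as np

def averaging_matrix(p, group_size):
    """Build a p x (p // group_size) matrix with entries 1/group_size or 0.

    Hypothetical helper: assumes consecutive, equal-width column groups.
    """
    if p % group_size != 0:
        raise ValueError("p must be evenly divisible by group_size")
    # Each identity entry expands into a (group_size x 1) block,
    # so column j of the result selects and averages group j.
    block = np.ones((group_size, 1)) / group_size
    return np.kron(np.eye(p // group_size), block)

A = np.arange(1, 17).reshape(4, 4)
M = averaging_matrix(A.shape[1], 2)
result = A @ M  # shape (4, 2): pairwise column averages per row
```

For large p this dense M is mostly zeros, so the reshape-based approaches are likely cheaper; the matrix form is mainly handy when the averaging has to be expressed as a linear map.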


reshape - get mean - reshape

>>> a.reshape(-1, a.shape[1]//2).mean(1).reshape(a.shape[0],-1)
array([[  1.5,   3.5],
       [  5.5,   7.5],
       [  9.5,  11.5],
       [ 13.5,  15.5]])

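This should work for any array with an even number of columns, averaging the first half and the second half of each row. For a contiguous array, reshape returns a view rather than a copy.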
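If the goal is instead to average consecutive groups of a fixed width, the same reshape idea can be sketched with a 3-D reshape. The function name group_column_means and the group_size parameter are assumptions for illustration, not part of the original answer.

```python
import numpy as np

def group_column_means(a, group_size):
    """Average consecutive groups of group_size columns in each row.

    Hypothetical helper: returns an array of shape (n, p // group_size).
    """
    a = np.asarray(a)
    n, p = a.shape
    if p % group_size != 0:
        raise ValueError("number of columns must be divisible by group_size")
    # View each row as (p // group_size) groups of group_size values,
    # then average over the last axis.
    return a.reshape(n, p // group_size, group_size).mean(axis=2)

a = np.arange(1, 17).reshape(4, 4)
pair_means = group_column_means(a, 2)

b = np.arange(12).reshape(2, 6)
triple_means = group_column_means(b, 3)
```

With group_size=2 on the 4 x 4 example this gives the same 4 x 2 result as the one-liner above; on the 2 x 6 array it averages columns in threes.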


It's a bit unclear what should happen for matrices with n > 4, but this code will do what you want:

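import numpy as N

a = N.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]], dtype=float)
# Average columns 0-1 and columns 2-3 per row, then stack the two results as columns
avg = N.vstack((N.average(a[:,0:2], axis=1), N.average(a[:,2:4], axis=1))).T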

This yields avg =

array([[  1.5,   3.5],
       [  5.5,   7.5],
       [  9.5,  11.5],
       [ 13.5,  15.5]])


Here's a way to do it. You only need to change ngroups (the number of equal-width column groups, which is what np.hsplit expects) to make it work with other sizes like you said, though I'm not fully sure what you want.

import numpy as np

ngroups = 2  # number of column groups, not the width of each group
out = np.hstack([x.mean(axis=1, keepdims=True) for x in np.hsplit(a, ngroups)])

yields

array([[  1.5,   3.5],
       [  5.5,   7.5],
       [  9.5,  11.5],
       [ 13.5,  15.5]])

for out. Hopefully it gives you some ideas on how to do exactly what it is that you want to do. You can make ngroups depend on the dimensions of a, for instance.
