numpy: column-wise dot product
Given a 2D numpy
array, I need to compute the dot product of every column with itself, and store the result in a 1D array. The following works:
In [45]: A = np.array([[1,2,3,4],[5,6,7,8]])
In [46]: np.array([np.dot(A[:,i], A[:,i]) for i in xrange(A.shape[1])])
Out[46]: array([26, 40, 58, 80])
Is there a simple way to avoid the Python loop? The above is hardly the end of 开发者_JS百科the world, but if there's a numpy
primitive for this, I'd like to use it.
edit In practice the matrix has many rows and relatively few columns. I am therefore not overly keen on creating temporary arrays larger than O(A.shape[1])
. I also can't modify A
in place.
How about:
>>> A = np.array([[1,2,3,4],[5,6,7,8]])
>>> (A*A).sum(axis=0)
array([26, 40, 58, 80])
EDIT: Hmm, okay, you don't want intermediate large objects. Maybe:
>>> from numpy.core.umath_tests import inner1d
>>> A = np.array([[1,2,3,4],[5,6,7,8]])
>>> inner1d(A.T, A.T)
array([26, 40, 58, 80])
which seems a little faster anyway. This should do what you want behind the scenes, as A.T is a view (which doesn't make its own copy, IIUC), and inner1d seems to loop the way it needs to.
VERY BELATED UPDATE: Another alternative would be to use np.einsum
:
>>> A = np.array([[1,2,3,4],[5,6,7,8]])
>>> np.einsum('ij,ij->j', A, A)
array([26, 40, 58, 80])
>>> timeit np.einsum('ij,ij->j', A, A)
100000 loops, best of 3: 3.65 us per loop
>>> timeit inner1d(A.T, A.T)
100000 loops, best of 3: 5.02 us per loop
>>> A = np.random.randint(0, 100, (2, 100000))
>>> timeit np.einsum('ij,ij->j', A, A)
1000 loops, best of 3: 363 us per loop
>>> timeit inner1d(A.T, A.T)
1000 loops, best of 3: 848 us per loop
>>> (np.einsum('ij,ij->j', A, A) == inner1d(A.T, A.T)).all()
True
You can compute the square of all elements and sum up column-wise using
np.sum(np.square(A),0);
(I'm not entirely sure about the second parameter of the sum
function, which identifies the axis along which to take the sum, and I have no numpy currently installed. Maybe you'll have to experiment :) ...)
EDIT
Looking at DSM's post, it seems that you should use axis=0
. Using the square
function might be a little more performant than using A*A
.
From linear algebra, the dot product of row i with row j is the i,j th entry of AA^T. Similarly, the dot product of column i with column j is the i,jth entry of (A^T)A.
So if you want the dot product of each column vector of A with itself, you could use ColDot = np.dot(np.transpose(A), A).diagonal()
. On the other hand, if you want the dot product of each row with itself, you could use RowDot = np.dot(A, np.transpose(A)).diagonal()
.
Both lines return an array.
精彩评论