
Most efficient way to pull specified rows from a 2-d array?

I have a 2-D numpy array with 100,000+ rows. I need to return a subset of those rows, and I need to perform that operation many thousands of times, so efficiency is important.

A mock-up example looks like this:

import numpy as np
a = np.array([[1,5.5],
             [2,4.5],
             [3,9.0],
             [4,8.01]])
b = np.array([2,4])

So I want to return the rows of a whose first-column value appears in b:

c=[[2,4.5],
   [4,8.01]]

The difference, of course, is that there are many more rows in both a and b, so I'd like to avoid looping. I also played with making a dictionary and using np.nonzero, but I am still a bit stumped.

Thanks in advance for any ideas!

EDIT: Note that, in this case, the values in b are identifiers rather than indices. Here's a revised example:

import numpy as np
a = np.array([[102,5.5],
             [204,4.5],
             [343,9.0],
             [40,8.01]])
b = np.array([102,343])

And I want to return:

c = [[102,5.5],
     [343,9.0]]
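For comparison, the dictionary idea mentioned in the question can be sketched like this on the revised example. This is a hypothetical baseline, not the poster's actual code, and the name `row_of` is made up for illustration. It does a Python-level lookup per identifier, which is the kind of looping the vectorized answers try to avoid.

```python
import numpy as np

a = np.array([[102, 5.5],
              [204, 4.5],
              [343, 9.0],
              [40, 8.01]])
b = np.array([102, 343])

# Map each identifier in the first column to its row index.
# tolist() converts to plain Python numbers, so 102.0 and 102 hash and compare equal.
row_of = {ident: i for i, ident in enumerate(a[:, 0].tolist())}

# Look up each requested identifier; a KeyError means the id is not in a.
ii = [row_of[ident] for ident in b.tolist()]
c = a[ii, :]
```

Building the dictionary is a one-off cost if a does not change, but each lookup still runs in the Python interpreter.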


EDIT: I deleted my original answer because I had misunderstood the question. Instead, try:

ii = np.where((a[:,0] - b.reshape(-1,1)) == 0)[1]
c = a[ii,:]

What I'm doing is using broadcasting to subtract each element of b from the first column of a, and then searching the resulting array for zeros, which indicate a match. This should work, but be a little careful with floating-point comparisons, especially if b is not an array of ints.
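To make the broadcasting step concrete, here is a small sketch on the revised example that shows the shape of the intermediate array and what np.where returns. The names `diff`, `rows_b`, and `cols_a` are only for illustration.

```python
import numpy as np

a = np.array([[102, 5.5],
              [204, 4.5],
              [343, 9.0],
              [40, 8.01]])
b = np.array([102, 343])

# a[:, 0] has shape (4,) and b.reshape(-1, 1) has shape (2, 1);
# broadcasting gives a (2, 4) array with one row per element of b.
diff = a[:, 0] - b.reshape(-1, 1)

# np.where on a 2-D array returns (row indices, column indices).
# Row indices point into b; column indices are the matching rows of a.
rows_b, cols_a = np.where(diff == 0)
c = a[cols_a, :]
```

One consequence worth keeping in mind: the intermediate array has len(b) * len(a) elements. With, say, 100,000 rows in a and 1,000 identifiers in b (hypothetical sizes), that is 10^8 float64 values, roughly 800 MB.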

EDIT 2: Thanks to Sven's suggestion, you can try this slightly modified version instead:

ii = np.where(a[:,0] == b.reshape(-1,1))[1]
c = a[ii,:]

It's a bit faster than my original implementation.
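A quick check of how the comparison-based version behaves on less tidy input: the result follows the order of b, a repeated identifier gives a repeated row, and an identifier missing from a is silently dropped. The values in b below are made up for illustration.

```python
import numpy as np

a = np.array([[102, 5.5],
              [204, 4.5],
              [343, 9.0],
              [40, 8.01]])

# Out of order, 102 repeated, and 999 does not appear in a.
b = np.array([343, 102, 999, 102])

ii = np.where(a[:, 0] == b.reshape(-1, 1))[1]
c = a[ii, :]
```

Here ii comes out as [2, 0, 0], so c has three rows rather than four, with nothing signalling that 999 was not found.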

EDIT 3: The fastest solution by far (about 10x faster than Sven's second solution for large arrays) is:

c = a[np.searchsorted(a[:,0],b),:]

This assumes that a[:,0] is sorted and that every value of b appears in a[:,0].
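In the revised example a[:,0] is [102, 204, 343, 40], which is not sorted, so the one-liner does not apply directly. A sketch, assuming you would rather not reorder a itself: np.searchsorted accepts a `sorter` argument holding the permutation from np.argsort. The names `order`, `pos`, and `idx` are illustrative, and the membership check is an addition that is not part of the original answer.

```python
import numpy as np

a = np.array([[102, 5.5],
              [204, 4.5],
              [343, 9.0],
              [40, 8.01]])
b = np.array([102, 343])

# Permutation that sorts the first column; passed to searchsorted via `sorter`.
order = np.argsort(a[:, 0])
pos = np.searchsorted(a[:, 0], b, sorter=order)

# pos indexes the *sorted* order, so map it back to rows of a.
# clip guards against ids larger than every key (pos == len(a)).
idx = order[np.clip(pos, 0, len(a) - 1)]

# Check that every id was actually found before trusting the result.
found = a[idx, 0] == b
if not found.all():
    raise KeyError(f"ids not in a: {b[~found]}")
c = a[idx, :]
```

If a stays fixed across the thousands of lookups, `order` only needs to be computed once, and each query then costs one binary search per element of b.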


A slightly more concise way to do this is:

c = a[(a[:,0] == b[:,None]).any(0)]

The usual caveats for floating point comparisons apply.
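To illustrate the floating-point caveat: if the identifiers have passed through arithmetic, exact `==` can miss a match. One option (swapped in here, not part of the original answer) is np.isclose, which broadcasts the same way as `==`. The data and the tolerance values are assumptions for illustration and should be chosen to suit your identifiers.

```python
import numpy as np

# 0.1 + 0.2 is stored as 0.30000000000000004, not 0.3.
a = np.array([[0.1 + 0.2, 5.5],
              [0.7, 4.5],
              [1.1, 9.0]])
b = np.array([0.3, 1.1])

# Exact comparison misses the first row.
exact = (a[:, 0] == b[:, None]).any(0)

# Tolerance-based comparison; rtol/atol here are illustrative.
close = np.isclose(a[:, 0], b[:, None], rtol=0.0, atol=1e-9).any(0)
c = a[close]
```

Here `exact` selects only the 1.1 row, while `close` also picks up the 0.1 + 0.2 row.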

Edit: If b is not too small, the following slightly quirky solution performs better:

b.sort()  # note: sorts b in place
c = a[b[np.searchsorted(b, a[:, 0]) - len(b)] == a[:, 0]]
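The quirky part is the `- len(b)`. np.searchsorted(b, x) returns a position in [0, len(b)], and len(b) itself would be out of range for indexing b. Subtracting len(b) maps positions to [-len(b), 0], so valid positions pick the same element via negative indexing and the out-of-range position wraps to b[0], which cannot equal a key larger than every element of b. The sketch below spells this out; the extra row [500, 1.0] is added only to exercise that edge case, and np.sort is used instead of b.sort() to avoid mutating b.

```python
import numpy as np

a = np.array([[102, 5.5],
              [204, 4.5],
              [343, 9.0],
              [40, 8.01],
              [500, 1.0]])           # 500 is larger than every id in b
b = np.sort(np.array([343, 102]))   # sorted copy, b itself is not modified

pos = np.searchsorted(b, a[:, 0])   # positions in [0, len(b)]
candidates = b[pos - len(b)]        # pos == len(b) wraps around to b[0]
mask = candidates == a[:, 0]        # True where the key really is in b
c = a[mask]
```

Note that the mask runs over the rows of a, so c comes out in a's row order rather than b's. The work is one sort of b plus one binary search per row of a.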