Most efficient way to pull specified rows from a 2-d array?
I have a 2-D numpy array with 100,000+ rows. I need to return a subset of those rows (and I need to perform that operation many thousands of times, so efficiency is important).
Here is a mock-up example:
import numpy as np
a = np.array([[1, 5.5],
              [2, 4.5],
              [3, 9.0],
              [4, 8.01]])
b = np.array([2,4])
So, I want to return the rows of a that are identified by b in the first column:
c = [[2, 4.5],
     [4, 8.01]]
The difference, of course, is that there are many more rows in both a and b, so I'd like to avoid looping. I also played with building a dictionary and using np.nonzero, but am still a bit stumped.
Thanks in advance for any ideas!
EDIT: Note that, in this case, the values in b are identifiers rather than indices. Here's a revised example:
import numpy as np
a = np.array([[102, 5.5],
              [204, 4.5],
              [343, 9.0],
              [40, 8.01]])
b = np.array([102,343])
And I want to return:
c = [[102, 5.5],
     [343, 9.0]]
EDIT: Deleted my original answer since it was a misunderstanding of the question. Instead try:
ii = np.where((a[:,0] - b.reshape(-1,1)) == 0)[1]
c = a[ii,:]
What I'm doing is using broadcasting to subtract each element of b from the first column of a, and then searching the resulting array for zeros, which indicate a match. This should work, but be a little careful when comparing floats, especially if b is not an array of ints.
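To make the broadcasting step concrete, here is a minimal sketch on the revised example from the question. The intermediate array has shape (len(b), len(a)), so with 100,000 rows in a and thousands of identifiers in b it can get large. The np.isclose line is an addition, not part of the answer above: it is one way to act on the float caveat, using NumPy's default tolerances (rtol=1e-05, atol=1e-08), which may need tuning for your identifiers.

```python
import numpy as np

a = np.array([[102, 5.5],
              [204, 4.5],
              [343, 9.0],
              [40, 8.01]])
b = np.array([102, 343])

# Broadcasting: a[:, 0] has shape (4,) and b.reshape(-1, 1) has shape (2, 1),
# so the difference has shape (2, 4): one row per element of b.
diff = a[:, 0] - b.reshape(-1, 1)

# np.where on a 2-D array returns (row indices, column indices).
# The column indices are row numbers of a, ordered by the element of b they matched.
rows_of_b, ii = np.where(diff == 0)
c = a[ii, :]

# Tolerance-based variant (an addition): np.isclose broadcasts the same way.
ii_close = np.where(np.isclose(a[:, 0], b.reshape(-1, 1)))[1]
c_close = a[ii_close, :]
```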
EDIT 2: Thanks to Sven's suggestion, you can try this slightly modified version instead:
ii = np.where(a[:,0] == b.reshape(-1,1))[1]
c = a[ii,:]
It's a bit faster than my original implementation.
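If you want to check speed claims like this on your own data, a small timeit harness along these lines is one option. The sizes, the random seed, and the function names by_subtraction and by_equality are made up for illustration; timings depend on your machine and NumPy version, so this sketch does not reproduce any particular numbers from this thread. The equality version still builds a (len(b), len(a)) boolean array, but it skips the float array that the subtraction creates first.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
n_rows, n_ids = 100_000, 100  # illustrative sizes only

ids = rng.permutation(n_rows).astype(float)  # unique identifiers for column 0
a = np.column_stack([ids, rng.random(n_rows)])
b = rng.choice(ids, size=n_ids, replace=False)


def by_subtraction(a, b):
    # Original answer: subtract, then look for zeros.
    return a[np.where((a[:, 0] - b.reshape(-1, 1)) == 0)[1], :]


def by_equality(a, b):
    # EDIT 2: compare directly, skipping the intermediate float array.
    return a[np.where(a[:, 0] == b.reshape(-1, 1))[1], :]


# Both versions should select the same rows in the same order.
assert np.array_equal(by_subtraction(a, b), by_equality(a, b))

for fn in (by_subtraction, by_equality):
    seconds = timeit.timeit(lambda: fn(a, b), number=5)
    print(f"{fn.__name__}: {seconds / 5 * 1e3:.1f} ms per call")
```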
EDIT 3: The fastest solution by far (~10x faster than Sven's second solution for large arrays) is:
c = a[np.searchsorted(a[:,0],b),:]
This assumes that a[:,0] is sorted and that every value of b appears in a[:,0].
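If a[:,0] is not sorted, np.searchsorted can still be used through its sorter argument, which takes the permutation returned by np.argsort. The sketch below is that variation rather than the exact code above; select_rows_by_id is a hypothetical helper name, and it raises KeyError when some value of b is missing instead of silently returning a wrong row.

```python
import numpy as np


def select_rows_by_id(a, b, order=None):
    """Return the rows of a whose a[:, 0] value matches b, in the order of b.

    Sketch: a[:, 0] need not be sorted, but every value of b must be present.
    `order` may be a precomputed np.argsort(a[:, 0]) to avoid re-sorting.
    """
    keys = a[:, 0]
    if order is None:
        order = np.argsort(keys)                   # permutation that sorts the keys
    pos = np.searchsorted(keys, b, sorter=order)   # positions in the sorted view
    pos = np.clip(pos, 0, len(keys) - 1)           # guard values of b past the end
    idx = order[pos]                               # map back to original row numbers
    if not np.array_equal(keys[idx], b):
        raise KeyError("some values of b are not present in a[:, 0]")
    return a[idx]
```

Since argsort costs O(n log n), if you run this thousands of times against the same a it is probably worth computing order = np.argsort(a[:, 0]) once outside the loop and passing it in.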
A slightly more concise way to do this is:
c = a[(a[:,0] == b[:,None]).any(0)]
The usual caveats for floating point comparisons apply.
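For comparison, np.isin (NumPy 1.13 and later) computes the same membership mask; this is a swapped-in alternative, not the answer's code. Note that a boolean mask returns rows in the order they appear in a, whereas the np.where versions earlier in the thread return them in the order of b. np.isin picks its internal strategy based on the input sizes, so it is worth timing against the broadcast version on your data.

```python
import numpy as np

a = np.array([[102, 5.5],
              [204, 4.5],
              [343, 9.0],
              [40, 8.01]])
b = np.array([343, 102])  # deliberately not in a's order

# Sven's mask: a (len(b), len(a)) boolean matrix reduced over axis 0 with any().
mask_broadcast = (a[:, 0] == b[:, None]).any(0)

# Same membership test via np.isin.
mask_isin = np.isin(a[:, 0], b)

# Boolean indexing keeps a's row order, so 102 comes before 343 here.
c = a[mask_isin]
```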
Edit: If b is not too small, the following slightly quirky solution performs better:
b.sort()  # note: sorts b in place
c = a[b[np.searchsorted(b, a[:, 0]) - len(b)] == a[:,0]]
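The quirky line packs several steps together. Here is an unpacked sketch with each step commented; select_rows_sorted_lookup is a hypothetical name, it uses np.sort so the caller's b is not modified in place, and it assumes b is non-empty (indexing into an empty array would raise IndexError).

```python
import numpy as np


def select_rows_sorted_lookup(a, b):
    """Rows of a whose a[:, 0] value is in b, returned in a's row order."""
    keys = a[:, 0]
    b_sorted = np.sort(b)                  # a copy; unlike b.sort(), b is untouched
    pos = np.searchsorted(b_sorted, keys)  # insertion point of each key, in 0..len(b)
    # pos == len(b) would be out of range. Subtracting len(b) turns every
    # position into a valid negative index: 0..len(b)-1 still point at the
    # same elements, and len(b) wraps around to b_sorted[0].
    candidates = b_sorted[pos - len(b_sorted)]
    mask = candidates == keys              # True exactly where the key is in b
    return a[mask]
```

A key that lands at position len(b) is larger than every element of b, so comparing it against b_sorted[0] always fails, which is why the wrap-around is harmless.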