Most efficient way to pull specified rows from a 2-d array?

2023-02-21 10:58

I have a 2-D numpy array with 100,000+ rows. I need to return a subset of those rows, and I need to perform that operation many thousands of times, so efficiency is important.

A mock-up example is like this:

import numpy as np
a = np.array([[1,5.5],
             [2,4.5],
             [3,9.0],
             [4,8.01]])
b = np.array([2,4])

So... I want to return the rows of a whose value in the first column appears in b:

c=[[2,4.5],
   [4,8.01]]

The real data, of course, has many more rows in both a and b, so I'd like to avoid looping. I also experimented with building a dictionary and with np.nonzero, but I'm still a bit stumped.

Thanks in advance for any ideas!

EDIT: Note that in this case the values in b are identifiers rather than indices. Here's a revised example:

import numpy as np
a = np.array([[102,5.5],
             [204,4.5],
             [343,9.0],
             [40,8.01]])
b = np.array([102,343])

And I want to return:

c = [[102,5.5],
     [343,9.0]]

EDIT: I deleted my original answer because I had misunderstood the question. Instead, try:

ii = np.where((a[:,0] - b.reshape(-1,1)) == 0)[1]
c = a[ii,:]

Here I use broadcasting to subtract each element of b from the first column of a, then search the resulting array for zeros, which indicate a match. This should work, but be careful when comparing floats, especially if b is not an array of ints.
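To make the float caveat concrete, here is a minimal sketch that keeps the same broadcasting idea but swaps exact equality for `np.isclose`, which compares within a tolerance. The tolerance value (`atol=1e-9`) is an assumption chosen for illustration, not something from the answer; pick one that suits your identifiers.

```python
import numpy as np

a = np.array([[102, 5.5],
              [204, 4.5],
              [343, 9.0],
              [40, 8.01]])
b = np.array([102, 343])

# b[:, None] has shape (len(b), 1) and a[:, 0] has shape (len(a),).
# Broadcasting yields a (len(b), len(a)) boolean matrix whose entry [i, j]
# is True when b[i] is within tolerance of a[j, 0].
# atol=1e-9 is an assumed tolerance; rtol=0 disables relative tolerance.
matches = np.isclose(a[:, 0], b[:, None], rtol=0.0, atol=1e-9)

# np.nonzero returns (indices into b, indices into a); keep the a-indices.
ii = np.nonzero(matches)[1]
c = a[ii, :]
print(c)
```

Note that the intermediate matrix has len(b) x len(a) entries, so with 100,000+ rows in a and a large b it can use a lot of memory.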

EDIT 2: Thanks to Sven's suggestion, you can try this slightly modified version instead:

ii = np.where(a[:,0] == b.reshape(-1,1))[1]
c = a[ii,:]

It's a bit faster than my original implementation.
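A detail worth knowing about the np.where version: the selected rows come out in the order the ids appear in b, and any id in b with no matching row is silently dropped. The sketch below wraps the same comparison in a hypothetical helper, `select_rows_by_id` (not a NumPy function), that also reports the unmatched ids.

```python
import numpy as np

def select_rows_by_id(a, b):
    """Hypothetical helper: return rows of `a` whose first-column value equals
    an id in `b` (in b's order), plus the ids in `b` that had no match."""
    # (len(b), len(a)) boolean matrix of exact matches, as in EDIT 2
    eq = a[:, 0] == b.reshape(-1, 1)
    # np.where scans row by row, so the a-indices follow the order of b
    _, a_idx = np.where(eq)
    # an id with no True anywhere in its row has no matching row in a
    missing = b[~eq.any(axis=1)]
    return a[a_idx, :], missing

a = np.array([[102, 5.5],
              [204, 4.5],
              [343, 9.0],
              [40, 8.01]])
b = np.array([343, 102, 999])

c, missing = select_rows_by_id(a, b)
print(c)        # row for 343 first, then row for 102
print(missing)  # [999]
```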

EDIT 3: The fastest solution by far (~10x faster than Sven's second solution for large arrays) is:

c = a[np.searchsorted(a[:,0],b),:]

This assumes that a[:,0] is sorted and that every value of b appears in a[:,0].
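In the revised example, a[:,0] is not sorted (102, 204, 343, 40), so the one-liner does not apply directly. A minimal sketch of one way around that: compute an `argsort` order once and pass it through `np.searchsorted`'s `sorter` argument, then verify the hits, since `searchsorted` returns an insertion point even for ids that are absent.

```python
import numpy as np

a = np.array([[102, 5.5],
              [204, 4.5],
              [343, 9.0],
              [40, 8.01]])
b = np.array([102, 343])

# a[:, 0] is not sorted here, so compute a sort order once...
order = np.argsort(a[:, 0])
# ...and let searchsorted search a[order, 0] without copying a.
# pos indexes into the sorted view, so map it back through `order`.
pos = np.searchsorted(a[:, 0], b, sorter=order)
# clip guards ids larger than every key (pos == len(a) would be out of range)
rows = order[np.clip(pos, 0, len(order) - 1)]

# Check that every id in b really exists in a[:, 0].
found = a[rows, 0] == b
assert found.all(), f"ids not found: {b[~found]}"

c = a[rows, :]
print(c)
```

Since the question repeats the lookup thousands of times against the same a, the `argsort` only needs to be computed once; each query then costs roughly len(b) binary searches.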

A slightly more concise way to do this is:

c = a[(a[:,0] == b[:,None]).any(0)]

The usual caveats for floating point comparisons apply.
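Unlike the np.where version, this boolean mask returns rows in a's order, not b's. The sketch below shows that, and checks the mask against `np.isin`, an existing NumPy function that computes the same membership mask in a single call (whether it is faster for your sizes is something to measure, not assume).

```python
import numpy as np

a = np.array([[102, 5.5],
              [204, 4.5],
              [343, 9.0],
              [40, 8.01]])
b = np.array([343, 102])  # deliberately not in the same order as a

# Broadcast comparison gives a (len(b), len(a)) matrix; any(0) collapses
# it to one boolean per row of a.
mask = (a[:, 0] == b[:, None]).any(0)

# np.isin produces the same membership mask in one call.
mask_isin = np.isin(a[:, 0], b)
assert (mask == mask_isin).all()

c = a[mask]
print(c)  # 102 row first: rows follow a's order, not b's
```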

Edit: If b is not too small, the following slightly quirky solution performs better:

b.sort()  # sorts b in place so it can be searched with searchsorted
c = a[b[np.searchsorted(b, a[:, 0]) - len(b)] == a[:,0]]  # keep rows whose key is found in b
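The "quirky" part is the `- len(b)`: `searchsorted` returns positions from 0 to len(b) inclusive, and len(b) would be out of range. Subtracting len(b) shifts every position into -len(b)..0, all valid indices. The sketch below spells this out in a hypothetical helper, `rows_in_ids`, which uses `np.sort` instead of `b.sort()` so the caller's b is not modified, and assumes nothing beyond the original one-liner except an empty-b guard.

```python
import numpy as np

def rows_in_ids(a, b):
    """Hypothetical helper: rows of `a` whose first-column value is in `b`,
    using the sorted-b / searchsorted trick. Leaves `b` unmodified."""
    sb = np.sort(b)  # sorted copy; b.sort() in the answer sorts in place
    keys = a[:, 0]
    if sb.size == 0:  # guard: indexing an empty array would raise
        return a[:0]
    # Position where each key would be inserted into sb: 0..len(sb).
    pos = np.searchsorted(sb, keys)
    # pos - len(sb) is in -len(sb)..0, always a valid index:
    # for pos < len(sb) the negative index wraps to sb[pos];
    # pos == len(sb) (key larger than every id) maps to sb[0],
    # which is smaller than that key and so cannot equal it.
    candidate = sb[pos - len(sb)]
    return a[candidate == keys]

a = np.array([[102, 5.5],
              [204, 4.5],
              [343, 9.0],
              [40, 8.01]])
b = np.array([343, 102])
c = rows_in_ids(a, b)
print(c)
```

Each row of a costs one binary search into the sorted b, instead of a comparison against every element of b, which is presumably why it pays off once b is not too small.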
