Numpy table - advanced multiple criteria selection
I have a table that goes something like this:
IDs   Timestamp   Values
124   300.6        1.23
124   350.1       -2.4
309   300.6       10.3
 12   123.4        9.00
 18   350.1        2.11
309   350.1        8.3
...
and I'd like to select all the rows that belong to a group of IDs. I know that I can do something like
table[table.IDs == 124]
to select all rows for one ID, and I could do
table[(table.IDs == 124) | (table.IDs == 309)]
to get two IDs' rows. But imagine I have ~100,000 rows with over 1,000 unique IDs (which are distinct from row indices), and I want to select all the rows that match a set of 10 IDs. Intuitively I'd like to do this:
# id_list: a list of 10 IDs
table[ table.IDs in id_list ]
but this fails: the in operator doesn't broadcast over a NumPy array, so it raises an error instead of returning a mask. The only way I can think of is to do the following:
table[ (table.IDs == id_list[0]) |
(table.IDs == id_list[1]) |
(table.IDs == id_list[2]) |
(table.IDs == id_list[3]) |
(table.IDs == id_list[4]) |
(table.IDs == id_list[5]) |
(table.IDs == id_list[6]) |
(table.IDs == id_list[7]) |
(table.IDs == id_list[8]) |
(table.IDs == id_list[9]) ]
which seems very inelegant to me - too much code, and no flexibility for lists of different lengths. Is there a way around my problem, such as a list comprehension or the .any() function? Any help is appreciated.
You can do it like this:
subset = table[np.array([i in id_list for i in table.IDs])]
If you have a reasonably recent version of NumPy, you can use the in1d function to make it more compact:
subset = table[np.in1d(table.IDs, id_list)]
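Here's a minimal, self-contained sketch of the in1d approach. The sample data mirrors the table in the question (the values are made up), using a structured array, so fields are accessed as table["IDs"] rather than the recarray attribute syntax table.IDs:

```python
import numpy as np

# Sample data modeled on the question's table.
table = np.array(
    [(124, 300.6, 1.23),
     (124, 350.1, -2.4),
     (309, 300.6, 10.3),
     ( 12, 123.4, 9.00),
     ( 18, 350.1, 2.11),
     (309, 350.1, 8.3)],
    dtype=[("IDs", "i8"), ("Timestamp", "f8"), ("Values", "f8")],
)

id_list = [124, 309]

# Boolean mask: True for rows whose ID appears anywhere in id_list.
mask = np.in1d(table["IDs"], id_list)
subset = table[mask]
```

This works for an id_list of any length, which removes the need for the chained | expressions.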
See also this question: numpy recarray indexing based on intersection with external array
Here's a solution that will probably profile faster than any Python for loop, though I don't think it will beat in1d. Use it only if you can afford a temporary 2D integer array of shape ids.size by table.IDs.size. Here, ids is id_list converted to a NumPy array.
result = table[~np.all(table.IDs[None]-ids[None].T, 0)]
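To make the broadcasting explicit, here's a small sketch on a bare ID column (table_ids and ids are made-up sample arrays standing in for table.IDs and id_list). Subtracting a (1, n) row from an (m, 1) column produces an (m, n) difference matrix with a zero exactly where a table ID matches one of the wanted ids; np.all over axis 0 is then False in the matching columns:

```python
import numpy as np

# Sample stand-ins for table.IDs and the id array.
table_ids = np.array([124, 124, 309, 12, 18, 309])
ids = np.array([124, 309])

# (2, 6) difference matrix: zero entries mark matches.
diff = table_ids[None] - ids[None].T

# np.all treats nonzero as True, so a column is False
# iff some difference in it is zero, i.e. the ID matched.
mask = ~np.all(diff, axis=0)

# Same mask as the in1d approach.
assert np.array_equal(mask, np.in1d(table_ids, ids))
```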