Fastest way to check if a number is in a list of numbers

2023-02-15 15:31 Q&A author:

I need to check whether an ID (a long integer) is in a list of ~10,000 IDs. I need to do this about 10^9 times in a loop, and speed is relatively important. Is using a C++ set the quickest way to do this? Something like:

#include <set>

std::set<long> myset;

// (Populate myset)

long id = 123456789;

if (myset.find(id) != myset.end()) {
    // id is in set
}

Or is there a quicker way?

The quickest way, if your long has a limited range, is a bitmap (e.g. vector<bool>). If that's not doable, an unordered_set (hash_set) should be attempted. And if that keeps hitting its worst case, then use a set.
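As a rough illustration of the bitmap idea, here is a minimal sketch. It assumes every ID lies in [0, range_size), which the question does not actually state, and the helper names build_bitmap and bitmap_contains are made up for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: one bit per possible ID.
// Assumes every ID in `ids` lies in [0, range_size).
std::vector<bool> build_bitmap(const std::vector<long>& ids, std::size_t range_size) {
    std::vector<bool> bitmap(range_size, false);
    for (long id : ids) {
        bitmap[static_cast<std::size_t>(id)] = true;
    }
    return bitmap;
}

// Membership test: a bounds check plus a single bit lookup.
bool bitmap_contains(const std::vector<bool>& bitmap, long id) {
    return id >= 0
        && static_cast<std::size_t>(id) < bitmap.size()
        && bitmap[static_cast<std::size_t>(id)];
}
```

The memory cost is one bit per possible value, so this only pays off when the range is small enough to fit comfortably in memory (ideally in cache).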

Hm, depending on how you generate the numbers and how many there are, it might be faster to use a std::vector, sort it (you can even keep it sorted while inserting the numbers), and then use binary search to check if the number is in there.

Generally, a set works fine, but there are tradeoffs. The vector has less memory overhead, and since all numbers are stored in one contiguous block of memory, it might outperform a set in some situations, but you would have to test that.
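A minimal sketch of the sorted-vector approach, assuming the IDs are all known before the lookups start. The names make_sorted_ids and sorted_contains are invented for illustration.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: sort once up front and drop duplicates.
std::vector<long> make_sorted_ids(std::vector<long> ids) {
    std::sort(ids.begin(), ids.end());
    ids.erase(std::unique(ids.begin(), ids.end()), ids.end());
    return ids;
}

// O(log n) lookup over contiguous memory; requires sorted input.
bool sorted_contains(const std::vector<long>& sorted_ids, long id) {
    return std::binary_search(sorted_ids.begin(), sorted_ids.end(), id);
}
```

With ~10,000 IDs, each lookup takes roughly 14 comparisons. Whether that beats a set or a hash table on your data is something only a benchmark can tell you.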

You can build a hash table and check in O(1) average time whether the ID exists.
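For example, with std::unordered_set as the hash table (a sketch only; make_id_table and hashed_contains are hypothetical names):

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical helper: build the hash table once, reserving space
// up front so that no rehashing happens while inserting.
std::unordered_set<long> make_id_table(const std::vector<long>& ids) {
    std::unordered_set<long> table;
    table.reserve(ids.size());
    table.insert(ids.begin(), ids.end());
    return table;
}

// Average O(1) lookup; the worst case degrades to O(n) on heavy collisions.
bool hashed_contains(const std::unordered_set<long>& table, long id) {
    return table.find(id) != table.end();
}
```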

The standard, with the best of intentions, decided that vector<bool> should be specialized so that it is implemented as a bitset.

A bitset is fast enough, and you also have the choice of std::bitset, which has a fixed size, and boost::dynamic_bitset, whose size is defined at runtime and which is built on top of a vector of unsigned integers anyway (the integral block type it uses is a template parameter).

There is no point optimising further to save you some bit-twiddling, so the suggestion would be to use one of these.

By the way, I have seen a "scattered" bitmap, whereby if the value falls within a certain range it uses a bitmap, and otherwise it uses a tree-style lookup. This technique can also be used for Z-table (normal distribution CDF) functions, where you "cache" the table in memory for up to 95% or 99% of the density and use the slow calculation for the extreme values (I once actually had to do that).
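One way such a hybrid could look is sketched below. This is a guess at the general shape, not the implementation the answer refers to: the class name HybridIdSet, the dense range [lo, lo + dense_size), and the choice of std::set for the fallback are all assumptions.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical hybrid lookup: IDs in [lo, lo + dense_size) use a bitmap,
// everything else falls back to a tree-based std::set.
class HybridIdSet {
public:
    HybridIdSet(long lo, std::size_t dense_size)
        : lo_(lo), dense_(dense_size, false) {}

    void insert(long id) {
        if (in_dense_range(id)) {
            dense_[offset(id)] = true;
        } else {
            sparse_.insert(id);
        }
    }

    bool contains(long id) const {
        if (in_dense_range(id)) {
            return dense_[offset(id)];  // fast path: one bit lookup
        }
        return sparse_.count(id) != 0;  // slow path: tree lookup
    }

private:
    // Distance from lo_, computed in unsigned arithmetic to avoid
    // signed overflow; only meaningful when id >= lo_.
    std::size_t offset(long id) const {
        return static_cast<unsigned long>(id) - static_cast<unsigned long>(lo_);
    }

    bool in_dense_range(long id) const {
        return id >= lo_ && offset(id) < dense_.size();
    }

    long lo_;
    std::vector<bool> dense_;
    std::set<long> sparse_;
};
```

The design only helps if most queries land in the dense range, so the range should be picked from the actual distribution of IDs.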

If you really want to push performance as far as possible, you also have the option of a two-stage approach.

1. Use a Bloom filter or a similar probabilistic structure to find out whether the value is definitely NOT in the set or is "maybe" in the set.
2. Because step 1 can produce false positives, run your more costly second stage only on the values that step 1 did not filter out. Over your many (you mentioned 10^9) queries, you should finish more quickly, provided that not too many queries are hits.
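The two stages might be combined roughly as follows. This is a simplified sketch, not a tuned Bloom filter: the class name TwoStageIdSet, the use of exactly two hash functions, the mixing constants, and the num_bits parameter are all choices made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical two-stage set: a small Bloom-style filter in front of
// an exact std::unordered_set.
class TwoStageIdSet {
public:
    // num_bits must be > 0; more bits means fewer false positives.
    explicit TwoStageIdSet(std::size_t num_bits) : bits_(num_bits, false) {}

    void insert(long id) {
        bits_[hash1(id) % bits_.size()] = true;
        bits_[hash2(id) % bits_.size()] = true;
        exact_.insert(id);
    }

    bool contains(long id) const {
        // Stage 1: if either bit is clear, the ID was never inserted.
        if (!bits_[hash1(id) % bits_.size()] || !bits_[hash2(id) % bits_.size()]) {
            return false;
        }
        // Stage 2: a "maybe" is confirmed (or rejected) by the exact lookup.
        return exact_.find(id) != exact_.end();
    }

private:
    // Two simple 64-bit mixing functions, for illustration only.
    static std::uint64_t hash1(long id) {
        std::uint64_t x = static_cast<std::uint64_t>(id);
        x ^= x >> 33;
        x *= 0xff51afd7ed558ccdULL;
        x ^= x >> 33;
        return x;
    }

    static std::uint64_t hash2(long id) {
        std::uint64_t x = static_cast<std::uint64_t>(id) + 0x9e3779b97f4a7c15ULL;
        x ^= x >> 30;
        x *= 0xbf58476d1ce4e5b9ULL;
        x ^= x >> 27;
        return x;
    }

    std::vector<bool> bits_;
    std::unordered_set<long> exact_;
};
```

Because stage 2 is exact, the final answer is always correct. The filter only determines how often the more expensive lookup runs, so it helps most when the large majority of queries are misses.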

See http://en.wikipedia.org/wiki/Bloom_filter for details. Also see: Search algorithm with minimum time complexity
