
Which search data structure works best for sorted integer data?

I have over a billion sorted integers. Which data structure can exploit the sorted order? The main goal is to search for items faster.

Options I can think of:

1) A regular binary search tree, built by recursively splitting the sorted data at the middle.

2) Any other balanced binary search tree should work well, but would not exploit the fact that the data is already sorted.

Thanks in advance.

[Edit]

Insertions and deletions are very rare.

Also, apart from the integers I have to store some other information in the nodes. I think plain arrays can't do that unless it is a list, right?


This really depends on what operations you want to do on the data.

If you are just searching the data and never inserting or deleting anything, just storing the data in a giant sorted array may be perfectly fine. You could then use binary search to look up elements efficiently in O(log n) time. However, insertions and deletions can be expensive since with a billion integers O(n) will hurt. You could store auxiliary information inside the array itself, if you'd like, by just placing it next to each of the integers.
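As a rough sketch of the sorted-array idea with auxiliary data, here is one way it could look in Python. The names `keys`, `payloads`, and `find` are made up for illustration, and keeping the payloads in a parallel list is just one possible layout (the answer's "next to each integer" could equally be an array of records).

```python
from array import array
from bisect import bisect_left

# Hypothetical sample data: keys are kept in one compact sorted array,
# and the extra information for keys[i] lives at payloads[i].
keys = array("q", [3, 8, 15, 23, 42, 99])   # 64-bit signed integers
payloads = ["a", "b", "c", "d", "e", "f"]

def find(key):
    """Return the payload stored for key, or None if key is absent."""
    i = bisect_left(keys, key)               # binary search, O(log n)
    if i < len(keys) and keys[i] == key:
        return payloads[i]
    return None
```

Inserting or deleting a key means shifting everything after it in both sequences, which is the O(n) cost mentioned above.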

However, with a billion integers, this may be too memory-intensive, and you may want to switch to a bit vector in which bit i is set when the integer i is present. You could then search the bit vector in O(log U) time, where U is the number of bits (a plain membership test is a single bit lookup). With a billion integers, I assume that U and n would be close, so this isn't much of a penalty. Compared with storing each integer in a full machine word, one bit per value could use 32x to 128x less memory when U is close to n, depending on the word size, without causing too much of a performance hit. Plus, this increases the locality of the searches and can improve performance as well.

This does make it much slower to actually iterate over the numbers in the list, but it makes insertions and deletions take O(1) time. To use it, you'd need some secondary structure (perhaps a hash table?) containing the data associated with each of the integers. This isn't too bad, since you can use the bit vector for sorted queries and the unsorted hash table once you've found what you're looking for.
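A minimal sketch of such a bit vector, assuming non-negative integers below a known bound. The class name `BitVector` and its methods are invented for this example, and a `bytearray` stands in for whatever packed storage you would really use.

```python
class BitVector:
    """One bit per possible value in the range [0, universe_size)."""

    def __init__(self, universe_size):
        self.size = universe_size
        self.bits = bytearray((universe_size + 7) // 8)   # 8 values per byte

    def _check(self, x):
        if not 0 <= x < self.size:
            raise ValueError("value outside the universe")

    def add(self, x):
        self._check(x)
        self.bits[x >> 3] |= 1 << (x & 7)             # set bit x, O(1)

    def discard(self, x):
        self._check(x)
        self.bits[x >> 3] &= ~(1 << (x & 7)) & 0xFF   # clear bit x, O(1)

    def __contains__(self, x):
        if not 0 <= x < self.size:
            return False
        return bool(self.bits[x >> 3] & (1 << (x & 7)))
```

The data attached to each integer would then sit in a separate structure, for example a plain `dict` keyed by the integer.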

If you also need to add and remove values from the list, a balanced BST can be a good option. However, because you specifically know that you're storing integers, you may want to look at the more complex van Emde Boas tree structure, which supports insertion, deletion, predecessor, successor, find-max, and find-min all in O(log log U) time, where U is the size of the integer universe. For word-sized integers that is exponentially faster than binary search trees. The implementation cost of this approach is high, though, since the data structure is notoriously tricky to get right.

Another data structure you might want to explore is a bitwise trie, which has the same time bounds as the sorted bit vector but allows you to store auxiliary data along with each integer. Plus, it's super easy to implement!
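To give a feel for how small a bitwise trie can be, here is a sketch that assumes keys are non-negative and fit in `KEY_BITS` bits. `KEY_BITS`, `BitwiseTrie`, `insert`, and `get` are names chosen for this example only.

```python
KEY_BITS = 32   # assumed key width; keys must satisfy 0 <= key < 2**KEY_BITS

class BitwiseTrie:
    """Binary trie: each level branches on one bit of the key, high bit first."""

    def __init__(self):
        self.root = [None, None]          # children for bit 0 and bit 1

    def insert(self, key, value):
        node = self.root
        for shift in range(KEY_BITS - 1, 0, -1):
            bit = (key >> shift) & 1
            if node[bit] is None:
                node[bit] = [None, None]
            node = node[bit]
        node[key & 1] = (value,)          # wrapped so a stored None stays visible

    def get(self, key, default=None):
        node = self.root
        for shift in range(KEY_BITS - 1, 0, -1):
            node = node[(key >> shift) & 1]
            if node is None:
                return default            # path missing, key was never inserted
        leaf = node[key & 1]
        return default if leaf is None else leaf[0]
```

Every lookup walks exactly one node per key bit, so the cost depends on the key width rather than on how many integers are stored. The per-node overhead of a pointer-based version like this is considerable, so for a billion keys it is only a starting point.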

Hope this helps!


The best data structure for searching sorted integers is an array.

You can search it in about log2(N) comparisons (roughly 30 for a billion elements), and it is more compact (less memory overhead) than a tree.

And you don't even have to write any code (so less chance of a bug) -- just use bsearch from your standard library.


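With a sorted array, the best you can achieve is interpolation search, which gives O(log log n) average time when the values are roughly uniformly distributed. It is essentially a binary search that doesn't split the array into two subarrays of the same size; instead it estimates where the key should be from the values at the ends of the current range. It's really fast and extraordinarily easy to implement.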

http://en.wikipedia.org/wiki/Interpolation_search

Don't let the worst-case O(n) bound scare you: with a billion integers it is practically impossible to hit unless the values are very unevenly distributed.
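For reference, a small sketch of interpolation search over a sorted Python list of integers. The function name and the choice to return -1 for a missing key are conventions picked for this example.

```python
def interpolation_search(a, key):
    """Return an index i with a[i] == key, or -1 if key is not in sorted list a."""
    lo, hi = 0, len(a) - 1
    while lo <= hi and a[lo] <= key <= a[hi]:
        if a[lo] == a[hi]:
            return lo     # every value in the range equals key
        # Guess the position by linear interpolation between the endpoints.
        pos = lo + (key - a[lo]) * (hi - lo) // (a[hi] - a[lo])
        if a[pos] == key:
            return pos
        if a[pos] < key:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1
```

On evenly spaced data the first guess usually lands on or next to the key. On data that grows exponentially, such as 1, 2, 4, 8, ..., each guess may advance only one position, which is where the linear worst case comes from.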


O(1) solutions:

  • Assuming 32-bit integers and a lot of RAM:

A lookup table with roughly 2³² (about 4 billion) entries, where the entry at each index holds the number of integers with that value.

  • Assuming larger integers:

A really big hash table. The usual modulus hash function is appropriate if the values are reasonably well distributed; if not, you might want to combine the 32-bit strategy with a hash lookup.
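Both ideas can be sketched in a few lines. To keep the example cheap to run, the table below covers a 16-bit universe instead of the full 2³² values, and it uses one-byte counters, so it assumes no value occurs more than 255 times. `UNIVERSE`, `build_count_table`, and `build_hash_index` are hypothetical names.

```python
UNIVERSE = 1 << 16   # demo size only; the 32-bit case would need 1 << 32 entries

def build_count_table(values):
    """Direct-address table: counts[v] is how many times v occurs."""
    counts = bytearray(UNIVERSE)     # one unsigned byte per possible value
    for v in values:
        counts[v] += 1               # raises ValueError if a count exceeds 255
    return counts

def build_hash_index(pairs):
    """Hash table from integer to its associated data, for larger integers."""
    return dict(pairs)               # Python's dict is a built-in hash table
```

With one byte per entry, the full 32-bit table would take 4 GiB, which is why this option assumes plenty of RAM.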
