fastest integer sort implementation for 200-300 bit integers?
What is the fastest integer sort implementation for 200-300 bit sized integers? Exact int size is fixed; I have up to 2 gigabytes with such integers (all in RAM).
I 开发者_Go百科hear that it is possible to sort such set in average at O(n log log M) or even at O(n sqrt(log log M)) time, wher n is number of integers and M is the largest integer. Memory usage is limited (I may use up to 0.5-1 GB addtionally). Sorting can be done in-place; in can be unstable (reorder dups).
Is there C/C++ implementation of such sort method, e.g. of Han & Thorup (2002)?
A Radix Sort can be used to sort data with fixed size keys. As this condition is not often met the technique isn't discussed much, but it can be O(n) when the key size is factored out.
If memory usage is truly limited. I would separate each byte and store them into a trie data structure from most significant to least significant byte. If you insert the bytes in sorted order you can then iterate the trie and have all your data sorted.
Signature sort is good with large word sizes with 'O (n lg lg n)' expetcted time complexity, but with small word sizes you can get the same complexity with von Emde Boas sort. Also recently even faster sorting algorithm was published from Han and Thorup with 'O (n sqrt(lg lg n))' expected time complexity. I'm not sure if u can find implementations of these algorithms online, but there are probably some great articles and lectures on MIT and Harvard.
I think the most reasonable thing to do is to create an array of pointers to the bigints, and sort the array of pointers. I would suggest some sort of templated quicksort, with a smart compare function.
The compare function should be able to decide most of the time by looking at the most significant 4 bytes. If they don't match, then the compare is decided. If they do match then you look at the next 4 bytes until end of int.
I am guessing that the data range, is probably large enough, that a radix sort would be impractical. Quick sort is generally faster enough if you data is random, and has cache performance that beats most non-radix sorts.
精彩评论