Mapping function
I have a set of 128-bit numbers, and the size of the set is < 2^32, so theoretically I can have a mapping function that maps every 128-bit number in the set to a 32-bit number. How can I construct the mapping function?
Seems like you are looking for a minimal perfect hash which maps n keys to n consecutive integers.
The wiki page link in the above sentence mentions two libraries which implement this.
Also see this for more detail: http://burtleburtle.net/bob/hash/perfect.html
Without knowing the nature of the input data, it's impossible to give the optimal hashing algorithm. But if the input is evenly distributed, then you could use the lower 32 bits of each value. Collisions are still possible, so you have to deal with them.
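A minimal sketch of the "lower 32 bits" idea, assuming the values fit in Python integers and that collisions are handled by keeping every value that shares a truncated key in one bucket. The names `low32`, `build_buckets` and `find_collisions` are made up for illustration; how you resolve a collision afterwards (rehash, secondary table, and so on) is up to you.

```python
MASK32 = (1 << 32) - 1  # keeps only the lowest 32 bits


def low32(value):
    """Truncate a 128-bit integer to its lower 32 bits."""
    return value & MASK32


def build_buckets(values):
    """Group the input values by their truncated 32-bit key."""
    buckets = {}
    for v in values:
        buckets.setdefault(low32(v), []).append(v)
    return buckets


def find_collisions(buckets):
    """Return only the keys shared by more than one input value."""
    return {k: vs for k, vs in buckets.items() if len(vs) > 1}


# Two values that differ only in their high bits collide on the low 32 bits.
a = (1 << 100) | 7
b = (5 << 64) | 7
c = 9
buckets = build_buckets([a, b, c])
colliding = find_collisions(buckets)
```

Here `a` and `b` end up in the same bucket, which is exactly the case that needs extra handling.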
The generic construction is to keep all your 128-bit values in a big array, sorted in ascending order. Then, each value is "mapped" to its index in the array. To "compute" the map, you do a binary search in the array, to get the precise index of the value in the array. With 2^32 values of 16 bytes each, the array has size 64 GB, and the binary search entails 32-or-so lookups in the array.
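The sorted-array construction can be sketched in a few lines, assuming the values are Python integers held in an in-memory list (a real 64 GB table would need a packed, probably memory-mapped, representation). The function names `build_table` and `lookup` are illustrative only.

```python
from bisect import bisect_left


def build_table(values):
    """Sort the distinct 128-bit values; a value's index is its 32-bit image."""
    return sorted(set(values))


def lookup(table, value):
    """Binary-search the table and return the index of `value`."""
    i = bisect_left(table, value)
    if i < len(table) and table[i] == value:
        return i
    raise KeyError(value)  # the value is not part of the set


table = build_table([2**127 + 5, 3, 2**64])
index_of_small = lookup(table, 3)
index_of_large = lookup(table, 2**127 + 5)
```

Because the indices run from 0 to n-1 with n < 2^32, each one fits in 32 bits, and the inverse mapping is just `table[index]`.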
In full generality you cannot do much better than that. However, if your 128-bit values have a reasonably uniform spread (this depends on where they come from), then the big array structure can be compressed by a large margin, especially if you can guarantee that all inputs to your map will always be part of the set of 128-bit values. My bet is that you can trim it down to a couple of gigabytes, but the lookup will be more expensive.
For a more practical solution, you will have to work with the structure of your 128-bit values: where they come from, what they represent...
Set the position of your number to its value divided by 2^32.