Hashing method that allows increasing number of buckets without messing up the previous data mapping
Is there an algorithm/method that lets me increase the number of buckets without rebuilding the data/ re-hashing.
The problem in practice: Say you have a bunch of users that are identified by a string, "username". Then you hash these "usernames" to a list of buckets.
This is done by something like:
String username = "user";
int index = username.hash();
int bucketIndex = index % bucketlist.size();
So in this scheme if I one would want to increase the number of "buckets", one would also need to m开发者_Python百科ove the data in the buckets. So that it matches the new bucket index that one gets with doing modulo with a different number.
This is really just a mapping. Where to find the bucket that belongs to a given user.
Possible dumb solutions: Have both the old bucket size and the new bucket size. And then try to look in two buckets. Then slowly move all the users so that it matches by using new bucketlist.size(). This would not require a total stop, while hashing and moving.
What's needed: It is really the moving of all users that is bad. And looking in many buckets to find the correct one is also not ideal.
And the whole point is to be able to pinpoint which bucket in the list to use just by using an algorithm.
And it is not possible to have the size of the bucket list as part of the username.
It does not need to be hashing like it is done here if it roughly does the same.
I don't know if there is any sensible answer to this...
Any way to pre-size your hash set to something that will fit your data - thus eliminating or nearly eliminating the need to rehash? Also, even if you get some overlap, hashing with linked lists per node or something like it will not hurt too bad as long as the collisions don't get too deep.
I think that you're looking for linear hashing.
You can also consider any of the many kinds of balanced binary trees. They have the nice property that you can continue to grow them without rearranging the world at any point.
精彩评论