When should I do rehashing of entire hash table?
How do I decid开发者_如何学运维e when should I do rehashing of entire hash table?
This depends a great deal on how you're resolving collisions. If you user linear probing, performance usually starts to drop pretty badly with a load factor much higher than 60% or so. If you use double hashing, a load factor of 80-85% is usually pretty reasonable. If you use collision chaining, performance usually remains reasonable with load factors up to around 150% or or more.
I've sometimes even created a hash table with balanced trees for collision resolution. In this case, you can almost forget about re-hashing -- the performance doesn't start to deteriorate noticeably until the number of items exceeds the table size by at least a couple orders of magnitude.
Generally, you have a hash table containing N elements distributed in an array of M slots.
There is a percent value (called "growthFactor") defined by the user when instantiating the hash table that is used in this way:
if (growthRatio < (N/M))
Rehash();
the rehash means your array of M slots should be resized to contain more elements (a prime number greater than the current size (or 2x greater) is ideal) and that your elements must be distributed in the new larger array.
Such value should set to something between 0.6 and 0.8.
A rule of thumb is to resize the table once it's 3/4 full.
精彩评论