Does it make sense to resize an Hash Table down? And When?
My Hash Table implementation has a function to resize the table when the load reaches about 70%. My Hash Table is implemented with separate chaining for collisions.
Does it make sense that I should resize the开发者_C百科 hash table down at any point or should I just leave it like it is? Otherwise, if I increase the size (by almost double, actually I follow this: Link) when the load is 70%, should I resize it down when the load gets 30% or below?
Hash tables don't have to have prime-number lengths if you have a good quality hash function (see here). You can make them powers of two, which substantially speeds up index computations.
Why is this relevant to the question? Because when you shrink a power-of-two hashtable, you can leave all entries in the bottom half where they are and simply append the linked list in slot i
(from the upper half) onto the linked list in slot i - n/2
.
If memory is cheap, leave it alone. If memory is expensive, resize with hysterisis as you have suggested. When done, profile the result to make sure it performs well and haven't done something silly.
Are you writing the hash table for general purpose use, or is there a specific purpose for it? I suggest not resizing smaller for a general implementation. This will keep your table simple and keep it from memory thrashing under conditions where the table is filled and emptied often. If you end up running into a condition where the hash table needs to be reduced in size, extend it at that point in time.
First idea: The only reason for growing a hashtable is because hashtable performance decreases if there are too many collisions. Growing the table when its load exceeds 70% is a good rule of the thumb to prevent this from happening but its just a rule of the thumb. Much better is to keep track of the number of collisions and only grow the hashtable if they exceed a certain limit or once a certain collision ratio is hit. After all, why would you want to grow a hashtable that is loaded by 90%, yet has not a single collision? It would have no advantage.
Second idea: The only reason to shrink a hashtable is to save memory, yet shrinking it could increase the number of collisions and thus decrease the lookup performance. This is a classical speed vs memory trade off and why should you solve it yourself? Leave it to whoever is using your code. Just never shrink on your own but offer a shrink method. If low memory usage is a requirement, whoever is using your code can call shrink regularly. If maximum performance if a requirement, whoever is using your code should never call shrink. Everyone else can use some kind of heuristic to decide if and when to call shrink.
Third idea: When growing or shrinking, always grow/shrink in such a way that after the operation a certain load factor is guaranteed. E.g. when growing, always grow so that afterwards the load factor is 50% and when shrinking, always shrink in such a way that afterwards the load factor is 70%. Of course, that says nothing about the number of collisions, so adding an element immediately after growing/shrinking may cause the hashtable to grow again but that is unavoidable as simulating the effect of a grow/shrink is usually too expensive. Also shrink will often be called once no further modifications are planed, thus it should rather save memory than avoid having to grow again in the future.
Last idea: For every decision you make, you will make the hashtable better for some usage cases and worse for other ones. If you know how your hashtable is going to be used, this won't be a problem. Yet if you don't, and usually you don't, why making these decisions yourself? Just delegate them. Allow the user of your code to customize all the small details, e.g. how much to grow or shrink, either by allowing all these factors to be set when your hashtable is being created or by allowing your hashtable to have delegate functions (callback functions that you can always ask when unsure what to do). That way every user of your code can customize your code even at runtime for whatever usage scenario they require it.
精彩评论