开发者

What algorithms are available to resize a hash table?

I have implemented my own hash table functions in C, but currently it doesn't support resizing. I was 开发者_如何学运维wondering what algorithms do exist apart from the brute-force way of creating a new empty hash table and moving everything there?


There is incremental resizing.

From Wikipedia:

Incremental resizing

Some hash table implementations, notably in real-time systems, cannot pay the price of enlarging the hash table all at once, because it may interrupt time-critical operations. If one cannot avoid dynamic resizing, a solution is to perform the resizing gradually:

During the resize, allocate the new hash table, but keep the old table unchanged. In each lookup or delete operation, check both tables. Perform insertion operations only in the new table. At each insertion also move r elements from the old table to the new table. When all elements are removed from the old table, deallocate it.

To ensure that the old table will be completely copied over before the new table itself needs to be enlarged, it is necessary to increase the size of the table by a factor of at least (r + 1)/r during the resizing.

So this is not some clever way of moving all of the elements from the old table into the new table (and if there is one, I haven't seen it); rather, it eases the burden of resizing by allowing the migration to happen gradually.


Wikipedia has some words of wisdom on the subject.

Also, it's not a solution, but could be a part of one - if you're under windows you might use the VirtualAlloc family of functions which allow you to reserve address space without actually committing memory pages. That is, in laymans terms, you would do something like a "malloc" and tell it to "reserve 1000MB, but only make the first 10 available". So if you write past the 10MB, you'd get the usual crash. But when the time comes to expand, you just say "OK, give me another 10MB after the first ones". And the next 10MB is made available at the address directly after the first 10MB. It's like resizing an array. The actual RAM in use will be only as much as you need, but the memory addresses will be reserved in advance so that other memory allocation operations don't use them.


The usual cop-out is to leave it up to the client code to guess the best number of buckets up front. That's serviceable, the client usually has a reasonable guess as to how many elements will end up in the table. If you want to do it automatically then you first have to declare an array of primes for bucket sizes. When you see the load factor of a bucket getting too high, pick the next prime in the array, recreate the bucket list and move the elements from the old buckets to the new table.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜