
Hash table optimized for full iteration + key replacement

I have a hash table where the vast majority of run-time accesses follow one of two patterns:

  • Iterate through all key/value pairs. (The speed of this operation is critical.)
  • Modify keys, i.e. remove a key/value pair and add another with the same value but a different key, detecting duplicate keys and combining values if necessary. This is done in a loop, affecting many thousands of keys, with no other operations intervening.

I would also like it to consume as little memory as possible.

Other standard operations must be available, though they are used less frequently, e.g.

  • Insert a new key/value pair
  • Given a key, look up the corresponding value
  • Change the value associated with an existing key

Of course all "standard" hash table implementations, including standard libraries of most high-level-languages, have all of these capabilities. What I am looking for is an implementation that is optimized for the operations in the first list.
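For the critical full-iteration case, what matters most is that the table's storage is one contiguous array, so that iterating is a single linear scan with no pointer chasing. A minimal sketch of that layout (illustrative only: fixed capacity, no deletion or resizing, and `FlatTable` and its method names are made up here):

```python
# Illustrative sketch: a flat, open-addressed table whose full iteration is
# one linear scan over a single contiguous array. No resizing, no deletion.

EMPTY = object()  # sentinel marking an unused slot

class FlatTable:
    def __init__(self, capacity=8):
        self.slots = [EMPTY] * capacity  # each slot: EMPTY or a (key, value) pair

    def _find(self, key):
        # Linear probing: walk forward from the home slot until we hit
        # either the key itself or an empty slot.
        i = hash(key) % len(self.slots)
        while self.slots[i] is not EMPTY and self.slots[i][0] != key:
            i = (i + 1) % len(self.slots)
        return i

    def insert(self, key, value):
        self.slots[self._find(key)] = (key, value)

    def lookup(self, key):
        slot = self.slots[self._find(key)]
        return slot[1] if slot is not EMPTY else None

    def items(self):
        # The critical operation: one cache-friendly pass over contiguous storage.
        for slot in self.slots:
            if slot is not EMPTY:
                yield slot
```

In a real implementation the slots would be a packed array of the 16-byte keys plus values rather than Python tuples, but the access pattern during iteration is the same.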

Issues with common implementations:

  • Most hash table implementations use separate chaining (i.e. a linked list for each bucket.) This works but I am hoping for something that occupies less memory with better locality of reference. Note: my keys are small (13 bytes each, padded to 16 bytes.)
  • Most open addressing schemes have a major disadvantage for my application: Keys are removed and replaced in large groups. That leaves deletion markers that increase the load factor, requiring the table to be re-built frequently.
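The tombstone problem in the second bullet can be made concrete. The sketch below (hypothetical names; plain linear probing, no resizing) renames a batch of keys the way the question describes: each delete must leave a DELETED marker so later probe chains stay intact, and the markers keep counting toward the load factor until the table is rebuilt.

```python
# Hypothetical sketch of the tombstone problem: deleting from a linear-probe
# table leaves a DELETED marker so probe chains stay intact, and those markers
# still lengthen probes and consume load until the table is rebuilt.

EMPTY, DELETED = object(), object()

class ProbeTable:
    def __init__(self, capacity=16):
        self.slots = [EMPTY] * capacity

    def _probe(self, key):
        # Returns the key's slot if present, else the first tombstone seen
        # (reusable on insert), else the first truly empty slot.
        i = hash(key) % len(self.slots)
        first_dead = None
        while self.slots[i] is not EMPTY:
            if self.slots[i] is DELETED:
                if first_dead is None:
                    first_dead = i
            elif self.slots[i][0] == key:
                return i
            i = (i + 1) % len(self.slots)
        return first_dead if first_dead is not None else i

    def insert(self, key, value):
        self.slots[self._probe(key)] = (key, value)

    def delete(self, key):
        i = self._probe(key)
        if self.slots[i] is not EMPTY and self.slots[i] is not DELETED:
            self.slots[i] = DELETED  # tombstone, NOT EMPTY: probes must continue past it

    def live(self):
        return sum(s is not EMPTY and s is not DELETED for s in self.slots)

    def occupied(self):
        # Live entries *and* tombstones both consume probe-chain length.
        return sum(s is not EMPTY for s in self.slots)
```

For example, inserting integer keys 0–5 into a 16-slot table and then renaming each to 100–105 in a loop leaves 6 live entries but 12 occupied slots, because each new key's probe finds an empty slot before it reaches a reusable tombstone.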

Schemes that work, but are less than ideal:

  • Separate chaining with an array (instead of a linked list) per bucket:

    Poor locality of reference, resulting from memory fragmentation as small arrays are reallocated many times

  • Linear probing/quadratic hashing/double hashing (with or without Brent's Variation):

    Table quickly fills up with deletion markers

  • Cuckoo hashing

    Only works for <50% load factor, and I want a high LF to save memory and speed up iteration.

Is there a specialized hashing scheme that would work well for this case?


Note: I have a good hash function that works well with both power-of-2 and prime table sizes, and can be used for double hashing, so this shouldn't be an issue.


Would extendible hashing help? Iterating through the keys by walking the 'directory' should be fast. I'm not sure whether the "modify key for value" operation is any better under this scheme or not.


Based on how you're accessing the data, does it really make sense to use a hash table at all?

Since your main use cases involve iteration, a sorted list or a B-tree might be a better data structure.

It doesn't seem like you really need the constant-time random access a hash table is built for.


You can do much better than a 50% load factor with cuckoo hashing.

Two hash functions with four items per bucket will get you over 90% occupancy with little effort. See this paper:

http://www.ru.is/faculty/ulfar/CuckooHash.pdf

I'm building a pre-computed dictionary using a cuckoo hash and getting a load factor of better than 99% with two hash functions and seven items per bucket.
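For reference, the bucketized scheme described above looks roughly like this. This is a hedged sketch, not the paper's code: `BucketCuckoo`, the second-hash mixing constant, and `MAX_KICKS` are all made up here. Each key has two candidate buckets of four slots, so a lookup touches at most eight slots, and insertion evicts a random victim when both buckets are full.

```python
# Hypothetical sketch of bucketized cuckoo hashing: two hash functions, each
# naming a bucket of up to 4 slots. Multi-slot buckets are what push the
# achievable load factor above 90% (vs. ~50% for one-item-per-slot cuckoo).
import random

SLOTS_PER_BUCKET = 4
MAX_KICKS = 64  # give up and signal "rebuild/grow needed" after this many evictions

class BucketCuckoo:
    def __init__(self, num_buckets=64):
        self.buckets = [[] for _ in range(num_buckets)]  # each holds <= 4 (k, v) pairs

    def _bucket_ids(self, key):
        n = len(self.buckets)
        h1 = hash(key) % n
        h2 = hash((key, 0x9E3779B9)) % n  # second, roughly independent hash
        return h1, h2

    def get(self, key):
        # A lookup touches at most two buckets, i.e. at most 8 slots.
        for b in self._bucket_ids(key):
            for k, v in self.buckets[b]:
                if k == key:
                    return v
        return None

    def insert(self, key, value):
        for _ in range(MAX_KICKS):
            b1, b2 = self._bucket_ids(key)
            for b in (b1, b2):
                for i, (k, _) in enumerate(self.buckets[b]):
                    if k == key:  # duplicate key: overwrite in place
                        self.buckets[b][i] = (key, value)
                        return True
            for b in (b1, b2):
                if len(self.buckets[b]) < SLOTS_PER_BUCKET:
                    self.buckets[b].append((key, value))
                    return True
            # Both buckets full: evict a random victim and re-insert it.
            b = random.choice((b1, b2))
            i = random.randrange(SLOTS_PER_BUCKET)
            (key, value), self.buckets[b][i] = self.buckets[b][i], (key, value)
        return False  # eviction chain too long: caller should rebuild or grow
```

The four-slot buckets are what lift the load factor: most keys find room in one of their eight candidate slots before any eviction chain is needed, and iteration over `self.buckets` is still a scan of one array of small fixed-size groups.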
