Specialized hashtable algorithms for dynamic/static/incremental data

2022-12-19 09:38

I have a number of data sets that follow a key-value pattern, i.e. a string key and a pointer to the data. Right now they are stored in hashtables: each table has an array of slots indexed by the key's hash, and colliding keys form a linked list under their slot (direct chaining). Everything is implemented in C (and should stay in C), if it matters.

Now, the data is actually 3 slightly different types of data sets:

Some sets can be changed (keys added, removed, replaced, etc.) at will
For some sets data can be added but almost never replaced/removed (i.e. it can happen, but in practice it is very rare)
For some sets the data is added once and then only looked up, it is never changed once the whole set is loaded.

All sets of course have to support lookups as fast as possible, and consume minimal amounts of memory (though lookup speed is more important than size).

So the question is - is there some better hashtable structure/implementation that would suit the specific cases better? I suspect for the first case the chaining is the best, but not sure about two other cases.

If you are using linked lists for each bucket in your hashtable, you have already accepted relatively poor performance on modern CPUs (linked lists have poor locality and therefore poor CPU cache interaction). So I probably wouldn't worry about optimizing the other special cases. However, here are a few tips if you want to continue down the path you are using:

For the 'frequent changes' and 'almost never changes' data sets, every time you read an item from the hash table, move it to the front of the linked list chain for that bucket, so that frequently read keys are found after fewer pointer hops. For some even better ideas, this paper is a good starting point, even though it focuses on fixed-size keys: Fast and Compact Hash Tables for Integer Keys.
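The move-to-front idea can be sketched roughly as follows. This is only an illustration, not the asker's actual code: the names `struct table`, `table_insert` and `table_lookup_mtf`, the fixed `NBUCKETS`, and the FNV-1a-style stand-in hash are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NBUCKETS 64  /* hypothetical size; a power of two so we can mask */

struct node {
    const char *key;
    void *value;
    struct node *next;
};

struct table {
    struct node *buckets[NBUCKETS];
};

/* FNV-1a with a final xor-fold; only a stand-in for whatever hash you use. */
static uint32_t hash_str(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h ^ (h >> 16);
}

/* Insert at the head of the chain; the caller owns key and node storage. */
static void table_insert(struct table *t, struct node *n)
{
    assert(t != NULL && n != NULL && n->key != NULL);
    size_t i = hash_str(n->key) & (NBUCKETS - 1);
    n->next = t->buckets[i];
    t->buckets[i] = n;
}

/* Look up key; on a hit, unlink the node and relink it at the chain head. */
static void *table_lookup_mtf(struct table *t, const char *key)
{
    size_t i = hash_str(key) & (NBUCKETS - 1);
    struct node **link = &t->buckets[i];

    for (struct node *n = *link; n != NULL; link = &n->next, n = n->next) {
        if (strcmp(n->key, key) == 0) {
            if (link != &t->buckets[i]) {   /* not already at the front */
                *link = n->next;            /* unlink from current position */
                n->next = t->buckets[i];    /* relink as the new head */
                t->buckets[i] = n;
            }
            return n->value;
        }
    }
    return NULL;
}
```

Note that a lookup now writes to the table, so this variant is not safe for concurrent readers without locking.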

For the 'data set never changes' case you should look into perfect hash generators. If you know your keys at compile time, I've had good results with gperf. If your keys are not available until run time, try the C Minimal Perfect Hashing Library.
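To show what "perfect" means here, below is a toy sketch that simply tries seeds until every key of a fixed set lands in its own slot. This is a brute-force illustration only; it is not the algorithm gperf or the C Minimal Perfect Hashing Library uses, and it does not call either of them. The names `struct static_table`, `static_table_build`, `static_table_lookup` and the `SLOTS` constant are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SLOTS 16   /* hypothetical table size, must be >= number of keys */

struct static_table {
    uint32_t seed;
    const char *keys[SLOTS];
    void *values[SLOTS];
};

/* Seeded FNV-1a-style hash with an xor-fold; a stand-in, not a library call. */
static uint32_t seeded_hash(const char *s, uint32_t seed)
{
    uint32_t h = 2166136261u ^ seed;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h ^ (h >> 16);
}

/* Try seeds until every key gets its own slot. Returns 1 on success, 0 if
 * no collision-free seed below max_seed was found. Keys must be distinct. */
static int static_table_build(struct static_table *t,
                              const char *const *keys, void *const *values,
                              size_t n, uint32_t max_seed)
{
    assert(n <= SLOTS);
    for (uint32_t seed = 0; seed < max_seed; seed++) {
        size_t placed = 0;
        memset(t, 0, sizeof *t);
        t->seed = seed;
        for (; placed < n; placed++) {
            uint32_t i = seeded_hash(keys[placed], seed) % SLOTS;
            if (t->keys[i] != NULL)
                break;                  /* collision: give up on this seed */
            t->keys[i] = keys[placed];
            t->values[i] = values[placed];
        }
        if (placed == n)
            return 1;
    }
    return 0;
}

/* One hash, one slot, one string compare; no chain to walk. */
static void *static_table_lookup(const struct static_table *t, const char *key)
{
    uint32_t i = seeded_hash(key, t->seed) % SLOTS;
    if (t->keys[i] != NULL && strcmp(t->keys[i], key) == 0)
        return t->values[i];
    return NULL;
}
```

A seed search like this only scales to small key sets with spare slots; for anything larger, the dedicated generators mentioned above are the better tool.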

Those sets that are small (tens of elements) might be fastest using a binary or even linear search over the keys stored in sequential memory!

Obviously the key bodies, or hashes of them, have to be in sequential memory. But if you can fit that into one or two L1 cache lines, it'll fly.
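A minimal sketch of the "hashes in sequential memory" variant, assuming 64-byte cache lines (typical on current x86 hardware, but an assumption here) so that 16 four-byte hashes fit in one line. The names `struct small_set`, `small_add` and `small_find` are made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SMALL_MAX 16   /* assumed cap: 16 x 4-byte hashes = one 64-byte line */

struct small_set {
    uint32_t hashes[SMALL_MAX];   /* contiguous; this is what the scan reads */
    const char *keys[SMALL_MAX];  /* touched only when a hash matches */
    void *values[SMALL_MAX];
    size_t count;
};

/* FNV-1a with a final xor-fold; only a stand-in string hash. */
static uint32_t hash_str(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h ^ (h >> 16);
}

/* Append a key; returns 0 when the set is full. Does not check duplicates. */
static int small_add(struct small_set *s, const char *key, void *value)
{
    assert(s != NULL && key != NULL);
    if (s->count == SMALL_MAX)
        return 0;
    s->hashes[s->count] = hash_str(key);
    s->keys[s->count] = key;
    s->values[s->count] = value;
    s->count++;
    return 1;
}

/* Linear scan over the hash array; strcmp only confirms a hash match. */
static void *small_find(const struct small_set *s, const char *key)
{
    uint32_t h = hash_str(key);
    for (size_t i = 0; i < s->count; i++) {
        if (s->hashes[i] == h && strcmp(s->keys[i], key) == 0)
            return s->values[i];
    }
    return NULL;
}
```

Keeping the hashes in their own array means a miss usually never dereferences a key pointer at all.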

As for the bigger hashes, the direct chaining might lose out to open addressing?
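For comparison, here is a rough sketch of open addressing with linear probing. It assumes a fixed-capacity table with no resizing and no deletion (deletion would need tombstones or re-insertion of the following run), which fits the "add but almost never remove" sets best. `struct oa_table`, `oa_put`, `oa_get` and `OA_CAP` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OA_CAP 128   /* hypothetical capacity; power of two so we can mask */

struct oa_slot {
    uint32_t hash;     /* cached so most mismatches skip the strcmp */
    const char *key;   /* NULL marks an empty slot */
    void *value;
};

struct oa_table {
    struct oa_slot slots[OA_CAP];
    size_t used;
};

/* FNV-1a with a final xor-fold; only a stand-in string hash. */
static uint32_t hash_str(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h ^ (h >> 16);
}

/* Insert or replace. Returns 0 only when the table is completely full. */
static int oa_put(struct oa_table *t, const char *key, void *value)
{
    assert(t != NULL && key != NULL);
    uint32_t h = hash_str(key);
    for (size_t probe = 0; probe < OA_CAP; probe++) {
        struct oa_slot *s = &t->slots[(h + probe) & (OA_CAP - 1)];
        if (s->key == NULL) {               /* free slot: claim it */
            s->hash = h;
            s->key = key;
            s->value = value;
            t->used++;
            return 1;
        }
        if (s->hash == h && strcmp(s->key, key) == 0) {
            s->value = value;               /* existing key: replace */
            return 1;
        }
    }
    return 0;
}

/* Probe neighbouring slots until the key or an empty slot is found. */
static void *oa_get(const struct oa_table *t, const char *key)
{
    uint32_t h = hash_str(key);
    for (size_t probe = 0; probe < OA_CAP; probe++) {
        const struct oa_slot *s = &t->slots[(h + probe) & (OA_CAP - 1)];
        if (s->key == NULL)
            return NULL;                    /* hit a gap: key is absent */
        if (s->hash == h && strcmp(s->key, key) == 0)
            return s->value;
    }
    return NULL;
}
```

Colliding entries sit in adjacent array slots rather than behind pointers, which is where the cache advantage over chaining comes from; the price is that the table must be kept well below full.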

You could explore "cache conscious" hash tables and tries.

The Wikipedia article discusses cache lines in detail, describing the various trade-offs to consider.
