
stl hash_map slower than simple hash function?

I was comparing the STL hash_map against a simple hash function I wrote, which just multiplies the key by a prime and takes the result mod another prime (the table size), and it turns out that the STL version is about 100 times slower. This is the test code I wrote:

#include <hash_map>    // stdext::hash_map (non-standard VC++ extension)
#include <string>
#include <iostream>
#include <windows.h>   // GetTickCount
using namespace std;

// arr is an array of `size` strings (about 200k), built elsewhere.
stdext::hash_map<string, int> hashDict;
for (int l = 0; l < size; ++l){
    hashDict[arr[l]] = l;
}

// Time a lookup of every key; note that the result of each lookup is discarded.
long int before3 = GetTickCount();
int c = 0;
while (c < size){
    hashDict[arr[c]];
    c++;
}
long int after3 = GetTickCount();
cout << "for stl class, the time is " << (after3 - before3) / 1000.0 << '\n';
cout << "the average is " << ((after3 - before3) / 1000.0) / long(size) << '\n';

The dictionary holds about 200k elements, and the table for my own hash function has about 3 million slots, so maybe the problem is that the table size of the stl class is very small. Does anyone know what the table size of the stl implementation is, its collision rate, etc.?


The VS2008 STL implementation uses the following hash function for strings:

size_t _Val = 2166136261U;
while(_Begin != _End)
    _Val = 16777619U * _Val ^ (size_t)*_Begin++;

This is no less efficient than yours, and certainly not 100x worse; I doubt the Builder version is much different. The difference is either in the measurement (GetTickCount() is not very precise) or in operations other than computing hash values.
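For comparison, a self-contained version of that loop might look like the sketch below (the same multiply-then-XOR idea, not the exact library code), which you could benchmark directly against your own hash function:

#include <cstddef>
#include <string>

// FNV-style string hash: start from the offset basis, then for each byte
// multiply by the FNV prime and XOR in the byte, mirroring the loop quoted above.
inline size_t fnv_string_hash(const std::string& s)
{
    size_t val = 2166136261U;                                   // offset basis
    for (std::string::size_type i = 0; i < s.size(); ++i)
        val = 16777619U * val ^ (size_t)(unsigned char)s[i];    // prime * val ^ byte
    return val;
}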

I don't know about C++ Builder, but some STL implementations have a lot of extra checks and assertions built into the debug version. Have you tried profiling an optimized release build?

If you post a minimal but complete example we can help you figure out what is going on, but without more code there's really not much to say.
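As for the table-size question: stdext::hash_map doesn't expose its bucket count or load factor directly, but the modern equivalent std::unordered_map does, so a rough check could look like this hypothetical sketch:

#include <unordered_map>
#include <string>
#include <iostream>

int main()
{
    std::unordered_map<std::string, int> dict;
    // ... insert the ~200k strings here ...
    std::cout << "buckets: " << dict.bucket_count()
              << ", load factor: " << dict.load_factor()
              << ", max load factor: " << dict.max_load_factor() << '\n';
}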


To post this as an answer: I think you're getting misleading numbers. You're not actually using the looked-up values, just referencing them, so the compiler may well be optimizing away the accesses to your own hash table while leaving the hash_map lookups in place.

Except in extreme and rare circumstances, you aren't going to beat a professional implementation by a large integral factor while still using the same algorithm, which you are. You're about as likely to win the lottery jackpot as to beat it by a factor of 100.

Change your test code to use the values. Sum them up and print out the sum. That's fast and forces the program to look up each value.
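A minimal sketch of that change, assuming the same arr, size, and hashDict from the question:

long long sum = 0;
long int before3 = GetTickCount();
for (int c = 0; c < size; ++c)
    sum += hashDict[arr[c]];   // actually use the value so the lookup can't be discarded
long int after3 = GetTickCount();
cout << "sum = " << sum << ", time = " << (after3 - before3) / 1000.0 << '\n';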


I can confirm there can be huge differences between debug builds and release builds. In my case, I have a program that uses hash_map and list. The debug version ran for over 8 minutes; the release version ran in less than one second. That is at least a 480x difference in running time.
