Should I cache the hash code of an STL string used as a hash key?
I've been doing some performance analysis on the software I develop, and I've found that lookups on a global dictionary of URLs take about 10% of the application's "load" phase time. The dictionary is implemented as a C++ STL std::map, which has O(log n) lookups. I'm going to move it to a hash_map, which has roughly constant-time lookups. The STL string class doesn't have a hash code property, and it certainly doesn't cache a hash code. That means each lookup requires re-generating the hash code.
I'm skeptical that caching the hash code is worth the effort. It would mean changing many lines of code to use a new string class with a cached hash code property. Given that the current implementation does log(n) full string comparisons on every lookup, I think reducing it to basically one string traversal (by the hash function) per lookup is a big win.
Does anyone have experience with caching string hash codes? Has it ever proven worth the effort?
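For concreteness, here's a minimal sketch of the kind of wrapper I have in mind, assuming C++11 std::unordered_map; all the names here are illustrative, not an existing class:

#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Illustrative only: a string that computes its hash once at construction.
class HashedString {
public:
    explicit HashedString(std::string s)
        : str_(std::move(s)), hash_(std::hash<std::string>()(str_)) {}

    const std::string& str() const { return str_; }
    std::size_t hash() const { return hash_; }

    bool operator==(const HashedString& rhs) const {
        // Cheap hash comparison first, full string comparison only on a match.
        return hash_ == rhs.hash_ && str_ == rhs.str_;
    }

private:
    std::string str_;   // kept immutable so the cached hash stays valid
    std::size_t hash_;  // cached result of std::hash<std::string>
};

// Hasher that returns the cached value instead of re-traversing the string.
struct HashedStringHash {
    std::size_t operator()(const HashedString& hs) const { return hs.hash(); }
};

std::unordered_map<HashedString, int, HashedStringHash> url_dictionary;

A repeated lookup with the same HashedString object then costs one hash-table probe rather than a fresh string traversal.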
I don't have experience with caching hash codes, but I've done some work recently converting std::map to std::tr1::unordered_map. Two thoughts come to mind. First, try profiling that relatively simple change by itself, because it sometimes makes things worse, depending on what your code is doing; it might give you enough speedup on its own before you try optimizing further. Second, what does your profiler say about the other 90% of your initialization time? Even if you optimized the global dictionary down to zero time, you would improve performance by at most 10%.
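For what it's worth, the mechanical part of that conversion is small; a sketch (the map contents and names are made up, and the TR1 header location varies by toolchain):

#include <map>
#include <string>
#include <tr1/unordered_map> // on C++11, use <unordered_map> and std::unordered_map

// Before: O(log n) lookups, each doing several full string comparisons.
std::map<std::string, int> url_dict;

// After: amortized O(1) lookups, each hashing the key once.
std::tr1::unordered_map<std::string, int> url_dict_hashed;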
One word of warning.
While a hash map can offer constant-time lookups, it can also degrade to O(N) lookups. That's not the common case, but it does happen.
So while you always pay the O(log N) cost with a map, you are also guaranteed it will never be worse.
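If you want to check how close you are to that degenerate case, the standard unordered containers expose their bucket structure; a quick diagnostic sketch, assuming C++11 std::unordered_set (the function name is mine):

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <string>
#include <unordered_set>

// If a few buckets hold most of the elements, lookups degrade toward O(N).
void report_buckets(const std::unordered_set<std::string>& s) {
    std::size_t largest = 0;
    for (std::size_t b = 0; b < s.bucket_count(); ++b)
        largest = std::max(largest, s.bucket_size(b));
    std::cout << "load factor: " << s.load_factor()
              << ", largest bucket: " << largest << '\n';
}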
When you compare the hash map to the map, also try a Trie, or related data structure (whatever you can get off the shelf):
Trie implementation
Unfortunately you may then spend a lot of time worrying about cache-friendliness. In that respect a Trie is similar to the tree you already have, and a hash map will probably be better-behaved than a naively-allocated tree.
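For reference, a minimal (and deliberately naive) trie sketch; note that this node-per-character layout is exactly the cache-unfriendly kind warned about above:

#include <map>
#include <memory>
#include <string>

// Each node maps the next character to a child; lookup cost is O(key length),
// independent of how many keys are stored.
struct TrieNode {
    bool terminal = false;
    std::map<char, std::unique_ptr<TrieNode>> children;
};

void trie_insert(TrieNode& root, const std::string& key) {
    TrieNode* node = &root;
    for (char c : key) {
        std::unique_ptr<TrieNode>& child = node->children[c];
        if (!child) child.reset(new TrieNode);
        node = child.get();
    }
    node->terminal = true;
}

bool trie_contains(const TrieNode& root, const std::string& key) {
    const TrieNode* node = &root;
    for (char c : key) {
        auto it = node->children.find(c);
        if (it == node->children.end()) return false;
        node = it->second.get();
    }
    return node->terminal;
}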
Also, I'm a little confused by the question. If you're looking up the same string object multiple times, such that caching its hash value is worthwhile, shouldn't you just be caching the result of the lookup? The whole point of a hash table is that different objects which are value-equal hash to the same value. If you aren't computing the same hash several times from distinct strings containing the same characters, then your hash table probably isn't doing its job.
If you mean caching the values of the keys already in the hash table, that's up to the hash table.
You'll of course need to profile to check your results. Change to a hash map, and then see where most of your time is spent. Unless you're hashing keys left and right, I doubt most of your time will be spent there. Hashing is intended to be a fast operation, otherwise a hash map would have no advantages over an ordered container.
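If you do want to see how much of each lookup the hash accounts for, a crude micro-benchmark is easy to write; a sketch assuming C++11 (the function and names are illustrative):

#include <chrono>
#include <cstddef>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Times just the hashing of every key, nothing else.
void time_hashing(const std::vector<std::string>& keys) {
    std::hash<std::string> hasher;
    std::size_t sink = 0; // accumulate so the loop isn't optimized away
    auto start = std::chrono::steady_clock::now();
    for (const std::string& k : keys)
        sink ^= hasher(k);
    auto stop = std::chrono::steady_clock::now();
    std::cout << "hashed " << keys.size() << " keys in "
              << std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count()
              << " us (checksum " << sink << ")\n";
}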
The compiler itself will know if a string hasn't been changed, and can probably cache the result for you (within the same scope). That said, you don't want to inherit from std::string; STL classes weren't made for that.
Rather, make a std::pair and pass that around:
typedef std::pair<const std::string, const size_t> string_hash_pair;
You'd then need to overload the hash_value function (going by Boost here, not TR1; I don't know how similar they are) for your type, in the same namespace as the pair is defined:
size_t hash_value(const string_hash_pair& pPair)
{
    return pPair.second; // don't actually hash
}
And that's it. Note that in the pair, both the string and the size_t are immutable. This is because if the string changes, your hash is wrong. So we make it const, and we may as well make the hash const too.
You'll want a helper function:
string_hash_pair make_string_hash(const std::string& pStr)
{
    return std::make_pair(pStr, boost::hash_value(pStr));
}
Now if you're going to be using a string for look-ups, just make a pair out of it and you get constant-time hashing.
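One caveat: the pair's namespace is std, where you aren't allowed to add your own overloads, so in practice it's safer to hand the container an explicit hasher instead of relying on hash_value being found by ADL. A self-contained usage sketch along those lines (the URL and container name are illustrative):

#include <boost/functional/hash.hpp>
#include <boost/unordered_set.hpp>
#include <string>
#include <utility>

typedef std::pair<const std::string, const size_t> string_hash_pair;

string_hash_pair make_string_hash(const std::string& pStr)
{
    return std::make_pair(pStr, boost::hash_value(pStr));
}

// Explicit hasher: reuses the cached hash instead of recomputing it.
struct string_hash_pair_hasher
{
    size_t operator()(const string_hash_pair& pPair) const
    {
        return pPair.second;
    }
};

int main()
{
    boost::unordered_set<string_hash_pair, string_hash_pair_hasher> urls;
    urls.insert(make_string_hash("http://example.com/"));

    // Hash the probe string once, then look it up.
    bool found = urls.count(make_string_hash("http://example.com/")) > 0;
    (void)found;
}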
That said, I really doubt this much work is necessary. Hash functions are usually trivial to compute. Also, don't write your own: use a pre-existing, tried-and-tested hash; it's quite easy to make a crappy one.
I did some comparisons of std::set and std::unordered_set with 4k-64k strings in my dictionary.
I found that std::set and std::unordered_set had about the same runtime in my situation, because the hash_value calculation took about 80% of the run time for the unordered_set and dwarfed the lookup savings (I used boost::hash_value for std::string, FWIW).
YMMV, and in general I'd say profile and don't be fooled by theoretical scalings that don't account for CPU architecture, etc. A hash map can run slower due to the hash cost, and it will consume more memory.
My use case is that I store information for a long time, and it gets updates regularly that don't change the information_id hash but may change other content.
Every update is passed to my lookup function to decide whether I need to notify externally for that update.
The list of information_ids to notify lives in this lookup and can change independently of the information.
By caching the hash for the information_id, it is likely to be reused tens of times during the lifetime of the information.
My two-line change to cache the hash improved unordered_set's runtime by more than 8x.
Test setup: benchmarked on MSVC 2012 Update 4; 1M entries looked up 10 times each against a 4k and a 64k dictionary. All but 10 checks are misses against the 4k dictionary; 500 hits for the 64k one (more aardvarks :)). Times below are for the 4k / 64k dictionaries respectively.
set: 1373 ms / 1938 ms
multiset: 1376 ms / 1913 ms
unordered_set, initial 64k buckets / 0.5 max load factor: 168 ms / 362 ms
unordered_set, 4k buckets / 1.0 max load factor: 331 ms / 452 ms
For comparison, before caching the hash:
unordered_set, 64k buckets / 0.5 max load factor: 1519 ms / 1881 ms
FWIW, the same tests run against MinGW 4.9.1 with -O3:
set: 2003 ms / 2490 ms
multiset: 1978 ms / 2306 ms
unordered_set, initial 64k buckets / 0.5 max load factor: 140 ms / 605 ms
unordered_set, 4k buckets / 1.0 max load factor: 318 ms / 683 ms
For comparison, before caching the hash:
unordered_set, 64k buckets / 0.5 max load factor: 1619 ms / 2455 ms