Hashing of pointer values

Sometimes you need to take a hash function of a pointer; not the object the pointer points to, but the pointer itself. Lots of the time, folks just punt and use the pointer value as an integer, chop off some high bits to make it fit, maybe shift out known-zero bits at the bottom. Thing is, pointer values aren't necessarily well-distributed in the code space; in fact, if your allocator is doing its job, there's an excellent chance they're all clustered close together.

So, my question is, has anyone developed hash functions that are good for this? Take a 32- or 64-bit value that's maybe got 12 bits of entropy in it somewhere and spread it evenly across a 32-bit number space.


This page lists several methods that might be of use. One of them, due to Knuth, is as simple as multiplying (in 32 bits) by 2654435761, but "Bad hash results are produced if the keys vary in the upper bits." In the case of pointers, that's a rare enough situation, since clustered pointers differ mostly in their lower bits.

Here are some more algorithms, including performance tests.

It seems that the magic words are "integer hashing".
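For what it's worth, here's a minimal sketch of the multiply-by-2654435761 idea applied to a pointer. The names `knuth_mix32` and `ptr_hash_knuth`, the 3-bit alignment shift, and the XOR fold of the upper 32 bits on 64-bit targets are my own assumptions for illustration, not something taken from the linked pages.

```c
#include <stdint.h>

/* Multiply by Knuth's constant, keeping only the low 32 bits of the product.
 * The constant is odd, so this is a bijection on 32-bit values. */
static inline uint32_t knuth_mix32(uint32_t x)
{
    return (uint32_t)((uint64_t)x * 2654435761u);
}

/* Hypothetical helper: map a pointer to a table index of `bits` bits.
 * `bits` must be in the range 1..32. */
static inline uint32_t ptr_hash_knuth(const void *p, unsigned bits)
{
    /* Assumption: objects are at least 8-byte aligned, so the low 3 bits
     * carry no information and can be shifted out. */
    uint64_t v = (uint64_t)(uintptr_t)p >> 3;

    /* Fold the upper half into the lower half so a 64-bit pointer fits. */
    uint32_t folded = (uint32_t)(v ^ (v >> 32));

    /* Take the TOP bits of the product; they depend on all input bits
     * below them, which is where clustered pointers differ. */
    return knuth_mix32(folded) >> (32 - bits);
}
```

Something like `ptr_hash_knuth(node, 10)` would then give an index into a 1024-slot table. Taking the high bits of the product rather than the low ones is the usual choice for multiplicative hashing, because the low bits of the product only depend on the low bits of the input.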


They'll likely exhibit locality, yes - but in the lower bits, which means objects will be distributed through the hashtable. You'll only see collisions if a pointer's address is a multiple of the hashtable's length from another pointer.
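To make that concrete, here's a small sketch assuming the table index is simply the address modulo the table length. `bucket_mod` and `same_bucket` are hypothetical names, and the addresses in the comments are made-up values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: bucket index is the raw address modulo the table length. */
static inline size_t bucket_mod(uintptr_t addr, size_t table_len)
{
    return (size_t)(addr % table_len);
}

/* Two addresses share a bucket exactly when they differ by a multiple
 * of table_len. */
static inline int same_bucket(uintptr_t a, uintptr_t b, size_t table_len)
{
    return bucket_mod(a, table_len) == bucket_mod(b, table_len);
}

/* Example with a table length of 1021 (a prime):
 *   same_bucket(0x1000, 0x1008, 1021)         is 0 (neighbours don't collide)
 *   same_bucket(0x1000, 0x1000 + 1021, 1021)  is 1 (exactly one length apart)
 */
```

One caveat: if the table length is a power of two and every pointer is 8-byte aligned, the raw modulo only ever lands on buckets that are multiples of 8, so either shift out the known-zero low bits first or pick a length that isn't a power of two.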


If you know the lowest possible pointer address (which is often the case if you're working within a large buffer), just convert the pointer to an integer by subtracting the lowest possible pointer value, e.g. the buffer's base address. Remember: a pointer subtracted from a pointer gives an offset (an integer). So don't "chop off" bits; it's much better to convert to an offset, which will be a much smaller value than the raw pointer. In some cases it may also help to shift the value right by two bits (i.e. divide by 4) before hashing it. The problem with pointers is often that small blocks of memory are likely to be allocated at the same address (e.g. a block is freed and another block takes the freed block's place).
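A rough sketch of the offset idea, assuming every pointer you hash really does point into one buffer whose base you know. `offset_key` and its `align_shift` parameter are hypothetical names; the result is meant to be fed into whatever integer hash you use.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: turn a pointer inside a known buffer into a small
 * integer key. `p` must point into (or one past) the buffer starting at
 * `base`, otherwise the subtraction is undefined behaviour. */
static inline uint32_t offset_key(const char *base, const void *p,
                                  unsigned align_shift)
{
    /* Pointer minus pointer gives an offset in bytes. */
    ptrdiff_t off = (const char *)p - base;

    /* Drop the low bits that are always zero because of alignment,
     * e.g. align_shift = 2 for 4-byte-aligned blocks.
     * Assumption: the buffer is small enough for the key to fit 32 bits. */
    return (uint32_t)((size_t)off >> align_shift);
}
```

With a 4-byte-aligned pool, `offset_key(pool, pool + 64, 2)` gives 16, so consecutive blocks map to consecutive small integers instead of large, sparse addresses.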


Why not just use an existing hash function?
