.NET GetHashCode Bit Shifting Operation

I was looking through some of the .NET source yesterday and saw several implementations of GetHashCode with something along the lines of this:

(i1 << 5) + i1 ^ i2

I understand what the code is doing and why. What I want to know is why they used (i1 << 5) + i1 instead of (i1 << 5) - i1.

Most frameworks I've seen use the subtraction, because that's equivalent to multiplying by 31, which is prime. The Microsoft way is equivalent to multiplying by 33, which has 11 and 3 as factors and thus isn't prime.
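
To make the equivalence concrete, here is a minimal C# sketch. The class and method names (ShiftMultiplyDemo, Combine33, Combine31) are made up for illustration and are not taken from the .NET source. It assumes 32-bit int arithmetic that wraps on overflow, and relies on `+` and `-` binding tighter than `^` in C#.

```csharp
using System;

public static class ShiftMultiplyDemo
{
    // Same shape as the expression above: (h1 * 33) XOR h2.
    public static int Combine33(int h1, int h2) => unchecked(((h1 << 5) + h1) ^ h2);

    // The shift-and-subtract variant: (h1 * 31) XOR h2.
    public static int Combine31(int h1, int h2) => unchecked(((h1 << 5) - h1) ^ h2);

    public static void Main()
    {
        foreach (int x in new[] { 0, 1, 7, -12345, int.MaxValue })
        {
            // Both sides wrap modulo 2^32, so the identities also hold on overflow.
            bool is33 = unchecked((x << 5) + x) == unchecked(33 * x);
            bool is31 = unchecked((x << 5) - x) == unchecked(31 * x);
            Console.WriteLine($"{x}: (x << 5) + x == 33 * x: {is33}, (x << 5) - x == 31 * x: {is31}");
        }
    }
}
```

Since x << 5 is 32 * x modulo 2^32, adding or subtracting x once gives 33 * x or 31 * x, so every comparison should come out true.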

Is there a known justification for this? Any reasonable hypotheses?


I asked the same question on math.stackexchange.com: Curious Properties of 33.

The conjecture among mathematicians and the research I did on the topic lead me to believe that the answer is this:

Okay, I found out why Microsoft uses 33. That's called the Bernstein Hash. It turns out that 33 has some magical properties that produce a good distribution of hash codes and there's very little theoretical knowledge as to why.

Basically, in entropy and speed comparisons, Bernstein does well enough and is quite snappy. Dan Bernstein, the guy who came up with the constant 33, wasn't able to explain what property of 33 produced such a good distribution of hashes.

Several papers have been written comparing hash functions and have corroborated this finding without further explaining the benefit of using 33. Further, I couldn't find why Java uses 31 instead. It appears to be a mathematical and programming mystery to date.
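
For reference, this is roughly what a Bernstein-style string hash looks like in C#. Treat it as a sketch rather than anyone's actual implementation: the class name is made up, 5381 is the starting value conventionally paired with this hash, and the loop walks UTF-16 chars where the classic version walks bytes.

```csharp
using System;

public static class BernsteinHashSketch
{
    public static uint Hash(string text)
    {
        uint hash = 5381; // conventional starting value for this hash
        foreach (char c in text)
        {
            // hash * 33 + c, written as shift-and-add; uint arithmetic wraps on overflow.
            hash = unchecked(((hash << 5) + hash) + c);
        }
        return hash;
    }

    public static void Main()
    {
        Console.WriteLine(Hash("hello"));
    }
}
```

There is also a variant that XORs the character in instead of adding it, which is closer to the shape of the expression in the question.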


I don't remember if 31 is one of those primes, but there are certain primes which get used as capacities by Dictionary<K,V>. If you use one of those primes as the multiplier, the multiplied (left) field is a multiple of the capacity, so it no longer influences the chosen bucket and the hash degenerates.
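
A small C# sketch of that degeneration, under some loud assumptions: the bucket is taken to be the non-negative hash modulo the capacity, the fields are combined with + rather than ^ so the arithmetic is exact, the values are small enough not to overflow, and 37 is just an illustrative prime capacity. All names here are hypothetical and not the real Dictionary<K,V> internals.

```csharp
using System;

public static class DegenerateBucketDemo
{
    // Hypothetical bucket selection: non-negative hash modulo the table capacity.
    public static int Bucket(int hash, int capacity) => (hash & 0x7FFFFFFF) % capacity;

    // Hypothetical additive combiner with a configurable multiplier.
    public static int Combine(int left, int right, int multiplier) => unchecked(left * multiplier + right);

    public static void Main()
    {
        const int capacity = 37; // illustrative prime capacity, chosen for the demo
        const int right = 5;
        for (int left = 0; left < 5; left++)
        {
            int sameAsCapacity = Bucket(Combine(left, right, capacity), capacity); // multiplier == capacity
            int different = Bucket(Combine(left, right, 33), capacity);            // multiplier != capacity
            Console.WriteLine($"left={left}: multiplier 37 -> bucket {sameAsCapacity}, multiplier 33 -> bucket {different}");
        }
    }
}
```

With the multiplier equal to the capacity, left * 37 vanishes modulo 37, so every pair lands in the bucket chosen by the right field alone. With 33 the bucket changes as left changes.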
