Hashing function of any live object (hash table)?

2023-04-09 08:45 Asked by:

Not sure whether it is sensible to reopen my earlier thread on hashing URLs. Nonetheless, I am still curious to know how this works under the covers.

Assumption: We have a hash table with n elements (n finite) whose lookup has O(1) asymptotic time complexity; we (the CLR) achieve this by applying some hashing function (an Hn-1 hash function, where n > 1).

Question: Can someone explain how the CLR maps a key to its hash code when we look up (retrieve) an element, if different hashing functions are used? How does the CLR keep track (if it does) of the hashing function of any live object (hash table)?

Thanks in advance.

Conceptually, there are two hash functions. The first hash function, as you have probably guessed, is the key object's GetHashCode method. The second hash function is a hash of the hash code returned by the first hash function.

So, imagine a hash table that has a capacity of 1,024 items, and you're going to insert two keys: K1 and K2.

K1.GetHashCode() returns 1,023. K2.GetHashCode() returns 65,535.

The code then divides the returned hash code by the hash table size and takes the remainder. So both of the keys map to position 1,023 in the hash table.
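The arithmetic above can be checked directly. This is a minimal sketch, assuming a table size of 1,024 as in the example; the SlotDemo class and SlotFor helper are hypothetical names used for illustration, not part of the CLR.

```csharp
using System;

static class SlotDemo
{
    // Hypothetical helper: maps a hash code to a slot index in a table of the given size.
    // Clearing the sign bit keeps the remainder non-negative for negative hash codes.
    public static int SlotFor(int hashCode, int tableSize)
    {
        return (hashCode & 0x7FFFFFFF) % tableSize;
    }
}
```

SlotDemo.SlotFor(1023, 1024) and SlotDemo.SlotFor(65535, 1024) both yield 1023, because 65,535 = 63 × 1,024 + 1,023.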

K1 is added to the table. When it comes time to add K2, there is a collision. So the code resorts to the second hash function. That second hash function is probably a "bit mixer" (often the last stage in calculating a hash code) of some sort that randomizes the bits in the returned key. Conceptually, the code would look something like this:

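int hashCode = K2.GetHashCode() & 0x7FFFFFFF;  // clear the sign bit so % never yields a negative slot
int slot = hashCode % 1024;                    // first probe: hash code modulo the table size
if (table[slot] != null)                       // collision: the slot is already occupied
{
    int secondHashCode = BitMixer(hashCode) & 0x7FFFFFFF;  // BitMixer is conceptual, not a real API
    slot = secondHashCode % 1024;              // second probe
}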

The point here is that the code doesn't have to keep track of multiple hash functions for the different keys. It knows that it can call Key.GetHashCode() to get the object's hash code. From there, it can call its own bit mixer function or functions to generate additional hash codes.
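To make that concrete, here is a toy open-addressing set, offered as a sketch rather than as how the CLR's Hashtable is actually written. ToyHashSet, BitMixer and Probe are hypothetical names, the mixer constants are one common choice among many, and the capacity is assumed to be a power of two so that an odd step visits every slot. Note that Contains recomputes the same probe sequence from the key alone, so nothing about "which hash function was used" has to be stored per key.

```csharp
using System;
using System.Collections.Generic;

// Toy fixed-capacity set using open addressing (assumption: capacity is a power of two).
class ToyHashSet<T>
{
    private readonly T[] slots;
    private readonly bool[] used;

    public ToyHashSet(int capacity)
    {
        slots = new T[capacity];
        used = new bool[capacity];
    }

    // Hypothetical bit mixer: scrambles the bits of the first hash code.
    private static uint BitMixer(uint h)
    {
        unchecked
        {
            h ^= h >> 16;
            h *= 0x85EBCA6B;
            h ^= h >> 13;
            h *= 0xC2B2AE35;
            h ^= h >> 16;
            return h;
        }
    }

    // Slot for the given attempt: start at GetHashCode, then advance by a step
    // derived from the mixed hash. The step is forced odd so it cycles through
    // every slot of a power-of-two table.
    private int Probe(T item, int attempt)
    {
        uint h1 = unchecked((uint)item.GetHashCode());
        uint step = BitMixer(h1) | 1u;
        return (int)((h1 + (ulong)attempt * step) % (uint)slots.Length);
    }

    // Returns true if the item was added, false if it was already present.
    public bool Add(T item)
    {
        for (int attempt = 0; attempt < slots.Length; attempt++)
        {
            int slot = Probe(item, attempt);
            if (!used[slot])
            {
                slots[slot] = item;
                used[slot] = true;
                return true;
            }
            if (EqualityComparer<T>.Default.Equals(slots[slot], item))
                return false;
        }
        throw new InvalidOperationException("Table is full.");
    }

    // Follows the same probe sequence as Add until it finds the item or an empty slot.
    public bool Contains(T item)
    {
        for (int attempt = 0; attempt < slots.Length; attempt++)
        {
            int slot = Probe(item, attempt);
            if (!used[slot]) return false;
            if (EqualityComparer<T>.Default.Equals(slots[slot], item)) return true;
        }
        return false;
    }
}
```

With a capacity of 1,024, adding the int keys 1023 and 65535 reproduces the collision from the example: both probe slot 1,023 first, and the second key lands wherever its mixed step takes it.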

A hash code does not uniquely identify an object. It's just used to quickly put that object into a bucket. The elements in one bucket may but need not be equal, but elements in different buckets must be unequal.
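A small sketch can show that the hash code only selects the bucket while Equals makes the final decision. CollidingKey and BucketDemo are hypothetical types written for this illustration; the constant hash code is deliberately bad so that every key lands in the same bucket.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type whose hash code is deliberately terrible: every instance collides.
sealed class CollidingKey
{
    public string Name { get; }
    public CollidingKey(string name) { Name = name; }

    // Same hash code for every key, so all keys share one bucket.
    public override int GetHashCode() => 42;

    // Equality is still decided by the name, not by the hash code.
    public override bool Equals(object obj) =>
        obj is CollidingKey other && other.Name == Name;
}

static class BucketDemo
{
    // Builds a dictionary with two unequal keys that share a hash code.
    public static Dictionary<CollidingKey, int> Build()
    {
        var map = new Dictionary<CollidingKey, int>();
        map[new CollidingKey("a")] = 1;
        map[new CollidingKey("b")] = 2;
        return map;
    }
}
```

The dictionary from BucketDemo.Build() holds two entries and looks each one up correctly; lookups are just slower than with a well-distributed hash code, because every probe has to fall back on Equals.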

Conceptually you can think of the default GetHashCode() implementation on reference types as using a field in every instance containing a random value for the hashcode which gets initialized on object creation. The actual implementation is a bit more complex but that doesn't matter here.

Since GetHashCode returns a 32-bit int, there are only about 4 billion different hash codes (about 2 billion if the implementation discards the sign bit), so the O(1) runtime of most hash table implementations will break down if you have more elements than that. The distribution must also be good, i.e. there must not be too many hash collisions, but having a few is no big problem.

For types with value semantics you override both Equals and GetHashCode consistently to use the fields which determine equality.
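A minimal sketch of such a consistent override, assuming a hypothetical Point type whose equality is determined by its X and Y fields. The multiplier 397 is just one commonly used odd prime; any reasonable way of combining the fields would do.

```csharp
using System;

// Hypothetical type with value semantics: Equals and GetHashCode use the same fields.
sealed class Point : IEquatable<Point>
{
    public int X { get; }
    public int Y { get; }
    public Point(int x, int y) { X = x; Y = y; }

    // Two points are equal when both coordinates match.
    public bool Equals(Point other) =>
        other != null && X == other.X && Y == other.Y;

    public override bool Equals(object obj) => Equals(obj as Point);

    // Combines exactly the fields that Equals compares, so equal points hash equally.
    public override int GetHashCode()
    {
        unchecked { return (X * 397) ^ Y; }
    }
}
```

Two separately constructed new Point(1, 2) instances are then equal, share a hash code, and are treated as the same key by Dictionary and HashSet.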

Not sure if I understand your question well, but every object in .NET implements the GetHashCode function, which returns a hash code usable (and used) in dictionaries / hashtables, so the object itself is responsible for generating a good hash code.

Of course, there may (and will) be collisions, as the hash code is an int. The collisions are handled / resolved by the dictionary / hashtable.

Every object implements the GetHashCode() and Equals() functions. For reference types, the default implementations are based on the object reference. For example, a.Equals(b) returns the same result as object.ReferenceEquals(a, b). This means that if two object references are equal, so are their hash codes.
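The default behaviour can be observed with a class that overrides nothing. Widget and DefaultEqualityDemo are hypothetical names used only for this sketch.

```csharp
using System;

// Hypothetical reference type that keeps the default Equals and GetHashCode.
sealed class Widget { }

static class DefaultEqualityDemo
{
    // Two references to the same instance: equal, and with the same hash code.
    public static bool SameReferenceIsEqual()
    {
        var a = new Widget();
        var b = a;
        return a.Equals(b) == object.ReferenceEquals(a, b)
            && a.GetHashCode() == b.GetHashCode();
    }

    // Two separate instances are unequal under the default, reference-based Equals.
    public static bool DistinctInstancesAreUnequal()
    {
        var a = new Widget();
        var c = new Widget();
        return !a.Equals(c) && !object.ReferenceEquals(a, c);
    }
}
```

Both methods return true. The sketch makes no claim about the hash codes of two distinct instances: they usually differ, but nothing guarantees it.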

There are cases where you need to give the Equals() function different semantics. In these cases you must maintain the contract that if a.Equals(b) then a.GetHashCode() == b.GetHashCode().

Many hashing functions are in use, each with its own advantages and disadvantages. There is a useful explanation here. You should not worry about the actual function used; what matters most for keeping the average O(1) lookup time in the Hashtable is ensuring (ideally) that the GetHashCode() results of the inserted objects are as close to uniformly distributed as possible.
