
Hashing function of any live object (hash table)?

Not sure whether it is sensible to reopen my earlier thread on hashing URLs. Nonetheless, I am still curious to know how this works under the covers.

Assumption: We have a hashtable with n elements (where n is finite) whose lookup has O(1) asymptotic time complexity; we (the CLR) have achieved this by applying some hashing function (an Hn-1 hash function, where n > 1).

Question: Can someone explain to me how the CLR maps a key to its hash code when we look up (retrieve) an element, if different hashing functions are used? How does the CLR keep track (if it does) of the hashing function of a live object (hash table)?

Thanks in advance.


Conceptually, there are two hash functions. The first hash function, as you have probably guessed, is the key object's GetHashCode method. The second hash function is a hash of the hash code returned by the first hash function.

So, imagine a hash table that has a capacity of 1,024 items, and you're going to insert two keys: K1 and K2.

K1.GetHashCode() returns 1,023. K2.GetHashCode() returns 65,535.

The code then divides the returned hash code by the hash table size and takes the remainder. Since 1,023 % 1,024 = 1,023 and 65,535 % 1,024 = 1,023, both keys map to position 1,023 in the hash table.
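A minimal C# sketch of that slot calculation, assuming a capacity of 1,024 as in the example. SlotFor is a hypothetical helper, and clearing the sign bit first is an added assumption so that a negative GetHashCode() result still maps to a valid slot.

```csharp
using System;

// Hypothetical helper: map a hash code to a slot in a table of the given capacity.
// The sign bit is cleared first because GetHashCode() may return a negative int,
// and C#'s % operator keeps the sign of the dividend.
static int SlotFor(int hashCode, int capacity) => (hashCode & 0x7FFFFFFF) % capacity;

const int Capacity = 1024;

int slotK1 = SlotFor(1023, Capacity);   // K1's hash code from the example
int slotK2 = SlotFor(65535, Capacity);  // K2's hash code from the example

Console.WriteLine($"K1 -> slot {slotK1}, K2 -> slot {slotK2}");
// prints "K1 -> slot 1023, K2 -> slot 1023"
```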

K1 is added to the table. When it comes time to add K2, there is a collision, so the code resorts to the second hash function. That second hash function is probably some sort of "bit mixer" (often the last stage in calculating a hash code) that scrambles the bits of the returned hash code. Conceptually, the code would look something like this:

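int hashCode = K2.GetHashCode();
// Clear the sign bit: GetHashCode() may return a negative value,
// and % on a negative int yields a negative slot in C#.
int slot = (hashCode & 0x7FFFFFFF) % 1024;
if (table[slot] != null)
{
    // Collision: derive a second hash code from the first one.
    // BitMixer stands for whatever mixing function the table uses.
    int secondHashCode = BitMixer(hashCode);
    slot = (secondHashCode & 0x7FFFFFFF) % 1024;
}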

The point here is that the code doesn't have to keep track of multiple hash functions for the different keys. It knows that it can call Key.GetHashCode() to get the object's hash code. From there, it can call its own bit mixer function or functions to generate additional hash codes.
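To make this concrete, here is a minimal sketch that assumes open addressing with double hashing, which is one way to use a second hash derived from the first (it is not necessarily what any particular CLR collection does). The first probe comes from the key's own GetHashCode(); the step between probes comes from a hypothetical Mix function, here MurmurHash3's 32-bit finalizer used purely as an example bit mixer. The table stores only keys and values: a lookup recomputes the same probe sequence from the key, so no per-key hash function is ever recorded.

```csharp
using System;

#nullable enable

const int Capacity = 1024;                  // power of two, as in the example above
string?[] keys = new string?[Capacity];     // null marks an empty slot
int[] values = new int[Capacity];

// Hypothetical bit mixer: MurmurHash3's 32-bit finalizer, used only as an example.
static uint Mix(uint h)
{
    unchecked
    {
        h ^= h >> 16;
        h *= 0x85EBCA6B;
        h ^= h >> 13;
        h *= 0xC2B2AE35;
        h ^= h >> 16;
        return h;
    }
}

// Double hashing: probe i visits (first + i * step) mod Capacity.
// An odd step is coprime with the power-of-two capacity, so all slots get visited.
int ProbeSlot(int hashCode, int i)
{
    uint h = unchecked((uint)hashCode);
    uint first = h % Capacity;              // first hash: the key's own GetHashCode()
    uint step = Mix(h) | 1;                 // second hash: derived from the first one
    return (int)(unchecked(first + (uint)i * step) % Capacity);
}

void Insert(string key, int value)
{
    int hash = key.GetHashCode();           // stable for the lifetime of the process
    for (int i = 0; i < Capacity; i++)
    {
        int slot = ProbeSlot(hash, i);
        if (keys[slot] is null || keys[slot] == key)
        {
            keys[slot] = key;
            values[slot] = value;
            return;
        }
    }
    throw new InvalidOperationException("The table is full.");
}

bool TryGet(string key, out int value)
{
    int hash = key.GetHashCode();           // same key -> same hash -> same probe sequence
    for (int i = 0; i < Capacity; i++)
    {
        int slot = ProbeSlot(hash, i);
        if (keys[slot] is null) break;      // empty slot: the key was never inserted
        if (keys[slot] == key) { value = values[slot]; return true; }
    }
    value = 0;
    return false;
}

Insert("K1", 1);
Insert("K2", 2);
Console.WriteLine(TryGet("K2", out int v2) ? v2 : -1); // prints 2
```

Because the capacity is a power of two and the step is forced to be odd, the probe sequence visits every slot before repeating, so an insert only fails when the table is really full.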


A hash code does not uniquely identify an object. It's just used to quickly put that object into a bucket. The elements in one bucket may but need not be equal, but elements in different buckets must be unequal.
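To illustrate that sharing a bucket does not imply equality, here is a sketch of a chained hash table that uses a deliberately weak, hypothetical hash function (the string's length). "cat" and "dog" land in the same bucket, and the lookup still tells them apart with Equals.

```csharp
using System;
using System.Collections.Generic;

const int BucketCount = 8;
var buckets = new List<KeyValuePair<string, int>>[BucketCount];

// Deliberately weak, hypothetical hash: all strings of the same length collide.
static int WeakHash(string s) => s.Length;

int BucketOf(string key) => WeakHash(key) % BucketCount;

// Sketch only: no duplicate-key check on insert.
void Add(string key, int value)
{
    int b = BucketOf(key);
    buckets[b] ??= new List<KeyValuePair<string, int>>();
    buckets[b].Add(new KeyValuePair<string, int>(key, value));
}

bool TryGet(string key, out int value)
{
    var bucket = buckets[BucketOf(key)];
    if (bucket != null)
    {
        // Same bucket only means "maybe equal"; Equals makes the final decision.
        foreach (var pair in bucket)
        {
            if (pair.Key.Equals(key)) { value = pair.Value; return true; }
        }
    }
    value = 0;
    return false;
}

Add("cat", 1);
Add("dog", 2);
Console.WriteLine(BucketOf("cat") == BucketOf("dog")); // True: same bucket
TryGet("dog", out int dog);
Console.WriteLine(dog);                                 // 2: told apart by Equals
```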

Conceptually, you can think of the default GetHashCode() implementation on reference types as using a field in every instance that holds a random value for the hash code, initialized when the object is created. The actual implementation is a bit more complex, but that doesn't matter here.

Since a hash code is a 32-bit int, there are only about 4 billion (2^32) different hash codes, so the O(1) runtime of most hash table implementations will break down if you have more elements than that. And of course the distribution must be good, i.e. there must not be too many hash collisions, but having a few is no big problem.


For types with value semantics you override both Equals and GetHashCode consistently to use the fields which determine equality.


Not sure if I understand your question well, but every object in .NET implements the GetHashCode method, which returns a hash code usable (and used) in dictionaries / hashtables, so the object itself is responsible for generating a good hash code.

Of course, there may (and will) be conflicts, as the hash code is an int. The conflicts are handled / resolved by the dictionary / hashtable.


Every object implements the GetHashCode() and Equals() methods. For reference types, the default implementations are based on the object reference. For example, a.Equals(b) returns the same as object.ReferenceEquals(a, b). This means that if two object references are equal, so are their hash codes.
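A small check of the default behaviour described above. Plain object instances keep the reference-based defaults, while string is one BCL type that overrides both methods with value semantics, so two distinct string instances with the same characters are equal and hash alike.

```csharp
using System;

object a = new object();
object b = new object();
object aliasOfA = a;

// Defaults inherited from System.Object: equality is reference identity.
Console.WriteLine(a.Equals(b));                                // False: different instances
Console.WriteLine(a.Equals(aliasOfA));                         // True: same reference
Console.WriteLine(a.GetHashCode() == aliasOfA.GetHashCode());  // True: same object, same hash code

// string overrides Equals and GetHashCode to compare characters instead.
string s1 = new string('x', 3);
string s2 = new string('x', 3);
Console.WriteLine(object.ReferenceEquals(s1, s2));             // False: two separate instances
Console.WriteLine(s1.Equals(s2));                              // True: same characters
Console.WriteLine(s1.GetHashCode() == s2.GetHashCode());       // True: required by the contract
```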

There are cases where you need to give Equals() different semantics. In these cases you must maintain the contract that if a.Equals(b), then a.GetHashCode() == b.GetHashCode().
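One way to see the contract at work without writing a custom type: StringComparer.OrdinalIgnoreCase gives strings case-insensitive equality and, consistently, a case-insensitive hash code, so a Dictionary built with it finds a key regardless of case. A custom type that overrides Equals() would have to give the same guarantee in its own GetHashCode(). The "Alice" dictionary below is just illustrative data.

```csharp
using System;
using System.Collections.Generic;

IEqualityComparer<string> comparer = StringComparer.OrdinalIgnoreCase;

// Different semantics for equality: "Key" and "KEY" are considered equal...
bool equal = comparer.Equals("Key", "KEY");
// ...so the contract demands that their hash codes match as well.
bool sameHash = comparer.GetHashCode("Key") == comparer.GetHashCode("KEY");

var ages = new Dictionary<string, int>(comparer) { ["Alice"] = 30 };

Console.WriteLine(equal);          // True
Console.WriteLine(sameHash);       // True
Console.WriteLine(ages["ALICE"]);  // 30: bucket found via the comparer's hash, match confirmed by its Equals
```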

There are many hashing functions, each with its own advantages and disadvantages. There is a useful explanation here. The actual function used is not something you should worry about; what matters most for keeping the average O(1) lookup time in the Hashtable is (ideally) ensuring that the GetHashCode() results of the objects you insert are as close to uniformly distributed as possible.
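A quick way to see why the distribution matters: the sketch below spreads the same 10,000 integer keys over a 1,024-slot table with two hypothetical hash functions, one that multiplies by 1,024 (so every result is a multiple of the table size) and one that simply returns the key.

```csharp
using System;
using System.Linq;

const int Capacity = 1024;
int[] keys = Enumerable.Range(0, 10_000).ToArray();

// Hypothetical poor hash: every result is a multiple of the table size.
static int PoorHash(int k) => k * 1024;
// Hypothetical plain hash: consecutive keys land in consecutive slots.
static int PlainHash(int k) => k;

// Size of the fullest bucket when the keys are placed with the given hash.
int LargestBucket(Func<int, int> hash)
{
    var counts = new int[Capacity];
    foreach (int k in keys)
        counts[(hash(k) & 0x7FFFFFFF) % Capacity]++;
    return counts.Max();
}

Console.WriteLine(LargestBucket(PoorHash));   // 10000: every key in bucket 0
Console.WriteLine(LargestBucket(PlainHash));  // 10: roughly 10,000 / 1,024 per bucket
```

With the poor hash every lookup degrades to a scan of one 10,000-element bucket, which is exactly the loss of O(1) behaviour described above.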
