
Why does a hash table degenerate into a linked list when a hashCode() implementation returns a constant value?

// The worst possible legal hash function - never use!
@Override public int hashCode() { return 42; }

It’s legal because it ensures that equal objects have the same hash code. It’s atrocious because it ensures that every object has the same hash code. Therefore, every object hashes to the same bucket, and hash tables degenerate to linked lists. Programs that should run in linear time instead run in quadratic time.

I am trying to figure the above out (the quote is from page 47, Item 9, of Joshua Bloch's Effective Java).

The way I see it is as follows (consider the following code):

Map<String, String> h = new HashMap<String,String>();
h.put("key1", "value1");
h.put("key1", "value2");

What happens with the second h.put("key1", ...) call is as follows:

1. Get the hash code of "key1".
2. Go to the bucket representing that hash code.
3. Within that bucket, invoke the equals method on each object to find whether an equal key already exists.

This makes lookups fast, because you first find the 'group' (bucket) of objects and only then search for the actual object within it.

Now, when the hashCode implementation returns the same integer (such as 42 above) for ALL objects, there is only one bucket, and the equals method needs to be invoked one by one on each object in the entire hashmap/hashtable. This is as bad as a linked list, because if the objects were in a linked list, one would also have to go through them one by one, comparing (calling equals on) each object.
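For instance, here is a minimal sketch of that scenario (ConstantKey and ConstantHashDemo are hypothetical names invented for illustration):

import java.util.HashMap;
import java.util.Map;

// Hypothetical key class: equal objects get equal hash codes (legal),
// but every object gets the same hash code (atrocious).
final class ConstantKey {
    private final String name;

    ConstantKey(String name) { this.name = name; }

    @Override public boolean equals(Object o) {
        return o instanceof ConstantKey && ((ConstantKey) o).name.equals(name);
    }

    @Override public int hashCode() { return 42; } // every key lands in one bucket
}

public class ConstantHashDemo {
    public static void main(String[] args) {
        Map<ConstantKey, String> map = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            map.put(new ConstantKey("key" + i), "value" + i);
        }
        // All 1000 entries share a single bucket, so this get() must call
        // equals() on the colliding entries one by one: a linked-list scan.
        System.out.println(map.get(new ConstantKey("key999")));
    }
}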

Is that why it is said that hash tables degenerate into a linked list?

(I apologize for the verbosity of the above text. I am not clear enough in my concepts to have stated it more succinctly)


A hash table is an array combined with a mapping function (hashCode). When inserting into the array, you calculate the position from the key's hash code and insert the element there.

BUT hashCode does not guarantee that every element will get a different position, so some objects may collide (map to the same address), and the hash table has to resolve this. There are two common approaches to doing that.

Separate chaining

In separate chaining (used in Java), every index of the array holds a linked list, so every bucket (position) has effectively unlimited capacity. Hence, if your hashCode returns only one value, you are using only one linked list => the hash table is a linked list.
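A minimal sketch of separate chaining, to make the chain explicit (illustrative only: the class names and the fixed 16-slot table are invented, and resizing is omitted):

// Illustrative separate-chaining table; not the real HashMap source.
class ChainedHashTable<K, V> {
    private static class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next; // forward pointer: each bucket's chain is a linked list
        Node(K key, V value, Node<K, V> next) {
            this.key = key; this.value = value; this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] buckets = (Node<K, V>[]) new Node[16];

    private int indexFor(Object key) {
        // With a constant hashCode(), this is the same index for every key.
        return (key.hashCode() & 0x7fffffff) % buckets.length;
    }

    V get(K key) {
        // Walk the chain, calling equals() on each node in turn.
        for (Node<K, V> n = buckets[indexFor(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) return n.value;
        }
        return null;
    }

    void put(K key, V value) {
        int i = indexFor(key);
        for (Node<K, V> n = buckets[i]; n != null; n = n.next) {
            if (n.key.equals(key)) { n.value = value; return; } // replace existing
        }
        buckets[i] = new Node<>(key, value, buckets[i]); // prepend a new node
    }
}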

Linear probing

The second approach is linear probing. In linear probing, the inner array is an ordinary array of elements. When you find that a position is already occupied, you iterate over the array and place the new element at the first empty position.

So if your implementation of hashCode generates a constant value for every element, you generate only collisions: you try to place every element at the same index, and because it is always occupied, you iterate over all the already placed elements and place the new element at the end of this structure. If you reread what you are doing, you will see that you are using just a different (you could say implicit) implementation of a linked list.
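A minimal sketch of linear probing under the same caveats (illustrative names, fixed capacity, no resizing or deletion; it assumes the table never fills up):

// Illustrative linear-probing table; not production code.
class ProbingHashTable<K, V> {
    private final Object[] keys = new Object[16];
    private final Object[] values = new Object[16];

    private int indexFor(Object key) {
        return (key.hashCode() & 0x7fffffff) % keys.length;
    }

    void put(K key, V value) {
        int i = indexFor(key);
        // With a constant hashCode(), every insert starts at the same index
        // and walks past all previously placed elements before finding a slot.
        while (keys[i] != null && !keys[i].equals(key)) {
            i = (i + 1) % keys.length; // assumes the table is never full
        }
        keys[i] = key;
        values[i] = value;
    }

    @SuppressWarnings("unchecked")
    V get(K key) {
        int i = indexFor(key);
        while (keys[i] != null) {
            if (keys[i].equals(key)) return (V) values[i];
            i = (i + 1) % keys.length;
        }
        return null;
    }
}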

Why you should not do it

You really should not return constant values, because hash tables are built to provide O(1) expected complexity for search and insert operations (thanks to the hash function, which returns a different address for (almost) every different object). If you return only one value, the implementation degrades to a linked list with O(n) for both operations.
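A rough, unscientific way to see the degradation (DegenerateTiming is a hypothetical demo class; it uses java.util.Hashtable because, since Java 8, HashMap converts long collision chains into balanced trees, which softens the worst case, whereas Hashtable still uses plain chains):

import java.util.Hashtable;
import java.util.Map;

public class DegenerateTiming {

    static final class Key {
        final int id;
        final boolean constantHash;
        Key(int id, boolean constantHash) { this.id = id; this.constantHash = constantHash; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id == id;
        }
        // Either a well-spread hash code or the constant 42.
        @Override public int hashCode() { return constantHash ? 42 : id; }
    }

    static long fillMillis(boolean constantHash, int n) {
        Map<Key, Integer> map = new Hashtable<>();
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) map.put(new Key(i, constantHash), i);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int n = 10_000;
        System.out.println("spread hash: " + fillMillis(false, n) + " ms"); // ~O(n) total work
        System.out.println("constant 42: " + fillMillis(true, n) + " ms");  // ~O(n^2) total work
    }
}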


Yes, your understanding seems accurate. However, it is not merely like a linked list: the actual internal implementation of entries that share a common bucket is a plain old linked list. The bucket holds the Map.Entry at the head of the list, and each entry has a forward pointer to the next occupant of its bucket. (For the implementation of HashMap that's built into Java, of course.)
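The structure that answer describes looks roughly like this (a simplified paraphrase of the classic, pre-Java-8 HashMap entry, not the actual JDK source):

import java.util.Map;

// table[i] holds the head of bucket i's chain; each entry carries a forward
// pointer to the next occupant of the same bucket.
final class SimpleEntry<K, V> implements Map.Entry<K, V> {
    final int hash;          // cached hash code of the key
    final K key;
    V value;
    SimpleEntry<K, V> next;  // next entry in the same bucket, or null

    SimpleEntry(int hash, K key, V value, SimpleEntry<K, V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    @Override public K getKey() { return key; }
    @Override public V getValue() { return value; }
    @Override public V setValue(V v) { V old = value; value = v; return old; }
}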


Hash tables -- when used correctly -- offer constant-time lookups on average. In terms of time complexity, constant time is as good as it gets.

Linked lists offer linear-time lookups. Linear time (i.e. look at every element in turn) is as bad as it gets.

When a hash table is misused in the manner described by Bloch, its lookup behaviour degenerates into that of a linked list, simply because it effectively becomes a linked list.

Similar things can be said about other operations.
