开发者

Is it a good idea to generate and store a random number as a hash value for my class?

I have a clas开发者_开发知识库s that essentially wraps a list, and lists apparently can't have hash values. My idea was to generate a random number and store that as the hash value.


If you have two instances that are equivalent then they must return the same __hash__ value. If you generate one randomly you can't guarantee this will be the case, so you will get strange behaviour.

Can you use a tuple instead of a list? Can you just ignore the list in the computation of the hash? It's OK if non-equal objects have a collision.

You should see DictionaryKeys about why lists aren't allowed to have hash values.


Not a good idea. The general contract of hash code is that if Object A equals Object B, A.hashCode() equals B.hashCode(). With what you're proposing this wont hold.

You could try using

  • list length as the hash code
  • hash code of the first item in the list as the hashcode
  • sum of all hashcodes of all items as the hashcode

or something else along these lines.


As mentioned by others, a if 2 objects are considered "equal" then they must have the same hash value.

How you define equals for your class is up to you. If you only care about referential equality then you could generate a random number on __init__, but it better be the case that MyWrapper([1,2,3]) == MyWrapper([1,2,3]) is False.

The reason you shouldn't use the contents of the list as a hash like @iluxa suggests is because if you use your class as a key in a dictionary, then change the contents such that the hash value changes, it would not be able to find that key in the dictionary because it stored the old hash value and is trying to look up the new one.

To sum up:

  1. If a == b then hash(a) == hash(b) must be true.
  2. If a != b then hash(a) != hash(b) should be true most of the time for performance of lookups, but it is not required to be the case.
  3. The value of a hash should not change between the time of adding it to a dict (or any other hash-based lookup structure that doesn't recalculate hash values on look-up) and trying to find it.

The last part is often just simplified to the value of the hash should not change ever.


It is reasonable to support hashing for mutable containers - but the general rule is that you're hashing the current value, not the container.

EDIT That's reasonable in principle, though Python generally guards against it.

There's even a specific hashing technique for this kind of task. Zobrist hashing calculates a hash incrementally, based on each change to the container, in such a way that no matter how you reach a particular value you get the same hash. This is used in chess programs to detect cases where the board reaches the same state through different sequences of moves, as an optimisation for the game tree searching.

The question here is why do you want to find a list using a hash-based lookup?

If the value varies, but you want to find the same container anyway, then this may work (with collision issues) - but it still seems pointless. The random hash value is just a kind of identifier, identifying which object you're referring to. The reference/pointer to the object is a better identifier. And how will you know which hash value to search for if you don't already have a reference to the object anyway?

If you want a collection of containers, a list of lists is a better solution than a dictionary of randomly-hashed lists.

EDIT A possible exception is if you want a collection of lists where the same list may be added more than once, but will only be present once in the container - a set of list instances. In that case, the hash should IMO be based on the reference to the object, but I can't remember how that's done.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜