开发者

Keys are not unique for a python dictionary!

A stupid newbie question here For a python dictionary q len(set(q.keys())) !开发者_如何学C= len(q.keys()). Is that even possible?


This can happen if you violate a requirement of dict, and change its hash.

When an object is used in a dict, its hash value must not change, and its equality to other objects must not change. Other properties may change, as long as they don't affect how it appears to the dict.

(This does not mean that a hash value is never allowed to change. That's a common misconception. Hash values themselves may change. It's only dict which requires that key hashes be immutable, not __hash__ itself.)

The following code adds an object to a dict, then changes its hash out from under the dict. q[a] = 2 then adds a as a new key in the dict, even though it's already present; since the hash value changed, the dict doesn't find the old value. This reproduces the peculiarity you saw.

class Test(object):
    def __init__(self, h):
        self.h = h
    def __hash__(self):
        return self.h

a = Test(1)
q = {}
q[a] = 1
a.h = 2
q[a] = 2

print q

# True:
print len(set(q.keys())) != len(q.keys())


The underlying code for dictionaries and sets is substantially the same, so you can usually expect that len(set(d.keys()) == len(d.keys()) is an invariant.

That said, both sets and dicts depend on __eq__ and __hash__ to identify unique values and to organize them for efficient search. So, if those return inconsistent results (or violate the rule that "a==b implies hash(a)==hash(b)", then there is no way to enforce the invariant:

>>> from random import randrange
>>> class A():
    def __init__(self, x):
        self.x = x
    def __eq__(self, other):
        return bool(randrange(2))
    def __hash__(self):
        return randrange(8)
    def __repr__(self):
        return '|%d|' % self.x


>>> s = [A(i) for i in range(100)]
>>> d = dict.fromkeys(s)
>>> len(d.keys())
29
>>> len(set(d.keys()))
12
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜