Object of custom type as dictionary key
What must I do to use my objects of a custom type as keys in a Python dictionary (where I don't want the "object id" to act as the key) , e.g.
class MyThing:
def __init__(self,name,location,length):
self.name = name
self.location = location
self.length = length
I'd want to use MyThing's as keys that are considered the same if name and location are the same. From C#/Java I'm used to having to override and provide an equals and hashcode method, and promise not to mutate anything the hashcode depends on.
What must I d开发者_StackOverflow中文版o in Python to accomplish this ? Should I even ?
(In a simple case, like here, perhaps it'd be better to just place a (name,location) tuple as key - but consider I'd want the key to be an object)
You need to add 2 methods, note __hash__
and __eq__
:
class MyThing:
def __init__(self,name,location,length):
self.name = name
self.location = location
self.length = length
def __hash__(self):
return hash((self.name, self.location))
def __eq__(self, other):
return (self.name, self.location) == (other.name, other.location)
def __ne__(self, other):
# Not strictly necessary, but to avoid having both x==y and x!=y
# True at the same time
return not(self == other)
The Python dict documentation defines these requirements on key objects, i.e. they must be hashable.
An alternative in Python 2.6 or above is to use collections.namedtuple()
-- it saves you writing any special methods:
from collections import namedtuple
MyThingBase = namedtuple("MyThingBase", ["name", "location"])
class MyThing(MyThingBase):
def __new__(cls, name, location, length):
obj = MyThingBase.__new__(cls, name, location)
obj.length = length
return obj
a = MyThing("a", "here", 10)
b = MyThing("a", "here", 20)
c = MyThing("c", "there", 10)
a == b
# True
hash(a) == hash(b)
# True
a == c
# False
You override __hash__
if you want special hash-semantics, and __cmp__
or __eq__
in order to make your class usable as a key. Objects who compare equal need to have the same hash value.
Python expects __hash__
to return an integer, returning Banana()
is not recommended :)
User defined classes have __hash__
by default that calls id(self)
, as you noted.
There is some extra tips from the documentation.:
Classes which inherit a
__hash__()
method from a parent class but change the meaning of__cmp__()
or__eq__()
such that the hash value returned is no longer appropriate (e.g. by switching to a value-based concept of equality instead of the default identity based equality) can explicitly flag themselves as being unhashable by setting__hash__ = None
in the class definition. Doing so means that not only will instances of the class raise an appropriate TypeError when a program attempts to retrieve their hash value, but they will also be correctly identified as unhashable when checkingisinstance(obj, collections.Hashable)
(unlike classes which define their own__hash__()
to explicitly raise TypeError).
I noticed in python 3.8.8 (maybe ever earlier) you don't need anymore explicitly declare __eq__()
and __hash__()
to have to opportunity to use your own class as a key in dict.
class Apple:
def __init__(self, weight):
self.weight = weight
def __repr__(self):
return f'Apple({self.weight})'
apple_a = Apple(1)
apple_b = Apple(1)
apple_c = Apple(2)
apple_dictionary = {apple_a : 3, apple_b : 4, apple_c : 5}
print(apple_dictionary[apple_a]) # 3
print(apple_dictionary) # {Apple(1): 3, Apple(1): 4, Apple(2): 5}
I assume from some time Python manages it on its own, however I can be wrong.
The answer for today as I know other people may end up here like me, is to use dataclasses in python >3.7. It has both hash and eq functions.
@dataclass(frozen=True)
example (Python 3.7)
@dataclass
had been previously mentioned at: https://stackoverflow.com/a/69313714/895245 but here's an example.
This awesome new feature, among other good things, automatically defines a __hash__
and __eq__
method for you, making it just work as usually expected in dicts and sets:
dataclass_cheat.py
from dataclasses import dataclass, FrozenInstanceError
@dataclass(frozen=True)
class MyClass1:
n: int
s: str
@dataclass(frozen=True)
class MyClass2:
n: int
my_class_1: MyClass1
d = {}
d[MyClass1(n=1, s='a')] = 1
d[MyClass1(n=2, s='a')] = 2
d[MyClass1(n=2, s='b')] = 3
d[MyClass2(n=1, my_class_1=MyClass1(n=1, s='a'))] = 4
d[MyClass2(n=2, my_class_1=MyClass1(n=1, s='a'))] = 5
d[MyClass2(n=2, my_class_1=MyClass1(n=2, s='a'))] = 6
assert d[MyClass1(n=1, s='a')] == 1
assert d[MyClass1(n=2, s='a')] == 2
assert d[MyClass1(n=2, s='b')] == 3
assert d[MyClass2(n=1, my_class_1=MyClass1(n=1, s='a'))] == 4
assert d[MyClass2(n=2, my_class_1=MyClass1(n=1, s='a'))] == 5
assert d[MyClass2(n=2, my_class_1=MyClass1(n=2, s='a'))] == 6
# Due to `frozen=True`
o = MyClass1(n=1, s='a')
try:
o.n = 2
except FrozenInstanceError as e:
pass
else:
raise 'error'
As we can see in this example, the hashes are being calculated based on the contents of the objects, and not simply on the addresses of instances. This is why something like:
d = {}
d[MyClass1(n=1, s='a')] = 1
assert d[MyClass1(n=1, s='a')] == 1
works even though the second MyClass1(n=1, s='a')
is a completely different instance from the first with a different address.
frozen=True
is mandatory, otherwise the class is not hashable, otherwise it would make it possible for users to inadvertently make containers inconsistent by modifying objects after they are used as keys. Further documentation: https://docs.python.org/3/library/dataclasses.html
Tested on Python 3.10.7, Ubuntu 22.10.
精彩评论