Determine whether a key is present in a dictionary [duplicate]
I have a Python dictionary like mydict = {'name':'abc','city':'xyz','country':'def'}.
How should I check if a key is in the dictionary or not? I know these ways already:
if mydict.has_key('name'):
if 'name' in mydict:
if 'name' in mydict:
is the preferred, pythonic version. Use of has_key()
is discouraged, and this method has been removed in Python 3.
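To make that concrete, here is a minimal sketch for Python 3, reusing the mydict from the question. The variable names on the left (has_name and so on) are made up for illustration. Note that in tests the keys only, never the values.

```python
mydict = {"name": "abc", "city": "xyz", "country": "def"}

# `in` looks at the dictionary's keys
has_name = "name" in mydict          # key is present
has_abc = "abc" in mydict            # "abc" is a value, not a key
missing_zip = "zip" not in mydict    # negated form of the same test

# On Python 3 the old method no longer exists on dict objects
has_key_exists = hasattr(mydict, "has_key")

print(has_name, has_abc, missing_zip, has_key_exists)
```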
In the same vein as martineau's response, the best solution is often not to check. For example, the code
if x in d:
    foo = d[x]
else:
    foo = bar
is normally written
foo = d.get(x, bar)
which is shorter and more directly speaks to what you mean.
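A small sketch of get() in use, assuming a made-up settings dictionary. One thing worth knowing: the default is only used when the key is absent, so a key that is present with the value None still gives you None back.

```python
# Hypothetical configuration dictionary, for illustration only
settings = {"timeout": 30, "proxy": None}

timeout = settings.get("timeout", 10)    # key present: stored value wins
retries = settings.get("retries", 3)     # key absent: default is returned
proxy = settings.get("proxy", "direct")  # key present with value None: None is returned

# get() never adds the missing key to the dictionary
still_missing = "retries" not in settings

print(timeout, retries, proxy, still_missing)
```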
Another common case is something like
if x not in d:
    d[x] = []
d[x].append(foo)
which can be rewritten
d.setdefault(x, []).append(foo)
or rewritten even better by using a collections.defaultdict(list) for d and writing
d[x].append(foo)
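Here is a short side-by-side sketch of both variants, grouping some made-up (kind, item) pairs. The data and the names pairs, groups and dd are just for illustration.

```python
from collections import defaultdict

# Hypothetical input: (category, item) pairs to group by category
pairs = [("fruit", "apple"), ("veg", "carrot"), ("fruit", "pear")]

# Variant 1: plain dict with setdefault()
groups = {}
for kind, item in pairs:
    # setdefault() inserts an empty list the first time a key is seen
    groups.setdefault(kind, []).append(item)

# Variant 2: defaultdict creates the empty list on first access
dd = defaultdict(list)
for kind, item in pairs:
    dd[kind].append(item)

print(groups)
print(dict(dd))
```

One caveat with defaultdict: merely reading dd[some_key] creates the entry, so use in (or get()) when you only want to look without inserting.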
In terms of bytecode, the in operator saves a LOAD_ATTR and replaces a CALL_FUNCTION with a COMPARE_OP.
>>> dis.dis(indict)
  2           0 LOAD_GLOBAL              0 (name)
              3 LOAD_GLOBAL              1 (d)
              6 COMPARE_OP               6 (in)
              9 POP_TOP
>>> dis.dis(haskey)
  2           0 LOAD_GLOBAL              0 (d)
              3 LOAD_ATTR                1 (haskey)
              6 LOAD_GLOBAL              2 (name)
              9 CALL_FUNCTION            1
             12 POP_TOP
My feeling is that in is much more readable and is to be preferred in every case that I can think of.
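For anyone on Python 3, where has_key() is gone, a rough way to reproduce the comparison is to disassemble a stand-in method call. This is only a sketch: method_call and opnames are names made up here, d.__contains__ is used purely as a substitute for has_key, and the exact opcode names differ between CPython versions (newer ones emit CONTAINS_OP rather than COMPARE_OP for in).

```python
import dis

def indict(d, name):
    # membership test via the operator
    return name in d

def method_call(d, name):
    # stand-in for the removed has_key(): an explicit method call
    return d.__contains__(name)

def opnames(func):
    """Return the opcode names of func's bytecode, in order."""
    return [ins.opname for ins in dis.get_instructions(func)]

in_ops = opnames(indict)
call_ops = opnames(method_call)

print("in operator:", in_ops)
print("method call:", call_ops)
```

The operator version should show a single comparison/containment opcode and no call opcode, while the method version needs an attribute lookup followed by a call.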
In terms of performance, the timings reflect the opcode difference:
$ python -mtimeit -s'd = dict((i, i) for i in range(10000))' "'foo' in d"
10000000 loops, best of 3: 0.11 usec per loop
$ python -mtimeit -s'd = dict((i, i) for i in range(10000))' "d.has_key('foo')"
1000000 loops, best of 3: 0.205 usec per loop
The in operator is almost twice as fast.
My answer is "neither one".
I believe the most "Pythonic" way to do things is to NOT check beforehand if the key is in a dictionary and instead just write code that assumes it's there and catch any KeyErrors that get raised because it wasn't.
This is usually done by enclosing the code in a try...except clause. It is a well-known idiom, usually expressed as "It's easier to ask forgiveness than permission" or with the acronym EAFP, which basically means it is better to try something and catch the errors than to make sure everything is OK before doing anything. Why validate what doesn't need to be validated when you can handle exceptions gracefully instead of trying to avoid them? The result is often more readable, and the code tends to be faster when the key is rarely missing (or whatever other preconditions there may be are rarely violated).
Of course, this isn't appropriate in all situations and not everyone agrees with the philosophy, so you'll need to decide for yourself on a case-by-case basis. Not surprisingly the opposite of this is called LBYL for "Look Before You Leap".
As a trivial example consider:
if 'name' in dct:
    value = dct['name'] * 3
else:
    logerror('"%s" not found in dictionary, using default' % 'name')
    value = 42
vs
try:
    value = dct['name'] * 3
except KeyError:
    logerror('"%s" not found in dictionary, using default' % 'name')
    value = 42
Although in this case it's almost exactly the same amount of code, the second version doesn't spend time checking first and is probably slightly faster because of it (a try...except block isn't totally free though, so it probably doesn't make much difference here).
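If you would rather measure than guess, something like the following sketch works with the standard timeit module. The function names lbyl, eafp and time_it are made up here, and the dictionary is a toy one. Actual numbers depend on your interpreter and machine. Typically the try version does well when the key is present and noticeably worse when the exception is actually raised, but check on your own setup.

```python
import timeit

dct = {"name": 7}

def lbyl(key):
    # Look Before You Leap: test for the key first
    if key in dct:
        return dct[key] * 3
    return 42

def eafp(key):
    # Easier to Ask Forgiveness than Permission: just try it
    try:
        return dct[key] * 3
    except KeyError:
        return 42

def time_it(func, key, number=100_000):
    """Total seconds for `number` calls of func(key)."""
    return timeit.timeit(lambda: func(key), number=number)

# Time both styles for a key that exists and one that does not
timings = {
    (func.__name__, key): time_it(func, key)
    for func in (lbyl, eafp)
    for key in ("name", "missing")
}

for (fname, key), seconds in sorted(timings.items()):
    print(f"{fname:5} key={key!r:10} {seconds:.4f}s")
```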
Generally speaking, testing in advance can often be much more involved, and the savings gained from not doing it can be significant. That said, when you do need to check, if 'name' in dict: is the better choice, for the reasons stated in the other answers.
If you're interested in the topic, the message titled "EAFP vs LBYL (was Re: A little disappointed so far)" from the Python mailing list archive probably explains the difference between the two approaches better than I have here. There's also a good discussion of the two approaches in the book Python in a Nutshell, 2nd Ed. by Alex Martelli, in chapter 6 on Exceptions, under the title Error-Checking Strategies. (I see there's now a newer 3rd edition, published in 2017, which covers both Python 2.7 and 3.x.)