Does the 'in' clause used on python dictionaries call the keys() function every time?
Let's say I have a
dict = {...} #lots of words in dictionary
I have to do a
for word in ...: #lo开发者_JAVA百科ng list of words
if word in dict:
#do something
My question is, does the 'if word in dict' call the dict.keys() function every time and hence a lot slower than if I added another variable at the top that is dict_keys = dict.keys()? The result of what I am talking about would be this.
dict = {...}
dict_keys = dict.keys()
for word in ...:
if word in dict_keys:
#do something
Thanks
no. foo in mydict
is in fact a lot faster than foo in keys_list
, since dict
s are hash tables, so finding the element inside it is O(1)
. While foo in keys_list
would be O(n)
(slower as the number of keys grows bigger)
But you could always test yourself:
$ python -m timeit -s "x = range(1000)" "15 in x"
1000000 loops, best of 3: 0.311 usec per loop
$ python -m timeit -s "x = dict.fromkeys(xrange(1000))" "15 in x"
10000000 loops, best of 3: 0.0515 usec per loop
So checking directly on the dict
for 1000 elements is one order of magnitude faster, not considering the time of the .keys()
Actually, builtin dicts just call their __contains__
which directly invokes the C function dict_has_key
which is very fast. So, doing it the first way is better because it doesn't force you to evaluate the entire sequence of keys within your dict as calling keys()
does.
Short answer: no, more like __contains__
, which is O(1) (i.e., cheap).
It sounds like you're actually trying to iterate over the intersection of the set of words
in "# long list of words" and the set of keys in the dictionary. So use the built-in sets and its overloading of the &
operator for intersections:
for word in set(some_dict) & set(wordlist):
# do something
Just based on simple experimentation with code like:
_w= ['a', 'b', 'c', 'd', 'e']
d= {}
w= []
for k in xrange(123):
for word in _w:
wk= word+ str(k)
d[wk]= k
w.append(wk)
def m1():
for word in w:
if word in d:
pass
def m2(n= 1):
dk= d.keys()
for word in w:
if word in dk:
pass
and timings:
In []: len(w)
Out[]: 5
In []: %timeit m1()
1000000 loops, best of 3: 657 ns per loop
In []: %timeit m2()
1000000 loops, best of 3: 1.55 us per loop
In []: len(w)
Out[]: 615
In []: %timeit m1()
10000 loops, best of 3: 49.2 us per loop
In []: %timeit m2()
100 loops, best of 3: 5.62 ms per loop
The conclusion will be that m1() is a clear winner (as expected ;-).
OK, this is not really a proof (in a strict sense) of the superiority of m1()
. There still exist a slight chance that someone will figure out a set of keys where m1()
and m2()
execution times are near each other (but I can't figure out any case where m1()
would be actually much worse than m2()
). In practice m1()
approach allways wins. It's allways engouraged to try out first hard facts of the alternative approaches, then if something contradicts your expectations you'll be more prepared to find why it happens.
精彩评论