
Does the 'in' clause used on python dictionaries call the keys() function every time?

Let's say I have a

dict = {...} #lots of words in dictionary  

I have to do a

for word in ...: # long list of words
    if word in dict:
        #do something

My question is: does the 'if word in dict' check call dict.keys() every time, and is it therefore a lot slower than storing the keys once in a variable with dict_keys = dict.keys()? What I am describing would look like this:

dict = {...}
dict_keys = dict.keys()

for word in ...:
    if word in dict_keys:
        #do something

Thanks


No. foo in mydict is in fact a lot faster than foo in keys_list: dicts are hash tables, so finding an element in one is O(1), while foo in keys_list is O(n) (it gets slower as the number of keys grows).

But you could always test yourself:

$ python -m timeit -s "x = range(1000)" "15 in x"
1000000 loops, best of 3: 0.311 usec per loop
$ python -m timeit -s "x = dict.fromkeys(xrange(1000))" "15 in x"
10000000 loops, best of 3: 0.0515 usec per loop

So checking directly on the dict with 1000 elements is an order of magnitude faster, and that's not even counting the time it takes to build the list with .keys().
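One aside, for readers on Python 3: there, dict.keys() returns a lightweight view rather than a list, and membership tests on the view are also O(1) because it is backed by the dict's hash table. The slow case is membership on an actual list of keys:

```python
# Python 3: dict.keys() is a view backed by the dict's hash table,
# so `in` on the view is O(1); `in` on a real list of keys is O(n).
d = dict.fromkeys(range(1000))

keys_view = d.keys()     # dict_keys view: O(1) membership
keys_list = list(d)      # real list: O(n) linear scan

print(500 in keys_view)  # True, via one hash lookup
print(500 in keys_list)  # True, via scanning the list
```

So on Python 3 even the dict_keys variable would not be a linear scan; the list-building cost described in this answer applies to Python 2's keys().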


Actually, built-in dicts just call their __contains__ method, which directly invokes the very fast C function dict_has_key. So the first way is better because it never forces you to build the entire sequence of keys, as calling keys() does.
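A minimal sketch of that dispatch: the in operator on a dict is equivalent to calling its __contains__ method, which hashes the key once and probes the table (the dict values here are made up for illustration):

```python
# `word in d` dispatches to dict.__contains__, which hashes the key
# and probes the hash table -- it never materializes a list of keys.
d = {"apple": 1, "banana": 2}

print("cherry" in d)            # False: one hash lookup
print(d.__contains__("apple"))  # True: the method `in` calls under the hood
```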


Short answer: no. It calls __contains__, which is O(1) (i.e., cheap).


It sounds like you're actually trying to iterate over the intersection of the words in "# long list of words" and the keys of the dictionary. So use the built-in set type and its overloaded & operator for intersection:

for word in set(some_dict) & set(wordlist):
    # do something


Just based on simple experimentation with code like:

_w = ['a', 'b', 'c', 'd', 'e']
d = {}
w = []
for k in xrange(123):
    for word in _w:
        wk = word + str(k)
        d[wk] = k
        w.append(wk)

def m1():
    for word in w:
        if word in d:
            pass

def m2():
    dk = d.keys()
    for word in w:
        if word in dk:
            pass

and timings:

In []: len(w)
Out[]: 5
In []: %timeit m1()
1000000 loops, best of 3: 657 ns per loop
In []: %timeit m2()
1000000 loops, best of 3: 1.55 us per loop

In []: len(w)
Out[]: 615
In []: %timeit m1()
10000 loops, best of 3: 49.2 us per loop
In []: %timeit m2()
100 loops, best of 3: 5.62 ms per loop

The conclusion is that m1() is a clear winner (as expected ;-).

OK, this is not really a proof (in the strict sense) of the superiority of m1(). There is still a slight chance that someone could construct a set of keys for which m1() and m2() run in similar time (though I can't think of any case where m1() would actually be much worse than m2()). In practice the m1() approach always wins. It's always worth gathering hard facts about the alternative approaches first; then, if something contradicts your expectations, you'll be better prepared to figure out why.
