
Does the 'in' clause used on python dictionaries call the keys() function every time?

Let's say I have a

dict = {...} #lots of words in dictionary  

I have to do a

for word in ...: # long list of words
    if word in dict:
        #do something

My question is: does the 'if word in dict' check call dict.keys() every time, and is it therefore a lot slower than storing the keys once in a variable with dict_keys = dict.keys()? What I am describing would look like this:

dict = {...}
dict_keys = dict.keys()

for word in ...:
    if word in dict_keys:
        #do something

Thanks


No. foo in mydict is in fact a lot faster than foo in keys_list: dicts are hash tables, so finding an element in one is O(1), while foo in keys_list is O(n) (it gets slower as the number of keys grows).

But you could always test yourself:

$ python -m timeit -s "x = range(1000)" "15 in x"
1000000 loops, best of 3: 0.311 usec per loop
$ python -m timeit -s "x = dict.fromkeys(xrange(1000))" "15 in x"
10000000 loops, best of 3: 0.0515 usec per loop

So checking directly on the dict with 1000 elements is an order of magnitude faster, and that's not even counting the time it takes to build the list with .keys().
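One aside, for readers on Python 3: there, dict.keys() returns a lightweight view rather than a list, and membership tests on the view are also O(1) because it is backed by the dict's hash table. The slow case is membership on an actual list of keys:

```python
# Python 3: dict.keys() is a view backed by the dict's hash table,
# so `in` on the view is O(1); `in` on a real list of keys is O(n).
d = dict.fromkeys(range(1000))

keys_view = d.keys()     # dict_keys view: O(1) membership
keys_list = list(d)      # real list: O(n) linear scan

print(500 in keys_view)  # True, via one hash lookup
print(500 in keys_list)  # True, via scanning the list
```

So on Python 3 even the dict_keys variable would not be a linear scan; the list-building cost described in this answer applies to Python 2's keys().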


Actually, built-in dicts just call their __contains__ method, which directly invokes the very fast C function dict_has_key. So the first way is better because it never forces you to build the entire sequence of keys, as calling keys() does.
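A minimal sketch of that dispatch: the in operator on a dict is equivalent to calling its __contains__ method, which hashes the key once and probes the table (the dict values here are made up for illustration):

```python
# `word in d` dispatches to dict.__contains__, which hashes the key
# and probes the hash table -- it never materializes a list of keys.
d = {"apple": 1, "banana": 2}

print("cherry" in d)            # False: one hash lookup
print(d.__contains__("apple"))  # True: the method `in` calls under the hood
```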


Short answer: no. It calls __contains__, which is O(1) (i.e., cheap).


It sounds like you're actually trying to iterate over the intersection of the words in "# long list of words" and the keys of the dictionary. So use the built-in set type and its overloaded & operator for intersection:

for word in set(some_dict) & set(wordlist):
    # do something


Just based on simple experimentation with code like:

_w = ['a', 'b', 'c', 'd', 'e']
d = {}
w = []
for k in xrange(123):
    for word in _w:
        wk = word + str(k)
        d[wk] = k
        w.append(wk)

def m1():
    for word in w:
        if word in d:
            pass

def m2():
    dk = d.keys()
    for word in w:
        if word in dk:
            pass

and timings:

In []: len(w)
Out[]: 5
In []: %timeit m1()
1000000 loops, best of 3: 657 ns per loop
In []: %timeit m2()
1000000 loops, best of 3: 1.55 us per loop

In []: len(w)
Out[]: 615
In []: %timeit m1()
10000 loops, best of 3: 49.2 us per loop
In []: %timeit m2()
100 loops, best of 3: 5.62 ms per loop

The conclusion is that m1() is a clear winner (as expected ;-).

OK, this is not really a proof (in the strict sense) of the superiority of m1(). There is still a slight chance that someone could construct a set of keys for which m1() and m2() run in similar time (though I can't think of any case where m1() would actually be much worse than m2()). In practice the m1() approach always wins. It's always worth gathering hard facts about the alternative approaches first; then, if something contradicts your expectations, you'll be better prepared to figure out why.
