How does Python handle checking 'if object in list'

2023-03-21 12:45 问答作者：

I'm wondering because I need to have have a function that is disgustingly fast at checking if a word is in a dictionary list - I'm considering leaving the dictionary as a large st开发者_开发百科ring and running regex against instead. This needs to be absurdly fast. So I just need a basic overview of how python handles checking if a string is in a list of strings and if its beyond-reasonable fast.

If you want a blazingly fast membership test, then a list is the wrong data structure. Take a look at the implementation of list_contains in listobject.c, line 437. It iterates over the list in order, comparing the item with each element in turn. The later the item appears in the list, the longer it will take to find it, and if the item is missing, then the whole list must be scanned.

Use a set instead. Sets are implemented internally by a hash table, so looking up an object involves computing its hash and then scanning a few table entries (usually just one). For the particular case of looking up a string, see set_lookkey_string in setobject.c, line 156.

A set of strings will have O(1) lookup time: effectively constant regardless of the size of the set. Making a set from your list of strings is easy:

my_set = set(my_list)
if my_word in my_set:
    print "it's there!"

If you need real fast checking, use a set:

words = set(words_list)
if "hello" in words:
    print("hello found!"")

A set is faster because it uses a hash-algorithm, instead of a direct search approach.

According to this site, x in s is O(n). Therefore, it checks each entry (in the worst case).

At any rate, do not use a regex. Using sets or lists is a much more intuitive way to represent the data and regexes will not perform better than O(n).

If you're using a regular list, consider a set instead.

If you want to implement your own fine-tuned membership test for your container object, override __contains__.

You probably want to use a Set if you're worried about time. A Set is much like a list, but it checks for membership based on hashing.

Use a set. If you need case-insensitive checking, just store the words into the set downcased. Then when checking if a certain word is in the set, downcase the word before checking membership.

The general rule is: normalize entries when building the set, and normalize an item before checking against the set. Another example of normalization is collapsing consecutive whitespace chars into a single space and stripping leading/trailing whitespace.

Running a regex against your word list is a very bad idea; it scales very badly. Using dict(), set() or frozenset() will scale a lot better:

s = set(['one','two','three'])
'two' in s     ## true

b='four'
b in s         ## false

s.add('four')
b in s         ## true

继续阅读：algorithm list performance python

How does Python handle checking 'if object in list'

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？