Search list: match only exact word/string

How can I match only an exact string or word when searching a list? I have tried, but the result is not correct. Below are a sample list, my code, and the test results.

list = ['Hi, hello', 'hi mr 12345', 'welcome sir']

my code:

for str in list:
  if s in str:
    print str

test results:

s = "hello" ~ expected output: 'Hi, hello' ~ output I get: 'Hi, hello'
s = "123" ~ expected output: *nothing* ~ output I get: 'hi mr 12345'
s = "12345" ~ expected output: 'hi mr 12345' ~ output I get: 'hi mr 12345'
s = "come" ~ expected output: *nothing* ~ output I get: 'welcome sir'
s = "welcome" ~ expected output: 'welcome sir' ~ output I get: 'welcome sir'
s = "welcome sir" ~ expected output: 'welcome sir' ~ output I get: 'welcome sir'

My list contains more than 200K strings


It looks like you need to perform this search more than once, so I would recommend converting your list into a dictionary that maps each word to the strings containing it:

>>> l = ['Hi, hello', 'hi mr 12345', 'welcome sir']
>>> d = dict()
>>> for item in l:
...     for word in item.split():
...             d.setdefault(word, list()).append(item)
...

So now you can easily do:

>>> d.get('hi')
['hi mr 12345']
>>> d.get('come')    # nothing
>>> d.get('welcome')
['welcome sir']

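P.S. You will probably have to improve item.split() to handle commas, periods and other separators, for example by using a regex with \w. As written, 'Hi,' is stored with its trailing comma, so d.get('Hi') finds nothing.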

P.P.S. As cularion mentioned, this won't match "welcome sir". If you also want to match the whole string, that takes just one additional line in the proposed solution (store each item under its own full text as well). But if you have to match an arbitrary part of a string bounded by spaces and punctuation, a regex should be your choice.
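
The tokenization suggested in the P.S. could be sketched as follows. This is only an illustration written for Python 3 (unlike the Python 2 session above); the helper name build_index is hypothetical, and treating a "word" as a run of \w characters is an assumption that may not suit every data set.

```python
import re
from collections import defaultdict

def build_index(strings):
    """Map each word (a run of letters, digits or underscores) to the strings containing it."""
    index = defaultdict(list)
    for item in strings:
        # set() so that a string with a repeated word is listed only once
        for word in set(re.findall(r'\w+', item)):
            index[word].append(item)
    return index

strings = ['Hi, hello', 'hi mr 12345', 'welcome sir']
index = build_index(strings)

print(index.get('Hi'))       # ['Hi, hello']  (the comma no longer sticks to the word)
print(index.get('123'))      # None
print(index.get('12345'))    # ['hi mr 12345']
```

Building the index costs one pass over the list; after that each lookup is a dictionary access, which matters when the list holds 200K strings and is searched repeatedly. The lookup is still case-sensitive; lowercasing both the indexed words and the query would be one way to change that.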


>>> l = ['Hi, hello', 'hi mr 12345', 'welcome sir']
>>> search = lambda word: filter(lambda x: word in x.split(),l)
>>> search('123')
[]
>>> search('12345')
['hi mr 12345']
>>> search('hello')
['Hi, hello']


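If you want to match whole words only, compare sets of words. Note that this matches any string that shares at least one whole word with s:

for str in list:
  if set(s.split()) & set(str.split()):
    print str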


Provided s only ever consists of just a few words, you could do

s = s.split()
n = len(s)
for x in my_list:
    words = x.split()
    if s in (words[i:i+n] for i in range(len(words) - n + 1)):
        print x

If s consists of many words, there are more efficient, but also much more complex, algorithms for this.


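Use a regular expression with the word boundary \b to match an exact word:

 import re
 .....
 for str in list:
     # both '\b' pieces must be raw strings; a plain '\b' is a backspace character
     if re.search(r'\b' + re.escape(wordToLook) + r'\b', str):
         print str

\b matches at a word boundary, i.e. the position between a word character (letter, digit or underscore) and a non-word character or the start or end of the string. The pattern therefore matches wordToLook only as a whole word, not as part of a longer one. re.escape keeps any special characters in wordToLook from being interpreted as regex syntax.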

Or do something like this to compile one pattern for several words, so you do not have to repeat the search for each word:

import re
list = ['Hi, hello', 'hi mr 12345', 'welcome sir']
listOfWords = ['hello', 'Mr', '123']
reg = re.compile(r'(?i)\b(?:%s)\b' % '|'.join(map(re.escape, listOfWords)))
for str in list:
   if reg.search(str):
      print str

(?i) makes the search case-insensitive; remove it if you want a case-sensitive search.
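
As a check against the test cases from the question, here is a minimal Python 3 sketch of the word-boundary approach. The function name find_whole and its ignore_case parameter are made up for this example. It assumes the search text starts and ends with a word character; otherwise \b will not behave as expected at that end.

```python
import re

def find_whole(needle, strings, ignore_case=False):
    """Return the strings that contain needle as a whole word or whole phrase."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape keeps characters such as '.' or '+' in needle literal
    pattern = re.compile(r'\b' + re.escape(needle) + r'\b', flags)
    return [s for s in strings if pattern.search(s)]

strings = ['Hi, hello', 'hi mr 12345', 'welcome sir']
for needle in ['hello', '123', '12345', 'come', 'welcome', 'welcome sir']:
    print(needle, '->', find_whole(needle, strings))
# hello -> ['Hi, hello']
# 123 -> []
# 12345 -> ['hi mr 12345']
# come -> []
# welcome -> ['welcome sir']
# welcome sir -> ['welcome sir']
```

This scans every string on each call, so for many repeated searches over 200K strings a prebuilt word index is likely to be faster; the regex version has the advantage of handling multi-word phrases such as "welcome sir" directly.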
