Search list: match only exact word/string
How can I match an exact string or word when searching a list? I have tried, but it's not working correctly. Below are a sample list, my code, and the test results.
```python
list = ['Hi, hello', 'hi mr 12345', 'welcome sir']
```
My code:
```python
for str in list:
    if s in str:
        print(str)
```
Test results:
s = "hello" ~ expected output: 'Hi, hello' ~ output I get: 'Hi, hello'
s = "123" ~ expected output: *nothing* ~ output I get: 'hi mr 12345'
s = "12345" ~ expected output: 'hi mr 12345' ~ output I get: 'hi mr 12345'
s = "come" ~ expected output: *nothing* ~ output I get: 'welcome sir'
s = "welcome" ~ expected output: 'welcome sir' ~ output I get: 'welcome sir'
s = "welcome sir" ~ expected output: 'welcome sir' ~ output I get: 'welcome sir'
My list contains more than 200K strings.
It looks like you need to perform this search more than once, so I would recommend converting your list into a dictionary:
```python
>>> l = ['Hi, hello', 'hi mr 12345', 'welcome sir']
>>> d = dict()
>>> for item in l:
...     for word in item.split():
...         d.setdefault(word, list()).append(item)
...
```
So now you can easily do:
```python
>>> d.get('hi')
['hi mr 12345']
>>> d.get('come')  # nothing
>>> d.get('welcome')
['welcome sir']
```
P.S. You will probably need to improve item.split() to handle commas, periods, and other separators; a regex using \w may help.
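A minimal sketch of that regex tokenization, assuming a "word" is simply a run of `\w` characters (letters, digits, underscore). The helper name `build_index` is hypothetical. With plain `split()`, 'Hi, hello' is indexed under 'Hi,' (comma included); with `re.findall` it is indexed under 'Hi'.

```python
import re
from collections import defaultdict


def build_index(strings):
    """Map each word token to the list of strings that contain it."""
    index = defaultdict(list)
    for item in strings:
        # set() so a string is listed only once even if a word repeats in it
        for word in set(re.findall(r'\w+', item)):
            index[word].append(item)
    return index


l = ['Hi, hello', 'hi mr 12345', 'welcome sir']
d = build_index(l)
print(d.get('Hi'))    # ['Hi, hello']
print(d.get('come'))  # None
```

Note that `.get()` on a `defaultdict` does not create missing keys, so lookups for absent words still return `None` as in the dictionary version above.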
P.P.S. As cularion mentioned, this won't match "welcome sir". If you want to match the whole string, it takes just one additional line in the proposed solution. But if you need to match any part of a string bounded by spaces and punctuation, a regex should be your choice.
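One way that extra step might look (my reading of the suggestion, not necessarily the answerer's exact code): also index each multi-word string under its full text, so a whole-string lookup hits the same dictionary. Single-word strings are skipped because the inner loop has already indexed them under that same key.

```python
l = ['Hi, hello', 'hi mr 12345', 'welcome sir']
d = dict()
for item in l:
    words = item.split()
    for word in words:
        d.setdefault(word, list()).append(item)
    # extra step: index the whole string too, unless it is a single word
    # (the loop above already added it under that key)
    if len(words) > 1:
        d.setdefault(item, list()).append(item)

print(d.get('welcome sir'))  # ['welcome sir']
print(d.get('welcome'))      # ['welcome sir']
```

This only matches whole strings or single words; an arbitrary run of adjacent words (for example "mr 12345") is still not found, which is where the regex approach fits better.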
```python
>>> l = ['Hi, hello', 'hi mr 12345', 'welcome sir']
>>> search = lambda word: [x for x in l if word in x.split()]
>>> search('123')
[]
>>> search('12345')
['hi mr 12345']
>>> search('hello')
['Hi, hello']
```
If you are searching for exact word matches:
```python
for str in list:
    # true if s and str share at least one whitespace-separated word
    if set(s.split()) & set(str.split()):
        print(str)
```
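The intersection test is true when the two strings share *any* word, so searching for "welcome madam" would also print 'welcome sir'. If every word of `s` must be present (in any order), a subset test is one possible variant; this is my adjustment, not the answer's code, and the helper name `contains_all_words` is hypothetical.

```python
def contains_all_words(s, text):
    # True only if every whitespace-separated word of s is also a word of text;
    # word order and adjacency are ignored
    return set(s.split()) <= set(text.split())


lst = ['Hi, hello', 'hi mr 12345', 'welcome sir']
for s in ['welcome sir', 'sir welcome', 'welcome madam', '123']:
    print(repr(s), [x for x in lst if contains_all_words(s, x)])
```

Here 'welcome sir' and 'sir welcome' both find 'welcome sir', while 'welcome madam' and '123' find nothing. If word order matters, the sliding-window approach that follows is the better fit.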
Provided s only ever consists of a few words, you could do:
```python
s = s.split()
n = len(s)
for x in my_list:
    words = x.split()
    # compare s against every run of n consecutive words in x
    if s in (words[i:i+n] for i in range(len(words) - n + 1)):
        print(x)
```
If s consists of many words, there are more efficient, but also much more complex, algorithms for this.
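As one example of such an algorithm (my choice; the answer does not name one), Knuth-Morris-Pratt can be applied to lists of words instead of characters. It precomputes a failure table for the search words so the scan never steps backwards through the text, giving O(len(words) + len(pattern)) per string rather than O(len(words) * len(pattern)) for the slice comparison. The function name `kmp_find` is hypothetical.

```python
def kmp_find(pattern, words):
    """Return True if the list `pattern` occurs as a contiguous run in `words`."""
    if not pattern:
        return True
    # failure[i] = length of the longest proper prefix of pattern[:i+1]
    # that is also a suffix of it
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    # single left-to-right pass over the text words
    k = 0
    for w in words:
        while k > 0 and w != pattern[k]:
            k = failure[k - 1]
        if w == pattern[k]:
            k += 1
            if k == len(pattern):
                return True
    return False


my_list = ['Hi, hello', 'hi mr 12345', 'welcome sir']
s = 'welcome sir'.split()
for x in my_list:
    if kmp_find(s, x.split()):
        print(x)  # prints: welcome sir
```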
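Use a regular expression with the word-boundary anchor \b to match an exact word:
```python
import re

# .....

for str in list:
    # raw strings for both \b anchors (a plain '\b' is a backspace character);
    # re.escape so special characters in wordToLook are matched literally
    if re.search(r'\b' + re.escape(wordToLook) + r'\b', str):
        print(str)
```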
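\b matches at a word boundary: the zero-width position between a word character (letter, digit, or underscore) and a non-word character such as a space, punctuation, or line break, or at the start or end of the string.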
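A self-contained sketch of the word-boundary search, checked against the test cases from the question. The helper name `search_word` is hypothetical; compiling the pattern once per search word avoids re-parsing it for each of the 200K strings.

```python
import re


def search_word(word, strings):
    # re.escape so characters like '.' or '+' in `word` are matched literally;
    # raw strings so '\b' is a word boundary, not a backspace character
    pattern = re.compile(r'\b' + re.escape(word) + r'\b')
    return [s for s in strings if pattern.search(s)]


strings = ['Hi, hello', 'hi mr 12345', 'welcome sir']
for s in ['hello', '123', '12345', 'come', 'welcome', 'welcome sir']:
    print(repr(s), search_word(s, strings))
```

This reproduces the "expected output" column from the question: "123" no longer matches inside "12345" and "come" no longer matches inside "welcome", because in both cases the neighbouring character is a word character, so there is no boundary. If a search term itself begins or ends with punctuation, \b behaves differently at that end, so such terms may need another anchor.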
Or, to avoid writing a separate search for each word, combine all the words into a single compiled pattern:
```python
import re

list = ['Hi, hello', 'hi mr 12345', 'welcome sir']
listOfWords = ['hello', 'Mr', '123']
# one alternation of all words, each escaped, wrapped in \b anchors
reg = re.compile(r'(?i)\b(?:%s)\b' % '|'.join(map(re.escape, listOfWords)))
for str in list:
    if reg.search(str):
        print(str)
```
(?i) makes the search case-insensitive, so 'Mr' also matches 'mr'; remove it if you want a case-sensitive search.