Find substring in string but only if whole words?

2023-01-24 19:50 Asked by:

What is an elegant way to look for a string within another string in Python, but only if the substring is within whole words, not part of a word?

Perhaps an example will demonstrate what I mean:

string1 = "ADDLESHAW GODDARD"
string2 = "ADDLESHAW GODDARD LLP"
assert string_found(string1, string2)  # this is True
string1 = "ADVANCE"
string2 = "ADVANCED BUSINESS EQUIPMENT LTD"
assert not string_found(string1, string2)  # this should be False

How can I best write a function called string_found that will do what I need? I thought perhaps I could fudge it with something like this:

def string_found(string1, string2):
    if string2.find(string1 + " "):
        return True
    return False

But that doesn't feel very elegant, and also wouldn't match string1 if it was at the end of string2. Maybe I need a regex? (argh regex fear)

You can use regular expressions and the word boundary special character \b (emphasis mine):

Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. Note that \b is defined as the boundary between \w and \W, so the precise set of characters deemed to be alphanumeric depends on the values of the UNICODE and LOCALE flags. Inside a character range, \b represents the backspace character, for compatibility with Python’s string literals.

import re

def string_found(string1, string2):
    # re.escape keeps any regex metacharacters in string1 literal
    if re.search(r"\b" + re.escape(string1) + r"\b", string2):
        return True
    return False
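One caveat worth illustrating: \b only exists where a word character meets a non-word character, so a search term that starts or ends with punctuation (say "C++") will not match with the pattern above. A minimal sketch of a lookaround-based variant follows; the name string_found_lookaround is my own, hypothetical, and not part of the answer above. It requires only that no word character touches either side of the match.

```python
import re

def string_found_lookaround(string1, string2):
    """Hypothetical variant: True if string1 occurs in string2 with no
    word character (\\w) directly before or after it."""
    pattern = r"(?<!\w)" + re.escape(string1) + r"(?!\w)"
    return re.search(pattern, string2) is not None

# The two cases from the question
print(string_found_lookaround("ADDLESHAW GODDARD", "ADDLESHAW GODDARD LLP"))
print(string_found_lookaround("ADVANCE", "ADVANCED BUSINESS EQUIPMENT LTD"))
# A term ending in punctuation, where plain \b on both sides finds nothing
print(string_found_lookaround("C++", "I use C++ daily"))
```

For purely alphanumeric search terms the two patterns should behave the same, so the plain \b version is fine for the company-name examples in the question.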


If only whitespace counts as a word boundary for you, you could also get away with prepending and appending a space to each string:

def string_found(string1, string2):
    string1 = " " + string1.strip() + " "
    string2 = " " + string2.strip() + " "
    # find() returns -1 (which is truthy) on failure, so compare explicitly
    return string2.find(string1) != -1
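The padding trick assumes words are separated by single spaces. If the input may contain tabs, newlines or repeated spaces, one possible extension is to normalize whitespace first. This is only a sketch, and string_found_padded is a made-up name for illustration:

```python
def string_found_padded(string1, string2):
    # Hypothetical helper: collapse every run of whitespace (tabs, newlines,
    # repeated spaces) to a single space, then pad both strings with a space.
    needle = " " + " ".join(string1.split()) + " "
    haystack = " " + " ".join(string2.split()) + " "
    return needle in haystack

print(string_found_padded("ADDLESHAW GODDARD", "ADDLESHAW\tGODDARD  LLP"))
print(string_found_padded("ADVANCE", "ADVANCED BUSINESS EQUIPMENT LTD"))
```

Using the in operator sidesteps the -1 return value of find() altogether.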

The simplest and most pythonic way, I believe, is to break the strings down into individual words and scan for a match:

string = "My Name Is Josh"
substring = "Name"

for word in string.split():
    if substring == word:
        print("Match Found")

For a bonus, here's a one-liner:

any(substring == word for word in string.split())
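Comparing against individual words only works when the substring is itself a single word, which is not the case for "ADDLESHAW GODDARD" in the question. A rough sketch of how the same split-based idea might be extended to multi-word substrings, by sliding a window over the word list (words_found is a name I made up for this example):

```python
def words_found(substring, string):
    # Hypothetical helper: split both sides into words, then look for the
    # needle's word list as a contiguous slice of the haystack's word list.
    needle = substring.split()
    words = string.split()
    n = len(needle)
    if n == 0:
        return False
    return any(words[i:i + n] == needle for i in range(len(words) - n + 1))

print(words_found("Name Is", "My Name Is Josh"))
print(words_found("ADDLESHAW GODDARD", "ADDLESHAW GODDARD LLP"))
print(words_found("ADVANCE", "ADVANCED BUSINESS EQUIPMENT LTD"))
```

Like the original loop, this treats any run of whitespace as a separator and is case-sensitive.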

Here's a way to do it without a regex (as requested) assuming that you want any whitespace to serve as a word separator.

import string

def find_substring(needle, haystack):
    index = haystack.find(needle)
    if index == -1:
        return False
    if index != 0 and haystack[index-1] not in string.whitespace:
        return False
    L = index + len(needle)
    if L < len(haystack) and haystack[L] not in string.whitespace:
        return False
    return True

And here's some demo code (codepad is a great idea: Thanks to Felix Kling for reminding me)

I'm building off aaronasterling's answer.

The problem with the above code is that it will return False when there are multiple occurrences of needle in haystack, with the second occurrence satisfying the search criteria but not the first.
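To make that concrete, here is a small check. The function is a copy of the earlier answer's logic under a hypothetical name (find_first_only), and the test strings are my own:

```python
import string

def find_first_only(needle, haystack):
    # Same logic as the earlier answer: only the first occurrence is examined.
    index = haystack.find(needle)
    if index == -1:
        return False
    if index != 0 and haystack[index - 1] not in string.whitespace:
        return False
    end = index + len(needle)
    if end < len(haystack) and haystack[end] not in string.whitespace:
        return False
    return True

# "cat" first appears inside "concatenate", so the later whole-word "cat"
# is never looked at and the function reports no match.
print(find_first_only("cat", "concatenate the cat"))
```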

Here's my version:

import string

def find_substring(needle, haystack):
    search_start = 0
    while search_start < len(haystack):
        index = haystack.find(needle, search_start)
        if index == -1:
            return False
        end = index + len(needle)
        is_prefix_whitespace = index == 0 or haystack[index - 1] in string.whitespace
        is_suffix_whitespace = end == len(haystack) or haystack[end] in string.whitespace
        if is_prefix_whitespace and is_suffix_whitespace:
            return True
        # Advance by one character so overlapping candidates are not skipped
        # (this also guarantees progress if needle is empty)
        search_start = index + 1
    return False

One approach using the re, or regex, module that should accomplish this task is:

import re

string1 = "pizza pony"
string2 = "who knows what a pizza pony is?"

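# re.escape keeps regex metacharacters in string1 literal; a trailing \b
# (instead of \W) also matches when string1 sits at the very end of string2
search_result = re.search(r'\b' + re.escape(string1) + r'\b', string2)

# re.search returns None when nothing matches, so check before calling group()
if search_result:
    print(search_result.group())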

Excuse me REGEX fellows, but the simpler answer is:

text = "this is the esquisidiest piece never ever writen"
word = "is"
" {0} ".format(text).lower().count(" {0} ".format(word).lower())

The trick here is to pad both the 'text' and the 'word' with a space on each side, which guarantees that only whole-word occurrences are counted and avoids trouble at the beginning and end of the 'text' being searched.
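Wrapped in a function, the idea might look like the sketch below (count_whole_word is a hypothetical name). One thing to be aware of: str.count counts non-overlapping matches, and two adjacent occurrences share the space between them, so directly repeated words can be undercounted. As a found / not found test (count > 0) it should still be reliable.

```python
def count_whole_word(word, text):
    # Hypothetical wrapper around the padding trick; case-insensitive.
    padded_text = " {0} ".format(text).lower()
    padded_word = " {0} ".format(word).lower()
    return padded_text.count(padded_word)

text = "this is the esquisidiest piece never ever writen"
print(count_whole_word("is", text))          # the "is" inside "this" is ignored
print(count_whole_word("is", "is is is"))    # undercounts: neighbours share a space
```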

Thanks to @Chris Larson's comment, I tested it and updated it as below:

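import re

string1 = "massage"
string2 = "muscle massage gun"
# re.search returns None when there is no match, so test the result directly;
# the trailing \b (instead of \W) also matches at the end of string2
if re.search(r'\b' + re.escape(string1) + r'\b', string2):
    print("Found word")
else:
    print("Not found")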

def string_found(string1, string2):
    # Only the first occurrence of string1 in string2 is checked
    if string1 not in string2:
        return False
    start = string2.index(string1)
    end = start + len(string1)
    # A space (or the edge of string2) must sit on each side of the match
    starts_ok = start == 0 or string2[start - 1] == " "
    ends_ok = end == len(string2) or string2[end] == " "
    return starts_ok and ends_ok
