Python - extracting a list of sub strings
How can I extract a list of substrings based on some pattern in Python?
For example:
str = 'this {{is}} a sample {{text}}'
Expected result: a Python list containing 'is' and 'text'.
>>> import re
>>> re.findall("{{(.*?)}}", "this {{is}} a sample {{text}}")
['is', 'text']
Assuming "some patterns" means "single words between double {}'s":
import re
re.findall(r'{{(\w*)}}', string)
Edit: Andrew Clark's answer implements "any sequence of characters at all between double {}'s"
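To make the difference between the patterns in this thread concrete, here is a small comparison. The sample strings are made up for illustration and are not from the question; treat this as a sketch rather than a recommendation of one pattern over another.

```python
import re

# Hypothetical sample input: one tag contains a space.
text = "a {{two words}} b {{x}} c {{one}}"

# \w* only accepts word characters, so '{{two words}}' is skipped.
words_only = re.findall(r"{{(\w*)}}", text)

# .*? (non-greedy) accepts any characters up to the first '}}'.
anything = re.findall(r"{{(.*?)}}", text)

# [^{}]* accepts anything except braces.
no_braces = re.findall(r"{{([^{}]*)}}", text)

print(words_only)  # ['x', 'one']
print(anything)    # ['two words', 'x', 'one']
print(no_braces)   # ['two words', 'x', 'one']

# The last two differ when an opening '{{' is left unclosed.
nested = "{{a {{b}}"
lazy_nested = re.findall(r"{{(.*?)}}", nested)
strict_nested = re.findall(r"{{([^{}]*)}}", nested)

print(lazy_nested)    # ['a {{b']
print(strict_nested)  # ['b']
```

Which behaviour is right depends on what "some patterns" means for your input.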
You can use the following:
import re
a = 'this {{is}} a sample {{text}}'
res = re.findall(r"{{([^{}]*)}}", a)
print("a python list which contains %s and %s" % (res[0], res[1]))
Cheers
A regex-based solution is fine for your example, although I would recommend something more robust for more complicated input.
import re
def match_substrings(s):
    return re.findall(r"{{([^}]*)}}", s)
The regex from inside-out:
[^}] matches anything that's not a '}'
([^}]*) matches any number of non-} characters and groups them
{{([^}]*)}} puts the above inside double-braces
Without the parentheses above, re.findall would return the entire match (i.e. ['{{is}}', '{{text}}']). However, when the regex contains a group, findall returns the contents of that group instead.
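A quick way to see the effect of the group is to run the same pattern with and without the parentheses on the string from the question. The variable names here are only for illustration.

```python
import re

s = "this {{is}} a sample {{text}}"

# One group: findall returns the group's contents.
with_group = re.findall(r"{{([^}]*)}}", s)

# No group: findall returns each whole match, braces included.
without_group = re.findall(r"{{[^}]*}}", s)

# More than one group: findall returns a tuple per match.
several_groups = re.findall(r"({{)([^}]*)(}})", s)

print(with_group)      # ['is', 'text']
print(without_group)   # ['{{is}}', '{{text}}']
print(several_groups)  # [('{{', 'is', '}}'), ('{{', 'text', '}}')]
```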
You could use a regular expression to match anything that occurs between {{ and }}. Will that work for you?
Generally speaking, for tagging certain strings in a large body of text, a suffix tree will be useful.