How to use python regex to match words beginning with hash and question mark?
This should be easy, and this regex works fine to search for words beginning with specific characters, but I can't get it to match hashes and question marks.
This works and matches words beginning with a:
r = re.compile(r"\b([a])(\w+)\b")
But these don't match. I tried:
r = re.compile(r"\b([#?])(\w+)\b")
r = re.compile(r"\b([\#\?])(\w+)\b")
r = re.compile( r"([#\?][\w]+)?")
I even tried just matching hashes:
r = re.compile( r"([#][\w]+)?")
r = re.compile( r"([/#][\w]+)?")
text = "this is one #tag and this is ?another tag"
items = r.findall(text)
I'm expecting to get:
[('#', 'tag'), ('?', 'another')]
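\b matches the empty position between a \w and a \W (or between a \W and a \w), but there is no \b before a # or ? that follows a space or the start of the string, because both sides of that position are non-word characters.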
In other words: remove the first word boundary.
Not:
r = re.compile(r"\b([#?])(\w+)\b")
but
r = re.compile(r"([#?])(\w+)\b")
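To make the difference concrete, here is a small sketch that runs both patterns against the text from the question. The variable names with_boundary and without_boundary are only illustrative.

```python
import re

text = "this is one #tag and this is ?another tag"

# Leading \b: a space followed by '#' or '?' is \W next to \W, so no boundary exists there.
with_boundary = re.compile(r"\b([#?])(\w+)\b")
# No leading \b: the marker character can match right after whitespace.
without_boundary = re.compile(r"([#?])(\w+)\b")

print(with_boundary.findall(text))     # []
print(without_boundary.findall(text))  # [('#', 'tag'), ('?', 'another')]

# The leading \b only succeeds when the marker directly follows a word character.
print(with_boundary.findall("this?one"))  # [('?', 'one')]
```

Note that the pattern without the leading \b would also match the ?one in this?one, which may or may not be what you want.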
Since you are using Python, a regex should be the last thing to come to mind; plain string methods are enough here:
>>> text = "this is one #tag and this is ?another tag"
>>> for word in text.split():
...     if word.startswith("#") or word.startswith("?"):
...         print(word)
...
#tag
?another
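If you want the same (marker, word) pairs the question asks for, the split approach can be wrapped in a small helper. This is only a sketch: tagged_words and its markers parameter are hypothetical names, not part of any library.

```python
def tagged_words(text, markers=("#", "?")):
    """Return (marker, rest) pairs for whitespace-separated words starting with a marker."""
    pairs = []
    for word in text.split():
        # str.startswith accepts a tuple of prefixes; skip a bare marker with nothing after it.
        if word.startswith(markers) and len(word) > 1:
            pairs.append((word[0], word[1:]))
    return pairs


text = "this is one #tag and this is ?another tag"
print(tagged_words(text))  # [('#', 'tag'), ('?', 'another')]
```

Unlike \w+, this keeps everything up to the next whitespace, so a word such as "#tag," would come back as ('#', 'tag,') with the trailing comma included.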
The first \b won't match before # or ?; use (?:^|\s) instead.
Also, the \b at the end is unnecessary, because \w+ is a greedy match.
import re
r = re.compile(r"(?:^|\s)([#?])(\w+)")
text = "#head this is one #tag and this is ?another tag, but not this?one"
print(r.findall(text))
# Output: [('#', 'head'), ('#', 'tag'), ('?', 'another')]
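As a quick check of the claim about the trailing \b, the sketch below compares the pattern with and without it on the same text. The names with_trailing and without_trailing are just for illustration.

```python
import re

text = "#head this is one #tag and this is ?another tag, but not this?one"

with_trailing = re.compile(r"(?:^|\s)([#?])(\w+)\b")
without_trailing = re.compile(r"(?:^|\s)([#?])(\w+)")

# \w+ is greedy, so it stops only at a non-word character or the end of the string,
# and that position is always a word boundary anyway.
print(with_trailing.findall(text) == without_trailing.findall(text))  # True
```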