How to use a Python regex to match words beginning with a hash or question mark?

This should be easy, and this regex works fine for finding words that begin with specific characters, but I can't get it to match hashes and question marks.

This works and matches words beginning with a:

r = re.compile(r"\b([a])(\w+)\b")

But these don't match. I tried:

r = re.compile(r"\b([#?])(\w+)\b")
r = re.compile(r"\b([\#\?])(\w+)\b")
r = re.compile( r"([#\?][\w]+)?")

I even tried matching just hashes:

r = re.compile( r"([#][\w]+)?")
r = re.compile( r"([/#][\w]+)?")

text = "this is one #tag and this is ?another tag"
items = r.findall(text)

I'm expecting to get:

[('#', 'tag'), ('?', 'another')]


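\b matches the empty position between a \w and a \W character (in either order). In your text, # and ? are each preceded by a space, and both the space and the symbol are \W, so there is no \b in front of # or ?.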

In other words: remove the first word boundary.

Not:

r = re.compile(r"\b([#?])(\w+)\b")

but

r = re.compile(r"([#?])(\w+)\b")
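
As a quick check of the explanation above, here is a minimal sketch that runs both patterns against the text from the question. The variable names with_boundary and without_boundary are only illustrative.

```python
import re

text = "this is one #tag and this is ?another tag"

# Original attempt: the leading \b can never match between a space and # or ?,
# because both characters are non-word characters (\W).
with_boundary = re.compile(r"\b([#?])(\w+)\b")

# Same pattern with the leading \b removed.
without_boundary = re.compile(r"([#?])(\w+)\b")

print(with_boundary.findall(text))     # []
print(without_boundary.findall(text))  # [('#', 'tag'), ('?', 'another')]
```

Note that without any condition in front of the symbol, this pattern also matches in the middle of a word, for example the ?one in "this?one".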


Since you are using Python, you don't necessarily need a regex for this; plain string methods will do:

>>> text = "this is one #tag and this is ?another tag"
>>> for word in text.split():
...   if word.startswith("#") or word.startswith("?"):
...     print(word)
...
#tag
?another
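
If the (symbol, word) tuples from the question are wanted, the same idea can be sketched roughly as follows. The helper name prefixed_words and its prefixes parameter are hypothetical, not part of the original answer.

```python
text = "this is one #tag and this is ?another tag"

def prefixed_words(text, prefixes=("#", "?")):
    """Return (symbol, rest) pairs for whitespace-separated words that
    start with one of the given prefixes (hypothetical helper)."""
    # str.startswith accepts a tuple of prefixes, so one call covers both symbols.
    return [(word[0], word[1:]) for word in text.split() if word.startswith(prefixes)]

print(prefixed_words(text))  # [('#', 'tag'), ('?', 'another')]
```

Unlike the \w+ in the regex versions, this keeps whatever follows the symbol up to the next whitespace, so "#tag," would yield ('#', 'tag,') with the trailing comma included.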


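The first \b won't match before # or ?; use (?:^|\s) instead, which requires the symbol to be at the start of the string or right after whitespace.

Also, the \b at the end is unnecessary: \w+ is greedy, so it consumes every word character available and the match always ends at a word boundary anyway.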

r = re.compile(r"(?:^|\s)([#?])(\w+)")

text = "#head this is one #tag and this is ?another tag, but not this?one"
print(r.findall(text))
# Output: [('#', 'head'), ('#', 'tag'), ('?', 'another')]
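
The claim about the trailing \b can be checked with a small sketch like the one below. The third pattern uses a negative lookbehind, (?<!\S), which is an alternative not taken from the answer above; it should give the same result here without consuming the preceding whitespace. All variable names are illustrative.

```python
import re

text = "#head this is one #tag and this is ?another tag, but not this?one"

# Pattern from the answer above.
without_trailing = re.compile(r"(?:^|\s)([#?])(\w+)")
# Same pattern with a trailing \b added, for comparison.
with_trailing = re.compile(r"(?:^|\s)([#?])(\w+)\b")
# Alternative (an assumption, not from the original answer): the symbol must
# not be preceded by a non-whitespace character.
lookbehind = re.compile(r"(?<!\S)([#?])(\w+)")

expected = [('#', 'head'), ('#', 'tag'), ('?', 'another')]
print(without_trailing.findall(text) == expected)  # True
print(with_trailing.findall(text) == expected)     # True
print(lookbehind.findall(text) == expected)        # True
```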