The influence of ? in the regex string
Consider the following Python code:
>>> re.search(r'.*(99)', 'aa99bb').groups()
('99',)
>>> re.search(r'.*(99)?', 'aa99bb').groups()
(None,)
I don't understand why I don't catch 99 in the second example.
This is because the .* first matches the entire string. At that point, it's not possible to match 99 any more, and since the group is optional, the regex engine stops because it has found a successful match.
If, on the other hand, the group is mandatory, the regex engine has to backtrack into the .* until 99 can match.
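To see this from Python itself, here is a small sketch that wraps the .* in a capturing group of its own. That extra group is not part of the original patterns; it is added only to make visible how much of the string .* ends up holding in each case.

```python
import re

text = 'aa99bb'

# Mandatory group: .* first grabs the whole string, then gives characters
# back one at a time until (99) can match.
mandatory = re.search(r'(.*)(99)', text)

# Optional group: .* grabs the whole string, (99)? matches nothing,
# and the engine accepts that as a successful match right away.
optional = re.search(r'(.*)(99)?', text)

print(mandatory.groups())  # ('aa', '99')
print(optional.groups())   # ('aa99bb', None)
```

In the second case .* keeps all of 'aa99bb', so nothing is left for the optional group to capture.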
Compare the following debug sessions from RegexBuddy (the part of the string matched by .* is highlighted in yellow, the part matched by (99) in blue):
[RegexBuddy debug session for .*(99)]
[RegexBuddy debug session for .*(99)?]
Depending on your need, a good choice might be [^9]*(99)?.
No backtracking is needed; instead, it matches anything other than 9, followed by an optional 99. It doesn't work if you want to ignore 9s before the 99, though.
>>> re.search(r'[^9]*(99)?', 'aa99bb').groups()
('99',)
>>> re.search(r'[^9]*(99)?', 'aa9x99bb').groups()
(None,)
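Printing the overall match next to the groups may make the second result easier to follow. This is only an illustrative sketch; the results dictionary is a helper name introduced here, not something from the answer above.

```python
import re

pattern = r'[^9]*(99)?'
results = {}

for text in ('aa99bb', 'aa9x99bb'):
    m = re.search(pattern, text)
    # group(0) is the whole match; groups() holds the optional (99) capture.
    results[text] = (m.group(0), m.groups())
    print(repr(text), '->', repr(m.group(0)), m.groups())

# 'aa99bb' -> 'aa99' ('99',)
# 'aa9x99bb' -> 'aa' (None,)
```

For 'aa9x99bb', [^9]* stops at the first 9, the optional group fails on '9x', and the engine settles for 'aa' without ever reaching the later 99.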