Silly RegEx Confusion
Well, I've been using regular expressions with good success for a while, but I've run into a snag.
I have two string patterns that I would like to distinguish:
AAA(CR)(LF)*
vs
AAA BBBBB(CR)(LF)*
Where A is a letter, B could be any character except (CR)/(LF), and (CR)/(LF) are carriage-return and line-feed (i.e., 0h0D/0h0A).
I've tried the following pattern:
"[A-Z ]+.+\x0D\x0A\*"
But, aggravatingly, this matches both of the patterns above! Shouldn't the .+ prevent the first pattern from being matched? As far as I understand, + is a greedy match of one or more of the previous token... Where am I going wrong?
Thanks,
Brian
Your regex matches AAA(CR)(LF) because the first two characters match [A-Z ]+ and then the third A matches .+.
Although + indicates a greedy match, the regex engine will backtrack after [A-Z ]+ has consumed AAA and it discovers that the rest of the expression can't match. So it tries again with [A-Z ]+ matching only AA, and discovers that the rest of the pattern can then match the rest of the string.
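To see the backtracking concretely, here is a small Python sketch. Python's re module is only my choice for illustration, since the question doesn't say which engine is in use. It wraps the two quantified parts of the original pattern in capture groups and, as an assumption, leaves off the trailing \* so the test strings can end at the line feed. The name show_split and the sample strings are made up for the example.

```python
import re

# The question's pattern, with capture groups added so we can see which
# part consumed what. Assumption: the trailing \* is left out.
PATTERN = re.compile(r"([A-Z ]+)(.+)\x0D\x0A")


def show_split(text):
    """Return (group 1, group 2) for a match at the start of text, or None."""
    m = PATTERN.match(text)
    return m.groups() if m else None


if __name__ == "__main__":
    print(show_split("AAA\r\n"))        # the short form
    print(show_split("AAA 12345\r\n"))  # the long form
```

For "AAA\r\n" the groups come back as 'AA' and 'A': the character class gave up its last A so that .+ would have something to match.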
Unless I misunderstood...
"[A-Z]+\x0D\x0A\*"
or
"[A-Z]+ .+\x0D\x0A\*"
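A quick way to check that the two suggested patterns really do tell the cases apart, again as a Python sketch under my own assumptions: the trailing \* is omitted, and classify is a hypothetical helper name, not something from the thread.

```python
import re

# The two suggested patterns (trailing \* omitted, as an assumption).
SHORT = re.compile(r"[A-Z]+\x0D\x0A")    # AAA(CR)(LF)
LONG = re.compile(r"[A-Z]+ .+\x0D\x0A")  # AAA BBBBB(CR)(LF)


def classify(line):
    """Return "short", "long", or None depending on which pattern fits the whole line."""
    if SHORT.fullmatch(line):
        return "short"
    if LONG.fullmatch(line):
        return "long"
    return None


if __name__ == "__main__":
    for sample in ("AAA\r\n", "AAA BBBBB\r\n", "aaa\r\n"):
        print(repr(sample), classify(sample))
```

Because the space is no longer inside the character class, the first pattern cannot reach across it, and the second one requires it, so neither pattern can match the other's input.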