How does the vertical bar (|) delimit alternatives in a Python regular expression?
According to the Python docs, the vertical bar is used as an 'or' operator: A|B matches either A or B, where A and B can be arbitrary REs.
For example, the regular expression ABC|DEF matches these strings:
"ABC", "DEF"
But what if I want to match the following strings instead:
"ABCF", "ADEF"
Perhaps what I want is something like A(BC)|(DE)F which means:
- match "A" first,
- then string "BC" or "DE",
- then char "F".
I know the above expression is not right, since brackets have other meanings in regular expressions; it is just to express my idea.
Thanks!
These will work:
A(BC|DE)F
A(?:BC|DE)F
The difference is the number of groups generated: 1 with the first, 0 with the second.
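As a quick check of the two patterns above, here is a minimal sketch using the standard re module. The variable names and the sample strings are only illustrative.

```python
import re

# Capturing and non-capturing versions of the same pattern.
capturing = re.compile(r"A(BC|DE)F")
non_capturing = re.compile(r"A(?:BC|DE)F")

# Both patterns accept exactly the two target strings.
samples = ["ABCF", "ADEF", "ABC", "DEF"]
capturing_hits = [s for s in samples if capturing.fullmatch(s)]
non_capturing_hits = [s for s in samples if non_capturing.fullmatch(s)]
print(capturing_hits)      # ['ABCF', 'ADEF']
print(non_capturing_hits)  # ['ABCF', 'ADEF']

# The only difference is the number of groups each pattern defines.
print(capturing.groups)      # 1
print(non_capturing.groups)  # 0

# With the capturing version, the chosen alternative is available as group 1.
chosen = capturing.fullmatch("ADEF").group(1)
print(chosen)  # DE
```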
Yours will match either ABC or DEF, with 2 groups, one containing nothing and the other containing the matched fragment (BC or DE).
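To see why the pattern from the question behaves this way, here is a small sketch (variable names are illustrative). Because | has the lowest precedence, A(BC)|(DE)F is read as "A(BC)" or "(DE)F".

```python
import re

# The pattern from the question: the | splits the WHOLE expression in two.
original = re.compile(r"A(BC)|(DE)F")

# It matches "ABC" and "DEF"; the group of the unused alternative is None.
groups_abc = original.fullmatch("ABC").groups()
groups_def = original.fullmatch("DEF").groups()
print(groups_abc)  # ('BC', None)
print(groups_def)  # (None, 'DE')

# It does not match the strings the question actually wants.
match_abcf = original.fullmatch("ABCF")
match_adef = original.fullmatch("ADEF")
print(match_abcf, match_adef)  # None None
```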
The only difference between parentheses in Python regexps (and Perl-compatible regexps in general) and parentheses in formal regular expressions is that in Python, parens store their result. Everything matched by the part of a regular expression inside parentheses is stored as a "submatch" or "group" that you can access using the group method on the match object returned by re.match, re.search, or re.finditer. They are also used in backreferences, a feature of Python RE/PCRE that goes beyond what formal regular expressions can express, and that you probably don't care about.
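A short sketch of group access and of a backreference, using only the standard re module. The input strings and variable names are made up for illustration.

```python
import re

pattern = r"A(BC|DE)F"

# re.search returns a match object; group(0) is the whole match,
# group(1) is what the first pair of parentheses captured.
m = re.search(pattern, "xxADEFxx")
whole, inner = m.group(0), m.group(1)
print(whole, inner)  # ADEF DE

# re.finditer yields one match object per non-overlapping match.
captured = [hit.group(1) for hit in re.finditer(pattern, "ABCF ADEF ABCF")]
print(captured)  # ['BC', 'DE', 'BC']

# A backreference (\1) requires the SAME text that group 1 matched to
# appear again, not just anything that BC|DE could match.
repeated = re.fullmatch(r"A(BC|DE)\1F", "ABCBCF")
mixed = re.fullmatch(r"A(BC|DE)\1F", "ABCDEF")
print(repeated is not None, mixed is not None)  # True False
```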
If you don't care about the whole submatch extraction deal, it's fine to use parens like this. If you do care, there is a non-capturing version of parens that behaves exactly like parentheses in formal regular expressions: (?:...) instead of (...).
This, and more, is described in the official docs.