How does the vertical bar (|) determine the scope of alternation in a Python regular expression?

According to the Python docs, the vertical bar acts as an 'or' operator: A|B matches either A or B, where A and B can be arbitrary REs.

For example, the regular expression ABC|DEF matches strings like these:

"ABC", "DEF"

But what if I want to match strings like the following:

"ABCF", "ADEF"

Perhaps what I want is something like A(BC)|(DE)F, meaning A, followed by either BC or DE, followed by F.

I know the above expression is not right, since parentheses have their own meaning in regular expressions; it is just to express my idea.

Thanks!


These will work:

A(BC|DE)F
A(?:BC|DE)F

The difference is the number of groups generated: 1 with the first, 0 with the second.
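A quick way to check this is to compile both patterns and inspect them. The sketch below is only illustrative; the variable names and the test strings are my own choices, not part of the question.

```python
import re

# Capturing group: the alternation is stored as group 1.
capturing = re.compile(r"A(BC|DE)F")
# Non-capturing group: same matching behaviour, nothing is stored.
non_capturing = re.compile(r"A(?:BC|DE)F")

for text in ("ABCF", "ADEF", "ABC", "DEF"):
    # fullmatch requires the whole string to match the pattern.
    print(text, bool(capturing.fullmatch(text)), bool(non_capturing.fullmatch(text)))

# Number of capturing groups in each compiled pattern.
print(capturing.groups, non_capturing.groups)

# The captured fragment is only available with the capturing version.
print(capturing.fullmatch("ADEF").groups())
print(non_capturing.fullmatch("ADEF").groups())
```

Both patterns accept "ABCF" and "ADEF" and reject "ABC" and "DEF"; only the first one reports a group.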

Your pattern, A(BC)|(DE)F, will match either ABC or DEF, with 2 groups: one is empty (None in Python) and the other contains the matched fragment (BC or DE).
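To see why, note that | has the lowest precedence, so the pattern splits into A(BC) on one side and (DE)F on the other. A small sketch, with variable names chosen only for illustration:

```python
import re

# The pattern from the question: parsed as  A(BC)  OR  (DE)F.
original = re.compile(r"A(BC)|(DE)F")

m_abc = original.fullmatch("ABC")    # left alternative matches
m_def = original.fullmatch("DEF")    # right alternative matches
m_abcf = original.fullmatch("ABCF")  # neither alternative covers the whole string
m_adef = original.fullmatch("ADEF")

# The group belonging to the alternative that did not take part is None.
print(m_abc.groups())
print(m_def.groups())
print(m_abcf, m_adef)
```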


The only difference between parentheses in Python regexps (and Perl-compatible regexps in general) and parentheses in formal regular expressions is that in Python, parens store their result. Everything matched by the part of a regular expression inside parentheses is stored as a "submatch" or "group" that you can access using the group method on the match object returned by re.match, re.search, or re.finditer. Groups are also used in backreferences, a feature of Python's re module and PCRE that goes beyond what formal regular expressions can express, and that you probably don't care about.

If you don't care about the whole submatch extraction deal, it's fine to use parens like this. If you do care, there is a non-capturing version of parens that behaves exactly like parentheses in formal regular expressions: (?:...) instead of (...).
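As a rough illustration of the submatch extraction described above, here is how the group method behaves with each kind of parens. The sample string is made up for the example.

```python
import re

# Hypothetical sample text containing both target strings.
text = "xxABCFyyADEFzz"

# Capturing parens: group(0) is the whole match, group(1) the stored submatch.
first = re.search(r"A(BC|DE)F", text)
print(first.group(0), first.group(1))

# finditer yields one match object per occurrence.
middles = [m.group(1) for m in re.finditer(r"A(BC|DE)F", text)]
print(middles)

# Non-capturing parens: the match is the same, but there is no group 1.
plain = re.search(r"A(?:BC|DE)F", text)
print(plain.group(0))
try:
    plain.group(1)
    has_group_1 = True
except IndexError:
    # Asking for a group that the pattern does not define raises IndexError.
    has_group_1 = False
print(has_group_1)
```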

This, and more, is described in the official docs.
