What does `?` mean in this Perl regex?
I have a Perl regex, but I'm not sure what `?` means in this context.
m#(?:\w+)#
What does `?` mean here?
In this case, the `?` is actually being used in connection with the `:`. Put together, `?:` at the beginning of a grouping means to group but not capture the text/pattern within the parentheses (as in, it will not be stored in any backreferences like `\1` or `$1`, so you will not be able to access the grouped text directly).
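As a minimal sketch of the difference (the sample string `foo=bar` and the variable names are made up for illustration), the two patterns below match the same text, but only the capturing parentheses show up in the list of captures:

```perl
use strict;
use warnings;

my $text = "foo=bar";   # hypothetical sample input

# Both groups capture: key lands in $1, value in $2.
my @with_capture = $text =~ m#(\w+)=(\w+)#;

# The first group only groups; the value is now the first capture ($1).
my @without_capture = $text =~ m#(?:\w+)=(\w+)#;

print "with:    @with_capture\n";      # with:    foo bar
print "without: @without_capture\n";   # without: bar
```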
More specifically, a `?` has three distinct meanings in regex:
1. The `?` quantifier signifies "zero or one repetitions" of an expression. One of the canonical examples I've seen is `s?he`, which will match both `she` and `he`, since the `?` makes the `s` "optional".
2. When a quantifier (`+`, `*`, `?`, or the general `{n,m}`) is followed by a `?`, the match is non-greedy (i.e. it will match the shortest string starting from that position that allows the match to proceed).
3. A `?` at the beginning of a parenthesized group signifies that you want to perform a special action. As in this case, `:` means to group but not capture. The exact list of actions available will vary somewhat from one regex engine to another, but here's a list (not necessarily all-inclusive) of some of them:

   A. Non-capturing group: `(?:text)`
   B. Lookaround: `(?=a)` for a lookahead, `?!` for negative lookahead, or `?<=` and `?<!` for lookbehinds (positive and negative, respectively).
   C. Conditional matches: `(?(condition)then|else)`.
   D. Atomic grouping: `a(?>bc|b)c` (matches `abcc` but not `abc`, because once the group has matched `bc` the engine cannot backtrack into it to try `b`).
   E. Inline enabling/disabling of regex matching modifiers: `(?i)` to enable a mode, `(?-i)` to disable it. You can also enable/disable more than one modifier at a time by simply concatenating them, such as `(?im)` (`i` is case insensitive and `m` is multiline).
   F. Named capture groups: `(?P<name>pattern)`, which can later be referenced using `(?P=name)`. The .NET regex engine uses the syntax `(?<name>pattern)` instead. Perl 5.10 and later accept both forms, and also `\k<name>` for the backreference.
   G. Comments: `(?#Comment text)`. I personally think this just adds clutter, but I guess it could serve some use. Free-spacing mode (the `(?x)` modifier) might be a better option.
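The list above can be tried out with a short script. This is only a sketch: the sample strings and variable names are assumptions chosen for illustration, and the named-capture part assumes Perl 5.10 or later (for `%+`).

```perl
use strict;
use warnings;

# 1. Quantifier: the "s" is optional, so both "she" and "he" match.
my @optional = grep { /^s?he$/ } qw(she he the);

# 2. Non-greedy: ".+?" stops at the first closing quote it can.
my $quoted   = q{"a" and "b"};
my ($greedy) = $quoted =~ /"(.+)"/;    # a" and "b
my ($lazy)   = $quoted =~ /"(.+?)"/;   # a

# 3. Special groups.
# Lookahead: assert that "bar" follows without consuming it.
my ($before_bar) = "foobar" =~ /(\w+?)(?=bar)/;       # foo

# Atomic group: no backtracking into (?>...) once it has matched.
my $atomic_abcc = "abcc" =~ /a(?>bc|b)c/ ? 1 : 0;     # 1
my $atomic_abc  = "abc"  =~ /a(?>bc|b)c/ ? 1 : 0;     # 0

# Inline modifier: case-insensitive matching switched on in the pattern.
my $case_insensitive = "HELLO" =~ /(?i)hello/ ? 1 : 0;   # 1

# Named capture: read the group back through %+.
my $year = "2024-05" =~ /(?<year>\d{4})-(?<month>\d\d)/ ? $+{year} : undef;

print "optional: @optional\n";   # optional: she he
print "greedy:   $greedy\n";
print "lazy:     $lazy\n";
print "year:     $year\n";       # year:     2024
```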
So essentially, the meaning of the `?` depends on context. If you wanted zero or one occurrence of a literal `(` character, you'd have to write `\(?`, escaping the paren so that `(?` is not read as the start of a special group.
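A small sketch of the escaped form (the candidate strings are invented for the example): `\(?` allows at most one literal opening paren before `abc`.

```perl
use strict;
use warnings;

# \( is a literal paren; the following ? makes it optional.
my @hits = grep { /^\(?abc$/ } ("abc", "(abc", "((abc");

print "matched: @hits\n";   # matched: abc (abc
```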
$ perldoc perlreref:
(?:...)
Groups subexpressions without capturing (cluster)
You can also use YAPE::Regex::Explain:
    C:\Temp> perl -MYAPE::Regex::Explain -e "print YAPE::Regex::Explain->new(qr#(?:\w+)#)->explain"
    The regular expression:

    (?-imsx:(?:\w+))

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      (?:                      group, but do not capture:
    ----------------------------------------------------------------------
        \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                                 more times (matching the most amount
                                 possible))
    ----------------------------------------------------------------------
      )                        end of grouping
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
Those are non-capturing parentheses. They're used for grouping (just like normal parentheses), but the group won't be added to the capture array (i.e. it can't be referenced with a numbered backreference such as `\1` or `\2`).
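To illustrate the numbering, here is a hedged sketch (the input string is made up): because `(?:say)` takes no number, `\1` refers to the first capturing group, and `$#+` reports how many capture groups the last successful match had.

```perl
use strict;
use warnings;

my ($repeated, $group_count);

# (?:say) groups without capturing, so (\w+) is group 1 and \1 refers to it.
if ("say hello hello" =~ /(?:say) (\w+) \1/) {
    $repeated    = $1;    # the word that appears twice
    $group_count = $#+;   # number of capture groups in this match
}

print "repeated word: $repeated\n";       # repeated word: hello
print "capture groups: $group_count\n";   # capture groups: 1
```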
See here: http://www.regular-expressions.info/refadv.html
In short, the sequence `(?` starts a regular expression special feature. The characters that follow the `(?` specify which feature, in this case a non-capturing grouping. We cover this in both Intermediate Perl and Effective Perl Programming. The perlre documentation covers Perl regular expressions.
See the regex tutorial that is installed with every version of Perl (in particular, this section).