Why does a positive lookahead lead to captures in my Perl regex?
I don't understand why this code works:
my $seq = 'GAGAGAGA';
my $regexp = '(?=((G[UCGA][GA]A)|(U[GA]CG)|(CUUG)))'; # zero-width match
while ($seq =~ /$regexp/g) { # match globally
    my $pos = pos($seq) + 1; # 1-based position of the zero-width match
    print "$1 position $pos\n";
}
I know this is a zero-width match and that it doesn't put the matched string in $&, but why does it put it in $1?
Thank you!
Matches are captured in $1 because of all the internal parentheses. If you don't want capturing, then use
my $regexp = '(?=(?:(?:G[UCGA][GA]A)|(?:U[GA]CG)|(?:CUUG)))';
or even better
my $regexp = qr/(?=(?:(?:G[UCGA][GA]A)|(?:U[GA]CG)|(?:CUUG)))/;
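A minimal sketch of the difference, assuming the same test string as in the question. The variable names ($capturing, $clustering, @captured, @positions) are made up for illustration. The first pattern keeps one outer capturing group inside the lookahead, so $1 is filled; the second uses only (?:...) groups, so only the positions are available.

```perl
use strict;
use warnings;

my $seq = 'GAGAGAGA';

# One capturing group inside the lookahead: $1 receives the looked-ahead text.
my $capturing = qr/(?=((?:G[UCGA][GA]A)|(?:U[GA]CG)|(?:CUUG)))/;

# Only clustering groups: nothing is captured, the match is purely positional.
my $clustering = qr/(?=(?:(?:G[UCGA][GA]A)|(?:U[GA]CG)|(?:CUUG)))/;

my @captured;
while ($seq =~ /$capturing/g) {
    push @captured, [ $1, pos($seq) + 1 ];   # text and 1-based position
}

my @positions;
while ($seq =~ /$clustering/g) {
    push @positions, pos($seq) + 1;          # position only, $1 is not set
}

print "captured: $_->[0] at $_->[1]\n" for @captured;
print "positions only: @positions\n";        # prints "positions only: 1 3 5"
```

If the matched text is still needed, as in the question's print statement, keeping a single outer capture and clustering the alternatives is one reasonable middle ground.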
From the perlre documentation:
(?:pattern)
(?imsx-imsx:pattern)
This is for clustering, not capturing; it groups subexpressions like "()", but doesn't make backreferences as "()" does. So

@fields = split(/\b(?:a|b|c)\b/)

is like

@fields = split(/\b(a|b|c)\b/)

but doesn't spit out extra fields. It's also cheaper not to capture characters if you don't need to.

Any letters between ? and : act as flags modifiers as with (?imsx-imsx). For example,

/(?s-i:more.*than).*million/i

is equivalent to the more verbose

/(?:(?s-i)more.*than).*million/i
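To make the split remark in the quoted documentation concrete, here is a small sketch. The input string and variable names are invented for illustration, and the pattern adds \s* around the separator so the fields come out without stray spaces.

```perl
use strict;
use warnings;

my $text = 'one a two b three';

# Capturing group in the separator: split also returns the separators.
my @with_capture = split /\s*\b(a|b|c)\b\s*/, $text;

# Clustering group: only the fields between separators are returned.
my @with_cluster = split /\s*\b(?:a|b|c)\b\s*/, $text;

print join('|', @with_capture), "\n";   # prints "one|a|two|b|three"
print join('|', @with_cluster), "\n";   # prints "one|two|three"
```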
Your regular expression contains a capture group (...), which means the $1, $2, etc. variables will be populated with the results of those captures. This works inside positive lookahead assertions too, and inside positive lookbehind assertions as well.
As with all captures, if you rewrite the group as (?:...), then its contents will not go into a capture variable.
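A quick way to check this, sketched with made-up variable names: the overall match ($&) of a lookahead-only pattern is empty, while a group inside the assertion still fills $1. The second test tries the same thing with a fixed-width positive lookbehind.

```perl
use strict;
use warnings;

my $seq = 'GAGAGAGA';

# Positive lookahead: the overall match is zero-width, the capture is not.
my ($whole, $inner);
if ($seq =~ /(?=(G[UCGA][GA]A))/) {
    ($whole, $inner) = ($&, $1);
}

# Positive lookbehind (fixed width) with a capturing group inside it.
my $behind;
if ('ab' =~ /(?<=(a))b/) {
    $behind = $1;
}

print "whole match: '$whole', capture: '$inner'\n";   # whole match: '', capture: 'GAGA'
print "lookbehind capture: '$behind'\n";              # lookbehind capture: 'a'
```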