PCRE (recursive) pattern that matches a string containing a correctly parenthesized substring. Why does this one fail?
There are other (working) ways to do this, but the question is: why does this one fail?
/
\A # start of the string
( # group 1
(?: # group 2
[^()]* # something other than parentheses (greedy)
| # or
\( (?1) \) # parenthesized group 1
) # -group 2
+ # at least once (greedy)
) # -group 1
\Z # end of the string
/x
It fails to match a string with nested parentheses, e.g. "(())".
It doesn't fail, at least not in Perl:
$ perl junk.pl
matched junk >(())<
$ cat junk.pl
my $junk = qr/
\A # start of the string
( # group 1
(?: # group 2
[^()]* # something other than parentheses (greedy)
| # or
\( (?1) \) # parenthesized group 1
) # -group 2
+ # at least once (greedy)
) # -group 1
\Z # end of the string
/x;
if( "(())" =~ $junk ){
print "matched junk >$1<\n";
}
Wow! Thank you, junk! It really does work in Perl, but not in PCRE. So the question becomes: "What's the difference between Perl and PCRE regex pattern matching?"
And voilà! The PCRE documentation has the answer, under "Recursion difference from Perl":
In PCRE (like Python, but unlike Perl), a recursive subpattern call is always treated as an atomic group. That is, once it has matched some of the subject string, it is never re-entered, even if it contains untried alternatives and there is a subsequent matching failure.
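In the original order, the recursive call first tries [^()]*, which succeeds by matching the empty string. The following \) then fails on the inner "(", and PCRE will not go back into the call to try the \( (?1) \) alternative. Therefore, we just need to swap the two alternatives so that the parenthesized one is tried first: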
/ \A ( (?: \( (?1) \) | [^()]* )+ ) \Z /x
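To make the effect of the atomic recursive call easier to see, here is a rough toy model in Python. It is not PCRE's or Perl's actual engine, just a hand-written backtracking matcher for this one pattern. The names `group1`, `matches`, `paren_first`, and `atomic` are made up for the illustration, `\Z` is simplified to "end of string", and a loop iteration that matches the empty string is assumed to end the `+` loop.

```python
import itertools


def group1(s, i, paren_first, atomic):
    """Yield each end position of group 1 started at i, in backtracking order."""

    def text(i):
        # [^()]* : greedy, so the longest run first, then shorter ones down to empty
        j = i
        while j < len(s) and s[j] not in "()":
            j += 1
        for end in range(j, i - 1, -1):
            yield end

    def paren(i):
        # \( (?1) \)
        if i < len(s) and s[i] == "(":
            inner = group1(s, i + 1, paren_first, atomic)
            if atomic:
                # Atomic call: keep only the first way the recursion matched,
                # so a later failure cannot backtrack into it.
                inner = list(itertools.islice(inner, 1))
            for j in inner:
                if j < len(s) and s[j] == ")":
                    yield j + 1

    alternatives = (paren, text) if paren_first else (text, paren)

    def loop(i):
        # (?: ... )+ : one iteration, then greedily try more before leaving
        for alt in alternatives:
            for j in alt(i):
                if j > i:  # assumption: an empty iteration ends the loop
                    yield from loop(j)
                yield j

    yield from loop(i)


def matches(s, paren_first, atomic):
    # \A group1 \Z, with \Z simplified to "end of string"
    return any(end == len(s) for end in group1(s, 0, paren_first, atomic))
```

In this model, `matches("(())", paren_first=False, atomic=False)` succeeds (the Perl-like case), `matches("(())", paren_first=False, atomic=True)` fails (the original pattern with atomic recursion), and `matches("(())", paren_first=True, atomic=True)` succeeds again (the swapped pattern).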
Thank you!