PCRE (recursive) pattern that matches a string containing a correctly parenthesized substring. Why does this one fail?

Well, there are other ways (hmmm... or rather, working ways) to do it, but the question is why this one fails.

/
\A              # start of the string
(               # group 1
(?:             # group 2
[^()]*          # something other than parentheses (greedy)
|               # or
\( (?1) \)      # parenthesized group 1
)               # -group 2
+               # at least once (greedy)
)               # -group 1
\Z              # end of the string
/x

It fails to match a string with nested parentheses, such as "(())".


It doesn't fail

$ perl junk.pl
matched junk >(())<

$ cat junk.pl
my $junk = qr/
\A              # start of the string
(               # group 1
(?:             # group 2
[^()]*          # something other than parentheses (greedy)
|               # or
\( (?1) \)      # parenthesized group 1
)               # -group 2
+               # at least once (greedy)
)               # -group 1
\Z              # end of the string
/x;

if( "(())" =~ $junk ){
    print "matched junk >$1<\n";
}


Wow! Thank you, junk! It really works... in Perl, but not in PCRE. So the question turns into: "What's the difference between Perl and PCRE regex pattern matching?"

And voilà! Here is the answer, from the PCRE documentation:

Recursion difference from Perl

 In PCRE (like Python, but unlike Perl), a recursive subpattern call is
 always treated as an atomic group. That is, once it has matched some of
 the subject string, it is never re-entered, even if it contains untried
 alternatives and there is a subsequent matching failure.

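With "(())", the recursive call (?1) starts right after the first "(". There the first alternative, [^()]*, matches the empty string, so the recursion returns having consumed nothing, and the following \) fails against the second "(". Because the recursion is atomic, PCRE never goes back into it to try the \( (?1) \) alternative. Therefore, we just need to swap the two alternatives so that the parenthesized one is tried first: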

/ \A ( (?: \( (?1) \) | [^()]* )+ ) \Z /x
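To check that the swapped order survives atomic recursion, here is a small Perl sketch that wraps (?1) in (?>...) as a stand-in for PCRE's behavior (an assumption: this is an emulation, not PCRE itself) and runs the fixed pattern over a few hand-picked strings. The names $balanced, @good and @bad are hypothetical, chosen for illustration.

```perl
use strict;
use warnings;

# The fixed pattern with the parenthesized alternative first; (?>(?1))
# imitates PCRE's atomic recursion so the test reflects PCRE-like behavior.
my $balanced = qr/ \A ( (?: \( (?>(?1)) \) | [^()]* )+ ) \Z /x;

# Hypothetical inputs: the first list is balanced, the second is not.
my @good = ( '', 'abc', '()', '(())', '()()', 'a(b(c)d)e' );
my @bad  = ( '(', ')', '(()', '())', ')(' );

for my $s ( @good, @bad ) {
    my $verdict = $s =~ $balanced ? 'balanced' : 'unbalanced';
    print "'$s' => $verdict\n";
}
```

With atomic recursion in place, every string in @good should be reported as balanced and every string in @bad as unbalanced. Trying \( (?1) \) first is what lets the inner call consume the nested "()" before it commits; the empty [^()]* match is only reached once no "(" is available.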

Thank you!

