Why does this regex, using "not" & a backref, require a lazy match?
When using the not operator (^) in combination with a backreference, why do I need to use a lazy match? It seems like the not should break the match.
For example:
<?php
preg_match('/(t)[^\1]*\1/', 'is this test ok', $matches);
echo $matches[0];
?>
This will output "this test" instead of "this t", in spite of the fact that the middle t does not match [^\1]. I need to use /(t)[^\1]*?\1/ to match "this t".
Furthermore, the following does match only "this t":
preg_match('/t[^t]*t/', 'is this test ok', $matches);
What is going on, and what am I misunderstanding?
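It doesn't work because \1 is not a backreference when it appears inside a character class. There, \1 is interpreted as the character with ASCII value 1. So [^\1]* happily matches every t in the subject, and the greedy quantifier runs to the end of the string before backtracking to the last t. The lazy version only appears to work because it stops at the first place the trailing \1 can match.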
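A quick way to check this is to compare [^\1] with [^\x01] directly. The snippet below is only a sketch: the subject string comes from the question, while the variable names ($a, $b, $rejectsCtrl, $acceptsT) are made up for illustration.

```php
<?php
$subject = 'is this test ok';

// Inside a character class, \1 is read as an octal escape (the byte 0x01),
// so these two patterns should behave identically.
preg_match('/(t)[^\1]*\1/', $subject, $a);
preg_match('/(t)[^\x01]*\1/', $subject, $b);
var_dump($a[0] === $b[0]); // bool(true)
echo $a[0], "\n";          // this test

// The class rejects the control character 0x01 ...
$rejectsCtrl = preg_match('/^[^\1]$/', "\x01");
// ... but accepts "t", even though group 1 would have captured "t".
$acceptsT = preg_match('/^[^\1]$/', 't');
var_dump($rejectsCtrl, $acceptsT); // int(0) int(1)
```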
You could use a negative lookaround instead to get the effect you want:
'/(t)(?:(?!\1).)*\1/'
You cannot use backreferences inside character classes. In PCRE, [^\1] means "any character other than the character with code 1 (\x01)", not "any character other than what group 1 matched".
Instead, use /(t)(?:(?!\1).)*\1/.
(?:...) is a non-capturing group.
(?!...) is a "negative look-ahead", asserting that the subexpression doesn't match.
(?!\1). , when \1 is a single character, means "any character that does not match \1".
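To see the look-ahead version in action, here is a minimal sketch using the subject from the question. The second pattern, with ([a-z]) as the captured group, and the variable names $m and $n are assumptions added for illustration only.

```php
<?php
$subject = 'is this test ok';

// Greedy quantifier, but each step first asserts that the next character
// is not whatever group 1 captured, so the match stops at the next "t".
preg_match('/(t)(?:(?!\1).)*\1/', $subject, $m);
echo $m[0], "\n"; // this t

// The same idea works when the captured character is not known in advance:
// here group 1 captures "i", and the match ends at the next "i".
preg_match('/([a-z])(?:(?!\1).)*\1/', $subject, $n);
echo $n[0], "\n"; // is thi
```

Note that . does not match a newline by default, so add the s modifier if the text between the two characters may span lines.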