
Why does this regex, using "not" & a backref, require a lazy match?

When using the negation operator ^ in a character class together with a backreference, why do I need a lazy match? It seems like the negated class should stop the match.

For example:

<?php
preg_match('/(t)[^\1]*\1/', 'is this test ok', $matches);
echo $matches[0];
?>

This outputs "this test" instead of "this t", even though I expected the middle t not to match [^\1]. I need to use /(t)[^\1]*?\1/ to match "this t".

Furthermore,

preg_match('/t[^t]*t/', 'is this test ok', $matches);

does match only "this t".

What is going on, and what am I misunderstanding?


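It doesn't work because \1 is not a backreference when it appears inside a character class. There, \1 is read as an octal escape, i.e. the character with ASCII value 1. So [^\1] happily matches every t, the greedy * runs to the end of the string, and the engine backtracks only as far as the last t.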
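A quick way to check this is to test the class on its own. The snippet below is only a sketch; the variable names are made up for illustration:

```php
<?php
// Inside a character class, \1 is an octal escape (the character with code 1),
// so [^\1] accepts 't' like any other ordinary character.
$allowsT    = preg_match('/^[^\1]$/', 't');     // 1: 't' is accepted
$allowsCtrl = preg_match('/^[^\1]$/', "\x01");  // 0: only chr(1) is excluded

// With a greedy *, the class therefore runs past the middle t's,
// and backtracking stops at the last t in the string.
preg_match('/(t)[^\1]*\1/', 'is this test ok', $greedy);

var_dump($allowsT, $allowsCtrl);
echo $greedy[0], "\n"; // this test
```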

You could use a negative lookahead instead to get the effect you want:

'/(t)(?:(?!\1).)*\1/'


You cannot use backreferences inside character classes. [^\1] means "any character other than the character with code 1" (an octal escape, not the digit 1 and not the contents of group 1).

Instead, use /(t)(?:(?!\1).)*\1/.

(?:...) is a non-capturing group

(?!...) is a "negative look-ahead", asserting that the subexpression doesn't match

(?!\1)., when \1 is a single character, means "any character that does not match \1"
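Applied to the string from the question, that pattern behaves as follows. This is a minimal sketch, and $m is just an illustrative variable name:

```php
<?php
// (?:(?!\1).)* consumes one character at a time, but only while the next
// character is not whatever group 1 captured, so it stops before the next t.
preg_match('/(t)(?:(?!\1).)*\1/', 'is this test ok', $m);

echo $m[0], "\n"; // this t
echo $m[1], "\n"; // t
```

Because the lookahead is checked before every character, the greedy * cannot run past the next occurrence of the captured text, so no lazy quantifier is needed.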
