
Perl regexp /(\r\n|\r|\n)/

I want to know how this ambiguous pattern is resolved in Perl (and, more generally, in everything that uses libpcre):

/(\r\n|\r|\n)/

When the pattern sees \r\n, will it match once or twice? And what are the rules for this situation?

Thanks


It will match \r\n once because Perl uses a regex-directed engine which evaluates alternations eagerly. See here.

You can easily find out whether the regex flavor you intend to use has a text-directed or regex-directed engine. If backreferences and/or lazy quantifiers are available, you can be certain the engine is regex-directed. You can do the test by applying the regex /regex|regex not/ to the string "regex not". If the resulting match is only "regex", the engine is regex-directed. If the result is "regex not", then it is text-directed. The reason behind this is that the regex-directed engine is "eager": it reports the first alternative that succeeds instead of looking for the longest possible match.
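The test described above can be tried directly. This is a minimal sketch in Python rather than Perl, on the assumption that Python's `re` module behaves the same way here, since it is also a backtracking, regex-directed engine. The variable names are only illustrative.

```python
import re

# Apply the pattern "regex|regex not" to the string "regex not".
# A regex-directed (eager) engine stops at the first alternative that
# succeeds, so it reports "regex" and never tries the longer alternative.
match = re.search(r"regex|regex not", "regex not")
first_alternative_wins = match.group(0)

print(repr(first_alternative_wins))  # 'regex'
```

A text-directed engine would report "regex not" for the same input, because it prefers the longest match.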


It will try and match the pipe-separated alternatives in order from left to right. Thus the first alternative will match the entire string "\r\n", and there will only be one match. There's no ambiguity here.
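To illustrate the left-to-right rule, here is a small sketch. It uses Python's `re` module as a stand-in for Perl, assuming the two engines treat this alternation the same way; the names `NEWLINE` and `REORDERED` are made up for the example.

```python
import re

# The pattern from the question: "\r\n" is listed first, so it is tried first.
NEWLINE = re.compile(r"(\r\n|\r|\n)")

# A single CRLF is consumed by the first alternative: one match, not two.
crlf_matches = NEWLINE.findall("\r\n")

# Mixed line endings: each terminator is matched exactly once.
mixed_matches = NEWLINE.findall("a\r\nb\rc\n")

# Same alternatives, but with "\r\n" last. Now "\r" succeeds first at
# position 0, so the CRLF is split into two separate matches.
REORDERED = re.compile(r"(\r|\n|\r\n)")
reordered_matches = REORDERED.findall("\r\n")

print(crlf_matches)       # ['\r\n']
print(mixed_matches)      # ['\r\n', '\r', '\n']
print(reordered_matches)  # ['\r', '\n']
```

The order of the alternatives is what decides the result, which is why the longest alternative should come first in a pattern like this.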


...Perl (and, more generally, everything that uses libpcre)

Possible misconception here: Perl does not "use libpcre". The PCRE library is a separate project that came along after Perl, and mimics much of Perl's regex functionality. PHP and ActionScript use libpcre, but most "Perl-derived" flavors (like Python, Java, and .NET) implement their regex support natively.

But they all share the trait in question here: they settle for the first alternative that works, rather than hold out for the longest match as a text-directed engine would.


It'll match it once. More here: http://technocage.com/~caskey/dos2unix/
