Negative lookahead Regular Expression

2023-03-23 04:56

I want to match all strings ending in ".htm" unless it ends in "foo.htm". I'm generally decent with regular expressions, but negative lookaheads have me stumped. Why doesn't this work?

/(?!foo)\.htm$/i.test("/foo.htm");  // returns true. I want false.

What should I be using instead? I think I need a "negative lookbehind" expression (if JavaScript supported such a thing, which I know it doesn't).

The problem is pretty simple really. This will do it:

/^(?!.*foo\.htm$).*\.htm$/i.test("/foo.htm"); // returns false

What you are describing (your intention) is a negative look-behind, and JavaScript has no support for look-behinds.

Look-aheads look forward from the character at which they are placed, and you've placed yours before the dot. So what you've got actually says "anything ending in .htm, as long as the first three characters starting at that position (.ht) are not foo", which is always true.
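To make that concrete, here is a small sketch (the sample strings are my own) suggesting that the lookahead in the question never changes the outcome: the pattern behaves the same as a plain \.htm$ test.

```javascript
// The lookahead is evaluated at the position of the dot, so it inspects
// ".ht" and never the text in front of the dot.
const original = /(?!foo)\.htm$/i;
const withoutLookahead = /\.htm$/i;

const samples = ["/foo.htm", "/bar.htm", "foo.htm", "/foo.html"];
const sameResult = samples.every(
  (s) => original.test(s) === withoutLookahead.test(s)
);
console.log(sameResult); // true
```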

Usually, the substitute for negative look-behinds is to match more than you need, and extract only the part you actually do need. This is hacky, and depending on your precise situation you can probably come up with something else, but something like this:

// Checks that the last 3 characters before the dot are not foo:
/(?!foo).{3}\.htm$/i.test("/foo.htm"); // returns false
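One caveat with that pattern: .{3} requires at least three characters before ".htm", so very short names are rejected too. A possible variant (the name `relaxed` and the sample strings are mine, and this is only a sketch) adds a branch for prefixes of zero to two characters:

```javascript
// Pattern from above: needs at least three characters before ".htm".
const strict = /(?!foo).{3}\.htm$/i;
// Hypothetical variant: a prefix of 0-2 characters is accepted outright,
// otherwise the last three characters before ".htm" must not be "foo".
const relaxed = /^(?:.{0,2}|.*(?!foo).{3})\.htm$/i;

for (const s of ["/foo.htm", "foo.htm", "bar.htm", "a.htm"]) {
  console.log(s, strict.test(s), relaxed.test(s));
}
// "a.htm" is the only sample where the two differ:
// strict gives false, relaxed gives true.
```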

As mentioned, JavaScript does not support negative look-behind assertions.

But you could use a workaround:

/(foo)?\.htm$/i.test("/foo.htm") && RegExp.$1 != "foo";

This will match everything that ends with .htm but it will store "foo" into RegExp.$1 if it matches foo.htm, so you can handle it separately.
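RegExp.$1 is a legacy static property, and comparing it to "foo" is case-sensitive even though the regex itself uses the i flag, so "/FOO.htm" would slip through. A sketch of the same idea that reads the capture group from the match result instead (the helper name `matchesHtmExceptFoo` is made up for this example):

```javascript
// Hypothetical helper: same optional-group trick, but the group is read
// from the match result rather than from RegExp.$1.
function matchesHtmExceptFoo(s) {
  const m = /(foo)?\.htm$/i.exec(s);
  // m is null when the string does not end in ".htm";
  // m[1] is undefined when the optional "foo" group did not participate.
  return m !== null && m[1] === undefined;
}

console.log(matchesHtmExceptFoo("/foo.htm")); // false
console.log(matchesHtmExceptFoo("/FOO.htm")); // false
console.log(matchesHtmExceptFoo("/bar.htm")); // true
console.log(matchesHtmExceptFoo("/bar.txt")); // false
```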

Like Renesis mentioned, "lookbehind" is not supported in JavaScript, so maybe just use two regexps in combination:

!/foo\.htm$/i.test(teststring) && /\.htm$/i.test(teststring)

This answer probably arrives a little later than necessary, but I'll leave it here in case someone runs into the same issue now (7 years, 6 months after this question was asked).

Lookbehinds are now included in the ECMAScript 2018 standard and supported at least in the latest version of Chrome. However, you can solve the puzzle with or without them.
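If you can't be sure the engine supports lookbehinds, one option is to probe for them at runtime and fall back to a lookahead. This is only a sketch; `supportsLookbehind` and `htmNotFoo` are names I made up, and the fallback is the anchored lookahead pattern from an earlier answer.

```javascript
// Hypothetical helper: returns true if this engine can compile a lookbehind.
// The pattern is built from a string so that an older engine throws a
// catchable SyntaxError at runtime instead of failing to parse the script.
function supportsLookbehind() {
  try {
    new RegExp("(?<!a)b");
    return true;
  } catch (e) {
    return false;
  }
}

const htmNotFoo = supportsLookbehind()
  ? new RegExp("(?<!foo)\\.htm$", "i")
  : /^(?!.*foo\.htm$).*\.htm$/i;

console.log(htmNotFoo.test("/foo.htm")); // false
console.log(htmNotFoo.test("/bar.htm")); // true
```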

A solution with negative lookahead:

let testString = `html.htm app.htm foo.tm foo.htm bar.js 1to3.htm _.js _.htm`;

testString.match(/\b(?!foo)[\w-.]+\.htm\b/gi);
> (4) ["html.htm", "app.htm", "1to3.htm", "_.htm"]

A solution with negative lookbehind:

testString.match(/\b[\w-.]+(?<!foo)\.htm\b/gi);
> (4) ["html.htm", "app.htm", "1to3.htm", "_.htm"]

A solution with (technically) positive lookahead:

testString.match(/\b(?=[^f])[\w-.]+\.htm\b/gi);
> (4) ["html.htm", "app.htm", "1to3.htm", "_.htm"]

etc.

On this test string all three RegExps return the same matches, although they are not strictly equivalent: the first rejects any name that starts with "foo", and the third rejects any name that starts with "f". The message they pass to the JS engine is roughly the following.

Please find in this string all sequences of characters that:

Are separated from other text (like words);
Consist of one or more letters of the English alphabet, underscores, hyphens, dots or digits;
End with ".htm";
Apart from that, the part of the sequence before ".htm" can be anything but "foo".
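The three patterns agree on the sample string above, but they are not interchangeable in general. A quick check with inputs of my own choosing (the object keys are just labels for this sketch):

```javascript
const patterns = {
  negativeLookahead: /\b(?!foo)[\w-.]+\.htm\b/gi,
  negativeLookbehind: /\b[\w-.]+(?<!foo)\.htm\b/gi,
  positiveLookahead: /\b(?=[^f])[\w-.]+\.htm\b/gi,
};

const input = "foobar.htm barfoo.htm fig.htm";
const results = {};
for (const [name, re] of Object.entries(patterns)) {
  // match() with the g flag returns null when nothing matches.
  results[name] = input.match(re) || [];
}
console.log(results);
// negativeLookahead:  ["barfoo.htm", "fig.htm"]
// negativeLookbehind: ["foobar.htm", "fig.htm"]
// positiveLookahead:  ["barfoo.htm"]
```

Of the three, only the lookbehind version excludes names that end in "foo.htm" while keeping names that merely start with "foo" or "f".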

String.prototype.endsWith (ES6)

console.log( /* !(not)endsWith */

    !"foo.html".endsWith("foo.htm"), // true
  !"barfoo.htm".endsWith("foo.htm"), // false (here you go)
     !"foo.htm".endsWith("foo.htm"), // false (here you go)
   !"test.html".endsWith("foo.htm"), // true
    !"test.htm".endsWith("foo.htm")  // true

);
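The check above only covers the "not foo.htm" half. A minimal sketch that combines both conditions from the question, assuming you want the same case-insensitivity as the /i regexes (the helper name `endsInHtmButNotFoo` is hypothetical):

```javascript
// Hypothetical helper: true for names ending in ".htm" but not in "foo.htm".
// Lower-casing first mirrors the /i flag used by the regex answers.
function endsInHtmButNotFoo(name) {
  const lower = name.toLowerCase();
  return lower.endsWith(".htm") && !lower.endsWith("foo.htm");
}

console.log(endsInHtmButNotFoo("/foo.htm"));   // false
console.log(endsInHtmButNotFoo("/bar.htm"));   // true
console.log(endsInHtmButNotFoo("/BARFOO.HTM")); // false
console.log(endsInHtmButNotFoo("test.html"));  // false
```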

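You could emulate the negative lookbehind with something like /^(.|..|.*[^f]..|.*f[^o].|.*fo[^o])\.htm$/ (note the leading ^; without it the single-character branch can match just the final "o" of "foo.htm"), but a programmatic approach would be better.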
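For what it's worth, here is a rough way to sanity-check an alternation like that against a plain endsWith reference. The sample list and the names `emulated` and `reference` are my own; the pattern is anchored with ^ so that every branch has to account for the whole prefix.

```javascript
// Anchored alternation: without "^" the single-character branch could
// match just the last "o" of "foo.htm".
const emulated = /^(.|..|.*[^f]..|.*f[^o].|.*fo[^o])\.htm$/;
// Reference implementation without any regex.
const reference = (s) => s.endsWith(".htm") && !s.endsWith("foo.htm");

const samples = [
  "foo.htm", "/foo.htm", "a.htm", "fo.htm",
  "food.htm", "bar.htm", "foo.html",
];
const mismatches = samples.filter((s) => emulated.test(s) !== reference(s));
console.log(mismatches); // []
```

Note that the alternation needs at least one character before the dot, so the bare string ".htm" is a case where the two would disagree.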
