Negative lookahead regex not working

input1="caused/VBN by/IN thyroid disorder"

Requirement: find the word "caused" when it is followed by a slash and one or more capital letters, but not when that is followed by a space and "by/IN".

In the example above, "caused/VBN" is followed by " by/IN", so "caused" should not match.

input2="caused/VBN thyroid disorder" 

Here "by/IN" does not follow "caused/VBN", so it should match.

regex="caused/[A-Z]+(?![\\s]+by/IN)"

caused/[A-Z]+ -- word 'caused' + / + one or more capital letters

(?![\\s]+by/IN) -- negative lookahead: not followed by whitespace and "by/IN"

Below is the simple method I used to test it:

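import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NegativeLookaheadTest {
    public static void main(String[] args) {
        String input = "caused/VBN by/IN thyroid disorder";

        String regex = "caused/[A-Z]+(?![\\s]+by/IN)";

        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(input);

        // Print every substring the pattern finds in the input
        while (matcher.find()) {
            System.out.println(matcher.group());
        }
    }
}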

Output: caused/VB
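A minimal sketch (the class and method names are hypothetical) that prints both the match and the text right after it, which is exactly what the negative lookahead inspects on the successful attempt. It shows that the match stops one letter short, at "caused/VB", so the lookahead is tested against "N by/IN ..." rather than " by/IN ...".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class: shows where the original pattern stops matching.
public class LookaheadTrace {
    static final String REGEX = "caused/[A-Z]+(?![\\s]+by/IN)";

    // Returns the first match of REGEX in input, or null if there is none.
    static String firstMatch(String input) {
        Matcher m = Pattern.compile(REGEX).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Returns the text after the first match, i.e. where the lookahead was evaluated.
    static String textAfterMatch(String input) {
        Matcher m = Pattern.compile(REGEX).matcher(input);
        return m.find() ? input.substring(m.end()) : null;
    }

    public static void main(String[] args) {
        String input = "caused/VBN by/IN thyroid disorder";
        System.out.println("match:     " + firstMatch(input));     // caused/VB
        System.out.println("lookahead: " + textAfterMatch(input)); // N by/IN thyroid disorder
    }
}
```

Because "N" is not whitespace, "[\\s]+by/IN" fails at that position, so the negative lookahead succeeds and the shortened match is reported.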

I don't understand why my negative lookahead regex is not working.


You need to include a word boundary in your regular expression:

String regex = "caused/[A-Z]+\\b(?![\\s]+by/IN)";

Without it you can get a match, but not what you were expecting:

"caused/VBN by/IN thyroid disorder";
 ^^^^^^^^^
 this matches: [A-Z]+ backtracks to "VB", and the lookahead then sees "N by", which doesn't match "[\\s]+by"
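A small sketch (class and method names are hypothetical) that runs the word-boundary version against both inputs from the question. The \\b forces [A-Z]+ to end where the tag ends, so backtracking to "VB" or "V" can no longer produce a match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the word-boundary regex from this answer.
public class WordBoundaryFix {
    // \\b requires [A-Z]+ to stop at the end of the tag (between "N" and the space).
    static final Pattern CAUSED_NOT_BY =
            Pattern.compile("caused/[A-Z]+\\b(?![\\s]+by/IN)");

    // Returns the first match, or null if the pattern does not match.
    static String firstMatch(String input) {
        Matcher m = CAUSED_NOT_BY.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("caused/VBN by/IN thyroid disorder")); // null
        System.out.println(firstMatch("caused/VBN thyroid disorder"));       // caused/VBN
    }
}
```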


The character class [A-Z]+ gives up letters (via backtracking) until the negative lookahead succeeds.

What you have to do is stop that backtracking, so that [A-Z]+ always consumes the whole tag.
This can be done in a few different ways:

  1. A positive lookahead, followed by a consumption
    "caused(?=(/[A-Z]+))\\1(?!\\s+by/IN)"

  2. A standalone sub-expression (atomic group)
    "caused(?>/[A-Z]+)(?!\\s+by/IN)"

  3. A possessive quantifier
    "caused/[A-Z]++(?!\\s+by/IN)"
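A minimal sketch (class and method names are hypothetical) that checks all three variants against both inputs from the question, using java.util.regex, which supports lookaheads, independent (atomic) groups and possessive quantifiers.

```java
import java.util.regex.Pattern;

// Hypothetical test harness for the three no-backtracking variants.
public class NoBacktrackVariants {
    static final String[] REGEXES = {
        "caused(?=(/[A-Z]+))\\1(?!\\s+by/IN)", // capture inside a lookahead, then consume via backreference
        "caused(?>/[A-Z]+)(?!\\s+by/IN)",       // atomic (independent) group
        "caused/[A-Z]++(?!\\s+by/IN)"           // possessive quantifier
    };

    // True if the regex finds a match anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        for (String regex : REGEXES) {
            System.out.println(regex + " -> "
                    + found(regex, "caused/VBN by/IN thyroid disorder") + ", "
                    + found(regex, "caused/VBN thyroid disorder"));
        }
    }
}
```

Each variant should report false for the first input and true for the second. Variant 1 works because a lookahead is not re-entered once it has succeeded, so the captured "/VBN" cannot shrink when the later negative lookahead fails.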
