Negative lookahead regex not working
input1="caused/VBN by/IN thyroid disorder"
Requirement: find the word "caused" that is followed by a slash and one or more capital letters, and is not followed by a space + "by/IN".
In the example above, "caused/VBN" is followed by " by/IN", so 'caused' should not match.
input2="caused/VBN thyroid disorder"
Here "by/IN" doesn't follow "caused/VBN", so it should match.
regex="caused/[A-Z]+(?![\\s]+by/IN)"
caused/[A-Z]+ -- the word 'caused' + / + one or more capital letters
(?![\\s]+by/IN) -- negative lookahead: not followed by whitespace and "by/IN"
Below is a simple method that I used to test
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args) {
    String input = "caused/VBN by/IN thyroid disorder";
    String regex = "caused/[A-Z]+(?![\\s]+by/IN)";
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(input);
    // Print every match found in the input.
    while (matcher.find()) {
        System.out.println(matcher.group());
    }
}
Output: caused/VB
I don't understand why my negative lookahead regex is not working.
You need to include a word boundary in your regular expression:
String regex = "caused/[A-Z]+\\b(?![\\s]+by/IN)";
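As a quick check, here is a minimal sketch that runs the original pattern and the word-boundary version against both inputs from the question. The class name `WordBoundaryDemo` and the helper `firstMatch` are made up for illustration; they are not part of the original post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // Pattern from the question: [A-Z]+ is free to backtrack.
    static final String ORIGINAL = "caused/[A-Z]+(?![\\s]+by/IN)";
    // Same pattern with \b, so the tag must end at a word boundary.
    static final String WITH_BOUNDARY = "caused/[A-Z]+\\b(?![\\s]+by/IN)";

    // Hypothetical helper: returns the first match, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input1 = "caused/VBN by/IN thyroid disorder";
        String input2 = "caused/VBN thyroid disorder";

        System.out.println(firstMatch(ORIGINAL, input1));      // caused/VB
        System.out.println(firstMatch(WITH_BOUNDARY, input1)); // null
        System.out.println(firstMatch(WITH_BOUNDARY, input2)); // caused/VBN
    }
}
```

With `\\b` in place, a shortened tag such as "VB" is rejected because there is no word boundary between "B" and "N", so the only candidate left is the full tag, and the lookahead then decides the match.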
Without it you can get a match, but not what you were expecting:
"caused/VBN by/IN thyroid disorder"
 ^^^^^^^^^
This part matches because, after [A-Z]+ backtracks to "VB", the remaining "N by/IN" doesn't match "[\\s]+by/IN".
The [A-Z]+ match will be adjusted (via backtracking) so that the negative lookahead succeeds.
What you have to do is stop the backtracking so that [A-Z]+ is always matched in full.
This can be done a couple of different ways.
A positive lookahead with a capture group, followed by a backreference that consumes it
"caused(?=(/[A-Z]+))\\1(?!\\s+by/IN)"
A standalone sub-expression (atomic group)
"caused(?>/[A-Z]+)(?!\\s+by/IN)"
A possessive quantifier
"caused/[A-Z]++(?!\\s+by/IN)"
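To compare the three variants side by side, here is a small sketch that runs each of them against both inputs from the question. The class name `NoBacktrackDemo`, the constant names, and the helper `firstMatch` are assumptions made for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NoBacktrackDemo {
    // Lookahead captures the whole tag, then \1 consumes exactly that text.
    static final String LOOKAHEAD_CAPTURE = "caused(?=(/[A-Z]+))\\1(?!\\s+by/IN)";
    // Atomic (independent) group: once matched, it is never re-entered.
    static final String ATOMIC_GROUP = "caused(?>/[A-Z]+)(?!\\s+by/IN)";
    // Possessive quantifier: [A-Z]++ never gives characters back.
    static final String POSSESSIVE = "caused/[A-Z]++(?!\\s+by/IN)";

    // Hypothetical helper: returns the first match, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input1 = "caused/VBN by/IN thyroid disorder";
        String input2 = "caused/VBN thyroid disorder";

        for (String regex : new String[] {LOOKAHEAD_CAPTURE, ATOMIC_GROUP, POSSESSIVE}) {
            // Each variant should print null for input1 and caused/VBN for input2.
            System.out.println(regex);
            System.out.println("  input1 -> " + firstMatch(regex, input1));
            System.out.println("  input2 -> " + firstMatch(regex, input2));
        }
    }
}
```

All three behave the same on these inputs. The possessive quantifier is the shortest to write, while the lookahead-plus-backreference form is the one to reach for in regex flavors that lack atomic groups and possessive quantifiers.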