How to match a pattern with bracket inside it?
I am trying to find a specific phrase inside a large text, but the phrase may contain characters such as "[", "(" or "*" (for example "name1 (name2"), and compiling the pattern then throws a PatternSyntaxException. Here is my code:
Pattern myPattern = Pattern.compile( "\\b" + phrase + "\\b" ); // Exception
Matcher myMatcher = myPattern.matcher( largeText );
I have tried to use Pattern.quote(...) to escape such characters, but it didn't work:
phrase = Pattern.quote( phrase );
How can I fix this so that such characters are allowed?
Pattern.quote(phrase) works just fine:
String largeText = "a()b a()c a()b";
String phrase = "a()b";
Pattern myPattern = Pattern.compile( "\\b" + Pattern.quote(phrase) + "\\b" );
Matcher myMatcher = myPattern.matcher( largeText );
while (myMatcher.find()) {
    System.out.println(myMatcher.group());
}
prints:
a()b
a()b
Another option is to process the phrase yourself and escape every regex metacharacter in it.
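A minimal sketch of that approach, assuming the default pattern flags (with the COMMENTS flag, "#" and whitespace would also become special and are not handled here). The class EscapeMeta and its escapeMeta method are hypothetical names; Pattern.quote does the same job without a hand-maintained character list.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeMeta {
    // Characters that are special in java.util.regex under the default flags.
    private static final String METACHARS = "\\^$.|?*+()[]{}";

    // Hypothetical helper: prefixes every regex metacharacter with a backslash
    // so the phrase is matched literally.
    static String escapeMeta(String phrase) {
        StringBuilder sb = new StringBuilder(phrase.length() * 2);
        for (char c : phrase.toCharArray()) {
            if (METACHARS.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String largeText = "large text with name1 (name2) and more";
        String phrase = "name1 (name2";
        String escaped = escapeMeta(phrase);
        System.out.println(escaped); // name1 \(name2
        Matcher m = Pattern.compile("\\b" + escaped + "\\b").matcher(largeText);
        if (m.find()) {
            System.out.println("Found: " + m.group());
        }
    }
}
```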
Could you please provide a complete example that reproduces this problem? I've tried the following and it works fine:
String largeText = "large text with name1 (name2) and possibly something more";
String phrase = "name1 (name2";
phrase = Pattern.quote( phrase );
Pattern myPattern = Pattern.compile( "\\b" + phrase + "\\b" ); // no exception once the phrase is quoted
System.out.println("The pattern is " + myPattern.pattern());
Matcher myMatcher = myPattern.matcher( largeText );
if (myMatcher.find()) {
    System.out.println("A match is found: " + myMatcher.group());
}
The output is:
The pattern is \b\Qname1 (name2\E\b
A match is found: name1 (name2
You may want to just use:
int offset = largeText.indexOf(phrase);
to test the existence/offset of a substring.
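A sketch that collects every offset with String.indexOf(String, int); findAll is a hypothetical helper. Because indexOf compares characters literally, brackets and asterisks need no escaping, but unlike the "\b" pattern it does not check word boundaries, so "cat" would also be found inside "concatenate".

```java
import java.util.ArrayList;
import java.util.List;

public class FindAllOffsets {
    // Hypothetical helper: returns the start offset of every non-overlapping
    // occurrence of phrase in text. No regex is involved, so no escaping.
    static List<Integer> findAll(String text, String phrase) {
        List<Integer> offsets = new ArrayList<>();
        if (phrase.isEmpty()) {
            return offsets; // avoid looping forever on the empty string
        }
        int from = 0;
        int idx;
        while ((idx = text.indexOf(phrase, from)) >= 0) {
            offsets.add(idx);
            from = idx + phrase.length(); // continue after this occurrence
        }
        return offsets;
    }

    public static void main(String[] args) {
        String largeText = "a()b a()c a()b";
        System.out.println(findAll(largeText, "a()b")); // [0, 10]
    }
}
```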
To use patterns, this should work:
String longString = "this[that]the other* things";
String phrase = "[that]";
Pattern myPattern = Pattern.compile( "\\b" + Pattern.quote(phrase) + "\\b" );
Matcher m = myPattern.matcher(longString);
if (m.find()) {
    System.out.println(m.group());
}
But there's a little problem when the phrase starts or ends with a non-word character such as *, ?, ( or [.
"\b" matches only between a word character and a non-word character. A "\b" placed next to a leading or trailing non-word character therefore matches only if the adjacent character in the text is a word character; if the phrase is surrounded by whitespace or punctuation, there is no boundary there and the match fails.
You may need to special-case this by dropping the "\b" on whichever side of the phrase starts or ends with such a character.
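A minimal sketch of that special-casing, assuming ASCII text: the hypothetical helper phrasePattern adds "\b" only on a side where the phrase begins or ends with a word character. The isWordChar check approximates what "\b" treats as a word character for ASCII input; for non-ASCII text the two may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EdgeAwareBoundary {
    // ASCII word characters, i.e. what \w matches under the default flags.
    private static boolean isWordChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || c == '_';
    }

    // Hypothetical helper: wraps the quoted phrase in \b only on sides where
    // the phrase has a word character, so a leading "*" or "(" does not
    // require a word character right before it in the text.
    static Pattern phrasePattern(String phrase) {
        StringBuilder regex = new StringBuilder();
        if (!phrase.isEmpty() && isWordChar(phrase.charAt(0))) {
            regex.append("\\b");
        }
        regex.append(Pattern.quote(phrase));
        if (!phrase.isEmpty() && isWordChar(phrase.charAt(phrase.length() - 1))) {
            regex.append("\\b");
        }
        return Pattern.compile(regex.toString());
    }

    public static void main(String[] args) {
        String text = "match *star* here";
        String phrase = "*star*";
        // Always wrapping in \b fails: "*" is preceded by a space in the text.
        Pattern naive = Pattern.compile("\\b" + Pattern.quote(phrase) + "\\b");
        System.out.println(naive.matcher(text).find()); // false
        Matcher m = phrasePattern(phrase).matcher(text);
        System.out.println(m.find() ? m.group() : "no match"); // *star*
    }
}
```

Phrases that start and end with word characters, such as "name1 (name2", still get both "\b" anchors, so "cat" is not matched inside "concatenate".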