
How to match a pattern with brackets inside it?

I am trying to find a specific phrase inside a large text, but the phrase may contain characters such as "[", "(", "*", for example "name1 (name2", and this causes an exception when the pattern is compiled. Here is my code:

Pattern myPattern = Pattern.compile( "\\b" + phrase + "\\b" );  // Exception
Matcher myMatcher = myPattern.matcher( largeText );

I tried using Pattern.quote(...) to escape such characters, but it didn't work:

phrase = Pattern.quote( phrase );

How can I fix this to allow such characters?


Pattern.quote(phrase) works just fine:

String largeText = "a()b a()c a()b";
String phrase = "a()b";
Pattern myPattern = Pattern.compile( "\\b" + Pattern.quote(phrase) + "\\b" );
Matcher myMatcher = myPattern.matcher( largeText );
while(myMatcher.find()) {
  System.out.println(myMatcher.group());
}

prints:

a()b
a()b


Process the phrase to escape every regex metacharacter before building the pattern.
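As a rough sketch of what that processing could look like when done by hand: the class name, the escapeRegex helper, and the metacharacter list below are my own assumptions, not something from the question. Pattern.quote does the same job and is usually the safer choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ManualEscape {
    // Assumed list of characters that are special outside a character class.
    private static final String METACHARACTERS = "\\[](){}.*+?^$|";

    // Hypothetical helper: prefixes every metacharacter with a backslash.
    static String escapeRegex(String phrase) {
        StringBuilder escaped = new StringBuilder();
        for (char c : phrase.toCharArray()) {
            if (METACHARACTERS.indexOf(c) >= 0) {
                escaped.append('\\');
            }
            escaped.append(c);
        }
        return escaped.toString();
    }

    public static void main(String[] args) {
        String largeText = "large text with name1 (name2) and possibly something more";
        String phrase = "name1 (name2";
        // The escaped phrase is now safe to embed between the word boundaries.
        Pattern myPattern = Pattern.compile("\\b" + escapeRegex(phrase) + "\\b");
        Matcher myMatcher = myPattern.matcher(largeText);
        if (myMatcher.find()) {
            System.out.println(myMatcher.group());
        }
    }
}
```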


Could you please provide a complete example that reproduces this problem? I've tried the following and it works fine:

String largeText = "large text with name1 (name2) and possibly something more";
String phrase = "name1 (name2";
phrase = Pattern.quote( phrase );
Pattern myPattern = Pattern.compile( "\\b" + phrase + "\\b" );  // no exception once the phrase is quoted
System.out.println("The pattern is " + myPattern.pattern());
Matcher myMatcher = myPattern.matcher( largeText );
if (myMatcher.find()) {
  System.out.println("A match is found: " + myMatcher.group());
}

The output is:

The pattern is \b\Qname1 (name2\E\b
A match is found: name1 (name2


You may want to just use:

int offset = largeText.indexOf(phrase);

to test the existence/offset of a substring; indexOf returns -1 when the phrase is not found.
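If every occurrence is needed rather than just the first, the same call can be repeated with a start index. This is only a sketch: the class and the findAll helper are hypothetical names, and note that indexOf does no word-boundary check at all.

```java
import java.util.ArrayList;
import java.util.List;

class SubstringOffsets {
    // Hypothetical helper: collects the offset of every occurrence of phrase.
    static List<Integer> findAll(String text, String phrase) {
        List<Integer> offsets = new ArrayList<>();
        if (phrase.isEmpty()) {
            return offsets; // nothing sensible to report for an empty phrase
        }
        int from = 0;
        while (true) {
            int index = text.indexOf(phrase, from);
            if (index < 0) {
                break; // -1 means no further occurrence
            }
            offsets.add(index);
            from = index + 1; // step by one so overlapping hits are also found
        }
        return offsets;
    }

    public static void main(String[] args) {
        String largeText = "a()b a()c a()b";
        System.out.println(findAll(largeText, "a()b"));
    }
}
```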

To use patterns, this should work:

String longString = "this[that]the other* things";
String phrase = "[that]";
Pattern myPattern = Pattern.compile( "\\b" + Pattern.quote(phrase) + "\\b" );
Matcher m = myPattern.matcher(longString);
if (m.find()) {
  System.out.println(m.group());
}

But there's a small problem when a character such as * or ? sits at the start or end of the phrase.

Those characters are not word characters, and "\b" only matches between a word character and a non-word character. So if the phrase ends with "*", the trailing "\b" matches only when the next character in the text is a word character; for the phrase "foo*" followed by a space, the match fails because neither "*" nor the space is a word character. The same applies to the leading "\b".

You may need to special-case this by dropping the "\b" on whichever side of the phrase starts or ends with such a character.
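One way that special case might be sketched is below. The class name and the buildPattern and isWordChar helpers are hypothetical, and isWordChar assumes the ASCII definition of a word character ([a-zA-Z0-9_]).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhraseFinder {
    // Assumption: a "word character" is ASCII [a-zA-Z0-9_].
    static boolean isWordChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || c == '_';
    }

    // Hypothetical helper: adds \b only on a side where the phrase
    // actually starts or ends with a word character.
    static Pattern buildPattern(String phrase) {
        boolean leading = !phrase.isEmpty() && isWordChar(phrase.charAt(0));
        boolean trailing = !phrase.isEmpty()
                && isWordChar(phrase.charAt(phrase.length() - 1));
        return Pattern.compile((leading ? "\\b" : "")
                + Pattern.quote(phrase)
                + (trailing ? "\\b" : ""));
    }

    public static void main(String[] args) {
        String text = "see foo* and name1 (name2) here";
        for (String phrase : new String[] {"foo*", "name1 (name2"}) {
            Matcher m = buildPattern(phrase).matcher(text);
            System.out.println(phrase + " -> " + m.find());
        }
    }
}
```

With this approach the side without "\b" is simply unanchored, so "foo*" would also be found inside "foo*bar"; whether that is acceptable depends on what should count as a match.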
