Another Java RegEx question

I have the following code:

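import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexQuestion {
    public static void main(String[] args) {
        StringBuilder content = new StringBuilder("abcd efg h i. -  – jk(lmn) qq zz.");
        // Character class: dot, minus sign, \u2013, in that order
        String patternSource = "[.-–]($| )";
        Pattern pattern = Pattern.compile(patternSource);
        Matcher matcher = pattern.matcher(content);
        System.out.println(matcher.replaceAll(""));
    }
}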

where the character class in patternSource consists of a dot, a minus sign and the \u2013 character (an en dash). Upon execution it gives me

abcefi-  jk(lmn) qzz

If I change the order of symbols in my character class in any way, it begins to work normally and gives

abcd efg h i jk(lmn) qq zz

What the hell?

Tested under JDK/JRE 1.6.0_23


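If you have an unescaped hyphen in a character class, it has a special meaning as a range of characters: e.g. [A-Z] means all the characters between A and Z. In your pattern, [.-–] is therefore the range from '.' (U+002E) to '–' (U+2013), which includes the digits and all ASCII letters. That is why "d ", "g ", "h " and "q " are removed too, while the minus sign (U+002D), which falls just below the range, is left alone.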

An exception to this is when the hyphen is at the start or end of the character class, in which case it is treated literally and matches only a hyphen.
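
To make the difference concrete, here is a minimal sketch that runs the input from the question through the original pattern and through three variants where the hyphen is literal. The class name HyphenInCharClass and the helper strip are made up for this illustration, and the en dash is written as \u2013 so the result does not depend on the source file encoding.

```java
import java.util.regex.Pattern;

public class HyphenInCharClass {
    // Hypothetical helper: removes every match of the regex from the input.
    public static String strip(String regex, String input) {
        return Pattern.compile(regex).matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "abcd efg h i. -  \u2013 jk(lmn) qq zz.";

        // Hyphen between two characters: a range from '.' (U+002E) to U+2013,
        // which also covers digits and ASCII letters.
        System.out.println(strip("[.-\u2013]($| )", input));   // abcefi-  jk(lmn) qzz

        // Hyphen last in the class: treated literally.
        System.out.println(strip("[.\u2013-]($| )", input));   // abcd efg h i jk(lmn) qq zz

        // Hyphen first in the class: treated literally.
        System.out.println(strip("[-.\u2013]($| )", input));   // abcd efg h i jk(lmn) qq zz

        // Hyphen escaped: literal regardless of its position.
        System.out.println(strip("[.\\-\u2013]($| )", input)); // abcd efg h i jk(lmn) qq zz
    }
}
```

Escaping the hyphen is probably the safest of the three, since the pattern keeps working if someone later reorders the characters in the class.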
