Java regexp error: \( is not a valid character

I was using Java regular expressions today and found that the following pattern is not allowed:

String pattern = "[a-zA-Z\\s\\.-\\)\\(]*";

If I use it, it fails and tells me that \( is not a valid character.

But if I change the regexp to

String pattern = "[[a-zA-Z\\s\\.-]|[\\(\\)]]*";

Then it will work. Is this a bug in the regexp engine, or am I misunderstanding how to work with the engine?

EDIT: I had an error in my string: there shouldn't be two opening brackets ([[), only one. This is now corrected.


Your regex has two problems.

  1. You've not closed the character class.

  2. The - is acting as a range operator with . on the left-hand side and ) on the right-hand side. But ) comes before . in Unicode, so this results in an invalid range.

To fix problem 1, close the char class or, if you did not mean to include [ in the allowed characters, delete one of the [.

To fix problem 2, either escape the - as \\- or move the - to the beginning or to the end of the char class.

So you can use:

String pattern = "[a-zA-Z\\s\\.\\-\\)\\(]*";

or

String pattern = "[a-zA-Z\\s\\.\\)\\(-]*";

or

String pattern = "[-a-zA-Z\\s\\.\\)\\(]*";
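As a quick check, here is a minimal sketch that compiles the original pattern and the three alternatives above. The class name DashPlacementDemo, the helper compiles and the sample string are made up for illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class DashPlacementDemo {
    // Returns true if the regex compiles, false if the engine rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String broken    = "[a-zA-Z\\s\\.-\\)\\(]*";   // '-' forms the range '.' to ')'
        String escaped   = "[a-zA-Z\\s\\.\\-\\)\\(]*"; // '-' escaped
        String dashLast  = "[a-zA-Z\\s\\.\\)\\(-]*";   // '-' at the end
        String dashFirst = "[-a-zA-Z\\s\\.\\)\\(]*";   // '-' at the beginning

        System.out.println(compiles(broken));    // false
        System.out.println(compiles(escaped));   // true
        System.out.println(compiles(dashLast));  // true
        System.out.println(compiles(dashFirst)); // true

        // Hypothetical sample input containing every kind of allowed character.
        String sample = "Acme Corp. (UK-based)";
        System.out.println(sample.matches(escaped));   // true
        System.out.println(sample.matches(dashLast));  // true
        System.out.println(sample.matches(dashFirst)); // true
    }
}
```

All three corrected forms should accept the same inputs; which one to use is mostly a matter of readability.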


Put a literal dash - at the end of the character class, since anywhere else it is normally read as a range operator (as in a-z). Rearrange it:

String pattern = "[[a-zA-Z\\s\\.\\)\\(-]*";

Also, I don't think you have to escape the (, . and ) characters inside brackets.

Update: As others pointed out, you must also escape the [ in a Java regex character class.


The problem here is that \.-\) ("\\.-\\)" in a Java string literal) tries to define a range from . to ). Since the Unicode code point of . (U+002E) is higher than that of ) (U+0029), this is an error.

Try using this pattern and you'll see: [z-a].
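To see the comparison the engine makes, a small sketch along these lines prints the two code points and tries a few ranges. ReversedRangeDemo and describeError are names invented for this example.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class ReversedRangeDemo {
    // Returns the exception's description, or null if the regex compiles.
    static String describeError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // A range is only legal when the start code point is not above the end.
        System.out.println((int) '.'); // 46 (U+002E)
        System.out.println((int) ')'); // 41 (U+0029)

        System.out.println(describeError("[z-a]"));     // rejected: 'z' is above 'a'
        System.out.println(describeError("[\\.-\\)]")); // rejected: '.' is above ')'
        System.out.println(describeError("[\\)-\\.]")); // null: ')' to '.' is in order

        // The in-order range is legal, but it also covers * + , and -
        System.out.println("*".matches("[\\)-\\.]")); // true
    }
}
```

Note that simply swapping the two endpoints makes the pattern compile but silently allows extra characters, so escaping or moving the dash is the safer fix.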

The correct solution is to either put the dash - at the end of the character group (at which point it will lose its special meaning) or to escape it.

You also need to close the unclosed open square bracket or escape it, if it was not intended for grouping.

Also, escaping the full stop . is not necessary inside a character group.
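A short sketch to illustrate that last point: inside a character group the full stop (and the parentheses) are already literals, so the escaped and unescaped spellings behave the same. The class and constant names here are hypothetical.

```java
class UnescapedInClassDemo {
    // Same character group written with and without the optional escapes.
    static final String ESCAPED = "[a-zA-Z\\s\\.\\)\\(-]*";
    static final String PLAIN   = "[a-zA-Z\\s.)(-]*";

    // True when both spellings agree on the given input.
    static boolean sameResult(String input) {
        return input.matches(ESCAPED) == input.matches(PLAIN);
    }

    public static void main(String[] args) {
        // Inside [...] the dot is a literal dot, not "any character".
        System.out.println(".".matches("[.]")); // true
        System.out.println("x".matches("[.]")); // false

        System.out.println("a.b (c-d)".matches(PLAIN)); // true
        System.out.println("a1b".matches(PLAIN));       // false: digits are not in the group

        System.out.println(sameResult("a.b (c-d)")); // true
        System.out.println(sameResult("a1b"));       // true
    }
}
```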


You have to escape the dash and close the unmatched square bracket. So you are going to get two errors with this regex:

java.util.regex.PatternSyntaxException: Illegal character range near index 14

because the dash is used to specify a range, and \) is not a valid end for a range that starts at \. (its code point is lower). If you escape the dash, making it [[a-zA-Z\s\.\-\)\(]*, you'll get

java.util.regex.PatternSyntaxException: Unclosed character class near index 19

which means that you have an extra opening square bracket, which starts a character class that is never closed. I don't know what you meant by putting an extra bracket here, but either escaping or removing it will make it a valid regex.
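For reference, a rough sketch that reproduces both failures one after the other and then tries the two fixes for the extra bracket. UnclosedClassDemo and errorFor are illustrative names, and the sample input "[abc" is an assumption chosen to show the difference between the fixes.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class UnclosedClassDemo {
    // Returns the exception's description, or null if the regex compiles.
    static String errorFor(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // Both problems present: the range error is reported first.
        System.out.println(errorFor("[[a-zA-Z\\s\\.-\\)\\(]*"));
        // Dash escaped: the extra '[' is now what the engine complains about.
        System.out.println(errorFor("[[a-zA-Z\\s\\.\\-\\)\\(]*"));

        String removed = "[a-zA-Z\\s\\.\\-\\)\\(]*";    // extra '[' deleted
        String escaped = "[\\[a-zA-Z\\s\\.\\-\\)\\(]*"; // '[' kept as an allowed character
        System.out.println(errorFor(removed)); // null
        System.out.println(errorFor(escaped)); // null

        // The two fixes differ only in whether '[' itself is accepted.
        System.out.println("[abc".matches(removed)); // false
        System.out.println("[abc".matches(escaped)); // true
    }
}
```

Which fix is right depends on whether [ is supposed to be one of the allowed characters.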
