Java regexp error: \( is not a valid character
I was using Java regexes today and found that the following pattern is not allowed:
String pattern = "[a-zA-Z\\s\\.-\\)\\(]*";
If I use it, it fails and tells me that \( is not a valid character.
But if I change the regexp to
String pattern = "[[a-zA-Z\\s\\.-]|[\\(\\)]]*";
Then it works. Is this a bug in the regex engine, or am I misunderstanding how to work with the engine?
EDIT: I had an error in my string: there shouldn't be two opening brackets ([[), only one. This is now corrected.
Your regex has two problems:
1. You've not closed the character class.
2. The - is acting as a range operator with . on the LHS and ) on the RHS. But ) comes before . in Unicode, so this results in an invalid range.
To fix problem 1, close the char class or, if you did not mean to include [ in the allowed characters, delete one of the [.
To fix problem 2, either escape the - as \\- or move the - to the beginning or to the end of the char class.
So you can use:
String pattern = "[a-zA-Z\\s\\.\\-\\)\\(]*";
or
String pattern = "[a-zA-Z\\s\\.\\)\\(-]*";
or
String pattern = "[-a-zA-Z\\s\\.\\)\\(]*";
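As a quick check of the three variants above, here is a minimal sketch that compiles each one and tests it against sample input. The class name, the helper allMatch, and the sample strings are made up for illustration and are not from the question.

```java
import java.util.List;
import java.util.regex.Pattern;

class DashPlacementDemo {
    // The three corrected patterns suggested above.
    static final List<String> FIXED = List.of(
        "[a-zA-Z\\s\\.\\-\\)\\(]*",  // dash escaped
        "[a-zA-Z\\s\\.\\)\\(-]*",    // dash moved to the end
        "[-a-zA-Z\\s\\.\\)\\(]*"     // dash moved to the beginning
    );

    // True only if every corrected pattern matches the whole input.
    static boolean allMatch(String input) {
        return FIXED.stream().allMatch(p -> Pattern.matches(p, input));
    }

    public static void main(String[] args) {
        // Letters, whitespace, dot, dash and parentheses are all allowed.
        System.out.println(allMatch("J. Smith-Jones (Jr)")); // true
        // Digits are not part of the character class.
        System.out.println(allMatch("abc123"));              // false
    }
}
```

All three variants should accept and reject the same strings, since they describe the same set of characters.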
You should only use the dash - at the end of the character class, since it is normally used to show a range (as in a-z). Rearrange it:
String pattern = "[[a-zA-Z\\s\\.\\)\\(-]*";
Also, I don't think you have to escape (.) characters inside brackets.
Update: As others pointed out, you must also escape the [ in a Java regex character class.
The problem here is that \.-\) ("\\.-\\)" in a Java string literal) tries to define a range from . to ). Since the Unicode code point of . (U+002E) is higher than that of ) (U+0029), this is an error.
Try using this pattern and you'll see: [z-a].
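To see the ordering rule in action, the following sketch prints the two code points and tries to compile a few ranges. The class name and the compiles helper are hypothetical, and the [)-.] pattern is an extra example that is not from the question.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RangeOrderDemo {
    // True if the regex compiles, false if the engine rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println((int) '.'); // 46, i.e. U+002E
        System.out.println((int) ')'); // 41, i.e. U+0029

        System.out.println(compiles("[a-z]"));                   // true: ascending range
        System.out.println(compiles("[z-a]"));                   // false: descending range
        System.out.println(compiles("[)-.]"));                   // true: ) to . is ascending
        System.out.println(compiles("[a-zA-Z\\s\\.-\\)\\(]*"));  // false: . to ) is descending
    }
}
```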
The correct solution is to either put the dash - at the end of the character group (at which point it will lose its special meaning) or to escape it.
You also need to close the unclosed open square bracket or escape it, if it was not intended for grouping.
Also, escaping the full stop . is not necessary inside a character group.
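A small sketch of that last point: inside a character group an unescaped . matches only a literal dot, and the same holds for ( and ). The SIMPLIFIED pattern below is an assumed rewrite of the asker's pattern with the unnecessary escapes dropped and the dash moved to the end; the class and method names are illustrative only.

```java
import java.util.regex.Pattern;

class ClassMetacharDemo {
    // Assumed simplification: no escapes for . ( ) and the dash placed last.
    static final Pattern SIMPLIFIED = Pattern.compile("[a-zA-Z\\s.()-]*");

    // True if the whole input consists of allowed characters.
    static boolean ok(String input) {
        return SIMPLIFIED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(Pattern.matches("[.]", "."));  // true: literal dot
        System.out.println(Pattern.matches("[.]", "x"));  // false: not a wildcard inside [...]
        System.out.println(Pattern.matches(".", "x"));    // true: wildcard outside [...]

        System.out.println(ok("J. Smith-Jones (Jr)"));    // true
        System.out.println(ok("abc123"));                 // false: digits not allowed
    }
}
```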
You have to escape the dash and close the unmatched square bracket. So you are going to get two errors with this regex:
java.util.regex.PatternSyntaxException: Illegal character range near index 14
because the dash is used to specify a range, and \) is not a valid end for a range that starts at \. (it comes before it). If you escape the dash, making it [[a-zA-Z\s\.\-\)\(]*
you'll get
java.util.regex.PatternSyntaxException: Unclosed character class near index 19
which means that you have an extra opening square bracket, which starts another character class. I don't know what you meant by putting an extra bracket here, but either escaping or removing it will make it a valid regex.
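If you want to reproduce the two errors one after the other, a sketch along these lines catches PatternSyntaxException and reports the engine's description. The class name and the describe helper are made up for illustration; the patterns are the double-bracket versions discussed in this answer.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SyntaxErrorReport {
    // Returns the engine's error description, or null if the regex compiles.
    static String describe(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // Unescaped dash and extra opening bracket: the range error is hit first.
        System.out.println(describe("[[a-zA-Z\\s\\.-\\)\\(]*"));
        // Dash escaped, extra bracket still there: the class is never closed.
        System.out.println(describe("[[a-zA-Z\\s\\.\\-\\)\\(]*"));
        // Dash escaped and extra bracket removed: compiles, so null is printed.
        System.out.println(describe("[a-zA-Z\\s\\.\\-\\)\\(]*"));
    }
}
```

PatternSyntaxException also exposes getIndex() and getPattern() if you need the position of the error rather than just the description.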