Regex dark corners in Java: does the order of characters alter a regex's meaning?
I recently came across some odd behavior that involves Java's regex engine.
When writing some validation, I needed to add square brackets to my regex, like so:
"[^a-zA-Z0-9_/.@ ]" // original expression
"[^a-zA-Z0-9_/.@ /]/[]" // first modification
However, this version failed. After some experimentation, I discovered that it would work if I moved the space character to the end:
"[^a-zA-Z0-9_/.@/]/[ ]" // final working modification
The calling code passes this expression to the String.replaceAll(String, String) method.
My question is: does anyone have a good technical explanation of why the placement of the space alters the meaning of this regex? It really shouldn't matter.
[EDITED]
From the comments and answers: this is an example where using the built-in String method leads to incorrect behavior that is NOT caught. My runtime environment does NOT complain at all, even though the documentation for String.replaceAll(String, String) clearly states that it behaves the same as Pattern.compile(regex).matcher(str).replaceAll(repl).
I think I will file a bug.
You are using the wrong escape character: it's \ and not /.
Also, I'm not sure whether you wanted your character group to include / and ., or whether you thought that . needs to be escaped in character groups (it doesn't: inside a character group it always represents the literal .).
Trying to compile [^a-zA-Z0-9_/.@ /]/[] gives this exception:
java.util.regex.PatternSyntaxException: Unclosed character class near index 20
[^a-zA-Z0-9_/.@ /]/[]
                    ^
    at java.util.regex.Pattern.error(Pattern.java:1713)
    at java.util.regex.Pattern.clazz(Pattern.java:2254)
    at java.util.regex.Pattern.sequence(Pattern.java:1818)
    at java.util.regex.Pattern.expr(Pattern.java:1752)
    at java.util.regex.Pattern.compile(Pattern.java:1460)
    at java.util.regex.Pattern.<init>(Pattern.java:1133)
    at java.util.regex.Pattern.compile(Pattern.java:823)
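This indicates that there is a problem with the character class at that point. And in fact, you've got an empty character class [] there, which is not valid! Java reads a ] that directly follows [ as a literal character, so the class is never closed, hence "Unclosed character class".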
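To check this without going through the validation code, here is a minimal sketch that compiles both expressions from the question and also passes the failing one to String.replaceAll. The class and method names (CompileCheck, compileError, replaceAllThrows) are made up for illustration. On a standard JDK I would expect replaceAll to throw the same PatternSyntaxException as Pattern.compile, so if nothing shows up at runtime, the exception is probably being swallowed somewhere in the calling code.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper class, not part of the original code.
class CompileCheck {

    // Returns null if the regex compiles, otherwise a short error description.
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription() + " near index " + e.getIndex();
        }
    }

    // Returns true if String.replaceAll rejects the regex with a PatternSyntaxException.
    static boolean replaceAllThrows(String regex) {
        try {
            "abc".replaceAll(regex, "");
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String first = "[^a-zA-Z0-9_/.@ /]/[]";  // first modification from the question
        String last = "[^a-zA-Z0-9_/.@/]/[ ]";   // "final working" modification

        System.out.println("first: " + compileError(first));
        System.out.println("last:  " + compileError(last));
        System.out.println("replaceAll throws on first: " + replaceAllThrows(first));
    }
}
```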
[^a-zA-Z0-9_/.@ /]/[] means "a character not matching (a-z, A-Z, 0-9, _, /, ., @, a space, or /), followed by a slash /, followed by <fails to compile because it is malformed>".
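Read the same way, the "final working" expression from the question compiles, but it only matches a three-character sequence: one disallowed character, then a literal /, then a space. The sketch below illustrates that, with a hypothetical class name and made-up sample inputs.

```java
// Hypothetical demo class; the sample inputs are invented for illustration.
class WorkingButWrong {

    // The "final working modification" from the question, unchanged.
    static final String QUESTION_REGEX = "[^a-zA-Z0-9_/.@/]/[ ]";

    static String strip(String input) {
        return input.replaceAll(QUESTION_REGEX, "");
    }

    public static void main(String[] args) {
        // No "<disallowed char>/ " sequence here, so nothing is removed,
        // even though # is outside the allowed set.
        System.out.println(strip("a#b[c]"));

        // Here "#/ " matches as a whole and is removed.
        System.out.println(strip("a#/ b"));
    }
}
```

So moving the space did not change what the character class means. It only turned the trailing [] into a valid one-character class.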
What you want is probably [^a-zA-Z0-9_.@ \]\[] which is "a character not matching a-z, A-Z, 0-9, _, ., @, a space, ] or [".
If you write it in a String literal, remember to double each \ (because backslashes have a special meaning in String literals as well!):
Pattern regex = Pattern.compile("[^a-zA-Z0-9_.@ \\]\\[]");
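As a usage sketch, assuming the goal is to strip every character outside the allowed set (the question only says the regex is passed to replaceAll), the corrected pattern could be applied like this. The class name, method name, and sample input are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the corrected pattern.
class SanitizeDemo {

    // Anything that is NOT a letter, digit, _, ., @, space, ] or [.
    private static final Pattern DISALLOWED =
            Pattern.compile("[^a-zA-Z0-9_.@ \\]\\[]");

    // Removes every disallowed character from the input.
    static String sanitize(String input) {
        return DISALLOWED.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // The brackets survive; # and ! are removed.
        System.out.println(sanitize("user[0]@example.com #1!"));
        // prints "user[0]@example.com 1"
    }
}
```

Compiling the Pattern once and reusing it also means a syntax error surfaces when the class is loaded rather than on some later call.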