PatternSyntaxException in non-Latin locales
I've got a regex that was working perfectly fine until I switched my locale to 'fa' (Persian). I suspect this would happen with Hebrew and Arabic too (not yet sure if it's the characters or the RTL direction that makes it break).
The line of code causing the exception is:
public static final Pattern NAME_REGEX = Pattern.compile(String.format("^[\\w ]{%d,%d}$", 2,24));
(the syntax is fine, it works in English & Spanish) but when the app tries to compile the regex in the 'incompatible' locales, I get the following:
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:605)
at dalvik.system.NativeStart.main(Native Method)
Caused by: java.util.regex.PatternSyntaxException: Syntax error U_REGEX_BAD_INTERVAL near index 8:
^[\w ]{٢,٢٤}$
^
at java.util.regex.Pattern.compileImpl(Native Method)
at java.util.regex.Pattern.compile(Pattern.java:400)
at java.util.regex.Pattern.<init>(Pattern.java:383)
at java.util.regex.Pattern.compile(Pattern.java:374)
at com.airg.hookt.config.airGConstant.<clinit>(airGConstant.java:131)
Any help would be appreciated. Thanks
Looks like you're trying to specify the interval using Arabic-Indic digits (U+0660..U+0669); I would have been very surprised if that had worked. I've never heard of a regex flavor that accepts anything but ASCII digits as part of the regex itself.
Are you also expecting \w to match letters/digits from Persian, Hebrew, and Arabic scripts? That won't work either, but this time it's because of a shortcoming in Java's regex flavor. If you want to match characters from any writing system, you need to use Unicode properties like \p{L} and \p{N} (but see here for more details).
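To make the \p{L} / \p{N} suggestion concrete, here is a rough sketch of what the name check could look like. The class and method names (UnicodeNameRegex, isValidName) are made up for illustration, and the 2 to 24 length limit is just copied from the question. Note that this is not a drop-in equivalent of \w: it no longer accepts the underscore, and it does not include combining marks (\p{M}), so adjust the character class if your names need those.

```java
import java.util.regex.Pattern;

public class UnicodeNameRegex {
    // \p{L} = any Unicode letter, \p{N} = any Unicode number.
    // The interval is written with plain ASCII digits, so the default locale cannot affect it.
    static final Pattern UNICODE_NAME = Pattern.compile("^[\\p{L}\\p{N} ]{2,24}$");

    // Hypothetical helper: true if the whole string is 2..24 letters, numbers or spaces.
    static boolean isValidName(String name) {
        return UNICODE_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        // A Persian name written with \\u escapes to avoid source-encoding issues.
        String persian = "\u0639\u0644\u06CC \u0631\u0636\u0627";
        System.out.println(isValidName(persian));
        System.out.println(isValidName("John Smith"));
        System.out.println(isValidName("John_Smith")); // underscore is neither a letter nor a number
    }
}
```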
ANSWER
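So ... the problem was indeed the String.format: it formats %d using the default locale, so under 'fa' the 2 and 24 came out as ٢ and ٢٤, which the regex engine rejects as an interval.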
Changing
public static final Pattern NAME_REGEX = Pattern.compile(String.format("^[\\w ]{%d,%d}$", 2,24));
to
public static final Pattern NAME_REGEX = Pattern.compile("^[\\w ]{" + 2 + "," + 24 + "}$");
fixed the crash. Thanks to everyone for their contribution.
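If you would rather keep String.format, another option (a sketch, not what was actually shipped above) is to pass an explicit locale so that %d is always rendered with ASCII digits. The class name NameRegex and the helper buildNamePattern are invented for this example; String.format(Locale, String, Object...) and Locale.ROOT are standard Java.

```java
import java.util.Locale;
import java.util.regex.Pattern;

public class NameRegex {
    // Hypothetical helper: builds the pattern text with a fixed locale,
    // so the interval digits do not depend on the device's default locale.
    static String buildNamePattern(int min, int max) {
        return String.format(Locale.ROOT, "^[\\w ]{%d,%d}$", min, max);
    }

    public static final Pattern NAME_REGEX = Pattern.compile(buildNamePattern(2, 24));

    public static void main(String[] args) {
        // Simulate the problematic setup by switching the default locale to Persian.
        Locale.setDefault(new Locale("fa"));
        System.out.println(buildNamePattern(2, 24));
        System.out.println(NAME_REGEX.matcher("John Smith").matches());
    }
}
```

String concatenation, as in the fix above, sidesteps the issue for the same reason: converting an int to a String that way does not consult the default locale.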