Why is my regex failing?
I'm trying to create a regex that verifies an XML entity name is valid (see related issue: here).
(:|[A-Z]|_|[a-z]|[\xC0-\xD6]|[\xD8-\xF6]|[\xF8-\x2FF]|[\x370-\x37D]|[\x37F-\x1FFF]|[\x200C-\x200D]|[\x2070-\x218F]|[\x2C00-\x2FEF]|[\x3001-\xD7FF]|[\xF900-\xFDCF]|[\xFDF0-\xFFFD]|[\x10000-\xEFFFF])
Basically, it verifies that the first character of the name is valid. However, the token [\xF8-\x2FF] is failing regex validation. Any idea why? I can't figure it out.
UPDATE
The .NET parser is throwing an exception that says "range in reverse order".
Each end of a range in a regex must be a single character, and most regex parsers don't understand values longer than one byte written in the \x notation. Use the \u notation instead.
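(:|[A-Z]|_|[a-z]|[\xC0-\xD6]|[\xD8-\xF6]|[\xF8-\u02FF]|[\u0370-\u037D]|[\u037F-\u1FFF]|[\u200C-\u200D]|[\u2070-\u218F]|[\u2C00-\u2FEF]|[\u3001-\uD7FF]|[\uF900-\uFDCF]|[\uFDF0-\uFFFD]|[\uD800-\uDB7F][\uDC00-\uDFFF])
The final range, U+10000 to U+EFFFF, is written as a surrogate pair because \u accepts exactly four hex digits; [\u10000-\uEFFFF] would be read as \u1000, the range 0-\uEFFF, and the character F.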
The .NET regex documentation states:
\x20
Matches an ASCII character using 2-digit hexadecimal. In this case, \x20 represents a space.
And for Unicode:
\u0020
Matches a Unicode character using exactly four hexadecimal digits. In this case, \u0020 is a space.
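The fixed digit counts quoted above are easy to check by experiment. The following is a minimal sketch in Python rather than .NET, on the assumption (true for Python's `re` module) that `\x` likewise consumes exactly two hex digits and `\u` exactly four; the names `slash_or_f`, `u1000_or_zero`, and `matches` are made up for the illustration.

```python
import re

# \x consumes exactly two hex digits, so this class holds '/' (0x2F) and 'F',
# not the single character U+02FF.
slash_or_f = re.compile(r"[\x2FF]")

# \u consumes exactly four hex digits, so this class holds U+1000 and '0',
# not the single character U+10000.
u1000_or_zero = re.compile(r"[\u10000]")


def matches(pattern, ch):
    """Return True if the single character ch is accepted by pattern."""
    return pattern.fullmatch(ch) is not None


print(matches(slash_or_f, "/"), matches(slash_or_f, "F"))
print(matches(u1000_or_zero, "\u1000"), matches(u1000_or_zero, "0"))
```

Any hex digits left over after the escape has taken its fixed count are treated as ordinary literal characters, which is why a pattern with too many digits still compiles but matches the wrong things.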
So I've used both above: \x for the 2-digit hex values and \u for the larger ones.
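One range needs extra care: U+10000 to U+EFFFF does not fit in four hex digits, so it cannot be written with a single \u escape per endpoint. Assuming the engine matches UTF-16 code units, as .NET strings do, each such code point is a high surrogate followed by a low surrogate. The sketch below (Python, with a hypothetical helper name `utf16_surrogates`) works out the pair for each endpoint using the standard UTF-16 arithmetic.

```python
def utf16_surrogates(code_point):
    """Split a supplementary code point (U+10000..U+10FFFF) into its UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


for cp in (0x10000, 0xEFFFF):
    high, low = utf16_surrogates(cp)
    print(f"U+{cp:05X} -> \\u{high:04X}\\u{low:04X}")
# U+10000 -> \uD800\uDC00
# U+EFFFF -> \uDB7F\uDFFF
```

Because the upper endpoint ends on the last low surrogate (\uDFFF), the whole range can be expressed as [\uD800-\uDB7F][\uDC00-\uDFFF].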
Because \x2F is a single ASCII character (the slash, /). The parser reads [\xF8-\x2FF] as the range \xF8-\x2F, which is invalid because \xF8 is greater than \x2F (hence "range in reverse order"), followed by the literal character F.
Use \u for Unicode: [\u00F8-\u02FF]
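The difference between the two spellings can be reproduced outside .NET. This is a small sketch using Python's `re` module as a stand-in, assuming only that it parses `\x` and `\u` with the same fixed widths; it reports its own error rather than .NET's "range in reverse order" message, and `compiles` is a helper invented for the example.

```python
import re


def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


broken = r"[\xF8-\x2FF]"    # read as the range \xF8-\x2F plus a literal F
fixed = r"[\u00F8-\u02FF]"  # both endpoints are single characters

print(compiles(broken))                            # False
print(compiles(fixed))                             # True
print(re.fullmatch(fixed, "\u0100") is not None)   # True
print(re.fullmatch(fixed, "F") is not None)        # False
```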