Why is my regex failing?
I'm trying to create a regex that verifies an XML entity name is valid (see related issue: here).
(:|[A-Z]|_|[a-z]|[\xC0-\xD6]|[\xD8-\xF6]|[\xF8-\x2FF]|[\x370-\x37D]|[\x37F-\x1FFF]|[\x200C-\x200D]|[\x2070-\x218F]|[\x2C00-\x2FEF]|[\x3001-\xD7FF]|[\xF900-\xFDCF]|[\xFDF0-\xFFFD]|[\x10000-\xEFFFF])
Basically it's verifying that the first character is a valid character. However the token [\xF8-\x2FF] is bombing out regex validation. Any idea why? I can't figure it out.
UPDATE
The .NET parser is throwing an exception that says "range in reverse order".

Each end of a range must be a single character, and most regex parsers read only two hex digits after \x, so longer values written that way are not understood. Use the \u notation instead.
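(:|[A-Z]|_|[a-z]|[\xC0-\xD6]|[\xD8-\xF6]|[\xF8-\u02FF]|[\u0370-\u037D]|[\u037F-\u1FFF]|[\u200C-\u200D]|[\u2070-\u218F]|[\u2C00-\u2FEF]|[\u3001-\uD7FF]|[\uF900-\uFDCF]|[\uFDF0-\uFFFD]|[\uD800-\uDB7F][\uDC00-\uDFFF])
(The last alternative is written as a surrogate pair: \u takes exactly four hex digits, so the code points U+10000 to U+EFFFF cannot be written as \u10000-\uEFFFF.)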
The .NET regex documentation states
\x20 Matches an ASCII character using 2-digit hexadecimal. In this case, \x20 represents a space.
And for Unicode:
\u0020 Matches a Unicode character using exactly four hexadecimal digits. In this case, \u0020 is a space.
So I've used both above, \x for the 2-char hex values and \u for the larger ones.
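The range from \x10000 to \xEFFFF in the original pattern lies above U+FFFF, so a four-digit \u escape cannot name its endpoints. .NET strings are UTF-16, so one option is to match those code points as surrogate pairs. Here is a minimal sketch of the arithmetic, written in Python purely for illustration (utf16_surrogates is a hypothetical helper, not part of any library):

```python
def utf16_surrogates(code_point):
    """Split a supplementary code point (U+10000..U+10FFFF) into (high, low)."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


first_high, first_low = utf16_surrogates(0x10000)
last_high, last_low = utf16_surrogates(0xEFFFF)

# The low surrogate runs over its full span (DC00..DFFF) at both ends, so the
# whole code point range can be written as two plain character classes.
surrogate_class = "[\\u{:04X}-\\u{:04X}][\\u{:04X}-\\u{:04X}]".format(
    first_high, last_high, first_low, last_low
)
print(surrogate_class)
```

This two-class shortcut only works because the range starts at the lowest low surrogate and ends at the highest one; a range that starts or ends mid-block would need extra alternatives for the partial blocks.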
Because \x2F is one ASCII character. The parser reads [\xF8-\x2FF] as the range \xF8-\x2F (invalid, because \xF8 is greater than \x2F) followed by the literal character F.
Use \u for Unicode: [\u00F8-\u02FF]
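The same parse can be reproduced outside .NET. The sketch below uses Python's re module as a stand-in, on the assumption (true for Python, and matching the .NET documentation quoted above) that \x consumes exactly two hex digits and \u exactly four. The compile_error helper is hypothetical and exists only for this illustration.

```python
import re

# Assumption: Python's `re` stands in for the .NET engine; both read exactly
# two hex digits after \x and exactly four after \u.
BROKEN = r"[\xF8-\x2FF]"     # parsed as the range \xF8-\x2F, then a literal "F"
FIXED = r"[\u00F8-\u02FF]"   # one four-digit escape per range endpoint


def compile_error(pattern):
    """Return the engine's error message, or None if the pattern compiles."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


broken_error = compile_error(BROKEN)   # the reversed range is rejected
fixed_error = compile_error(FIXED)     # compiles cleanly

fixed = re.compile(FIXED)
matches_start = fixed.fullmatch("\u00F8") is not None   # first char in range
matches_end = fixed.fullmatch("\u02FF") is not None     # last char in range
matches_slash = fixed.fullmatch("/") is not None        # "/" is \x2F, out of range
```

The broken pattern fails to compile for the same reason .NET reports "range in reverse order": the upper bound \x2F sorts below the lower bound \xF8.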