\p{IsGreek} is not working for me, nor is [\w{0391}-\w{03D2}]

I am building an ASP.NET web application and I want a regex for my RegularExpressionValidator that accepts the Greek alphabet only. Nothing seems to work for me. I've googled quite a bit so far, but nothing has helped. Any ideas?

<asp:Label runat="server" AssociatedControlID="FirstName" Text="Όνομα:"></asp:Label>
<asp:TextBox runat="server" ID="FirstName"></asp:TextBox>
<asp:RequiredFieldValidator runat="server" ID="FirstNameEmptyFieldValidator" Text="*" ControlToValidate="FirstName"></asp:RequiredFieldValidator>
<asp:RegularExpressionValidator runat="server" ValidationExpression="\p{IsGreek}" ID="GreekRegularExpressionValidator" ControlToValidate="FirstName" ErrorMessage="Το κείμενο εμπεριέχει μη ελληνικούς χαρακτήρες"></asp:RegularExpressionValidator>


As this question/answer alludes to, the regex is processed by JavaScript in the browser and by the .NET Regex class on the back end.

So you have to use a regex that is compatible with both, or you could disable client-side validation.
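
To see why the browser side rejects Greek input, here is a minimal sketch of what happens when JavaScript compiles the expression. The function clientSideIsValid is a hypothetical stand-in for the validator's client script, on the assumption that it compiles the pattern with new RegExp and requires the match to cover the whole value.

```javascript
// Hypothetical stand-in for the client-side check (an assumption, not the
// actual ASP.NET script): compile the pattern string with `new RegExp` and
// require the match to span the entire input.
function clientSideIsValid(validationExpression, value) {
  const match = new RegExp(validationExpression).exec(value);
  return match !== null && match[0] === value;
}

// Without the `u` flag, `\p` is only an escaped "p" and the braces are
// literal characters, so the pattern describes the text "p{IsGreek}".
const expression = "\\p{IsGreek}";

console.log(clientSideIsValid(expression, "\u0391"));     // Α (capital alpha): false
console.log(clientSideIsValid(expression, "p{IsGreek}")); // true
```

If the validator really does require a whole-value match, as assumed here, a bare \p{IsGreek} could only ever accept a one-character value even where the property is understood, so a quantifier such as + would be needed as well.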


[Α-Ωα-ωάέήίόύ]+

I just used this regex and it works just fine! To be honest, though, I don't know whether there are any drawbacks to this approach.
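
One way to look for drawbacks is to run the class against a few real names. The sketch below assumes the accented vowels in the class are the precomposed tonos forms and writes everything with \u escapes to avoid ambiguity. The second pattern, widerGreek, is a hypothetical alternative that is not from this thread.

```javascript
// The class from above, anchored so the whole value must match
// (assumption: the accented vowels are U+03AC, 03AD, 03AE, 03AF, 03CC, 03CD).
const simpleGreek = /^[\u0391-\u03A9\u03B1-\u03C9\u03AC\u03AD\u03AE\u03AF\u03CC\u03CD]+$/;

// Hypothetical wider class: also covers capitals with tonos, omega with
// tonos, and the dialytika forms, while skipping U+0387 (ano teleia).
const widerGreek = /^[\u0386\u0388-\u038A\u038C\u038E-\u03A1\u03A3-\u03CE]+$/;

const names = {
  maria: "\u039C\u03B1\u03C1\u03AF\u03B1",               // Μαρία
  giorgos: "\u0393\u03B9\u03CE\u03C1\u03B3\u03BF\u03C2", // Γιώργος, contains ώ (U+03CE)
  olga: "\u038C\u03BB\u03B3\u03B1",                      // Όλγα, starts with Ό (U+038C)
};

for (const [key, name] of Object.entries(names)) {
  console.log(key, "simple:", simpleGreek.test(name), "wider:", widerGreek.test(name));
}
```

With these inputs the simple class accepts Μαρία but rejects Γιώργος and Όλγα, because ώ and the accented capitals are not listed in it. Both patterns use only plain ranges, so they should behave the same in JavaScript and in the .NET Regex class.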


Be aware that .NET uses \p{IsGreek} to mean \p{Block:Greek_and_Coptic}, not to mean \p{Script=Greek}, which is what you really need. This is deceptive, because it makes you think .NET can handle script types, but it can’t.

Scant few regex engines apart from Perl and PCRE understand the script Unicode properties, like \p{IsGreek} or just plain \p{Greek}. JavaScript and .NET are not among those that handle scripts. Java will understand script properties come JDK7, but not yet; you could use ICU in Java for that, though.

This page says that .NET handles the \p{InBlock} properties, though. So you could construct a character class like this:

[\p{InGreek}\p{InGreekAndCoptic}\p{InGreekExtended}\p{InAncientGreekMusicalNotation}]

However, that will still miss these code points, which are all considered to be in the Greek script:

 ᴦ  7462 1D26 GREEK LETTER SMALL CAPITAL GAMMA
 ᴧ  7463 1D27 GREEK LETTER SMALL CAPITAL LAMDA
 ᴨ  7464 1D28 GREEK LETTER SMALL CAPITAL PI
 ᴩ  7465 1D29 GREEK LETTER SMALL CAPITAL RHO
 ᴪ  7466 1D2A GREEK LETTER SMALL CAPITAL PSI
 ᵝ  7517 1D5D MODIFIER LETTER SMALL BETA
 ᵞ  7518 1D5E MODIFIER LETTER SMALL GREEK GAMMA
 ᵟ  7519 1D5F MODIFIER LETTER SMALL DELTA
 ᵠ  7520 1D60 MODIFIER LETTER SMALL GREEK PHI
 ᵡ  7521 1D61 MODIFIER LETTER SMALL CHI
 ᵦ  7526 1D66 GREEK SUBSCRIPT SMALL LETTER BETA
 ᵧ  7527 1D67 GREEK SUBSCRIPT SMALL LETTER GAMMA
 ᵨ  7528 1D68 GREEK SUBSCRIPT SMALL LETTER RHO
 ᵩ  7529 1D69 GREEK SUBSCRIPT SMALL LETTER PHI
 ᵪ  7530 1D6A GREEK SUBSCRIPT SMALL LETTER CHI
 ᶿ  7615 1DBF MODIFIER LETTER SMALL THETA
 Ω  8486 2126 OHM SIGN

Besides missing the code points listed immediately above (false negatives), the bracketed character class built from the four Greekish blocks given above also yields these false positives:

 ʹ   884 0374 GREEK NUMERAL SIGN
 ;   894 037E GREEK QUESTION MARK
 ΅   901 0385 GREEK DIALYTIKA TONOS
 ·   903 0387 GREEK ANO TELEIA
 Ϣ   994 03E2 COPTIC CAPITAL LETTER SHEI
 ϣ   995 03E3 COPTIC SMALL LETTER SHEI
 Ϥ   996 03E4 COPTIC CAPITAL LETTER FEI
 ϥ   997 03E5 COPTIC SMALL LETTER FEI
 Ϧ   998 03E6 COPTIC CAPITAL LETTER KHEI
 ϧ   999 03E7 COPTIC SMALL LETTER KHEI
 Ϩ  1000 03E8 COPTIC CAPITAL LETTER HORI
 ϩ  1001 03E9 COPTIC SMALL LETTER HORI
 Ϫ  1002 03EA COPTIC CAPITAL LETTER GANGIA
 ϫ  1003 03EB COPTIC SMALL LETTER GANGIA
 Ϭ  1004 03EC COPTIC CAPITAL LETTER SHIMA
 ϭ  1005 03ED COPTIC SMALL LETTER SHIMA
 Ϯ  1006 03EE COPTIC CAPITAL LETTER DEI
 ϯ  1007 03EF COPTIC SMALL LETTER DEI

That’s why listing the blocks simply is not good enough: you really do need a proper \p{Script=Greek} or \p{IsGreek} property. Unfortunately, for now you'd have to change programming languages to get it, which I’m sure isn’t possible.
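
The two lists can be spot-checked with a small sketch. It assumes a JavaScript engine that implements ES2018 Unicode property escapes (the u flag with \p{Script=Greek}), and it stands in for the named blocks with explicit code-point ranges. The names inGreekBlocks, isGreekScript and classify are hypothetical, and none of this changes what the .NET side accepts.

```javascript
// Explicit ranges standing in for the Greek and Coptic, Greek Extended and
// Ancient Greek Musical Notation blocks (a hypothetical emulation).
const inGreekBlocks = /^[\u{0370}-\u{03FF}\u{1F00}-\u{1FFF}\u{1D200}-\u{1D24F}]$/u;

// Script property; assumes ES2018 Unicode property escapes are available.
const isGreekScript = /^\p{Script=Greek}$/u;

// Report whether a single code point falls in the blocks and in the script.
function classify(codePoint) {
  const ch = String.fromCodePoint(codePoint);
  return { block: inGreekBlocks.test(ch), script: isGreekScript.test(ch) };
}

const samples = [
  0x03B1, // GREEK SMALL LETTER ALPHA
  0x1D26, // GREEK LETTER SMALL CAPITAL GAMMA (listed as missed by the blocks)
  0x2126, // OHM SIGN (listed as missed by the blocks)
  0x0374, // GREEK NUMERAL SIGN (listed as a false positive)
  0x03E2, // COPTIC CAPITAL LETTER SHEI (listed as a false positive)
];

for (const cp of samples) {
  const { block, script } = classify(cp);
  const label = "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
  console.log(label, "block:", block, "script:", script);
}
```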
