Greek characters, Regular Expressions, and C#

2022-12-23 17:50

I'm building a CMS for a scientific journal that uses a lot of Greek characters. I need to validate a field so that it accepts a specific character set plus Greek characters. Here's what I have now:

[^a-zA-Z0-9-()/\s]

How do I get this to include Greek characters in addition to alphanumeric, '(', ')', '-', and '_'?

I'm using C#, by the way.

In .NET languages, you can use \p{IsGreekandCoptic} to match Greek characters. So the resulting regex is

[^a-zA-Z0-9-()/\s\p{IsGreekandCoptic}]

\p{IsGreekandCoptic} matches every character in the Unicode Greek and Coptic block (U+0370 to U+03FF).

(Image of the characters matched by \p{IsGreekandCoptic}: http://img203.imageshack.us/img203/3760/greekcoptic.png)
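A minimal sketch of how the pattern above could be used to validate a field in C#. Because the class is negated, a match means the input contains at least one disallowed character, so the field is valid only when nothing matches. The class name GreekFieldValidator is hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper name; the pattern is the one from the answer above.
public static class GreekFieldValidator
{
    // Negated class: matches any single character that is NOT an ASCII letter
    // or digit, '-', '(', ')', '/', whitespace, or a character in the Unicode
    // Greek and Coptic block (U+0370..U+03FF).
    private static readonly Regex Disallowed =
        new Regex(@"[^a-zA-Z0-9-()/\s\p{IsGreekandCoptic}]");

    // The field is valid when no disallowed character is found.
    public static bool IsValid(string input) => !Disallowed.IsMatch(input);
}
```

For example, GreekFieldValidator.IsValid("α-helix (type 2)") returns true, while GreekFieldValidator.IsValid("β*") returns false because * is not in the allowed set. Note that the class as written does not include '_', even though the question lists it; add _ inside the brackets if underscores should be accepted.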

If you're using a language whose regular expressions are PCRE-based and your text is UTF-8, /[\x{0374}-\x{03FF}]+/u should match Greek characters. Greek characters fall between U+0374 and U+03FF (source), and the u modifier tells PCRE to treat the pattern and subject as UTF-8. As noted in a comment, /\p{Greek}+/u also works with PCRE.

If you're using JavaScript, it uses \uXXXX instead of \x{XXXX}: /[\u0374-\u03FF]+/.
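Since the question is about C#: .NET regular expressions also write code points as \uXXXX, as JavaScript does. .NET's \x escape takes exactly two hex digits, so the PCRE form \x{0374} does not carry over. A minimal sketch that finds runs of characters in the U+0374 to U+03FF range from the answer above; the class and method names are hypothetical.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper; the range U+0374..U+03FF is the one from the PCRE answer.
public static class GreekRuns
{
    // \uXXXX is interpreted by the .NET regex engine (verbatim string, so the
    // C# compiler leaves the backslashes alone).
    private static readonly Regex GreekRun = new Regex(@"[\u0374-\u03FF]+");

    // Returns each maximal run of characters from the range, in order.
    public static string[] Find(string text) =>
        GreekRun.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();
}
```

For example, GreekRuns.Find("the αβ chain and γ subunit") returns the two strings "αβ" and "γ".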

Also see this guide to Unicode Regular Expressions for more information.

For Java, from the Pattern javadoc:

\p{InGreek} A character in the Greek block (simple block)

Since this is my first response on SO, I can't downvote Daniel's answer about the JavaScript regex.

I know this is very late, but Daniel's answer is incorrect: it excludes the archaic characters U+0370 to U+0373 listed below. This matters if you're working on a Bible app that researches words in ancient Greek!

This is the correct regex for finding Greek and Coptic characters in JavaScript:

/[\u0370-\u03FF]+/gm

http://unicode.org/charts/PDF/U0370.pdf

Excerpt from chart:

0370 Ͱ GREEK CAPITAL LETTER HETA → 2C75 Ⱶ latin capital letter half h

0371 ͱ GREEK SMALL LETTER HETA → 2C76 ⱶ latin small letter half h

0372 Ͳ GREEK CAPITAL LETTER ARCHAIC SAMPI

0373 ͳ GREEK SMALL LETTER ARCHAIC SAMPI
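To see the difference between the two ranges concretely, here is a small C# sketch that checks the archaic letters from the chart excerpt against the U+0374 range, the U+0370 range, and the .NET named block \p{IsGreekandCoptic} from the C# answer (which, as a block, also starts at U+0370). The class and field names are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical names; compares the ranges discussed in this thread.
public static class GreekRanges
{
    // Range from the earlier answer: starts at U+0374, so it misses U+0370..U+0373.
    public static readonly Regex From0374 = new Regex(@"^[\u0374-\u03FF]+$");

    // Full Greek and Coptic block, starting at U+0370 as in the chart.
    public static readonly Regex From0370 = new Regex(@"^[\u0370-\u03FF]+$");

    // .NET named block; covers the same U+0370..U+03FF span as From0370.
    public static readonly Regex NamedBlock = new Regex(@"^\p{IsGreekandCoptic}+$");
}
```

With these, GreekRanges.From0374.IsMatch("\u0370") (capital heta) is false, while From0370 and NamedBlock both return true. Ordinary letters such as α and β are matched by all three, which is why the narrower range is usually enough for modern text.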

EDIT: Craig points out that Daniel's regex is correct for the OP. While I can't find where the OP specifies which Greek text he's evaluating, I'll concede that my response is only valid for ancient texts.

While I'm editing this, I also want to point out that no regex here matches Greek characters with the kind of accenting that Perseus adds to its texts. So if you install the Perseus Hopper (http://www.perseus.tufts.edu/hopper/) or use any of its public-domain resources in an app, be careful with my regex.
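A possible workaround, sketched in C# under an assumption I have not checked against Perseus's actual output: accented (polytonic) letters are stored either as precomposed characters from the Greek Extended block (U+1F00 to U+1FFF) or as a base letter followed by combining marks from U+0300 to U+036F. .NET exposes both as named blocks (IsGreekExtended, IsCombiningDiacriticalMarks), so they can be added to the class. The class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; assumes accents are either precomposed Greek Extended
// characters or separate combining marks (not verified against Perseus data).
public static class PolytonicGreek
{
    // Greek and Coptic block only (U+0370..U+03FF).
    private static readonly Regex BasicBlock =
        new Regex(@"^\p{IsGreekandCoptic}+$");

    // Also accept Greek Extended (U+1F00..U+1FFF) and combining diacritical
    // marks (U+0300..U+036F).
    private static readonly Regex WithAccents = new Regex(
        @"^[\p{IsGreekandCoptic}\p{IsGreekExtended}\p{IsCombiningDiacriticalMarks}]+$");

    public static bool IsBasicGreek(string word) => BasicBlock.IsMatch(word);
    public static bool IsAccentedGreek(string word) => WithAccents.IsMatch(word);
}
```

For example, "ἄνθρωπος" written with the precomposed U+1F04 (ἄ) fails IsBasicGreek but passes IsAccentedGreek, and so does the decomposed sequence "\u03B1\u0313\u0301" (alpha plus combining psili and acute). Keep in mind that U+0300 to U+036F is shared by all scripts, so this class would also accept stray combining marks on their own.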
