What would be the regex for matching foreign characters?

I am developing an application for a European client, and they use their native character set.

Now I need a regex that allows foreign characters like eéèêë etc., and I am not sure how this can be done.

Any suggestions?


If all you want to match is letters (including "international" letters) you can use \p{L}.

You can find some information on regex and Unicode here.
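
As a minimal sketch of the \p{L} approach in JavaScript (the function name isLettersOnly is made up for illustration, and this assumes an engine that supports Unicode property escapes, which need the u flag):

```javascript
// \p{L} matches any Unicode letter; in JavaScript it only works with the u flag.
const lettersOnly = /^\p{L}+$/u;

// Hypothetical helper: true when the text consists of letters only.
function isLettersOnly(text) {
  return lettersOnly.test(text);
}

console.log(isLettersOnly("eéèêë"));  // true
console.log(isLettersOnly("Žižek"));  // true
console.log(isLettersOnly("abc123")); // false, digits are not letters
```

Note that this assumes precomposed characters. A letter stored as a base letter plus a combining accent would also need \p{M} in the class.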


If you want to match the common accented Latin letters (those in the Latin-1 range, plus Ž and ž) in virtually any regular expression engine, try:

[A-Za-zŽžÀ-ÿ]

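It matches any of the following characters from the printable ASCII and extended (Latin-1 / Windows-1252) character sets. Note that the range À-ÿ also picks up the non-letter signs × and ÷: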

ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
ŽžÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ

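Matches {char} (decimal code points, case sensitive). Ž and ž are listed at their Windows-1252 positions; in Unicode they are U+017D (381) and U+017E (382):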

char(s) index(start) index(end)
[A-Z] 65 90
[a-z] 97 122
Ž 142 ---
ž 158 ---
[À-ÿ] 192 255

Test it at https://regex101.com/r/Xbbtm1/1
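
A quick way to see what this class accepts and rejects, sketched in JavaScript. The stricter variant latin1LettersOnly is my own assumption, not part of the answer above: it is the same class with × and ÷ cut out of the range.

```javascript
// The class from the answer, anchored so the whole string must match.
const latin1Class = /^[A-Za-zŽžÀ-ÿ]+$/;

// Assumed stricter variant: same class minus × (U+00D7) and ÷ (U+00F7).
const latin1LettersOnly = /^[A-Za-zŽžÀ-ÖØ-öø-ÿ]+$/;

console.log(latin1Class.test("façade"));     // true
console.log(latin1Class.test("Łódź"));       // false: Ł and ź are outside Latin-1
console.log(latin1Class.test("×÷"));         // true: both signs sit inside À-ÿ
console.log(latin1LettersOnly.test("×÷"));   // false
console.log(latin1LettersOnly.test("Žena")); // true
```

So the class is fine for Western European text, but it will not cover letters such as Ł, ő or ș that live outside Latin-1.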


\p{L} isn't cross-browser yet (it needs the u flag and an engine that implements ES2018 Unicode property escapes). Transpiling it down for older browsers will give you massively bloated code if you use it a lot.

Here is a short and sweet way to include non-ASCII letters in general, without adding a gazillion lines of JavaScript or plugins. Replace a-zA-Z0-9 or \w inside a character class in your regex with this (wrap it in [...] if your \w stood on its own), and don't use the u flag:

\u00BF-\u1FFF\u2C00-\uD7FF\w

Inserted into all my JavaScript regexes in place of a-zA-Z0-9 or \w, this seems to do the job. My context was recognizing non-ASCII (UTF-8) text in HTML and CSS, and it had to work cross-browser.

I can't believe it is this simple, so I am waiting to be proved wrong, after a day spent searching and trying to get something to work in Firefox...

I've only tested this using Japanese hiragana with a French accent.
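
For anyone wanting to try it, here is a rough sketch of that range dropped into a character class (the name isWordLike is just for illustration). The last line shows that the ranges are deliberately broad, so some non-letter symbols get through as well:

```javascript
// The ranges from above inside a character class, no u flag.
const wordLike = /^[\u00BF-\u1FFF\u2C00-\uD7FF\w]+$/;

// Hypothetical helper: true when every character is in the class.
function isWordLike(text) {
  return wordLike.test(text);
}

console.log(isWordLike("café_99"));   // true: \w covers ASCII letters, digits and _
console.log(isWordLike("ひらがな"));   // true: hiragana sits in \u2C00-\uD7FF
console.log(isWordLike("two words")); // false: the space is not in the class
console.log(isWordLike("¿×÷"));       // true: these signs fall inside \u00BF-\u1FFF
```

If that last case matters for your input, you would need to carve the unwanted symbols out of the ranges by hand.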


[e\xE8\xE9\xEA\xEB] will match any one of eéèêë
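
A small sketch of that class in use, assuming JavaScript (the helper name findEVariants is made up for the example):

```javascript
// \xE8 = è, \xE9 = é, \xEA = ê, \xEB = ë; the g flag collects every match.
const eVariants = /[e\xE8\xE9\xEA\xEB]/g;

// Hypothetical helper: returns all e-like characters found in the text.
function findEVariants(text) {
  return text.match(eVariants) || [];
}

console.log(findEVariants("élève").join(""));  // "éèe"
console.log(findEVariants("Noël").join(""));   // "ë"
console.log(findEVariants("STOP").length);     // 0
```

The class is lowercase only, so uppercase É, È, Ê and Ë would need to be added (or the i flag used) if they can appear.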
