JavaScript Unicode (Greek) regular expressions

2023-02-25 11:44

I would like to use the regular expression new RegExp("\\b" + pat + "\\b") on Greek text, but the \b metacharacter only recognizes ASCII word characters.

I tried the XRegExp library, but I didn't manage to solve the issue.

Any suggestions would be greatly appreciated.

I think this will help answer your question:

<script src="xregexp.js"></script>
<script src="xregexp-unicode-base.js"></script>
<script>
    var unicodeWord = XRegExp("^\\p{L}+$");

    unicodeWord.test("Русский"); // true
    unicodeWord.test("日本語"); // true
    unicodeWord.test("العربية"); // true
</script>

<!-- \p{L} is included in the base script, but other categories, scripts,
and blocks require token packages -->
<script src="xregexp-unicode-scripts.js"></script>
<script>
    XRegExp("^\\p{Katakana}+$").test("カタカナ"); // true
</script>

For details, please refer to http://xregexp.com/plugins/

So the answer is simply that you cannot use JavaScript's native \b, or any library that relies on the native \b, to match words the way you want. As you already stated, \b matches at word boundaries, and a word must consist of word characters. In JavaScript (and in fact in some other regex implementations as well), the word characters are a-z, A-Z, 0-9 and _. Many other languages, however, implement the \b metacharacter differently from JavaScript.
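To make the failure concrete, here is a minimal sketch. The sample word and sentences are assumptions chosen only for illustration ("και" is Greek for "and"); the pattern is the one from the question.

```javascript
// Hypothetical sample data, not taken from the question.
const pat = "και";
const greekText = "εγώ και εσύ";

// Note the doubled backslashes: inside a string literal, "\b" is a backspace.
const greekRe = new RegExp("\\b" + pat + "\\b");
const asciiRe = new RegExp("\\b" + "and" + "\\b");

// Greek letters are not word characters for \b, so there is no boundary
// between the space and "κ", and the pattern never matches.
const greekResult = greekRe.test(greekText);
const asciiResult = asciiRe.test("you and me");

console.log(greekResult); // false
console.log(asciiResult); // true
```

The same pattern works for the ASCII word because a space followed by "a" is a transition from a non-word character to a word character.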

The answer "JavaScript does not support Unicode" is a bit too easy and in fact completely wrong. JavaScript just doesn't use Unicode for its regex character classes. If JavaScript didn't support Unicode, you couldn't even use Unicode characters in string literals, and of course that is possible in JavaScript.
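A short sketch of that distinction: the string handling is Unicode-aware, while the \w class is not. The explicit range below assumes that the Greek and Coptic block (U+0370 to U+03FF) is enough for your text; polytonic Greek would need more ranges.

```javascript
const alpha = "α"; // Greek small letter alpha, U+03B1

// Unicode characters and escapes work fine in string literals.
const sameAsEscape = alpha === "\u03B1";

// The built-in word class does not include Greek letters.
const matchesWordClass = /\w/.test(alpha);

// An explicit range for the Greek and Coptic block does match.
const matchesGreekRange = /[\u0370-\u03FF]/.test(alpha);

console.log(sameAsEscape);      // true
console.log(matchesWordClass);  // false
console.log(matchesGreekRange); // true
```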

According to the ECMA 262 Standard (ECMAScript) (Section 15.10.2.6):

[...] The production Assertion :: \ b evaluates by returning an internal AssertionTester closure that takes a State argument x and performs the following:

1. Let e be x's endIndex.
2. Call IsWordChar(e–1) and let a be the Boolean result.
3. Call IsWordChar(e) and let b be the Boolean result.
4. If a is true and b is false, return true.
5. If a is false and b is true, return true.
6. Return false. [...]

The abstract operation IsWordChar takes an integer parameter e and performs the following:

1. If e == –1 or e == InputLength, return false.
2. Let c be the character Input[e].
3. If c is one of the sixty-three characters below, return true.
   a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 _
4. Return false.
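The two quoted algorithms can be transcribed into plain JavaScript to see what they do with Greek input. This is only an illustrative sketch; the function names isWordChar and isWordBoundary and the sample strings are mine, not the engine's.

```javascript
// The 63 characters listed in the quoted IsWordChar definition.
const WORD_CHARS =
  "abcdefghijklmnopqrstuvwxyz" +
  "ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
  "0123456789_";

// Mirrors IsWordChar(e): false at either end of the input,
// otherwise true only for one of the 63 listed characters.
function isWordChar(input, e) {
  if (e === -1 || e === input.length) return false;
  return WORD_CHARS.includes(input[e]);
}

// Mirrors the \b assertion: true when exactly one side is a word character.
function isWordBoundary(input, e) {
  const a = isWordChar(input, e - 1);
  const b = isWordChar(input, e);
  return a !== b;
}

console.log(WORD_CHARS.length);               // 63
console.log(isWordBoundary("you and me", 4)); // true: space, then "a"
console.log(isWordBoundary("εγώ και εσύ", 4)); // false: space, then "κ"
```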

This shows that \b uses the IsWordChar algorithm to decide whether the characters on either side of a position are word characters. In the definition of IsWordChar you can see exactly which characters return true.

In my opinion this has absolutely nothing to do with the character set being used. The definition is neither ASCII-based nor Unicode-compliant; it is just these 63 characters.
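Since \b itself cannot be changed, one possible workaround is to spell out the boundary yourself. The sketch below assumes an engine with ES2018 lookbehind and Unicode property escapes (neither is covered by the spec section quoted above), and it assumes that "word character" should mean a letter, combining mark, digit, or underscore. The helper names escapeRegExp and unicodeWordRegExp are hypothetical.

```javascript
// Escape regex metacharacters so the search term is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build a regex that matches `pat` only when it is not
// preceded or followed by a Unicode letter, mark, digit, or underscore.
function unicodeWordRegExp(pat) {
  const w = "[\\p{L}\\p{M}\\p{N}_]";
  return new RegExp("(?<!" + w + ")" + escapeRegExp(pat) + "(?!" + w + ")", "u");
}

const re = unicodeWordRegExp("και");

const standalone = re.test("εγώ και εσύ"); // "και" as a separate word
const embedded = re.test("δίκαιος");        // "και" inside a longer word

console.log(standalone); // true
console.log(embedded);   // false
```

If the target engine lacks these features, the XRegExp approach shown earlier with \p{L} in a negated class is the closer fit.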
