
JavaScript regular expression to catch kanji

I can't get this JavaScript function to work the way I want:

// matches a String that contains kanji and/or kana character(s)

String.prototype.isKanjiKana = function(){
    return !!this.match(/^[\u4E00-\u9FAF|\u3040-\u3096|\u30A1-\u30FA|\uFF66-\uFF9D|\u31F0-\u31FF]+$/);
}

It returns true if the string consists only of kanji and/or kana characters, and false if letters of the alphabet or other characters are present.

I would like it to return true if at least one kanji or kana character is present, instead of only when all of them are.

Thank you in advance for any help!


The right answer is not to hardcode ranges. Never ever put magic numbers in your code! That is a maintenance nightmare. It is hard to read, hard to write, hard to debug, hard to maintain. How do you know you got the numbers right? What happens when they add new ones? No, do not use magic numbers. Please.

The right answer is to use named Unicode scripts, which are a fundamental aspect of every Unicode code point:

[\p{Han}\p{Hiragana}\p{Katakana}]

That requires the XRegExp plugin for JavaScript.

The real problem is that JavaScript regexes on their own are too primitive to support Unicode properties, and therefore to support Unicode. Maybe that was once an acceptable compromise 15 years ago, but today it is nothing less than intolerably negligent, as you yourself have discovered.

You will also miss a few Common code points that the newer Script Extensions property lists as kana, but that probably doesn't matter. You could just add \p{Common} to the set above.
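In engines that support ES2018 Unicode property escapes, the Script vs. Script_Extensions difference can be checked directly. A minimal sketch, assuming a modern engine such as a recent Node.js; the variable names are hypothetical. U+30FC (ー, the katakana-hiragana prolonged sound mark) has Script=Common, so a Script-based class misses it, while its Script_Extensions value lists Hiragana and Katakana.

```javascript
// Native (ES2018+) equivalent of the XRegExp class, using the Script property.
const byScript = /[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}]/u;

// Same idea using Script_Extensions, which also covers Common code points
// that are used with kana, such as U+30FC.
const byScriptExtensions =
  /[\p{Script_Extensions=Han}\p{Script_Extensions=Hiragana}\p{Script_Extensions=Katakana}]/u;

const prolongedSoundMark = "\u30FC"; // ー

console.log(byScript.test(prolongedSoundMark));           // false
console.log(byScriptExtensions.test(prolongedSoundMark)); // true
console.log(byScript.test("漢"));                          // true
```

Adding `\p{Script=Common}` would also catch U+30FC, but Common includes ASCII digits, spaces and punctuation, so a string like `"123"` would then match as well.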


Now that Unicode property escapes are part of the ECMAScript 2018 spec, the following regex can be used natively if the JS engine supports this feature (expanding on @tchrist's answer):

/[\p{Script_Extensions=Han}\p{Script_Extensions=Hiragana}\p{Script_Extensions=Katakana}]/u

If you want to exclude punctuation from being matched:

/(?!\p{Punctuation})[\p{Script_Extensions=Han}\p{Script_Extensions=Hiragana}\p{Script_Extensions=Katakana}]/u
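A minimal sketch wrapping both regexes from this answer in helper functions; the names `hasJapanese` and `hasJapaneseNoPunct` are hypothetical. U+3002 (。, the ideographic full stop) has Han, Hiragana and Katakana among its Script_Extensions, so only the punctuation-excluding version rejects a string made of it alone.

```javascript
// Matches one character whose Script_Extensions include Han, Hiragana or Katakana.
const japaneseChar =
  /[\p{Script_Extensions=Han}\p{Script_Extensions=Hiragana}\p{Script_Extensions=Katakana}]/u;

// Same, but the negative lookahead rejects punctuation at that position.
const japaneseCharNoPunct =
  /(?!\p{Punctuation})[\p{Script_Extensions=Han}\p{Script_Extensions=Hiragana}\p{Script_Extensions=Katakana}]/u;

// Hypothetical helpers; neither regex has the g flag, so test() keeps no state between calls.
function hasJapanese(str) {
  return japaneseChar.test(str);
}
function hasJapaneseNoPunct(str) {
  return japaneseCharNoPunct.test(str);
}

console.log(hasJapanese("Hello 世界"));      // true
console.log(hasJapanese("Hello world"));     // false
console.log(hasJapanese("\u3002"));          // true (。 is punctuation)
console.log(hasJapaneseNoPunct("\u3002"));   // false
console.log(hasJapaneseNoPunct("ですね。")); // true
```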


/[\u3000-\u303f]|[\u3040-\u309f]|[\u30a0-\u30ff]|[\uff00-\uffef]|[\u4e00-\u9faf]|[\u3400-\u4dbf]/
  • Japanese style punctuation: [\u3000-\u303f]
  • Hiragana: [\u3040-\u309f]
  • Katakana: [\u30a0-\u30ff]
  • Roman characters + half-width katakana: [\uff00-\uffef]
  • Kanji: [\u4e00-\u9faf]|[\u3400-\u4dbf]
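The ranges listed above can also be given names, which partly addresses the readability concern raised in the first answer. A minimal sketch; the `JAPANESE_RANGES` table and the `classifyChar` and `containsJapanese` functions are hypothetical, and the ranges are copied verbatim from the list.

```javascript
// Each entry: [label, first code point, last code point], taken from the list above.
const JAPANESE_RANGES = [
  ["punctuation", 0x3000, 0x303f],
  ["hiragana", 0x3040, 0x309f],
  ["katakana", 0x30a0, 0x30ff],
  ["fullwidth roman / halfwidth katakana", 0xff00, 0xffef],
  ["kanji", 0x4e00, 0x9faf],
  ["kanji", 0x3400, 0x4dbf],
];

// Returns the label of the range containing the first code point of ch, or null.
function classifyChar(ch) {
  const cp = ch.codePointAt(0);
  for (const [label, lo, hi] of JAPANESE_RANGES) {
    if (cp >= lo && cp <= hi) return label;
  }
  return null;
}

// Same check as the regex above: true if any character falls in one of the ranges.
function containsJapanese(str) {
  for (const ch of str) {
    if (classifyChar(ch) !== null) return true;
  }
  return false;
}

console.log(classifyChar("あ"));         // "hiragana"
console.log(classifyChar("ア"));         // "katakana"
console.log(classifyChar("漢"));         // "kanji"
console.log(classifyChar("A"));          // null
console.log(containsJapanese("abc。"));  // true
```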


String.prototype.isKanjiKana = function(){
    return !!this.match(/[\u4E00-\u9FAF\u3040-\u3096\u30A1-\u30FA\uFF66-\uFF9D\u31F0-\u31FF]/);
}

Don't anchor it to the beginning and end of the string with ^ and $, and the + is unnecessary in this case.
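The same fix written as a standalone function, a minimal sketch assuming you'd rather not extend `String.prototype` (the names `hasKanjiKana`, `KANJI_KANA` and `ANCHORED` are hypothetical). `RegExp.prototype.test` already returns a boolean, so the `!!` around `match` is not needed. The anchored pattern is kept for comparison.

```javascript
// Unanchored, no quantifier: succeeds as soon as one character in the class is found.
const KANJI_KANA = /[\u4E00-\u9FAF\u3040-\u3096\u30A1-\u30FA\uFF66-\uFF9D\u31F0-\u31FF]/;

// Anchored version from the question: every character must be in the class.
const ANCHORED = /^[\u4E00-\u9FAF\u3040-\u3096\u30A1-\u30FA\uFF66-\uFF9D\u31F0-\u31FF]+$/;

function hasKanjiKana(str) {
  return KANJI_KANA.test(str);
}

console.log(hasKanjiKana("Tokyo 東京")); // true
console.log(ANCHORED.test("Tokyo 東京")); // false
console.log(hasKanjiKana("Tokyo"));      // false
console.log(hasKanjiKana("ｶﾀｶﾅ"));       // true (half-width katakana)
```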


/[\u4E00-\u9FAF\u3040-\u3096\u30A1-\u30FA\uFF66-\uFF9D\u31F0-\u31FF]/
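Inside a character class, `|` is not alternation; it is matched as a literal pipe character, so using it to separate ranges also makes the class accept `|`. A minimal sketch showing the effect (the variable names are hypothetical):

```javascript
// With "|" inside the brackets, the class also contains the literal pipe character.
const withPipes = /[\u4E00-\u9FAF|\u3040-\u3096]/;
// Ranges simply placed side by side form one class without the pipe.
const withoutPipes = /[\u4E00-\u9FAF\u3040-\u3096]/;

console.log(withPipes.test("a|b"));    // true, only because of the literal "|"
console.log(withoutPipes.test("a|b")); // false
console.log(withPipes.test("ひらがな"), withoutPipes.test("ひらがな")); // true true
```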


Why not just this? It returns true when the string contains at least one kanji.

/[一-龯]/.test(str)
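A short check of this literal-character range, written as a hypothetical helper `hasKanji`. The hiragana and katakana blocks lie outside the 一 to 龯 range, so a string containing only kana does not match; if kana should count as well, kana ranges have to be added to the class.

```javascript
// Matches any single character from 一 to 龯 (kanji only, no kana).
function hasKanji(str) {
  return /[一-龯]/.test(str);
}

console.log(hasKanji("漢字"));     // true
console.log(hasKanji("abc 日本")); // true
console.log(hasKanji("ひらがな")); // false
console.log(hasKanji("カタカナ")); // false
```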
