Detect CJK characters in PHP

I've got an input box that allows UTF-8 characters. Can I detect programmatically whether the characters are Chinese, Japanese, or Korean (part of some Unicode range, perhaps)? I would switch search methods depending on whether MySQL's fulltext searching would work (it won't work for CJK characters).

Thanks!


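// True if the string contains any Chinese, Japanese, or Korean character.
function isCjk($string) {
    return isChinese($string) || isJapanese($string) || isKorean($string);
}

// Matches any Han ideograph. Han is shared by Chinese, Japanese (kanji), and Korean (hanja).
function isChinese($string) {
    return preg_match('/\p{Han}/u', $string) === 1;
}

// Matches CJK Unified Ideographs (U+4E00-U+9FFF), Hiragana, or Katakana.
// Note: the ideograph range also matches Chinese text.
function isJapanese($string) {
    return preg_match('/[\x{4E00}-\x{9FFF}\x{3040}-\x{309F}\x{30A0}-\x{30FF}]/u', $string) === 1;
}

// Matches Hangul Compatibility Jamo or Hangul Syllables.
function isKorean($string) {
    return preg_match('/[\x{3130}-\x{318F}\x{AC00}-\x{D7AF}]/u', $string) === 1;
}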


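CJK characters are confined to specific Unicode blocks. You need to check whether each character falls inside one of these blocks, and you should also account for ideographs outside the Basic Multilingual Plane (for example CJK Unified Ideographs Extension B, starting at U+20000), which need surrogate pairs in UTF-16 and four bytes in UTF-8.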
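As a rough illustration of the block check described above, here is a minimal sketch. The function name containsCjkBlock is made up for this example, and the list of blocks is an assumption that is deliberately not exhaustive (it leaves out Hangul Jamo and the later ideograph extensions, among others). With the /u modifier PCRE works on code points, so the range above U+FFFF needs no special surrogate handling in PHP.

```php
<?php
// Hypothetical helper: true if $string contains at least one code point
// from a non-exhaustive set of CJK-related Unicode blocks.
function containsCjkBlock(string $string): bool
{
    $pattern = '/['
        . '\x{3040}-\x{30FF}'    // Hiragana and Katakana
        . '\x{3130}-\x{318F}'    // Hangul Compatibility Jamo
        . '\x{3400}-\x{4DBF}'    // CJK Unified Ideographs Extension A
        . '\x{4E00}-\x{9FFF}'    // CJK Unified Ideographs
        . '\x{AC00}-\x{D7AF}'    // Hangul Syllables
        . '\x{F900}-\x{FAFF}'    // CJK Compatibility Ideographs
        . '\x{20000}-\x{2A6DF}'  // CJK Unified Ideographs Extension B (outside the BMP)
        . ']/u';

    // preg_match() returns 1 on a match, 0 on no match, and false on error
    // (for example invalid UTF-8), so compare strictly against 1.
    return preg_match($pattern, $string) === 1;
}
```

A caller could use the result to choose between MySQL fulltext search and a fallback such as LIKE matching.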


Do you want to detect whether a character is a (Chinese or Japanese or Korean) character? Or do you want to tell Chinese characters apart from Japanese characters? The former is easy; the latter is in many cases impossible, due to Han Unification.
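For the easy case, one possible sketch uses PCRE's Unicode script properties instead of hand-written ranges. Both function names are invented for this example. The second function only reports which scripts are present: kana and Hangul are specific to Japanese and Korean, but a string made of Han ideographs alone cannot be attributed to one language this way.

```php
<?php
// Hypothetical helper: true if the string contains any Han, kana, or Hangul character.
function containsCjkScript(string $string): bool
{
    return preg_match('/[\p{Han}\p{Hiragana}\p{Katakana}\p{Hangul}]/u', $string) === 1;
}

// Hypothetical helper: reports which CJK scripts occur in the string.
// 'han' alone says nothing about the language, because of Han Unification.
function detectCjkScripts(string $string): array
{
    return [
        'han'    => preg_match('/\p{Han}/u', $string) === 1,
        'kana'   => preg_match('/[\p{Hiragana}\p{Katakana}]/u', $string) === 1,
        'hangul' => preg_match('/\p{Hangul}/u', $string) === 1,
    ];
}
```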
