
How can I detect non-western characters?

I want to disallow certain UTF-8 input (server-side), e.g. East Asian languages, where example input might be "伊".

However, I do want to continue supporting other Latin or "Latin-like" characters, such as the Welsh ŵ and ŷ, so checking against Latin-1 is not possible.

What are my options? (if language specific, PHP preferred)

Thanks very much.


Reasoning: browser support for many non-Western characters is often missing (e.g. on a different browser I just see a box in the question above), so for things like display names it is sometimes appropriate to restrict input, even if that is not appropriate for message bodies.


Just do

preg_match('/[^\\p{Common}\\p{Latin}]/u', $string)

where $string is a UTF-8 string. This returns 1 if the string contains any character outside the Latin and Common scripts, 0 if it does not, and false on error (for example, if $string is not valid UTF-8).

Example:

var_dump(preg_match('/[^\\p{Common}\\p{Latin}]/u', 'sf..ŷaás??'));  //int(0)
var_dump(preg_match('/[^\\p{Common}\\p{Latin}]/u', 'sf..ŷݤaás??')); //int(1)
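If you want a yes/no answer for validation, something like the following might be a reasonable starting point. It is only a sketch: the function name containsNonLatin is made up for illustration, and treating a preg_match error (such as malformed UTF-8) as "disallowed" is my assumption about what you would want for display names.

```php
<?php
// Hypothetical helper; the name is illustrative and not part of any library.
function containsNonLatin(string $input): bool
{
    // preg_match returns 1 on a match, 0 on no match, and false on error
    // (e.g. malformed UTF-8 when the /u modifier is used).
    $result = preg_match('/[^\\p{Common}\\p{Latin}]/u', $input);

    if ($result === false) {
        // Assumption: input that cannot be checked is rejected.
        return true;
    }

    return $result === 1;
}

var_dump(containsNonLatin('sf..ŷaás??')); // bool(false)
var_dump(containsNonLatin('伊'));         // bool(true)
var_dump(containsNonLatin("\xFF"));       // bool(true), invalid UTF-8
```

One caveat, as far as I know: accents written as separate combining marks (decomposed form) belong to the Inherited script rather than Latin, so such input would be rejected unless you normalize it first or also allow \p{Inherited}.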