How to identify a character's language in Ruby/Rails?

Given a character (one letter of a string), how can I identify which language it belongs to? The options are: English, Russian, Hebrew.

Background: this character was entered by a user in a form and then stored in a database.

It can be, for example, the first letter of one of these words:

  • Hello
  • Привет
  • שלום


The Unicode standard is divided into "blocks". Go here:

http://www.unicode.org/charts/

http://en.wikipedia.org/wiki/Unicode_block

http://www.unicode.org/versions/Unicode6.0.0/

and find the Unicode blocks (code point ranges) for each language.

My guess:

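  • English: Basic Latin block (U+0000 to U+007F)
  • Hebrew: Hebrew block (U+0590 to U+05FF)
  • Russian: Cyrillic block (U+0400 to U+04FF)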

So for you it's a matter of a simple numeric comparison for each character: take its Unicode ordinal value (code point) and check which block's range it falls into.
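A minimal sketch of that comparison in Ruby, assuming the string is UTF-8 encoded. The names `LANGUAGE_RANGES` and `language_of` are made up for illustration, and I narrowed English to the letters A-Z and a-z so digits and punctuation in Basic Latin don't count as English:

```ruby
# Hypothetical lookup table: language => code point ranges.
# The ranges come from the Unicode block charts.
LANGUAGE_RANGES = {
  english: [0x0041..0x005A, 0x0061..0x007A], # A-Z and a-z (inside Basic Latin)
  russian: [0x0400..0x04FF],                 # Cyrillic block
  hebrew:  [0x0590..0x05FF]                  # Hebrew block
}.freeze

# Hypothetical helper: returns :english, :russian, :hebrew or :unknown
# for the first character of the given string.
def language_of(char)
  code = char.ord # Unicode code point of the first character
  match = LANGUAGE_RANGES.find do |_lang, ranges|
    ranges.any? { |range| range.cover?(code) }
  end
  match ? match.first : :unknown
end

puts language_of("Hello"[0])
puts language_of("Привет"[0])
puts language_of("שלום"[0])
```

Keep in mind this identifies the script rather than the language: the Cyrillic block is also used by Ukrainian, Bulgarian and others, and Latin letters are shared by many languages. It is good enough here only because the options are limited to English, Russian and Hebrew.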
