How to identify a character's language in Ruby/Rails?
Given a character (one letter of a string), how can I identify which language it belongs to? The options are: English, Russian, Hebrew.
Background: this character was entered by a user in a form and then stored in a database.
It can be for example the first letter in one of these words:
- Hello
- Привет
- שלום
The Unicode standard is divided into "blocks". See:
http://www.unicode.org/charts/
http://en.wikipedia.org/wiki/Unicode_block
http://www.unicode.org/versions/Unicode6.0.0/
and find the Unicode blocks (code point intervals) for each language.
My guess:
- English
- Hebrew
- Russian
So for you it's a matter of simple numeric comparison for each character (its Unicode ordinal value). Very simple.
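A minimal sketch of that comparison in Ruby, assuming the string is UTF-8 encoded. The ranges are my reading of the Unicode charts (Basic Latin letters, the Cyrillic block, the Hebrew block), and `LANGUAGE_BLOCKS` and `language_of` are just illustrative names. Note that a block identifies a script rather than a language, so this only works because the options are limited to these three.

```ruby
# Code point ranges per language. These are assumptions taken from the
# Unicode block charts, not an exhaustive mapping.
LANGUAGE_BLOCKS = {
  english: [0x0041..0x005A, 0x0061..0x007A], # A-Z and a-z in Basic Latin
  russian: [0x0400..0x04FF],                 # Cyrillic block
  hebrew:  [0x0590..0x05FF]                  # Hebrew block
}.freeze

# Returns :english, :russian, :hebrew, or nil if the character is
# missing or falls outside all of the ranges above.
def language_of(char)
  return nil if char.nil? || char.empty?

  code = char.ord # Unicode code point of the first character
  LANGUAGE_BLOCKS.each do |language, ranges|
    return language if ranges.any? { |range| range.cover?(code) }
  end
  nil
end

language_of("Hello"[0])   # => :english
language_of("Привет"[0])  # => :russian
language_of("שלום"[0])    # => :hebrew
language_of("1")          # => nil
```

Ruby's regexp engine also understands script properties such as `\p{Latin}`, `\p{Cyrillic}` and `\p{Hebrew}`, which could replace the hand-written ranges if you prefer matching over numeric comparison.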