
How to detect the language of a given text

In my Rails 3 application, users may write messages in a forum. I would like to identify the language of a given message. I'm interested in English, Russian, and Hebrew. Is there any built-in library in Ruby/Rails for such a task? If not, any ideas would be appreciated.


Use this: https://github.com/nashby/wtf_lang

"ruby is so awesome!".lang # => "en"
"ruby is so awesome!".full_lang # => "ENGLISH"


You can use the API provided by Google to detect the language with Google Translate.

See the documentation here: http://code.google.com/apis/language/translate/v1/using_rest_langdetect.html


Since you're dealing with languages that use different character sets, you could check which character codes predominate in your strings and see whether they fall into the ranges that represent Hebrew or Cyrillic characters.
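A minimal sketch of this idea in Ruby, assuming Ruby 1.9 or later (which supports Unicode script properties such as \p{Hebrew} and \p{Cyrillic} in regular expressions). The guess_language method and the SCRIPTS table are hypothetical names for illustration, and treating all Latin-script text as English is an assumption that only works because English is the only Latin-script language in your list.

```ruby
# encoding: utf-8

# Hypothetical mapping from a language to a regex for its script.
# Assumption: any Latin-script letter is counted as English, so this cannot
# distinguish English from other Latin-script languages.
SCRIPTS = {
  hebrew:  /\p{Hebrew}/,
  russian: /\p{Cyrillic}/,
  english: /\p{Latin}/
}.freeze

# Counts the letters of each script in +text+ and returns the language whose
# script is most common, or nil if none of the scripts appear at all.
def guess_language(text)
  counts = SCRIPTS.map { |lang, regex| [lang, text.scan(regex).size] }
  lang, count = counts.max_by { |_, n| n }
  count.zero? ? nil : lang
end
```

For example, guess_language("Привет, мир") returns :russian, and a string containing no Hebrew, Cyrillic, or Latin letters (such as "12345") returns nil. Counting letters rather than checking for a single match lets a mostly-Russian message that quotes an English word still come out as Russian.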


Perhaps you could look at the whatlanguage gem?


Take a look at this blog post, which may be helpful:
http://blog.kenweiner.com/2008/04/server-side-language-detection-with.html


The Language Detection API provides a Ruby gem for detecting the language of a text.


Just a quick demo of WhatLanguage for anyone interested: http://www.youtube.com/watch?v=lNqZ2cqOReo&list=UUJ_3fstMOH-g4yBxtvgAWkw&index=0&feature=plcp


http://rubygems.org/gems/prose Prose does it without a gem. Try it.
