
Determining language of twitter posts

What is the best way to determine the language of Twitter posts?

There is a language parameter that comes with the Streaming API, but it doesn't seem to be very accurate. Even many Japanese posts are labelled as English.
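One cheap safeguard against this kind of mislabelling is to look at the script itself. Hiragana and Katakana are used only for Japanese, so a post containing a lot of kana is almost certainly not English, whatever its language label says. Below is a minimal Python sketch. It assumes you already have the post text and the reported language code as strings. The function names and the 0.2 threshold are hypothetical illustrations, not tuned values. Kanji-only text is not caught, because those characters are shared with Chinese.

```python
# Unicode blocks used only by Japanese: a strong signal even when the
# reported language says otherwise.
HIRAGANA = (0x3040, 0x309F)
KATAKANA = (0x30A0, 0x30FF)


def kana_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are Hiragana or Katakana."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    kana = sum(
        1
        for c in chars
        if HIRAGANA[0] <= ord(c) <= HIRAGANA[1] or KATAKANA[0] <= ord(c) <= KATAKANA[1]
    )
    return kana / len(chars)


def corrected_language(text: str, reported: str, threshold: float = 0.2) -> str:
    """Override the reported language code when the script clearly disagrees.

    threshold is a hypothetical cut-off chosen for illustration only.
    """
    if reported != "ja" and kana_ratio(text) >= threshold:
        return "ja"
    return reported


print(corrected_language("今日はいい天気ですね", "en"))  # kana-heavy text labelled "en"
print(corrected_language("Hello world", "en"))
```

A script check like this only overrides clear-cut cases. For text written in the Latin alphabet you still need a statistical detector, such as the ones suggested in the answers.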

What have others done to sort out the languages?


I've had very good results with this PHP package: http://pear.php.net/package/Text_LanguageDetect/

It is fast and open source. We use it to select English-only posts for a site we run at http://2012twit.com.
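Detectors of this kind generally compare the character n-gram (typically trigram) statistics of the input against reference profiles for each language. The Python sketch below shows a toy version of that general idea. It is not Text_LanguageDetect's code or API. The sample sentences are tiny, made-up profiles for illustration only; a real detector builds its profiles from large corpora. The `english_only` filter mirrors the "English-only posts" use case described above.

```python
import math
from collections import Counter


def trigrams(text: str) -> Counter:
    """Count overlapping character trigrams, padding each word with spaces."""
    counts = Counter()
    for word in text.lower().split():
        padded = f" {word} "
        for i in range(len(padded) - 2):
            counts[padded[i:i + 3]] += 1
    return counts


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two trigram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Toy reference profiles (illustrative only; real profiles need far more text).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then it is going to the park with the other dogs",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y luego va al parque con los otros perros",
    "de": "der schnelle braune fuchs springt über den faulen hund und dann geht er mit den anderen hunden in den park",
}
PROFILES = {lang: trigrams(text) for lang, text in SAMPLES.items()}


def detect(text: str) -> str:
    """Return the language whose profile is most similar to the text."""
    grams = trigrams(text)
    return max(PROFILES, key=lambda lang: cosine(grams, PROFILES[lang]))


def english_only(posts: list[str]) -> list[str]:
    """Keep only the posts classified as English."""
    return [p for p in posts if detect(p) == "en"]


print(english_only(["the dog is going to the park", "el perro va al parque"]))
```

Trigram profiles work poorly on very short inputs, and many tweets are short. Expect weaker results on one- or two-word posts than on longer ones.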


Google has language detection in its Translate API, if using evil external services is an option.

http://code.google.com/apis/language/translate/v1/reference.html#detectResult
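Whichever service you call, you still have to decide when to trust its answer. The Python sketch below works on an already-fetched detection result and omits the HTTP call. It assumes the result carries `language`, `isReliable` and `confidence` fields, as described in the linked detectResult reference. Check the service's current documentation before relying on that shape. The 0.5 cut-off, the function name and the sample JSON string are hypothetical.

```python
import json

MIN_CONFIDENCE = 0.5  # hypothetical cut-off; tune it against your own data


def accept_as_english(result: dict, min_confidence: float = MIN_CONFIDENCE) -> bool:
    """Accept a post as English only if the detector is sure enough.

    Assumes 'language', 'isReliable' and 'confidence' keys, as in the
    detectResult reference; missing keys count against acceptance.
    """
    if result.get("language") != "en":
        return False
    if not result.get("isReliable", False):
        return False
    return float(result.get("confidence", 0.0)) >= min_confidence


# Hypothetical sample result, already decoded from the service's JSON response.
raw = '{"language": "en", "isReliable": true, "confidence": 0.87}'
print(accept_as_english(json.loads(raw)))
```

Requiring both the reliability flag and a minimum confidence is deliberately conservative. It drops some genuine English posts in exchange for fewer mislabelled ones.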
