Determining the language of Twitter posts
What is the best way to determine the language of Twitter posts?
There is the language parameter that comes with the streaming API, but it doesn't really seem to be very accurate. Even many Japanese posts are labelled as English.
What have others done to sort out the languages?
I've had very good results with this PHP package: http://pear.php.net/package/Text_LanguageDetect/
It is fast and open source. We use it to select English-only posts for a site we run at http://2012twit.com.
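For the specific problem in the question (Japanese posts labelled as English), a cheap pre-filter can run before any real language detector. The following is only a rough sketch in Python, not how Text_LanguageDetect works: it checks Unicode script ranges, and the function names and the 0.9 threshold are made up for illustration.

```python
import unicodedata


def has_japanese_kana(text: str) -> bool:
    """Return True if the text contains any Hiragana or Katakana character."""
    for ch in text:
        code = ord(ch)
        # U+3040..U+309F is the Hiragana block, U+30A0..U+30FF is Katakana.
        if 0x3040 <= code <= 0x309F or 0x30A0 <= code <= 0x30FF:
            return True
    return False


def latin_ratio(text: str) -> float:
    """Fraction of alphabetic characters whose Unicode name marks them as Latin."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    latin = sum(1 for ch in letters if "LATIN" in unicodedata.name(ch, ""))
    return latin / len(letters)


def looks_like_english_candidate(text: str, min_latin: float = 0.9) -> bool:
    """Hypothetical pre-filter: reject posts with kana or mostly non-Latin letters.

    The 0.9 threshold is an arbitrary assumption, not a tuned value.
    """
    return not has_japanese_kana(text) and latin_ratio(text) >= min_latin
```

This only rules out posts written in non-Latin scripts. A French or Spanish post would still pass, so a statistical detector such as the package above is still needed to pick out English specifically.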
Google has language detection within its Translate API, if using an evil external service is an option:
http://code.google.com/apis/language/translate/v1/reference.html#detectResult