
Open source options for non-English term extraction?

I am looking for an open source project that does term extraction in multiple languages.

I have already found the Yahoo BOSS Term Extraction Web Service, and it is good. However, it does not handle languages other than English.

Are there any open source term extraction projects that support more languages?

Thanks!


From the packages I've used in production or just played around with, the following were the most comprehensive and most actively maintained:

  1. GATE - A computer architecture for a broad range of Natural Language Processing tasks, available under the GNU Public License

  2. LingPipe (Java) - A suite of Java libraries for the linguistic analysis of human language which can link entity mentions to database entries, uncover relations, cluster documents, ...

  3. OpenNLP (Java) - Java machine learning toolkit for natural language processing (NLP). It supports the most common NLP tasks.

  4. NLTK (Python) - NLTK is a leading platform for building Python programs to work with human language data.

  5. Proxem Antelope (.Net) - Advanced Natural Language Object-oriented Processing Environment

  6. Scala-NLP (Scala)

  7. Stanford NLP (Java)

Also, there are some good web APIs, such as:

  1. Zemanta

  2. Open-Calais
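To give a rough idea of the simplest kind of term extraction these toolkits go well beyond, here is a minimal frequency-based sketch using only the Python standard library. It is not the method used by any of the packages listed above; the function name `candidate_terms` and the tiny `STOPWORDS` lists are hypothetical placeholders. It ranks word n-grams that neither start nor end with a stopword, so supporting another language mostly means supplying that language's stopword list. It assumes whitespace-separated words, so languages such as Chinese or Japanese would need a word segmenter first.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword lists; a real system would use
# full per-language lists (e.g. from one of the toolkits above).
STOPWORDS = {
    "en": {"the", "a", "an", "of", "and", "in", "for", "to", "is", "on"},
    "de": {"der", "die", "das", "und", "in", "für", "von", "ist", "ein", "eine"},
}

def candidate_terms(text, lang, max_n=3, top_k=10):
    """Rank word n-grams (n <= max_n) that neither start nor end with a stopword."""
    stop = STOPWORDS.get(lang, set())
    # \w is Unicode-aware in Python 3, so accented or non-Latin letters stay in tokens.
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Skip n-grams bounded by function words, e.g. "the term".
            if gram[0] in stop or gram[-1] in stop:
                continue
            # Skip n-grams containing bare numbers.
            if any(t.isdigit() for t in gram):
                continue
            counts[" ".join(gram)] += 1
    return counts.most_common(top_k)
```

For example, `candidate_terms("Die Termextraktion ist schwierig. Die Termextraktion für Deutsch ist anders.", "de")` ranks `termextraktion` first with a count of 2. Raw frequency is only a starting point; real extractors typically add part-of-speech patterns and corpus-comparison statistics.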


GATE - General Architecture for Text Engineering: http://gate.ac.uk/

Will do term extraction, keyword sorting and selection, sentiment analysis, all that good stuff.

Open-source, free, from the UK. Does a whole host of languages, including Arabic.


You can try Linnaeus. It is mainly aimed at extracting species names from scientific papers, but I think you can give it your own dictionaries and use it for other domains/tasks.
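The dictionary-driven idea mentioned above can be sketched in a few lines of Python. This does not reproduce Linnaeus's actual dictionary format or matching engine; `build_dictionary`, `match_terms`, and the example labels such as `species:human` are hypothetical. It performs a greedy longest-match lookup of multi-word dictionary entries over word tokens, which works the same way for any language whose words are separated by whitespace.

```python
import re

def build_dictionary(entries):
    """Map each lower-cased, tokenized term to its identifier (hypothetical format)."""
    return {tuple(re.findall(r"\w+", term.lower())): ident
            for term, ident in entries.items()}

def match_terms(text, dictionary):
    """Greedy longest-match lookup of dictionary terms over word tokens.

    Returns (matched_text, identifier, start_char, end_char) tuples.
    """
    # Keep character offsets so matches can be mapped back to the original text.
    tokens = [(m.group().lower(), m.start(), m.end())
              for m in re.finditer(r"\w+", text)]
    max_len = max((len(k) for k in dictionary), default=0)
    results = []
    i = 0
    while i < len(tokens):
        # Try the longest possible entry first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t[0] for t in tokens[i:i + n])
            if key in dictionary:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                results.append((text[start:end], dictionary[key], start, end))
                i += n
                break
        else:
            # No entry starts at this token; move on.
            i += 1
    return results
```

With a dictionary built from `{"Homo sapiens": "species:human", "pomme de terre": "food:potato"}`, the same code finds the English species name and the French multi-word term; only the dictionary changes per domain or language.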
