Open source options for non-English term extraction?
I am looking for an open source project that does term extraction with multiple languages.
I have already found Yahoo BOSS Term Extraction Web Service, and it is good. However, it does not handle languages other than English.
Are there any open source term extraction projects that support more languages?
Thanks!
From the packages I've used in production or just played around with, the following were the most comprehensive and most actively maintained:
GATE (Java) - A general architecture for a broad range of natural language processing tasks, available under the GNU Lesser General Public License (LGPL)
LingPipe (Java) - A suite of Java libraries for the linguistic analysis of human language, which can link entity mentions to database entries, uncover relations, cluster documents, ...
OpenNLP (Java) - Java machine learning toolkit for natural language processing (NLP). It supports the most common NLP tasks.
NLTK (Python) - NLTK is a leading platform for building Python programs to work with human language data.
Proxem Antelope (.NET) - Advanced Natural Language Object-oriented Processing Environment
Scala-NLP (Scala)
Stanford NLP (Java)
Also, there are some good web APIs, such as:
Zemanta
Open-Calais
GATE - General Architecture for Text Engineering: http://gate.ac.uk/
Will do term extraction, keyword sorting and selection, sentiment analysis, all that good stuff.
Open-source, free, from the UK. Does a whole host of languages, including Arabic.
You can try Linnaeus. It is mainly aimed at extracting species names from scientific papers, but I think you can give it your own dictionaries and use it for other domains or tasks.
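To illustrate the general idea of dictionary-based term extraction with your own term list, here is a minimal Python sketch. It is not Linnaeus's API or algorithm; the function names `build_matcher` and `extract_terms` and the sample dictionary are made up for this example, and it assumes terms start and end with letters or digits.

```python
import re
import unicodedata


def build_matcher(terms):
    """Compile one regex that matches any dictionary term as a whole word."""
    # Normalize to NFC so composed and decomposed accents compare equal.
    normalized = {unicodedata.normalize("NFC", t) for t in terms}
    # Try longer terms first so "Homo sapiens" wins over a shorter "Homo".
    alternatives = sorted(normalized, key=len, reverse=True)
    pattern = "|".join(re.escape(t) for t in alternatives)
    # For str patterns in Python 3, \b and IGNORECASE are Unicode-aware.
    return re.compile(r"\b(?:" + pattern + r")\b", re.IGNORECASE)


def extract_terms(text, matcher):
    """Return (matched_text, start_offset) pairs in order of appearance."""
    text = unicodedata.normalize("NFC", text)
    return [(m.group(0), m.start()) for m in matcher.finditer(text)]


# Hypothetical dictionary mixing terms from several languages.
dictionary = ["Escherichia coli", "Homo sapiens", "Müller", "café"]
matcher = build_matcher(dictionary)
found = extract_terms("Das Café von Müller untersucht Escherichia coli.", matcher)
for term, offset in found:
    print(offset, term)
```

The word-boundary check keeps "Homo" from matching inside "Homology", but it relies on whitespace or punctuation between words, so languages written without spaces (Chinese, Japanese) would need a proper tokenizer from one of the toolkits above.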