Ruby: is there a stemmer that "knows" English irregular verbs?

There is a Ruby stemmer, https://github.com/aurelian/ruby-stemmer, but it (1) does not stem English irregular verbs and (2) fails to build native extensions on Windows. Is there an alternative that fixes at least one of these problems?


I think you should be searching for a lemmatizer (which has information about morphology and can handle irregular words) rather than a stemmer (which usually just lops off the ends of words). See this explanation in Manning, Raghavan, and Schütze's online book on information retrieval.

I haven't tried it out, but a quick search turned up this English lemmatizer for Ruby: elemma.

A commonly used (non-Ruby) English morphological analyzer that can do lemmatization is morpha.
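
To make the stemmer/lemmatizer distinction above concrete, here is a minimal toy sketch in plain Ruby. It does not use elemma, morpha, or any gem; `naive_stem`, `toy_lemmatize`, and the small `IRREGULAR_VERBS` table are hypothetical names invented for illustration. A real lemmatizer relies on a much larger dictionary and morphological rules.

```ruby
# Toy illustration only: neither method reflects how a real gem is implemented.

# Hypothetical suffix-stripping "stemmer": it only chops common endings.
SUFFIXES = %w[ing ed es s].freeze

def naive_stem(word)
  w = word.downcase
  # Strip the first matching suffix, but keep at least three leading characters.
  suffix = SUFFIXES.find { |s| w.end_with?(s) && w.length > s.length + 2 }
  suffix ? w[0...-suffix.length] : w
end

# Hypothetical lemmatizer: consults an exception table of irregular forms
# first, then falls back to suffix stripping for regular words.
IRREGULAR_VERBS = {
  "went" => "go", "gone" => "go",
  "ran" => "run",
  "ate" => "eat", "eaten" => "eat",
  "was" => "be", "were" => "be", "been" => "be"
}.freeze

def toy_lemmatize(word)
  w = word.downcase
  IRREGULAR_VERBS.fetch(w) { naive_stem(w) }
end

%w[walked went ran eaten].each do |word|
  puts "#{word}: stem=#{naive_stem(word)} lemma=#{toy_lemmatize(word)}"
end
```

The suffix stripper leaves "went" and "ran" unchanged because there is no ending to remove, while the table lookup maps them to "go" and "run". That lookup table is the kind of morphological knowledge a stemmer lacks.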


None of the stemmers below can handle English irregular verbs.

  • https://github.com/ealdent/uea-stemmer: pure Ruby, well written, from 2009. Sparse documentation, though slightly more than the others. Installs fine on Windows.
  • https://github.com/romanbsd/fast-stemmer: pure C, hard to read, and presumably faster than the others (I did not test performance). From 2009, with very minimal documentation. Installs fine on Windows. Its method has side effects, so be careful to pass it a copy of your string.
  • https://github.com/aurelian/ruby-stemmer: from 2010. Fails to build native extensions on Windows. Can handle some other European languages besides English.
  • http://rubyforge.org/projects/stemmer: pure Ruby, not updated since 2006, no documentation. Installs fine on Windows, but I could not figure out how it works.
  • http://rubyforge.org/projects/stemmer4r: no documentation, from 2005. I did not try it.
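
If a stemmer does modify the string it is given, as noted for fast-stemmer above, a defensive copy with `String#dup` keeps the original intact. The sketch below uses a hypothetical stand-in, `destructive_stem!`, to mimic that side effect. It is not the fast-stemmer API, and its regex is only a placeholder for real stemming rules.

```ruby
# Hypothetical stand-in for a stemmer that modifies its argument in place.
# It is NOT the fast-stemmer API; it only mimics the side effect described above.
def destructive_stem!(word)
  word.sub!(/(ing|ed|s)\z/, "") # sub! mutates the string it is called on
  word
end

original = +"jumping"                       # unary plus yields an unfrozen string
stemmed  = destructive_stem!(original.dup)  # stem a copy, not the original

puts original # the original string is unchanged
puts stemmed  # the copy carries the stemmed form
```

Calling `dup` costs one small allocation per word, which is usually a fair price for not corrupting strings that are reused elsewhere.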


I found this while googling for Ruby-based NLP: http://mendicantbug.com/2009/09/13/nlp-resources-for-ruby/
