
Detect English verb tenses using NLTK

I am looking for a way, given an English text, to count the verb phrases in it in past, present, and future tense. For now I am using NLTK: I do POS (part-of-speech) tagging and then count, say, 'VBD' tags to get past tenses. This is not accurate enough though, so I guess I need to go further and use chunking, then analyze the VP chunks for specific tense patterns. Is there anything existing that does this? Any further reading that might be helpful? The NLTK book focuses mostly on NP chunks, and I can find very little information on VP chunks.
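For reference, a minimal sketch of the coarse approach described above (POS-tag the text and count 'VBD' tags; the sample sentence is made up for illustration):

import nltk

# Tag the text and count past-tense verb tags ('VBD').
# This misses participles, perfect constructions, modal futures, etc.
text = "I walked to the store and will buy some milk tomorrow."
tagged = nltk.pos_tag(nltk.word_tokenize(text))
past = sum(1 for _, tag in tagged if tag == 'VBD')
print(past)  # 1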


The exact answer depends on which chunker you intend to use, but list comprehensions will take you a long way. This gets you the number of verb phrases, using a made-up chunker:

len([phrase for phrase in nltk.Chunker(sentence) if phrase[1] == 'VP'])

You can take a more fine-grained approach over the same chunks to count how many phrases are in each tense.
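For example, a rough sketch with NLTK's RegexpParser standing in for the chunker; the chunk grammar and the tag-to-tense mapping below are simplistic assumptions, not a complete rule set:

import nltk

# Chunk verb phrases with a simple regex grammar:
# an optional modal followed by one or more verb tags.
grammar = r"VP: {<MD>?<VB.*>+}"
chunker = nltk.RegexpParser(grammar)

def count_tenses(sentence):
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    tree = chunker.parse(tagged)
    counts = {'past': 0, 'present': 0, 'future': 0}
    for vp in tree.subtrees(filter=lambda t: t.label() == 'VP'):
        words = [w.lower() for w, _ in vp.leaves()]
        tags = [t for _, t in vp.leaves()]
        if 'will' in words or 'shall' in words:        # crude future: will/shall modal
            counts['future'] += 1
        elif any(t in ('VBD', 'VBN') for t in tags):   # simple past / past participle
            counts['past'] += 1
        elif any(t in ('VBZ', 'VBP', 'VBG') for t in tags):
            counts['present'] += 1
    return counts

print(count_tenses("She walked home, but she will call you tomorrow."))
# {'past': 1, 'present': 0, 'future': 1}

This will still misclassify passives, perfects and the like, but it shows the general pattern: chunk first, then inspect the tags inside each VP.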


You can do this with either the Berkeley Parser or the Stanford Parser, but I don't know whether a Python interface is available for either.
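If it helps, newer NLTK releases do include a client for Stanford CoreNLP; a minimal sketch, assuming a CoreNLP server is already running on localhost:9000 (the URL and port are assumptions):

from nltk.parse.corenlp import CoreNLPParser

# Parse with a running Stanford CoreNLP server and count VP subtrees.
parser = CoreNLPParser(url='http://localhost:9000')
tree = next(parser.raw_parse("She walked home, but she will call you tomorrow."))
vps = [t for t in tree.subtrees() if t.label() == 'VP']
print(len(vps))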
