Text mining: when to use parser, tagger, NER tool?

2023-01-04 09:50

I'm working on a project that mines blog content, and I need help deciding which tool to use. When should I use a parser, when a tagger, and when a NER tool?

For instance, I want to find the most talked-about topics/subjects across several blogs. Do I use a part-of-speech tagger to grab the nouns and do a frequency count? That would probably be insufficient, because very generic terms can pop up, right? Or should I keep a list of categories and their synonyms to match against?

BTW, I'm using NLTK, but I'm looking at the Stanford tagger or parser, since a couple of people said they were good.

Instead of trying to reinvent the wheel, you might want to read up on topic models, which basically create clusters of words that frequently occur together. Mallet has a readily available toolkit for this task: http://mallet.cs.umass.edu/topics.php
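To get a feel for what a topic model does before reaching for Mallet, here is a toy sketch of LDA with collapsed Gibbs sampling in plain Python. It is not Mallet's implementation. The function names (`lda_gibbs`, `top_words`), the hyperparameter values, and the tiny pre-tokenized corpus are all made up for illustration, and a real run would need far more text plus stopword removal.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents.

    Returns one Counter per topic mapping word -> number of tokens assigned.
    alpha, beta, and iterations are illustrative defaults, not tuned values.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                       # total tokens in topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word


def top_words(topic_word, n=3):
    """Most frequent words per topic, skipping words with a zero count."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]


# Made-up, pre-tokenized "blog posts" (an assumption for this sketch).
docs = [
    ["camera", "lens", "photo", "camera", "tripod"],
    ["recipe", "oven", "flour", "recipe", "sugar"],
    ["photo", "lens", "camera", "light"],
    ["flour", "sugar", "oven", "butter"],
]
topics = lda_gibbs(docs, num_topics=2)
for k, words in enumerate(top_words(topics)):
    print(f"topic {k}: {words}")
```

Unlike a raw noun frequency count, the words are grouped by co-occurrence, so each cluster can be read as one candidate topic.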

To answer your original question: POS taggers, parsers, and NER tools are not typically used for topic identification. They are used more heavily for tasks like information extraction, where the goal is to identify the specific actors, events, locations, times, etc. within a document. For example, given a simple sentence like "John gave the apple to Mary.", you might use a dependency parser to figure out that John is the subject, the apple is the object, and Mary is the prepositional object; thus you know John is the giver and Mary is the receiver, and not vice versa.
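As a rough illustration of that last step, here is a small Python sketch that reads the giver and receiver off a dependency parse. The parse itself is written by hand with Stanford-style relation labels (`nsubj`, `dobj`, `prep`, `pobj`); that is an assumption for the example, and a real parser may label or collapse the edges differently. The helper name `extract_transfer` is hypothetical.

```python
# Hand-written dependency edges for "John gave the apple to Mary."
# Each edge is (head, relation, dependent). In practice these would come
# from a dependency parser; they are hard-coded here as an assumption.
EDGES = [
    ("gave", "nsubj", "John"),
    ("gave", "dobj", "apple"),
    ("apple", "det", "the"),
    ("gave", "prep", "to"),
    ("to", "pobj", "Mary"),
]


def extract_transfer(edges, verb):
    """Map the arguments of `verb` to giver / item / receiver roles."""
    # Index the edges as head -> {relation: dependent}.
    children = {}
    for head, rel, dep in edges:
        children.setdefault(head, {})[rel] = dep

    verb_args = children.get(verb, {})
    preposition = verb_args.get("prep")
    return {
        "giver": verb_args.get("nsubj"),                       # subject of the verb
        "item": verb_args.get("dobj"),                         # direct object
        "receiver": children.get(preposition, {}).get("pobj"), # object of the preposition
    }


roles = extract_transfer(EDGES, "gave")
print(roles)
```

A POS tagger alone would only tell you that John, apple, and Mary are nouns; the parse is what tells you who did what to whom.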
