Are there any good summarizers for a web-page?

Suppose I give you a URL... can you analyze the words and spit out the "keywords" of that page (besides using meta tags)?

Are there good open-source summarizers out there? (preferably Python)


A simple text summarizer: http://pythonwise.blogspot.com/2008/01/simple-text-summarizer.html

Algorithm:

1. For each word, calculate its frequency in the document.
2. For each sentence in the document:
      score(sentence) = sum([freq(word) for word in sentence])
3. Print the X top-scoring sentences such that their total size < MAX_SUMMARY_SIZE.
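
The three steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the code from the linked post: the function name `summarize`, the regex-based sentence splitting and tokenizing, and measuring "size" in characters are all assumptions made for the example.

```python
import re
from collections import Counter

MAX_SUMMARY_SIZE = 300  # assumed budget, measured in characters


def tokenize(sentence):
    """Lowercase a sentence and return its words (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", sentence.lower())


def summarize(text, max_size=MAX_SUMMARY_SIZE):
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Step 1: frequency of each word across the whole document.
    freq = Counter(word for s in sentences for word in tokenize(s))

    # Step 2: score each sentence as the sum of its word frequencies,
    # then rank sentences from highest to lowest score.
    ranked = sorted(
        enumerate(sentences),
        key=lambda pair: sum(freq[word] for word in tokenize(pair[1])),
        reverse=True,
    )

    # Step 3: take top-scoring sentences while the total stays under the budget.
    chosen, size = [], 0
    for index, sentence in ranked:
        if size + len(sentence) < max_size:
            chosen.append(index)
            size += len(sentence)

    # Emit the chosen sentences in their original document order.
    return " ".join(sentences[i] for i in sorted(chosen))
```

Note that summing raw frequencies favors long sentences and common words such as "the"; dividing by sentence length or dropping stop words are common adjustments, but they are not part of the algorithm as stated.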


Frequency counts will get you some of the way, but Natural Language Processing (NLP) generally gives better results, because it uses linguistic information about the words rather than raw counts alone.
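
To show what plain frequency counting looks like for keywords, here is a rough sketch. It assumes you have already fetched the page and stripped the HTML down to plain text; the function name `keywords`, the tiny `STOP_WORDS` set, and the three-letter minimum are all illustrative choices, not part of any library.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list for illustration; a real one would be much longer.
STOP_WORDS = {"the", "and", "for", "that", "with", "this", "from", "are", "was", "but"}


def keywords(text, k=5):
    """Return the k most frequent words, ignoring stop words and very short words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) > 2 and w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(k)]
```

For example, `keywords("The parser reads the file and the parser builds a tree. The tree is walked by the parser.", 2)` returns `['parser', 'tree']`. This approach cannot tell nouns from verbs or recognize multi-word terms, which is where linguistic techniques help.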

topia.termextract uses a part-of-speech (POS) tagging algorithm and is available from PyPI: http://pypi.python.org/pypi/topia.termextract/
