How do I get tags/keywords from a webpage/feed?

I need to build a tag cloud from a webpage/feed. Once you have the word-frequency table of tags, building the tag cloud is easy. My question is how to extract the tags/keywords from the webpage/feed in the first place.

This is what I'm doing now:

Get the content -> strip HTML -> split on whitespace (space, newline, tab) -> keyword list

But this does not work very well.

Is there a better way?


What you have is a rough first-order approximation. If you then go back through the data and count the frequency of 2-word phrases, then 3-word phrases, and so on up to the maximum number of words you would still consider a single tag, you'll get a better representation of keyword frequency.
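
For illustration, here is a minimal Python sketch of that phrase-counting pass. It assumes the HTML has already been stripped and the text is English; the function name ngram_counts, the max_n cutoff of 3, and the sample string are all made up for this example and are not from any particular library.

```python
import re
from collections import Counter

def ngram_counts(text, max_n=3):
    """Count every phrase of 1..max_n consecutive words (case-insensitive)."""
    # Crude tokenizer (an assumption): runs of letters, digits and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of n words across the token list.
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts

sample = "tag cloud demo: build a tag cloud from a feed, then style the tag cloud"
counts = ngram_counts(sample)

# Phrases that repeat are better tag candidates than one-off word runs.
repeated = [(phrase, c) for phrase, c in counts.most_common() if c > 1]
print(repeated)
```

On the sample string the two-word phrase "tag cloud" is counted as often as "tag" or "cloud" alone, which is the signal that it should probably be one tag rather than two.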

You can refine this rough search pattern by specifying certain words that can be contained as part of a phrase (pronouns, etc.).
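
One possible reading of this refinement, sketched in Python: keep a small list of filler words (pronouns, articles, prepositions) that are allowed inside a phrase but never at its start or end, so they cannot become tags on their own. The FILLER set and the name phrase_counts are hypothetical and only for illustration; a real list would be much longer, and other interpretations of the rule are possible.

```python
import re
from collections import Counter

# Hypothetical filler-word list (an assumption); a real one would be longer.
FILLER = {"a", "an", "the", "of", "in", "on", "to", "and",
          "it", "its", "this", "that", "he", "she", "they"}

def phrase_counts(text, max_n=3, filler=FILLER):
    """Count 1..max_n word phrases that neither start nor end with a filler word."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            # Filler words may sit inside a phrase but not at either edge,
            # which also means a lone filler word is never counted.
            if gram[0] in filler or gram[-1] in filler:
                continue
            counts[" ".join(gram)] += 1
    return counts

counts = phrase_counts("the state of the art in the art of war")
print(counts.most_common())
```

With this rule "art of war" survives as a phrase because "of" is only in the middle, while "the", "of war" and "the art" are all discarded.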
