开发者

Tag Cloud Data Backend

I want to be able to generate tag clouds from free text that comes from any number of different sources. For clarity, I'm not talking about how to display a tag cloud once the critical tags/phrases are already discovered, I'm hoping to be able to discover the meaningful phrases themselves... preferable on a PHP/MySQL stack.

If I had to do this myself, I'd start by establishing some kind of index for words/phrases that gives a "normal" frequency for any word/phrase. eg "Constantinople" occurs once in every 1,000,000 words on average (normal frequency "0.000001"). Then as I analyze a body of text, I'd find the individual words/开发者_如何学编程phrases (another challenge!), find frequencies of each within the input, and measure against the expected freqeuncy. Words that have the highest ratio against expected frequency get boosted priority in the cloud.

I'd like to believe someone else has already done this, WAY better than I could hope to, but I'll be damned if I can find it.

Any recommendations??


You need an inverted index, used by full-text search engines. A text search library like Lucene or Xapian should help, many such libraries have PHP bindings.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜