What's an effective library for suggesting keywords for content?

I am currently designing a CMS for my website, and I am wondering whether there are any free libraries for generating tags from the content.

Example

I like trees. Trees are plants that have leaves. Leaves on tree can be multi-colored.

This would produce the tags "trees" and "leaves".

The library should be PHP or JS.

EDIT 1:

I have found a simple library for half my task: http://www.cafewebmaster.com/get-top-100-words-keywords-text-php

I have revised the requirements for the library (thanks to guidance from @NullUserException):

  • Count all words (ignoring case and inflections), throw out stop words and pick the ones with the highest frequency

  • Give more weight to words that are more specific to the subject, even if they occur less often. In the example above, 'multi-colored' should score higher because it is more specific to the subject. It should also carry a prefix showing what it relates to (it would become leaves-multi-colored).
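
The first bullet above can be sketched in plain JavaScript. This is only a rough illustration, not an existing library: the stop-word list is a tiny hypothetical sample, and the inflection handling is a naive assumption (it just strips a trailing "s" to group plurals) where a real stemmer would be needed. The function names `groupKey` and `topKeywords` are made up for this example.

```javascript
// Hypothetical, very short stop-word list; a real one would be much longer.
const STOP_WORDS = new Set([
  "i", "a", "an", "the", "that", "on", "can", "be", "are", "have",
  "is", "of", "and", "to", "in",
]);

// Naive inflection grouping (assumption): drop a trailing "s" so that
// "tree" and "trees" are counted together. Not a real stemmer.
function groupKey(word) {
  const strip = word.length > 3 && word.endsWith("s") && !word.endsWith("ss");
  return strip ? word.slice(0, -1) : word;
}

// Count words case-insensitively, skip stop words, return the most frequent.
function topKeywords(text, limit = 2) {
  const groups = new Map();
  const words = text.match(/[A-Za-z][A-Za-z-]*/g) || [];
  for (const raw of words) {
    const word = raw.toLowerCase();
    if (STOP_WORDS.has(word)) continue;
    const key = groupKey(word);
    // The first spelling seen for a group is used as its display label.
    const group = groups.get(key) || { label: word, count: 0 };
    group.count += 1;
    groups.set(key, group);
  }
  return [...groups.values()]
    .sort((a, b) => b.count - a.count)
    .slice(0, limit)
    .map((group) => group.label);
}

const sample =
  "I like trees. Trees are plants that have leaves. " +
  "Leaves on tree can be multi-colored.";
console.log(topKeywords(sample)); // [ 'trees', 'leaves' ]
```

The sketch does not attempt the second bullet; weighting subject-specific words would need some notion of how common each word is outside the article.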

EDIT 2:

The algorithm should remove words shorter than 3 characters unless they are in capitals or otherwise specially formatted.
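
A minimal sketch of that filter, assuming "in capitals" means the word is written entirely in upper case (for example "UK"). Other kinds of formatting, such as bold or italic markup, are not handled here because they depend on how the CMS stores content. The names `keepWord` and `filterShortWords` are hypothetical.

```javascript
// Keep a word if it has at least 3 characters, or if it is short but
// written entirely in capital letters (assumed meaning of "in capitals").
function keepWord(word) {
  if (word.length >= 3) return true;
  return /[A-Z]/.test(word) && word === word.toUpperCase();
}

// Apply the rule to a list of words that still have their original case.
function filterShortWords(words) {
  return words.filter(keepWord);
}

console.log(filterShortWords(["UK", "is", "an", "island", "AI", "of"]));
// [ 'UK', 'island', 'AI' ]
```

Note that the filter has to run before words are lower-cased, otherwise the capitalisation it relies on is lost.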


Are the tags on your CMS already defined? If so, you could index your text in memory and search it for every known tag, then pick the highest-scoring tags and present them to the user.

Indexing and searching could be done with http://lucene.apache.org/solr/
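
Before setting up a search server, the matching idea can be tried with a plain in-memory scan. The sketch below is an assumption about how such scoring might look, not Solr's API: it simply counts whole-word, case-insensitive occurrences of each predefined tag. The function names and the tag list are hypothetical.

```javascript
// Escape characters that have a special meaning in regular expressions.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Score each known tag by how often it appears in the text as a whole word,
// drop tags that never appear, and return the rest, highest score first.
function scoreTags(text, knownTags) {
  const scores = [];
  for (const tag of knownTags) {
    const pattern = new RegExp("\\b" + escapeRegExp(tag) + "\\b", "gi");
    const hits = (text.match(pattern) || []).length;
    if (hits > 0) scores.push({ tag, hits });
  }
  return scores.sort((a, b) => b.hits - a.hits);
}

const article =
  "I like trees. Trees are plants that have leaves. " +
  "Leaves on tree can be multi-colored.";
const ranked = scoreTags(article, ["trees", "leaves", "flowers", "plants"]);
console.log(ranked.map((r) => r.tag)); // [ 'trees', 'leaves', 'plants' ]
```

A real index would also handle stemming and relevance ranking, which is where a tool like Solr comes in.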

Edit: I do suggest that your tags/keywords be defined and managed from an administration panel (as in WordPress, for example). Otherwise you would end up with thousands of keywords generated from your articles, which would never help the end user.
