How to automatically assign a given text to different categories?
I'm working on a project in which we have several categories, such as Beauty, Activities, and Shopping.
Each category is tagged. For example, some of the tags are:
Beauty => Haircut, spa, manicure, personal trainer
Activities => personal trainer, biking
Shopping => Jewelry, Shirts, Socks
The tags have an order, which denotes their relevance to the category. For example, Haircut comes first in Beauty because a text containing the word "haircut" is most likely Beauty-related.
As you can see, the tag "personal trainer" belongs to more than one category, so a text that mentions a personal trainer could be related to either Beauty or Activities.
I also record how many times each tag has been found in a text, so each tag has a "found" count associated with it.
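One way to hold this data, as a minimal sketch in Python: each category maps to its tags in relevance order, and that mapping is inverted so that a shared tag such as "personal trainer" points back to every category it belongs to, together with its position (rank) in that category. The names CATEGORY_TAGS and build_tag_index are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Categories and their tags, ordered from most to least relevant (from the question).
CATEGORY_TAGS = {
    "Beauty": ["haircut", "spa", "manicure", "personal trainer"],
    "Activities": ["personal trainer", "biking"],
    "Shopping": ["jewelry", "shirts", "socks"],
}


def build_tag_index(category_tags):
    """Invert category -> ordered tags into tag -> [(category, rank), ...].

    rank is 0 for the most relevant tag of a category, 1 for the next, etc.
    """
    index = defaultdict(list)
    for category, tags in category_tags.items():
        for rank, tag in enumerate(tags):
            index[tag].append((category, rank))
    return dict(index)


tag_index = build_tag_index(CATEGORY_TAGS)
# A shared tag lists every category it belongs to, with its rank in each.
print(tag_index["personal trainer"])  # [('Beauty', 3), ('Activities', 0)]
```

The per-tag "found" totals could live in a separate counter keyed by tag; they are left out here to keep the sketch focused on the tag order.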
Now, when a new text is processed, I search it for all the tags and count how many times each one occurs. The results for a sample text look like this:
Haircut => 4
personal trainer => 1
manicure => 1
spa => 0
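The counting step can be sketched with Python's standard re module. This assumes tags should match case-insensitively and only as whole words (so "spa" does not match inside "space"); the sample text below is made up so that it reproduces the counts above.

```python
import re

TAGS = ["haircut", "personal trainer", "manicure", "spa"]


def count_tags(text, tags):
    """Count case-insensitive, whole-word occurrences of each tag in text."""
    counts = {}
    for tag in tags:
        # \b anchors keep "spa" from matching inside words like "space".
        pattern = r"\b" + re.escape(tag) + r"\b"
        counts[tag] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


# Hypothetical sample text, written to produce the counts listed above.
sample = ("Book a haircut today! Our haircut specials, a Haircut plus manicure, "
          "and a free HAIRCUT consult with a personal trainer.")
print(count_tags(sample, TAGS))
# {'haircut': 4, 'personal trainer': 1, 'manicure': 1, 'spa': 0}
```

Whole-word matching also means plural forms such as "haircuts" are not counted; stemming the text first, or listing plural variants as extra tags, would be one way to handle that.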
Looking at this, we can tell that the text should belong to Beauty.
Now here are my questions:
1. How do we programmatically decide which category this text belongs to, given the tag counts and the array of tags each category is associated with? Is this a good idea? Is there a more elegant way of doing this?
2. Is this a good approach, or is there a better algorithm? I was thinking that something like Lucene, or a more intelligent algorithm, could come into play here.
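For question 1, a simple baseline is to give each category a score: multiply every tag's count by a weight that shrinks with the tag's rank in that category, sum over the category's tags, and pick the highest score. The 1 / (rank + 1) decay below is an assumption for illustration, not something the question specifies; any decreasing weight would fit the same scheme.

```python
CATEGORY_TAGS = {
    "Beauty": ["haircut", "spa", "manicure", "personal trainer"],
    "Activities": ["personal trainer", "biking"],
    "Shopping": ["jewelry", "shirts", "socks"],
}


def rank_weight(rank):
    # Hypothetical decay: the first tag counts fully, later tags count less.
    return 1.0 / (rank + 1)


def score_categories(tag_counts, category_tags):
    """Sum count * rank weight over each category's ordered tags."""
    return {
        category: sum(tag_counts.get(tag, 0) * rank_weight(rank)
                      for rank, tag in enumerate(tags))
        for category, tags in category_tags.items()
    }


def classify(tag_counts, category_tags):
    scores = score_categories(tag_counts, category_tags)
    best = max(scores, key=scores.get)
    # If no tag matched at all, report no category instead of an arbitrary one.
    return best if scores[best] > 0 else None


counts = {"haircut": 4, "personal trainer": 1, "manicure": 1, "spa": 0}
print(classify(counts, CATEGORY_TAGS))  # Beauty
```

With the sample counts, Beauty scores 4 + 1/3 + 1/4 ≈ 4.58 (haircut, manicure, personal trainer) while Activities scores 1 (personal trainer at rank 0), so the shared tag stops being ambiguous once the other tags are taken into account. The weights are hand-picked, though, which is the main reason to consider a learned model instead.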
If you can define the classes (categories) in advance, a method based on Naive Bayes could do the job. It is one of the most commonly used classifiers.
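A minimal sketch of a multinomial Naive Bayes classifier over the tag counts, assuming you have some texts that have already been labeled with their category (the training examples below are invented). It uses add-one (Laplace) smoothing so that a tag never seen with a category does not zero out that category, and it sums log probabilities to avoid floating-point underflow. TagNaiveBayes is a hypothetical name, not a library class.

```python
import math
from collections import Counter, defaultdict


class TagNaiveBayes:
    """Multinomial Naive Bayes over tag counts, with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, examples):
        # examples: list of (tag_counts: dict[str, int], category: str)
        self.class_docs = Counter()             # documents per category
        self.tag_totals = defaultdict(Counter)  # tag occurrences per category
        self.vocab = set()
        for tag_counts, category in examples:
            self.class_docs[category] += 1
            self.tag_totals[category].update(tag_counts)
            self.vocab.update(tag_counts)
        self.n_docs = sum(self.class_docs.values())
        return self

    def log_posteriors(self, tag_counts):
        """Unnormalized log P(category | tags) for every category."""
        v = len(self.vocab)
        scores = {}
        for category, n in self.class_docs.items():
            totals = self.tag_totals[category]
            denom = sum(totals.values()) + self.alpha * v
            score = math.log(n / self.n_docs)  # log prior P(category)
            for tag, count in tag_counts.items():
                if tag not in self.vocab:
                    continue  # ignore tags never seen in training
                score += count * math.log((totals[tag] + self.alpha) / denom)
            scores[category] = score
        return scores

    def predict(self, tag_counts):
        scores = self.log_posteriors(tag_counts)
        return max(scores, key=scores.get)


# Hypothetical labeled training data: tag counts per text, plus its category.
train = [
    ({"haircut": 2, "spa": 1}, "Beauty"),
    ({"manicure": 1, "haircut": 1}, "Beauty"),
    ({"personal trainer": 2, "biking": 1}, "Activities"),
    ({"biking": 3}, "Activities"),
    ({"shirts": 1, "socks": 2}, "Shopping"),
]
model = TagNaiveBayes().fit(train)
print(model.predict({"haircut": 4, "personal trainer": 1, "manicure": 1, "spa": 0}))  # Beauty
```

Unlike the hand-weighted scoring, the per-tag weights here are estimated from the labeled data. For real use, scikit-learn's MultinomialNB implements the same model on count vectors.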
If you want the classes to be defined automatically by the program, there is nothing that works well right now.