
Matching weighted tags in a closest-first manner

This is a bit of an open-ended "how would you approach this type of situation" question.

I'm building a system in which the user is asked to select any number of items from a list of categories. For each category they select, they are asked to assign a weight to it (an importance value from 1 to 100). I guess the best way of describing these user-categories is weighted tags. So, I might really enjoy eating bananas, so that gets 100, whereas apples, which I quite enjoy, get 50. I hate plums, so I don't select that.

Certain other entities in the system will be doing exactly the same and will have their own set of tags, each with a weight. In the above scenario, an item may be a "Farm", and their output of each type of fruit is the weighting values. What I want to find is the best matching farms for the user's taste in fruits (for example). This may look something like:

User A: [Tag1: 100, Tag2: 50, Tag4: 10]

Item A: [Tag2: 40, Tag3: 20]

Item B: [Tag1: 100, Tag2: 50, Tag4: 10]

Item C: [Tag3: 20, Tag4: 5]

In this situation, Item B is obviously a perfect match for User A, so it would be top of the result set. What I really want is a system that can order the items in decreasing relevance against a specific user.

I've toyed around with SQL and NoSQL (redis) implementations, attempting a solution, but each time, I find myself iterating through a rather large dataset and doing basic math against each tag in each item to calculate the overall difference. Whilst this works, it's going to be slow, and if we're talking about a system with thousands of "Items", I'd imagine this would be brought to its knees fairly quickly.

I can't think of a way to implement this directly in SQL, given that there are two many-to-many style relationships involved across three entities (Item, User, Category/Tag). I can't even begin to wrap my head around how the weighting values from the adjoining tables User-Category and Item-Category could be compared in SQL to produce a final output.
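For illustration only, here is a rough sketch of how the two join tables might be compared in one aggregate query. It assumes hypothetical tables `user_tag` and `item_tag` (the names and columns are invented for this example, not taken from any real schema), uses squared Euclidean distance as the "overall difference", and runs against an in-memory SQLite database through Python's `sqlite3` module. It relies on the identity sum((u - i)^2) = sum(u^2) + sum(i^2) - 2 * sum(u * i), so tags missing on either side count as weight 0.

```python
import sqlite3

# Hypothetical schema: one row per (owner, tag) pair with its weight.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_tag (user_id TEXT, tag_id TEXT, weight INTEGER);
    CREATE TABLE item_tag (item_id TEXT, tag_id TEXT, weight INTEGER);
""")

# Sample data copied from the question.
conn.executemany(
    "INSERT INTO user_tag VALUES (?, ?, ?)",
    [("User A", "Tag1", 100), ("User A", "Tag2", 50), ("User A", "Tag4", 10)],
)
conn.executemany(
    "INSERT INTO item_tag VALUES (?, ?, ?)",
    [
        ("Item A", "Tag2", 40), ("Item A", "Tag3", 20),
        ("Item B", "Tag1", 100), ("Item B", "Tag2", 50), ("Item B", "Tag4", 10),
        ("Item C", "Tag3", 20), ("Item C", "Tag4", 5),
    ],
)

# Squared distance per item: sum(u^2) + sum(i^2) - 2 * sum(u * i).
# The LEFT JOIN keeps item tags the user never selected (user weight 0).
QUERY = """
    SELECT i.item_id,
           (SELECT SUM(weight * weight) FROM user_tag WHERE user_id = :uid)
           + SUM(i.weight * i.weight)
           - 2 * SUM(i.weight * COALESCE(u.weight, 0)) AS dist_sq
    FROM item_tag AS i
    LEFT JOIN user_tag AS u
           ON u.tag_id = i.tag_id AND u.user_id = :uid
    GROUP BY i.item_id
    ORDER BY dist_sq
"""

rows = conn.execute(QUERY, {"uid": "User A"}).fetchall()
for item_id, dist_sq in rows:
    print(item_id, dist_sq)
```

Items with no tags at all would not appear in this result, and the query still touches every item row, so it moves the arithmetic into the database rather than avoiding the scan.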

I guess what I'm asking for is a few ideas on how to even approach this problem.

Cheers John


The problem you're trying to solve looks related to the nearest neighbor problem, which for tagged data like you've mentioned can be solved using a variety of data structures. I'm not much of a SQL person, but I bet that if you search for nearest-neighbor algorithms you will find something that looks like what you want.
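As a minimal sketch of the nearest-neighbor framing, each user and item can be treated as a sparse vector of tag weights, with a missing tag counting as 0, and items sorted by their distance to the user. Euclidean distance is an assumption here (the question does not fix a metric), and the function names `distance` and `rank_items` are made up for this example. This is the brute-force version; the spatial data structures mentioned above would replace the full sort for large item sets.

```python
import math

def distance(user, item):
    """Euclidean distance between two tag->weight dicts (missing tag = 0)."""
    tags = set(user) | set(item)
    return math.sqrt(sum((user.get(t, 0) - item.get(t, 0)) ** 2 for t in tags))

def rank_items(user, items):
    """Return item names ordered from closest to farthest from the user."""
    return sorted(items, key=lambda name: distance(user, items[name]))

# Sample data copied from the question.
user_a = {"Tag1": 100, "Tag2": 50, "Tag4": 10}
items = {
    "Item A": {"Tag2": 40, "Tag3": 20},
    "Item B": {"Tag1": 100, "Tag2": 50, "Tag4": 10},
    "Item C": {"Tag3": 20, "Tag4": 5},
}

ranking = rank_items(user_a, items)
print(ranking)  # ['Item B', 'Item A', 'Item C']
```

With the question's data, Item B comes first at distance 0, as expected for a perfect match.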
