Writing a basic recommendation engine [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.

Closed 7 years ago.

I'm looking to write a basic recommendation engine that will take and store a list of numeric IDs (which relate to books), compare those to other users with a high volume of identical IDs and recommend additional books based on those finds.

After a bit of Googling, I've found this article, which discusses an implementation of a Slope One algorithm, but seems to rely on users rating the items being compared. Ideally, I'd like to achieve this without the need for users to provide ratings. I'm assuming that if the user has this book in their collection, they are fond of it.

While it strikes me that I could default a rating of 10 for each book, I'm wondering if there's a more efficient algorithm I could be using. Ideally I'd like to calculate these recommendations on the fly (avoiding batch calculation). Any suggestions would be appreciated.


A basic algorithm for your task is a collaborative memory-based recommender system. It's quite easy to implement, especially when your items (in your case books) just have IDs and no other features.

As you already said, you do need some kind of rating from the users for the items. It doesn't have to be a rating of 1 to 5 stars, though; think of it as a binary choice such as 0 (book not read) and 1 (book read), or interested / not interested.

Then use an appropriate distance measure to calculate the difference between each other user (and their set of items) and yourself, select the n users most similar to yourself (or whoever the active user is), and pick out those of their items that you haven't rated (or considered, i.e. choice 0).
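The steps above can be sketched roughly as follows. This is only a minimal illustration, not a tuned implementation: the names `recommend`, `symmetric_difference_size` and the sample `collections` data are made up for the example, each user's collection is assumed to be a Python set of book IDs, and the distance used is simply the number of books that exactly one of the two users owns.

```python
from collections import Counter


def symmetric_difference_size(a, b):
    """Number of books owned by exactly one of the two users."""
    return len(a ^ b)


def recommend(active_user, collections, n_neighbors=2,
              distance=symmetric_difference_size):
    """Recommend books held by the n most similar users but not by active_user."""
    mine = collections[active_user]
    # Rank every other user by distance to the active user (smaller = more similar).
    # The user name is a tie-breaker so the result is deterministic.
    others = [u for u in collections if u != active_user]
    neighbors = sorted(
        others, key=lambda u: (distance(mine, collections[u]), u)
    )[:n_neighbors]
    # Count how many neighbors hold each book the active user lacks.
    votes = Counter()
    for u in neighbors:
        votes.update(collections[u] - mine)
    # Most frequently held books first; ties broken by book ID.
    return [book for book, _ in sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))]


# Hypothetical sample data: user name -> set of book IDs in their collection.
collections = {
    "alice": {1, 2, 3},
    "bob": {1, 2, 3, 4},
    "carol": {2, 3, 4, 5},
    "dave": {7, 8, 9},
}
print(recommend("alice", collections))  # [4, 5]
```

Because nothing is precomputed, this runs on the fly per request, at the cost of one pass over all users; whether that is fast enough depends on how many users you have.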

I think a good distance measure in this case would be the 1-norm distance, sometimes called the Manhattan distance. But this is a point where you have to experiment with your dataset to get the best results.
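For 0/1 ratings, the Manhattan distance reduces to counting the books on which two users disagree. A small sketch, assuming each user is encoded as a 0/1 vector over a fixed catalogue order (the helper names `manhattan` and `to_vector` and the sample data are invented for illustration):

```python
def manhattan(u, v):
    """1-norm distance between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(u, v))


def to_vector(owned, catalogue):
    """Encode a set of book IDs as a 0/1 vector over a fixed catalogue order."""
    return [1 if book in owned else 0 for book in catalogue]


# Hypothetical catalogue and collections.
catalogue = [1, 2, 3, 4, 5]
alice, carol = {1, 2, 3}, {2, 3, 4, 5}

d = manhattan(to_vector(alice, catalogue), to_vector(carol, catalogue))
# On binary vectors this equals the size of the symmetric difference of the sets.
assert d == len(alice ^ carol)
print(d)  # 3
```

In practice you would probably skip the vectors and work on the sets directly, since most users own only a tiny fraction of the catalogue.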

A nice introduction to this topic is the paper by Breese et al., Empirical Analysis of Predictive Algorithms for Collaborative Filtering, available here (PDF). For a research paper, it's an easy read.


The Apriori algorithm can give you recommendations based on what set of items is interesting to the user. You have to define your own notion of an interesting set, e.g. items that the user has bought in a single order, items that the user has ever bought, items that the user has commented on favorably, or items that the user has explored in detail.

The Apriori algorithm requires batch processing, but improvements exist that might not require batch processing. These are AprioriTid and AprioriHybrid (sorry, no link).
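To make the idea concrete, here is a minimal sketch of the basic, batch-style Apriori (not AprioriTid or AprioriHybrid). It assumes each "interesting set" is a set of book IDs and that `min_support` is an absolute count. The `suggest` helper is a hypothetical simplification added for this example: it proposes any book that appears in a frequent itemset together with a book the user already owns, rather than deriving full association rules.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): count} for itemsets found in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count the transactions containing each candidate; keep the frequent ones.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = count({frozenset([i]) for i in items})
    frequent = dict(current)
    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = count(candidates)
        frequent.update(current)
        k += 1
    return frequent


def suggest(owned, frequent):
    """Books that share a frequent itemset with something the user owns (hypothetical helper)."""
    owned = set(owned)
    scores = {}
    for itemset, n in frequent.items():
        if itemset & owned:
            for book in itemset - owned:
                scores[book] = max(scores.get(book, 0), n)
    return sorted(scores, key=lambda b: (-scores[b], b))


# Hypothetical "interesting sets", e.g. one per user collection.
baskets = [{1, 2, 3}, {1, 2, 4}, {1, 2}, {2, 3}, {1, 3}]
frequent = apriori(baskets, min_support=3)
print(suggest({1}, frequent))  # [2]
```

The frequent itemsets are the part that needs batch computation; once they are stored, looking up suggestions for a single user is cheap.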


@ndg That is very insightful, and as someone who works in this area I think you're right to use what amounts to a {0,1} rating system. Most of the differences in star ratings are just noise. You can allow {0,1,2} with a "love it!" button, but again users are inconsistent in their use of such buttons, so it can be good to limit choice. Hotpot lets users have 10 super-plus-loves, which keeps it consistent.

My advice is to be careful about painting with too broad a brush. In other words, a universal algorithm is simplest, but you miss the chance to be opportunistic.

Take a smallish data set you are very familiar with (for example, get some of your friends to use the site) and note all of the factors that could have a positive or negative influence on user-distance ratings. Then, in the modelling process, you have to decide which factors to include, and how and how much each one should count.

Keep in mind that the number of norms is about the size of the number of curves. And you might want to consider a quasinorm, pseudonorm, or even non-continuous norms.

I don't see any reason to use the Manhattan norm; in fact, I would use graph-based norms to calculate the distance between users.
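The answer doesn't say which graph-based measure is meant. One possible reading, shown here purely as an assumption, is to treat users and books as the two sides of a bipartite graph and use the shortest-path length between two users as their distance. The function name `graph_distance` and the sample data are invented for the sketch.

```python
from collections import deque


def graph_distance(collections, start, goal):
    """Shortest-path length in edges between two users in the user-book graph.

    Returns None if the users are not connected.
    """
    # Build adjacency lists; nodes are tagged so user names and book IDs cannot clash.
    adjacency = {}
    for user, books in collections.items():
        for book in books:
            adjacency.setdefault(("user", user), set()).add(("book", book))
            adjacency.setdefault(("book", book), set()).add(("user", user))

    source, target = ("user", start), ("user", goal)
    seen, queue = {source}, deque([(source, 0)])
    # Plain breadth-first search: the first time we reach the target is the shortest path.
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Hypothetical sample data: user name -> set of book IDs.
collections = {
    "alice": {1, 2},
    "bob": {2, 3},
    "carol": {3, 4},
    "dave": {9},
}
print(graph_distance(collections, "alice", "bob"))    # 2
print(graph_distance(collections, "alice", "carol"))  # 4
print(graph_distance(collections, "alice", "dave"))   # None
```

A distance of 2 means the two users share at least one book; 4 means they are linked only through a third user. Unlike the Manhattan distance, this can relate users who have no books in common.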
