
How much data is needed for user-based CF or item-based CF to give recommendations?

How much data do user CF and item CF need before they can give recommendations?

I've manually created a small dataset so that I can understand how the algorithms work.

I found that on this small dataset Slope One can give a recommendation, but user CF and item CF cannot.

What is the reason behind this?

What is the threshold for the amount of data?


In user-based and item-based CF, the dataset can be really small. What matters is how often items and users are mapped to each other in the dataset. For example, if a user appears in the dataset only once, user-based CF most probably will not give recommendations, because a single common item does not provide enough similarity for two users to pass the threshold and become neighbors. That is just one example case.

For a small dataset of around 1,000 records, both recommenders will return results for most-similar-item queries and for recommend calls. For much smaller datasets, however, it is useful to check the data manually to see whether there is enough information about the queried user or item ID. In this link you can find a very small controlled dataset used to build an item-based CF recommender, along with how it works. I hope this answer is helpful.
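To make the neighbor threshold point concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the toy ratings, the names `pearson`, `user_cf_recommend`, and `slope_one_recommend`, and the `min_sim` threshold are hypothetical and are not taken from any particular library. It assumes user CF with Pearson similarity, which is undefined when two users share fewer than two rated items.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}. The two users share one item.
ratings = {
    "alice": {"A": 5.0, "B": 3.0},
    "bob": {"A": 4.0, "C": 2.0},
}

def pearson(u, v):
    """Pearson correlation over co-rated items, or None if it is undefined."""
    common = sorted(set(ratings[u]) & set(ratings[v]))
    if len(common) < 2:
        return None  # one shared item carries no correlation information
    xs = [ratings[u][i] for i in common]
    ys = [ratings[v][i] for i in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs)) * sqrt(sum((y - my) ** 2 for y in ys))
    return num / den if den else None

def user_cf_recommend(user, min_sim=0.1):
    """Similarity-weighted average of neighbor ratings (min_sim is an assumed threshold)."""
    scores, weights = {}, {}
    for other in ratings:
        if other == user:
            continue
        sim = pearson(user, other)
        if sim is None or sim < min_sim:
            continue  # not a neighbor
        for item, r in ratings[other].items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * r
                weights[item] = weights.get(item, 0.0) + sim
    return {item: scores[item] / weights[item] for item in scores}

def slope_one_recommend(user):
    """Weighted Slope One: predict from average rating differences between item pairs."""
    all_items = {i for r in ratings.values() for i in r}
    preds = {}
    for target in all_items - set(ratings[user]):
        num, count = 0.0, 0
        for j, r_uj in ratings[user].items():
            diffs = [r[target] - r[j] for r in ratings.values() if target in r and j in r]
            if diffs:
                num += (r_uj + sum(diffs) / len(diffs)) * len(diffs)
                count += len(diffs)
        if count:
            preds[target] = num / count
    return preds

print(user_cf_recommend("alice"))    # no neighbors, so nothing to recommend
print(slope_one_recommend("alice"))  # one co-rating pair (A, C) is enough
```

In this sketch, user CF returns nothing for `alice` because her only overlap with `bob` is item `A`, so no neighbor is found. Slope One needs just one user who rated both `A` and `C` to derive a difference, so it can still predict a rating for `C`.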


The MovieLens, Netflix, Jester, and KDD Cup datasets are all open to everyone. If you have trouble getting a dataset, check http://code.google.com/p/recsyscode/wiki/dataset


  1. For a small dataset, user CF and item CF may perform about the same, but for large data, if the user count is larger than the item count (e.g. the Netflix dataset and the Yahoo KDD Cup 2011 dataset), item CF is much faster than user CF.

  2. For Top-N recommendation, the accuracy of user CF and item CF is the same, but their coverage differs: user CF is good at recommending long-tail items, while item CF gives better diversity.
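As a rough illustration of the speed point in item 1, the sketch below counts how many pairwise similarities each approach would have to consider if every pair were compared. The user and item counts are illustrative round numbers, not figures from any specific dataset, and `pair_count` is a hypothetical helper. Real running time also depends on sparsity and on the implementation, so treat this only as a back-of-the-envelope estimate.

```python
def pair_count(n):
    """Number of unordered pairs among n entities: n * (n - 1) / 2."""
    return n * (n - 1) // 2

# Illustrative sizes only: many more users than items.
n_users, n_items = 500_000, 18_000

user_pairs = pair_count(n_users)  # similarities a naive user CF would compare
item_pairs = pair_count(n_items)  # similarities a naive item CF would compare

print(f"user-user pairs: {user_pairs:,}")
print(f"item-item pairs: {item_pairs:,}")
print(f"ratio: {user_pairs / item_pairs:.0f}x")
```

Because the pair count grows quadratically, a dataset with many more users than items makes the user-user similarity step far larger than the item-item one.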
