Similarity function for Mahout boolean user-based recommender

I am using Mahout to build a user-based recommendation system which operates with boolean data.

I use GenericBooleanPrefUserBasedRecommender with NearestNUserNeighborhood, and I am now trying to decide on the most suitable user similarity function.

It was suggested that I use either LogLikelihoodSimilarity or TanimotoCoefficientSimilarity. I tried both and am getting [subjectively evaluated] meaningful results in both cases. However, the RMSE score on the same data set is better for LogLikelihoodSimilarity. The number of "no recommendation" results is similar in both cases.

Can anyone recommend which of these similarity functions is most suitable for this case?


(I'm the developer.) If I was stranded on a desert island with just one similarity metric for data without ratings/prefs, it would be log-likelihood. I would generally expect it to be the better similarity metric.
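To make the comparison concrete, here is a rough sketch in plain Java (no Mahout dependency) of what the two metrics compute for two users' item sets. The class and method names are made up for illustration. The log-likelihood part follows the standard G² statistic on a 2x2 contingency table, and the final mapping `1 - 1/(1 + LLR)` is an assumption about how the ratio gets squashed into a similarity in [0, 1), so treat it as an approximation rather than the library's exact code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration class; not part of Mahout.
public class BooleanSimilaritySketch {

    /** Tanimoto (Jaccard) coefficient: |A intersect B| / |A union B|. */
    public static double tanimoto(Set<Long> a, Set<Long> b) {
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int union = a.size() + b.size() - intersection.size();
        return union == 0 ? 0.0 : (double) intersection.size() / union;
    }

    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    /** Unnormalized entropy of counts: N*log(N) - sum(k*log(k)). */
    private static double entropy(long... counts) {
        long sum = 0;
        double sumXLogX = 0.0;
        for (long k : counts) {
            sumXLogX += xLogX(k);
            sum += k;
        }
        return xLogX(sum) - sumXLogX;
    }

    /** G^2 log-likelihood ratio for a 2x2 contingency table. */
    public static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point round-off.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    /** Similarity in [0, 1); the squashing function is an assumption. */
    public static double logLikelihoodSimilarity(Set<Long> a, Set<Long> b, long numItems) {
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        long k11 = intersection.size();                 // items both users have
        long k12 = a.size() - k11;                      // items only user A has
        long k21 = b.size() - k11;                      // items only user B has
        long k22 = numItems - a.size() - b.size() + k11; // items neither has
        double llr = logLikelihoodRatio(k11, k12, k21, k22);
        return 1.0 - 1.0 / (1.0 + llr);
    }

    public static void main(String[] args) {
        Set<Long> userA = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        Set<Long> userB = new HashSet<>(Arrays.asList(2L, 3L, 4L));
        System.out.println("Tanimoto:       " + tanimoto(userA, userB));
        System.out.println("Log-likelihood: " + logLikelihoodSimilarity(userA, userB, 100));
    }
}
```

The practical difference is that Tanimoto only looks at the overlap relative to the two users' own items, while the log-likelihood version also takes the total number of items into account, so an overlap that could easily happen by chance counts for less.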

The problem with the test you're doing is that, perhaps not at all obviously, it's not meaningful for this kind of recommender / data. RMSE is root-mean-square error, and it compares the actual vs. predicted rating for held-out test data. But you have no ratings; they're all "1.0". In principle, RMSE should then always be 0!

In practice it isn't, because to have anything to rank on, these recommenders score items by some meaningful function of the similarities. But they are not estimating ratings / prefs at all. So RMSE means squat here.
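A tiny plain-Java sketch of that point, with made-up numbers and a hypothetical class name: when every held-out "rating" is 1.0, a predictor that always says 1.0 gets a perfect RMSE without knowing anything, while similarity-derived ranking scores produce a nonzero RMSE that says nothing about how good the ranking is.

```java
// Hypothetical illustration class; not part of Mahout.
public class RmseOnBooleanData {

    /** Root-mean-square error between actual and predicted values. */
    public static double rmse(double[] actual, double[] predicted) {
        if (actual.length != predicted.length || actual.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sumSquaredError = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double diff = actual[i] - predicted[i];
            sumSquaredError += diff * diff;
        }
        return Math.sqrt(sumSquaredError / actual.length);
    }

    public static void main(String[] args) {
        // With boolean data, every held-out preference is implicitly 1.0.
        double[] actual = {1.0, 1.0, 1.0};

        // A trivial "predictor" that always outputs 1.0 has zero error.
        double[] constantGuess = {1.0, 1.0, 1.0};

        // Made-up ranking scores, e.g. sums of neighbor similarities.
        // They order items but are not estimates of a rating.
        double[] rankingScores = {0.8, 0.3, 0.6};

        System.out.println("RMSE of constant guess: " + rmse(actual, constantGuess));
        System.out.println("RMSE of ranking scores: " + rmse(actual, rankingScores));
    }
}
```

So a lower RMSE for one similarity function here mostly reflects how its scores happen to be scaled relative to 1.0, not whether it recommends better items.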

The only metric you can really use in this case is a precision/recall test, I think. Even that is problematic. This and more fun topics are covered in a book which I will shamelessly promote: Mahout in Action.
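For illustration, a precision/recall test boils down to something like the following plain-Java sketch: hold out some of a user's items, ask for the top N recommendations, and count how many held-out items come back. The class name, method name, and data are hypothetical, and this is not Mahout's own evaluator, just the basic arithmetic.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration class; not part of Mahout.
public class PrecisionRecallAtN {

    /**
     * Returns {precision, recall} for the top-n entries of a ranked list.
     * Precision is divided by the number of recommendations actually
     * considered (at most n); recall by the number of held-out items.
     */
    public static double[] precisionRecallAt(List<Long> ranked, Set<Long> heldOut, int n) {
        int cutoff = Math.min(n, ranked.size());
        int hits = 0;
        for (int i = 0; i < cutoff; i++) {
            if (heldOut.contains(ranked.get(i))) {
                hits++;
            }
        }
        double precision = cutoff == 0 ? 0.0 : (double) hits / cutoff;
        double recall = heldOut.isEmpty() ? 0.0 : (double) hits / heldOut.size();
        return new double[] {precision, recall};
    }

    public static void main(String[] args) {
        // Made-up recommender output, best item first.
        List<Long> recommended = Arrays.asList(10L, 20L, 30L, 40L);
        // Made-up items that were hidden from the recommender for this user.
        Set<Long> heldOut = new HashSet<>(Arrays.asList(20L, 40L, 50L));

        double[] result = precisionRecallAt(recommended, heldOut, 2);
        System.out.println("precision@2 = " + result[0] + ", recall@2 = " + result[1]);
    }
}
```

One likely reason this is still problematic: with boolean data the held-out items are the only ones counted as "good", so a recommended item the user has simply never seen is scored as a miss even if it would have been a fine recommendation.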
