Architecture & Essential Components of StumbleUpon's Recommendation Engine
I would like to know how stumbleupon recommends articles for its users?.
Is it using a neural network or some sort of machine-learning algorithms or is it actually recommending articles based on what the user 'liked' or is it simply recommending articles based on the tags in the interests area?. With tags I mean, using something开发者_JAVA技巧 like item-based collaborative filtering etc.?
First, i have no inside knowledge of S/U's Recommendation Engine. What i do know, i've learned from following this topic for the last few years and from studying the publicly available sources (including StumbleUpon's own posts on their company Site and on their Blog), and of course, as a user of StumbleUpon.
I haven't found a single source, authoritative or otherwise, that comes anywhere close to saying "here's how the S/U Recommendation Engine works", still given that this is arguably the most successful Recommendation Engine ever--the statistics are insane, S/U accounts for over half of all referrals on the Internet, and substantially more than facebook, despite having a fraction of the registered users that facebook has (800 million versus 15 million); what's more S/U is not really a site with a Recommendation Engine, like say, Amazon.com, instead the Site itself is a Recommendation Engine--there is a substantial volume of discussion and gossip among the fairly small group of people who build Recommendation Engines such that if you sift through this, i think it's possible to reliably discren the types of algorithms used, the data sources supplied to them, and how these are connected in a working data flow.
The description below refers to my Diagram at bottom. Each step in the data flow is indicated by a roman numeral. My description proceeds backwards--beginning with the point at which the URL is delivered to the user, hence in actual use step I occurs last, and step V, first.
salmon-colored ovals => data sources
light blue rectangles => predictive algorithms
I. A Web Page recommended to an S/U user is the last step in a multi-step flow
II. The StumbleUpon Recommendation Engine is supplied with data (web pages) from three distinct sources:
web pages tagged with topic tags matching your pre-determined Interests (topics a user has indicated as interests, and which are available to view/revise by clicking the "Settings" Tab on the upper right-hand corner of the logged-in user page);
socially Endorsed Pages (*pages liked by this user's Friends*); and
peer-Endorsed Pages (*pages liked by similar users*);
III. Those sources in turn are results returned by StumbleUpon predictive algorithms (Similar Users refers to users in the same cluster as determined by a Clustering Algorithm, which is perhaps k-means).
IV. The data used fed to the Clustering Engine to train it, is comprised of web pages annotated with user ratings
V. This data set (web pages rated by StumbleUpon users) is also used to train a Supervised Classifier (e.g., multi-layer perceptron, support-vector machine) The output of this supervised classifier is a class label applied to a web page not yet rated by a user.
The single best source i have found which discussed SU's Recommendation Engine in the context of other Recommender Systems is this BetaBeat Post.
精彩评论