Creating a 'People who viewed this also viewed' list
I'm thinking of creating a 'People who viewed this also viewed' list like the ones you see on Amazon, Yelp and other online sites. Right now I'm thinking of creating a new table with 'product_id', 'last_viewed_product_id', 'hits', where when a user goes from the page for product_id=100 to product_id=101, it will create/update a row in this table with product_id=101, last_viewed_product_id=100, and increment the 'hits' value. Are there better methods that are more optimized and less computationally intensive?
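For reference, the table described above could be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module; the table name also_viewed and the helper record_transition are made up for the example, and the composite primary key is an assumption so that each (product, previous product) pair has a single row.

```python
import sqlite3

# In-memory database, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE also_viewed (
        product_id             INTEGER NOT NULL,
        last_viewed_product_id INTEGER NOT NULL,
        hits                   INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (product_id, last_viewed_product_id)
    )
    """
)


def record_transition(conn, last_viewed_product_id, product_id):
    """Create or update the row for one page-to-page transition (hypothetical helper)."""
    cur = conn.execute(
        "UPDATE also_viewed SET hits = hits + 1 "
        "WHERE product_id = ? AND last_viewed_product_id = ?",
        (product_id, last_viewed_product_id),
    )
    if cur.rowcount == 0:
        # No existing row for this pair yet, so create it with one hit.
        conn.execute(
            "INSERT INTO also_viewed (product_id, last_viewed_product_id, hits) "
            "VALUES (?, ?, 1)",
            (product_id, last_viewed_product_id),
        )
    conn.commit()


# A user goes from product 100 to product 101, twice.
record_transition(conn, 100, 101)
record_transition(conn, 100, 101)
```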
As far as I'm aware, the "tricks" Amazon uses to make things less computationally intensive are to a) use Bayesian stats/averages and b) compute partial aggregates. The latter means you don't need to count everything from scratch (you can instead sum pre-computed aggregates). The former lets you inject material that you infer will be related.
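A rough sketch of both ideas, not a description of what Amazon actually runs: merge_partials sums per-period counts that were computed earlier, and bayesian_average is one common reading of a "Bayesian average", where an observed rate is pulled toward a prior until there is enough data. Both function names and the prior parameters are assumptions made for the example.

```python
from collections import Counter


def merge_partials(partials):
    """Sum pre-computed per-period counts instead of recounting raw events."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total


def bayesian_average(hits, views, prior_rate, prior_weight):
    """Smoothed rate: behaves as if prior_weight extra views at prior_rate were seen."""
    return (hits + prior_weight * prior_rate) / (views + prior_weight)


# Keys are (last_viewed_product_id, product_id) pairs, one Counter per day.
monday = Counter({(100, 101): 3})
tuesday = Counter({(100, 101): 2, (100, 102): 1})
totals = merge_partials([monday, tuesday])

# One hit out of one view is weak evidence, so it stays close to the prior;
# fifty hits out of a hundred views moves much further away from it.
sparse = bayesian_average(hits=1, views=1, prior_rate=0.1, prior_weight=10)
dense = bayesian_average(hits=50, views=100, prior_rate=0.1, prior_weight=10)
```

With smoothing like this, a pair seen only once does not outrank a pair with a long, consistent history.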
It seems you're on the right path. A few suggestions:
On the computational cost: you probably want to cache your results, so you only serve a top 'x' list that is refreshed once a day or so. Real-time updates do not seem important in this case.
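A minimal sketch of that kind of cache, assuming a single process and an in-memory dictionary. The class name TopNCache and the compute callback are hypothetical; in practice the cached lists might live in a cache server or a summary table refreshed by a scheduled job.

```python
import time


class TopNCache:
    """Caches the top-N list per product and recomputes it after ttl_seconds."""

    def __init__(self, compute, ttl_seconds=24 * 60 * 60, clock=time.time):
        self._compute = compute    # expensive function: product_id -> list of ids
        self._ttl = ttl_seconds    # default: refresh once a day
        self._clock = clock        # injectable clock, handy for testing
        self._entries = {}         # product_id -> (computed_at, result)

    def get(self, product_id):
        now = self._clock()
        entry = self._entries.get(product_id)
        if entry is None or now - entry[0] >= self._ttl:
            # Missing or stale: recompute and remember when we did it.
            entry = (now, self._compute(product_id))
            self._entries[product_id] = entry
        return entry[1]
```

The expensive query then runs at most once per product per refresh interval, no matter how many page views arrive in between.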
I'm not sure what sort of products you have on your site, but if the variety is significant, you might want to show only items that are actually related (so Star Wars would only have Star Wars related items popping up).
So if your products have "tags" or keywords, you may want to use those to establish the relationship.
You may also want to weight each hit by how the user got to the product. If they got there by clicking on the list you provided, those items will keep reinforcing themselves and not give other products a chance to show up, so give such clicks a low weight. Items reached by other routes then carry more weight and pop up instead.
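One way the weighting could look, as a sketch only. The source labels ("recommendation_list", "search", "browse") and the weight values are invented for the example; the point is just that a click coming from the list itself counts for less than a click from anywhere else.

```python
from collections import defaultdict

# Hypothetical weights per traffic source; tune these for your own site.
SOURCE_WEIGHTS = {
    "recommendation_list": 0.2,  # self-reinforcing clicks count for little
    "search": 1.0,
    "browse": 1.0,
}
DEFAULT_WEIGHT = 1.0


def weighted_scores(events):
    """Sum weighted hits per (last_viewed_product_id, product_id) pair."""
    scores = defaultdict(float)
    for last_viewed_id, product_id, source in events:
        scores[(last_viewed_id, product_id)] += SOURCE_WEIGHTS.get(source, DEFAULT_WEIGHT)
    return dict(scores)


events = [
    (100, 101, "recommendation_list"),
    (100, 101, "recommendation_list"),
    (100, 101, "recommendation_list"),
    (100, 102, "search"),
]
scores = weighted_scores(events)
```

Here product 102 ends up scoring higher than product 101 despite having fewer raw hits, because all of 101's hits came from the list.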
If you have user IDs for all of your visitors (you can create temporary ones for unregistered users), you can create a history table with columns user_id and product_id, which stores all of the products users have visited. Then when a user opens a product, do a query that searches for the user_ids that have viewed that product recently and then join it to the products those users have opened. Then, just sort the products by which have been opened the most by those user_ids.
Make sure to cache this as the join would slow down any SQL server.
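The history table and the self-join described above might look something like this, again using Python's sqlite3 module for illustration. The viewed_at column, the indexes, and the also_viewed function are assumptions added for the example (some timestamp is needed to express "recently"). The query counts distinct users per product rather than raw views, so a user who opened the same product many times is counted once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history ("
    "user_id INTEGER NOT NULL, "
    "product_id INTEGER NOT NULL, "
    "viewed_at INTEGER NOT NULL)"  # assumed column: Unix timestamp of the view
)
conn.execute("CREATE INDEX idx_history_product ON history (product_id, viewed_at)")
conn.execute("CREATE INDEX idx_history_user ON history (user_id)")


def also_viewed(conn, product_id, since, limit=5):
    """Products opened by users who viewed product_id at or after `since`."""
    return conn.execute(
        """
        SELECT other.product_id, COUNT(DISTINCT other.user_id) AS viewers
        FROM history AS seed
        JOIN history AS other ON other.user_id = seed.user_id
        WHERE seed.product_id = ?
          AND seed.viewed_at >= ?
          AND other.product_id <> seed.product_id
        GROUP BY other.product_id
        ORDER BY viewers DESC, other.product_id ASC
        LIMIT ?
        """,
        (product_id, since, limit),
    ).fetchall()
```

The result is a list of (product_id, viewers) rows, which is the part worth caching per product.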
I'm pretty sure that Amazon uses Association Rules for this.
The seminal paper:
http://dl.acm.org/citation.cfm?id=170072
The fast algorithm (FP-Growth):
http://link.springer.com/chapter/10.1007/3-540-47887-6_34#page-1
I haven't seen a PHP library, but there are libraries for Java and Python.
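To give a feel for what association rules measure, here is a small brute-force sketch in plain Python. It is not FP-Growth and not taken from either paper; it only counts pairs of products that appear in the same session and reports support (the share of sessions containing both) and confidence (the share of sessions with X that also contain Y). The function name pair_rules and the session data are made up for the example.

```python
from collections import Counter
from itertools import combinations


def pair_rules(sessions, min_support=0.0, min_confidence=0.0):
    """Return rules (x, y, support, confidence) meaning 'viewed x -> also viewed y'."""
    n = len(sessions)
    item_counts = Counter()
    pair_counts = Counter()
    for session in sessions:
        items = sorted(set(session))  # ignore repeat views within a session
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each pair yields two candidate rules, one per direction.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    # Strongest rules first; ties broken by support, then by product ids.
    rules.sort(key=lambda r: (-r[3], -r[2], r[0], r[1]))
    return rules


sessions = [[100, 101], [100, 101, 102], [100, 102], [101]]
rules = pair_rules(sessions)
```

Counting every pair this way gets expensive as the catalogue grows, which is the problem the faster algorithms are meant to address.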