Recommendation system experimentation

2022-12-12 19:35 问答作者：

I'm reading the book Programming Collective Intelligence and trying to apply what I've learned against the Northwind database. Though I'm not yet confident in my understanding of the algorithms presented, I'm starting to get an idea of the general concepts.

Using the Northwind database, I'm attempting to show a list of "customers who bought this also bought XYZ" using the following pseudo-logic:

Find other customers who also purchased my item
Find all other items purchased by those customers
Rank items based on purchase count
Return top N items from previous step

I'm working with the query below:

declare 
    @customerid nchar(5),
    @productid int;

set @开发者_如何学Ccustomerid = 'ALFKI';
set @productid = 59;

-- find other products from customers who
-- also purchased my productid

select top 10
    od.productid, c.categoryname, p.productname, p.unitsonorder, count(od.productid)
from
    [order details] od
        inner join orders o on o.orderid = od.orderid
        inner join products p on p.productid = od.productid
        inner join categories c on c.categoryid = p.categoryid
where
    o.customerid <> @customerid and
    od.productid <> @productid and 
    p.discontinued = 0
group by
    od.productid, c.categoryname, p.productname, p.unitsonorder
order by 5 desc,4 desc

I think my next step would be to break the query up so that I can filter based on recent purchases (as opposed to all historical purchases) and limit customer matching to N customers, rather than ALL customers who purchased my product. Can anyone provide any pointers? Am I headed in the right direction? Should I take a different direction entirely?

At this point, my goal is performance over accuracy, as I know I do not yet have the experience to apply the algorithms to their fullest benefit. I'm merely trying to apply the concept. Once I'm satisfied I understand it well enough, I intend to test this query against a larger database with more realistic customer data.

This is usually called "market basket" analysis or "affinity grouping" and it is not as simple as it sounds, mostly due to large number of combinations. First thing to consider is what is the minimum number of cases that would result in a positive detection. For example let's say we have 1M baskets and 10 people who bought product A also bought product B in the same basket. Is 10 cases enough to say that "people who bought A also bought B"? Due to this, using "recent" purchases is a bit tricky. The idea is to create a table of product pairs, but if we have N products, number of combinations is N*(N-1), so for a store of 1000 product, we would have 999000 combinations, so algorithm which prunes this is not simple.

One more thing to consider is the order of items and value of each one. For example customers buying bicycles may often buy (add) a "LED light". But, if one puts a LED light in a basket, should the system offer a bicycle in the list?

Considering that you are using SQL server, I would point you towards Analysis Services Data Mining which uses "MS Association Algorithm" for this purpose. By using same data, you can compare your results against a "commercial solution".

继续阅读：collective-intelligence recommendation-engine sql-server

Recommendation system experimentation

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？