MySQL: Join table of keywords but bring back only the most popular
I have a table of "Products". I also have a table of user labelled keywords for that product. I want to bring back the top keyword for each product based on how many there are.
The keyword table basically consists of the keyword, a primary key, and a foreign key linking it to the Products table.
I presume I have to join the keyword table in (something like below) but I don't know how to order开发者_C百科 it by the most popular.
Here is the SQL I have already - it currently just brings back any keyword as opposed to the top one.
SELECT product_name,keyword_keyword
FROM products
LEFT JOIN keywords ON keyword_pid = product_id
GROUP BY product_id
I am aware this could be done differently and probably more efficiently, but this is how my mind breaks it down:
select a.product_name, b.keyword_keyword, count(*) as keyword_count
into #temp1
from products a
join keywords b on a.product_id = b.keyword_pid
group by a.product_name, b.keyword_keyword
select x.product_name, x.keyword_keyword
from #temp1 x
where x.keyword_count = (select MAX(keyword_count) from #temp1
where product_name = x.product_name)
Here is the progression of SQLs (along with example results) I used to arrive at my proposed solution:
Here are the keyword counts:
SELECT k.*,
COUNT(k.keyword_keyword)
FROM keywords k
GROUP BY k.keyword_pid,
k.keyword_keyword
+------------+-------------+-----------------+--------------------------+
| keyword_id | keyword_pid | keyword_keyword | count(k.keyword_keyword) |
+------------+-------------+-----------------+--------------------------+
| 3 | 1 | red | 3 |
| 1 | 1 | widgety | 3 |
| 9 | 2 | curve | 1 |
| 10 | 2 | red | 2 |
| 6 | 2 | screwy | 3 |
| 12 | 3 | red | 1 |
| 7 | 3 | spike | 2 |
+------------+-------------+-----------------+--------------------------+
We need to find the maximum for each (keyword_pid,keyword_keyword)
pair.
There is a tried and true idiom for this:
SELECT t1.*,
t2.*
FROM (SELECT k.*,
COUNT(k.keyword_keyword) cnt
FROM keywords k
GROUP BY k.keyword_pid,
k.keyword_keyword) t1
LEFT JOIN (SELECT k.*,
COUNT(k.keyword_keyword) cnt
FROM keywords k
GROUP BY k.keyword_pid,
k.keyword_keyword) t2
ON t1.keyword_pid = t2.keyword_pid
AND t1.cnt < t2.cnt
Notice, above, that I repeated the same SELECT
twice. I'm assuming that MySQL caches the result of the first SELECT
, so the second should be very fast.
If I'm wrong, I hope someone will disabuse me of the belief.
+------------+-------------+-----------------+-----+------------+-------------+-----------------+------+
| keyword_id | keyword_pid | keyword_keyword | cnt | keyword_id | keyword_pid | keyword_keyword | cnt |
+------------+-------------+-----------------+-----+------------+-------------+-----------------+------+
| 3 | 1 | red | 3 | NULL | NULL | NULL | NULL |
| 1 | 1 | widgety | 3 | NULL | NULL | NULL | NULL |
| 9 | 2 | curve | 1 | 10 | 2 | red | 2 |
| 9 | 2 | curve | 1 | 6 | 2 | screwy | 3 |
| 10 | 2 | red | 2 | 6 | 2 | screwy | 3 |
| 6 | 2 | screwy | 3 | NULL | NULL | NULL | NULL |
| 12 | 3 | red | 1 | 7 | 3 | spike | 2 |
| 7 | 3 | spike | 2 | NULL | NULL | NULL | NULL |
+------------+-------------+-----------------+-----+------------+-------------+-----------------+------+
The rows where t2.cnt is NULL
are the rows that contain the maximum count for each (keyword_pid,keyword_keyword)
pair (That's part of the idiom for finding the maximum):
SELECT t1.*
FROM (SELECT k.*,
COUNT(k.keyword_keyword) cnt
FROM keywords k
GROUP BY k.keyword_pid,
k.keyword_keyword) t1
LEFT JOIN (SELECT k.*,
COUNT(k.keyword_keyword) cnt
FROM keywords k
GROUP BY k.keyword_pid,
k.keyword_keyword) t2
ON t1.keyword_pid = t2.keyword_pid
AND t1.cnt < t2.cnt
WHERE t2.cnt IS NULL
+------------+-------------+-----------------+-----+
| keyword_id | keyword_pid | keyword_keyword | cnt |
+------------+-------------+-----------------+-----+
| 3 | 1 | red | 3 |
| 1 | 1 | widgety | 3 |
| 6 | 2 | screwy | 3 |
| 7 | 3 | spike | 2 |
+------------+-------------+-----------------+-----+
The rest is relatively easy. First we join on the products table so we can see what product is associate with what keyword:
SELECT p.*,
t1.*
FROM (SELECT k.*,
COUNT(k.keyword_keyword) cnt
FROM keywords k
GROUP BY k.keyword_pid,
k.keyword_keyword) t1
LEFT JOIN (SELECT k.*,
COUNT(k.keyword_keyword) cnt
FROM keywords k
GROUP BY k.keyword_pid,
k.keyword_keyword) t2
ON t1.keyword_pid = t2.keyword_pid
AND t1.cnt < t2.cnt
LEFT JOIN product p
ON p.product_id = t1.keyword_pid
WHERE t2.cnt IS NULL
+------------+--------------+------------+-------------+-----------------+-----+
| product_id | product_name | keyword_id | keyword_pid | keyword_keyword | cnt |
+------------+--------------+------------+-------------+-----------------+-----+
| 1 | widget | 3 | 1 | red | 3 |
| 1 | widget | 1 | 1 | widgety | 3 |
| 2 | screw | 6 | 2 | screwy | 3 |
| 3 | nail | 7 | 3 | spike | 2 |
+------------+--------------+------------+-------------+-----------------+-----+
The above is the solution if you want to key ties.
If you want to get rid of ties (randomly), you can do so with another GROUP BY
:
SELECT p.*,
t1.*
FROM (SELECT k.*,
COUNT(k.keyword_keyword) cnt
FROM keywords k
GROUP BY k.keyword_pid,
k.keyword_keyword) t1
LEFT JOIN (SELECT k.*,
COUNT(k.keyword_keyword) cnt
FROM keywords k
GROUP BY k.keyword_pid,
k.keyword_keyword) t2
ON t1.keyword_pid = t2.keyword_pid
AND t1.cnt < t2.cnt
LEFT JOIN product p
ON p.product_id = t1.keyword_pid
WHERE t2.cnt IS NULL
GROUP BY p.product_id
+------------+--------------+------------+-------------+-----------------+-----+
| product_id | product_name | keyword_id | keyword_pid | keyword_keyword | cnt |
+------------+--------------+------------+-------------+-----------------+-----+
| 1 | widget | 3 | 1 | red | 3 |
| 2 | screw | 6 | 2 | screwy | 3 |
| 3 | nail | 7 | 3 | spike | 2 |
+------------+--------------+------------+-------------+-----------------+-----+
精彩评论