开发者

MySQL: Join table of keywords but bring back only the most popular

I have a table of "Products". I also have a table of user labelled keywords for that product. I want to bring back the top keyword for each product based on how many there are.

The keyword table basically consists of the keyword, a primary key, and a foreign key linking it to the Products table.

I presume I have to join the keyword table in (something like below) but I don't know how to order开发者_C百科 it by the most popular.

Here is the SQL I have already - it currently just brings back any keyword as opposed to the top one.

SELECT product_name,keyword_keyword 
FROM products 
LEFT JOIN keywords ON keyword_pid = product_id
GROUP BY product_id


I am aware this could be done differently and probably more efficiently, but this is how my mind breaks it down:

select a.product_name, b.keyword_keyword, count(*) as keyword_count 
into #temp1
from products a 
join keywords b on a.product_id = b.keyword_pid 
group by a.product_name, b.keyword_keyword

select x.product_name, x.keyword_keyword
from #temp1 x
where x.keyword_count = (select MAX(keyword_count) from #temp1 
                         where product_name = x.product_name)


Here is the progression of SQLs (along with example results) I used to arrive at my proposed solution:

Here are the keyword counts:

SELECT k.*,
       COUNT(k.keyword_keyword)
FROM   keywords k
GROUP  BY k.keyword_pid,
          k.keyword_keyword  

+------------+-------------+-----------------+--------------------------+
| keyword_id | keyword_pid | keyword_keyword | count(k.keyword_keyword) |
+------------+-------------+-----------------+--------------------------+
|          3 |           1 | red             |                        3 | 
|          1 |           1 | widgety         |                        3 | 
|          9 |           2 | curve           |                        1 | 
|         10 |           2 | red             |                        2 | 
|          6 |           2 | screwy          |                        3 | 
|         12 |           3 | red             |                        1 | 
|          7 |           3 | spike           |                        2 | 
+------------+-------------+-----------------+--------------------------+

We need to find the maximum for each (keyword_pid,keyword_keyword) pair. There is a tried and true idiom for this:

SELECT t1.*,
       t2.*
FROM   (SELECT k.*,
               COUNT(k.keyword_keyword) cnt
        FROM   keywords k
        GROUP  BY k.keyword_pid,
                  k.keyword_keyword) t1
       LEFT JOIN (SELECT k.*,
                         COUNT(k.keyword_keyword) cnt
                  FROM   keywords k
                  GROUP  BY k.keyword_pid,
                            k.keyword_keyword) t2
         ON t1.keyword_pid = t2.keyword_pid
            AND t1.cnt < t2.cnt  

Notice, above, that I repeated the same SELECT twice. I'm assuming that MySQL caches the result of the first SELECT, so the second should be very fast. If I'm wrong, I hope someone will disabuse me of the belief.

+------------+-------------+-----------------+-----+------------+-------------+-----------------+------+
| keyword_id | keyword_pid | keyword_keyword | cnt | keyword_id | keyword_pid | keyword_keyword | cnt  |
+------------+-------------+-----------------+-----+------------+-------------+-----------------+------+
|          3 |           1 | red             |   3 |       NULL |        NULL | NULL            | NULL | 
|          1 |           1 | widgety         |   3 |       NULL |        NULL | NULL            | NULL | 
|          9 |           2 | curve           |   1 |         10 |           2 | red             |    2 | 
|          9 |           2 | curve           |   1 |          6 |           2 | screwy          |    3 | 
|         10 |           2 | red             |   2 |          6 |           2 | screwy          |    3 | 
|          6 |           2 | screwy          |   3 |       NULL |        NULL | NULL            | NULL | 
|         12 |           3 | red             |   1 |          7 |           3 | spike           |    2 | 
|          7 |           3 | spike           |   2 |       NULL |        NULL | NULL            | NULL | 
+------------+-------------+-----------------+-----+------------+-------------+-----------------+------+

The rows where t2.cnt is NULL are the rows that contain the maximum count for each (keyword_pid,keyword_keyword) pair (That's part of the idiom for finding the maximum):

SELECT t1.*
FROM   (SELECT k.*,
               COUNT(k.keyword_keyword) cnt
        FROM   keywords k
        GROUP  BY k.keyword_pid,
                  k.keyword_keyword) t1
       LEFT JOIN (SELECT k.*,
                         COUNT(k.keyword_keyword) cnt
                  FROM   keywords k
                  GROUP  BY k.keyword_pid,
                            k.keyword_keyword) t2
         ON t1.keyword_pid = t2.keyword_pid
            AND t1.cnt < t2.cnt
WHERE  t2.cnt IS NULL  

+------------+-------------+-----------------+-----+
| keyword_id | keyword_pid | keyword_keyword | cnt |
+------------+-------------+-----------------+-----+
|          3 |           1 | red             |   3 | 
|          1 |           1 | widgety         |   3 | 
|          6 |           2 | screwy          |   3 | 
|          7 |           3 | spike           |   2 | 
+------------+-------------+-----------------+-----+

The rest is relatively easy. First we join on the products table so we can see what product is associate with what keyword:

SELECT p.*,
       t1.*
FROM   (SELECT k.*,
               COUNT(k.keyword_keyword) cnt
        FROM   keywords k
        GROUP  BY k.keyword_pid,
                  k.keyword_keyword) t1
       LEFT JOIN (SELECT k.*,
                         COUNT(k.keyword_keyword) cnt
                  FROM   keywords k
                  GROUP  BY k.keyword_pid,
                            k.keyword_keyword) t2
         ON t1.keyword_pid = t2.keyword_pid
            AND t1.cnt < t2.cnt
       LEFT JOIN product p
         ON p.product_id = t1.keyword_pid
WHERE  t2.cnt IS NULL  

+------------+--------------+------------+-------------+-----------------+-----+
| product_id | product_name | keyword_id | keyword_pid | keyword_keyword | cnt |
+------------+--------------+------------+-------------+-----------------+-----+
|          1 | widget       |          3 |           1 | red             |   3 | 
|          1 | widget       |          1 |           1 | widgety         |   3 | 
|          2 | screw        |          6 |           2 | screwy          |   3 | 
|          3 | nail         |          7 |           3 | spike           |   2 | 
+------------+--------------+------------+-------------+-----------------+-----+

The above is the solution if you want to key ties. If you want to get rid of ties (randomly), you can do so with another GROUP BY:

SELECT p.*,
       t1.*
FROM   (SELECT k.*,
               COUNT(k.keyword_keyword) cnt
        FROM   keywords k
        GROUP  BY k.keyword_pid,
                  k.keyword_keyword) t1
       LEFT JOIN (SELECT k.*,
                         COUNT(k.keyword_keyword) cnt
                  FROM   keywords k
                  GROUP  BY k.keyword_pid,
                            k.keyword_keyword) t2
         ON t1.keyword_pid = t2.keyword_pid
            AND t1.cnt < t2.cnt
       LEFT JOIN product p
         ON p.product_id = t1.keyword_pid
WHERE  t2.cnt IS NULL
GROUP  BY p.product_id  

+------------+--------------+------------+-------------+-----------------+-----+
| product_id | product_name | keyword_id | keyword_pid | keyword_keyword | cnt |
+------------+--------------+------------+-------------+-----------------+-----+
|          1 | widget       |          3 |           1 | red             |   3 | 
|          2 | screw        |          6 |           2 | screwy          |   3 | 
|          3 | nail         |          7 |           3 | spike           |   2 | 
+------------+--------------+------------+-------------+-----------------+-----+
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜