开发者

MySQL subquery to refer to field in parent query

I am building a query that performs some filtering on rating data.

Suppose I have a simple table called ratings like the following, storing data from an online rating tool:

+----------------+----------------+--------+
| page_title     | timestamp      | rating |
+----------------+----------------+--------+
| Abc            | 20110417092134 | 1      |
| Abc            | 20110418110831 | 2      |
| Def  开发者_开发技巧          | 20110417092205 | 3      |
+----------------+----------------+--------+

I need to extract pages with a high frequency of low values in the latest 10 ratings, and limit this query to pages that produced a volume of at least 20 ratings in the preceding week. This is the ridiculously long query I came up with:

SELECT a1.page_title, COUNT(*) AS rvol, AVG(a1.rating) AS theavg, 
(
     SELECT COUNT(*) FROM
     (
         SELECT * FROM ratings a2 WHERE a2.page_title = a1.page_title 
         AND DATE(timestamp) <= '2011-04-24' ORDER BY timestamp DESC LIMIT 10
     ) 
     AS latest WHERE rating >=1 AND rating <=2 ORDER BY timestamp DESC
)
AS lowest FROM ratings a1
WHERE DATE(a1.timestamp) <= "2011-04-24" AND DATE(a1.timestamp) >= "2011-04-17" 
GROUP BY a1.page_title HAVING COUNT(*) > 20

the top level query looks for pages with more than 20 ratings in the week terminating on 2011-04-24, the subquery is supposed to retrieve the number of ratings with values between [1,2] from the latest 10 ratings of each article from the top level query.

MySQL complains that a1.page_title in the WHERE clause of the subsubquery is an unknown column, I suspect this is because a1 is not defined as an alias in the second-level query, but only in the top-level query, but I am clueless how to fix this.

(edited)

I am adding as an explanation of my suspect above regarding cross-level referencing another query which works absolutely fine, note that here a1 is not defined in the subquery but it is in the immediate parent:

SELECT a1.page_title, COUNT(*) AS rvol, AVG(a1.rating) AS theavg, 
(
    SELECT COUNT(*) FROM ratings a2 WHERE DATE(timestamp) <= '2011-04-24'
    AND DATE(timestamp) >= '2011-04-17' AND rating >=1 
    AND rating <=2 AND a2.page_title = a1.page_title
) AS lowest FROM ratings a1 
WHERE DATE(a1.timestamp) <= '2011-04-17' AND DATE(a1.aa_timestamp) >= '2011-04-11' 
GROUP BY a1.page_title HAVING COUNT(*) > 20


I think you might consider joining two in line views it might make things eaiser.

SELECT * 
FROM   (SELECT COUNT(*), 
               a2.page_title 
        FROM   ratings a2 
        WHERE  DATE(timestamp) <= '2011-04-24' 
               AND DATE(timestamp) >= '2011-04-17' 
               AND rating >= 1 
               AND rating <= 2 

        GROUP  BY a2.page_title) current 
       JOIN 
        (SELECT a1.page_title, 
                    COUNT(*)       AS rvol, 
                    AVG(a1.rating) AS theavg 
             FROM   ratings a1 
             WHERE  DATE(a1.timestamp) <= '2011-04-17' 
                    AND DATE(a1.a_timestamp) >= '2011-04-11' 
             GROUP  BY a1.page_title 
             HAVING COUNT(*) > 20) morethan20 
         ON current .page_title = morethan20.page_title 


If all you have is this one simple table, I have no idea where you are pulling all of these other table names from, such as: a1, a2, ratings. I feel like either your SQL is quite a bit off, or your leaving out information.

The reason your having the error you do is because in your sub-sub-query you do not include a1 in your "FROM" statement... as so that table is not included, it cannot be referenced in your WHERE clause in that sub query.

SELECT * 
FROM
    (SELECT *
        FROM a1
        WHERE a1.timestamp <= (NOW()-604800)
            AND a1.timestamp >= (NOW()-1209600)
        GROUP BY a1.page_title
        HAVING COUNT(a1.page_title)>20)
    AS priorWeekCount
WHERE
    rating <= 2
ORDER BY timestamp DESC
LIMIT 10

as I dont have a full table to test this... I THINK this is what your looking for.. but it is untested, and knowing my coding habits, very rarely is what I type 100% perfect first time ;)


Your analysis of the error is correct: lowest is known in the subquery, a1 is not.

I think the logic is inside-out. The following probably isn't the best but the optimizer may be smart enough to combine the two subqueries in the outermost SELECT. (If it isn't, at the risk of readability you can introduce another level of subquery.)

SELECT r20plus.page_title,
 AVG((SELECT rating 
      FROM ratings r WHERE r.page_title=r20plus.page_title 
      ORDER BY timestamp DESC LIMIT 10) ) as av,
 SUM((SELECT CASE WHEN rating BETWEEN 1 AND 2 THEN 1 ELSE 0 END 
      FROM ratings r WHERE r.page_title=r20plus.page_title
      ORDER BY timestamp DESC LIMIT 10) ) as n_low,
FROM
(SELECT page_title FROM ratings  
WHERE DATE(a1.timestamp) <= "2011-04-24" AND DATE(a1.timestamp) >= "2011-04-17"
GROUP BY page_title
HAVING COUNT(rating) >= 20) AS r20plus;
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜