MySQL subquery to refer to field in parent query
I am building a query that performs some filtering on rating data.
Suppose I have a simple table called ratings
like the following, storing data from an online rating tool:
+----------------+----------------+--------+ | page_title | timestamp | rating | +----------------+----------------+--------+ | Abc | 20110417092134 | 1 | | Abc | 20110418110831 | 2 | | Def 开发者_开发技巧 | 20110417092205 | 3 | +----------------+----------------+--------+
I need to extract pages with a high frequency of low values in the latest 10 ratings, and limit this query to pages that produced a volume of at least 20 ratings in the preceding week. This is the ridiculously long query I came up with:
SELECT a1.page_title, COUNT(*) AS rvol, AVG(a1.rating) AS theavg,
(
SELECT COUNT(*) FROM
(
SELECT * FROM ratings a2 WHERE a2.page_title = a1.page_title
AND DATE(timestamp) <= '2011-04-24' ORDER BY timestamp DESC LIMIT 10
)
AS latest WHERE rating >=1 AND rating <=2 ORDER BY timestamp DESC
)
AS lowest FROM ratings a1
WHERE DATE(a1.timestamp) <= "2011-04-24" AND DATE(a1.timestamp) >= "2011-04-17"
GROUP BY a1.page_title HAVING COUNT(*) > 20
the top level query looks for pages with more than 20 ratings in the week terminating on 2011-04-24, the subquery is supposed to retrieve the number of ratings with values between [1,2] from the latest 10 ratings of each article from the top level query.
MySQL complains that a1.page_title in the WHERE clause of the subsubquery is an unknown column, I suspect this is because a1 is not defined as an alias in the second-level query, but only in the top-level query, but I am clueless how to fix this.
(edited)
I am adding as an explanation of my suspect above regarding cross-level referencing another query which works absolutely fine, note that here a1 is not defined in the subquery but it is in the immediate parent:
SELECT a1.page_title, COUNT(*) AS rvol, AVG(a1.rating) AS theavg,
(
SELECT COUNT(*) FROM ratings a2 WHERE DATE(timestamp) <= '2011-04-24'
AND DATE(timestamp) >= '2011-04-17' AND rating >=1
AND rating <=2 AND a2.page_title = a1.page_title
) AS lowest FROM ratings a1
WHERE DATE(a1.timestamp) <= '2011-04-17' AND DATE(a1.aa_timestamp) >= '2011-04-11'
GROUP BY a1.page_title HAVING COUNT(*) > 20
I think you might consider joining two in line views it might make things eaiser.
SELECT *
FROM (SELECT COUNT(*),
a2.page_title
FROM ratings a2
WHERE DATE(timestamp) <= '2011-04-24'
AND DATE(timestamp) >= '2011-04-17'
AND rating >= 1
AND rating <= 2
GROUP BY a2.page_title) current
JOIN
(SELECT a1.page_title,
COUNT(*) AS rvol,
AVG(a1.rating) AS theavg
FROM ratings a1
WHERE DATE(a1.timestamp) <= '2011-04-17'
AND DATE(a1.a_timestamp) >= '2011-04-11'
GROUP BY a1.page_title
HAVING COUNT(*) > 20) morethan20
ON current .page_title = morethan20.page_title
If all you have is this one simple table, I have no idea where you are pulling all of these other table names from, such as: a1, a2, ratings. I feel like either your SQL is quite a bit off, or your leaving out information.
The reason your having the error you do is because in your sub-sub-query you do not include a1 in your "FROM" statement... as so that table is not included, it cannot be referenced in your WHERE clause in that sub query.
SELECT *
FROM
(SELECT *
FROM a1
WHERE a1.timestamp <= (NOW()-604800)
AND a1.timestamp >= (NOW()-1209600)
GROUP BY a1.page_title
HAVING COUNT(a1.page_title)>20)
AS priorWeekCount
WHERE
rating <= 2
ORDER BY timestamp DESC
LIMIT 10
as I dont have a full table to test this... I THINK this is what your looking for.. but it is untested, and knowing my coding habits, very rarely is what I type 100% perfect first time ;)
Your analysis of the error is correct: lowest
is known in the subquery, a1 is not.
I think the logic is inside-out. The following probably isn't the best but the optimizer may be smart enough to combine the two subqueries in the outermost SELECT. (If it isn't, at the risk of readability you can introduce another level of subquery.)
SELECT r20plus.page_title,
AVG((SELECT rating
FROM ratings r WHERE r.page_title=r20plus.page_title
ORDER BY timestamp DESC LIMIT 10) ) as av,
SUM((SELECT CASE WHEN rating BETWEEN 1 AND 2 THEN 1 ELSE 0 END
FROM ratings r WHERE r.page_title=r20plus.page_title
ORDER BY timestamp DESC LIMIT 10) ) as n_low,
FROM
(SELECT page_title FROM ratings
WHERE DATE(a1.timestamp) <= "2011-04-24" AND DATE(a1.timestamp) >= "2011-04-17"
GROUP BY page_title
HAVING COUNT(rating) >= 20) AS r20plus;
精彩评论