Mysql limit rows per group weird results
I wanted to get the latest 4 dates for each symbolid
. I adapted the code here as follows:
set @num := 0, @symbolid := '';
select symbolid, date,
@num := if(@symbolid = symbolid, @num + 1, 1) as row_number,
@symbolid := symbolid as dummy
from projections
group by symbolid, date desc
having row_number < 5
and get the following results:
symbolid date row_number dummy
1 '2011-09-01 00:00:00' 1 1
1 '2011-08-31 00:00:00' 3 1
1 '2011-08-30 00:00:00' 5 1
2 '2011-09-01 00:00:00' 1 2
2 '2011-08-31 00:00:00' 3 2
2 '2011-08-30 00:00:00' 5 2
3 '2011-09-01 00:00:00' 1 3
3 '2011-08-31 00:00:00' 3 3
3 '2011-08-30 00:00:00' 5 3
4 '2011-09-01 00:00:00' 1 4
...
The obvious question is, why did I only get 3 rows per symbolid
, and why are they numbered 1,3,5? A few details:
- I tried both forcing an index and not (as seen here), and got the same results both ways.
- The dates are correct, i.e., the listing correctly shows the top 3 dates per
symbolid
, but the row_number value is off - When I don't use the "having" statement, the row numbers are correct, i.e., the most recent date is 1, the next most recent is 2, etc
Obviously the row_开发者_如何学Gonumber
computed field is being affected by the "having" clause, but I don't know how to fix it.
I realize that I could just change the "having" to "having row_number < 7" (6 gives the same as 5), but it's very ugly and would like to know what to do to make it "behave".
I'm not 100% sure why it behaves this way (maybe it's because logically SELECT
is processed prior to ORDER BY
), but it should work as expected:
SELECT *
FROM
(
select symbolid, date,
@num := if(@symbolid = symbolid, @num + 1, 1) as row_number,
@symbolid := symbolid as dummy
from projections
INNER JOIN (SELECT @symbolid:=0)c
INNER JOIN (SELECT @num:=0)d
group by symbolid, date desc
) a
WHERE row_number < 5
The user defined variables does not work well, (refer here)
As a general rule, you should never assign a value to a user variable and read the value within the same statement. You might get the results you expect, but this is not guaranteed. The order of evaluation for expressions involving user variables is undefined and may change based on the elements contained within a given statement; in addition, this order is not guaranteed to be the same between releases of the MySQL Server. In SELECT @a, @a:=@a+1, ..., you might think that MySQL will evaluate @a first and then do an assignment second. However, changing the statement (for example, by adding a GROUP BY, HAVING, or ORDER BY clause) may cause MySQL to select an execution plan with a different order of evaluation.
Here is my proposal
select symbolid,
substring_index(group_concat(date order by date desc), ',', 4) as last_4_dates
from projections
group by symbolid
The drawback of this approach is it will group collapse the date,
and you need to explode before you can actually use it.
Final code:
set @num := 0, @symbolid := '';
select d.* from
(
select symbolid, date,
@num := if(@symbolid = symbolid, @num + 1, 1) as row_number,
@symbolid := symbolid as dummy
from projections
order by symbolid, date desc
) d
where d.row_number < 5
精彩评论