
Finding the highest n values of each group in MySQL

I have some data formatted like this:

Lane         Series
1            680
1            685
1            688
2            666
2            425
2            775
...

I'd like to grab the highest n series per lane (say 2 for the sake of this example, but it could be many more than that).

So the output should be:

Lane         Series
1            688
1            685
2            775
2            666

Getting the highest series per lane is easy, but I can't seem to find a way to get the highest 2 results.

I use a MAX aggregate function with a GROUP BY to get the MAX, but there's no "TOP N" function as in SQL Server and using ORDER BY... LIMIT only returns the highest N results overall, not per lane.

Since I query the database from a Java application I wrote myself, which also chooses N, I could loop over the lanes and run a separate ORDER BY ... LIMIT query for each one, but I want to learn how to do it in MySQL itself.


See my other answer for the MySQL-only, but very fast, solution.

This solution lets you specify any number of top rows per lane and doesn't use any MySQL "funky" syntax - it should run on most databases.

select lane, series
from lane_series ls
group by lane, series
having (
    select count(*) 
    from lane_series
    where lane = ls.lane
    and series > ls.series) < 2 -- Here's where you specify the number of top rows
order by lane, series desc;

Test output:

create table lane_series (lane int, series int);

insert into lane_series values 
(1, 680),
(1, 685),
(1, 688),
(2, 666),
(2, 425),
(2, 775);

select lane, series
from lane_series ls
group by lane, series
having (select count(*) from lane_series where lane = ls.lane and series > ls.series) < 2
order by lane, series desc;

+------+--------+
| lane | series |
+------+--------+
|    1 |    688 |
|    1 |    685 |
|    2 |    775 |
|    2 |    666 |
+------+--------+
4 rows in set (0.00 sec)
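
As a rough check of the "should run on most databases" claim, here is a minimal sketch that runs the same query against an in-memory SQLite database through Python's built-in sqlite3 module. The helper name top_n_per_lane and the bound parameter for N are my own additions for illustration, not part of the answer above; the SQL itself is unchanged apart from that parameter.

```python
import sqlite3


def top_n_per_lane(rows, n):
    """Return the top-n (lane, series) rows per lane (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table lane_series (lane int, series int)")
    conn.executemany("insert into lane_series values (?, ?)", rows)
    # Same correlated subquery as above; N is passed as a bound parameter.
    query = """
        select lane, series
        from lane_series ls
        group by lane, series
        having (
            select count(*)
            from lane_series
            where lane = ls.lane
            and series > ls.series) < ?
        order by lane, series desc
    """
    result = conn.execute(query, (n,)).fetchall()
    conn.close()
    return result


sample = [(1, 680), (1, 685), (1, 688), (2, 666), (2, 425), (2, 775)]
print(top_n_per_lane(sample, 2))
```

With the sample data this prints the same four rows as the MySQL output above. Note that the subquery is evaluated once per group, so this form tends to slow down on large tables without an index on (lane, series).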


This solution is the fastest for MySQL and will work with very large tables, but it uses "funky" MySQL features, so wouldn't be of use for other database flavours.

(Edited to sort before applying logic)

set @count:=-1, @lane:=0; 
select lane, series
from (select lane, series from lane_series order by lane, series desc) x
where if(lane != @lane, @count:=-1, 0) is not null
and if(lane != @lane, @lane:=lane, lane) is not null
and (@count:=@count+1) < 2; -- Specify the number of rows to keep per group here

To make this query faster still, define an index on lane and series: CREATE INDEX lane_series_idx on lane_series(lane, series); MySQL can then answer it with a (super fast) index-only scan, so the table's other columns (text or otherwise) don't affect it.

Good points of this query are:

  1. It requires only one table pass (albeit sorted)
  2. It handles ties at any level. For example, if there's a tie for 2nd, only one of the tied rows will be displayed, i.e. the row count per group is absolute and never exceeded

Here's the test output:

create table lane_series (lane int, series int);

insert into lane_series values (1, 680),(1, 685),(1, 688),(2, 666),(2, 425),(2, 775);

-- Execute above query:

+------+--------+
| lane | series |
+------+--------+
|    1 |    688 |
|    1 |    685 |
|    2 |    775 |
|    2 |    666 |
+------+--------+
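
The logic behind the user-variable query may be easier to follow outside SQL. The sketch below is only an illustration in Python of the same idea (sort by lane and descending series, reset a counter whenever the lane changes, keep a row while the counter is below N); the function name top_n_single_pass is hypothetical and it is not how MySQL executes the query.

```python
def top_n_single_pass(rows, n):
    """Keep at most n rows per lane in one pass over sorted data."""
    # Equivalent of: order by lane, series desc
    ordered = sorted(rows, key=lambda r: (r[0], -r[1]))
    result = []
    current_lane = None  # plays the role of @lane
    count = 0            # plays the role of @count
    for lane, series in ordered:
        if lane != current_lane:
            # New group: remember the lane and restart the counter.
            current_lane = lane
            count = 0
        if count < n:
            result.append((lane, series))
        count += 1
    return result


sample = [(1, 680), (1, 685), (1, 688), (2, 666), (2, 425), (2, 775)]
print(top_n_single_pass(sample, 2))
```

Because the counter only ever counts rows, a tie for 2nd place still yields exactly N rows per lane, which matches the second point in the list above.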


This will work for the top 2 per lane, if you know you'll never have ties for first place:

SELECT lane,MAX(series)
FROM scores
GROUP BY lane
UNION 
SELECT s.lane,MAX(s.series)
FROM scores AS s
JOIN (
    SELECT lane,MAX(series) AS series
    FROM scores
    GROUP BY lane
) AS x ON (x.lane = s.lane)
WHERE s.series <> x.series
GROUP BY s.lane;
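
To see what the tie caveat means in practice, here is a small sketch that runs the query against an in-memory SQLite database via Python's sqlite3 module. The helper name top_two_union and the tied sample data are assumptions made up for this illustration. The rows are sorted in Python because the UNION query has no ORDER BY.

```python
import sqlite3


def top_two_union(rows):
    """Run the UNION query on the given (lane, series) rows (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table scores (lane int, series int)")
    conn.executemany("insert into scores values (?, ?)", rows)
    query = """
        SELECT lane, MAX(series)
        FROM scores
        GROUP BY lane
        UNION
        SELECT s.lane, MAX(s.series)
        FROM scores AS s
        JOIN (
            SELECT lane, MAX(series) AS series
            FROM scores
            GROUP BY lane
        ) AS x ON (x.lane = s.lane)
        WHERE s.series <> x.series
        GROUP BY s.lane
    """
    fetched = conn.execute(query).fetchall()
    conn.close()
    # Sort by lane, then by series descending, for a stable display order.
    return sorted(fetched, key=lambda r: (r[0], -r[1]))


print(top_two_union([(1, 680), (1, 685), (1, 688), (2, 666), (2, 425), (2, 775)]))
print(top_two_union([(1, 688), (1, 688), (1, 685)]))  # tie for first place
```

With the tied data the result is 688 and 685 rather than the two 688 rows, since the second half of the query excludes every row equal to the maximum. Whether that is acceptable depends on how you want ties counted.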


I think @Bohemian's generic answer can also be written as a join rather than a subquery, though it probably doesn't make much difference:

select ls1.lane, ls1.series
from lane_series ls1 left join lane_series ls2
    on ls1.lane = ls2.lane and ls1.series < ls2.series -- comparison belongs in the join, so top rows (no higher match) survive
group by ls1.lane, ls1.series
having count(ls2.series) < 2 -- Here's where you specify the number of top rows
order by ls1.lane, ls1.series desc;