How do I use "group by" with three columns of data?
My query currently is:
SELECT x, MAX(z) AS mz, y FROM my_table GROUP BY x
The columns x and mz are returned as expected, but 开发者_开发问答the last column, y, does not match up with the other two. In other words, I want the "y" column to match the mz column just like the x column currently does. How do I pull that off?
UPDATE: Sorry, the question wasn't very clear. I want to perform the following query:
SELECT * FROM (SELECT x, MAX(z) AS mz FROM my_table GROUP BY x) a RIGHT JOIN (SELECT y, MAX(z) AS mz FROM my_table GROUP BY y) b ON a.mz = b.mz
without having to use 3 SELECT statements (Perhaps that's not a big deal, but it seems like an inefficient query to me. But I'm pretty new at sql queries, so I dunno.)
UPDATE #2: Lets say my table looks like this:
-------------------
| x | y | z |
-------------------
| 45 | h | 3 |
| 23 | c | 5 |
| 45 | e | 9 |
| 23 | b | 12 |
| 45 | x | 36 |
| 33 | s | 44 |
| 33 | p | 78 |
-------------------
I want to return the following:
-------------------
| x | y | z |
-------------------
| 23 | b | 12 |
| 45 | x | 36 |
| 33 | p | 78 |
-------------------
You could do
select s.x, s.mz, stuff.y
from (select x, max(z) as mz from stuff group by x) s
left join stuff on stuff.x = s.x and stuff.z = s.mz;
If I understand you right, you are looking to get the max of Z for all combinations of x and y.
If true, your statement should be:
SELECT x, y, MAX(z) AS mz FROM my_table GROUP BY x,y
I think your question is still a little hazy. I think what you are saying is this:
For your table, you have some values (X, Y and Z). For every X you can have multiple Y's and Z's, but for every X, you want to get back the exact value for Y that corresponds to the highest Z for that X.
Let's use an example. Let's pretend this was a table that kept track of number of days an employee was sick by year:
Employee, Year, Days Sick John, 1999, 1 John, 2000, 3 John, 2001, 8
And you wanted to know, for every employee, which year they were sick the most in.
Select Employee, Year, DaysSick
From SickDays SD
Join (Select Employee, MAX(DaysSick) DaysSick
From SickDays
Group By Employee) MSD ON SD.Employee = MSD.Employee
And SD.DaysSick = MSD.DaysSick
This eliminates one of your selects. There may be a way to get the same data using something like an Analytic function in Oracle, however the performance wont' be much better. At a minimum, you need a single pass to just figure out the MAX and a second fetch (hopefully on indexed columns) to retrieve the full dataset.
I understand you want to find the pairs (x,y) that shares the same their max(z). Its necesary to do diferent selects to find the max(z) for each one.... but maybe there's some hack or weird idea to do it.
Anyway doing your way is not ugly, because you are full-scanning the table two times, and next joining the partial results (with a bit of luck it will occupy not so much). So its time will be linearly proportional to the table (if there are not so much different values for x and y):
total cost = cost of grouping for X + cost of grouping for Y + a not so high cost if x and y are discrete.
And you ARE looking for max's so you necesarily have to fullscan the table, so it's linear time at lest.
That's my point: you're not so far from the optimum :)
The result of selecting "y" in your example with the group by is not defined. MySQL will return the first row it scans, but other databases may do something entirely different. If you want to get the row that contains the max(z), it must be done in a sub-query or a join. http://dev.mysql.com/doc/refman/5.0/en/example-maximum-column-group-row.html
select * from f f1 where z=(select max(z) from f f2 where f1.x=f2.x);
+------+------+------+
| x | y | z |
+------+------+------+
| 23 | b | 12 |
| 45 | x | 36 |
| 33 | p | 78 |
+------+------+------+
select * from f;
+------+------+------+
| x | y | z |
+------+------+------+
| 45 | h | 3 |
| 23 | c | 5 |
| 45 | e | 9 |
| 23 | b | 12 |
| 45 | x | 36 |
| 33 | s | 44 |
| 33 | p | 78 |
+------+------+------+
精彩评论