Returning corresponding columns of a MAX function in MySQL
I have a table that contains the scores bowled by players in a bowling center. Each row has some data about which player bowled the game, which league it was played in, the date, the score of a single game, the lane number, etc.
What I'm trying to do is get who played (and in which league and on what date... Basically the whole row) the best series (three games) on every single lane.
What I have so far is
SELECT PlayerID, LaneNumber, MAX(Series)
FROM (SELECT Season, LeagueName, LaneNumber, WeekNumber, PlayerID, Date, SUM(Score) AS Series
FROM Scores
GROUP BY Season, LeagueName, WeekNumber, PlayerID)
GROUP BY LaneNumber
This works, as in I get the best three games for every single lane, which is actually what I want, but the other field containing the PlayerID isn't actually correct.
In my table, the best score on lane number 24 (gotten from the SUM(Score) and GROUP BY 开发者_JAVA技巧Season, LeagueName, WeekNumber, PlayerID) is 848 and was played by the player that has PlayerID 36.
What I get instead is Lane 24 with 848 (which is correct), but the PlayedID returned is 3166. The same thing happens on every single lane. (As in, I get PlayerIDs that are plain out wrong. And if I had other columns in the first select, they're also wrong)
You are violating the semantics of GROUP BY
.
When using GROUP BY
, it's only meaningful to SELECT
columns that you have grouped by (e.g. LaneNumber
) and aggregate functions of the other columns (e.g. MAX(Series)
). It is not meaningful to select anything else (in your case, PlayerID
) because you don't specify which player ID you want among those that share the same LaneNumber
.
Sadly, MySql will by default let you do this without reporting an error, and it will return any value it chooses for the offending column. In your case, this means you are getting back a player ID "randomly" chosen among those that are included in the specified grouping.
You are also doing this in the inner query, where you select LaneNumber
, WeekNumber
and Date
.
Solution
The query needs to be rewritten, but first you need to carefully specify exactly which results you want to get. Do you want the best player and relevant data for each series (and any lane)? For each series and lane separately? The answer to this question will dictate what you need to GROUP BY
, and by extension what the query will look like.
As noted by @Jon, you need to remove those elements NOT applicable to specific person. Then, as @Ord had the closest sample, it would be best to pre-query the results into a separate table (not temporary as MySQL will choke on it trying to query from itself in a self-join in the second query).
So, to me (having been a league bowler some years ago), and your content spanning ALL leagues, there would never be two different leagues on the same lane at the same time, however, for a full evening, you could have different leagues starting different times... 6-8:30, 8:45-11 for example... so grouping by the league and date would work. However, you DO need the player as part of the group by to get their respective SUM() values.
To help clarify the answers, let assume I have the following data. This data will represent only a single lane, one week, one season, but two leagues and 3 players per league (for sole purpose of showing results and limiting content here)
League Player Score
L1 1 223
L1 1 218
L1 1 204
L1 2 187
L1 2 201
L1 2 189
L1 3 148
L1 3 152
L1 3 158
L2 4 189
L2 4 195
L2 4 192
L2 5 182
L2 5 199
L2 5 209
L2 6 228
L2 6 234
L2 6 218
CREATE TABLE SeriesScores2
SELECT
Season,
LeagueName,
LaneNumber,
WeekNumber,
PlayerID,
SUM(Score) AS Series
FROM
Scores
GROUP BY
Season,
LeagueName,
LaneNumber,
WeekNumber,
PlayerID;
The first query (above) would create will create the series for all players all weeks, all leagues, etc.. Assume now I've added in the common season, lane, week too
Season League Lane Week Player Series
1 L1 1 1 1 645
1 L1 1 1 2 577
1 L1 1 1 3 458
1 L2 1 1 4 576
1 L2 1 1 5 590
1 L2 1 1 6 680
This gives us the precursor to determining the max(), otherwise we'd have to duplicate the query inside itself and at the outer level making it more complicated than this pre-aggregation.
Now, the above permanent table (can be deleted AFTER getting results), query the FIRST (PreQuery) for the maximum score PER LEAGUE PER LANE... Ex:, its common that a men's league will typically have higher series scores than women... similar with different age groups. So, Men's league Lane 1 highest score and Women's League Lane 1 highest score, etc.. Highest score typically identified by the single week out of the entire season, not highest series per lane every week.
Now, PreQuery "ss" alias is just on the season, league, lane and maximum series. Once THAT is known, self-join to the series score to pull in WHO DID that highest score on said lane and pull the who and what week it occurred
select
ss.season,
ss.leaguename,
ss.lanenumber,
ss.highestSeries,
ss2.PlayerID,
ss2.WeekNumber
from
( select season, leaguename, lanenumber, max( series ) highestSeries
from SeriesScores2
group by season, leaguename, lanenumber ) ss
join SeriesScores2 ss2
on ss.Season = ss2.Season
and ss.LeagueName = ss2.LeagueName
and ss.LaneNumber = ss2.LaneNumber
and ss.HighestSeries = ss2.Series
Now, from the above query... lets break it down. If we take the inner "ss" prequery
( select season, leaguename, lanenumber, max( series ) highestSeries
from SeriesScores2
group by season, leaguename, lanenumber ) ss
We will get the highest scores per league (ex: Men's league vs Women's league on same week, same night, same lane and we find (below), just by max, but don't have the WHO or what week, just the highest series bowled regardless of week or person. So THIS becomes the basis of the JOIN back to the pre-aggregated table "SeriesScores2", yet here, we have the highest series score to ensure we find the correct person
Season League Lane HighestSeries
1 L1 1 645
1 L2 1 680
To refresh preaggregation
Season League Lane Week Player Series
1 L1 1 1 1 645 <-- Join finds THIS entry League 1
1 L1 1 1 2 577
1 L1 1 1 3 458
1 L2 1 1 4 576
1 L2 1 1 5 590
1 L2 1 1 6 680 <-- Join finds THIS entry League 2
So, my original queries did work as I tested them before posting. I don't know what hiccup you had on yours unless column name not correct or something. As with respect to the "Date" column, I didn't particularly care because you had the week number available which would correspond to the week of bowling and would be a 1:1 relationship to a date anyhow. The date column could have been added to the pre-aggregation SeriesScores2 and pull along when getting the person's ID and week. (unless a league bowls on multiple nights in the same week, THEN you would need the explicit date).
Hope this clarifies your questions / comments.
Look here: http://dev.mysql.com/doc/refman/5.0/en/example-maximum-column-group-row.html
It might get messy trying to do it all in one query, but basically, you want to generate your series data as you did:
SELECT Season, LeagueName, LaneNumber, WeekNumber, PlayerID, Date, SUM(Score) AS Series
FROM Scores
GROUP BY Season, LeagueName, WeekNumber, PlayerID
Then, instead of getting max series values from this table, you will want to add a clause: WHERE Series=
and then to get the right value, you need to do another select, where you get the max(Series) where LaneNumber is the same in both tables. I would have coded it for you, but I am not confident enough in my MySQL abilities!
Okay, attempting to actually write the MySQL code I was thinking of (I couldn't resist...):
CREATE TEMPORARY TABLE SeriesScores
SELECT Season, LeagueName, LaneNumber, WeekNumber, PlayerID, SUM(Score) AS Series
FROM Scores
GROUP BY Season, LeagueName, WeekNumber, PlayerID;
This bit just gets scores for each series, as you specified in your own MySQL code. Only difference is that I am not selecting Date, because since we are not grouping by it, its value will be random. Then:
SELECT PlayerID, LaneNumber, Series
FROM SeriesScores s1
WHERE Series=(SELECT MAX(s2.Series)
FROM SeriesScores s2
WHERE s1.LaneNumber = s2.LaneNumber);
This bit just selects what you need from SeriesScores, and only considers rows where the series score is the max for that lane.
Does that work for you?
精彩评论