开发者

Returning corresponding columns of a MAX function in MySQL

I have a table that contains the scores bowled by players in a bowling center. Each row has some data about which player bowled the game, which league it was played in, the date, the score of a single game, the lane number, etc.

What I'm trying to do is get who played (and in which league and on what date... Basically the whole row) the best series (three games) on every single lane.

What I have so far is

SELECT PlayerID, LaneNumber, MAX(Series)
  FROM (SELECT Season, LeagueName, LaneNumber, WeekNumber, PlayerID, Date, SUM(Score) AS Series 
          FROM Scores
      GROUP BY Season, LeagueName, WeekNumber, PlayerID)
GROUP BY LaneNumber

This works, as in I get the best three games for every single lane, which is actually what I want, but the other field containing the PlayerID isn't actually correct.

In my table, the best score on lane number 24 (gotten from the SUM(Score) and GROUP BY 开发者_JAVA技巧Season, LeagueName, WeekNumber, PlayerID) is 848 and was played by the player that has PlayerID 36.

What I get instead is Lane 24 with 848 (which is correct), but the PlayedID returned is 3166. The same thing happens on every single lane. (As in, I get PlayerIDs that are plain out wrong. And if I had other columns in the first select, they're also wrong)


You are violating the semantics of GROUP BY.

When using GROUP BY, it's only meaningful to SELECT columns that you have grouped by (e.g. LaneNumber) and aggregate functions of the other columns (e.g. MAX(Series)). It is not meaningful to select anything else (in your case, PlayerID) because you don't specify which player ID you want among those that share the same LaneNumber.

Sadly, MySql will by default let you do this without reporting an error, and it will return any value it chooses for the offending column. In your case, this means you are getting back a player ID "randomly" chosen among those that are included in the specified grouping.

You are also doing this in the inner query, where you select LaneNumber, WeekNumber and Date.

Solution

The query needs to be rewritten, but first you need to carefully specify exactly which results you want to get. Do you want the best player and relevant data for each series (and any lane)? For each series and lane separately? The answer to this question will dictate what you need to GROUP BY, and by extension what the query will look like.


As noted by @Jon, you need to remove those elements NOT applicable to specific person. Then, as @Ord had the closest sample, it would be best to pre-query the results into a separate table (not temporary as MySQL will choke on it trying to query from itself in a self-join in the second query).

So, to me (having been a league bowler some years ago), and your content spanning ALL leagues, there would never be two different leagues on the same lane at the same time, however, for a full evening, you could have different leagues starting different times... 6-8:30, 8:45-11 for example... so grouping by the league and date would work. However, you DO need the player as part of the group by to get their respective SUM() values.

To help clarify the answers, let assume I have the following data. This data will represent only a single lane, one week, one season, but two leagues and 3 players per league (for sole purpose of showing results and limiting content here)

League   Player   Score
L1       1        223
L1       1        218
L1       1        204
L1       2        187
L1       2        201
L1       2        189
L1       3        148
L1       3        152
L1       3        158

L2       4        189
L2       4        195
L2       4        192
L2       5        182
L2       5        199
L2       5        209
L2       6        228
L2       6        234
L2       6        218


CREATE TABLE SeriesScores2
SELECT 
      Season, 
      LeagueName, 
      LaneNumber, 
      WeekNumber, 
      PlayerID, 
      SUM(Score) AS Series 
   FROM 
      Scores
   GROUP BY 
      Season, 
      LeagueName, 
      LaneNumber, 
      WeekNumber, 
      PlayerID;

The first query (above) would create will create the series for all players all weeks, all leagues, etc.. Assume now I've added in the common season, lane, week too

Season   League   Lane   Week   Player   Series
1        L1       1      1      1        645
1        L1       1      1      2        577
1        L1       1      1      3        458
1        L2       1      1      4        576
1        L2       1      1      5        590
1        L2       1      1      6        680

This gives us the precursor to determining the max(), otherwise we'd have to duplicate the query inside itself and at the outer level making it more complicated than this pre-aggregation.

Now, the above permanent table (can be deleted AFTER getting results), query the FIRST (PreQuery) for the maximum score PER LEAGUE PER LANE... Ex:, its common that a men's league will typically have higher series scores than women... similar with different age groups. So, Men's league Lane 1 highest score and Women's League Lane 1 highest score, etc.. Highest score typically identified by the single week out of the entire season, not highest series per lane every week.

Now, PreQuery "ss" alias is just on the season, league, lane and maximum series. Once THAT is known, self-join to the series score to pull in WHO DID that highest score on said lane and pull the who and what week it occurred

select
      ss.season, 
      ss.leaguename, 
      ss.lanenumber, 
      ss.highestSeries, 
      ss2.PlayerID, 
      ss2.WeekNumber
   from
      ( select season, leaguename, lanenumber, max( series ) highestSeries
           from SeriesScores2
           group by season, leaguename, lanenumber ) ss
      join SeriesScores2 ss2
         on ss.Season = ss2.Season
        and ss.LeagueName = ss2.LeagueName
        and ss.LaneNumber = ss2.LaneNumber
        and ss.HighestSeries = ss2.Series

Now, from the above query... lets break it down. If we take the inner "ss" prequery

  ( select season, leaguename, lanenumber, max( series ) highestSeries
       from SeriesScores2
       group by season, leaguename, lanenumber ) ss

We will get the highest scores per league (ex: Men's league vs Women's league on same week, same night, same lane and we find (below), just by max, but don't have the WHO or what week, just the highest series bowled regardless of week or person. So THIS becomes the basis of the JOIN back to the pre-aggregated table "SeriesScores2", yet here, we have the highest series score to ensure we find the correct person

Season  League   Lane   HighestSeries
1       L1       1      645
1       L2       1      680

To refresh preaggregation
Season   League   Lane   Week   Player   Series
1        L1       1      1      1        645    <-- Join finds THIS entry League 1
1        L1       1      1      2        577
1        L1       1      1      3        458
1        L2       1      1      4        576
1        L2       1      1      5        590
1        L2       1      1      6        680    <-- Join finds THIS entry League 2

So, my original queries did work as I tested them before posting. I don't know what hiccup you had on yours unless column name not correct or something. As with respect to the "Date" column, I didn't particularly care because you had the week number available which would correspond to the week of bowling and would be a 1:1 relationship to a date anyhow. The date column could have been added to the pre-aggregation SeriesScores2 and pull along when getting the person's ID and week. (unless a league bowls on multiple nights in the same week, THEN you would need the explicit date).

Hope this clarifies your questions / comments.


Look here: http://dev.mysql.com/doc/refman/5.0/en/example-maximum-column-group-row.html

It might get messy trying to do it all in one query, but basically, you want to generate your series data as you did:

SELECT Season, LeagueName, LaneNumber, WeekNumber, PlayerID, Date, SUM(Score) AS Series 
      FROM Scores
  GROUP BY Season, LeagueName, WeekNumber, PlayerID

Then, instead of getting max series values from this table, you will want to add a clause: WHERE Series= and then to get the right value, you need to do another select, where you get the max(Series) where LaneNumber is the same in both tables. I would have coded it for you, but I am not confident enough in my MySQL abilities!


Okay, attempting to actually write the MySQL code I was thinking of (I couldn't resist...):

CREATE TEMPORARY TABLE SeriesScores
SELECT Season, LeagueName, LaneNumber, WeekNumber, PlayerID, SUM(Score) AS Series 
    FROM Scores
    GROUP BY Season, LeagueName, WeekNumber, PlayerID;

This bit just gets scores for each series, as you specified in your own MySQL code. Only difference is that I am not selecting Date, because since we are not grouping by it, its value will be random. Then:

SELECT PlayerID, LaneNumber, Series
    FROM SeriesScores s1
    WHERE  Series=(SELECT MAX(s2.Series)
          FROM SeriesScores s2
          WHERE s1.LaneNumber = s2.LaneNumber);

This bit just selects what you need from SeriesScores, and only considers rows where the series score is the max for that lane.

Does that work for you?

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜