开发者

MySQL: How does groupby work on columns without aggregate functions?

I am somewhat confused about how the group by command works in mysql.

Suppose I have a table:

mysql> select recordID, IPAddress, date, httpMethod from Log_Analysis_Records_dalhousieShort;                   
+----------+-----------------+---------------------+-------------------------------------------------+
| recordID | IPAddress       | date                | httpMethod                                      |
+----------+-----------------+---------------------+-------------------------------------------------+
|        1 | 64.68.88.22     | 2003-07-09 00:00:21 | GET /news/science/cancer.shtml HTTP/1.0         | 
|        2 | 64.68.88.166    | 2003-07-09 00:00:55 | GET /news/internet/xml.shtml HTTP/1.0           | 
|        3 | 129.173.177.214 | 2003-07-09 00:01:23 | GET / HTTP/1.1                                  | 
|        4 | 129.173.177.214 | 2003-07-09 00:01:23 | GET /include/fcs_style.css HTTP/1.1             | 
|        5 | 129.173.177.214 | 2003-07-09 00:01:23 | GET /include/main_page.css HTTP/1.1             | 
|        6 | 129.173.177.214 | 2003-07-09 00:01:23 | GET /images/bigportaltopbanner.gif HTTP/1.1     | 
|        7 | 129.173.177.214 | 2003-07-09 00:01:23 | GET /images/right_1.jpg HTTP/1.1                | 
|        8 | 64.68.88.165    | 2003-07-09 00:02:43 | GET /studentservices/responsible.shtml HTTP/1.0 | 
|        9 | 64.68.88.165    | 2003-07-09 00:02:44 | GET /news/sports/basketball.shtml HTTP/1.0      | 
|       10 | 64.68.88.34     | 2003-07-09 00:02:46 | GET /news/science/space.shtml HTTP/1.0          | 
|       11 | 129.173.159.98  | 2003-07-09 00:03:46 | GET / HTTP/1.1                                  | 
|       12 | 129.173.159.98  | 2003-07-09 00:03:46 | GET /include/fcs_style.css HTTP/1.1             | 
|       13 | 129.173.159.98  | 2003-07-09 00:03:46 | GET /include/main_page.css HTTP/1.1             | 
|       14 | 129.173.159.98  | 2003-07-09 00:03:48 | GET /images/bigportaltopbanner.gif HTTP/1.1     | 
|       15 | 129.173.159.98  | 2003-07-09 00:03:48 | GET /images/left_1g.jpg HTTP/1.1                | 
|       16 | 129.173.159.98  | 2003-07-09 00:03:48 | GET /images/webcam.gif HTTP/1.1                 | 
+----------+-----------------+---------------------+-------------------------------------------------+

When I am execute this statement how does it choose which recordID to include since there are a range of recordIDs that would be correct? Does it just choose the first one that matches?

mysql> select recordID, IPAddress, date, httpMethod from Log_Analysis_Records_dalhousieShort GROUP BY IPADDRESS;
+----------+-----------------+---------------------+-------------------------------------------------+
| recordID | IPAddress       | date                | httpMethod                                      |
+----------+-----------------+---------------------+-------------------------------------------------+
|       11 | 129.173.159.98  | 2003-07-09 00:03:46 | GET / HTTP/1.1                                  | 
|        3 | 129.173.177.214 | 2003-07-09 00:01:23 | GET / HTTP/1.1                                  | 
|        8 | 64.68.88.165    | 2003-07-09 00:02:43 | GET /studentservices/responsible.shtml HTTP/1.0 | 
|        2 | 64.68.88.166    | 2003-07-09 00:00:55 | GET /news/internet/xml.shtml HTTP/1.0           | 
|        1 | 64.68.88.22     | 2003-07-09 00:00:21 | GET /news/science/cancer.shtml HTTP/1.0         | 
|       10 | 64.68.88.34     | 2003-07-09 00:02:46 | GET /news/science/space.shtml HTTP/1.0          | 
+----------+-----------------+---------------------+-------------------------------------------------+
6 rows in set (0.00 sec)

For this table the max(date) and min(date) values seem logical to me but I am confused about how the recordID and httpMethod w开发者_如何学运维here chosen.

Is it safe use two aggregate functions in one command?

mysql> select recordID, IPAddress, min(date), max(date), httpMethod from Log_Analysis_Records_dalhousieShort GROUP BY IPADDRESS;
+----------+-----------------+---------------------+---------------------+-------------------------------------------------+
| recordID | IPAddress       | min(date)           | max(date)           | httpMethod                                      |
+----------+-----------------+---------------------+---------------------+-------------------------------------------------+
|       11 | 129.173.159.98  | 2003-07-09 00:03:46 | 2003-07-09 00:03:48 | GET / HTTP/1.1                                  | 
|        3 | 129.173.177.214 | 2003-07-09 00:01:23 | 2003-07-09 00:01:23 | GET / HTTP/1.1                                  | 
|        8 | 64.68.88.165    | 2003-07-09 00:02:43 | 2003-07-09 00:02:44 | GET /studentservices/responsible.shtml HTTP/1.0 | 
|        2 | 64.68.88.166    | 2003-07-09 00:00:55 | 2003-07-09 00:00:55 | GET /news/internet/xml.shtml HTTP/1.0           | 
|        1 | 64.68.88.22     | 2003-07-09 00:00:21 | 2003-07-09 00:00:21 | GET /news/science/cancer.shtml HTTP/1.0         | 
|       10 | 64.68.88.34     | 2003-07-09 00:02:46 | 2003-07-09 00:02:46 | GET /news/science/space.shtml HTTP/1.0          | 
+----------+-----------------+---------------------+---------------------+-------------------------------------------------+
6 rows in set (0.00 sec)


Usually use of GROUP BY while listing a field in the select expression without an aggregate function is invalid SQL and should throw an error.

MySQL, however, allows this and simply chooses one value randomly. Try to avoid it, because it is confusing.

To disallow this, you can say at runtime:

SET sql_mode := CONCAT('ONLY_FULL_GROUP_BY,',@@sql_mode);

or use the configuration value and/or command line option sql-mode.

Yes, listing two aggregate functions is completely valid.


Because I'm new apparently I can't post helpful images so I'll try to do this with text...

I just tested this and it appears that the values of fields that are NOT in the GROUP BY will use the values of the FIRST row that matches the group by condition. This will also explain the perceived "randomness" that others have experienced with selecting columns that aren't in a group by clause.

Example:

Create a table called "test" with 2 columns called "col1" and "col2" with data that looks like this:

Col1 Col2
1 2
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3

Then run the following query:

select col1,col2
from test
order by col2 desc

You will get this result:

1 3
2 3
3 3
1 2
1 2
2 2
3 2
2 1
3 1

Now consider the following query:

select groupTable.col1,groupTable.col2
from (
   select col1,col2
   from test
   order by col2 desc
) groupTable
group by groupTable.col1
order by groupTable.col1 desc

You will get this result:

3 3
2 3
1 3

Change the subquery to asc:

select col1,col2
from test
order by col2 asc

Result:

2 1
3 1
1 2
1 2
2 2
3 2
1 3
2 3
3 3

Again use that as the basis for your subquery:

select groupTable.col1,groupTable.col2
from (
   select col1,col2
   from test
   order by col2 asc
) groupTable
group by groupTable.col1
order by groupTable.col1 desc

Result:
3 1
2 1
1 2

Now you should be able to see how the order of the subquery affects which values are chosen for fields that are selected but not in the group by clause. This would explain the perceived "randomness" that others have mentioned because if the subquery (or lack there of) is not combined with an ORDER BY clause then mysql will grab rows as they come in, but by defining a sort order in a subquery you are able to control this behavior and get predictable results.


I thought it takes the first row according to the PRIMARY KEY or any INDEX, because it looks like it works that way, but i've tried a GROUP BY query on various tables and didn't identify any pattern.

Therefore i will avoid to use any value of non-grouped columns.


Group By picks up the first record based on the index. Let us say Log_Analysis_Records_dalhousieShort table has recoedID as index. Hence, group by picked 11 recordID for IPAddress 129.173.159.98 among recordID 11 to 16. However min and max are pre group by operations in a way so there values are computed logically for you.

mysql> select recordID, IPAddress, date, httpMethod from Log_Analysis_Records_dalhousieShort GROUP BY IPADDRESS;
+----------+-----------------+---------------------+-------------------------------------------------+
| recordID | IPAddress       | date                | httpMethod                                      |
+----------+-----------------+---------------------+-------------------------------------------------+
|       11 | 129.173.159.98  | 2003-07-09 00:03:46 | GET / HTTP/1.1                                  | 
|        3 | 129.173.177.214 | 2003-07-09 00:01:23 | GET / HTTP/1.1                                  | 
|        8 | 64.68.88.165    | 2003-07-09 00:02:43 | GET /studentservices/responsible.shtml HTTP/1.0 | 
|        2 | 64.68.88.166    | 2003-07-09 00:00:55 | GET /news/internet/xml.shtml HTTP/1.0           | 
|        1 | 64.68.88.22     | 2003-07-09 00:00:21 | GET /news/science/cancer.shtml HTTP/1.0         | 
|       10 | 64.68.88.34     | 2003-07-09 00:02:46 | GET /news/science/space.shtml HTTP/1.0          | 
+----------+-----------------+---------------------+-------------------------------------------------+
6 rows in set (0.00 sec)
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜