MySQL claims that I can use columns in SELECT that aren't in GROUP BY, but I can't with equal performance

2023-01-25 19:38 问答作者：

The MySQL documents state in section 11.5.3 that, despite what the SQL standard may say, it's just fine to use columns in the SELECT clause that aren't in the GROUP BY clause, so long as they are functionally dependent on the grouped key.

MySQL extends the use of GROUP BY so that you can use nonaggregated columns or calculations in the select list that do not appe开发者_开发百科ar in the GROUP BY clause. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. For example, you need not group on customer.name in the following query:
SELECT order.custid, customer.name,
MAX(payments)   FROM order,customer  
WHERE order.custid = customer.custid  
GROUP BY order.custid;
In standard SQL, you would have to add customer.name to the GROUP BY clause. In MySQL, the name is redundant.

Sounds reasonable. However, though I can select those columns, it seems to have an adverse effect on performance.

EXPLAIN SELECT o.id FROM objects o GROUP BY o.id;
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
|  1 | SIMPLE      | o     | range | NULL          | PRIMARY | 3       | NULL | 5262 | Using index for group-by |
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+

(I realize that this query is pretty silly; it's just the simplest version of a more complex query that has the same issue.) When selecting just the primary key ID I group by, then MySQL uses the primary key index. However, when I include other columns, MySQL does not.

EXPLAIN SELECT o.id, o.name FROM objects o GROUP BY o.id;
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra          |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
|  1 | SIMPLE      | o     | ALL  | NULL          | NULL | NULL    | NULL | 5261 | Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+

That use of filesort instead of the index really sets me back. I'm currently looking to select * from this table, so would like to avoid having to repeat all columns in the group and index them. Is there any way to get MySQL to use the primary key index, as I expect it to?

use a derived table for the group by and join back on whatever table you want to select from

Since it doesn't look like there's a simple answer, I'm going with a cheap solution for the moment.

What I would do would be something like the following:

SELECT o1.* FROM objects o1 WHERE o1.id IN (SELECT o2.id FROM objects o2 WHERE mycondition GROUP BY o2.id)

However, according to how it gets EXPLAINed, the MySQL optimizer views the subquery as being dependent, which is always a really, really nasty performance killer. I think that's a bug in the query optimizer brought about by the fact that it's the same table, even though it's aliased. As such, I'll be using one query to fetch the IDs, and putting them IN the second query that fetches o.*. It gets reasonable performance, and isn't too painful.

This question is still open to answers with cleaner solutions that perform as well, if not better :)

In the first query, the only field you're accessing is in an index, so mysql only has to look at the index file. But in the second query, you're now pulling a column from the table itself which requires also reading the table data. The first query isn't really using your primary key index the way it might if you had a WHERE clause. It's only using it for the group by, but it's still looking at every entry in the index.

The difference between the first query and the second is just that the second has to look at every row in the full table (aka a table scan), rather than just every primary key value in the index.

As far as optimizing goes, if your real query has no cumulative functions (SUM, COUNT, etc) as in your example then should see a major improvement just doing:

SELECT DISTINCT o.id, o.name FROM objects o

However, if that's only true of your simple example and your query really does require a GROUP BY then your next best bet is to increase your tmp_table_size and max_heap_table_size variables to allow for more rows to fit in memory at once.

继续阅读：group-by indexing optimization

MySQL claims that I can use columns in SELECT that aren't in GROUP BY, but I can't with equal performance

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

Best solution for private video database [closed]

国内夏季避暑旅游胜地有哪些？

王昌瑞《潜梦追凶》剧组庆生新锐演员未来可期？

Is it allowed to ask users to enter credit card details for own payment method?