how to set indexes for join and group by queries

2023-02-22 18:20 问答作者：

Let's say we have a common join like the one below:

EXPLAIN SELECT *  
FROM visited_links vl
JOIN device_tracker dt ON ( dt.Client_id = vl.Client_id
AND dt.Device_id = vl.Device_id ) 
GROUP BY dt.id

if we execute an explain, it says:

id  select_type   table   type    possible_keys           key        key_len   ref                         rows   Extra
1   SIMPLE        vl      index   NULL                    vl_id      273       NULL                        1977   Using index; Using temporary; Using filesort
1   SIMPLE        dt      ref     Device_id,Device_id_2   Device_id  257       datumprotect.vl.device_id   4      Using where

I know that s开发者_运维知识库ometimes it's difficult to choose the right indexes when you are using group by but, what indexes could I set for avoiding 'using temporary, using filesort' in this query? why is this happening? and specially, why this happens after using an index?

One point to mention is that the fields returned by the select (* in this case) should either be in the GROUP BY clause or be using agregate functions such as SUM() or MAX(). Otherwise unexpected results can occur. This is because if the database is not told how to choose fields that are not in the group by clause you may get any member of the group, pretty much at random.

The way I look at it is to break the query down into bits.

you have a join on (dt.Client_id = vl.Client_id and dt.Device_id = vl.Device_id) so all of those fields should be indexed in their respective tables.
You are using GROUP BY dt.id so you need an index that includes dt.id

BUT...

an index on (dt.client_id,dt.device_id,dt.id) will not work for the GROUP BY

and

an index on (dt.id, dt.client_id, dt.device_id) will not work for the join.

Sometimes you end up with a query which just can't use an index.

See also: http://ntsrikanth.blogspot.com/2007/11/sql-query-order-of-execution.html

You didn't post your indices, but first of all, you'll want to have an index for (client_id, device_id) on visited_links, and (client_id, device_id, id) on device_tracker to make sure that query is fully indexed.

From page 191 of the excellent High Performance MySQL, 2nd Ed.:

MySQL has two kinds of GROUP BY strategies when it can't use an index: it can use a temporary table or a filesort to perform the grouping. Either one can be more efficient depending on the query. You can force the optimizer to choose one method or the other with the SQL_BIG_RESULT and SQL_SMALL_RESULT optimizer hints.

In your case, I think the issue stems from joining on multiple columns and using GROUP BY together, even after the suggested indices are in place. If you remove either (a) one of the join conditions or (b) the GROUP BY, this shouldn't need a filesort.

However, keep in mind that a filesort doesn't always use actual files, it can also happen entirely within a memory buffer if the result set is small enough, so the performance penalty may be minimal. Consider the wall-clock time for the query too.

HTH!

继续阅读：indexing join

how to set indexes for join and group by queries

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？