开发者

how to set indexes for join and group by queries

Let's say we have a common join like the one below:

EXPLAIN SELECT *  
FROM visited_links vl
JOIN device_tracker dt ON ( dt.Client_id = vl.Client_id
AND dt.Device_id = vl.Device_id ) 
GROUP BY dt.id

if we execute an explain, it says:

id  select_type   table   type    possible_keys           key        key_len   ref                         rows   Extra
1   SIMPLE        vl      index   NULL                    vl_id      273       NULL                        1977   Using index; Using temporary; Using filesort
1   SIMPLE        dt      ref     Device_id,Device_id_2   Device_id  257       datumprotect.vl.device_id   4      Using where

I know that s开发者_运维知识库ometimes it's difficult to choose the right indexes when you are using group by but, what indexes could I set for avoiding 'using temporary, using filesort' in this query? why is this happening? and specially, why this happens after using an index?


One point to mention is that the fields returned by the select (* in this case) should either be in the GROUP BY clause or be using agregate functions such as SUM() or MAX(). Otherwise unexpected results can occur. This is because if the database is not told how to choose fields that are not in the group by clause you may get any member of the group, pretty much at random.


The way I look at it is to break the query down into bits.

  1. you have a join on (dt.Client_id = vl.Client_id and dt.Device_id = vl.Device_id) so all of those fields should be indexed in their respective tables.

  2. You are using GROUP BY dt.id so you need an index that includes dt.id

BUT...

an index on (dt.client_id,dt.device_id,dt.id) will not work for the GROUP BY

and

an index on (dt.id, dt.client_id, dt.device_id) will not work for the join.

Sometimes you end up with a query which just can't use an index.

See also: http://ntsrikanth.blogspot.com/2007/11/sql-query-order-of-execution.html


You didn't post your indices, but first of all, you'll want to have an index for (client_id, device_id) on visited_links, and (client_id, device_id, id) on device_tracker to make sure that query is fully indexed.

From page 191 of the excellent High Performance MySQL, 2nd Ed.:

MySQL has two kinds of GROUP BY strategies when it can't use an index: it can use a temporary table or a filesort to perform the grouping. Either one can be more efficient depending on the query. You can force the optimizer to choose one method or the other with the SQL_BIG_RESULT and SQL_SMALL_RESULT optimizer hints.

In your case, I think the issue stems from joining on multiple columns and using GROUP BY together, even after the suggested indices are in place. If you remove either (a) one of the join conditions or (b) the GROUP BY, this shouldn't need a filesort.

However, keep in mind that a filesort doesn't always use actual files, it can also happen entirely within a memory buffer if the result set is small enough, so the performance penalty may be minimal. Consider the wall-clock time for the query too.

HTH!

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜