MySQL performance with GROUP BY and JOIN

2023-03-16 19:35 问答作者：

After spending a lot of time with variants to this question I'm wondering if someone can help me optimize this query or indexes.

I have three temp tables ref1, ref2, ref3 all defined as below, with ref1 and ref2 each having about 6000 rows and ref3 only 3 rows:

CREATE TEMPORARY TABLE ref1 (
  id INT NOT NULL AUTO_INCREMENT,
  val INT,
  PRIMARY KEY (id)
)
ENGINE = MEMORY;

The slow query is against a table like so, with about 1M rows:

CREATE TABLE t1 (
  d DATETIME NOT NULL,
  id1 INT NOT NULL,
  id2 INT NOT NULL,
  id3 INT NOT NULL,
  x INT NULL,
  PRIMARY KEY (id1, d, id2, id3)
)
ENGINE = INNODB;

The query in question:

SELECT id1, SUM(x)
  FROM t1
  INNER JOIN ref1 ON ref1.id = t1.id1
  INNER JOIN ref2 ON ref2.开发者_StackOverflow社区id = t1.id2
  INNER JOIN ref3 ON ref3.id = t1.id3
  WHERE d BETWEEN '2011-03-01' AND '2011-04-01'
  GROUP BY id1;

The temp tables are used to filter the result set down to just the items a user is looking for.

EXPLAIN

+----+-------------+-------+--------+---------------+---------+---------+------------------+------+---------------------------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref              | rows | Extra                           |
+----+-------------+-------+--------+---------------+---------+---------+------------------+------+---------------------------------+
|  1 | SIMPLE      | ref1  | ALL    | PRIMARY       | NULL    | NULL    | NULL             | 6000 | Using temporary; Using filesort |
|  1 | SIMPLE      | t1    | ref    | PRIMARY       | PRIMARY | 4       | med31new.ref1.id |   38 | Using where                     |
|  1 | SIMPLE      | ref3  | ALL    | PRIMARY       | NULL    | NULL    | NULL             |    3 | Using where; Using join buffer  |
|  1 | SIMPLE      | ref2  | eq_ref | PRIMARY       | PRIMARY | 4       | med31new.t1.id2  |    1 |                                 |
+----+-------------+-------+--------+---------------+---------+---------+------------------+------+---------------------------------+

(on a different system with ~5M rows EXPLAIN show t1 first in the list, with "Using where; Using index; Using temporary; Using filesort")

Is there something obvious I'm missing that would prevent the temporary table from being used?

First filesort does not mean a file is writtent on disk to perform the sort, it's the name of the quicksort algorithm in mySQL, check what-does-using-filesort-mean-in-mysql.

So the problematic keyword in your explain is Using temporary, not Using filesort. For that you can play with tmp_table_size & max_heap_table_size(put the same values on both) to allow more in-memory work and avoid temporary table creation, check this link on the subject with remarks about documentation mistakes.

Then you could try different index policy, and see the results, but do not try to avoid filesort.

Last thing, not related, you make a SUM(x) but x can takes NULL values, SUM(COALESCE(x) , 0) is maybe better if you do not want any NULL value on the Group to make your sum being NULL.

Add an index on JUST the DATE. Since that is the criteria of the first table, and the others are just joins, it will be optimized against the DATE first... the joins are secondary.

Isn't this:

SELECT id1, SUM(x)
  FROM t1
  INNER JOIN ref1 ON ref1.id = t1.id1
  INNER JOIN ref2 ON ref2.id = t1.id2
  INNER JOIN ref3 ON ref3.id = t1.id3
  WHERE d BETWEEN '2011-03-01' AND '2011-04-01'
  GROUP BY id1;

exactly equivalent to:

select id1, SUM(x)
  FROM t1
  WHERE d BETWEEN '2011-03-01' AND '2011-04-01'
  group by id1;

What are the extra tables being used for? I think the temp table mentioned in another answer is referring to MySQL creating a temp table during query execution. If you're hoping to create a sub-query (or table) that will minimize number of operations required in a join, that might speed up the query, but I don't see joined data being selected.

继续阅读：performance

MySQL performance with GROUP BY and JOIN

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

王昌瑞《潜梦追凶》剧组庆生新锐演员未来可期？

Is it allowed to ask users to enter credit card details for own payment method?

Escaping "<" in Perl-generated XML

Best solution for private video database [closed]

imessage会显示已读吗？