MySQL performance with GROUP BY and JOIN
After spending a lot of time with variants to this question I'm wondering if someone can help me optimize this query or indexes.
I have three temp tables ref1, ref2, ref3 all defined as below, with ref1 and ref2 each having about 6000 rows and ref3 only 3 rows:
CREATE TEMPORARY TABLE ref1 (
id INT NOT NULL AUTO_INCREMENT,
val INT,
PRIMARY KEY (id)
)
ENGINE = MEMORY;
The slow query is against a table like so, with about 1M rows:
CREATE TABLE t1 (
d DATETIME NOT NULL,
id1 INT NOT NULL,
id2 INT NOT NULL,
id3 INT NOT NULL,
x INT NULL,
PRIMARY KEY (id1, d, id2, id3)
)
ENGINE = INNODB;
The query in question:
SELECT id1, SUM(x)
FROM t1
INNER JOIN ref1 ON ref1.id = t1.id1
INNER JOIN ref2 ON ref2.开发者_StackOverflow社区id = t1.id2
INNER JOIN ref3 ON ref3.id = t1.id3
WHERE d BETWEEN '2011-03-01' AND '2011-04-01'
GROUP BY id1;
The temp tables are used to filter the result set down to just the items a user is looking for.
EXPLAIN
+----+-------------+-------+--------+---------------+---------+---------+------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+------------------+------+---------------------------------+
| 1 | SIMPLE | ref1 | ALL | PRIMARY | NULL | NULL | NULL | 6000 | Using temporary; Using filesort |
| 1 | SIMPLE | t1 | ref | PRIMARY | PRIMARY | 4 | med31new.ref1.id | 38 | Using where |
| 1 | SIMPLE | ref3 | ALL | PRIMARY | NULL | NULL | NULL | 3 | Using where; Using join buffer |
| 1 | SIMPLE | ref2 | eq_ref | PRIMARY | PRIMARY | 4 | med31new.t1.id2 | 1 | |
+----+-------------+-------+--------+---------------+---------+---------+------------------+------+---------------------------------+
(on a different system with ~5M rows EXPLAIN show t1 first in the list, with "Using where; Using index; Using temporary; Using filesort")
Is there something obvious I'm missing that would prevent the temporary table from being used?
First filesort does not mean a file is writtent on disk to perform the sort, it's the name of the quicksort algorithm in mySQL, check what-does-using-filesort-mean-in-mysql.
So the problematic keyword in your explain is Using temporary
, not Using filesort
. For that you can play with tmp_table_size
& max_heap_table_size
(put the same values on both) to allow more in-memory work and avoid temporary table creation, check this link on the subject with remarks about documentation mistakes.
Then you could try different index policy, and see the results, but do not try to avoid filesort.
Last thing, not related, you make a SUM(x)
but x can takes NULL values, SUM(COALESCE(x) , 0)
is maybe better if you do not want any NULL value on the Group to make your sum being NULL.
Add an index on JUST the DATE. Since that is the criteria of the first table, and the others are just joins, it will be optimized against the DATE first... the joins are secondary.
Isn't this:
SELECT id1, SUM(x)
FROM t1
INNER JOIN ref1 ON ref1.id = t1.id1
INNER JOIN ref2 ON ref2.id = t1.id2
INNER JOIN ref3 ON ref3.id = t1.id3
WHERE d BETWEEN '2011-03-01' AND '2011-04-01'
GROUP BY id1;
exactly equivalent to:
select id1, SUM(x)
FROM t1
WHERE d BETWEEN '2011-03-01' AND '2011-04-01'
group by id1;
What are the extra tables being used for? I think the temp table mentioned in another answer is referring to MySQL creating a temp table during query execution. If you're hoping to create a sub-query (or table) that will minimize number of operations required in a join, that might speed up the query, but I don't see joined data being selected.
精彩评论