Why can an index make a query really slow?
Some time ago I answered a question on SO (my answer was accepted), but it left me with a big doubt.
In short, the user had a table with these fields: id INT PRIMARY KEY, dt DATETIME (with an INDEX) and lt DOUBLE. The query

```sql
SELECT DATE(dt), AVG(lt) FROM `table` GROUP BY DATE(dt)
```

was really slow.
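For reference, the table described above could look roughly like this. The table name `readings` and the index name `idx_dt` are hypothetical (the question only calls it `table`), and the storage engine is not stated in the question, so InnoDB is an assumption.

```sql
-- Hypothetical reconstruction of the question's table.
CREATE TABLE readings (
  id INT PRIMARY KEY,  -- surrogate key
  dt DATETIME,         -- timestamp of each measurement
  lt DOUBLE,           -- value being averaged per day
  INDEX idx_dt (dt)
) ENGINE=InnoDB;

-- The original slow query: DATE(dt) is evaluated for every row
-- and the rows are grouped on the result of that expression.
SELECT DATE(dt), AVG(lt) FROM readings GROUP BY DATE(dt);
```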
We told him that (part of) the problem was using DATE(dt) both as a selected field and as the grouping expression, but the database was on a production server and it wasn't possible to split that field. So he added a field da DATE (with an INDEX), filled automatically with DATE(dt). The query

```sql
SELECT da, AVG(lt) FROM `table` GROUP BY da
```

was a bit faster, but with about 8 million records it still took about 60 s!
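The question doesn't say how da was "filled automatically". One common way on MySQL versions of that era is a pair of triggers; the sketch below assumes that approach and uses the hypothetical names `readings`, `idx_da` and the trigger names shown. On MySQL 5.7 and later, a stored generated column (`da DATE AS (DATE(dt)) STORED`) would do the same job without triggers.

```sql
-- Add the derived column and its index (hypothetical names).
ALTER TABLE readings
  ADD COLUMN da DATE,
  ADD INDEX idx_da (da);

-- Backfill the existing rows once.
UPDATE readings SET da = DATE(dt);

-- Keep da in sync for new and modified rows.
CREATE TRIGGER readings_set_da_ins BEFORE INSERT ON readings
  FOR EACH ROW SET NEW.da = DATE(NEW.dt);

CREATE TRIGGER readings_set_da_upd BEFORE UPDATE ON readings
  FOR EACH ROW SET NEW.da = DATE(NEW.dt);

-- The rewritten query from the question.
SELECT da, AVG(lt) FROM readings GROUP BY da;
```

Single-statement trigger bodies like these don't need a `DELIMITER` change in the mysql client.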
I tried it on my PC and discovered that, after removing the index on da, the query took only 7 s, while the DATE(dt) version, after removing the index, took 13 s.
I had always thought that an index on a column used for grouping would speed the query up, not the contrary (here it was about 8 times slower!).
Why? What is the reason?
Thanks a lot.

Because you still need to read all the data from both the index and the data file. Since you're not using any WHERE condition, you will always get a query plan that accesses all the data, row by row, and there is nothing you can do about that.
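The point about reading both the index and the data file can be illustrated with a composite index that also contains lt: MySQL can then answer the query from the index alone, with no lookup of the full row per index entry. This is a hedged sketch with the hypothetical names `readings` and `idx_da_lt`, not something tested in the original thread, and it trades extra disk space and slower writes for the read speed-up.

```sql
-- Hypothetical covering index: da for grouping, lt for the aggregate.
ALTER TABLE readings ADD INDEX idx_da_lt (da, lt);

-- Check the plan: if the optimizer picks idx_da_lt, the Extra column
-- should include "Using index" (the query is answered from the index only).
EXPLAIN SELECT da, AVG(lt) FROM readings GROUP BY da;
```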
If performance is important for this query and it runs often, I'd suggest caching the results in a separate summary table and updating it hourly (daily, etc.).
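A minimal sketch of that caching idea, assuming the hypothetical names `readings`, `daily_lt_summary` and `refresh_daily_lt_summary`, and that the MySQL event scheduler is enabled (`event_scheduler=ON`); a cron job running the same INSERT would work just as well. Each run recomputes only yesterday and today, which assumes older rows are never modified.

```sql
-- One row per day holding the precomputed aggregate.
CREATE TABLE daily_lt_summary (
  da     DATE         NOT NULL PRIMARY KEY,
  cnt    INT UNSIGNED NOT NULL,
  avg_lt DOUBLE
) ENGINE=InnoDB;

-- Initial full load: slow, but it runs only once.
INSERT INTO daily_lt_summary (da, cnt, avg_lt)
SELECT DATE(dt), COUNT(*), AVG(lt)
FROM readings
GROUP BY DATE(dt);

-- Hourly refresh: recompute yesterday and today, overwriting existing rows.
CREATE EVENT refresh_daily_lt_summary
  ON SCHEDULE EVERY 1 HOUR
  DO
    INSERT INTO daily_lt_summary (da, cnt, avg_lt)
    SELECT DATE(dt), COUNT(*), AVG(lt)
    FROM readings
    WHERE dt >= CURDATE() - INTERVAL 1 DAY
    GROUP BY DATE(dt)
    ON DUPLICATE KEY UPDATE cnt = VALUES(cnt), avg_lt = VALUES(avg_lt);

-- Reading the cached result only touches one row per day.
SELECT da, avg_lt FROM daily_lt_summary ORDER BY da;
```

Keeping cnt next to avg_lt is optional, but it makes it easy to spot days whose data is incomplete.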
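Why it becomes slower: the data in the index is already sorted, so when MySQL estimates the cost of executing the query it decides it is better to read the already-sorted data through the index, group it, and then calculate the aggregates. In this case that is the wrong choice: lt is not in the index, so every index entry also requires a lookup of the corresponding row in the data file, which costs far more than one sequential scan of the table followed by grouping.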
I think this is caused by this (or a similar) MySQL bug: "Index degrades sort performance and optimizer does not honor IGNORE INDEX".
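One way to check this explanation without dropping the index is to compare the execution plans. `readings` and `idx_da` are hypothetical names; `IGNORE INDEX` is a standard MySQL index hint, but given the bug mentioned above, an affected server may not honor it, so look at the `key` column of the EXPLAIN output to see what was actually used.

```sql
-- Plan chosen by the optimizer on its own: does the key column show idx_da?
EXPLAIN SELECT da, AVG(lt) FROM readings GROUP BY da;

-- Same query with idx_da excluded for this statement only.
EXPLAIN SELECT da, AVG(lt) FROM readings IGNORE INDEX (idx_da) GROUP BY da;

-- Time this against the unhinted query instead of dropping the index.
SELECT da, AVG(lt) FROM readings IGNORE INDEX (idx_da) GROUP BY da;
```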
I remember the question; I was going to answer it but got distracted by something else. The problem was that his table design wasn't taking advantage of a clustered primary key index.
I would have redesigned the table with a composite clustered primary key that has the date as the leading part of the index. The sm_id field is still just a sequential unsigned int that guarantees uniqueness.

```sql
drop table if exists speed_monitor;

-- InnoDB stores rows in primary key order, so all rows for one
-- created_date end up physically next to each other.
create table speed_monitor
(
  created_date date not null,
  sm_id int unsigned not null,
  load_time_secs double(10,4) not null default 0,
  primary key (created_date, sm_id)
)
engine=innodb;
```
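To move existing data into the redesigned table, something along these lines could work. It assumes the original table is called `readings` (a hypothetical name) with the question's columns, that its ids are positive and lt is never NULL, and it reuses the old id as sm_id so uniqueness is preserved. Note that double(10,4) stores the value rounded to 4 decimal places.

```sql
-- Copy every row, deriving the leading key column from dt.
INSERT INTO speed_monitor (created_date, sm_id, load_time_secs)
SELECT DATE(dt), id, lt
FROM readings;
```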
Row counts per year in the table:

```
+------+----------+
| year | count(*) |
+------+----------+
| 2009 | 22723200 | 22 million
| 2010 | 31536000 | 31 million
| 2011 |  5740800 | 5 million
+------+----------+
```
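The answer doesn't show the query behind this breakdown; a query along these lines against speed_monitor would produce the same kind of per-year count (the "22 million" style annotations are the author's).

```sql
-- Number of rows per calendar year.
SELECT YEAR(created_date) AS year, COUNT(*)
FROM speed_monitor
GROUP BY YEAR(created_date);
```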
Query for the daily averages in 2010 (first 7 days):

```sql
select
  created_date,
  count(*) as counter,
  avg(load_time_secs) as avg_load_time_secs
from
  speed_monitor
where
  created_date between '2010-01-01' and '2010-12-31'
group by
  created_date
order by
  created_date
limit 7;
```
Result (cold runtime):

```
+--------------+---------+--------------------+
| created_date | counter | avg_load_time_secs |
+--------------+---------+--------------------+
| 2010-01-01   |   86400 |         1.66546802 |
| 2010-01-02   |   86400 |         1.66662466 |
| 2010-01-03   |   86400 |         1.66081309 |
| 2010-01-04   |   86400 |         1.66582251 |
| 2010-01-05   |   86400 |         1.66522316 |
| 2010-01-06   |   86400 |         1.66859480 |
| 2010-01-07   |   86400 |         1.67320440 |
+--------------+---------+--------------------+
7 rows in set (0.23 sec)
```