Why doesn't this index work (Mysql)
I have this table:
CREATE TABLE `maindb`.`daily_info` (
`di_date` date NOT NULL,
`di_sid` int(10) unsigned NOT NULL default '0',
`di_type` int(10) unsigned NOT NULL default '0',
`di_name` varchar(20) NOT NULL default '',
`di_num` int(10) unsigned NOT NULL default '0',
`di_abt` varchar(1) NOT NULL default 'a',
PRIMARY KEY (`di_date`,`di_sid`,`di_type`,`di_name`,`di_abt`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
When I use this query:
explain
SELECT MONTH(di_date) as label1, DAYOFMONTH(di_date) as label2, sum(di_num) as count , di_abt as abt
FROM `daily_info`
WHERE di_sid=6
AND di_type = 4
AND di_name='clk-1'
AND di_date > '2009-10-01' AND di_date < '2009-10-16'
GROUP BY
DAYOFMONTH(di_date)
ORDER BY
TO_DAYS(di_date) DESC
I get:
1, 'SIMPLE', 'daily_info', 'range', 'PRIMARY', 'PRIMARY', '3', '', 2500, 'Using where; Using temporary; Using filesort'
When actually if the key worked and the query would be filtered by di_date, d开发者_StackOverflowi_sid and di_type, it would need to search only a few dozen rows.
What is wrong with the index (or query?)
Thanks!
You use the range condition on the first index column which kills possibility to filter on other columns.
There is no single contiguous range in this index which would contain those and only those records that satisfy the condition.
MySQL
is not able to do SKIP SCAN
which would jump over the distinct values of di_date
. That's why it does it's best: uses range
access to filter on di_date
and uses WHERE
to filter on all other fields.
Either recreate the index as this (the best decision):
PRIMARY KEY (`di_sid`,`di_type`,`di_name`,`di_date`,`di_abt`)
or, if you're unable to recreate the index, you can emulate the SKIP SCAN
:
SELECT MONTH(di.di_date) as label1, DAYOFMONTH(di.di_date) as label2, sum(di.di_num) as count , di.di_abt as abt
FROM (
SELECT DISTINCT di_date
FROM daily_info
WHERE di_date > '2009-10-01' AND di_date < '2009-10-16'
) do
JOIN daily_info di
ON di.di_date <= do.di_date
AND di.di_date>= do.di_date
AND di_sid = 6
AND di_type = 4
AND di_name = 'clk-1'
GROUP BY
DAYOFMONTH(di.di_date)
ORDER BY
TO_DAYS(di.di_date) DESC
Make sure that Using index for group-by
and Range checked for each record
are present in the plan.
This condition:
di.date <= do.date
AND di.date >= do.date
is used instead of simple di.date = do.date
to force the range checking.
See this article in my blog for more detailed explanation of emulating SKIP SCAN:
- Emulating SKIP SCAN
Update:
The latter query actually uses an equijoin and MySQL
optimizes it without the tricks.
The trick above applies only to the ranged queries, i. e. when the innermost loop should use the range
access, not the ref
access.
It would be useful if you had to do something like di_name <= 'clk-1'
This query should work fine:
SELECT MONTH(di.di_date) as label1, DAYOFMONTH(di.di_date) as label2, sum(di.di_num) as count , di.di_abt as abt
FROM (
SELECT DISTINCT di_date
FROM daily_info
WHERE di_date > '2009-10-01' AND di_date < '2009-10-16'
) do
JOIN daily_info di
ON di.di_date = do.di_date
AND di_sid = 6
AND di_type = 4
AND di_name = 'clk-1'
GROUP BY
DAYOFMONTH(di.di_date)
ORDER BY
TO_DAYS(di.di_date) DESC
Make sure that di
uses ref
access on the whole subkey possible here, with key_len = 33
Update 2
In your query, you are using these expressions out of the GROUP BY
:
MONTH(di_date)
TO_DAYS(di_date)
di_abt
The query as it is now will sum all values for the 1st
, 2nd
etc. for any month and year.
I. e. for the first group it will add up all values from Jan 1st, 2000
, then Feb 1st, 2000
, etc.
Then it will return any random value of MONTH
, any random value of TO_DAYS
and any random value of di_abt
from each group.
Your condition now is within a single month, so it's OK now, but if your condition will span multiple months (to say nothing of years), they query will produce unexpected results.
Do you really want to group by dates?
You are range-scanning the first part of the index - therefore it cannot use the subsequent parts of the index.
The way to improve this is to create another index with the fields in a different order which is more conducive to this particular query.
If your index was di_sid,di_type,di_date then it may be better.
精彩评论