Why scan type is changed from ALL to RANGE when using LIMIT on SQL queries + Optimize query
I have this query
SELECT l.licitatii_id,
l.nume,
l.data_publicarii,
l.data_limita
FROM licitatii_ue l
INNER JOIN domenii_licitatii dl
ON l.licitatii_id = dl.licitatii_id
AND dl.tip_licitatie = '2'
INNER JOIN domenii d
ON dl.domenii_id = d.domenii_id
AND d.status = 1
AND d.tip_domeniu = '1'
WHERE l.status = 1
AND Unix_timestamp(T开发者_运维技巧IMESTAMPADD(DAY, 1, CAST(From_unixtime(l.data_limita)
AS DATE)))
< '1300683793'
GROUP BY l.licitatii_id
ORDER BY data_publicarii DESC
Explain outputs:
+-----+--------------+--------+---------+-------------------------------------+----------+----------+---------------------------+-------+-----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
| 1 | SIMPLE | d | ALL | PRIMARY,key_status_tip_domeniu | NULL | NULL | NULL | 120 | 85.83 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | dl | ref | PRIMARY,tip_licitatie,licitatii_id | PRIMARY | 4 | web61db1.d.domenii_id | 6180 | 100.00 | Using where; Using index |
| 1 | SIMPLE | l | eq_ref | PRIMARY | PRIMARY | 4 | web61db1.dl.licitatii_id | 1 | 100.00 | Using where |
+-----+--------------+--------+---------+-------------------------------------+----------+----------+---------------------------+-------+-----------+----------------------------------------------+
As you see type=ALL for d table
now if I add LIMIT 100
to the query
plan changes to range
:
+-----+--------------+--------+---------+-------------------------------------+-------------------------+----------+---------------------------+-------+-----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
| 1 | SIMPLE | d | range | PRIMARY,key_status_tip_domeniu | key_status_tip_domeniu | 9 | NULL | 103 | 100.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | dl | ref | PRIMARY,tip_licitatie,licitatii_id | PRIMARY | 4 | web61db1.d.domenii_id | 6180 | 100.00 | Using where; Using index |
| 1 | SIMPLE | l | eq_ref | PRIMARY | PRIMARY | 4 | web61db1.dl.licitatii_id | 1 | 100.00 | Using where |
+-----+--------------+--------+---------+-------------------------------------+-------------------------+----------+---------------------------+-------+-----------+----------------------------------------------+
Why does this happen?
Can this query be optimized more, both queries take 13 seconds.Table schema is visible on gist github
MySQL
chooses domenii
as the leading table for the join.
This table is filtered on (status, tip_domeniu) = (1, 1)
.
It does not seem to be a very selective condition, so normally a full table scan with filtering would be preferred over the index scan.
We can see that MySQL
expects 120
records to be returned from domanii
for which this condition would hold.
When you add a LIMIT
, the number of records expected to be processed is decreased, and MySQL
considers the index scan more efficient for this.
Note that this condition:
Unix_timestamp(TIMESTAMPADD(DAY, 1, CAST(From_unixtime(l.data_limita) AS DATE))) < '1300683793'
is not sargable, so you deprive the optimizer to use an index on data_limita
.
Create the following indexes:
licitatii_ue (status, data_limita)
licitatii_ue (status, data_publicarii)
and rewrite the query like this:
SELECT l.licitatii_id,
l.nume,
l.data_publicarii,
l.data_limita
FROM licitatii_ue l
JOIN domenii_licitatii dl
ON l.licitatii_id = dl.licitatii_id
AND dl.tip_licitatie = '2'
JOIN domenii d
ON dl.domenii_id = d.domenii_id
AND d.status = 1
AND d.tip_domeniu = '1'
WHERE l.status = 1
AND l.data_limita < FROM_UNIXTIME(((1300683793 - 86400) div 86400) * 86400)
GROUP BY
l.licitatii_id
ORDER BY
data_publicarii DESC
Ah, the mysteries of the query optimizer are many and unknowable...
At a quick glance, the most obvious thing to optimize might be the
AND Unix_timestamp(TIMESTAMPADD(DAY, 1, CAST(From_unixtime(l.data_limita)
AS DATE)))
clause.
depending on the number of records in the licitatii_ue table, this looks like an expensive operation, and it will bypass any indices available.
ALL is table scan, range is range scan (due to LIMIT). Nothing bad with that, actually it also causes a key to be used (key_status_tip_domeniu
).
The reason you are slow is, most likely, that you are using ORDER BY data_publicarii DESC
(this is easy to test, just drop the ORDER BY and benchmark the query; would expect few orders of magnitude).
Mysql admits (under Extra column of explain) that it is using filesort (needed for order by because it can't or does not know how to use an index). Adding yet another index to the mix might help, especially if you confirm that ORDER BY is making it slow.
EDIT Actually, you do have a cardinal sin in your query:
Unix_timestamp(TIMESTAMPADD(DAY, 1, CAST(From_unixtime(l.data_limita)
AS DATE)))
< '1300683793'
Avoid applying any functions to your field values if you can apply them to a constant. So switch it around and rewrite it as
l.data_limita < some_function('1300683793')
However complext the some_function
would be, it will be calculated only once. Mysql planner will know it is a constant. The way you wrote it would force mysql to apply unix_timestamp
, timestampadd
, cast
and from_unixtime
to value of data_limita
from each row. Now in I/O bound systems this will usually just burn some extra CPU cycles while waiting for the disks to spin around (however, it might get significant, your system might get CPU bound and it is just a bad thing). Biggest difference is that you loose possibility to use an index on data_limita
.
Finally, all your indexes are singe field indexes and mysql does some index merging, but is not stellar in it. You might want to try creating indexes that cover all your conditions and sorting order (in order of selectivity for target query).
精彩评论