
Why does the scan type change from ALL to RANGE when using LIMIT on SQL queries, and how can the query be optimized?

I have this query:

SELECT l.licitatii_id, 
       l.nume, 
       l.data_publicarii, 
       l.data_limita 
FROM   licitatii_ue l 
       INNER JOIN domenii_licitatii dl 
         ON l.licitatii_id = dl.licitatii_id 
            AND dl.tip_licitatie = '2' 
       INNER JOIN domenii d 
         ON dl.domenii_id = d.domenii_id 
            AND d.status = 1 
            AND d.tip_domeniu = '1' 
WHERE  l.status = 1 
       AND Unix_timestamp(TIMESTAMPADD(DAY, 1, CAST(From_unixtime(l.data_limita) 
                                               AS DATE))) 
           < '1300683793' 
GROUP  BY l.licitatii_id 
ORDER  BY data_publicarii DESC 

Explain outputs:

+-----+--------------+--------+---------+-------------------------------------+----------+----------+---------------------------+-------+-----------+----------------------------------------------+
| id  | select_type  | table  | type    | possible_keys                       | key      | key_len  | ref                       | rows  | filtered  | Extra                                        |
| 1   | SIMPLE       | d      | ALL     | PRIMARY,key_status_tip_domeniu      | NULL     | NULL     | NULL                      | 120   | 85.83     | Using where; Using temporary; Using filesort |
| 1   | SIMPLE       | dl     | ref     | PRIMARY,tip_licitatie,licitatii_id  | PRIMARY  | 4        | web61db1.d.domenii_id     | 6180  | 100.00    | Using where; Using index                     |
| 1   | SIMPLE       | l      | eq_ref  | PRIMARY                             | PRIMARY  | 4        | web61db1.dl.licitatii_id  | 1     | 100.00    | Using where                                  |
+-----+--------------+--------+---------+-------------------------------------+----------+----------+---------------------------+-------+-----------+----------------------------------------------+

As you can see, type=ALL for the d table.

Now if I add LIMIT 100 to the query,

the plan changes to range:

+-----+--------------+--------+---------+-------------------------------------+-------------------------+----------+---------------------------+-------+-----------+----------------------------------------------+
| id  | select_type  | table  | type    | possible_keys                       | key                     | key_len  | ref                       | rows  | filtered  | Extra                                        |
| 1   | SIMPLE       | d      | range   | PRIMARY,key_status_tip_domeniu      | key_status_tip_domeniu  | 9        | NULL                      | 103   | 100.00    | Using where; Using temporary; Using filesort |
| 1   | SIMPLE       | dl     | ref     | PRIMARY,tip_licitatie,licitatii_id  | PRIMARY                 | 4        | web61db1.d.domenii_id     | 6180  | 100.00    | Using where; Using index                     |
| 1   | SIMPLE       | l      | eq_ref  | PRIMARY                             | PRIMARY                 | 4        | web61db1.dl.licitatii_id  | 1     | 100.00    | Using where                                  |
+-----+--------------+--------+---------+-------------------------------------+-------------------------+----------+---------------------------+-------+-----------+----------------------------------------------+

Why does this happen?

Can this query be optimized further? Both versions take 13 seconds.

The table schema is visible in a gist on GitHub.


MySQL chooses domenii as the leading table for the join.

This table is filtered on (status, tip_domeniu) = (1, 1).

It does not seem to be a very selective condition, so normally a full table scan with filtering would be preferred over the index scan.

We can see that MySQL expects 120 records to be returned from domenii for which this condition holds.

When you add a LIMIT, the number of records expected to be processed decreases, and MySQL considers the index scan more efficient for the smaller row count.

Note that this condition:

Unix_timestamp(TIMESTAMPADD(DAY, 1, CAST(From_unixtime(l.data_limita) AS DATE))) < '1300683793'

is not sargable, so it deprives the optimizer of the ability to use an index on data_limita.

Create the following indexes:

licitatii_ue (status, data_limita)
licitatii_ue (status, data_publicarii)
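Spelled out as DDL (the index names here are hypothetical; use your own naming convention):

```sql
-- Composite indexes: MySQL can filter on status first, then either
-- range-scan data_limita or read rows already in data_publicarii order
ALTER TABLE licitatii_ue ADD INDEX ix_status_data_limita (status, data_limita);
ALTER TABLE licitatii_ue ADD INDEX ix_status_data_publicarii (status, data_publicarii);
```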

and rewrite the query like this:

SELECT l.licitatii_id, 
       l.nume, 
       l.data_publicarii, 
       l.data_limita 
FROM   licitatii_ue l 
JOIN   domenii_licitatii dl 
ON     l.licitatii_id = dl.licitatii_id 
       AND dl.tip_licitatie = '2' 
JOIN   domenii d 
ON     dl.domenii_id = d.domenii_id 
       AND d.status = 1 
       AND d.tip_domeniu = '1' 
WHERE  l.status = 1
       AND l.data_limita < FROM_UNIXTIME(((1300683793 - 86400) div 86400) * 86400)
GROUP BY
       l.licitatii_id 
ORDER BY
       data_publicarii DESC 
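The right-hand side of the rewritten condition is now a constant expression that MySQL evaluates once for the whole query. A quick sanity check of the arithmetic (in Python, where `//` matches MySQL's integer `div`):

```python
# MySQL evaluates FROM_UNIXTIME(((1300683793 - 86400) div 86400) * 86400)
# a single time. The inner arithmetic subtracts one day from the cutoff
# and truncates down to a day boundary:
SECONDS_PER_DAY = 86400
cutoff = 1300683793  # the literal from the original WHERE clause

threshold = ((cutoff - SECONDS_PER_DAY) // SECONDS_PER_DAY) * SECONDS_PER_DAY
print(threshold)                     # 1300579200
print(threshold % SECONDS_PER_DAY)   # 0: an exact day boundary (in UTC terms)
```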


Ah, the mysteries of the query optimizer are many and unknowable...

At a quick glance, the most obvious thing to optimize might be the

AND Unix_timestamp(TIMESTAMPADD(DAY, 1, CAST(From_unixtime(l.data_limita) 
                                               AS DATE))) 

clause.

Depending on the number of records in the licitatii_ue table, this looks like an expensive operation, and it will bypass any available indexes.


ALL is a full table scan; range is a range scan (chosen because of the LIMIT). There is nothing wrong with that; in fact, it also causes a key to be used (key_status_tip_domeniu).

The reason it is slow is, most likely, the ORDER BY data_publicarii DESC (this is easy to test: just drop the ORDER BY and benchmark the query; I would expect a difference of a few orders of magnitude).

MySQL admits (in the Extra column of EXPLAIN) that it is using filesort (needed for the ORDER BY because it can't, or does not know how to, use an index). Adding yet another index to the mix might help, especially if you confirm that the ORDER BY is making it slow.

EDIT: Actually, you do have a cardinal sin in your query:

Unix_timestamp(TIMESTAMPADD(DAY, 1, CAST(From_unixtime(l.data_limita) AS DATE))) < '1300683793'

Avoid applying functions to your field values when you can apply them to a constant instead. So switch it around and rewrite it as:

l.data_limita < some_function('1300683793')

However complex some_function is, it will be calculated only once, and the MySQL planner will know it is a constant. The way you wrote it forces MySQL to apply unix_timestamp, timestampadd, cast, and from_unixtime to the value of data_limita on every row. On I/O-bound systems this will usually just burn some extra CPU cycles while waiting for the disks to spin (however, it can become significant: your system might turn CPU-bound, which is simply a bad thing). The biggest difference is that you lose the ability to use an index on data_limita.
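A minimal Python sketch of the principle. This is simplified to UTC day boundaries (the real FROM_UNIXTIME depends on the session time zone, so the exact constant would differ), but it shows that moving the date arithmetic to the constant side, computed once, filters the same rows while leaving the raw column value free for an index comparison:

```python
SECONDS_PER_DAY = 86400
cutoff = 1300683793

def day_after_midnight(ts):
    # mimics UNIX_TIMESTAMP(TIMESTAMPADD(DAY, 1, CAST(FROM_UNIXTIME(ts) AS DATE))),
    # simplified to UTC day boundaries (no time-zone or DST handling)
    return (ts // SECONDS_PER_DAY) * SECONDS_PER_DAY + SECONDS_PER_DAY

rows = [1300000000, 1300597393, 1300665599, 1300665600, 1300683793]

# Non-sargable: a function is applied to the column value on every row,
# so no index on the column can be used.
non_sargable = [ts for ts in rows if day_after_midnight(ts) < cutoff]

# Sargable: fold the date arithmetic into one constant.
# day_after_midnight(ts) < cutoff  <=>  ts < ((cutoff - 1) // D) * D,
# i.e. ts lies before the last day boundary preceding the cutoff.
threshold = ((cutoff - 1) // SECONDS_PER_DAY) * SECONDS_PER_DAY
sargable = [ts for ts in rows if ts < threshold]

assert non_sargable == sargable
```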

Finally, all your indexes are single-field indexes, and while MySQL does some index merging, it is not stellar at it. You might want to try creating composite indexes that cover all your conditions and the sort order (listing columns in order of selectivity for the target query).
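For example, something along these lines might be worth benchmarking (index names and column order here are assumptions based on this query's conditions):

```sql
-- Covering index for the (status, tip_domeniu) filter plus the join key on domenii
ALTER TABLE domenii ADD INDEX ix_status_tip_id (status, tip_domeniu, domenii_id);

-- Covering index for the tip_licitatie filter plus both join keys on domenii_licitatii
ALTER TABLE domenii_licitatii ADD INDEX ix_tip_keys (tip_licitatie, domenii_id, licitatii_id);
```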
