
Why does the scan type change from ALL to RANGE when using LIMIT on SQL queries, and how can the query be optimized?

I have this query:

SELECT l.licitatii_id, 
       l.nume, 
       l.data_publicarii, 
       l.data_limita 
FROM   licitatii_ue l 
       INNER JOIN domenii_licitatii dl 
         ON l.licitatii_id = dl.licitatii_id 
            AND dl.tip_licitatie = '2' 
       INNER JOIN domenii d 
         ON dl.domenii_id = d.domenii_id 
            AND d.status = 1 
            AND d.tip_domeniu = '1' 
WHERE  l.status = 1 
       AND Unix_timestamp(TIMESTAMPADD(DAY, 1, CAST(From_unixtime(l.data_limita) 
                                               AS DATE))) 
           < '1300683793' 
GROUP  BY l.licitatii_id 
ORDER  BY data_publicarii DESC 

Explain outputs:

+-----+--------------+--------+---------+-------------------------------------+----------+----------+---------------------------+-------+-----------+----------------------------------------------+
| id  | select_type  | table  | type    | possible_keys                       | key      | key_len  | ref                       | rows  | filtered  | Extra                                        |
| 1   | SIMPLE       | d      | ALL     | PRIMARY,key_status_tip_domeniu      | NULL     | NULL     | NULL                      | 120   | 85.83     | Using where; Using temporary; Using filesort |
| 1   | SIMPLE       | dl     | ref     | PRIMARY,tip_licitatie,licitatii_id  | PRIMARY  | 4        | web61db1.d.domenii_id     | 6180  | 100.00    | Using where; Using index                     |
| 1   | SIMPLE       | l      | eq_ref  | PRIMARY                             | PRIMARY  | 4        | web61db1.dl.licitatii_id  | 1     | 100.00    | Using where                                  |
+-----+--------------+--------+---------+-------------------------------------+----------+----------+---------------------------+-------+-----------+----------------------------------------------+

As you can see, type=ALL for the d table.

Now if I add LIMIT 100 to the query,

the plan changes to range:

+-----+--------------+--------+---------+-------------------------------------+-------------------------+----------+---------------------------+-------+-----------+----------------------------------------------+
| id  | select_type  | table  | type    | possible_keys                       | key                     | key_len  | ref                       | rows  | filtered  | Extra                                        |
| 1   | SIMPLE       | d      | range   | PRIMARY,key_status_tip_domeniu      | key_status_tip_domeniu  | 9        | NULL                      | 103   | 100.00    | Using where; Using temporary; Using filesort |
| 1   | SIMPLE       | dl     | ref     | PRIMARY,tip_licitatie,licitatii_id  | PRIMARY                 | 4        | web61db1.d.domenii_id     | 6180  | 100.00    | Using where; Using index                     |
| 1   | SIMPLE       | l      | eq_ref  | PRIMARY                             | PRIMARY                 | 4        | web61db1.dl.licitatii_id  | 1     | 100.00    | Using where                                  |
+-----+--------------+--------+---------+-------------------------------------+-------------------------+----------+---------------------------+-------+-----------+----------------------------------------------+

Why does this happen?

Can this query be optimized further? Both versions take 13 seconds.

The table schema is visible in a gist on GitHub.


MySQL chooses domenii as the leading table for the join.

This table is filtered on (status, tip_domeniu) = (1, 1).

It does not seem to be a very selective condition, so normally a full table scan with filtering would be preferred over the index scan.

We can see that MySQL expects 120 records to be returned from domenii for which this condition holds.

When you add a LIMIT, the number of records expected to be processed decreases, and MySQL considers the index scan more efficient for the smaller row count.

Note that this condition:

Unix_timestamp(TIMESTAMPADD(DAY, 1, CAST(From_unixtime(l.data_limita) AS DATE))) < '1300683793'

is not sargable, so it deprives the optimizer of the ability to use an index on data_limita.

Create the following indexes:

licitatii_ue (status, data_limita)
licitatii_ue (status, data_publicarii)
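Spelled out as DDL (the index names here are hypothetical; use your own naming convention):

```sql
-- Composite indexes: MySQL can filter on status first, then either
-- range-scan data_limita or read rows already in data_publicarii order
ALTER TABLE licitatii_ue ADD INDEX ix_status_data_limita (status, data_limita);
ALTER TABLE licitatii_ue ADD INDEX ix_status_data_publicarii (status, data_publicarii);
```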

and rewrite the query like this:

SELECT l.licitatii_id, 
       l.nume, 
       l.data_publicarii, 
       l.data_limita 
FROM   licitatii_ue l 
JOIN   domenii_licitatii dl 
ON     l.licitatii_id = dl.licitatii_id 
       AND dl.tip_licitatie = '2' 
JOIN   domenii d 
ON     dl.domenii_id = d.domenii_id 
       AND d.status = 1 
       AND d.tip_domeniu = '1' 
WHERE  l.status = 1
       AND l.data_limita < FROM_UNIXTIME(((1300683793 - 86400) div 86400) * 86400)
GROUP BY
       l.licitatii_id 
ORDER BY
       data_publicarii DESC 
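The right-hand side of the rewritten condition is now a constant expression that MySQL evaluates once for the whole query. A quick sanity check of the arithmetic (in Python, where `//` matches MySQL's integer `div`):

```python
# MySQL evaluates FROM_UNIXTIME(((1300683793 - 86400) div 86400) * 86400)
# a single time. The inner arithmetic subtracts one day from the cutoff
# and truncates down to a day boundary:
SECONDS_PER_DAY = 86400
cutoff = 1300683793  # the literal from the original WHERE clause

threshold = ((cutoff - SECONDS_PER_DAY) // SECONDS_PER_DAY) * SECONDS_PER_DAY
print(threshold)                     # 1300579200
print(threshold % SECONDS_PER_DAY)   # 0: an exact day boundary (in UTC terms)
```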


Ah, the mysteries of the query optimizer are many and unknowable...

At a quick glance, the most obvious thing to optimize might be the

AND Unix_timestamp(TIMESTAMPADD(DAY, 1, CAST(From_unixtime(l.data_limita) 
                                               AS DATE))) 

clause.

Depending on the number of records in the licitatii_ue table, this looks like an expensive operation, and it will bypass any available indexes.


ALL is a full table scan; range is a range scan (chosen because of the LIMIT). There is nothing wrong with that; in fact, it also causes a key to be used (key_status_tip_domeniu).

The reason it is slow is, most likely, the ORDER BY data_publicarii DESC (this is easy to test: just drop the ORDER BY and benchmark the query; I would expect a difference of a few orders of magnitude).

MySQL admits (in the Extra column of EXPLAIN) that it is using filesort (needed for the ORDER BY because it can't, or does not know how to, use an index). Adding yet another index to the mix might help, especially if you confirm that the ORDER BY is making it slow.

EDIT: Actually, you do have a cardinal sin in your query:

Unix_timestamp(TIMESTAMPADD(DAY, 1, CAST(From_unixtime(l.data_limita) AS DATE))) < '1300683793'

Avoid applying functions to your field values when you can apply them to a constant instead. So switch it around and rewrite it as:

l.data_limita < some_function('1300683793')

However complex some_function is, it will be calculated only once, and the MySQL planner will know it is a constant. The way you wrote it forces MySQL to apply unix_timestamp, timestampadd, cast, and from_unixtime to the value of data_limita on every row. On I/O-bound systems this will usually just burn some extra CPU cycles while waiting for the disks to spin (however, it can become significant: your system might turn CPU-bound, which is simply a bad thing). The biggest difference is that you lose the ability to use an index on data_limita.
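A minimal Python sketch of the principle. This is simplified to UTC day boundaries (the real FROM_UNIXTIME depends on the session time zone, so the exact constant would differ), but it shows that moving the date arithmetic to the constant side, computed once, filters the same rows while leaving the raw column value free for an index comparison:

```python
SECONDS_PER_DAY = 86400
cutoff = 1300683793

def day_after_midnight(ts):
    # mimics UNIX_TIMESTAMP(TIMESTAMPADD(DAY, 1, CAST(FROM_UNIXTIME(ts) AS DATE))),
    # simplified to UTC day boundaries (no time-zone or DST handling)
    return (ts // SECONDS_PER_DAY) * SECONDS_PER_DAY + SECONDS_PER_DAY

rows = [1300000000, 1300597393, 1300665599, 1300665600, 1300683793]

# Non-sargable: a function is applied to the column value on every row,
# so no index on the column can be used.
non_sargable = [ts for ts in rows if day_after_midnight(ts) < cutoff]

# Sargable: fold the date arithmetic into one constant.
# day_after_midnight(ts) < cutoff  <=>  ts < ((cutoff - 1) // D) * D,
# i.e. ts lies before the last day boundary preceding the cutoff.
threshold = ((cutoff - 1) // SECONDS_PER_DAY) * SECONDS_PER_DAY
sargable = [ts for ts in rows if ts < threshold]

assert non_sargable == sargable
```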

Finally, all your indexes are single-field indexes, and while MySQL does some index merging, it is not stellar at it. You might want to try creating composite indexes that cover all your conditions and the sort order (listing columns in order of selectivity for the target query).
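For example, something along these lines might be worth benchmarking (index names and column order here are assumptions based on this query's conditions):

```sql
-- Covering index for the (status, tip_domeniu) filter plus the join key on domenii
ALTER TABLE domenii ADD INDEX ix_status_tip_id (status, tip_domeniu, domenii_id);

-- Covering index for the tip_licitatie filter plus both join keys on domenii_licitatii
ALTER TABLE domenii_licitatii ADD INDEX ix_tip_keys (tip_licitatie, domenii_id, licitatii_id);
```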
