MySQL index on numerical columns makes query slower

2023-01-30 09:08 问答作者：

I have an optimisation problem with a fairly large table (~1.7M rows).

There are two columns used when selecting rows, let's call them colA and colB. They are both of type 'double' (5 decimal places) and range from:

colA: -90 ~ 90 colB: -180 ~ 180

Without an index, any query of the form:

SELECT * FROM table where colA BETWEEEN a and b AND colB BETWEEN c and d

takes approximately the same time to run (~ 1 second), regardless of the range of (a,b), and (c,d) (as MySQL has to examine every row).

If I add an index to colA and colB, two things happen: queries where the (a,b) & (c,d) range are small, for example:

SELECT * FROM table where colA BETWEEEN -4 and 4 AND colB BETWEEN 3 and 7

run very quickly (~ 1/10 second). However, the execution time increases with the range between the queried values. For example:

 SELECT * FROM table where colA BETWEEEN -80 and 80 AND colB BETWEEN -150 and 150

takes about a minute to execute.

I know how B-trees works for strings, but I'm not sure of the mechanism when the data is numeric and the query is conducted using a range.

If anyone could suggest how to optimise this query I would be grateful. One thought is to use the index for small ranges and tell MySQL not to use it for larger ones, however I couldn't find a command which allows this.

Thanks

EDIT: the explains

There is something I stupidly forgot to mention. The results are ordered by rand() - I'm aware of how inefficient this is, but I could see no other way to get a limited number of rows from the table randomly.

Adding rand() does not affect the execution time when there is no index, but drastically increases the time taken when there is.

EDIT2: this is using composite indexes.

SMALL RANGE:

"explain select * from table where colA between 35 and 38 and colB between -10 and 5 ORDER BY RAND() LIMIT 20"

9783 rows

NO INDEX (fast)

+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
|  1 | SIMPLE      | table | ALL  | NULL          | NULL | NULL    | NULL | 1673784 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+

WITH INDEX (very fast)

+----+-------------+-------+-------+---------------+------+---------+------+--------+--------开发者_如何学C-----+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+-------------+-------+-------+---------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | table | range | test          | test | 18      | NULL | 136222 | Using where |
+----+-------------+-------+-------+---------------+------+---------+------+--------+-------------+

LARGE RANGE:

"explain select * from table where colA between -80 and 80 and colB between -150 and 150 ORDER BY RAND() LIMIT 20;"

1631862 rows

NO INDEX (fast)

+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
|  1 | SIMPLE      | table | ALL  | NULL          | NULL | NULL    | NULL | 1673784 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+

WITH INDEX (very slow: > 60 seconds)

+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
|  1 | SIMPLE      | table | ALL  | test          | NULL | NULL    | NULL | 1673784 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+

EDIT3:

To summarize:(all queries limited to return 20 rows)

big range with rand() with index: 45 seconds

big range without rand(), with index: 0.003 seconds

big range with rand, no index: 1 second

big range without rand, no index: 0.003 seconds

The anomaly is: "big range with rand() with index, 45 seconds".

I know how B-trees works for strings, but I'm not sure of the mechanism when the data is numeric and the query is conducted using a range.

They work the same way for numbers as they do for strings.

Without the index the query takes approximately the same time to run (~ 1 second), regardless of the range of (a,b), and (c,d)

The runtime of a full table scan does not significantly vary with the contents of the WHERE condition. The time taken by the index access path is proportional to the number of rows returned. If a query selects a significant part of the table, using an index will always be slower than not using an index.

An index access path is only efficient if the index selectivity is sufficient, i.e. the number of rows retrieved is small (some say 10% at the very most). Execution time will be roughly proportional to the number of rows returned, and could end up slower than a full table scan.

One thought is to use the index for small ranges and tell MySQL not to use it for larger ones, however I couldn't find a command which allows this.

The query optimizer has to use statistics and heuristics to determine if an index should be used. Maybe you need to update these statistics using OPTIMIZE TABLE. If it still fails to make the proper decision, you can help it with hints.

SELECT * FROM table 
   IGNORE INDEX (the_index)
   where colA BETWEEEN -80 and 80 AND colB BETWEEN -150 and 150

Other options could be deleting the index (if you never see any benefit from it, a constant one second response time might be good enough), or trying a composite index on both columns (also only if the number of records resulting from the query is small).

Now that you mention LIMIT 20, it starts making more sense:

big range with rand() with index: 45 seconds

NESTED LOOP with many results + SORT

Get ALL the records (in the range) from the index, fetch them one by one from the table, then sort, then limit to 20

big range without rand(), with index: 0.003 seconds

NESTED LOOP aborted at 20 records

Get 20 records from the index, fetch them one by one from the table, and return them. No sorting, in effect no big range.

big range with rand, no index: 1 second

FULL TABLE SCAN + SORT

Read through the whole table, keep what is in range, then sort, then limit to 20

big range without rand, no index: 0.003 seconds

FULL TABLE SCAN, aborted at 20 records

Start reading through the table, keep what is in range, stop when you have 20 and return.

Indexes with many duplicates are a waste.

Make sure your index uses both fields;

create index idx_faster on tbl_mytbl (colA,colB)

for colB you may add another,

create index idx_colb on tbl_mytbl (colB)

regards, /t

Last query should not take longer than first. MySQL may not have the index updated see OPTIMIZE TABLE

Also you can check how it is planning the query with EXPLAIN and EXPLAIN ANALYZE.

Finally, you can force the index to be disabled with IGNORE INDEX (idx_name)

继续阅读：indexing optimization

MySQL index on numerical columns makes query slower

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？