MySQL index on numerical columns makes query slower
I have an optimisation problem with a fairly large table (~1.7M rows).
There are two columns used when selecting rows, let's call them colA and colB. They are both of type 'double' (5 decimal places) and range from:
colA: -90 ~ 90 colB: -180 ~ 180
Without an index, any query of the form:
SELECT * FROM table where colA BETWEEEN a and b AND colB BETWEEN c and d
takes approximately the same time to run (~ 1 second), regardless of the range of (a,b), and (c,d) (as MySQL has to examine every row).
If I add an index to colA and colB, two things happen: queries where the (a,b) & (c,d) range are small, for example:
SELECT * FROM table where colA BETWEEEN -4 and 4 AND colB BETWEEN 3 and 7
run very quickly (~ 1/10 second). However, the execution time increases with the range between the queried values. For example:
SELECT * FROM table where colA BETWEEEN -80 and 80 AND colB BETWEEN -150 and 150
takes about a minute to execute.
I know how B-trees works for strings, but I'm not sure of the mechanism when the data is numeric and the query is conducted using a range.
If anyone could suggest how to optimise this query I would be grateful. One thought is to use the index for small ranges and tell MySQL not to use it for larger ones, however I couldn't find a command which allows this.
Thanks
EDIT: the explains
There is something I stupidly forgot to mention. The results are ordered by rand() - I'm aware of how inefficient this is, but I could see no other way to get a limited number of rows from the table randomly.
Adding rand() does not affect the execution time when there is no index, but drastically increases the time taken when there is.
EDIT2: this is using composite indexes.
SMALL RANGE:
"explain select * from table where colA between 35 and 38 and colB between -10 and 5 ORDER BY RAND() LIMIT 20"
9783 rows
NO INDEX (fast)
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| 1 | SIMPLE | table | ALL | NULL | NULL | NULL | NULL | 1673784 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
WITH INDEX (very fast)
+----+-------------+-------+-------+---------------+------+---------+------+--------+--------开发者_如何学C-----+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | table | range | test | test | 18 | NULL | 136222 | Using where |
+----+-------------+-------+-------+---------------+------+---------+------+--------+-------------+
"explain select * from table where colA between -80 and 80 and colB between -150 and 150 ORDER BY RAND() LIMIT 20;"
1631862 rows
NO INDEX (fast)
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| 1 | SIMPLE | table | ALL | NULL | NULL | NULL | NULL | 1673784 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
WITH INDEX (very slow: > 60 seconds)
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| 1 | SIMPLE | table | ALL | test | NULL | NULL | NULL | 1673784 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
EDIT3:
To summarize:(all queries limited to return 20 rows)
big range with rand() with index: 45 seconds
big range without rand(), with index: 0.003 seconds big range with rand, no index: 1 second big range without rand, no index: 0.003 secondsThe anomaly is: "big range with rand() with index, 45 seconds".
I know how B-trees works for strings, but I'm not sure of the mechanism when the data is numeric and the query is conducted using a range.
They work the same way for numbers as they do for strings.
Without the index the query takes approximately the same time to run (~ 1 second), regardless of the range of (a,b), and (c,d)
The runtime of a full table scan does not significantly vary with the contents of the WHERE condition. The time taken by the index access path is proportional to the number of rows returned. If a query selects a significant part of the table, using an index will always be slower than not using an index.
An index access path is only efficient if the index selectivity is sufficient, i.e. the number of rows retrieved is small (some say 10% at the very most). Execution time will be roughly proportional to the number of rows returned, and could end up slower than a full table scan.
One thought is to use the index for small ranges and tell MySQL not to use it for larger ones, however I couldn't find a command which allows this.
The query optimizer has to use statistics and heuristics to determine if an index should be used. Maybe you need to update these statistics using OPTIMIZE TABLE. If it still fails to make the proper decision, you can help it with hints.
SELECT * FROM table
IGNORE INDEX (the_index)
where colA BETWEEEN -80 and 80 AND colB BETWEEN -150 and 150
Other options could be deleting the index (if you never see any benefit from it, a constant one second response time might be good enough), or trying a composite index on both columns (also only if the number of records resulting from the query is small).
Now that you mention LIMIT 20, it starts making more sense:
big range with rand() with index: 45 seconds
NESTED LOOP with many results + SORT
Get ALL the records (in the range) from the index, fetch them one by one from the table, then sort, then limit to 20
big range without rand(), with index: 0.003 seconds
NESTED LOOP aborted at 20 records
Get 20 records from the index, fetch them one by one from the table, and return them. No sorting, in effect no big range.
big range with rand, no index: 1 second
FULL TABLE SCAN + SORT
Read through the whole table, keep what is in range, then sort, then limit to 20
big range without rand, no index: 0.003 seconds
FULL TABLE SCAN, aborted at 20 records
Start reading through the table, keep what is in range, stop when you have 20 and return.
Indexes with many duplicates are a waste.
Make sure your index uses both fields;
create index idx_faster on tbl_mytbl (colA,colB)
for colB you may add another,
create index idx_colb on tbl_mytbl (colB)
regards, /t
Last query should not take longer than first. MySQL may not have the index updated see OPTIMIZE TABLE
Also you can check how it is planning the query with EXPLAIN
and EXPLAIN ANALYZE
.
Finally, you can force the index to be disabled with IGNORE INDEX (idx_name)
精彩评论