Why does MySQL not use an index for a greater than comparison?
I am trying to optimize a bigger query and ran into this wall when I realized this part of the query was doing a full table scan, which in my mind does not make sense considering the field in question is a primary key. I would assume that the MySQL Optimizer would use the index.
Here is the table:
CREATE TABLE userapplication (
application_id int(11) NOT NULL auto_increment,
userid int(11) NOT NULL default '0',
accountid int(11) NOT NULL default '0',
resume_id int(11) NOT NULL default '0',
coverletter_id int(11) NOT NULL default '0',
user_email varchar(100) NOT NULL default '',
account_name varchar(200) NOT NULL default '',
resume_name varchar(255) NOT NULL default '',
resume_modified datetime NOT NULL default '0000-00-00 00:00:00',
cover_name varchar(255) NOT NULL default '',
cover_modified datetime NOT NULL default '0000-00-00 00:00:00',
application_status tinyint(4) NOT NULL default '0',
application_created datetime NOT NULL default '0000-00-00 00:00:00',
application_modified timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
publishid int(11) NOT NULL default '0',
application_visible int(11) default '1',
PRIMARY KEY (application_id),
KEY publishid (publishid),
KEY application_status (application_status),
KEY userid (userid),
KEY accountid (accountid),
KEY application_created (application_created),
KEY resume_id (resume_id),
KEY coverletter_id (coverletter_id)
) ENGINE=MyISAM ;
This simple query seems to do a full table scan:
SELECT * FROM userapplication WHERE application_id > 1025;
This is the output of the EXPLAIN:
+----+-------------+-------------------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table             | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+-------------+-------------------+------+---------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | userapplication   | ALL  | PRIMARY       | NULL | NULL    | NULL | 784422 | Using where |
+----+-------------+-------------------+------+---------------+------+---------+------+--------+-------------+
Any ideas how to prevent this simple query from doing a full table scan? Or am I out of luck?
You'd probably be better off letting MySQL decide on the query plan. There is a good chance that doing an index scan would be less efficient than a full table scan.
There are two data structures on disk for this table:
- The table itself; and
- The primary key B-Tree index.
When you run a query the optimizer has two options about how to access the data:
SELECT * FROM userapplication WHERE application_id > 1025;
Using The Index
- Scan the B-Tree index to find the address of all the rows where application_id > 1025
- Read the appropriate pages of the table to get the data for these rows.
Not using the Index
Scan the entire table, and pick the appropriate records.
Choosing the best strategy
The job of the query optimizer is to choose the most efficient strategy for getting the data you want. If there are a lot of rows with an application_id > 1025, then it can actually be less efficient to use the index. For example, if 90% of the records have an application_id > 1025, then the query optimizer would have to scan around 90% of the leaf nodes of the B-Tree index and then read at least 90% of the table as well to get the actual data; this would involve reading more data from disk than just scanning the table.
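To see which side of that trade-off a given table falls on, you can measure how selective the predicate actually is. This is only a rough sketch: the column aliases are made up for illustration, and the query itself has to read every row or index entry, so it is a one-off diagnostic rather than something to run routinely.

```sql
-- In MySQL a comparison evaluates to 1 (true) or 0 (false),
-- so SUM() over it counts the matching rows.
SELECT COUNT(*)                              AS total_rows,
       SUM(application_id > 1025)            AS matching_rows,
       SUM(application_id > 1025) / COUNT(*) AS matching_fraction
FROM userapplication;
```

If matching_fraction comes out close to 1, the full table scan chosen by the optimizer is probably the cheaper plan.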
MyISAM tables are not clustered; a PRIMARY KEY index is a secondary index and requires an additional table lookup to get the other values.
It is several times more expensive to traverse the index and do the lookups. If your condition is not very selective (yields a large share of total records), MySQL will consider a table scan cheaper.
To prevent it from doing a table scan, you could add a hint:
SELECT *
FROM userapplication FORCE INDEX (PRIMARY)
WHERE application_id > 1025
This would not necessarily be more efficient, though.
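The extra table lookup only matters when the query needs columns that are not in the index. As a sketch, assuming you can get by with the key column alone at this step, a query like the following can be answered from the index without touching the table rows:

```sql
-- Only the indexed column is selected, so the index alone can satisfy the query.
-- EXPLAIN would typically report "Using index" in the Extra column for this.
EXPLAIN
SELECT application_id
FROM userapplication
WHERE application_id > 1025;
```

Whether this helps depends on whether the bigger query really needs the other columns at this point.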
MySQL evidently considers a full table scan cheaper than using the index; you can, however, force it to use your primary key as the preferred index with:
mysql> EXPLAIN SELECT * FROM userapplication FORCE INDEX (PRIMARY) WHERE application_id > 10;
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table           | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | userapplication | range | PRIMARY       | PRIMARY | 4       | NULL |   24 | Using where |
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+-------------+
Note that "USE INDEX", unlike "FORCE INDEX", only hints at which index to consider, so with it MySQL still prefers a full table scan:
mysql> EXPLAIN SELECT * FROM userapplication USE INDEX (PRIMARY) WHERE application_id > 10;
+----+-------------+-----------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table           | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-----------------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | userapplication | ALL  | PRIMARY       | NULL | NULL    | NULL |   34 | Using where |
+----+-------------+-----------------+------+---------------+------+---------+------+------+-------------+
If your WHERE is a "greater than" comparison, it probably returns quite a few entries (and can realistically return all of them), therefore full table scans are usually preferred.
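If only part of that range is actually needed, narrowing the predicate changes the picture. A sketch, assuming a hypothetical upper bound of 2025 and a hypothetical batch size of 1000 (neither comes from the question):

```sql
-- Bounded range: at most 1000 ids can match, so a range scan on PRIMARY
-- is likely to look cheaper to the optimizer than reading every row.
EXPLAIN
SELECT *
FROM userapplication
WHERE application_id > 1025
  AND application_id <= 2025;  -- hypothetical upper bound

-- Fetching in batches ordered by the key; 1000 is a hypothetical batch size.
EXPLAIN
SELECT *
FROM userapplication
WHERE application_id > 1025
ORDER BY application_id
LIMIT 1000;
```

Whether the optimizer actually picks the index still depends on the table statistics, so check the EXPLAIN output rather than assuming it.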
It should be the case of just typing:
SELECT * FROM userapplication WHERE application_id > 1025;
As detailed at this link. According to that guide, it should work when application_id is a numeric value; for non-numeric values, you should type:
SELECT * FROM userapplication WHERE application_id > '1025';
I don't think there's anything wrong with your SELECT; maybe it's a table configuration problem?