mysql subquery strangely slow
I have a query to select from another sub-query select. While the two queries look almost the same the second query (in this sample) runs much slower:
SELECT
user.id
,user.first_name
-- user.*
FROM user
WHERE
user.id IN (SELECT ref_id
FROM education
WHERE ref_type='user'
AND education.institute_id='58'
AND education.institute_type='1'
);
This query takes 1.2s Explain on this query results:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY user index first_name 152 141192 Using where; Using index
2 DEPENDENT SUBQUERY education index_subquery ref_type,ref_id,institute_id,institute_type,ref_type_2 ref_id 4 func 1 Using where
The second query:
SELECT
-- user.id
-- user.first_name
user.*
FROM user
WHERE
user.id IN (SELECT ref_id
FROM education
WHERE ref_type='user'
AND education.institute_id='58'
AND education.institute_type='1'
);
Takes 45se开发者_如何学JAVAc to run, with explain:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY user ALL 141192 Using where
2 DEPENDENT SUBQUERY education index_subquery ref_type,ref_id,institute_id,institute_type,ref_type_2 ref_id 4 func 1 Using where
Why is it slower if i query only by index fields? Why both queries scans the full length of the user table? Any ideas how to improve?
Thanks.
I'm not sure why it chooses to use the index when you select only two columns but not when you select all columns, but it is better to select only the columns you need anyway. Also it might be better to try a JOIN instead of a subquery:
SELECT
user.id
user.first_name
FROM user
JOIN education
ON user.id = education.ref_id
AND education.ref_type='user'
AND education.institute_id='58'
AND education.institute_type='1'
I have had several occasions where replacing "WHERE foo in (subquery)" with dumping the results of the subquery into a temporary table and using an inner join seriously improved the performance. (Like, a 6.5 minute query turning into a sub-second query.)
Which, er, is what Mark Byers just said.
This is what I think happens:
The query planner will turn the query into an inner join, which gives the database the freedom to start from either table when filtering out the result.
When you only select a few fields from the user table, the result from both tables are small, so the database can choose which table will filter the other from what's most efficient depending on what indexes can be used.
When you fetch all data from the user table, you are forcing it to use the education table to filter the user table as the intermediate result would be too large the other way around. There is no index that fits for matching that way, so you get a table scan which slows down the query.
(Excuse me if some of the terminology is coloured from SQL Server, that's what I regularly use.)
精彩评论