Faster alternatives for "IN" statements?
I'm not great with MySQL, so I often find myself preparing sub-optimal queries that work, but I know must be horribly inefficient. I'm hoping you guys could give me some pointers on why the following q开发者_运维百科uery doesn't work well, and what methods I should use to accomplish similar queries.
I have the following table structure:
TABLE Files
files_id => INT(12), PRIMARY, AUTO INCREMENT, NOT NULL
files_name => VARCHAR(255), NOT NULL
(some other fields such as file type etc)
TABLE File_Permissions
perm_id => INT(12), PRIMARY, AUTO INCREMENT, NOT NULL
perm_files_id => INT(12), NOT NULL
perm_users_id => INT(12), NOT NULL
I pull a list of the files a user is allowed to view with the following SQL:
SELECT files_name FROM Files WHERE files_id IN
(SELECT perm_files_id FROM File_Permissions WHERE perm_users_id = 'xxxxxx');
This, as far as I can tell, will go through each of the thousands of records in the Files table, and for each one execute a subquery that selects from the File_Permissions table to check against the user's ID.
This takes almost 2 seconds per query. I'm sure something is fundamentally wrong with this, I just don't know what it is.
Thanks so much for the help!
For this sort of query you can use a JOIN, WHERE ... IN, or WHERE EXISTS. An approach using IN like you have posted should be fine assuming you have the appropriate indexes.
Just so you can compare with something else, here's an example of the WHERE EXISTS:
SELECT files_name FROM Files
WHERE EXISTS
(
SELECT *
FROM File_Permissions
WHERE perm_users_id = 'xxxxxx'
AND files_id = perm_files_id
)
But most important thing is: add the appropriate indexes! This can make a huge difference to the performance. If you are unsure if you have the correct indexes please look at the output of the following statements to see which indexes you have and which indexes the query is using:
EXPLAIN SELECT ...your query here...
SHOW CREATE TABLE Files
SHOW CREATE TABLE File_Permissions
If you still are't sure, edit the question to include the output of each of the above statements and also these:
SELECT COUNT(*) FROM Files
SELECT COUNT(*) FROM File_Permissions
SELECT COUNT(*) FROM (SELECT ...your query here...) T1
Most queries that involve an IN clause for a subquery can be refactored to use a join. In your case:
SELECT files_name
FROM Files
JOIN File_Permissions ON files_id = perm_files_id
WHERE perm_users_id = 'xxxxxx';
The above query will create a result set of the join between the two tables, then filter by the conditions. This requires two passes instead of N+1.
You can restructure your query as above, but you may also try putting an index on perm_users_id first. It will probably speed things up quite a bit.
Your tables need indexes. The query above shows me you need the following:
Table Files
needs and index on files_id
Table File_Permissions
needs and index on perm_users_id
That will make the query a lot faster.
try
SELECT files_name FROM Files LEFT JOIN File_permissions ON files_id = perm_files_id
AND perm_users_id = 'xxxxx'
also indexing the joined columns will help performance. So an Index on perm_files_id would increase performance
Two common alternatives are:
SELECT files_name
FROM Files f
WHERE EXISTS (
SELECT *
FROM File_Permissions
WHERE f.files_id = perm_files_id
AND perm_users_id = 'xxxxxx');
and:
SELECT DISTINCT files_name fn
FROM Files f
JOIN File_Permissions fp ON f.files_id = fp.perm_files_id
WHERE perm_users_id = 'xxxxxx';
I'm not sure why you're not just using a standard join as follows:
SELECT <required fields> FROM (Files, File_Permissions) WHERE
files_id = perm_files_id AND perm_user_id='xxxxx'
Beyond that, you should ensure that the appropriate indexes are set, etc.
Implicit joins are evil - see comments below. :-)
精彩评论