Query Optimization using WHERE IN
I'm wondering if someone can explain how the IN calculates? Well, u开发者_StackOverflow社区ltimately I'm trying to find out why this query is slow and how to optimize it. I waited over 3 minutes and when I cancelled the query it had only returned 1000 lines which doesn't seem like it should take that long.
SELECT t2.*
FROM report_tables.roc_test_results as t2
WHERE t2.job IN (SELECT DISTINCT(t1.job)
FROM report_tables.roc_test_results as t1
WHERE t1.operation = 'TEST'
AND result = 'Passed'
AND STR_TO_DATE(t1.date_created,'%d-%M-%Y') BETWEEN '2009-10-01'
AND '2009-10-31')
I'm not sure what the total query should return, if I had to guess I would say around 2000 records, the subquery returns 332 (336 when not Distinct).
Can anyone give me some pointers on how to optimize this query? Also, I'm wondering, does the subquery calculate every time or just once and store it?
As requested, the results for DESC... (by the way, please don't laugh, I am self taught so I'm sure this table is hideously designed.)
Field Type Null Key Default Extra
------ ----- ----- --- ------- -----
operation varchar(10) NO
tester varchar(25) NO
result varchar(45) NO
fail_mode varchar(45) NO
primary_failure varchar(25) NO
ref_des varchar(45) NO
rf_hours varchar(15) NO
ac_hours varchar(15) NO
comments text NO
job varchar(15) NO
rma bigint(20) unsigned NO
item varchar(45) NO
item_description text NO
serial varchar(25) NO
created_by varchar(25) NO
collection bigint(20) unsigned NO PRI
date_created varchar(15) NO
The date_created
data type needs to change to be a DATETIME before it's worth defining an index on the column. The reason being, the index will be worthless if you are changing the data type from string to DATETIME as you are currently.
You've mentioned that you're using LOAD DATA INFILE
, and that the source file contains dates in DD-MON-YY format. MySQL will implicitly convert strings into DATETIME if the YY-MM-DD format is used, so if you can correct this in your source file before using LOAD DATA INFILE
the rest should fall in to place.
After that, a covering index using:
- job
- operation
- result
- date_created
...would be a good idea.
first of all you don't need the distinct in the subquery since IN eliminates duplicates anyhow Do you need the function call in the WHERE clause and do you have and index on the date_created column?
what happens when you change
WHERE STR_TO_DATE(t1.date_created,'%d-%M-%Y')
BETWEEN '2009-10-01' AND '2009-10-31')
to
WHERE 1.date_created >= '2009-10-01'
AND 1.date_created < '2010-01-01'
Sometimes indexes won't be used if you use functions on the column
My advice is to replace the IN with a JOIN, and then consider adding indexes on some of your columns, such as job, and maybe operation and/or result. You should read up on indexes in the MySQL manual, and also on using EXPLAIN to optimize your queries:
http://dev.mysql.com/doc/refman/5.1/en/indexes.html
http://dev.mysql.com/doc/refman/5.1/en/using-explain.html
Here's an example of converting the IN to a JOIN:
SELECT distinct t2.*
FROM roc_test_results as t2
inner join roc_test_results as t1 on t1.job = t2.job
WHERE t1.operation = 'TEST'
AND t1.result = 'Passed'
AND STR_TO_DATE(t1.date_created,'%d-%M-%Y') BETWEEN '2009-10-01' AND '2009-10-31';
精彩评论