Please help optimizing a long running query (left outer join, with 2 derived tables)
The query I need help with is:
SELECT d.bn, d.4700, d.4500, ... , p.`Activity Description`
FROM
( SELECT temp.bn, temp.4700, temp.4500, ....
FROM `tdata` temp
GROUP BY temp.bn
HAVING (COUNT(temp.bn) = 1) ) d
LEFT OUTER JOIN
( SELECT temp2.bn, max(temp2.FPE) AS max_fpe, temp2.`Activity Description`
FROM `pdata` temp2
GROUP BY temp2.bn ) p
ON p.bn = d.bn;
The ... represents other fields that aren't really important to solving this problem.
The issue is on the the second derived table - it is not using the index I have created and I am not sure why, it seems to be because of the way TEXT fields are handled. The first subquery uses the index I have created and runs quite snappy, however an EXPLAIN on the second shows a 'Using temporary; Using filesort'. Please see the indexes I have created in the below table create statements. Can anyone help me optimize this?
By way of quick explanation the first subquery is meant to only select records that have unique bn's, the second, while it looks a bit wacky (with the max function there which is not being used in the result set) is making sure that only one record from the right part of the join is included in the result set.
My table create statements are
CREATE TABLE `tdata` (
`BN` varchar(15) DEFAULT NULL,
`4000` varchar(3) DEFAULT NULL,
`5800` varchar(3) DEFAULT NULL,
....
KEY `BN` (`BN`),
KEY `idx_t3010`(`BN`,`4700`,`4500`,`4510`,`4520`,`4530`,`4570`,`4950`,`5000`,`5010`,`5020`,`5050`,`5060`,`5070`,`5100`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
CREATE TABLE `pdata` (
`BN` varchar(15) DEFAULT NULL,
`FPE` datetime DEFAULT NULL,
`Activity Description` text,
....
KEY `BN` (`BN`),
KEY `idx_programs_2009` (`BN`,`FPE`,`Activity Description`(100))
) ENGINE=MyISAM DEFAULT CHARSET=utf8
Thanks!
EDIT
Thanks to Malvolio and Brian Hooper for their comments. Malvolio's suggestion won't work for me as in both tables there are records with identical bn's and no common field that is unique to these records.
It really boils down to the the second derived table query:
SELECT temp2.bn, max(temp2.FPE) AS max_fpe, temp2.Activity Description
FROM pdata temp2
GROUP BY temp2.bn;
No matter what I do here to create an index on the Activity Description TEXT field, the query will not use it according to the EXPLAIN. If it would use an index, I'm sure this query would run great (as the first derived table query runs very speedy). Alternatively, if there was a better way of structuring this query to ensure there was 开发者_C百科only one record per bn, that would work too.
Thanks.
Subselects are usually the quickest way to a slow query. I'm not sure exactly what you are trying to do, but you could select the BN from pdata with the FPE with the following query
SELECT p.* FROM pdata p
LEFT JOIN pdata p0 ON p.BN = p0.BN AND p.FPE < p0.FPE
WHERE p0.BN IS NULL
Similarly, if you had some column in tdata that was unique (or unique among rows with the same BN)
SELECT t.* FROM tdata t
LEFT JOIN tdata t0 ON t.BN = t0.BN AND t.SOMEUNIQUEKEY != t0.SOMEUNIQUEKEY
WHERE t0.BN IS NULL
Something quirky about subselects: they are always much slower than an equivalent join. I think it's a bug.
EDIT
Hooper, the ichthyologist, was unclear on how the LEFT JOIN pdata p0 ON p.BN = p0.BN ... WHERE p0.BN IS NULL
stuff worked. Let me go through it step by step with a much simpler example. You have a table names, with a last name and a first name and you want to find the unique last names (that is, every last name held by only one person. The data are as follows:
last first
Smith Will
Smith John
Smith Adam
Jones John
First try the left-join by itself
SELECT n1.last, n1.first, n2.last, n2.first FROM names n1
LEFT JOIN names n2 ON n1.last = n2.last and n1.first != n2.first
that will return
last first last first
Smith Will Smith John
Smith John Smith Will
Smith Adam Smith John
Smith Will Smith Adam
Smith John Smith Adam
Smith Adam Smith Will
Jones John NULL NULL
Notice those nulls in the last row? That was no boating accident, that is the difference between an ordinary inner join and a left join. An inner join (find all pairs of rows with the same last name and a different first name) would have found the first six, but ignore the unpaired seventh. The only function of the LEFT JOIN is to pad out with nulls anything not filled in by the ON clause.
Now we pull only that row:
SELECT n1.last, n1.first, n2.last, n2.first FROM names n1
LEFT JOIN names n2 ON n1.last = n2.last and n1.first != n2.first
WHERE n2.last IS NULL
And (assuming no nulls in the underlying data), we get only the row that the ON clause failed to match.
Ta, and if I may so so, da.
精彩评论