Please help optimizing a long running query (left outer join, with 2 derived tables)

2023-02-02 10:50 问答作者：

The query I need help with is:

SELECT d.bn, d.4700, d.4500, ... , p.`Activity Description`  
FROM   
( SELECT temp.bn, temp.4700, temp.4500, ....  
FROM `tdata` temp  
GROUP BY temp.bn  
HAVING (COUNT(temp.bn) = 1) ) d   
LEFT OUTER JOIN  
( SELECT temp2.bn, max(temp2.FPE) AS max_fpe, temp2.`Activity Description`
FROM `pdata` temp2
GROUP BY temp2.bn ) p  
ON p.bn = d.bn;

The ... represents other fields that aren't really important to solving this problem.

The issue is on the the second derived table - it is not using the index I have created and I am not sure why, it seems to be because of the way TEXT fields are handled. The first subquery uses the index I have created and runs quite snappy, however an EXPLAIN on the second shows a 'Using temporary; Using filesort'. Please see the indexes I have created in the below table create statements. Can anyone help me optimize this?

By way of quick explanation the first subquery is meant to only select records that have unique bn's, the second, while it looks a bit wacky (with the max function there which is not being used in the result set) is making sure that only one record from the right part of the join is included in the result set.

My table create statements are

CREATE TABLE `tdata` (
`BN` varchar(15) DEFAULT NULL,
`4000` varchar(3) DEFAULT NULL,
`5800` varchar(3) DEFAULT NULL,
....
KEY `BN` (`BN`),
KEY `idx_t3010`(`BN`,`4700`,`4500`,`4510`,`4520`,`4530`,`4570`,`4950`,`5000`,`5010`,`5020`,`5050`,`5060`,`5070`,`5100`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8

CREATE TABLE `pdata` (
`BN` varchar(15) DEFAULT NULL,
`FPE` datetime DEFAULT NULL,
`Activity Description` text,
....
KEY `BN` (`BN`),
KEY `idx_programs_2009` (`BN`,`FPE`,`Activity Description`(100))
) ENGINE=MyISAM DEFAULT CHARSET=utf8

Thanks!

EDIT

Thanks to Malvolio and Brian Hooper for their comments. Malvolio's suggestion won't work for me as in both tables there are records with identical bn's and no common field that is unique to these records.

It really boils down to the the second derived table query:

SELECT temp2.bn, max(temp2.FPE) AS max_fpe, temp2.Activity Description 
FROM pdata temp2 
GROUP BY temp2.bn;

No matter what I do here to create an index on the Activity Description TEXT field, the query will not use it according to the EXPLAIN. If it would use an index, I'm sure this query would run great (as the first derived table query runs very speedy). Alternatively, if there was a better way of structuring this query to ensure there was 开发者_C百科only one record per bn, that would work too.

Thanks.

Subselects are usually the quickest way to a slow query. I'm not sure exactly what you are trying to do, but you could select the BN from pdata with the FPE with the following query

SELECT p.* FROM pdata p 
LEFT JOIN pdata p0 ON p.BN = p0.BN AND p.FPE < p0.FPE 
WHERE p0.BN IS NULL

Similarly, if you had some column in tdata that was unique (or unique among rows with the same BN)

SELECT t.* FROM tdata t 
LEFT JOIN tdata t0 ON t.BN = t0.BN AND t.SOMEUNIQUEKEY != t0.SOMEUNIQUEKEY 
WHERE t0.BN IS NULL

Something quirky about subselects: they are always much slower than an equivalent join. I think it's a bug.

EDIT

Hooper, the ichthyologist, was unclear on how the LEFT JOIN pdata p0 ON p.BN = p0.BN ... WHERE p0.BN IS NULL stuff worked. Let me go through it step by step with a much simpler example. You have a table names, with a last name and a first name and you want to find the unique last names (that is, every last name held by only one person. The data are as follows:

last first
Smith Will
Smith John
Smith Adam
Jones John

First try the left-join by itself

SELECT n1.last, n1.first, n2.last, n2.first FROM names n1 
  LEFT JOIN names n2 ON n1.last = n2.last and n1.first != n2.first

that will return

last first last first
Smith Will Smith John 
Smith John Smith Will
Smith Adam Smith John 
Smith Will Smith Adam 
Smith John Smith Adam 
Smith Adam Smith Will
Jones John NULL  NULL

Notice those nulls in the last row? That was no boating accident, that is the difference between an ordinary inner join and a left join. An inner join (find all pairs of rows with the same last name and a different first name) would have found the first six, but ignore the unpaired seventh. The only function of the LEFT JOIN is to pad out with nulls anything not filled in by the ON clause.

Now we pull only that row:

SELECT n1.last, n1.first, n2.last, n2.first FROM names n1 
  LEFT JOIN names n2 ON n1.last = n2.last and n1.first != n2.first
  WHERE n2.last IS NULL

And (assuming no nulls in the underlying data), we get only the row that the ON clause failed to match.

Ta, and if I may so so, da.

继续阅读：indexing optimization query-optimization sql

Please help optimizing a long running query (left outer join, with 2 derived tables)

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？