Query performance WHERE clause contains IN (subquery)
SELECT Trade.TradeId, Trade.Type, Trade.Symbol, Trade.TradeDate,
SUM(TradeLine.Notional) / 1000 AS Expr1
FROM Trade INNER JOIN
TradeLine ON Trade.TradeId = TradeLine.TradeId
WHERE (TradeLine.Id IN
(SELECT PairOffId
FROM TradeLine AS TradeLine_1
WHERE (TradeDate <= '2011-05-11')
GROUP BY PairOffId
HAVING (SUM(Notional)开发者_运维问答 <> 0)))
GROUP BY Trade.TradeId, Trade.Type, Trade.Symbol, Trade.TradeDate
ORDER BY Trade.Type, Trade.TradeDate
I am concerned about the performance of the IN in the WHERE clause when the table starts to grow. Does anyone have a better strategy for this kind of query? The number of records returned by the subquery grows much slower than the number of records in the TradeLine table. The TradeLine table itself grows at a rate of 10/day.
Thank you.
EDIT: I used the idea of moving the subquery from WHERE to FROM. I voted up on all answers that contributed to this new query.
SELECT Trade.TradeId, Trade.Type, Trade.Symbol, Trade.TradeDate,
PairOff.Notional / 1000 AS Expr1
FROM Trade INNER JOIN
TradeLine ON Trade.TradeId = TradeLine.TradeId INNER JOIN
(SELECT PairOffId, SUM(Notional) AS Notional
FROM TradeLine AS TradeLine_1
WHERE (TradeDate <= '2011-05-11')
GROUP BY PairOffId
HAVING (SUM(Notional) <> 0)) AS PairOff ON TradeLine.Id = PairOff.PairOffId
ORDER BY Trade.Type, Trade.TradeDate
The subquery in the IN
clause does not depend on anything in the outer query. You can safely move it into FROM
clause; a sane query plan builder would do it automatically.
Also, calling EXPLAIN PLAN
on any query you're going to use in production is a must. Do it and see what the DBMS thinks of the plan for this query.
I'm a fan of temp tables when a sub-query starts returning too large a result set.
So your where
clause would just be
Where TradeLine.Id In (Select PairOffId From #tempResults)
and #tempResults
would be defined as (warning: syntax is from memory, which means there may be errors)
Select PairOffId Into #tempResults
From TradeLine
Where (TradeDate <= @TradeDate)
//I prefer params in case the query becomes a StoredProc
Group By PairOffId
Having (Sum(Notional) <> 0)
I have 2 suggestion you can try:
1). use Exists since you don't need get data from subquery, like this:
where exists ( select 1 from TradeLine AS TradeLine_1 where TradeLine.Id = TradeLine_1.PairOffId -- continue with your subquery ... )
2). main query join to your subquery, for instance
... join ( your_subquery) on your_subquery.PairOffId = TradeLine.Id
I believe these 2 ways can achieve better performance than "In" operation.
I have faced same issue with hundreds of thousands of records in XXXXXX DB. In my code i want to retrieve hierarchy (node which contain at least one child) nodes from all nodes.
The initial query written which is very slow.
SELECT SUPPLIER_ID, PARENT_SUPPLIER_ID,
FROM SUPPLIER
WHERE
SUPPLIER_ID != PARENT_SUPPLIER_ID
OR
SUPPLIER_ID IN
(SELECT DISTINCT PARENT_SUPPLIER_ID
FROM SUPPLIER
WHERE SUPPLIER_ID != PARENT_SUPPLIER_ID
);
Then re-written to
SELECT a.SUPPLIER_ID, a.PARENT_SUPPLIER_ID,
FROM SUPPLIER a
LEFT JOIN
(SELECT DISTINCT PARENT_SUPPLIER_ID
FROM SUPPLIER
WHERE SUPPLIER_ID != PARENT_SUPPLIER_ID
) b
ON a. SUPPLIER_ID = b.PARENT_SUPPLIER_ID
WHERE a. SUPPLIER_ID != a.PARENT_SUPPLIER_ID
OR a. SUPPLIER_ID = b.PARENT_SUPPLIER_ID;
Using an IN will essentially force you to do a table scan. When your table grows, your execution time grows. Also you are running that query for each record returned. Would be easier to use a scalar select as a table:
SELECT t.TradeId, t.Type, t.Symbol, t.TradeDate,
SUM(TradeLine.Notional) / 1000 AS Expr1
FROM Trade t,
(SELECT TradeId, PairOffID
FROM TradeLine AS TradeLine_1
WHERE (TradeDate <= '2011-05-11')
GROUP BY PairOffId
HAVING (SUM(Notional) <> 0)) tl
WHERE t.TradeId = tl.TradeId
and t.id <> tl.PairOffID
GROUP BY Trade.TradeId, Trade.Type, Trade.Symbol, Trade.TradeDate
ORDER BY Trade.Type, Trade.TradeDate
精彩评论