Query time with a JOIN using a subquery versus a string
I have a query like this :
SELECT * FROM table1 t1
JOIN table2 t2 ON t1.id = t2.id
AND t2.time IN (SELECT year FROM activeYears WHERE active = 1)
Where activeYears.year is NVARCHAR(50), each year is a row.
Why is that join's run time faster than this query :
SELECT * FROM table1 t1
JOIN table2 t2 ON t1.id = t2.id AND t2.time IN ('2009','2010')
This is short, simple version, basically I have a large query with a join that uses a sub-query. When I change the sub-query to a string for testing, it takes double the time to run, even when clearing the cache. I think it might be a casting issue, but I tried declaring two variables as NVARCHAR(50) as well, and using them in the query and it made no difference.
It has just been puzzling me for a few days, and I don't understand why the sub-query is faster unless the query is actually built differently in some manner.
Thanks!
edit -- exec plan information
I diff'd the two execution plans, and will try to give you the anonymized highlights.
The MissingIndexes section of the execution plan for the faster (sub-query) query :
<QueryPlan CachedPlanSize="196" CompileTime="2166" CompileCPU="2166" CompileMemory="18640">
<MissingIndexes>
<MissingIndexGroup Impact="41.4663">
<MissingIndex Database="[database]" Schema="[dbo]" Table="[table2]">
<ColumnGroup Usage="EQUALITY">
<Column Name="[time]" ColumnId="3" />
<Column Name="[id]" ColumnId="10" />
</ColumnGroup>
</MissingIndex>
</MissingIndexGroup>
</MissingIndexes>
The MissingIndexes section of the query using a string for t2.time IN ('2009','2010')
<MissingIndexes>
<MissingIndexGroup Impact="35.4994">
<MissingIndex Database="[database]" Schema="[dbo]" Table="[table2]">
<ColumnGroup Usage="INEQUALITY">
<Column Name="[time]" ColumnId="3" />
</ColumnGroup>
<ColumnGroup Usage="INCLUDE">
<Column Name="[id]" ColumnId="10" />
<Column Name="[field1]" ColumnId="15" />
</ColumnGroup>
</MissingIndex>
</MissingIndexGroup>
<MissingIndexGroup Impact="44.364">
<MissingIndex Database="[database]" Schema="[dbo]" Table="[table2]">
<ColumnGroup Usage="EQUALITY">
<Column Name="[time]" ColumnId="3" />
<Column Name="[id]" ColumnId="10" />
</ColumnGroup>
<ColumnGroup Usage="INCLUDE">
<Column Name="[field1]" ColumnId="15" />
</ColumnGroup>
</MissingIndex>
</MissingIndexGroup>
</MissingIndexes>
And then there is a nested loop that is built differently in each plan, this is the nested loop that is drastically different, in the same order as above, first sub-query, then string version :
<RelOp AvgRowSize="878" EstimateCPU="4.45867E-06" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="1.06667" LogicalOp="Left Outer Join" NodeId="1" Parallel="false" PhysicalOp="Nested Loops" EstimatedTotalSubtreeCost="4.44321">
<OutputList>
--SNIPPED--
</OutputList>
<NestedLoops Optimized="false">
<OuterReferences>
<ColumnReference Database="[database]" Schema="[dbo]" Table="[table1]" Alias="[t1]" Column="time" />
<ColumnReference Database="[database]" Schema="[dbo]" Table="[table1]" Alias="[t1]" Column="id" />
</OuterReferences>
The same nested loop, but from the second query :
<RelOp AvgRowSize="878" EstimateCPU="0.223308" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="1.06667" LogicalOp="Left Outer Join" NodeId="1" Parallel="false" PhysicalOp="Nested Loops" EstimatedTotalSubtreeCost="4.66781">
<OutputList>
-- SNIPPED --
</OutputList>
<NestedLoops Optimized="false">
<Predicate>
<ScalarOperator ScalarString="[database].[dbo].[table1].[time] as [t1].[time]=[database].[dbo].[table2].[time] as [t2].[time] AND [database].[dbo].[table1].[id] as [t1].[id]=[database].[dbo].[table2].[id] as [t2].[id]">
<Logical Operation="AND">
<ScalarOperator>
<Compare CompareOp="EQ">
<ScalarOperator>
<Identifier>
开发者_开发知识库 <ColumnReference Database="[database]" Schema="[dbo]" Table="[table1]" Alias="[t1]" Column="time" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Identifier>
<ColumnReference Database="[database]" Schema="[dbo]" Table="[table2]" Alias="[t2]" Column="time" />
</Identifier>
</ScalarOperator>
</Compare>
</ScalarOperator>
<ScalarOperator>
<Compare CompareOp="EQ">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[database]" Schema="[dbo]" Table="[table1]" Alias="[t1]" Column="id" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Identifier>
<ColumnReference Database="[database]" Schema="[dbo]" Table="[table2]" Alias="[t2]" Column="id" />
</Identifier>
</ScalarOperator>
</Compare>
</ScalarOperator>
</Logical>
</ScalarOperator>
</Predicate>
And finally, same order, there is a piece of another, different nested loop that looks like it may make a difference because it seems to show how that section of the query is built, first the sub-query plan's version :
<ScalarOperator ScalarString="[database].[dbo].[table1].[id] as [t1].[id]=[database].[dbo].[table2].[id] as [t2].[id] AND [database].[dbo].[table1].[time] as [t1].[time]=[database].[dbo].[table2].[time] as [t2].[time] AND [database].[dbo].[table2].[time] as [t2].[time]>=N'2009' AND [database].[dbo].[table2].[time] as [t2].[time]<=N'2010'">
Then the string version, IN ('2009', '2010') :
<ScalarOperator ScalarString="[database].[dbo].[table2].[time] as [t2].[time]=N'2009' OR [database].[dbo].[table2].[time] as [t2].[time]=N'2010'">
2nd edit -- statistics information
Per request, here is SET STATISTICS TIME ON
and SET STATISTICS IO ON
, in the same order as above, sub-query first:
Table 'activeYear'. Scan count 2, logical reads 2010, physical reads 2, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'table2'. Scan count 1, logical reads 2339848, physical reads 0, read-ahead reads 2303, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'table3'. Scan count 1016, logical reads 4624, physical reads 21, read-ahead reads 1047, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 12, logical reads 109, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'table4'. Scan count 1, logical reads 126, physical reads 0, read-ahead reads 126, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'table1'. Scan count 1033, logical reads 5331, physical reads 57, read-ahead reads 123, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'table5'. Scan count 1, logical reads 219, physical reads 0, read-ahead reads 219, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'table6'. Scan count 1, logical reads 2, physical reads 2, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 10328 ms, elapsed time = 11479 ms.
Then string
Table 'table2'. Scan count 1, logical reads 2339848, physical reads 0, read-ahead reads 2303, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'table3'. Scan count 1016, logical reads 4467, physical reads 21, read-ahead reads 1047, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 659, logical reads 5863, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'table4'. Scan count 1, logical reads 126, physical reads 0, read-ahead reads 126, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'table1'. Scan count 1033, logical reads 5228, physical reads 60, read-ahead reads 120, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'table5'. Scan count 1, logical reads 219, physical reads 0, read-ahead reads 219, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'activeYear'. Scan count 1, logical reads 2, physical reads 2, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'table6'. Scan count 1, logical reads 2, physical reads 2, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 16719 ms, elapsed time = 17447 ms.
(There are a few more tables in this join than the simplified query, but it's only the string versus sub-query change on that single join I am concerned with... so hopefully that is isolated, since t1 joins separately to all the tables in this query.)
Educated guess: 1st query's IN is converted to JOIN:
SELECT * FROM table1 t1
JOIN table2 t2 ON t1.id = t2.id
JOIN activeYears ay ON t2.time = ay.year
WHERE ay.active = 1
while 2d query's IN is converted to OR (or even UNION):
SELECT * FROM table1 t1
JOIN table2 t2 ON t1.id = t2.id
WHERE t2.time = '2009'
OR t2.time = '2010'
精彩评论