开发者

Query time with a JOIN using a subquery versus a string

I have a query like this :

SELECT * FROM table1 t1
JOIN table2 t2 ON t1.id = t2.id 
     AND t2.time IN (SELECT year FROM activeYears WHERE active = 1)

Where activeYears.year is NVARCHAR(50), each year is a row.

Why is that join's run time faster than this query :

SELECT * FROM table1 t1
JOIN table2 t2 ON t1.id = t2.id AND t2.time IN ('2009','2010')

This is short, simple version, basically I have a large query with a join that uses a sub-query. When I change the sub-query to a string for testing, it takes double the time to run, even when clearing the cache. I think it might be a casting issue, but I tried declaring two variables as NVARCHAR(50) as well, and using them in the query and it made no difference.

It has just been puzzling me for a few days, and I don't understand why the sub-query is faster unless the query is actually built differently in some manner.

Thanks!


edit -- exec plan information

I diff'd the two execution plans, and will try to give you the anonymized highlights.

The MissingIndexes section of the execution plan for the faster (sub-query) query :

         <QueryPlan CachedPlanSize="196" CompileTime="2166" CompileCPU="2166" CompileMemory="18640">
            <MissingIndexes>
             <MissingIndexGroup Impact="41.4663">
                <MissingIndex Database="[database]" Schema="[dbo]" Table="[table2]">
                 <ColumnGroup Usage="EQUALITY">
                   <Column Name="[time]" ColumnId="3" />
                    <Column Name="[id]" ColumnId="10" />
                  </ColumnGroup>
                </MissingIndex>
              </MissingIndexGroup>
            </MissingIndexes>

The MissingIndexes section of the query using a string for t2.time IN ('2009','2010')

           <MissingIndexes>
              <MissingIndexGroup Impact="35.4994">
                <MissingIndex Database="[database]" Schema="[dbo]" Table="[table2]">
                  <ColumnGroup Usage="INEQUALITY">
                    <Column Name="[time]" ColumnId="3" />
                  </ColumnGroup>
                  <ColumnGroup Usage="INCLUDE">
                    <Column Name="[id]" ColumnId="10" />
                    <Column Name="[field1]" ColumnId="15" />
                  </ColumnGroup>
                </MissingIndex>
              </MissingIndexGroup>
              <MissingIndexGroup Impact="44.364">
                <MissingIndex Database="[database]" Schema="[dbo]" Table="[table2]">
                  <ColumnGroup Usage="EQUALITY">
                    <Column Name="[time]" ColumnId="3" />
                    <Column Name="[id]" ColumnId="10" />
                  </ColumnGroup>
                  <ColumnGroup Usage="INCLUDE">
                    <Column Name="[field1]" ColumnId="15" />
                  </ColumnGroup>
                </MissingIndex>
              </MissingIndexGroup>
            </MissingIndexes>

And then there is a nested loop that is built differently in each plan, this is the nested loop that is drastically different, in the same order as above, first sub-query, then string version :

                <RelOp AvgRowSize="878" EstimateCPU="4.45867E-06" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="1.06667" LogicalOp="Left Outer Join" NodeId="1" Parallel="false" PhysicalOp="Nested Loops" EstimatedTotalSubtreeCost="4.44321">
                  <OutputList>
                     --SNIPPED--
                  </OutputList>
                  <NestedLoops Optimized="false">
                    <OuterReferences>
                      <ColumnReference Database="[database]" Schema="[dbo]" Table="[table1]" Alias="[t1]" Column="time" />
                      <ColumnReference Database="[database]" Schema="[dbo]" Table="[table1]" Alias="[t1]" Column="id" />
                    </OuterReferences>

The same nested loop, but from the second query :

                <RelOp AvgRowSize="878" EstimateCPU="0.223308" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="1.06667" LogicalOp="Left Outer Join" NodeId="1" Parallel="false" PhysicalOp="Nested Loops" EstimatedTotalSubtreeCost="4.66781">
                  <OutputList>
                    -- SNIPPED --
                  </OutputList>
                  <NestedLoops Optimized="false">
                    <Predicate>
                      <ScalarOperator ScalarString="[database].[dbo].[table1].[time] as [t1].[time]=[database].[dbo].[table2].[time] as [t2].[time] AND [database].[dbo].[table1].[id] as [t1].[id]=[database].[dbo].[table2].[id] as [t2].[id]">
                        <Logical Operation="AND">
                          <ScalarOperator>
                            <Compare CompareOp="EQ">
                              <ScalarOperator>
                                <Identifier>
          开发者_开发知识库                        <ColumnReference Database="[database]" Schema="[dbo]" Table="[table1]" Alias="[t1]" Column="time" />
                                </Identifier>
                              </ScalarOperator>
                              <ScalarOperator>
                                <Identifier>
                                  <ColumnReference Database="[database]" Schema="[dbo]" Table="[table2]" Alias="[t2]" Column="time" />
                                </Identifier>
                              </ScalarOperator>
                            </Compare>
                          </ScalarOperator>
                          <ScalarOperator>
                            <Compare CompareOp="EQ">
                              <ScalarOperator>
                                <Identifier>
                                  <ColumnReference Database="[database]" Schema="[dbo]" Table="[table1]" Alias="[t1]" Column="id" />
                                </Identifier>
                              </ScalarOperator>
                              <ScalarOperator>
                                <Identifier>
                                  <ColumnReference Database="[database]" Schema="[dbo]" Table="[table2]" Alias="[t2]" Column="id" />
                                </Identifier>
                              </ScalarOperator>
                            </Compare>
                          </ScalarOperator>
                        </Logical>
                      </ScalarOperator>
                    </Predicate>

And finally, same order, there is a piece of another, different nested loop that looks like it may make a difference because it seems to show how that section of the query is built, first the sub-query plan's version :

<ScalarOperator ScalarString="[database].[dbo].[table1].[id] as [t1].[id]=[database].[dbo].[table2].[id] as [t2].[id] AND [database].[dbo].[table1].[time] as [t1].[time]=[database].[dbo].[table2].[time] as [t2].[time] AND [database].[dbo].[table2].[time] as [t2].[time]&gt;=N'2009' AND [database].[dbo].[table2].[time] as [t2].[time]&lt;=N'2010'"> 

Then the string version, IN ('2009', '2010') :

<ScalarOperator ScalarString="[database].[dbo].[table2].[time] as [t2].[time]=N'2009' OR [database].[dbo].[table2].[time] as [t2].[time]=N'2010'">

2nd edit -- statistics information

Per request, here is SET STATISTICS TIME ON and SET STATISTICS IO ON, in the same order as above, sub-query first:

Table 'activeYear'. Scan count 2, logical reads 2010, physical reads 2, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'table2'. Scan count 1, logical reads 2339848, physical reads 0, read-ahead reads 2303, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'table3'. Scan count 1016, logical reads 4624, physical reads 21, read-ahead reads 1047, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 12, logical reads 109, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'table4'. Scan count 1, logical reads 126, physical reads 0, read-ahead reads 126, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'table1'. Scan count 1033, logical reads 5331, physical reads 57, read-ahead reads 123, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'table5'. Scan count 1, logical reads 219, physical reads 0, read-ahead reads 219, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'table6'. Scan count 1, logical reads 2, physical reads 2, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 10328 ms,  elapsed time = 11479 ms.

Then string

Table 'table2'. Scan count 1, logical reads 2339848, physical reads 0, read-ahead reads 2303, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'table3'. Scan count 1016, logical reads 4467, physical reads 21, read-ahead reads 1047, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 659, logical reads 5863, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'table4'. Scan count 1, logical reads 126, physical reads 0, read-ahead reads 126, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'table1'. Scan count 1033, logical reads 5228, physical reads 60, read-ahead reads 120, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'table5'. Scan count 1, logical reads 219, physical reads 0, read-ahead reads 219, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'activeYear'. Scan count 1, logical reads 2, physical reads 2, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'table6'. Scan count 1, logical reads 2, physical reads 2, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 16719 ms,  elapsed time = 17447 ms.

(There are a few more tables in this join than the simplified query, but it's only the string versus sub-query change on that single join I am concerned with... so hopefully that is isolated, since t1 joins separately to all the tables in this query.)


Educated guess: 1st query's IN is converted to JOIN:

SELECT * FROM table1 t1
JOIN table2 t2 ON t1.id = t2.id 
JOIN activeYears ay ON t2.time = ay.year 
WHERE ay.active = 1

while 2d query's IN is converted to OR (or even UNION):

SELECT * FROM table1 t1
JOIN table2 t2 ON t1.id = t2.id 
WHERE t2.time = '2009'
   OR t2.time = '2010'
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜