
What influences a query plan's performance besides what is shown in Mgt Studio's graphical output?

(Note: This was formerly titled "How do a query plan's estimated executions/IO/CPU/Executions, etc, influence performance, all other things being equal?", but I renamed it now that I know the answer.)

Below is a case where two identical SP calls differ only in the cached plan they use. The graphical representation of the plan looks identical: it is a simple nested loops join in both cases (see below). However, in one case the call reuses a plan that was cached by an earlier call when the temp table held far fewer rows, and that case is consistently slower than when the SP is recompiled. I am trying to understand what is happening under the hood so that I can better work around the performance problems I am having.

Here are the details, along with a script if you want to test this out yourself:

A SP is called that populates a temp table with either 10 rows or 400 rows. Then the temp table is joined back to a permanent table with about 4 million rows and the results are selected out.

Case 1: Recompile SP, Run the 10 row case first, and the 400 row case second.

Case 2: Recompile SP, Run the 400 row case first, and the 10 row case second.

The performance of the 400 row case is consistently worse in Case 1 than in Case 2 (250 ms vs. 125 ms), even though the execution plans appear to be the same. I understand that a plan is cached based on the first run of the SP after recompiling, but since the 10 row and 400 row cases both use a very simple nested loops join, I didn't expect it to matter whether the 400 row case ran first or second. Apparently, though, the plan cached for the 10 row case is inefficient for the 400 row case.

I can see now that there is more to a cached plan than the visual operators and their order displayed in the Management Studio graphical representation. The problem is, I don't understand what else is going on under the hood that causes the difference. I did look at the STATISTICS PROFILE output (see below) and saw differences in EstimateRows, EstimateIO, EstimateCPU, EstimateExecutions and/or TotalSubtreeCost.

Can anyone explain what is going on? Is the Estimated Number of Executions being used to allocate memory as if only 10 rows are expected, so that memory has to be reallocated over and over to process 400 rows? I'm just guessing.
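For reference, here is a sketch (separate from the test script below; it assumes the SET STATISTICS PROFILE lines inside Proc1 are removed so they don't interfere with the output) of capturing the full showplan XML around the two calls. The XML carries per-operator attributes that the graphical plan does not display:

-- Capture the full showplan XML for each statement executed in the proc.
-- The <RelOp> attributes include the estimates and other operator arguments
-- that the graphical rendering hides.
SET STATISTICS XML ON;
EXEC Proc1 '##Temp10';
EXEC Proc1 '##Temp400';
SET STATISTICS XML OFF;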

Case 1 Execution Plan for 400 row case (based on cached plan for 10 row case):

(execution plan screenshot)

Case 2 Execution Plan for 400 row case (recompiled):

(execution plan screenshot)

(STATISTICS PROFILE output screenshot)

-------------------------------------------------------------------------------------------------------------------------------
-- TEST CASE
------------------------------------------------------------------------------------------------------------------------------- 


-- 1. SET UP
    -- Create a table with 4 million rows

    IF OBJECT_ID('A') IS NOT NULL DROP TABLE A;
    CREATE TABLE A (ID INT PRIMARY KEY, Val1 VARCHAR(100), Val2 VARCHAR(100))

    ;WITH  
        Pass0 AS (SELECT 1 AS C UNION ALL SELECT 1), --2 rows  
        Pass1 AS (SELECT 1 AS C FROM Pass0 AS A, Pass0 AS B),--4 rows  
        Pass2 AS (SELECT 1 AS C FROM Pass1 AS A ,Pass1 AS B),--16 rows 
        Pass3 AS (SELECT 1 AS C FROM Pass2 AS A ,Pass2 AS B),--256 rows 
        Pass4 AS (SELECT 1 AS C FROM Pass3 AS A ,Pass3 AS B, Pass2 AS X, Pass1 AS Y),--65536 x 16 x 4 rows 
        Tally AS (SELECT  ROW_NUMBER() OVER (ORDER BY C) AS Number FROM   Pass4)
    INSERT INTO A
    SELECT Number, 'xxxxxxxxx10xxxxxxxx20xxxxxxxx30xxxxxxxx40', 'xxxxxxxxx10xxxxxxxx20xxxxxxxx30xxxxxxxx40'
    FROM  Tally

    -- Create SP

    IF OBJECT_ID('Proc1') IS NOT NULL DROP PROC Proc1
    GO

    CREATE PROC Proc1
    (
        @TempTableName AS VARCHAR(100)
    )
    AS
        CREATE TABLE #T (ID INT PRIMARY KEY)
        EXEC('INSERT INTO #T SELECT ID FROM ' + @TempTableName)

        SET STATISTICS PROFILE ON

        SELECT A.*
        INTO #X
        FROM #T AS T
        JOIN A AS A ON A.ID = T.ID

        SET STATISTICS PROFILE OFF

        DROP TABLE #X

GO

    -- Generate temp table data 

    IF OBJECT_ID('tempdb..##Temp400') IS NOT NULL DROP TABLE ##Temp400;
    SELECT TOP 400 ID
    INTO ##Temp400
    FROM A
    ORDER BY NEWID()

    IF OBJECT_ID('tempdb..##Temp10') IS NOT NULL DROP TABLE ##Temp10;
    SELECT TOP 10 ID
    INTO ##Temp10
    FROM A
    WHERE ID NOT IN (SELECT ID FROM ##Temp400 AS T)
    ORDER BY NEWID()

-- 2. RUN TEST WHILE MONITORING IN PROFILER

    -- Execute 10 then 400
        EXEC sp_recompile Proc1
        CHECKPOINT
        DBCC DROPCLEANBUFFERS
        GO
        EXEC Proc1 '##Temp10';
        GO
        EXEC Proc1 '##Temp400';
        GO
    -- Execute 400 then 10
        EXEC sp_recompile Proc1
        CHECKPOINT
        DBCC DROPCLEANBUFFERS
        GO
        EXEC Proc1 '##Temp400';
        GO
        EXEC Proc1 '##Temp10';

p.s. This is based on a real case where the performance difference is much more pronounced than in the example given here. Since the optimal plan was highly dependent on the number of rows, there was no single plan that fit all row counts well. My solution was to copy the main query in the SP multiple times inside several IF statements, forcing the SQL engine to give each case its own optimal plan. It's the best solution I found, but it feels a bit clunky:

IF @RowCount < 1
    [paste query here]
ELSE IF @RowCount < 50
    [paste query here]
ELSE IF @RowCount < 200
    [paste query here]
ELSE
    [paste query here]
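Applied to the test case above, that pattern would look roughly like this (a sketch only: the @RowCount variable, the thresholds, and the pre-created #X are illustrative, not the real procedure):

-- Illustrative sketch of the branching workaround, using Proc1's join.
-- Each INSERT ... SELECT is a separate statement, so each branch gets its
-- own plan, compiled against the #T row count in effect the first time
-- that branch actually runs.
CREATE TABLE #X (ID INT, Val1 VARCHAR(100), Val2 VARCHAR(100));

DECLARE @RowCount INT;
SELECT @RowCount = COUNT(*) FROM #T;

IF @RowCount < 1
    INSERT INTO #X SELECT A.* FROM #T AS T JOIN A AS A ON A.ID = T.ID;
ELSE IF @RowCount < 50
    INSERT INTO #X SELECT A.* FROM #T AS T JOIN A AS A ON A.ID = T.ID;
ELSE IF @RowCount < 200
    INSERT INTO #X SELECT A.* FROM #T AS T JOIN A AS A ON A.ID = T.ID;
ELSE
    INSERT INTO #X SELECT A.* FROM #T AS T JOIN A AS A ON A.ID = T.ID;

The pre-created #X with INSERT ... SELECT is used instead of repeating SELECT ... INTO, because multiple SELECT ... INTO statements targeting the same temp table name will not compile in a single procedure.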


I don't believe that estimated rows, I/O and CPU affect anything at all as long as the execution plans are the same. In your case the execution plans are similar, but there is a difference: if you expand the nested loops operator, you will see that the two plans have different arguments to this operator. One of them uses PREFETCH and the other does not. This is how your plans differ, and this is what affects your performance.

This looks relevant: http://blogs.msdn.com/b/craigfr/archive/2008/10/07/random-prefetching.aspx
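If you want to see it without re-running the test, something like this (a sketch; it assumes the plan is still in cache and that you have permission to read the plan-cache DMVs) pulls the cached plan for Proc1 so you can check the XML for the prefetch attribute:

-- Pull the cached plan(s) for Proc1 and flag whether the plan XML mentions
-- prefetching; the attribute only shows up in the XML / operator properties,
-- not in the graphical picture itself.
SELECT
    cp.usecounts,
    CASE WHEN CAST(qp.query_plan AS NVARCHAR(MAX)) LIKE '%Prefetch%'
         THEN 'prefetch' ELSE 'no prefetch' END AS prefetch_flag,
    qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE qp.objectid = OBJECT_ID('Proc1')
  AND qp.dbid = DB_ID();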


Some of your ideas are flawed.

  • There are no indexes on the temp table so there will be a cost difference accessing 400 rows compared to 10 rows
  • This affects how it JOINs into A (more expensive to find 400 rows in A than 10)
  • The number of rows affects the relative cost of the load into #X
  • ORDER BY NEWID() will change the data distribution each run, so the 10 rows may happen to be close together in one of your runs
  • the memory used for tables/indexes will be somewhat limited by NEWID() too
  • the plan shows a clustered index on the temp table; this isn't obvious in the code (missed that it comes from the PRIMARY KEY in the table creation, oops)

You should show the real code rather than contrived code, so we can solve your actual problem instead of just explaining that the contrived code behaves as expected.
