SQL "WITH" Performance and Temp Table (possible "Query Hint" to simplify)

2023-03-21 07:48 问答作者：

Given the example queries below (Simplified examples only)

DECLARE @DT int; SET @DT=20110717; -- yes this is an INT
WITH LargeData AS (
    SELECT * -- This is a MASSIVE table indexed on dt field
    FROM mydata
    WHERE dt=@DT
), Ordered AS (
    SELECT TOP 10 * 
        , ROW_NUMBER() OVER (ORDER BY valuefield DESC) AS Rank_Number
    FROM LargeData
)
SELECT * FROM Ordered

and ...

DECLARE @DT int; SET @DT=20110717;

BEGIN TRY DROP TABLE #LargeData END TRY BEGIN CATCH END CATCH; -- dump any possible table.

SELECT * -- This is a MASSIVE table indexed on dt field
INTO #LargeData -- put smaller results into temp
FROM mydata
WHERE dt=@DT;

WITH Ordered AS (
    SELECT TOP 10 * 
        , ROW_NUMBER() OVER (ORDER BY valuefield DESC) AS Rank_Number
    FROM #LargeData
)
SELECT * FROM Ordered

Both produce the same results, which is a limited and ranked list of values from a list based on a fields data.

When these queries get considerably more complicated (many more tables, lots of criteria, multiple levels of "with" table alaises, etc...) the bottom query executes MUCH faster then the top one. Sometimes in the order of 20x-100x faster.

The Question is...

Is there some kind of query HINT or other SQL option that would tell the SQL Server to perform the same kind of optimization automatically, or other formats of this that would involve a cleaner aproach (trying to keep the format as much like query 1 as possible) ?

Note that the "Ranking" or secondary 开发者_C百科queries is just fluff for this example, the actual operations performed really don't matter too much.

This is sort of what I was hoping for (or similar but the idea is clear I hope). Remember this query below does not actually work.

DECLARE @DT int; SET @DT=20110717;
WITH LargeData AS (
    SELECT * -- This is a MASSIVE table indexed on dt field
    FROM mydata
    WHERE dt=@DT
    **OPTION (USE_TEMP_OR_HARDENED_OR_SOMETHING) -- EXAMPLE ONLY**
), Ordered AS (
    SELECT TOP 10 * 
        , ROW_NUMBER() OVER (ORDER BY valuefield DESC) AS Rank_Number
    FROM LargeData
)
SELECT * FROM Ordered

EDIT: Important follow up information!

If in your sub query you add

 TOP 999999999      -- improves speed dramatically

Your query will behave in a similar fashion to using a temp table in a previous query. I found the execution times improved in almost the exact same fashion. WHICH IS FAR SIMPLIER then using a temp table and is basically what I was looking for.

However

 TOP 100 PERCENT    -- does NOT improve speed

Does NOT perform in the same fashion (you must use the static Number style TOP 999999999 )

Explanation:

From what I can tell from the actual execution plan of the query in both formats (original one with normal CTE's and one with each sub query having TOP 99999999)

The normal query joins everything together as if all the tables are in one massive query, which is what is expected. The filtering criteria is applied almost at the join points in the plan, which means many more rows are being evaluated and joined together all at once.

In the version with TOP 999999999, the actual execution plan clearly separates the sub querys from the main query in order to apply the TOP statements action, thus forcing creation of an in memory "Bitmap" of the sub query that is then joined to the main query. This appears to actually do exactly what I wanted, and in fact it may even be more efficient since servers with large ammounts of RAM will be able to do the query execution entirely in MEMORY without any disk IO. In my case we have 280 GB of RAM so well more then could ever really be used.

Not only can you use indexes on temp tables but they allow the use of statistics and the use of hints. I can find no refernce to being able to use the statistics in the documentation on CTEs and it says specifically you cann't use hints.

Temp tables are often the most performant way to go when you have a large data set when the choice is between temp tables and table variables even when you don't use indexes (possobly because it will use statistics to develop the plan) and I might suspect the implementation of the CTE is more like the table varaible than the temp table.

I think the best thing to do though is see how the excutionplans are different to determine if it is something that can be fixed.

What exactly is your objection to using the temp table when you know it performs better?

The problem is that in the first query SQL Server query optimizer is able to generate a query plan. In the second query a good query plan isn't able to be generated because you're inserting the values into a new temporary table. My guess is there is a full table scan going on somewhere that you're not seeing.

What you may want to do in the second query is insert the values into the #LargeData temporary table like you already do and then create a non-clustered index on the "valuefield" column. This might help to improve your performance.

It is quite possible that SQL is optimizing for the wrong value of the parameters.

There are a couple of options

Try using option(RECOMPILE). There is a cost to this as it recompiles the query every time but if different plans are needed it might be worth it.
You could also try using OPTION(OPTIMIZE FOR @DT=SomeRepresentatvieValue) The problem with this is you pick the wrong value.

See I Smell a Parameter! from The SQL Server Query Optimization Team blog

继续阅读：sql sql-server temp-tables tsql

SQL "WITH" Performance and Temp Table (possible "Query Hint" to simplify)

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？