Huge performance dip with TOP in SQL Server

2023-04-03 11:06 问答作者：

Consider this SQL performance difference, where in the first, I select all 26,000 rows, and in the second, I just want the first 5.

SELECT tw.*
FROM entity e
JOIN entity_tag et on et.entity_id = e.id
JOIN tag t on t.tag_id = et.tag_id
JOIN tagrelatedtweets trt on trt.FK_Tag_ID = t.tag_id
JOIN tweets tw on tw.PK_Tweet_ID = trt.FK_Tweet_ID
WHERE e.id = 765131
ORDER BY tw.[timestamp]

SELECT TOP (5) tw.*
FROM entity e
JOIN entity_tag et on et.entity_id = e.id
JOIN tag t on t.tag_id = et.tag_id
JOIN tagrelatedtweets trt on trt.FK_Tag_ID = t.tag_id
JOIN tweets tw on tw.PK_Tweet_ID = trt.FK_Tweet_I开发者_开发百科D
WHERE e.id = 765131
ORDER BY tw.[timestamp]

Without: CPU =    201 | Reads:    6880 | Writes:    0 | Duration:   451
With:    CPU = 302439 | Reads: 7453199 | Writes: 3169 | Duration: 74188

This just doesn't make sense to me... Is there another way to go about this?

After Martin's suggestion of a REBUILD STATISTICS on all tables involved, there is a small improvement, but the trick with changing the TOP amount into a parameter works the best.

Before statistics rebuild:

CPU = 302439 | Reads: 7453199 | Writes: 3169 | Duration: 74188

After statistics rebuild:

CPU = 127734 | Reads: 4100436 | Writes: 2656 | Duration: 16880

With parameter:

CPU =    218 | Reads:    6899 | Writes:    0 | Duration:    83

Query with parameter:

DECLARE @TOP INT; SET @TOP=5; 
SELECT TOP (@TOP) tw.*
FROM entity e
JOIN entity_tag et on et.entity_id = e.id
JOIN tag t on t.tag_id = et.tag_id
JOIN tagrelatedtweets trt on trt.FK_Tag_ID = t.tag_id
JOIN tweets tw on tw.PK_Tweet_ID = trt.FK_Tweet_ID
WHERE e.id = 765131
ORDER BY tw.timestamp desc

One last remark for those of you using Entity Framework; if you experience this behavior, you can simulate the same parameter-based behavior as follows:

.Take(100).ToList().Take(5)

I know it isn't pretty, but it's the only way to trigger the correct execution plan if you are using entity framework as far as I can tell.

Thanks a lot for pointing me in the correct direction Martin!

Following discussion in the comments it seems that for some reason the plan SQL Server is choosing to optimize for the "First 5 Rows Only" case is sub optimal. It probably will be using non blocking operators such as nested loops joins rather than blocking ones such as hash joins.

As you are unable to post the execution plans it is difficult to know exactly the reason why but comparing the plans for both versions of the query and looking at actual vs estimated number of rows in the actual execution plan for the problem query should set some light on the matter.

It seems in this case that hiding the TOP 5 information from the optimiser at compile time is sufficient to give you the plan you want however!

继续阅读：sql-server-2005 tsql

Huge performance dip with TOP in SQL Server

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？