Impact of ordering of correlated subqueries within a projection
I'm noticing something a bit unexpected with 开发者_开发技巧how SQL Server (SQL Server 2008 in this case) treats correlated subqueries within a select statement. My assumption was that a query plan should not be affected by the mere order in which subqueries (or columns, for that matter) are written within the projection clause of the select statement. However, this does not appear to be the case.
Consider the following two queries, which are identical except for the ordering of the subqueries within the CTE:
--query 1: subquery for Color is second
WITH vw AS
(
SELECT p.[ID],
(SELECT TOP(1) [FirstName] FROM [Preference] WHERE p.ID = ID AND [FirstName] IS NOT NULL ORDER BY [LastModified] DESC) [FirstName],
(SELECT TOP(1) [Color] FROM [Preference] WHERE p.ID = ID AND [Color] IS NOT NULL ORDER BY [LastModified] DESC) [Color]
FROM Person p
)
SELECT ID, Color, FirstName
FROM vw
WHERE Color = 'Gray';
--query 2: subquery for Color is first
WITH vw AS
(
SELECT p.[ID],
(SELECT TOP(1) [Color] FROM [Preference] WHERE p.ID = ID AND [Color] IS NOT NULL ORDER BY [LastModified] DESC) [Color],
(SELECT TOP(1) [FirstName] FROM [Preference] WHERE p.ID = ID AND [FirstName] IS NOT NULL ORDER BY [LastModified] DESC) [FirstName]
FROM Person p
)
SELECT ID, Color, FirstName
FROM vw
WHERE Color = 'Gray';
If you look at the two query plans, you'll see that an outer join is used for each subquery and that the order of the joins is the same as the order the subqueries are written. There is a filter applied to the result of the outer join for color, to filter out rows where the color is not 'Gray'. (It's odd to me that SQL would use an outer join for the color subquery since I have a non-null constraint on the result of the color subquery, but OK.)
Most of the rows are removed by the color filter. The result is that query 2 is significantly cheaper than query 1 because fewer rows are involved with the second join. All reasons for constructing such a statement aside, is this an expected behavior? Shouldn't SQL server opt to move the filter as early as possible in the query plan, regardless of the order the subqueries are written?
Edit: Just to clarify, there is a valid reason I'm exploring this scenario. I may need to create a view that involves similarly constructed subqueries, and it is now apparent that any filtering based on these columns projected from the view will vary in performance just because of the ordering of the columns!
With the TOP operator coming into play here, the Query Optimizer is remarkably blind about the statistics, so it will look for other clues about how best to work it, such as instantiating relevant parts of the CTE first.
And it's an outer join because the subquery will be used as NULL if nothing is returned, and the system is instantiating it first. If you were using an aggregate instead of TOP, you'd probably get a slightly different but more consistent plan.
Here is an alternate version that might perform better:
With Colors As
(
Select Id, [Color]
, ROW_NUMBER() OVER ( PARTITION BY ID ORDER BY [LastModified] DESC ) As Num
From Preference
Where [Color] Is Not Null
)
, Names As
(
Select Id, [FirstName]
, ROW_NUMBER() OVER ( PARTITION BY ID ORDER BY [LastModified] DESC ) As Num
From Preference
Where [FirstName] Is Not Null
)
Select
From Person As P
Join Colors As C
On C.Id = P.Id
And C.Num = 1
Left Join Names As N
On N.Id = P.Id
And N.Num = 1
Where C.[Color]= 'Grey'
Another solution which is more concise but may or may not perform as well:
With RankedItems
(
Select Id, [Color], [FirstName]
, ROW_NUMBER() OVER ( PARTITION BY ID ORDER BY Case When [Color] Is Not Null 1 Else 0 End DESC, [LastModified] DESC ) As ColorRank
, ROW_NUMBER() OVER ( PARTITION BY ID ORDER BY Case When [FirstName] Is Not Null 1 Else 0 End DESC, [LastModified] DESC ) As NameRank
From Preference
)
Select
From Person As P
Join RankedItems As RI
On RI.Id = P.Id
And RI.ColorRank = 1
Left Join RankedItems As RI2
On RI2.Id = P.Id
And RI2.NameRank = 1
Where RI.[Color]= 'Grey'
精彩评论