Help with Improving Performance of SQL Server 2008 CTE
I have a SQL Server 2008 CTE which is responsible for returning the Top Review for a location.
The CTE is wrapped in a UDF (Table-Valued), and joined on the LocationId field, so i can get the top review for each location.
Cardinalities:
1 Location has 0-many PostLocations
1 PostLocation has 1 Post 1 Post has 1 ReviewHere's the UDF:
CREATE FUNCTION [dbo].[Review_HighestRated_Aggregated_ByLocation]
(
)
RETURNS TABLE
AS
RETURN
(
WITH [RankedLocations] AS
(
SELECT PL.LocationId,
R.Rating,
P.PostID,
P.UniqueUri,
P.Content,
ROW_NUMBER() OVER (PARTITION BY PL.LocationId ORDER BY R.Rating DESC, P.LocationTypeId, P.CreatedOn DESC) As ScoreRank
From dbo.PostLocations As PL
INNER JOIN dbo.Posts As P
ON P.PostId = PL.PostId
INNER JOIN dbo.Reviews As R
ON R.PostId = P.PostId
WHERE R.ReviewTypeId <> 5
AND P.Content IS NOT NULL
)
SELECT LocationId, Rating, PostID, UniqueUri, Content
FROM RankedLocations
WHERE ScoreRank = 1
)
Here's an example of how i'm using it:
select l.LocationId, l.Name, l.Unique开发者_如何学PythonUri, r.UniqueUri, r.Content
from @Locations l -- temp table containing around 18 location ids
inner join dbo.Review_HighestRated_Aggregated_ByLocation() r
on l.LocationId = r.LocationId
The above query is taking 15 seconds to execute, which is unacceptable. Without the join to the UDF it's takes 0 seconds.
Any ideas on how i can improve it?
If i look at the execution plan, it's the SORT that is taking up 98% of the execution cost. The IO/subtree cost of this operation is ~300.
I was hoping the execution plan would give me a hint to an index i could create to improve the cost, but i get nothing.
Any ideas?
So i found the performance problem, and it wasn't the CTE, it was how i was using it.
I have several lookup tables, one in particular for Location Type (Street = 7, City = 5, etc).
So to keep my SQL fluent and consistent (and avoid hard-coded magic numbers), i created a wrapper scalar function which returns the evuivalent value based on the string, e.g:
DECLARE @Street_LocationType = [dbo].[ToLocationTypeId]('Street')
The function is extremely simple, just a series of CASE statements.
But, i was using my CTE like this:
SELECT a.LocationId, b.Content, b.UniqueUri
FROM [dbo].[Locations] a
INNER JOIN dbo.Review_HighestRated_Aggregated_ByLocation() b -- UDF with CTE
ON a.LocationId = b.LocationId
WHERE a.LocationTypeId = @Street_LocationType
So i wasn't even using it on the CTE itself, i was using it as a filter on the Locations table.
If i change the above to hardcode the value (e.g 7), the procedure execution time drops from 13 seconds to 2 secs.
I don't get it, but it solved the problem. I was noticing when the procedure was performing badly, the "SORT" operation in the query plan had an estimated number of rows = 32,000 - which is basically every post in the system.
After my changes, the estimated number of rows is 1 (as it should be).
Bizarre activity indeed.
If your table valued function dont need parameters consider to use a VIEW
instead a UDF. Probably it solves the performance problem.
To convert the CTE and the UDF into a VIEW:
DROP FUNCTION [dbo].[Review_HighestRated_Aggregated_ByLocation]
GO
CREATE VIEW Review_HighestRated_Aggregated_ByLocation
AS
SELECT LocationId, Rating, PostID, UniqueUri, Content
FROM
(
SELECT PL.LocationId,
R.Rating,
P.PostID,
P.UniqueUri,
P.Content,
ROW_NUMBER() OVER (PARTITION BY PL.LocationId ORDER BY R.Rating DESC, P.LocationTypeId, P.CreatedOn DESC) As ScoreRank
From dbo.PostLocations As PL
INNER JOIN dbo.Posts As P
ON P.PostId = PL.PostId
INNER JOIN dbo.Reviews As R
ON R.PostId = P.PostId
WHERE R.ReviewTypeId <> 5
AND P.Content IS NOT NULL
) RankedLocations
WHERE ScoreRank = 1
GO
OP's sample query revised to use the new VIEW:
select l.LocationId, l.Name, l.UniqueUri, r.UniqueUri, r.Content
from @Locations l -- temp table containing around 18 location ids
inner join Review_HighestRated_Aggregated_ByLocation r
on l.LocationId = r.LocationId
精彩评论