SQL Server 2008: Very slow query performance when dealing with specific date boundary
Right, this is an odd one.
We have a web service that returns data to a Silverlight client. The queries are generated against a SQL Server 2008 database by Entity Framework. Most of these queries are date-range based - pull results between this date and that, for example. Also, a view is used to make things a little easier.
We have noticed that when issuing a specific set of queries that start on or after a certain date, execution will be very slow. This date is the 5th November, any year. If our start date is one day earlier execution will be instant. So, 5th - 6th, slow. 4th - 6th - quick.
Here is the SQL that EF generates:
THIS QUERY WILL BE VERY SLOW (30 seconds)
SELECT
1 AS [C1],
[GroupBy1].[K1] AS [Name],
[GroupBy1].[A1] AS [C2]
FROM ( SELECT
[Extent1].[Name] AS [K1],
SUM([Extent1].[Value]) AS [A1]
FROM (SELECT
[view_answers].[Value] AS [Value],
[view_answers].[Comment] AS [Comment],
[view_answers].[NewStockist] AS [NewStockist],
[view_answers].[NewDistPoint] AS [NewDistPoint],
[view_answers].[VoucherUsed] AS [VoucherUsed],
[view_answers].[CashTotal] AS [CashTotal],
[view_answers].[AnswerType] AS [AnswerType],
[view_answers].[StartTime] AS [StartTime],
[view_answers].[ActualEndTime] AS [ActualEndTime],
[view_answers].[Complete] AS [Complete],
[view_answers].[UserID] AS [UserID],
[view_answers].[UserName] AS [UserName],
[view_answers].[QuestionType] AS [QuestionType],
[view_answers].[ProductSKU] AS [ProductSKU],
[view_answers].[BrandID] AS [BrandID],
[view_answers].[TeamID] AS [TeamID],
[view_answers].[Name] AS [Name],
[view_answers].[Stage] AS [Stage],
[view_answers].[Question] AS [Question]
FROM [dbo].[view_answers] AS [view_answers]) AS [Extent1]
WHERE
([Extent1].[UserID] = '16E3692F-806E-40A0-BB99-ABBBCC13060D')
AND (N'Distribution Points' = [Extent1].[QuestionType])
AND ([Extent1].[StartTime] >= '11/05/2009 00:00:00')
AND ([Extent1].[StartTime] <= '11/08/2009 00:00:00')
AND (1 = [Extent1].[Complete])
AND (2 = [Extent1].[BrandID])
AND (N'Distribution Points' = [Extent1].[QuestionType])
GROUP BY
[Extent1].[Name])
AS [GroupBy1]
THIS ONE WILL BE MUCH QUICKER
SELECT
1 AS [C1],
[GroupBy1].[K1] AS [Name],
[GroupBy1].[A1] AS [C2]
FROM ( SELECT
[Extent1].[Name] AS [K1],
SUM([Extent1].[Value]) AS [A1]
FROM (SELECT
[view_answers].[Value] AS [Value],
[view_answers].[Comment] AS [Comment],
[view_answers].[NewStockist] AS [NewStockist],
[view_answers].[NewDistPoint] AS [NewDistPoint],
[view_answers].[VoucherUsed] AS [VoucherUsed],
[view_answers].[CashTotal] AS [CashTotal],
[view_answers].[AnswerType] AS [AnswerType],
[view_answers].[StartTime] AS [StartTime],
[view_answers].[ActualEndTime] AS [ActualEndTime],
[view_answers].[Complete] AS [Complete],
[view_answers].[UserID] AS [UserID],
[view_answers].[UserName] AS [UserName],
[view_answers].[QuestionType] AS [QuestionType],
[view_answers].[ProductSKU] AS [ProductSKU],
[view_answers].[BrandID] AS [BrandID],
[view_answers].[TeamID] AS [TeamID],
[view_answers].[Name] AS [Name],
[view_answers].[Stage] AS [Stage],
[view_answers].[Question] AS [Question]
FROM [dbo].[view_answers] AS [view_answers]) AS [Extent1]
WHERE
([Extent1].[UserID] = '16E3692F-806E-40A0-BB99-ABBBCC13060D')
AND (N'Distribution Points' = [Extent1].[QuestionType])
AND ([Extent1].[StartTime] >= '11/04/2009 00:00:00')
AND ([Extent1].[StartTime] <= '11/08/2009 00:00:00')
AND (1 = [Extent1].[Complete])
AND (2 = [Extent1].[BrandID])
AND (N'Distribution Points' = [Extent1].[QuestionType])
GROUP BY
[Extent1].[Name])
AS [GroupBy1]
If we set the start date to 5th November of last year, execution is slow; set it to 4th November of last year, and it's fast again. Looking at the data in the database, there is nothing unusual around the 5th. It also appears that queries where the start date is after the 5th run slow.
I'm stumped!
(the database is hosted remotely so I don't have direct access to it)
UPDATE
Thanks for the replies, guys. Firstly, I should probably make it clear that my knowledge of SQL Server is skin deep. I build databases of varying degrees of quality and then use something like LINQ to SQL or EF to work with them. So I feel a little out of my depth here.
Regarding the joins - the view I'm querying does include data from about 6-7 tables. I shall try to grab some stats next time I'm at work and add some more info here. I don't really know much about execution plans, or whether they are something I can see through SQL Server Management Studio.
UPDATE Stats from the slow query (3 row(s) affected):
Table 'tblProducts'. Scan count 0, logical reads 22, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tblQuestionTypes'. Scan count 0, logical reads 1496, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tblQuestions'. Scan count 0, logical reads 1496, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tblSessions'. Scan count 0, logical reads 28551, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tblAnswers'. Scan count 1, logical reads 1976256, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tblCalls'. Scan count 1, logical reads 439, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tblUsers'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Stats from the fast query:
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tblAnswers'. Scan count 1, logical reads 7008, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tblProducts'. Scan count 0, logical reads 22, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tblQuestions'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tblQuestionTypes'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tblCalls'. Scan count 1, logical reads 439, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tblSessions'. Scan count 1, logical reads 47, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tblUsers'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
UPDATE The query execution plan (a new thing to me) suggests that I add a new index on a column in one of the tables: QuestionID in the answers table. The suggested SQL is as follows:
USE [brandfourcoke]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[tblAnswers] ([QuestionID])
INCLUDE ([CallID],[Value])
GO
The estimated improvement in query cost is 93%. Should I go ahead and do this? The database is in a live environment and uses SQL Server 2008's change tracking and the Sync Framework. As such, I'm always concerned that database changes will impact the tracked data and break things for my clients. Will adding an index reset change tracking data? Thanks for all your help, guys. I feel like a complete beginner here.
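For what it's worth, the placeholder index name in the generated script has to be replaced with a real one before it will run. A sketch of what that might look like (the index name here is hypothetical, and WITH (ONLINE = ON) is an optional Enterprise-edition feature that avoids blocking writers during the build - omit it on other editions):

```sql
USE [brandfourcoke];
GO
-- Hypothetical name following a common IX_<table>_<column> convention.
CREATE NONCLUSTERED INDEX [IX_tblAnswers_QuestionID]
ON [dbo].[tblAnswers] ([QuestionID])
INCLUDE ([CallID], [Value]);
GO
```

Adding a nonclustered index is a metadata/storage change, not a data change, so as far as I know it should not reset change tracking versions - but given a live environment, testing on a copy first would be prudent.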
Run the slow and fast queries with SET STATISTICS IO ON
and see if there is a significant difference in the number of logical/physical reads between the two.
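In SSMS that might look like the following (the query body is abbreviated here - paste in either of the full queries above):

```sql
SET STATISTICS IO ON;

-- Run the slow query, then the fast one; per-table read counts
-- appear on the Messages tab for each execution.
SELECT 1 AS [C1] /* ... rest of the generated query ... */;

SET STATISTICS IO OFF;
```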
Most likely there is a strong skew in the data. For instance, the plan for the fast query might drive a nested loop from a result of 10 rows (resulting in 10 lookups), while the slow one suddenly sees 10,000 rows where the previous one saw 10, resulting in 10,000 lookups. Although your query has no joins, the engine may use various access indexes and join them against the clustered index. The actual execution plan will always reveal exactly what's going on.
Without knowing anything additional, first guess would be that you are likely either hitting the tipping point of index usage for a particular operation or have outdated/wrong statistical distribution information for given column(s). Could be a few other things as well such as suboptimal indexing (perhaps wrong, perhaps key ordered incorrectly, etc.), given that you are using Sql 2008 are you sure you haven't created a filtered index rather than a traditional full index on the particular column(s), etc. - however, in order to determine that, we'd need to see a lot more information (i.e. schema, indexes, query plans, data distribution, statistics, etc.).
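If outdated statistics turn out to be the culprit, refreshing them on the large table is a cheap first experiment (table name taken from the STATISTICS IO output above; WITH FULLSCAN is optional but produces the most accurate histogram):

```sql
-- Rebuild the optimizer's data-distribution statistics for the table.
UPDATE STATISTICS [dbo].[tblAnswers] WITH FULLSCAN;
```

If the slow date range becomes fast immediately afterwards, stale statistics were almost certainly the cause.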
Might help if you could post the query plan used for each of the queries listed above, that would at least help us determine if you are getting drastically different plans.
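Besides SSMS's "Include Actual Execution Plan" button, one way to capture the actual plans as shareable XML is:

```sql
SET STATISTICS XML ON;

-- Run each query; the actual plan is returned as an
-- additional XML result set alongside the normal results.

SET STATISTICS XML OFF;
```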
That query is one reason why using generated SQL is often a terrible idea. As far as I can tell, this is the equivalent query if you had written it yourself:
SELECT 1 AS [C1],
[view_answers].[Name] AS [K1],
SUM([view_answers].[Value]) AS [C2]
FROM [view_answers]
WHERE ([view_answers].[UserID] = '16E3692F-806E-40A0-BB99-ABBBCC13060D')
AND (N'Distribution Points' = [view_answers].[QuestionType])
AND ([view_answers].[StartTime] >= '11/04/2009 00:00:00')
AND ([view_answers].[StartTime] <= '11/08/2009 00:00:00')
AND (1 = [view_answers].[Complete])
AND (2 = [view_answers].[BrandID])
AND (N'Distribution Points' = [view_answers].[QuestionType])
GROUP BY [view_answers].[Name]
Try this with both dates and see if you get the same results and the same delay when using Nov 5. What fields are indexed?
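One thing worth ruling out while testing: literals like '11/05/2009' are interpreted according to the session's language and DATEFORMAT settings, so the same string can mean 5th November or 11th May depending on the connection. The unambiguous ISO 8601 form removes any doubt about which day is being compared (predicates shown in isolation, as a sketch):

```sql
-- Unambiguous date literals, regardless of session DATEFORMAT:
AND ([view_answers].[StartTime] >= '2009-11-05T00:00:00')
AND ([view_answers].[StartTime] <= '2009-11-08T00:00:00')
```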