Detecting/Monitoring for parameter sniffing problems
Are there any tools to specifically monitor/detect for parameter sniffing problems as opposed to those which report queries that take a long time?
I have just got hit with a parameter sniffing problem. (It wasn't too serious as it caused a report to take about 2 minutes to run instead of a few seconds if properly cached and maybe 30 seconds if recompiled. And since the report is usually only run a few times per month, it is not really a problem).
However, since I wrote the report and I knew what it did, I was curious and went investigating and using SQL Profiler, I could see a section in the query plan where the number of estimated rows was 1, but the actual number of rows was several hundred thousand.
So, it struck me, that if SQL has these figures, (or at least can get these figures), that perhaps there is some way of getting sql to track and report which p开发者_如何学运维lans were significantly out.
You've got a couple of questions in there:
Are there any tools to specifically monitor/detect for parameter sniffing problems as opposed to those which report queries that take a long time?
To catch this, you need to monitor the procedure cache to find out when a query's execution plan changes from good to bad. SQL Server 2008 made this a lot easier by adding query_hash and query_plan_hash fields to sys.dm_exec_query_stats. You can compare the current query plan to past ones for the same query_hash, and when it changes, compare the number of logical reads or amount of worker time from the old query to the new one. If it skyrockets, you might have a parameter sniffing problem.
Then again, someone might have just eliminated an index or changed the code in a UDF that's being called or a change in MAXDOP or any one of a million settings that influence query plan behavior.
What you want is a single dashboard that shows the most resource-consuming queries in aggregate (because you might have this problem on a query that's called extremely frequently, but consumes tiny amounts of resources each time) and then shows you changes in its execution plan over time, plus lays over system and database level changes. Quest Foglight Performance Analysis does this. (I used to work for Quest, so I know the product, but I'm not shilling here.) Note that Quest sells a separate product, Foglight, that has nothing to do with Performance Analysis. I'm not aware of any other product that goes into this level of detail.
I could see a section in the query plan where the number of estimated rows was 1, but the actual number of rows was several hundred thousand.
That's not necessarily parameter sniffing - that could be bad stats or table variable usage, for example. To catch this kind of issue, I like the free SQL Sentry Plan Advisor tool. In the Top Operations tab, it highlights variances between estimated and actual rows.
Now, that's only for one plan at a time, and you have to know the plan first. You want to do this 24/7, right? Sure you do - but it's computationally intensive. The procedure cache can be huge (I've got clients with >100GB of procedure cache), and it's all unindexed XML. To compare estimated vs actual rows, you have to shred all that XML - and keep in mind that the procedure cache can be constantly changing under load.
What you really want is a product that could very rapidly dump the entire procedure cache into a database, throw XML indexes on it, and then compare estimates versus actual rows. I can imagine a script doing that, but I haven't seen one yet.
You said
"estimated rows was 1, but the actual number of rows was several hundred thousand."
This can be caused by table variables which don't have statistics.
To detect parameter sniffing is difficult but you can verify it is happening by running sp_updatestats. If the problems disappears it's most likely parameter sniffing. If it doesn't then you have other problems, such as too large table variables
We use parameter masking consistently now (system was developed on SQL Server 2000). We don't need it 99.9+ % of the time but the < 0.1% justifies it because of user confidence + support overhead it entails.
You can set up a trace that to record the query text of all batches / stored procedures run that have duration > Ns.
You obviously need to tailor N for your system (and probably add rules to exclude batch jobs that take a long time even during normal execution), but this should identify which queries offer the poorest performance and will also record any queries (along with their parameters) which have abnormally long execution times - potentially the result of a parameter sniffing problem.
See How to create a SQL trace using T-SQL on how to create a trace using T-SQL. This will give better performance than using SQL Profiler as this only captures the events that you set trace events for (SQL Profiler reportedly captures all events and then filters them in the application).
精彩评论