
Do SQL Server statistics gather different data if they are updated under load vs. off-peak usage?

Alright, the basic question is: has anyone heard anything that would imply statistics need to be updated under daily load to be effective, as opposed to during off-peak hours?

We had an issue with a database recently where a query plan went off track and decided to use a table scan instead of a seek. This, of course, caused significant locking issues and customer delays while we tracked down the index and updated the correct statistics.

The query plan seems to have been the result of our maintenance plan, which runs weekly on the system. One of its last steps updates statistics. The only thing that changed recently was that the statistics update was set to a 10% sample instead of the 30% it used previously. (We have since changed this back.)
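For context, the sampled update that step performs is roughly equivalent to the following (the table name is a placeholder; the exact statements the maintenance plan emits may differ):

    -- What the maintenance-plan step roughly boils down to (dbo.Orders is a placeholder)
    UPDATE STATISTICS dbo.Orders WITH SAMPLE 10 PERCENT;  -- the lower sample that coincided with the bad plan
    UPDATE STATISTICS dbo.Orders WITH SAMPLE 30 PERCENT;  -- the previous setting we have gone back to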

According to one DBA we talked to, though, we need to update statistics under load or SQL Server will not calculate them against the best data. He claims that if you update stats during a low-use window, as we currently do, the stats will not be as accurate.

This goes against what I understand about SQL statistics, and I cannot find anything online that tells me I should update stats at one time or another.


He's wrong (the other DBA). Table statistics are a function of the data stored in your columns and have absolutely nothing to do with server load. In fact, peak hours are pretty much the worst time to update statistics (the update itself has an obvious, deleterious effect on performance while it is running).

Update your statistics WITH FULLSCAN during off-peak hours for maximum accuracy, then leave them alone.
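A minimal sketch of that kind of off-peak job, assuming placeholder table and statistics names:

    -- Rebuild every statistic on the table by reading all rows
    UPDATE STATISTICS dbo.Orders WITH FULLSCAN;

    -- Or target a single index/statistic if only one is the problem
    UPDATE STATISTICS dbo.Orders IX_Orders_CustomerId WITH FULLSCAN;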

edit: As http://blogs.msdn.com/b/chrissk/archive/2008/08/27/do-we-need-to-run-update-statistics-with-fullscan.aspx points out, stats gathered WITH FULLSCAN may not be particularly useful in some cases (basically, on a table whose data you modify a lot). Auto stats will rebuild your statistics once enough rows in the table have changed (by default, roughly 500 rows plus 20% of the table for larger tables), and the optimizer will then use those auto-rebuilt, sampled statistics in preference to your fullscan ones. You can either manage statistics entirely by hand (not recommended), or leave auto stats on and also schedule your own statistics rebuilds.
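If you want to check whether an auto-update has already replaced your fullscan sample, one way to look (object and statistic names are placeholders) is:

    -- Shows when the statistic was last updated and how many rows were sampled
    DBCC SHOW_STATISTICS ('dbo.Orders', 'IX_Orders_CustomerId') WITH STAT_HEADER;

    -- Shows whether auto-update is enabled for the table's statistics
    EXEC sp_autostats 'dbo.Orders';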


Statistics are about "data distribution" not "load on the server"

If you update them each night, there is less chance of stats being auto-updated during the day.

You can also do full, or close to full, sampling then...


It depends on your load. If your workload consists of frequent inserts and deletes (e.g. queue-like systems), then the statistics will look significantly different during peak hours vs. off-hours, because the underlying tables also look significantly different (the tables hold a large number of items during the day, but are mostly empty at night).

If the tables look the same at off-hours and at peak hours (the typical case, when the daily workload mostly adds new data), then it is safe to snapshot the stats at night.
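A quick way to see how much a queue-like table has drifted since its statistics were last captured (the table name is a placeholder, and sys.dm_db_stats_properties requires a reasonably recent SQL Server build):

    -- Rows captured in the stats vs. modifications since, per statistic on the table
    SELECT s.name            AS stats_name,
           sp.last_updated,
           sp.rows,
           sp.modification_counter
    FROM sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE s.object_id = OBJECT_ID('dbo.WorkQueue');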


As others have pointed out, statistics have to do with the distribution of data across the potential key space, and so they are not dependent on load. That said, I have seen many databases that do off-hours processing of certain tables (ETL staging, etc.) that would wildly skew statistics depending on the hour at which the stats were run. This isn't to say that the statistics gathered during the day would be better, just that they would be different.

The goal with gathering statistics, as with index optimization in general, is to tune for the scenario in which you want your queries to perform optimally. So if you want your queries to perform well with a million rows in a table, generate statistics with a million rows in the table (distributed across the various index keyspaces more or less as they will be when those queries run).
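One way to sanity-check what the optimizer will actually plan against, assuming a placeholder index name, is to inspect the captured histogram:

    -- The histogram steps are the distribution the optimizer estimates row counts from
    DBCC SHOW_STATISTICS ('dbo.Orders', 'IX_Orders_OrderDate') WITH HISTOGRAM;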
