
Optimal Database fullness?

Let's say you have a relational database of arbitrary but finite capacity, and the database holds historical event information for an online system that is constantly generating new events. The database should hold event information for reporting purposes, but should purge events older than (N) days. Given that you have enough historical information to deduce that the rate of event generation is relatively constant, neither increasing nor decreasing over time, is there an optimal percentage of fullness (60%, 70%, 80%, ...) that you would design this database for? If so, why did you choose that percentage?
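
For concreteness, here is a rough back-of-envelope sketch of the sizing question (every number is hypothetical): with a roughly constant event rate and an N-day retention window, the steady-state data volume is simply rate * N, and designing for a given fullness just means provisioning capacity above that.

    # Rough sizing sketch; all numbers are made up for illustration.
    EVENTS_PER_DAY = 5_000_000      # assumed mean daily event volume
    BYTES_PER_EVENT = 200           # assumed average stored size per event
    RETENTION_DAYS = 90             # the "N" from the question

    # With a constant rate, the database settles at roughly rate * N.
    steady_state_bytes = EVENTS_PER_DAY * BYTES_PER_EVENT * RETENTION_DAYS

    # Designing for a given fullness means buying capacity above that.
    for target_fullness in (0.6, 0.7, 0.8):
        capacity_needed = steady_state_bytes / target_fullness
        print(f"design for {target_fullness:.0%} full -> "
              f"provision about {capacity_needed / 1e9:,.0f} GB")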


It depends.

Well, to be slightly more helpful: you said that the rate of event generation is "relatively constant". You will need enough margin to deal with fluctuations in that rate, both statistical and emergency. The statistical part you can estimate from your history, but emergencies can only be guessed at.

The actual amount of space used depends on how the data is stored. On a related note, many filesystems become very slow once they exceed a certain degree of fullness; you will likely want to include that threshold as part of your total margin. Also consider the granularity of your event purge: how often does it run?
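
To make the purge-granularity point concrete, here is a small sketch (all numbers are hypothetical) of how the peak size just before a purge runs grows with the purge interval:

    # Peak fullness just before a purge runs, vs. the purge interval.
    # All numbers are hypothetical; the point is the shape of the relationship.
    EVENTS_PER_DAY = 5_000_000
    BYTES_PER_EVENT = 200
    RETENTION_DAYS = 90
    CAPACITY_BYTES = 200e9          # assumed raw capacity of the database

    steady_state = EVENTS_PER_DAY * BYTES_PER_EVENT * RETENTION_DAYS

    for purge_interval_days in (1, 7, 30):
        # up to purge_interval_days of extra data piles up between purges
        extra = EVENTS_PER_DAY * BYTES_PER_EVENT * purge_interval_days
        peak = steady_state + extra
        print(f"purge every {purge_interval_days:2d} day(s): "
              f"peak fullness about {peak / CAPACITY_BYTES:.0%}")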

Also, consider the consequences of running out of capacity. Will your system crash? How critical is the system, anyway? Can you do an emergency purge to make additional space? How expensive is extra capacity, relative to the expense of an outage?


This isn't so much a database design issue as it is an operational issue.

Your nightly maintenance process (or however else you age out expired data) needs to maintain enough free space to accommodate any plausible daily volume of events. Presumably, failure due to running out of space isn't an option. But you only know how much space that is by knowing what your daily volume is and how much it varies. If your mean daily volume is 5,000,000 events with swings of +/- 4,000,000 events and a standard deviation of, say, 2,000,000, you're going to need to maintain a lot more free space than if you have the same mean daily volume but swings of only +/- 500,000 and a standard deviation of 50,000. Until you have some statistics to inform you, you're just guessing.
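
As a sketch of that headroom calculation, using the numbers above, an assumed 200 bytes per event, and an assumed three-sigma safety factor:

    # Free space to keep so that an unusually busy day still fits.
    # The 200 bytes/event and the k=3 ("three sigma") factor are assumptions.
    BYTES_PER_EVENT = 200

    def required_headroom_bytes(mean_daily, stddev_daily, days_between_purges=1, k=3):
        """Headroom for a k-sigma-above-mean volume between purges."""
        worst_case_daily = mean_daily + k * stddev_daily
        return worst_case_daily * days_between_purges * BYTES_PER_EVENT

    # the two scenarios from the paragraph above
    print(required_headroom_bytes(5_000_000, 2_000_000) / 1e9, "GB of headroom")  # noisy volume
    print(required_headroom_bytes(5_000_000, 50_000) / 1e9, "GB of headroom")     # steady volume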

In a world where terabyte hard drives cost less than $200, worrying about space isn't worthwhile.

Far more important from an operational perspective, IMHO, is how much free space to maintain on data and index pages so as to minimize page splits on insert and update operations, and the performance hit you take from those. And again, you need to know something about the actual data to figure that out.
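
One very rough way to think about that fill-factor choice (the growth fractions here are purely illustrative, not a rule): leave enough free space on each page to absorb the inserts and updates you expect before the next index rebuild.

    # Back-of-envelope fill factor: leave enough room on each page to absorb
    # the growth expected before the next index rebuild, so pages don't split.
    # The growth fractions below are purely illustrative.

    def fill_factor(expected_growth_fraction):
        """Initial page fullness such that a page can grow by the given
        fraction before it overflows and splits."""
        return 1.0 / (1.0 + expected_growth_fraction)

    for growth in (0.05, 0.20, 0.50):
        print(f"expect {growth:.0%} growth per page -> "
              f"fill pages to about {fill_factor(growth):.0%}")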

