Raw Data or Pre-Calculated Values in Database?
In ge开发者_如何学运维neral, is it better to store raw data with pre-calculated values in the database and concentrate on keeping the database up-to-date if I remove or delete a row while using the pre-calculated values for display to the user
OR
is it better to store the raw data and calculate the correct display values on-the-fly?
An example (which is pertinent to my project) would be similar to the following:
You have a timer application. In my case its using Core Data. It's not connected to the web, but a self-contained app that runs on a computer or mobile device (user's choice). The app stores a raw start time and a raw end time. The application needs to display the duration of the event and the interval at which the events are occuring. Would it be better to store a pre-calculated "duration" time and even a pre-formatted duration string that will be used for output or would it be better to calculate the duration on-the-fly, so to speak, for display?
Same goes with the interval, although there's another layer involved because when I create/delete/update a row in the database, I'll have update the interval for the items that are affected by this. Or, is it better to just calculate as the app executes?
For the record, I'm not trying to micro-optimize. I'm trying to figure out the best way to reduce the amount of code I have to maintain. If performance improves as a result, so be it.
Thoughts?
Generally, you would want to avoid computed values in the DB (from existing columns/tables), unless profiling absolutely dictates that they are necessary (i.e., the DB is underperforming or to great of a load is being placed on the server). This is even more true for formatting of the data, which should almost always be performed on the client side, instead of wasting DB server cycles.
Of course, any data that is absolutely mandatory to perform the calculations should be stored in the database.
When you speak of reducing the amount of code you need to maintain, keep in mind that the DBA needs to maintain stored-proc code and table schemas, too. Moving maintenance responsibilities from Developers to DBAs is not eliminating work, it is just shifting it.
Finally, database changes often cascade to many applications, whereas application changes only affect that application.
The only time I store calculated values in a database is if I need it for historical purposes. You'll see this all the time in accounting software.
For example if I'm dealing with an invoice, I will typically save the calculated invoice total because perhaps the way that total will get calculated later on will change.
I will also sometimes perform the actual calculation on the database server using views.
As with so many other things, "it depends". For your described case, I would lean towards keeping the calculation in code. If you do choose to use the database, you should use a view to dynamically calculate rather than put in a static value. The risk of changing the start time or end time and forgetting to change the duration would be too high otherwise :)
This really depends on wether you want to be pure (keep your data clean) or fast. Compute capacity on the desktop facilitates purity, high speed cores and large memory spaces make string composition for table cells possible with large data sets.
However on the phone, an iPhone 4 even, computing a single NSString for a UITableViewCell over a set of 1000 objects takes a noticeable amount of time, and this can affect your user experience.
So, tune the balance for your use case. Duration doesn't sound like it will change, so I would precalculate and store the duration AND the display string (feels aweful from the perspective of a DBA, but it will render fast on the phone).
For the interval it sounds like you actually need another entity, to relate the interval to a set of events. It would then be easy enough to pre-compute / maintain this calculation as well each time the relationship changes (i.e. you add an entity to the relationship, update the interval).
精彩评论