Storing "derived" values vs calculating them on extraction

When you have values that depend only on one or more other fields plus or minus constants (say, retail price and discount price), is it better to store those values too, or to calculate them "on the fly" when retrieving the data?


The default is not to store redundant information: the third normal form is usually a sensible initial goal. Redundancy is introduced when a "good enough" reason appears, such as a "big enough" performance hit you take when you have to calculate a derived value and the calculation is intensive.

Obviously, "good enough" and "big enough" are qualifiers which only mean something in a given context. For what it's worth, the retail/discount price calculation seems too cheap and simple to warrant introducing a redundant column in most (obviously not all) cases.
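To illustrate the "calculate on the fly" option, here is a minimal sketch in Python. The `Product` class and its `discount_rate` field are hypothetical names made up for this example (the question does not say how the discount is defined); the point is only that the derived price is never stored, so it cannot disagree with the retail price.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Product:
    name: str
    retail_price: Decimal
    # Assumption for this sketch: the discount is a fraction between 0 and 1.
    discount_rate: Decimal

    @property
    def discount_price(self) -> Decimal:
        # Derived on every access; nothing redundant is stored.
        price = self.retail_price * (Decimal("1") - self.discount_rate)
        return price.quantize(Decimal("0.01"))


widget = Product("widget", Decimal("20.00"), Decimal("0.25"))
first_price = widget.discount_price

# Changing the source field is enough; there is no second column to update.
widget.retail_price = Decimal("40.00")
second_price = widget.discount_price
```

The same idea applies in SQL by computing the value in the `SELECT` list or in a view instead of adding a column.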


I would agree with Tomislav: try to avoid redundancy, because you can end up with data in multiple tables disagreeing with each other. It also makes updates more painful.

There are exceptions that are worth considering, though, that are not related to database performance.

  • When it is painful to calculate the value (e.g. some complex mathematical function), then it makes sense to store it (you could think of the column as the 'last calculated value').
  • You might have inputs that change over time, e.g. a fee is derived from a fee rate, but the fee rate is stored as a single value in a configuration table. You might want to record the fee, because otherwise historical fees could only be recalculated from the current fee rate, not the rate that applied at the time. Alternatively, you might store the rate by time as well to circumvent this problem.
  • If the derived value can be overridden by user input or some other process, then again it makes sense to store it. Alternatively, you might model this with two states, 'CALCULATED' and 'OVERRIDDEN', so that you only store a value in the latter state.
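The last two exceptions can be sketched together in Python. Everything named here (`Transaction`, `fee_charged`, `fee_override`, `compute_fee`, and the rates) is hypothetical and only meant to show the shape of the idea, not a recommended schema: the fee is recorded at creation time, and an optional override is stored only when one exists.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


def compute_fee(amount: Decimal, rate: Decimal) -> Decimal:
    return (amount * rate).quantize(Decimal("0.01"))


@dataclass
class Transaction:
    amount: Decimal
    # Snapshot of the fee at creation time, so later rate changes
    # do not rewrite history.
    fee_charged: Decimal
    # Set only in the 'OVERRIDDEN' state; None means 'CALCULATED'.
    fee_override: Optional[Decimal] = None

    @property
    def state(self) -> str:
        return "OVERRIDDEN" if self.fee_override is not None else "CALCULATED"

    @property
    def fee(self) -> Decimal:
        if self.fee_override is not None:
            return self.fee_override
        return self.fee_charged


# Assumed rates for the example: 2% when the transaction was made, 3% today.
rate_then = Decimal("0.02")
rate_now = Decimal("0.03")

amount = Decimal("100.00")
tx = Transaction(amount=amount, fee_charged=compute_fee(amount, rate_then))

# Recomputing from the current rate would give a different, wrong answer.
recomputed = compute_fee(tx.amount, rate_now)
recorded = tx.fee

# A manual override takes precedence without discarding the recorded fee.
tx.fee_override = Decimal("1.50")
overridden = tx.fee
```

Keeping `fee_charged` alongside the override is a design choice of this sketch; it lets you see what the fee would have been without the override.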