Database design for summarized data
I have a new table I'm going to add to a bunch of other summarized data, basically to take some of the load off by calculating weekly avgs.
My question is whether I would be better off wit开发者_如何学编程h one model over the other. One model with days of the week as a column with an additional column for price or another model as a series of fields for the DOW each taking a price.
I'd like to know which would save me in speed and/or headaches? Or at least the trade off.
IE.
ID OBJECT_ID MON TUE WED THU FRI SAT SUN SOURCE
OR
ID OBJECT_ID DAYOFWEEK PRICE SOURCE
I would give the first preference to the following aggreate model:
ID | OBJECT_ID | DATE | PRICE | SOURCE
---+-----------+------------+--------+--------
1 | 100 | 2010/01/01 | 10.00 | 0
2 | 100 | 2010/01/02 | 15.00 | 0
3 | 100 | 2010/01/03 | 20.00 | 0
4 | 100 | 2010/01/04 | 12.00 | 0
You would then be able to aggreate the above data to generate averages for every week/month/year very easily and relatively quickly.
To get the list of weekly averages, you would be able to do the following:
SELECT WEEK(date), AVG(price) FROM table GROUP BY WEEK(date);
For some further examples, the following query would return the average price on Sundays:
SELECT AVG(price) FROM table WHERE DAYOFWEEK(date) = 1;
Or maybe get the average daily price for the 8th week of the year:
SELECT AVG(price) FROM table WHERE WEEK(date) = 8;
It would also be quite easy to get monthly or yearly averages:
SELECT MONTH(date), AVG(price) FROM table GROUP BY MONTH(date);
I would only opt for more de-normalized options like the two you proposed if the above aggregations would still be too expensive to compute.
I would vote for the second. With the first, you would need some contraints to ensure that any row has only one of MON, TUE, WED, THU, FRI, SAT, SUN
. Of course, with the second, you might need some additional reference data to define the Days of the Week, to populate DAYOFWEEK
.
UPDATE:
Ok it wasn't clear there would always be a price for every day. In that case my point about constraints isn't so valid. I'd still prefer the second model though, it seems better normalized. I don't know enough about this case now to say if this is a good time to cast off some good normalization practices for clarity and performance, but it might be...
精彩评论