
Database Design Question regarding duplicate information

I have a database that contains a history of product sales. For example, the following table:

CREATE TABLE SalesHistoryTable (
OrderID INT,          -- Order number, unique across all orders
ProductID INT,        -- Can be used as a key to look up product info in another table
Price DECIMAL(10,2),  -- Price of the product per unit at the time of the order
Quantity INT,         -- Quantity of the product for the order
Total DECIMAL(10,2),  -- Total cost of the order for the product (Price * Quantity)
Date DATE,            -- Date of the order
StoreID INT,          -- The store that created the order
PRIMARY KEY (OrderID));

The table will eventually hold millions of transactions. From this, profiles can be created for products in different geographical regions (based on the StoreID). Creating these profiles as a database query can be very time consuming. For example:

SELECT ProductID, StoreID,
       SUM(Total) AS Total,
       SUM(Quantity) AS QTY,
       SUM(Total) / SUM(Quantity) AS AvgPrice
FROM SalesHistoryTable
GROUP BY ProductID, StoreID;

The above query could be used to get this information per product for any particular store. You could then determine which store has sold the most, which has made the most money, and which on average sells for the most/least. This would be very costly to run as a normal query at any time. What are some design decisions that would allow these types of queries to run faster, assuming storage size isn't an issue? For example, I could create another table with duplicate information (StoreID as the key, plus ProductID, TotalCost, QTY, AvgPrice) and provide a trigger so that when a new order is received, the entry for that store is updated in the new table. The cost for the update is almost nothing.

What should be considered when given the above scenario?
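To make the question concrete, here is a minimal sketch of the summary-table-plus-trigger idea. The `StoreProductSummary` table and the trigger name are hypothetical, and the trigger is written in SQLite syntax (run through Python's `sqlite3` so the example is self-contained); real trigger syntax varies by engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
CREATE TABLE SalesHistoryTable (
    OrderID   INTEGER PRIMARY KEY,
    ProductID INTEGER,
    Price     REAL,
    Quantity  INTEGER,
    Total     REAL,
    Date      TEXT,
    StoreID   INTEGER
);

-- Hypothetical summary table: one row per (StoreID, ProductID).
CREATE TABLE StoreProductSummary (
    StoreID   INTEGER,
    ProductID INTEGER,
    TotalCost REAL,
    QTY       INTEGER,
    PRIMARY KEY (StoreID, ProductID)
);

-- Fold each new order into the summary as it arrives.
-- INSERT OR IGNORE seeds the row on first sight; the UPDATE
-- then accumulates the running sums.
CREATE TRIGGER trg_sales_summary AFTER INSERT ON SalesHistoryTable
BEGIN
    INSERT OR IGNORE INTO StoreProductSummary (StoreID, ProductID, TotalCost, QTY)
    VALUES (NEW.StoreID, NEW.ProductID, 0, 0);
    UPDATE StoreProductSummary
    SET TotalCost = TotalCost + NEW.Total,
        QTY       = QTY + NEW.Quantity
    WHERE StoreID = NEW.StoreID AND ProductID = NEW.ProductID;
END;
""")

orders = [
    (1, 100, 2.50, 4, 10.00, "2023-01-01", 7),
    (2, 100, 2.50, 2,  5.00, "2023-01-02", 7),
    (3, 200, 9.00, 1,  9.00, "2023-01-02", 7),
]
cur.executemany("INSERT INTO SalesHistoryTable VALUES (?,?,?,?,?,?,?)", orders)

# AvgPrice is derived at read time (TotalCost / QTY) rather than
# stored, so it can never drift out of sync with the running sums.
rows = cur.execute("""
    SELECT StoreID, ProductID, TotalCost, QTY, TotalCost / QTY AS AvgPrice
    FROM StoreProductSummary ORDER BY ProductID
""").fetchall()
print(rows)  # [(7, 100, 15.0, 6, 2.5), (7, 200, 9.0, 1, 9.0)]
```

Note that storing the sums and deriving the average keeps the summary row a pure accumulator; storing AvgPrice directly would force the trigger to recompute it on every order anyway.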


This is normally something you would use a data warehouse for, but aside from that, using a trigger to update a second table is a perfectly viable option.

You could also have a second table that is populated by a batch job on a periodic basis (a more data-warehouse like option). You could also use a materialized view if your database supports them.
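The batch-job variant can be sketched as follows; the `StoreProductSummary` table and `refresh_summary` function are hypothetical, and SQLite (via Python's `sqlite3`) stands in for whatever engine the real job would run against.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SalesHistoryTable (
    OrderID INTEGER PRIMARY KEY, ProductID INTEGER, Price REAL,
    Quantity INTEGER, Total REAL, Date TEXT, StoreID INTEGER
);
CREATE TABLE StoreProductSummary (
    StoreID INTEGER, ProductID INTEGER,
    TotalCost REAL, QTY INTEGER, AvgPrice REAL,
    PRIMARY KEY (StoreID, ProductID)
);
""")
conn.executemany("INSERT INTO SalesHistoryTable VALUES (?,?,?,?,?,?,?)", [
    (1, 100, 2.50, 4, 10.00, "2023-01-01", 7),
    (2, 100, 2.50, 2,  5.00, "2023-01-02", 7),
])

def refresh_summary(conn):
    """Periodic batch job: discard the old snapshot and recompute it.
    Reads see slightly stale data between runs, but inserts into the
    history table never pay a per-row trigger cost."""
    with conn:  # one transaction, so readers never see an empty table
        conn.execute("DELETE FROM StoreProductSummary")
        conn.execute("""
            INSERT INTO StoreProductSummary
            SELECT StoreID, ProductID,
                   SUM(Total), SUM(Quantity),
                   SUM(Total) / SUM(Quantity)
            FROM SalesHistoryTable
            GROUP BY StoreID, ProductID
        """)

refresh_summary(conn)
rows = conn.execute("SELECT * FROM StoreProductSummary").fetchall()
print(rows)  # [(7, 100, 15.0, 6, 2.5)]
```

This trades freshness for write throughput: the expensive GROUP BY runs once per refresh instead of the summary being touched on every order.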


I'd consider:

  • a data warehouse/OLAP solution
  • (as you said) run your data mining queries against a separate precomputed table/dataset
  • indexed/materialised views, which are almost the same as the previous point

There are some questions though:

  • do you expect real time data?
  • what is your write volume?
  • what DB engine?


You may want to look into using materialized views, so the expensive aggregation query only has to run periodically, when the view is refreshed.


"The cost for the update is almost nothing."

Except that all updates must now be serialized. Because no matter what, the ancient law of physics still remains that no two things can be in the same place at the same time: every order for a given store and product must update the same summary row, so concurrent inserts queue up behind one another on that row's lock.
