True or False: Good design calls for every table to have a primary key, if nothing else, a running integer

2022-12-22 06:56 问答作者：

Consider a grocery store scenario (I'm making this up) where you h开发者_开发技巧ave FACT records that represent a sale transaction, where the columns of the Fact table include

SaleItemFact Table
------------------
CustomerID  
ProductID  
Price  
DistributorID  
DateOfSale  
Etc  
Etc  
Etc

Even if there are duplicates in the table when you consider ALL the keys, I would contend that a surrogate running numeric key (i.e. identity column) should be made up, e.g., TransactionNumber of type Integer.

I can see someone arguing that a Fact table might not have a unique key (though I'd invent one and waste the 4 bytes, but how about a dimension table?

First normal form requires a primary key on every table. So this is the bare minimum required for good database design. What you choose for the primary key is open to much debate. But first normal form for database design is not.

One reason, among many, to have a unique key per row (built from the data, or otherwise) is to facilitate updates or deletions to that specific row.

In any case, this question is kind of silly, because there really isn't an engineering trade-off at stake. There is no real proposed benefit to not having the key, so what's the point? True/yes, rows should have unique identifiers.

Because your question is under datawarehousing:

Dimension tables should have a surrogate (meaningless) primary key, usually an auto-increment integer; and a business key which uniquely identifies an object that the table row describes -- like an email address, full name or similar.
Fact tables mostly (almost always) have a primary key which is combination of two or more foreign keys.

There should be no duplicates in fact tables when combining foreign keys into the primary key. To test this, simply try to load the same transaction twice -- it should fail. An auto generated primary key will not prevent this, because it does not exists outside the warehouse. The problem can be usually solved by including the time-stamp into the primary key.

Sometimes a fact table is used as a dimension, or in a view that may act as a dimension. In this case it is convenient to have one (big)integer as a primary key, instead of several FK fields -- however, the original combination of FKs and time-stamp(s) should still uniquely identify the fact row.

For data warehouses, fact tables often have a composite primary key, usually the composite of all the foreign keys to your dimension tables.

It's rather common to not have any primary key in your fact tables as well, as they often serve no purpose other than wasting space - and for large datawarehouses the space can be quite big. Your dimension tables will have primary keys though.

If you're talking about the OLTP part of your grocery store, you would normally follow standard OLTP database design, normalize your tables and provide a primary key.

True. Think conceptually, everything is unique even if its not defined by unique data. So if you enter data into a table and they have the exact same information, they are still unique as they were entered twice.

Leaving that, you gain the benefit of being able to easily select, update, delete based on the id at a relatively low cost (4 bytes). Which arguably, the larger the table, the more useful the id is. So the 4 bytes becomes less and less of a point the larger the table :-)

继续阅读：data-warehouse database-design

True or False: Good design calls for every table to have a primary key, if nothing else, a running integer

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？