True or False: Good design calls for every table to have a primary key, if nothing else, a running integer
Consider a grocery store scenario (I'm making this up) where you h开发者_开发技巧ave FACT records that represent a sale transaction, where the columns of the Fact table include
SaleItemFact Table
------------------
CustomerID
ProductID
Price
DistributorID
DateOfSale
Etc
Etc
Etc
Even if there are duplicates in the table when you consider ALL the keys, I would contend that a surrogate running numeric key (i.e. identity column) should be made up, e.g., TransactionNumber of type Integer.
I can see someone arguing that a Fact table might not have a unique key (though I'd invent one and waste the 4 bytes, but how about a dimension table?
First normal form requires a primary key on every table. So this is the bare minimum required for good database design. What you choose for the primary key is open to much debate. But first normal form for database design is not.
One reason, among many, to have a unique key per row (built from the data, or otherwise) is to facilitate updates or deletions to that specific row.
In any case, this question is kind of silly, because there really isn't an engineering trade-off at stake. There is no real proposed benefit to not having the key, so what's the point? True/yes, rows should have unique identifiers.
Because your question is under datawarehousing:
Dimension tables should have a surrogate (meaningless) primary key, usually an auto-increment integer; and a business key which uniquely identifies an object that the table row describes -- like an email address, full name or similar.
Fact tables mostly (almost always) have a primary key which is combination of two or more foreign keys.
There should be no duplicates in fact tables when combining foreign keys into the primary key. To test this, simply try to load the same transaction twice -- it should fail. An auto generated primary key will not prevent this, because it does not exists outside the warehouse. The problem can be usually solved by including the time-stamp into the primary key.
Sometimes a fact table is used as a dimension, or in a view that may act as a dimension. In this case it is convenient to have one (big)integer as a primary key, instead of several FK fields -- however, the original combination of FKs and time-stamp(s) should still uniquely identify the fact row.
For data warehouses, fact tables often have a composite primary key, usually the composite of all the foreign keys to your dimension tables.
It's rather common to not have any primary key in your fact tables as well, as they often serve no purpose other than wasting space - and for large datawarehouses the space can be quite big. Your dimension tables will have primary keys though.
If you're talking about the OLTP part of your grocery store, you would normally follow standard OLTP database design, normalize your tables and provide a primary key.
True. Think conceptually, everything is unique even if its not defined by unique data. So if you enter data into a table and they have the exact same information, they are still unique as they were entered twice.
Leaving that, you gain the benefit of being able to easily select, update, delete based on the id at a relatively low cost (4 bytes). Which arguably, the larger the table, the more useful the id is. So the 4 bytes becomes less and less of a point the larger the table :-)
精彩评论