Database Modelling - Conceptually different entities with near identical fields
Suppose you have two sets of conceptual entities:
- MarketPriceDataSet which has multiple ForwardPriceEntries
- PoolPriceForecastDataSet which has multiple PoolPriceForecastEntry
Both different child objects have near identical fields:
ForwardPriceEntry has
- StartDate
- EndDate
- SimulationItemId
- ForwardPrice
- MarketPriceDataSetId (foreign key to parent table)
PoolPriceForecastEntry has
- StartDate
- EndDate
- SimulationItemId
- ForecastPoolPrice
- PoolPriceForecastDataSetId (foreign key to parent table)
If I modelled them as separate tables, the only difference would be the foreign key, and the name of the price field.
There has been a debate as to whether the two near identical tables should be merged into one.
Options I've thought of to model this is:
- Just keep them as two independent, separate tables
- Have both sets in the one table with an additional "type" field, and a parent_id equalling a foreign key to either parent table. This would sacrifice referential integrity checks.
- Have both sets in the one table with an additional "type" field, and create a complicated sequence of joining tables to maintain referential integrity.
What do 开发者_高级运维you think I should do, and why?
Other information which may or may not be relevant:
The two sets of data are loosely related: one set of each will be loaded into memory for batch processing, and occasionaly a set of PoolPriceForecastEntries will be generated by copying from ForwardPriceEntries.
MarketPriceDataSet and PoolPriceForecastDataSet do have differing fields. It would be possible to merge them into one table, but then you would have fields that are meaningless in half the entries.
"There has been a debate as to whether the two near identical tables should be merged into one."
Has there ?
In database design, tables should remain separate if their predicates are different. That is, if the meaning attached to the rows in the table is different. Whether the set of columns is almost or even completely equal, is immaterial.
Very often, in the situation you described, the subject being modeled fits a pattern called "generalization specialization". The fact that two prior answers discuss subtypes supports this. An object model tends to deal with gen-spec by using subtypes and inheritance.
The relational model can also deal with gen-spec, but it's a little more intricate and is not usally taught in database primers. What you do is have several tables, one for each subtype, holding a key and non key attributes that are peculiar to the subtype. Data that is relevant to the supertype, and inherited by all the subtypes, is generally consolidated into a single table.
There is a trick you can play with the keys to these tables that cuts down on both storage and processing. In the subtype tables, use the foreign key back to the supertype entry as the primary key to the subtype table. It's guaranteed to exist, and it's guaranteed to be unique. Why bother to invent yet another key?
If you search on "generalization specialization relational modeling" you'll get about a dozen articles on the subject. some of them are quite good. You can also follow the tag: class-table-inheritance
Since the entities share so many attributes you could think of them as subtypes.
Logically you would have a supertype of PriceEntry and subtypes of ForwardPriceEntry and PoolPriceForecastEntry. One question to answer would be whether the price is common to the supertype. I'll assume it is.
Now the question is how to physically implement the subtype. There are 3 approaches that you can take:
- Create a table for each subtype (Rolldown)
- Create a single table with all attributes (Rollup)
- Create one supertype table and a subtype table for each subtype (Identity)
Each of these approaches has pros and cons. For a discussion of the pros and cons of each approach see Getting Physical with Subtypes.
In this case, since the subtypes share so many attributes, you could take the rollup approach and create a single table.
The PriceEntry table could look like this:
PriceEntryId (PK)
PriceEntryTypeCode (NN)
StartDate (NN)
EndDate
SimulationItemId (NN)
Price (NN)
MarketPriceDataSetId (FK)
PoolPriceForecastDataSetId (FK)
You can still enforce the FK on the MarketPrice and PoolForecast columns. You could also add a table level check constraint to ensure that at least one of the FKs are populated.
However, since there is only one attribute that is different between the 2 subtypes, many of the pros and cons do not strongly point in one direction. So, in the end, I would probably opt for keeping the data model easy to understand and use. For me, the two table (rolldown) approach strikes a nice balance; conceptually it's easier to understand than the rollup and when developing it's easier to use than identity (no joins or multiple inserts/updates).
If it's your intention to use the data from the two sub-types in roughly the same way I'd suggest putting all the data into a single table housing all five of the fields...
- StartDate
- EndDate
- SimulationItemId
- ForwardPrice
- MarketPriceDataSetId (foreign key to parent table)
... and place a null into either the ForwardPrice or ForecastPoolPricedepending field pending the parent entity a particular row is supporting. The space cost of the null on each row would be virtually non-existent and compression success would be very high on the table but for indexing and reads your performance would be impressive.
精彩评论