Database design: objects with different attributes
I'm designing a product database where products can have very different attributes depending on their type, but the attributes are fixed for each type and the set of types is not user-manageable at all. E.g.:
magazine: title, issue_number, pages, copies, close_date, release_date
web_site: name, bandwidth, hits, date_from, date_to
I want to use InnoDB and enforce database integrity as much as the engine allows. What's the recommended way to handle this?
I hate those designs where tables have 100 columns and most of the values are NULL so I thought about something like this:
product_type
============
product_type_id INT
product_type_name VARCHAR
product
=======
product_id INT
product_name VARCHAR
product_type_id INT -> Foreign key to product_type.product_type_id
valid_since DATETIME
valid_to DATETIME
magazine
========
magazine_id INT
title VARCHAR
product_id INT -> Foreign key to product.product_id
issue_number INT
pages INT
copies INT
close_date DATETIME
release_date DATETIME
web_site
========
web_site_id INT
name VARCHAR
product_id INT -> Foreign key to product.product_id
bandwidth INT
hits INT
date_from DATETIME
date_to DATETIME
This can handle cascaded product deletion but... Well, I'm not fully convinced...
This is the classic impedance mismatch between an OO design and relational tables. The table design you've described is known as 'table per subclass'. The three most common designs are all compromises compared to what your objects actually look like in your app:
1. Table per concrete class
2. Table per hierarchy
3. Table per subclass
The design you don't like - "where tables have 100 columns and most of the values are NULL" - is table per hierarchy: one table that stores the whole specialization hierarchy. It is the least flexible for several reasons, one being that every new sub-class your app requires means adding columns. The design you describe accommodates change much better because you can extend it by adding a new sub-class table, identified by a value in product_type.
The remaining option, table per concrete class, is usually undesirable because all the common fields have to be duplicated in each specialization table. Its advantages are that you won't need to perform any joins and that the sub-class tables can even live on different database instances in a very large system.
The design you described is perfectly viable. The variation below is how it might look if you were using an ORM tool to do your CRUD operations. Notice how the ID in each sub-class table IS the FK value to the parent table in the hierarchy. A good ORM will automatically manage the correct sub-class table CRUD based on product.id and the discriminator value in product.product_type_id alone. Whether you are planning on using an ORM or not, look at Hibernate's joined sub-class documentation, if only to see the design decisions they made.
product
=======
id INT
product_name VARCHAR
product_type_id INT -> Foreign key to product_type.product_type_id
valid_since DATETIME
valid_to DATETIME
magazine
========
id INT -> Foreign key to product.id
title VARCHAR
..
web_site
========
id INT -> Foreign key to product.id
name VARCHAR
..
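As a rough sketch of the shared-key variation above, the following uses Python's built-in sqlite3 module as a stand-in for MySQL/InnoDB. That substitution is an assumption made only so the example runs without a server; the column sizes and sample values are made up, and only some of the magazine and web_site columns are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on;
# InnoDB enforces them by default.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE product_type (
    product_type_id   INTEGER PRIMARY KEY,
    product_type_name VARCHAR(50) NOT NULL UNIQUE
);
CREATE TABLE product (
    id              INTEGER PRIMARY KEY,
    product_name    VARCHAR(100) NOT NULL,
    product_type_id INTEGER NOT NULL
                    REFERENCES product_type (product_type_id),
    valid_since     DATETIME,
    valid_to        DATETIME
);
-- In each sub-class table the primary key is also the foreign key
-- to product.id, so a product has at most one row per sub-class table.
CREATE TABLE magazine (
    id           INTEGER PRIMARY KEY
                 REFERENCES product (id) ON DELETE CASCADE,
    title        VARCHAR(100) NOT NULL,
    issue_number INTEGER,
    pages        INTEGER
);
CREATE TABLE web_site (
    id        INTEGER PRIMARY KEY
              REFERENCES product (id) ON DELETE CASCADE,
    name      VARCHAR(100) NOT NULL,
    bandwidth INTEGER,
    hits      INTEGER
);
""")

# Sample data (invented for illustration).
conn.execute("INSERT INTO product_type VALUES (1, 'magazine')")
conn.execute(
    "INSERT INTO product VALUES (10, 'Example Monthly', 1, '2010-01-01', NULL)"
)
conn.execute("INSERT INTO magazine VALUES (10, 'Example Monthly', 42, 96)")

# A sub-class row without a parent product is rejected.
try:
    conn.execute("INSERT INTO magazine VALUES (99, 'Orphan', 1, 10)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting the product cascades to its magazine row.
conn.execute("DELETE FROM product WHERE id = 10")
remaining = conn.execute("SELECT COUNT(*) FROM magazine").fetchone()[0]

print(orphan_rejected, remaining)
```

On MySQL the tables would be declared with ENGINE=InnoDB and the constraints written as separate FOREIGN KEY (...) REFERENCES ... clauses, since MySQL parses but ignores the inline column-level REFERENCES form used here.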
You seem to be roughly on the right track, except that you may need to consider the difference between "a product" and what's often called "a stock-keeping unit" (SKU). Is a 25-unit box of paper clips (of a certain specific kind) the same "product" as a 50-unit box of them? In terms of a store, or any kind of inventory system, the distinction matters. In some cases, a mere difference in packaging of the same amount of the same underlying "product" may give you distinct SKUs to keep track of.
You need to decide where you want to keep track of this issue, if it matters to your application (it may be OK to have the products laid out as you do, and deal with packaging for SKU purposes in other tables, for example, even though for some apps that might be a slight overhead).
This is actually a standard way to "enforce" a sort of OO design in a classical RDBMS.
All the "common" attributes go on the master table (e.g. Price, if it is maintained at the product level, could easily be part of the main table) while the specifics go on a subtable.
In theory, if you have sub-sub-types (e.g. magazines could be subtyped into daily newspapers and four-colour periodicals, maybe, with periodicals having a date interval for shelf-life) you could add one or more sublevels too...
This is a pretty common (and proven) design. The only concern is that the master table will have to be joined with at least one subtable for most operations. If you have zillions of items this could have performance implications.
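To make that join concrete, here is a minimal sketch that reads a magazine back together with its master row. It runs on Python's built-in sqlite3 module purely for convenience (an assumption; the thread is about MySQL/InnoDB), the tables are cut down to a few columns, and the view name magazine_product is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    product_id   INTEGER PRIMARY KEY,
    product_name VARCHAR(100) NOT NULL
);
CREATE TABLE magazine (
    magazine_id  INTEGER PRIMARY KEY,
    product_id   INTEGER NOT NULL UNIQUE
                 REFERENCES product (product_id),
    title        VARCHAR(100) NOT NULL,
    issue_number INTEGER
);
-- Hypothetical view that hides the master/subtable join from callers.
CREATE VIEW magazine_product AS
SELECT p.product_id, p.product_name, m.title, m.issue_number
FROM product AS p
JOIN magazine AS m ON m.product_id = p.product_id;

-- Sample data (invented): product 2 is not a magazine.
INSERT INTO product VALUES (1, 'Example Monthly'), (2, 'Example Hosting');
INSERT INTO magazine VALUES (1, 1, 'Example Monthly', 42);
""")

# Every read of a full magazine goes through the join on product_id.
rows = conn.execute(
    "SELECT product_id, product_name, title, issue_number "
    "FROM magazine_product"
).fetchall()
print(rows)
```

The join runs on a key column, so with an index on magazine.product_id (the UNIQUE constraint provides one here) the cost per lookup should stay small; whether it matters at scale is something to measure rather than assume.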
On the other hand, common operations like deleting an item (I'd suggest a logical deletion, setting a flag to "true" on the master table) would be written once and work for every subtype.
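A logical deletion of that kind might look like the sketch below. The is_deleted column and the soft_delete helper are hypothetical names, and sqlite3 is used only so the example runs without a MySQL server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    product_id   INTEGER PRIMARY KEY,
    product_name VARCHAR(100) NOT NULL,
    is_deleted   INTEGER NOT NULL DEFAULT 0  -- hypothetical flag column
);
CREATE TABLE magazine (
    magazine_id INTEGER PRIMARY KEY,
    product_id  INTEGER NOT NULL REFERENCES product (product_id),
    title       VARCHAR(100) NOT NULL
);
CREATE TABLE web_site (
    web_site_id INTEGER PRIMARY KEY,
    product_id  INTEGER NOT NULL REFERENCES product (product_id),
    name        VARCHAR(100) NOT NULL
);

-- Sample data (invented): one product of each subtype.
INSERT INTO product (product_id, product_name)
VALUES (1, 'Example Monthly'), (2, 'Example Hosting');
INSERT INTO magazine VALUES (1, 1, 'Example Monthly');
INSERT INTO web_site VALUES (1, 2, 'example.com');
""")


def soft_delete(connection, product_id):
    """Flag a product as deleted; subtype rows are left untouched."""
    connection.execute(
        "UPDATE product SET is_deleted = 1 WHERE product_id = ?",
        (product_id,),
    )


# The same call works whether the product is a magazine or a web site.
soft_delete(conn, 1)

live = [
    row[0]
    for row in conn.execute(
        "SELECT product_name FROM product "
        "WHERE is_deleted = 0 ORDER BY product_id"
    )
]
magazine_rows = conn.execute("SELECT COUNT(*) FROM magazine").fetchone()[0]
print(live, magazine_rows)
```

The trade-off is that every query for live products has to filter on the flag, which is easy to forget unless it is wrapped in a view or a helper.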
Anyway, go for it. And maybe google around for "Object oriented to RDBMS mappings" or somesuch for a complete discussion.