Database normalization design - single or multiple tables
Should this be represented in the database as 1 table or 3 tables? I and my friend have different opinions about this so I'd like to see the general views on this. (Maybe it should be a vote for either solution?)
Create Table Order
// Basic fields of the table
- ID (Primary key)
- CustomerID (integer, with a FK)
- Quantity
- ProductID (integer, with a FK)
// Then depending on user selection, either these fields need to be specified
// (could be factored out to a separate table):
{
- InternalAccountID (integer, with a FK)
- InternalCompanyID (integer, with a FK)
}
// Or these (could be factored out to a separate table):
{
- ExternalAccountNumber (free text string)
- ExternalCompanyName (free text string)
- ExtraInformation (free text string)
}
1 table approach:
Pros:
- performance (one insert as opposed to two, FK check, no joins)
- probably takes less space (the extra tables have overhead + indexes + extra ID field)
- one table as opposed to three
- hardly justifiable to have split out to new tables just for 2+3 fields (or what?)
Cons:
- Nullable fields
- Potentially extra "type" column (can be skipped) 开发者_如何学编程
- Breaks 3NF (?)
Pros and cons kindly requested as well as personal opinions. :)
EDIT: I tried simplifying the example by using different entities than I am actually using so any suggestions to altering the model wouldn't really help me. I.e. focus on the technical aspects more than the domain model please.
Hope this is self-explanatory.
My opinion would be that if
// Then depending on user selection, either these fields need to be specified
// (could be factored out to a separate table):
{
- InternalAccountID (integer, with a FK)
- InternalCompanyID (integer, with a FK)
}
// Or these (could be factored out to a separate table):
{
- ExternalAccountNumber (free text string)
- ExternalCompanyName (free text string)
- ExtraInformation (free text string)
}
are always 1:1 with an order (i.e., you can't have 3 accountIDs), then leave it as one table. To take care of your null issue, you could add one more column called InternalCustomer (boolean) or CustomerType (varChar) that you could use to define an internal or external customer to know which of the two sets of fields you should look at for a specific customer.
Since we don't know the full use of this data or the schema for the entire DB, any response on this can't really be fully qualified.
If you want to avoid data duplication, you should go with a 2- or 3-table solution. For example, if you have the External
columns in the Order table, value could exist multiple times. If the data looks like this:
ID ExternalCompanyName
1 ACME
2 ACME
3 My Company
4 ACME
Now, if ACME changes names to ACME, Inc. you must update many rows. If the tables are normalized, where external companies are in a separate table, you would update one row. Note, there may be an argument to put Account Number in it's own table, but we'll leave that for extreme normalization.
It doesn't appear to be a 1-to-1 relationship between an order and a company/account, unless each company/account can only have one order. it sounds more like a 1-to-many relationship.
Now, what happens if a mistake is made when updating the ExternalCompanyName in a single-table environment, and only some of the rows get updated. You have some rows with ACME and some rows with ACME, Inc. You end up with a bad-data situation.
Also, if this is really a 1-to-many relationship, you really aren't saving space. You are duplicating data in the order, rather than storing it once in another table.
As the volume increases selection from two tables may be a lot faster than one. Sometimes this kind of refactoring (partition) is done on mature databases to increase performance.
Imagine using this for a multi table join, where some criteria are on this table, but others are in different tables.
select from order join customer using (customer_id)
where
order.order_date between ? and ?
and customer.name = ?
It may end up fetching all order
rows for the dates from disk, then throwing many of them away because they don't match the join. This fetch from disk is bound to be slow and may to spoil your RAM cache.
select from order join order_detail using (order_id) join customer using (customer_id)
where
order.order_date between ? and ?
and customer.name = ?
In this case when it loads all order
rows from disk it's not going to hurt as bad as previously, because the table is narrower and smaller. It doesn't need to load all the lengthy fields which are irrelevant for filtering. Eventually, after join to customer
, it will only fetch those order_detail
rows which match all criteria.
If you expect this to be large, you should consider splitting the table so that the fields which are most critical for searches are in one table, and "data" fields in other one-to-one table(s).
The bottom line is: Normal form and domain is one thing, but performance often requires tradeoffs. You can hide some of them (cover the split with a view), but not all (duplicate/aggregate fields for the sake of faster selects).
I would absolutely not go with the 3-table solution. By breaking this data into 3 tables, you really can't have any queries return a full order header without joining with the foreign key, and every insert of a new order updates multiple tables and indexes, which is a problem for concurrency. I would suggest using 2 tables, one for InternalOrders and one for ExternalOrders. For those cases where you need a consolidated query of data from both sets of orders, define a view that is the union of both tables.
I am surprised to see product id and quantity as part of the order header. Every order tracking database I've ever seen breaks out order items as a separate table, using the order id as a foreign key, so that a single order can include multiple products (or the same product with various quantities, delivery times, etc.).
I'm no purist, so 3nf is good when it makes sense...but you don't have to take it for granted that it always will.
From a pragmatic standpoint, what is your goal? Your pros/cons list is a good start. I'd add a few more ideas to the list -- as you see fit.
1) Will any other table in your database need to relate (e.g., join) to this data? That's kind of the point of an RDB.
2) Will your database grow? Even if 1 table makes sense now, will it always make sense? You'll regret it, if you find yourself wanting to add more tables, and your non-normalized table is forcing you to 'work-around' it, dealing with extra rows returned, slower execution times, etc.
3) What happens when your customer gets a new external account, or what have you. Will you create a brand-new record? How will you answer questions like "What is customer so-and-so's account number?".
...
I think in general, i go for scalable, which in this case may mean 3nf. 1 table is easier to deal with in a very narrow scope, but if anything ever changes, you'll be dealing with "How do I split this table into properly related 3nf tables, without messing up all of the dependencies that have been created on it?". That one is no fun.
Is account information associated with the customer before he can order (i.e. you have enother table where you track which account ID-s given CustomerID can use)? Can you abstract all accounts to a reasonably uniform schema (that one can have a few nulls) fo that you have one universal AccountId (surrogate key) and then Account's table has say 3 varchar fields and one that tracks the kind of the account (used for billing etc.) ?
If you can do that then your order tracks just one AccountID since the order (as an entity) really doesn't care which payment method was used - it only cares that it's a legit/existing/approved AccountId for that user. Everything else is someone else's business so to speak (billing or checking funds etc.) and that enity and it's processing will need more data anyway.
This keeps your Order clean and null-free and facilitates separation of concerns as well.
Conceptually, your Order is really so called fact table - carrying only numbers and FK-s, small in item size but with a huge number of them.
So:
Table Order (
- OrderId
- Quantity
- ProductId
- DiscountId -- sonner or latter :-)
- AccountId
- PaymentStatus -- probaly FK as well or predefined constant
)
Table Account (
- AccountId
- BillingInfo -- akka ext acct number as text
- PrincialName -- akka ext company name, some equivalent for internal acct-s
- AdditionalData
)
精彩评论