开发者

Database Normalization. To infinity and beyond?

Exactly how far do you normalize the example below, and exactly which level of normaliza开发者_如何学Pythontion does this example meet?

CREATE TABLE "public"."contact_info" (
  "id" SERIAL, 
  "home_phone" TEXT, 
  "mobile_phone" TEXT, 
  "work_phone" TEXT, 
  "fax_phone" TEXT, 
  "email" TEXT, 
  "line1" TEXT, 
  "line2" TEXT, 
  "city" TEXT, 
  "state_id" INTEGER, 
  "zipcode" TEXT, 
  "preferred_type" TEXT, 
  "first_name" TEXT, 
  "last_name" TEXT,
  CONSTRAINT "contact_info_pkey" PRIMARY KEY("id"), 
  CONSTRAINT "contact_info_fk_state_id" FOREIGN KEY ("state_id")
    REFERENCES "public"."states"("id")
    ON DELETE NO ACTION
    ON UPDATE NO ACTION
    NOT DEFERRABLE
) WITH OIDS;

It should be noted that each record can have only 0 or 1 home, mobile, work, or fax number records. Each phone number is fully functionally dependent upon my primary key. From what I know about database normalization, I think this meets 5NF, but fails 6NF.

Since the phone number definitions are set in stone (no new phone number types, no multiple home_phone entries, etc.), is there any reason I would want to split the phone numbers into a separate relation tracked by (contact_info_id, phone_number, type)?


Because you have defined this system to have only 0 or 1 telephone numbers of each type, your system is meeting a higher level of normalization than a more accurate system would have.

Personally, I would store phone numbers in a separate table with id, public id, type, area code, exchange, suffix and extension. If you ever need to select all with a certain one of these values, it would be much more painful to do programatically in SQL.

In the same way, you have chosen to break the person's name into first and last parts (however neglecting prefix, middle name and suffix). If you can break one column into logical parts, one could argue that you have not passed BCNF.

The choices you'll have to make regarding how far you want to normalize your table depend mainly on how you choose to access the data. For most people, breaking a column like phone numbers into multiple parts is overkill, but if any of my above points seem useful for your needs, then consider it.

PS. What are line1 and line2?


I would not normalize it. Nothing much to be gained. we aren't running databases on 8-bit computers with 8 mb of ram and a 32mb hard-drive anymore. This is specific to this example. Complex data structures should be normalized.

The only real space saver is the phone number, and its on average 10-char wide, if you normalize that out and replace it with a bigint-foreign-reference you don't seem to gain much but runtime joins :P, and on top of that you need a int that represents what type the phone number is.


A general rule of thumb on table design is: if you expect your table to contain a high percentage of NULL values for some columns, then these columns should probably be in separate tables. Anyway a denormalized design may be ok if you don't expect to have performance problems and it helps you when maintaining the database.


Given you say:

It should be noted that each record can have only 0 or 1 home, mobile, work, or fax number records.

... then your table design is a good match as it tightly binds all of these data values to this contact record.

Depending on the application, I could propose a more flexible design where a contact could have multiple addresses of different types, multiple phone numbers of different types and multiple email addresses for different purposes and even multiple phone numbers and/or email addresses to an address.

But then I've worked on several different applications only one of which seriously needed such flexibility (and no, we didn't actually get there): it was records for a school. Each contact record represented a single person and could be any or all of a student (or ex-student), a parent, a teacher, a login-id or some other person of interest. And, yes, we had one school where we very nearly had people in all of those categories!

However, in another application, we did not need this diversity: a software license was allocated to an email address and thus a person. There was one mailing address associated with the purchase and by extension the person and that was pretty much it. We aggregated licenses by email address (one person with multiple licenses) and later added a way to aggregate people with multiple email addresses (one person with multiple email addresses). Physical addresses were more tightly bound to billing information.


One thing that I would point out is that using TEXT columns is a bit of a waste, why not VARCHAR?

An advantage to normalization would be a reduction in duplication of code for validation. Another place where you anticipate future variation is if you needed to add a second address (we have work phone here, but address is assumed to be home address?).


is there any reason I would want to split the phone numbers into a separate relation tracked by (contact_info_id, phone_number, type)?

The likelihood of business rules changing to allow for multiple number types to be associated to a specific person. If you normalize the table, you'd only have to tweak constraints & there's be little to no changes in supporting code to accommodate the change. In current form, you'd have a major data model change and subsequent application changes - very expensive.

I'd like to point out that address (lines 1 & 2, city, state & zipcode) could also be normalized to support storing a persons home, business addresses. I could have numerous summer homes...


Why not put a seperate Table with 3 fields: id, type_of and number. Then you could get rid of your *_phone stuff. Something like:

id      type_of      number
1       home         222 11 22
1       work         312 12 12
2       mobile       345 23 23
2       home         233 65 23
2       work         945 30 19

I would sugest to do the same with email, because the person could have an email at home and one at work and an other some where else.


I'd split out the phone numbers - for all the reasons mentioned by others, PLUS - if your users ever want to search by phone number, it would be a real hassle to write the sql with this design (assuming you wanted to search across multiple types of phone numbers).

In a "contact info" table, it's very likely that users may want to do this type of search.


One could argue that zipcode defines (city, state_id), so those should be normalized as dependent columns. However, unless you are writing the app for USPS, this is typically not done. Same with your original question - having 4 phone records doesn't hurt too much, I'd keep the design this way.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜