开发者

Which normal form does this table violate?

Consider this table:

   +-------+-------+-------+-------+  
   | name  |hobby1 |hobby2 |hobby3 |  
   +-------+-------+-------+-------+   
   | kris  | ball  | swim  | dance |  
   | james | eat   | sing  | sleep |  
   | amy   | swim  | eat   | watch |  
   +-------+-------+-------+-------+

There is no priority on the types of hobbies, thus all the hobbies belong to the same domain. That is, the hobbies in the table can be开发者_如何学编程 moved on any hobby# column. It doesn't matter on which column, a particular hobby can be in any column.

Which database normalization rule does this table violate?


Edit

Q. Is "the list of hobbies [...] in an arbitrary order"?

A. Yes.

Q. Does the table have a primary key?

A. Yes, suppose the key is an AUTO_INCREMENT column type named user_id.

The question is if the columns hobby# are repeating groups or not.


Sidenote: This is not a homework. It's kind of a debate, which started in the comments of the question SQL - match records from one table to another table based on several columns. I believe this question is a clear example of the 1NF violation.

However, the other guy believes that I "have fallen fowl of one of the fallacies of 1NF." That argument is based on the section "The ambiguity of Repeating Groups" of the article Facts and Fallacies about First Normal Form.

I am not writing this to humiliate him, me, or whomever. I am writing this, because I might be wrong, and there is something I am clearly missing and maybe this guy is not explaining it good enough to me.


You say that the hobbies belong to the same domain and that they can move around in the columns. If by this you mean that for any specific name the list of hobbies is in an arbitrary order and kriss could just as easily have dance, ball, swim as ball, swim, dance, then I would say you have a repeating group and the table violates 1NF.

If, on the other hand, there is some fundamental semantic difference between a particular person's first and second hobbies, then there may be an argument for saying that the hobbies are not repeating groups and the table may be in 3NF (assuming that hobby columns are FK to a hobby table). I would suggest that this argument, if it exists, is weak.

One other factor to consider is why there are precisely 3 hobbies and whether more or fewer hobbies are a potential concern. This factor is important not so much for normalization as for flexibility of design. This is one reason I would split the hobbies into rows, even if they are semantically different from one-another.


Your three-hobby table design probably violates what I usually call the spirit of the original 1NF (probably for the reasons given by dportas and others).

It turns out however, that it is extremely difficult to find [a set of] formal and precise "measurable" criteria that accurately express that original "spirit". That's what your other guy was trying to explain talking about "the ambiguity of repeating groups".

Stress "formal", "precise" and "measurable" here. Definitions for all other normal forms exist that satisfy "formal", "precise" and "measurable" (i.e. objectively observable). For 1NF it's just hard (/impossible ???) to do. If you want to see why, try this :

You stated that the question was "whether those three hobby columns constitute a repeating group". Answer this question with "yes", and then provide a rigorous formal underpinning for your answer.

You cannot just say "the column names are the same, except for the numbered suffix". Making a violation of such a rule objectively observable/measurable would require to enumerate all the possible ways of suffixing.

You cannot just say "swim, tennis" could equally well be "tennis, swim", because getting to know that for sure requires inspecting the external predicate of the table. If that is just "person <name> has hobby <hobby1> and also has <hobby2>" , then indeed both are equally valid (aside : and due to the closed world assumption it would in fact require all possible permutations of the hobbies to be present in the table !!!). However, if that external predicate is "person <name> spends the most time on <hobby1> and the least on <hobby2>", then "swim, tennis" could NOT equally well be "tennis,swim". But how do you make such interpretations of the external predicate of the table objective (for ALL POSSIBLE PREDICATES) ???

etc. etc.


This clearly "looks" like a design error.

It's not not a design error when this data is simply stored and retrieved. You need only 3 of the hobbies and you don't intend to use this data in any other way than retrieve.

Let's consider this relationship:

  • Hobby1 is the main hobby at some point in a person's life (before 18 years of age for example)
  • Hobby2 is the hobby at another point (19-30)
  • Hobby3 is her hobby at a another one.

Then this table seems definitely well designed and while the 1NF convention is respected the naming arguably "sucks".

In the case of an indiscriminate storage of hobbies this is clearly wrong in most if not all cases I can think of right now. Your table has duplicate rows which goes against the 1NF principles.

Let's not consider the reduced efficiency of SQL requests to access data from this table when you need to sort the results for paging or any other practical reason.

Let's take into consideration the effort required to work with your data when your database will be used by another developer or team:

  • The data here is "scattered". You have to look in multiple columns to aggregate related data.
  • You are limited to only 3 of the hobbies.
  • You can't use simple rules to establish unicity (same hobby only once per user).

You basically create frustration, anger and hatred and the Force is disturbed.


Well,

The point is that, as long as all hobby1, hobby2 and hobby3 values are not null, AND names are unique, this table could be considered more or less as abbiding by 1NF rules (see here for example ...)

But does everybody has 3 hobbies? Of course not! Do not forget that databases are basically supposed to hold data as a representation of reality! So, away of all theories, one cannot say that everybody has 3 hobbies, except if ... our table is done to hold data related to people that have three hobbies without any preference between them!

This said, and supposing that we are in the general case, the correct model could be

+------------+-------+
| id_person  |name   |
+------------+-------+  

for the persons (do not forget to unique key. I do not think 'name' is a good one

+------------+-------+
| id_hobby   |name   |
+------------+-------+ 

for the hobbies. id_hobby key is theoretically not mandatory, as hobby name can be the key ...

+------------+-----------+
| id_person  |id_hobby   |
+------------+-----------+  

for the link between persons and hobbies, as the physical representation of the many-to-many link that exists between persons and their hobbies.

My proposal is basic, and satisfies theory. It can be improved in many ways ...


Without knowing what keys exist and what dependencies the table is supposed to satisfy it's impossible to determine for sure what Normal Form it satisfies. All we can do is make guesses based on your attribute names.

Does the table have a key? Suppose for the sake of example that Name is a candidate key. If there is exactly one value permitted for each of the other attributes for each tuple (which means that no attribute can be null) then the table is in at least First Normal Form.


If any of the columns in the table accept nulls then then the table violates first normal form. Assuming no nulls, @dportas has already provided the correct answer.


The table does not violate first normal form.

First normal form does not have any prohibition against multiple columns of the same type. As long as they have distinct column names, it is fine.

The prohibition against "Repeating Groups" concerns nested records - a structure which is common in hierarchical databases, but typically not possible in relational databases.

The table using repeating groups would look something like this:

+-------+--------+  
| name  |hobbies |  
+-------+--------+
| kris  |+-----+ |  
|       ||ball | |
|       |+-----+ |
|       ||swim | |
|       |+-----+ |
|       ||dance| |
|       |+-----+ |
+-------+--------+
| james |+-----+ |  
|       ||eat  | |
|       |+-----+ |
|       ||sing | |
|       |+-----+ |
|       ||sleep| |
|       |+-----+ |
+-------+--------+
| amy   |+-----+ |  
|       ||swim | |
|       |+-----+ |
|       ||eat  | |
|       |+-----+ |
|       ||watch| |
|       |+-----+ |
+-------+--------+

In a table conforming to 1NF all values can be located though table name, primary key, and column name. But this is not possible with repeated groups, which require further navigation.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜