Database design: columns with duplicate meaning
maybe the title of the question don't describe my problem very well but here it is :
let's say i have a table article that look like this:
+ title
+ author
.
.
.
+ status [choices : ('draft', ..., 'translate')]
And let's say in my business process i publish in my web page articles that have [status = 'translate']
Is it a good design decision to add another field :
+ read [bool]
to 开发者_如何学Cmy table that's mean that the article is ready to be publish or is it a bad design because i can test on the status == 'translate' for that and the new field will be just a duplicate ??
i hope that my question is clear , and thanks in advance.
Here's a fundamental DB design concept (it's actually a part of making your table comply with 3NF): Non of your columns should depend on anything but the primary key of the table. That should answer your question.
Here's a good quote to remember that:
every non-key attribute
"must provide a fact about the key, the whole key, and nothing but the key so help me Codd".
(that's also from wiki)
The reason for that, is that breaking this law might compromise Data Integrity.
Bad design.
First, what you do have here is a field that is basically the current state of a state engine.
Second, status should be a separate table - do NOT put status texts in the same table. You can then add additional info for every possible status to the status table.
Duplicate. If you can manage without a column, don't use it.
Think about the overhead it adds to your database (and besides, a boolean column cannot be indexed, so it also won't increase yourperformance).
(And of course, replace the status strings with numeric values).
Good luck.
In order for there not to be a potential data integrity conflict, you could make the "ready" column a computed column, or you could make a view which provides this translation service.
However, for this particular design, I would put the states into a table and have an IsReady column in the state table. Then you could add different states which are all IsReady. I have used designs like this many times, where certain states are equivalent for some operations, but not for others. Each has a flag. In my particular case, many batches in different states were allowed to be counted as "successful" for average timing/performance purposes, but batches which had completely successfully but were later invalidated were not considered "successful", etc.
This case has a name in normalization theory. It's called "harmful redundancy". Here are some potential drawbacks to harmful redundancy: the database can contradict itself; too much space is wasted; too much time is wasted.
Contradictions in the database gets the most air time in tutorials on database design. You have to take some measures to prevent this situation from arising, or live with the consequences. You can rely on careful programming to keep contradictions out of the database, or you can declare constraints that will prevent any transaction from leaving the database in a contradictory state.
The waste of space is usually, but not always, a trivial cost. Wasted space can result in wasted time as a consequence.
The waste of time is the one that preoccupies programmers the most. But here, the issue gets to be more subtle. Sometimes "harmful redundancy" results in saving time, not wasting it. Most often, it results in extra time during updates, but time saving during retrieval. Often, the time savings or wastage is trivial, and therefore so is the design decision, from the point of view of speed.
In your case, the speed consequences should be minimal. Here's a hint: how often do you update rows? How often do your read them? how important is speed in updating or reading? If the time you gain during reading has more weight for yuo than the time you spend updating, then go for it.
精彩评论