开发者

Database design and foreign keys: Where should they be added in related tables?

I have a relatively simple subset of tables in my database for tracking something called sessions. These are academic sessions (think offerings of a particular program). The tables to represent a sessions information are:

sessions
  session_terms
    session_subjects
      session_mark_item_info
        session开发者_开发技巧_marks

All of these tables have their own primary keys, and are like a tree, in that sessions have terms, terms have subjects, subjects have mark items, etc. So each on would have at least its "parent's" foreign key.

My question is, design wise is it a good idea to include the sessions primary key in the other tables as a foreign key to easily select related session items, or is that too much redundency?

If I include the session foreign key (or all parent foreign keys from tables up the heirarchy) in all the tables, I can easily select all the marks for a session. As an example, something like

SELECT mark FROM session_marks WHERE sessionID=...

If I don't, then I would have to combine selects with something like

WHERE something IN (SELECT...

Which approach is "more correct" or efficient?

Thanks in advance!


The second approach is more correct. And actually to pull the session information you would join tables, do not be afraid to JOIN that is the whole point of relational databases. You do not want to repeat yourself (normalization). So you would only keep a reference to the parent, and not the parent.parent.

This question comes up a lot for beginners, they think creating the same key in sub sub tables is going to make their life easier as there select can then just become:

SELECT blah FROM MyTableSubSub WHERE SessionID=340

The problem is you are introducing repeatitive data inside of tables that probably do not need to know their parents parent. In fact, somewhere in your design you can find this information by Joining to another table. For instance:

SELECT blah FROM MyTableSubSub mtss INNER JOIN ParentTable p ON p.ID = mtss.ID...

Once you are at that point you can then find the session id i nthe parent table. So do NOT repeate columns in tables that are related.


My question is, design wise is it a good idea to include the sessions primary key in the other tables as a foreign key to easily select related session items, or is that too much redundency?

Design-wise, this would add redundant data, and would break normalization. In general you should not be adding redundant fields just to avoid JOINs.

What you describe is called denormalization. In some cases, denormalization helps cover up the inefficiencies inherent in relational database software, but this comes with many trade-offs that are often more important in the vast majority of situations.


I agree with all the prior posts (lots of upvotes there).

One reason you might be considering addng the extra column(s) is that, without them, you end up writing a lot of redundant code (five tables, four inner joins, once or more per database call, yuk). One work-around for this issue is to create a view that "pre-joins" the tables; with this in place, you wouldn't have to rewrite the N-way join each time you had to query down N levels. Some psuedo-code:

CREATE VIEW MyHierarchy AS
SELECT 
  sessions_id
  session_terms_id
  session_subjects_id
  session_mark_item_info_id
  session_marks_id
from sessions
inner join session_terms on ids
inner join session_subjects on ids
inner join session_mark_item_info on ids
inner join session_marks on ids

(Use left outer joins if parent items do not always have child items.)

As an example, to get the mark_item_info for a given session_term, you'd run something like

SELECT mii.*
 from MyHIerarchy mh
  inner join session_mark_item_info mii
   on mii.session_mark_item_info = mh.session_mark_item_info
 where mh.session_term_id = @session_term_id

thus saving a chunk of coding time. And, check the query optimization plan but I'm 95% sure of this, the tables mentioned in the view but not actually used in the query would be "dropped" from the execution plan since they weren't required.


Your instincts are good. You certainly don't want many (or any) redundant foreign keys around to confuse things. Redundant data should be avoided. In many projects though, there is a couple or small handful of places where reduntant data can really simplify things elsewhere. If this is the once duplicated foreign key, and you get a lot of simplicity for it elsewhere, then do it. If you do go this route, then your challenge is making it clear. Two things you can do:

  • have these superflous foreign keys way down the field list in each table, so a developer looking at it later will get a hint that this is extra data, and not the model's real foreign key relationship.
  • Name the field something different, not sessionID. Call it . . . ParentSession. Calling it SessionID will be a problem.

But remember, these modern relational DB's are good at joining, as long as your indexes are good. Only do this if it really makes things simpler.


The best design approach is really to have each of the tables have a foreign key reference to its immediate parent. Having keys of all the parents above the table, will only cause redundancy, and is not advisable. Your concern about performance can be done away with, if you take care to index the tables (in this case you wouldn't need indexing since it is the primary key that you are referencing in the tables below).

And also , using WHERE something IN(Select ..) will cause performance problems for sure. I suggest you can use/customize joins with the schema that i have suggested to create any kind of query for the tree in your database. As an example. For your requirement of finding sesssion marks in a session, you will write

Select * from session_marks
from session, session_terms, session_subjects, session_mark_item_info,session_marks

Where 
session_marks.parent_id = session_mark_item_info.id
and session_mark_item_info.parent_id = session_subjects.id
and session_subjects.parent_id = session_terms.id
and session_terms.parent_id = session.id
and session.id = <some value>

Since the joins are on the indexed(in fact primary) columns, they will be super-fast. You dont have to worry about the performance.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜