Database design and foreign keys: Where should they be added in related tables?
I have a relatively simple subset of tables in my database for tracking something called sessions. These are academic sessions (think offerings of a particular program). The tables to represent a sessions information are:
sessions
session_terms
session_subjects
session_mark_item_info
session开发者_开发技巧_marks
All of these tables have their own primary keys, and are like a tree, in that sessions have terms, terms have subjects, subjects have mark items, etc. So each on would have at least its "parent's" foreign key.
My question is, design wise is it a good idea to include the sessions primary key in the other tables as a foreign key to easily select related session items, or is that too much redundency?
If I include the session foreign key (or all parent foreign keys from tables up the heirarchy) in all the tables, I can easily select all the marks for a session. As an example, something like
SELECT mark FROM session_marks WHERE sessionID=...
If I don't, then I would have to combine selects with something like
WHERE something IN (SELECT...
Which approach is "more correct" or efficient?
Thanks in advance!
The second approach is more correct. And actually to pull the session information you would join tables, do not be afraid to JOIN that is the whole point of relational databases. You do not want to repeat yourself (normalization). So you would only keep a reference to the parent, and not the parent.parent.
This question comes up a lot for beginners, they think creating the same key in sub sub tables is going to make their life easier as there select can then just become:
SELECT blah FROM MyTableSubSub WHERE SessionID=340
The problem is you are introducing repeatitive data inside of tables that probably do not need to know their parents parent. In fact, somewhere in your design you can find this information by Joining to another table. For instance:
SELECT blah FROM MyTableSubSub mtss INNER JOIN ParentTable p ON p.ID = mtss.ID...
Once you are at that point you can then find the session id i nthe parent table. So do NOT repeate columns in tables that are related.
My question is, design wise is it a good idea to include the sessions primary key in the other tables as a foreign key to easily select related session items, or is that too much redundency?
Design-wise, this would add redundant data, and would break normalization. In general you should not be adding redundant fields just to avoid JOINs.
What you describe is called denormalization. In some cases, denormalization helps cover up the inefficiencies inherent in relational database software, but this comes with many trade-offs that are often more important in the vast majority of situations.
I agree with all the prior posts (lots of upvotes there).
One reason you might be considering addng the extra column(s) is that, without them, you end up writing a lot of redundant code (five tables, four inner joins, once or more per database call, yuk). One work-around for this issue is to create a view that "pre-joins" the tables; with this in place, you wouldn't have to rewrite the N-way join each time you had to query down N levels. Some psuedo-code:
CREATE VIEW MyHierarchy AS
SELECT
sessions_id
session_terms_id
session_subjects_id
session_mark_item_info_id
session_marks_id
from sessions
inner join session_terms on ids
inner join session_subjects on ids
inner join session_mark_item_info on ids
inner join session_marks on ids
(Use left outer joins if parent items do not always have child items.)
As an example, to get the mark_item_info for a given session_term, you'd run something like
SELECT mii.*
from MyHIerarchy mh
inner join session_mark_item_info mii
on mii.session_mark_item_info = mh.session_mark_item_info
where mh.session_term_id = @session_term_id
thus saving a chunk of coding time. And, check the query optimization plan but I'm 95% sure of this, the tables mentioned in the view but not actually used in the query would be "dropped" from the execution plan since they weren't required.
Your instincts are good. You certainly don't want many (or any) redundant foreign keys around to confuse things. Redundant data should be avoided. In many projects though, there is a couple or small handful of places where reduntant data can really simplify things elsewhere. If this is the once duplicated foreign key, and you get a lot of simplicity for it elsewhere, then do it. If you do go this route, then your challenge is making it clear. Two things you can do:
- have these superflous foreign keys way down the field list in each table, so a developer looking at it later will get a hint that this is extra data, and not the model's real foreign key relationship.
- Name the field something different, not sessionID. Call it . . . ParentSession. Calling it SessionID will be a problem.
But remember, these modern relational DB's are good at joining, as long as your indexes are good. Only do this if it really makes things simpler.
The best design approach is really to have each of the tables have a foreign key reference to its immediate parent. Having keys of all the parents above the table, will only cause redundancy, and is not advisable. Your concern about performance can be done away with, if you take care to index the tables (in this case you wouldn't need indexing since it is the primary key that you are referencing in the tables below).
And also , using WHERE something IN(Select ..) will cause performance problems for sure. I suggest you can use/customize joins with the schema that i have suggested to create any kind of query for the tree in your database. As an example. For your requirement of finding sesssion marks in a session, you will write
Select * from session_marks
from session, session_terms, session_subjects, session_mark_item_info,session_marks
Where
session_marks.parent_id = session_mark_item_info.id
and session_mark_item_info.parent_id = session_subjects.id
and session_subjects.parent_id = session_terms.id
and session_terms.parent_id = session.id
and session.id = <some value>
Since the joins are on the indexed(in fact primary) columns, they will be super-fast. You dont have to worry about the performance.
精彩评论