Database design: circular references
I have three database tables:
- users
- emails
- invitations
Emails are linked to users by a user_id field.
Invitations are also linked to users by a user_id field
Emails can be created without an invitation, but every invitation must have an email.
I would like to link the emails and invitations tables so it is开发者_StackOverflow中文版 possible to find the email for a particular invitation.
However this creates a circular reference, both an invitation and an email record hold the id for the same user.
Is this bad design and if so, how could I improve it?
My feeling is that with use of foreign keys and good business logic, it is fine.
users
-----
id
emails
------
id
users_id
invitations
-----------
id
users_id
emails_id
This is not a circular reference.
It would be if emails would have a strong integrity relationship to invitations and invitations an independent strong integrity relationship back to emails (for example).
EDIT: regarding the design
As Henk Holterman points out the question is if your design is normalized to desired extent.
Assiming tables: primary keys such as
users: id
emails: id, users_id
invitations: id, users_id, emails_id
and assuming foreign keys on table_id fields and that no other constraints are placed on the tables (such for example only a part of a key being unique) then you have modelled the following:
- for each user there can be several e-mails and you can not have emails with no corresponding user record
- for each email there can be several invitations and you can not have invitations with no corresponding e-mail nor user record (note: from the above definition we can not know if the user_id refers to entry in emails or in users)
Now only you can say if those rules correspond to the ones from the real world situation that you are trying to model.
One way to look at the database design is - there is actually no wrong database design, you can almost always find data that would make something that looks like an error justified. That's why without taking both rules (in form of sentences) and the tables (E-R diagram, description of tables and relationships) it is impossible to say if there is a problem in design (though it is possible to give suggestions from personal experience).
To illustrate - the above note that it is not clear which table user_id refers to might seem easy to answer. And the common answer, considering that you said that every invitation has a mail, is that it should refer to user_id from the mail table.
Otherwise there could exist an invitation for which user_id recorded on the invitation and the user_id recorded for a mail are different.
Normally, this should make a red light labelled 'normalize your data' go flashing in your mind. But, often unspoken assumption here is that the email_id determines the user_id, and that might not be true(!).
This depends on the semantics of your data (the predicate of each table) - for example if you are trying to model a situation where it is possible to send invitation to one person, and receive an e-mail reply from another person (for example inviting people through secretary and receiving direct replies), then the red light switches off and all is fine - that is what really happened and that is what you are going to allow for in your design.
I think the proper normal form here would be to let Invitation have a FK relation to Email and not to User. And I think that would work fine.
In your description you state that an Invitation belongs to a User but from the constraints it is more accurate to say that an Invitation belongs to an Email.
On the other hand, you are right, giving Invitation both an UserId and an EmailId would not be a great problem. It would not give much advantage either.
Every invitation must have an email, but emails can be created without invitations. Ok - but can emails be created without a user?
If an invitation must have an email and the email must also belong to a user, then the invitation is really related to a user. In this case, your current data model is OK and you don't need to add any fields - you just join invitations to emails via the user id.
If emails can exist without a user, then they are independent from both users and invitations. In this case you should have an "email-id" field in both "users" and "invitations" that link to the "emails" table. In this case though, you could end up with a situation where a user has one email, and his invitation has a separate (or duplicate) email.
Emails can be created without an invitation, but every invitation must have an email.
Can there be more than 1 invitation
per email
? If not, then your design is already wrong.
I would like to link the emails and invitations tables so it is possible to find the email for a particular invitation. However this creates a circular reference, both an invitation and an email record hold the id for the same user.
It's unclear (to me) whether the user_id
in invitation
refers to the same user_id
in email
. If so, then it's redundant and is a denormilization. Remove it, and use email
as your linking table to get back to the user_id
. If you require it for performance or such, create a unique constraint
across email.(id, user_id)
(already unique, since id
is your primary key) and a foreign key across both columns. That gives you the denormalization without sacrificing data integrity.
If it's a different user_id
, then you may want to improve your naming a bit. Something like, invited_user_id
and email.user_id
becomes email.sent_by_user_id
.
I think you are good about all the answers you have already, i just would not use table names in plural form.
It is confusing and unnecessary.
I share the opinion about the user_id not belonging to the invitations table, but just if you are really sure that you want to make a strong link between user_id and email_id.
If that's the case, then just remove the user_id from the invitations table, it is useless.
精彩评论