Using IDs from multiple tables in a single column
The system rules:
- Departments can have 0 to many divisions.
- A division must belong to only one department.
- An article can be assigned to either a department, or a division of that department.
One of my co-workers created this:
Department
----------
DepartmentID (PK) int NOT NULL
DepartmentName varchar(开发者_运维问答50) NOT NULL
Division
--------
DivisionID (PK) int NOT NULL
DepartmentID (FK) int NOT NULL
DivisonName varchar(50) NOT NULL
Article
-------
ArticleID (PK) int NOT NULL
UniqueID int NOT NULL
ArticleName varchar(50) NOT NULL
All DepartmentIDs would be between 1 and 100, and all DivisionIDs would be between 101 and 200. He states that when querying the Article table, you will know whether the UniqueID is from the Department table or the Division table based on what range it falls into.
That is a poor design and I proposed the following:
Department
----------
DepartmentID (PK) int NOT NULL
ParentDepartmentID (FK) int NULL /* Self-referencing foreign key. Divisions have parent departments. */
DepartmentName varchar(50) NOT NULL
Article
-------
ArticleID (PK) int NOT NULL
DepartmentID (FK) int NOT NULL
ArticleName varchar(50) NOT NULL
This is a properly normalized schema and properly enforces relationships and data integrity, while honoring the business rules.
Using one column to contain values from two domains is poor design, and I can argue the benefits of the foreign key in the Article table. However, what is a reference to a specific database design article / paper to back up my position?
Your coworker has implemented a design called Polymorphic Associations. That is, the "foreign key" refers to one of two different parent tables. Most people add another column parent_type
or something like that so that you can tell which parent table a given row references. In your coworker's case, he has instead subdivided the range of id's. This is a fragile design, because you can't enforce it at the database level. If you ever insert a department number > 100, you can't know if your articles apply to a department or a division.
Whereas you have developed a design that looks like Single Table Inheritance, where you store multiple related types in a single table, so the primary keys are assured to remain unique, and the Articles can reference any instance of any of the related types.
Here's another alternative:
Think of object-oriented design. If you wanted to allow two different classes to have articles, you might make a common superclass or a common interface for the two classes. You can do the same thing in SQL:
ArticleProducer
---------------
ProducerID (PK) int NOT NULL
Department
----------
DepartmentID (PK) int NOT NULL, (FK)->ArticleProducer
DepartmentName varchar(50) NOT NULL
Division
--------
DivisionID (PK) int NOT NULL, (FK)->ArticleProducer
DepartmentID (FK) int NOT NULL
DivisonName varchar(50) NOT NULL
Article
-------
ArticleID (PK) int NOT NULL, (FK)->ArticleProducer
UniqueID int NOT NULL
ArticleName varchar(50) NOT NULL
So an article must be produced by a single ArticleProducer
. Each Department or Division is an ArticleProducer.
See also Why can you not have a foreign key in a polymorphic association?
For more about Polymorphic Associations, see my presentation Practical Object-Oriented Models in SQL, or my book, SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
Re comments from Erwin Smout:
You're right, trying to enforce that no more than one row from all the subtype tables is a bit tricky. MySQL does not support CHECK constraints in any storage engine, unfortunately. You can achieve something similar with lookup tables:
CREATE TABLE ArticleProducerTypes (ProducerType TINYINT UNSIGNED PRIMARY KEY);
INSERT INTO ArticleProducerTypes VALUES (1), (2);
CREATE TABLE ArticleProducer (
ProducerID INT UNSIGNED NOT NULL PRIMARY KEY,
ProducerType TINYINT UNSIGNED NOT NULL,
UNIQUE KEY (ProducerID,ProducerType),
FOREIGN KEY (ProducerType)
REFERENCES ArticleProducerTypes(ProducerType)
) ENGINE=InnoDB;
CREATE TABLE DepartmentProducerType (ProducerType TINYINT UNSIGNED PRIMARY KEY);
INSERT INTO DepartmentProducerType VALUES (1);
CREATE TABLE Department (
DepartmentID INT UNSIGNED NOT NULL PRIMARY KEY,
DepartmentName VARCHAR(50) NOT NULL,
ProducerType TINYINT UNSIGNED NOT NULL,
FOREIGN KEY (DepartmentID, ProducerType)
REFERENCES ArticleProducer(ProducerID, ProducerType),
FOREIGN KEY (ProducerType)
REFERENCES DepartmentProducerType(ProducerType) -- restricted to '1'
) ENGINE=InnODB;
CREATE TABLE DivisionProducerType (ProducerType TINYINT UNSIGNED PRIMARY KEY);
INSERT INTO DivisionProducerType VALUES (2);
CREATE TABLE Division (
DivisionID INT UNSIGNED NOT NULL PRIMARY KEY,
ProducerType TINYINT UNSIGNED NOT NULL,
DepartmentID INT UNSIGNED NOT NULL,
FOREIGN KEY (DivisionID, ProducerType)
REFERENCES ArticleProducer(ProducerID, ProducerType),
FOREIGN KEY (ProducerType)
REFERENCES DivisionProducerType(ProducerType), -- restricted to '2'
FOREIGN KEY (DepartmentID)
REFERENCES Department(DepartmentID)
) ENGINE=InnODB;
CREATE TABLE Article (
ArticleID INT UNSIGNED NOT NULL PRIMARY KEY,
ArticleName VARCHAR(50) NOT NULL,
FOREIGN KEY (ArticleID)
REFERENCES ArticleProducer(ProducerID)
);
Now each given row in ArticleProducer can be referenced by either Department or Division, but not both.
If we want to add a new producer type, we add one row to the ArticleProducerTypes lookup table, and create a pair of new tables for the new type. For example:
INSERT INTO ArticleProducerTypes VALUES (3);
CREATE TABLE PartnerProducerType (ProducerType TINYINT UNSIGNED PRIMARY KEY);
INSERT INTO PartnerProducerType VALUES (3);
CREATE TABLE Partner (
PartnerID INT UNSIGNED NOT NULL PRIMARY KEY,
ProducerType TINYINT UNSIGNED NOT NULL,
FOREIGN KEY (PartnerID, ProducerType)
REFERENCES ArticleProducer(ProducerID, ProducerType),
FOREIGN KEY (ProducerType)
REFERENCES PartnerProducerType(ProducerType) -- restricted to '3'
) ENGINE=InnODB;
But we still have the possibility that neither contains a reference to that given row in ArticleProducer; i.e. we can't make a constraint that forces a row to be created in one of the dependent tables. I don't have a solution for that.
1NF
Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else).
http://en.wikipedia.org/wiki/First_normal_form#1NF_tables_as_representations_of_relations
The simplest way to solve your problem is to introduce "default" division for each department, which simply means "the whole department". After that simply link all articles to Divisions.
Maybe something like this (DepartmentDivisionNo = 0
means whole department):
I actually like Damir's answer -- it 'rethinks' the question, and produces the correct answer for that new question. However there is a difference between a department and a division -- presumably each division might have access to articles that belong to their department. Having articles that belong to a default or whole department division means there are two different types of divisions. From now on, you'll be making queries such as
select * from xxx x inner join division d where d.joinkey = x.joinkey and d.division != 0.
Instead, I call my solution "don't skimp on the relationships":
Department
----------
DepartmentID (PK) int NOT NULL
DepartmentName varchar(50) NOT NULL
Division
--------
DivisionID (PK) int NOT NULL
DepartmentID (FK) int NOT NULL
DivisonName varchar(50) NOT NULL
Article
-------
ArticleID (PK) int NOT NULL
ArticleName varchar(50) NOT NULL
ArticleBelongsToDepartment
--------------------------
ArticleID (PK) (FK) int NOT NULL
DepartmentID (FK) int NOT NULL
ArticleBelongsToDivision
--------------------------
ArticleID (PK) (FK) int NOT NULL
DivisionID (FK) int NOT NULL
Now, how to enforce some of the constraints that have been brought up? To address that, you can create a an 'article bin', where each article has to belong to a bin, and both departments and divisions have bins.
But that's getting way into the weeds, and you won't be able to solve every case -- Either an article is dependent on a department, or a division, or a bin, or it isn't. Either the department is dependent on a bin or the bin is dependent on the department. Some of these questions are best answered by transactions and stored procedures, perhaps with nightly integrity checks.
Departments and divisions should be stored in the same table. Like this:
DepDiv
----------
ID (PK) int NOT NULL
Name varchar(50) NOT NULL
Type int -- ex.: 1 for department, 2 for division, etc., incase you need to differentiate later
They are such similar elements -- you should ust treat them the same.
After that, no screwy logic re: ranges for id numbers will be necessary. That approach is too unscalable anyway.
Good luck.
re: He defined the schema using an imaginary rule (for lack of a better term), that all DepartmentIDs would be between 1 and 100, and all DivisionIDs would be between 101 and 200.
If that's what he wants to do, he should use another field, like isDepartment yes/no. Then he would have one table for departments and divisions with ID, name, and isDepartment and the ID field would be a FK in the Article table.
That would resolve the overlapping department and division IDs, but not the 1-to-many relationship between departments and divisions. To enforce that relationship, you need two tables.
You could also introduce an AuthorID field in both the department and division tables that has the FK relationship with Article. That could be an auto-generated field. It's a way to normalize the compound key in the division table.
精彩评论