Modeling many-to-one with constraints?
I'm attempting to create a database model for movie classifications, where each movie could have a single classification from each of one of multiple rating systems (e.g. BBFC, MPAA). This is the current design, with all implied PKs and FKs:
TABLE Movie
(
MovieId INT -- PK
)
TABLE ClassificationSystem
(
ClassificationSystemId TINYINT -- PK
)
TABLE Classification
(
ClassificationId INT, -- PK
ClassificationSystemId TINYINT -- FK
)
TABLE MovieClassification
(
MovieId INT, -- composite PK, FK
ClassificationId INT, -- composite PK, FK
Advice NVARCHAR(250) -- description of why the classification was given
)
The problem is with the MovieClassification
table whose constraints would allow multiple classifications from the same system, whereas it should ideally only permit either zero or one classifications from a given system.
Is there any reasonable way to restructure this so that a movie having exactly zero or one classifications from any given system is enforced by database constraints, given the following requirements?
- Do not duplicate information that could be looked up (i.e. duplicating
ClassificationSystemId
in theMovieClassification
table is not a good solution because this could get out of sync with the value in theClassification
table) - Remain extensible to multiple classification systems开发者_运维百科 (i.e. a new classification system does not require any changes to the table structure)?
Note also the Advice
column - each mapping of a movie to a classification needs to have a textual description of why that classification was given to that movie. Any design would need to support this.
You can enforce that with a check constraint that calls a user defined function. For example:
create function dbo.ClassificationSystemCheck()
returns int
as begin
return (select max(cnt)
from (
select count(*) as cnt
from MovieClassification mc
left join Classification c
on c.ClassificationId = mc.ClassificationId
group by mc.MovieId, c.ClassificationSystemId
) qry)
end
go
alter table MovieClassification
add constraint chk_MovieClassification
check (dbo.ClassificationSystemCheck() <= 1)
go
alter table Classification
add constraint chk_Classification
check (dbo.ClassificationSystemCheck() <= 1)
go
insert into Classification select 1,1
insert into MovieClassification select 1,1
insert into MovieClassification select 1,1 -- Boom!
This can be inefficient as the number of classifications grows. Alternatively, you could eliminate the Classification table and move the ClassificationSystemId to the MovieClassification table.
What if you remove the classification system id from the classification table and only keep it on the movie classification?
TABLE Movie
(
MovieId INT
)
TABLE ClassificationSystem
(
ClassificationSystemId TINYINT
)
TABLE Classification
(
ClassificationId INT,
)
TABLE MovieClassification
(
MovieId INT,
ClassificationId INT,
ClassificationSystemId TINYINT,
Advice NVARCHAR(250) -- description of why the classification was given
)
but you buy another problem, that a classification could be used ouside its intended system.
How many distinct classifications of the same classification system can be attributed to one single movie under your design ?
Does that match your intended concept of "classification" ?
From what you're saying, ClassificationSystemId
is part of the key for a MovieClassification
, since there can be only one (or zero) MovieClassification
for a given system for a given movie.
Now, there are three cases where the Classification
table can change:
- You add a new classification to a system.
- You remove an existing classification from a system.
- You change metadata for a classification (none shown in your posted schema).
In the first case, an example would be adding a new genre to an existing system of genres. It makes sense that you need to reclassify movies which belong to the new genre, so the model holds.
In the second case, an example would be removing a genre from an existing system. It still makes sense that you need to reclassify movies which belonged to the old genre, so the model still holds.
In the third case, you would change for instance the name of a genre. It makes sense that movies already classified as that genre gets their genre name changed. The model still holds.
From what I can understand, the correct normalization is to put ClassificationSystemId
in MovieClassification
and make it part of the MovieClassification
key (and make ClassificationSystemId
part of the key for Classification
rows in the provided schema):
-- Tables Movie, ClassificationSystem not included for brevity
CREATE TABLE Classification
(
ClassificationId INT,
ClassificationSystemId INT,
PRIMARY KEY(ClassificationId, ClassificationSystemId),
FOREIGN KEY(ClassificationSystemId) REFERENCES ClassificationSystem(ClassificationSystemId)
);
CREATE TABLE MovieClassification
(
ClassificationId INT,
ClassificationSystemId INT,
MovieId INT,
Advice NVARCHAR(MAX),
PRIMARY KEY(ClassificationId, ClassificationSystemId, MovieId),
FOREIGN KEY(ClassificationId, ClassificationSystemId) REFERENCES Classification(ClassificationId, ClassificationSystemId),
FOREIGN KEY(MovieId) REFERENCES Movie(MovieId),
UNIQUE(ClassificationSystemId, MovieId)
);
- add a unique constraint on Classification (ClassificationId,ClassificationSystemId)
- add a FK referring to it from MovieClassification table
- add a unique constraint on MovieClassification (MovieId,ClassificationSystemId)
OK. I had hoped my questions would trigger some thinking, but my point seems to have been missed.
Your classification table needs to be {movieID, classificationScheme classification}, with key {movieID classificationScheme}.
It can reference Movie via {movieID}, and it can reference a classifications table via {classificationScheme classification}.
This classifications table lists/enumerates/names all the valid classifications of each scheme. Since a classification scheme only exists, and makes sense, from the moment it has at least one classification, there may not be a real need for a fourth table whose sole purpose it is to list/name/enumerate all the relevant classification schemes.
Honestly, i'd just change the data model around a little bit.
create table #Movies (PK_moID int identity(1,1), moName varchar(50) primary key(PK_moID))
create table #ClassificationSystem (PK_csID int identity(1,1), csName varchar(50) primary key(PKcsID))
create table #Classification (PK_clID int identity(1,1), FK_csID int)
create table #MovieClassification (FK_moID int, FK_csID int, FK_clID int primary key (FK_moID, FK_csID))
now, with your movie classification you have a composite pk of the movie and the system, so you can only get one rating of a movie per system (even if you add new systems). you can also create a relationship from movie classification to the classificaiton table to enfore the data.
精彩评论