What is the best way to enforce a 'subset' relationship with integrity constraints
For example, given 3 tables:
- gastropod
- snail
- slug
and assuming we want to enforce that
- every row in 'gastropod' has exactly one corresponding row in 'snail' or 'slug' (but not both)
- every row in 'slug' has exactly one corresponding row in 'gastropod'
- every row in 'snail' has exactly one corresponding row in 'gastropod'
what is the best way to set up my schema to enforce these constraints?
I've provide one possible answer for postgres, and I am particularly interested in solutions for postgres and Oracle, but would also be interested to see solutions for other RDBMSs
EDIT
For reference, SO questions from from answers/comments below addressing similar problems开发者_运维知识库:- Maintaining subclass integrity in a relational database
- Database design - articles, blog posts, photos, stories
- Same data from different entities in Database - Best Practice - Phone numbers example
My own solution for postgres (but I have no idea if it is the best way):
enum:
create type gastropod_type as enum ('slug', 'snail');
tables and constraints:
create table gastropod(
gastropod_id serial unique,
gastropod_type gastropod_type,
slug_gastropod_id integer,
snail_gastropod_id integer,
average_length numeric,
primary key(gastropod_id, gastropod_type),
check( (case when slug_gastropod_id is null then 0 else 1 end)+
(case when snail_gastropod_id is null then 0 else 1 end)=1) );
create table slug(
gastropod_id integer unique,
gastropod_type gastropod_type check (gastropod_type='slug'),
is_mantle_visible boolean,
primary key(gastropod_id, gastropod_type),
foreign key(gastropod_id, gastropod_type)
references gastropod deferrable initially deferred );
create table snail(
gastropod_id integer unique,
gastropod_type gastropod_type check (gastropod_type='snail'),
average_shell_volume numeric,
primary key(gastropod_id, gastropod_type),
foreign key(gastropod_id, gastropod_type)
references gastropod deferrable initially deferred );
alter table gastropod
add foreign key(slug_gastropod_id, gastropod_type)
references slug deferrable initially deferred;
alter table gastropod
add foreign key(snail_gastropod_id, gastropod_type)
references snail deferrable initially deferred;
test:
insert into gastropod(gastropod_type, slug_gastropod_id, average_length)
values ('slug', currval('gastropod_gastropod_id_seq'), 100);
insert into slug(gastropod_id, gastropod_type, is_mantle_visible)
values (currval('gastropod_gastropod_id_seq'), 'slug', true);
select gastropod_id, gastropod_type, average_length, is_mantle_visible
from gastropod left outer join slug using(gastropod_id, gastropod_type)
left outer join snail using(gastropod_id, gastropod_type);
gastropod_id | gastropod_type | average_length | is_mantle_visible
--------------+----------------+----------------+-------------------
1 | slug | 100 | t
(1 row)
One of the problems with SQL is its poor level of support for integrity constraints, especially referential constraints.
For all practical purposes your problem cannot be solved using SQL constraints unless you disable the constraints when you want to insert a row to a table. The reason is that SQL requires tables to be updated one at a time and so the constraint must be violated whenever new rows are inserted. This is a fundamental limitation of SQL and all the major DBMSs suffer from it.
There are some workarounds but none of them is perfect. You could use DEFERRABLE constraints if your DBMS has them (Oracle for example). A DEFERRABLE constraint is really just an easy way of disabling a constraint. Or you could use triggers, which means the rule is enforced procedurally rather than through a proper database constraint.
Check out this thread: Maintaining subclass integrity in a relational database
The thread provides multiple suggestions for SQL Server implementations, and I would be surprised if the ideas couldn't be applied to Oracle as well.
I know this problem as a supertype/subtype issue. I've written about it several times on SO. In this post, it's presented as a solution to a problem with staff, customers, and suppliers. But this post has the most extended discussion behind the rationale and how the constraints work. It's written in terms of online publications.
This is a case where using a trigger is of value to have complex constraints like this enforced.
"and assuming we want to enforce that (1) every row in 'gastropod' has exactly one corresponding row in 'snail' or 'slug' (but not both) (2) every row in 'slug' has exactly one corresponding row in 'gastropod' (3) every row in 'snail' has exactly one corresponding row in 'gastropod'"
(1) is an inclusion dependency (aka "foreign key dependency") between 'GASTROPOD' and a virtual relvar (aka "view") defined as SLUG UNION SNAIL. (2) and (3) are the same kind of inclusion dependencies between 'SLUG' (/'SNAIL') and 'GASTROPOD'.
All of them taken together mean that you have an "equality dependence" between 'GASTROPOD' and 'SLUG UNION SNAIL' (at least as far as the identifiers are concerned).
Note that to be able update a database that is under such constraints, you are likely to need either a DBMS engine that supports this thing called "Multiple Assignment", or else one that supports "Deferred Constraint Checking".
Chapter 11 of the book "Applied Mathematics for Database Professionals" goes into great depth on the subject of how to enforce such constraints (and in fact, just any constraint, however complex) in SQL environments. The answer to your question is almost the entire contents of that chapter, and I hope you don't expect me to summarize it all here in a few words (the essence of the answer is "triggers" - as StarShip3000 also indicated).
I'd go with
DROP TABLE GASTROPOD PURGE;
DROP TABLE SNAIL PURGE;
CREATE TABLE GASTROPOD
(GASTROPOD_ID NUMBER,
GASTROPOD_TYPE VARCHAR2(5),
SNAIL_ID NUMBER,
SLUG_ID NUMBER,
CONSTRAINT GASTROPOD_PK PRIMARY KEY (GASTROPOD_ID),
CONSTRAINT GASTROPOD_TYPE_CK CHECK (GASTROPOD_TYPE IN ('SLUG','SNAIL')),
CONSTRAINT GASTROPOD_SLUG_CK CHECK
(SNAIL_ID IS NOT NULL OR SLUG_ID IS NOT NULL),
CONSTRAINT GASTROPOD_SNAIL_CK1 CHECK
(GASTROPOD_TYPE = 'SNAIL' OR SLUG_ID IS NULL),
CONSTRAINT GASTROPOD_SLUG_CK1 CHECK
(GASTROPOD_TYPE = 'SLUG' OR SNAIL_ID IS NULL),
CONSTRAINT GASTROPOD_SNAIL_CK2 CHECK (SNAIL_ID = GASTROPOD_ID),
CONSTRAINT GASTROPOD_SLUG_CK2 CHECK (SLUG_ID = GASTROPOD_ID),
CONSTRAINT GASTROPOD_SNAIL_UK UNIQUE (SNAIL_ID),
CONSTRAINT GASTROPOD_SLUG_UK UNIQUE (SLUG_ID)
);
So you check that a gastropod is a snail or slug and either slug_id or snail_id is set. If it is a snail, then slug_id must be null, and for a slug then snail_id must be null. Make sure slug and snail ids are unique (I've added checks to match them to gastropod_id too).
CREATE TABLE SNAIL
(SNAIL_ID NUMBER,
CONSTRAINT SNAIL_PK PRIMARY KEY (SNAIL_ID),
CONSTRAINT SNAIL_FK FOREIGN KEY (SNAIL_ID)
REFERENCES GASTROPOD (SNAIL_ID));
Snails must point to a row in gastropod where snail_id is not null, and it is also the primary key (and therefore unique)
ALTER TABLE GASTROPOD ADD CONSTRAINT SNAIL_GS_FK FOREIGN KEY (SNAIL_ID)
REFERENCES SNAIL (SNAIL_ID) DEFERRABLE INITIALLY DEFERRED;
Gastropods with a snail_id set must also have a corresponding row in snail. I've made this direction deferrable, otherwise you'll never get any new data it.
Foreign key referencing gastropod from slug and snail with a unique index on the foreign key columns enforces rules 2 and 3. Rule 1 is trickier though :-(
The only way I know of to enforce rule 1 is to write some database code that checks snail and slug for the presence of a row.
By the way - how do you intend to insert data? Whatever order you do it in, you will break a rule.
"@Erwin I'd much prefer solutions that do not involve triggers - I have a pathological aversion to them."
Sorry for new answer, not authorised to add a comment to this.
As far as I can see, you might be able to get away with "just using deferred constraints" in your particular case, owing to the nature of the constraint you wish to impose. If it works for you, and you are satisfied, then all is OK, no ?
My main point is, constraints (as in : "any imaginable business rule you might run into as a database designer") can get arbitrarily complex. Think of a genealogy database in which you want to enforce the rule that "no person can be an ancestor of himself, IN WHATEVER DEGREE" (that's my favourite example because it ultimately involves transitive closure and/or recursion). There is NO WAY you can get an SQL DBMS to enforce such rules without using triggers (or without using recursive SQL inside the trigger for that matter too, by the way).
Neither your DBMS nor I nor anyone else skilled in relational theory will care a Freudian shit about whatever pathologies you happen to have. But perhaps because of these pathologies that you mention, it might be interesting to observe that you can do all of the stuff you want, without having to define any triggers, if you use the DBMS I have developed myself (it does support trigger-like things, but you're not required to resort to them for enforcing data integrity).
All these examples have an atrocious level a complexity for something so simple as:
create table gastropod(
average_length numeric
);
create table slug(
like gastropod,
id serial primary key,
is_mantle_visible boolean
);
create table snail(
like gastropod,
id serial primary key,
average_shell_volume numeric
);
\d snail;
Column | Type | Modifiers
----------------------+---------+----------------------------------------------------
average_length | numeric |
id | integer | not null default nextval('snail_id_seq'::regclass)
average_shell_volume | numeric |
Indexes:
"snail_pkey" PRIMARY KEY, btree (id)
Before you say this is not an answer think about the requirements.
- every row in 'gastropod' has exactly one corresponding row in 'snail' or 'slug' (but not both)
- every row in 'slug' has exactly one corresponding row in 'gastropod'
- every row in 'snail' has exactly one corresponding row in 'gastropod'
Having the column in the table is an equivalence of data integrity without all the nonsense.
Note: LIKE
in the DDL can copy all the columns (even constraints and indexes in 9.0) into the new table. So you can sort of fake inheritance.
You have two issues here:
- Presence: there cannot be a parent row without at least one child row.
- Exclusivity: there cannot be a parent row with more than one child row.
On a DBMS that supports deferred constraints (including PostgreSQL and Oracle), both of these goals can be achieved declaratively:
There is a circular foreign key between gastropod.snail_id
and snail.snail_id
, and also between gastropod.slug_id
and slug.slug_id
. There is also a CHECK that ensures exactly one of them matches gastropod.gastropod_id
(and the other is NULL).
To break the chicken-and-egg problem when inserting new data, defer one direction of foreign keys.
Here is how this would be implemented in PostgreSQL:
CREATE TABLE gastropod (
gastropod_id int PRIMARY KEY,
snail_id int UNIQUE,
slug_id int UNIQUE,
CHECK (
(slug_id IS NULL AND snail_id IS NOT NULL AND snail_id = gastropod_id)
OR (snail_id IS NULL AND slug_id IS NOT NULL AND slug_id = gastropod_id)
)
);
CREATE TABLE snail (
snail_id int PRIMARY KEY,
FOREIGN KEY (snail_id) REFERENCES gastropod (snail_id) ON DELETE CASCADE
);
CREATE TABLE slug (
slug_id int PRIMARY KEY,
FOREIGN KEY (slug_id) REFERENCES gastropod (slug_id) ON DELETE CASCADE
);
ALTER TABLE gastropod ADD FOREIGN KEY (snail_id) REFERENCES snail (snail_id)
DEFERRABLE INITIALLY DEFERRED;
ALTER TABLE gastropod ADD FOREIGN KEY (slug_id) REFERENCES slug (slug_id)
DEFERRABLE INITIALLY DEFERRED;
New data is inserted like this:
START TRANSACTION;
INSERT INTO gastropod (gastropod_id, snail_id) VALUES (1, 1);
INSERT INTO snail (snail_id) VALUES (1);
COMMIT;
However, attempting to insert only parent but not child fails:
START TRANSACTION;
INSERT INTO gastropod (gastropod_id, snail_id) VALUES (2, 2);
COMMIT; -- FK violation.
Inserting the wrong kind of child fails:
START TRANSACTION;
INSERT INTO gastropod (gastropod_id, snail_id) VALUES (2, 2);
INSERT INTO slug (slug_id) VALUES (2); -- FK violation.
COMMIT;
And inserting setting too few, too many, or mismatched fields in the parent also fails:
INSERT INTO gastropod (gastropod_id) VALUES (2); -- CHECK violation.
...
INSERT INTO gastropod (gastropod_id, snail_id, slug_id) VALUES (2, 2, 2); -- CHECK violation.
...
INSERT INTO gastropod (gastropod_id, snail_id) VALUES (1, 2); -- CHECK violation.
On a DBMS that doesn't support deferred constraints, exclusivity (but not presence) can be declaratively enforced like this:
Under a DBMS that supports calculated fields (such as Oracle 11 virtual columns), the type discriminator type
doesn't need to be physically stored at the level of the child tables (only the parent table).
The unique constraint U1
may be necessary on DBMSes that don't support FK referencing super-set of key (pretty much all of them, as far as I know), so we make this super-set artificially.
Whether all this should actually be done in practice is another matter. This is one of these situations where enforcing some aspects of data integrity at the application level may be justified by the reduction of overhead and complexity.
Ideally, I would make a single table "gastropod" with a "type" field, and then have views "gastropod" (selecting all fields except "type", with no "where" clause), "snail" (using a "where" clause to limit to type snail), and "slug" (using a "where" clause to limit to type slug). Exceptions may exist if one of the two types is much smaller and there are many fields relevant to only the smaller type, but for the most part making it different views from a single table will ensure the proper integrity constraints.
精彩评论