What is the best schema design for child collection with a "primary" entity
Here is the scenario: You have a Persons table with a one-to-many relationship with an Addresses table, where one of the Address rows is the "primary" Address.
Is it better in a normalized schema to
- Use a Persons.PrimaryAddressID to access the "primary" Address for a Person
or
- Use an Addresses.IsPrimary bit column to reference the "primar开发者_开发问答y" Address for a Person via Addresses.PersonID
or
- Other
and why?
It depends on whether the person-to-address relationship is one-to-zero-plus or one-to-one-plus.
If a person is required to have a primary address, I would put it in the Persons
table itself (since it's a required attribute).
In the other hand, if a person can exist in your schema without an address, I would leave all addresses in the Addresses
table as equal and use an attribute of the Persons
table to select the primary (either NULL or a pointer to the relevant Addresses
row).
If you store the primality of an address in the Addresses
table, what do you do when two addresses for Bob Smith both claim to be the primary? You could stop that with triggers but it's far more efficient to design your schema properly.
And, if two room-mates share the same address, but one lives there all the time and the other spends most of his time shacked up with his girlfriend, what happens then? If the primality is in the Addresses table, you won't be able to share address rows between persons.
What I'm trying to get across is that you need to allocate your attributes to the right objects. A person's primary address belongs to a person, not an address.
For maximum efficiency and flexibility, I would have the following schema:
Persons:
Id primary key
PrimaryAddressId
OtherStuff
Addresses:
Id primary key
OtherStuff
PersonAddresses:
Id primary key
PersonId foreign key on Persons(Id)
AddressId foreign key on Addresses(Id)
You have the minor data integrity problem that Persons.PrimaryAddressId
may be a hanging pointer. You can't make it a foreign key to one of the primary keys since you want it to allow NULL
. That means you'll have to cater for the possibility that it might point to a non-existent Addresses.Id
.
I would simply fix that as a before-delete trigger on Addresses
so that the relevant Persons
rows are updated (setting PrimaryAddressid
to NULL).
Or you could be tricky and have one address of "Unknown" in the Addresses
table so that every row in Persons
has at least one address (those whose primary address is unknown automatically get their PrimaryAddressid
set to the "Unknown" address row.
Then you could make it a proper constrained relationship and simplify your SQL somewhat. Pragmatism often beats dogmatism in the real world :-)
I would go for "Use a Persons.PrimaryAddressID to access the "primary" Address for a Person". Primary Address has a meaning only when Person is linked to Addresses. So it should belong to Person. Think about following scenarios where 2nd approach fails.
a) Address is used with another entity without referenced to person where Addresses.IsPrimary is meaningless.
b) Same address is used by two persons where 1st uses as primary while 2nd not.
If you want the constraint to be that one person has at most one primary address, a Persons.PrimaryAddressID
is clearly simpler -- indeed, it's enforced by the very schema. It's also easy to ensure exactly one primary address per person (just make that column be not null), and if you need to, even say that no two people can share a primary address (just make that column be unique).
Of course, exactly because this approach excels at enforcing such simple constraints, it's bad when you don't want these constraints -- for example, if you want to make it possible for a person to have more than one "primary" address, the approach in question would not work.
Incidentally, I would not consider a one-Person/many-addresses relationship to be particularly good unless you want to enforce the fact that no two people can share the same address: in general, in a normalizing mood, I'd rather have a table of people, one of addresses, and one for the relationship (which in most contexts would naturally be many-to-many, since many people can, and in real life do, share the same address).
If you choose to go this route, then, especially if you need high flexibility (multiple primary addresses &c), having the relationship table carry the "primality" boolean would be an attractive choice (it still makes it not too hard to enforce some of the above-mentioned constraints, though other constraints, such as "an address belongs to at least one person" or vice versa, can be tricky to express simply).
Lesson to retain: express exactly what constraints your schema needs to express simply, and the right schema for that purpose often emerges quite clearly. If the constraints of interest are a mystery, so will be the answer to the question "what's the right schema";-).
精彩评论