When is referential integrity not appropriate?
I understand the need to have referential integrity for limiting specific values on entry or possibly preventing them from removal upon a request of deletion. However, I am unclear as to a valid use case which would exclude this mechanism from always being used.
I guess this would fall into several sub-questions:
- When is referential integrity not appropriate?
- Is it appropriate to have fields 开发者_开发知识库containing multiple and/or possibly incomplete subsets of a foreign key's list?
- Typically, should this be a schema structure design decision or an interface design decision? (Or possibly neither or both)
Thoughts?
When is referential integrity not appropriate?
Referential intergrity if typically not used on Data Warehouses where the data is a read only copy of a transactional datbase. Another example of when you'd not need RI is when you want to log information which includes row ids; maintaining referential integrity for a read-only log table is a waste of database overhead.
Is it appropriate to have fields containing multiple and/or possibly incomplete subsets of a foreign key's list?
Sometimes you care more about capturing data than data quality. Imagine you are aggregating a large amount of data from disparate systems which each in their own right suffer from data quality issues. Sometimes you are after the greater good of data quality and having everything in one place even with broken keys etc. represents a starting point for moving towards true data quality. It's not ideal, but it does happen as the beenfits could outweigh the tradeoffs.
Typically, should this be a schema structure design decision or an interface design decision? (Or possibly neither or both)
Everything about systems development is centered around information security, and a key element of that is data integrity. The database structure should lean towards enforcing these things when possible, however you often are not dealing with modern database systems. Sometimes your data source is an old school AS400 with long-antiquated apps. Sometimes you have to build a data and business layer which provide for data integrity.
Just my thoughts.
The only case I have heard of is if you are going to load a vast amount of data into your database; in that case, it may make sense to turn referential integrity off, as long as you know for certain that the data is valid. Once your loading/migration is complete, referential integrity should be turned back on.
There are arguments about putting data validation rules in programming code vs. the database, and I think it depends on the use cases of your software. If a single application is the only path to the database, you could put validation into the program itself and probably be alright. But if several different programs are using the database at the same time (e.g. your application and your friend's application), you'll want business rules in the database so that your data is always valid.
By 'validation rules', I am talking about rules such as 'items in cart > 0'. You may or may not want validation rules. But I think that primary/foreign keys are always important (or you could find later on that you wish you had them). I think they are required if you want to do replication at some point.
When is referential integrity not appropriate?
Sometimes when you are copying lots of records in bulk, or restoring data from some sort of backup, it is convenient to temporarily turn off the constraints of referential integrity.
Is it appropriate to have fields containing multiple and/or possibly incomplete subsets of a foreign key's list?
Duplicating data in this way goes against the concept of normalization. There are are advantages and disadvantages to this approach.
Typically, should this be a schema structure design decision or an interface design decision? (Or possibly neither or both)
I would consider it a schema design decision. Think about the best way to model your problem in relational terms. Use the database in the way it was intended.
Referential integrity would always be appropriate if it didn't come at the cost of performance, scalability, and/or other features.
In some applications, referential integrity may be traded for something more important than the quality of the data.
- Never, though a few people in the NoSQL, the multi-value, and oo-db realms will feel differently. Don't listen to them, they're wrong.
- Yes. For example, if a vehicle is identified uniquely as (lotid,vin) then lotid is a foreign key to the lot table. If you want to find all pictures for a lot you can join the vehicle_pictures table right to the lot table, by using a subset of the vehicle_pictures key (lotid in (lotid,vin)). Or, am I not understanding you?
- Schema, interface comes second. If the schema is bad, having a nice interface is not a long term goal.
精彩评论