Database Design for many-to-many revisions?
My question is related to Database Design for Revisions? but more complicated.
I have two tables with many-to-many relations between them. Let's call them A and B.
A: A.id, A.data, A.last_change_time
B: B.id, B.data
A_B: A_B.A_id, A_B.B_id, A_B.connected_since_开发者_开发技巧time, A_B.some_status_enum
The A_B connector table must support revisions. If connection between element in A to element in B is removed historical information that it existed before must be kept. If connection status changes historic info on old status must be kept.
Some of the reports that I need to generate are:
- List of all elements in A that are not connected to B at all (now).
- List of all elements in A that are connected to at at least one element in B with certain status.
- List of all A-B pairs that were once connected but no longer connected.
- List of all A-B connections where A's last change time is after the connection time.
- Count of all A-B connections where A's last change time is after the connection time grouped by status.
I considered simply adding A_B.is_current boolean field to the connecting table. When connection is deleted I simply set is_current to false. When status changes I set is_current to false for old records and add a new record with new status.
Answers to similar previously asked questions often claim that "is_current" is bad design and there should be a better solution. Previous solutions talked about revision of record in single table and not relations between them.
Is it wrong to use is_current column to track the history of many-to-many connections? If it is wrong what problems it may cause? What is the better solution?
Typically...
- have 2 extra columns:
ValidFromDateTime
andValidToDateTime
- primary key is then
A_id, B_id, ValidFromDateTime
- ValidToDateTime can be NULL or a sentinel value (eg
99991231
)
When a new revision happens, you populate ValidToDateTime
of the "current row" and add a new row with the same datetime value for ValidFromDateTime. Or just set ValidToDateTime if this is your business rule.
This gives you state etc (or lack of) for any point in time
Your queries are then quite simple
- NOT EXISTS on A_ID
- EXISTS on A_ID
- EXISTS on A_ID with ValidToDateTime NOT NULL (no current row)
- EXISTS on A_ID also comparing ValidToDateTime and last_change_time
- LEFT JOIN (same conditions as 4) with aggregate
The main issue I see with using an is_current
flag is that it's impossible to tell which records are the most recent once they become not current. I would typically add a date field that denotes the start date of when the it is current. Then your query would normally just grab the row with the latest date.
精彩评论