开发者

Database Design for many-to-many revisions?

My question is related to Database Design for Revisions? but more complicated.

I have two tables with many-to-many relations between them. Let's call them A and B.

A: A.id, A.data, A.last_change_time
B: B.id, B.data
A_B: A_B.A_id, A_B.B_id, A_B.connected_since_开发者_开发技巧time, A_B.some_status_enum

The A_B connector table must support revisions. If connection between element in A to element in B is removed historical information that it existed before must be kept. If connection status changes historic info on old status must be kept.

Some of the reports that I need to generate are:

  1. List of all elements in A that are not connected to B at all (now).
  2. List of all elements in A that are connected to at at least one element in B with certain status.
  3. List of all A-B pairs that were once connected but no longer connected.
  4. List of all A-B connections where A's last change time is after the connection time.
  5. Count of all A-B connections where A's last change time is after the connection time grouped by status.

I considered simply adding A_B.is_current boolean field to the connecting table. When connection is deleted I simply set is_current to false. When status changes I set is_current to false for old records and add a new record with new status.

Answers to similar previously asked questions often claim that "is_current" is bad design and there should be a better solution. Previous solutions talked about revision of record in single table and not relations between them.

Is it wrong to use is_current column to track the history of many-to-many connections? If it is wrong what problems it may cause? What is the better solution?


Typically...

  • have 2 extra columns: ValidFromDateTime and ValidToDateTime
  • primary key is then A_id, B_id, ValidFromDateTime
  • ValidToDateTime can be NULL or a sentinel value (eg 99991231)

When a new revision happens, you populate ValidToDateTime of the "current row" and add a new row with the same datetime value for ValidFromDateTime. Or just set ValidToDateTime if this is your business rule.

This gives you state etc (or lack of) for any point in time

Your queries are then quite simple

  1. NOT EXISTS on A_ID
  2. EXISTS on A_ID
  3. EXISTS on A_ID with ValidToDateTime NOT NULL (no current row)
  4. EXISTS on A_ID also comparing ValidToDateTime and last_change_time
  5. LEFT JOIN (same conditions as 4) with aggregate


The main issue I see with using an is_current flag is that it's impossible to tell which records are the most recent once they become not current. I would typically add a date field that denotes the start date of when the it is current. Then your query would normally just grab the row with the latest date.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜