开发者

Where should I break up my user records to keep track of revisions

I am putting together a staff database and I need to be 开发者_JAVA百科able to revise the staff member information, but also keep track of all the revisions. How should I structure the database so that I can have multiple revisions of the same user data but be able to query against the most recent revision? I am looking at information that changes rarely, like Last Name, but that I will need to be able to query for out of date values. So if Jenny Smith changes her name to Jenny James I need to be able to find the user's current information when I search against her old name.

I assume that I will need at least 2 tables, one that contains the uid and another that contains the revisions. Then I would join them and query against the most recent revision. But should I break it out even further, depending on how often the data changes or the type of data? I am looking at about 40 fields per record and only one or two fields will probably change per update. Also I cannot remove any data from the database, I need to be able to look back on all previous records.


A simple way of doing this is to add a deleted flag and instead of updating records you set the deleted flag on the existing record and insert a new record.

You can of course also write the existing record to an archive table, if you prefer. But if changes are infrequent and the table is not big I would not bother.

To get the active record, query with 'where deleted = 0', the speed impact will be minimal when there is an index on this field.

Typically this is augmented with some other fields like a revision number, when the record was last updated, and who updated it. The revision number is very useful to get the previous versions and also to do optimistic locking. The 'who updated this last and when' questions usually come once the system is running instead of during requirements gathering, and are useful fields to put in any table containing 'master' data.


I would use the separate table because then you can have a unique identifier that points to all the other child records that is also the PK of the table which I think makes it less likely you will have data integrity issues. For instance, you have Mary Jones who has records in the address table and the email table and performance evaluation table, etc. If you add a change record to the main table, how are you going to relink all the existing information? With a separate history table, it isn't a problem.

With a deleted field in one table, you then have to have an non-autogenerated person id and an autogenrated recordid.

You also have the possiblity of people forgetting to use the where deleted = 0 where clause that is needed for almost every query. (If you do use the deleted flag field, do yourself a favor and set a view with the where deleted = 0 and require developers to use the view in queries not the orginal table.)

With the deleted flag field you will also need a trigger to ensure one and only one record is marked as active.


@Peter Tillemans' suggestion is a common way to accomplish what you're asking for. But I don't like it.

The structure of a database should reflect the real-world facts that are being modeled.

I would create a separate table for obsolete_employee, and just store the historical information that would need to be searched in the future. This way you can keep your real employee data table clean and keep only the old data that is necessary. This approach will also simplify reporting and other features of the application that are not related to searching historical data.

Just think of that warm feeling you'll get when you type select * from employee and nothing but current, correct goodness comes flowing back!

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜