开发者

SQL Best Practices - Ok to rely on auto increment field to sort rows chronologically?

I'm working with a client who wants to add timestamps to a bunch of tables so开发者_开发技巧 that they may sort the records in those tables chronologically. All of the tables also have an auto incrementing integer field as their primary key (id).

The (simple) idea - save the overhead/storage and rely on the primary key to sort fields chronologically. Sure this works, but I'm uncertain whether or not this approach is acceptable in sound database design.

Pros: less storage required per record, simpler VO classes, etc. etc.

Con: it implies a characteristic of that field, an otherwise simple identifer, whose definition does not in any way define or guarantee that it should/will function as such.

Assume for the sake of my question that the DB table definitions are set in stone. Still - is this acceptable in terms of best practices?

Thanks


You asked for "best practices", rather than "not terrible practices" so: no, you should not rely on an autoincremented primary key to establish chronology. One day you're going to introduce a change to the db design and that will break. I've seen it happen.

A datetime column whose default value is GETDATE() has very little overhead (about as much as an integer) and (better still) tells you not just sequence but actual date and time, which often turns out to be priceless. Even maintaining an index on the column is relatively cheap.

These days, I always put a CreateDate column data objects connected to real world events (such as account creation).

Edited to add:

If exact chronology is crucial to your application, you can't rely on either auto-increment or timestamps (since there can always be identical timestamps, no matter how high the resolution). You'll probably have to make something application-specific instead.


Further to egrunin's answer, a change to the persistence or processing logic of these rows may cause rows to be inserted into the database in a non-sequential or nondeterministic manner. You may implement a parallelized file processor that throws a row into the DB as soon as the thread finishes transforming it, which may be before another thread has finished processing a row that occurred earlier in the file. Using an ORM for record persistence may result in a similar behavior; the ORM may just maintain a "bag" (unordered collection) of object graphs awaiting persistence, and grab them at random to persist them to the DB when it's told to "flush" its object buffer.

In either case, trusting the autoincrement column to tell you the order in which records came into the SYSTEM is bad juju. It may or may not be able to tell you the order in which records his the DATABASE; that depends on the DB implementation.


You can acheive the same goal in the short term by sorting on the ID column. This would be better that adding additional data to acheive the same result. I don't think that it would be confusing for anyone to look at the data table and know that it's chronological when they see that it's an identity column.

There are a few drawbacks or limitations that I see however.

  • The chronological sort can be messed up if someone re-seeds the column
  • Chronology for a date period cannot be ascertained without the additional data
  • This setup prevents you from sorting chronologically if the system ever accepts new, non-chronological data

Based on the realistic evaluation of these "limitations" you should be able to advise a proper approach.


Auto-incrementing ID will give you an idea of order as Brad points out, but do it right - if you want to know WHEN something was added, have a datetime column. Then you can not only chronologically sort but also apply filters.


Don't do it. You should never rely on the actual value of your ID column. Treat it like a black box, only useful for doing key lookups.

You say "less storage required per record," but how important is that? How big are the rows we're talking about? If you've got 200-byte rows, another 4 bytes probably isn't going to matter much.

Don't optimize without measuring. Get it working right first, and THEN optimize.


@MadBreaker

There's to separate things, if you need to know the order you create a column order with autoincrement, however if you want to know the date and time it was inserted you use datetime2.

Chronological order can be garanteed if you don't allow updates or deletes, but if you want time control over select you should use datetime2.


You didnt mention if you are running on a single db or clustered. If you are clustered, be wary of increment implementations, as you are not always guaranteed things will come out in the order you would naturally think. For example, Oracle sequences can cache groups of next values (depending on your setup) and give you a 1,3,2,4,5 sort of list...

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜