Which is the most common ID type in SQL Server databases, and which is better?
Is it better to use a Guid (UniqueIdentifier) as your Primary/Surrogate Key column, or a serialized "identity" integer column; and 开发者_运维知识库why is it better? Under which circumstances would you choose one over the other?
I personally use INT IDENTITY for most of my primary and clustering keys.
You need to keep apart the primary key which is a logical construct - it uniquely identifies your rows, it has to be unique and stable and NOT NULL. A GUID works well for a primary key, too - since it's guaranteed to be unique. A GUID as your primary key is a good choice if you use SQL Server replication, since in that case, you need an uniquely identifying GUID column anyway.
The clustering key in SQL Server is a physical construct is used for the physical ordering of the data, and is a lot more difficult to get right. Typically, the Queen of Indexing on SQL Server, Kimberly Tripp, also requires a good clustering key to be uniqe, stable, as narrow as possible, and ideally ever-increasing (which a INT IDENTITY is).
See her articles on indexing here:
- GUIDs as PRIMARY KEYs and/or the clustering key
- The Clustered Index Debate Continues...
- Ever-increasing clustering key - the Clustered Index Debate..........again!
A GUID is a really bad choice for a clustering key, since it's wide, totally random, and thus leads to bad index fragmentation and poor performance. Also, the clustering key row(s) is also stored in each and every entry of each and every non-clustered (additional) index, so you really want to keep it small - GUID is 16 byte vs. INT is 4 byte, and with several non-clustered indices and several million rows, this makes a HUGE difference.
In SQL Server, your primary key is by default your clustering key - but it doesn't have to be. You can easily use a GUID as your NON-Clustered primary key, and an INT IDENTITY as your clustering key - it just takes a bit of being aware of it.
Use a GUID in a replicated system where you need to guarantee uniqueness.
Use ints where you have a non-replicated database and you want to maximise performance.
Very Seldomly use GUID.
Use rather a primary key/Surrogate Key for stoage purposes.
Also this will make it easier for human interaction with the data.
Creating Indexes will be a lot more efficient too.
See
- How Using GUIDs in SQL Server Affect Index Performance
- Performance Effects of Using GUIDs as Primary Keys
When considering using integers, be sure to allow for the maximum possible value that might occur. You often end up with skipped numbers because of deletions, so the actual maximum ID might be much larger than the total number of records in the table.
For example, if you aren't sure that a 32-bit integer will do, use a 64-bit integer.
You might also find these other SO discussions useful:
How do you like your primary keys?
What’s the best practice for Primary Keys in tables?
Picking the best primary key + numbering system.
And if you search here in SO for "primary key", you'll find those and a lot more very useful discussions.
There's no single answer to this. The issues that people are quick to jump on with Guid's (that their random nature combined with the default behavior of the primary key also acting as the clustered key) can be easily mitigated. Guids have a larger range than integers do, but as you start to fill that range with values you increase your risk of a collision.
Guid's can be very useful when you have a distributed system (for example, replicated databases) where a non-trivial amount of work would have to go into a key generation mechanism that wouldn't cause collisions between the portions of the system. Likewise, integers are useful because they're simple to use (every language has an integral type, not every language has a Guid type) and can be sequential (Guids can, too, but that's not their intended use).
It's all about what you're storing and how. The people that say "never use Guid's!" are just spreading FUD, but they also aren't the answer to every problem.
I believe it is almost always a serialized identy integer, but some will disagree. It does depend on the situation.
The reasons for identity is efficiency and simplicity. It's smaller. More easily indexed. It makes a great clustered index. Less fragmentation as new records are kept in order. Great for indexes on joins. Easier when eyeballing records in a db.
There is definately a place for Guids in certain circumstances. When merging disparate data, or when records have to be created in certain places. Guids should be in your bag of tricks but usually will not be your first choice.
This is an oft debated topic, but I tend to lean more towards identities for a couple of reasons. Firstly, an integer is only 4 bytes vs a 16 byte GUID. This means narrower indexes and more efficient queries. Secondly, we make use of @@IDENTITY
and SCOPE_IDENTITY
a lot in stored procs, etc which goes out the window with GUIDs.
Here's a nice little article by Jeff Atwood.
Use a GUID if you think you'll ever need to use the data outside the database, i.e. other databases). Some would argue, that is always the case, but it's a judgment call.
精彩评论