How to choose my primary key?
I found this reading material on choosing a primary key.
- Is there a guide / blog post on how to choose the primary key for a given table?
- Should I use an auto-incremented/generated key, or should I base the primary key on the data being modeled (assuming it has a truly unique field)?
- Should the primary key always be a long for performance's sake, or can I use an external unique ID as the primary key, even if it's a string?
I believe that in practice using a natural key is rarely better than a surrogate key.
The following are the main disadvantages of using a natural key as the primary key:
- You might have an incorrect key value, or you may simply want to rename a key value. To change it, you would have to update every table that uses it as a foreign key.
- It is often difficult to find a truly unique natural key.
- Natural keys are often strings. An index on a numeric field will be much more compact than one on a string field.
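To illustrate the first point, here is a minimal sketch using Python's built-in `sqlite3` module. The `country` and `customer` tables and their values are hypothetical. Because the natural key (`code`) is referenced by `customer`, renaming it has to change every referencing row; this sketch relies on `ON UPDATE CASCADE` to do that.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Natural key: the country code itself is the primary key.
conn.execute("CREATE TABLE country (code TEXT PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE customer (
        id INTEGER PRIMARY KEY,
        country_code TEXT NOT NULL
            REFERENCES country(code) ON UPDATE CASCADE
    )
""")
conn.execute("INSERT INTO country VALUES ('UK', 'United Kingdom')")
conn.executemany("INSERT INTO customer (country_code) VALUES (?)", [("UK",), ("UK",)])

# Renaming the key value must touch every referencing row; the
# ON UPDATE CASCADE clause makes SQLite rewrite them for us.
conn.execute("UPDATE country SET code = 'GB' WHERE code = 'UK'")
codes = [row[0] for row in conn.execute("SELECT country_code FROM customer")]
print(codes)  # ['GB', 'GB']
```

Without the cascade clause the `UPDATE` would fail with `sqlite3.IntegrityError`, because the `customer` rows would still point at `'UK'`. With a surrogate key, `code` would be an ordinary column and renaming it would touch a single row.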
There is no hard rule on what the data type of the primary key should be. A numeric key normally performs better, but you could use a string, especially if the table is not big, and the tables that reference it are not big either.
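As a rough size comparison, this sketch encodes one hypothetical key value in three common ways. It only measures the raw key bytes; real index size also depends on the engine (page overhead, prefix compression, and, in InnoDB, the fact that every secondary index stores a copy of the primary key).

```python
import struct
import uuid

# Hypothetical key values, used only to compare their storage size.
int_key = 1_234_567_890
uuid_key = uuid.uuid4()

bigint_bytes = struct.pack("<q", int_key)  # 64-bit signed integer
uuid_binary = uuid_key.bytes               # UUID as raw 16 bytes
uuid_text = str(uuid_key).encode("ascii")  # canonical 36-character text form

for label, raw in [("bigint", bigint_bytes),
                   ("uuid binary", uuid_binary),
                   ("uuid text", uuid_text)]:
    print(f"{label:12} {len(raw):3} bytes")
```

This prints 8, 16 and 36 bytes respectively, which is why a string key costs more per index entry than an integer.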
A key is a set of attributes with two fundamental features: uniqueness and minimality. Minimality means the key has only the minimum number of attributes required to ensure uniqueness.
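The two properties can be checked mechanically against sample rows. This is a minimal sketch with hypothetical data and helper names (`is_unique`, `is_key`). Note that it only tests the rows it is given; a real key is a rule that must hold for all valid data, not just the current contents of a table.

```python
from itertools import combinations

def is_unique(rows, attrs):
    """True if no two rows share the same values for `attrs`."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def is_key(rows, attrs):
    """Unique, and no proper subset of `attrs` is also unique (minimality)."""
    if not is_unique(rows, attrs):
        return False
    return not any(
        is_unique(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

# Hypothetical sample data.
rows = [
    {"emp_no": 1, "email": "a@example.com", "dept": "IT"},
    {"emp_no": 2, "email": "b@example.com", "dept": "IT"},
    {"emp_no": 3, "email": "c@example.com", "dept": "HR"},
]
print(is_key(rows, ("emp_no",)))         # True
print(is_key(rows, ("emp_no", "dept")))  # False: emp_no alone is unique
print(is_key(rows, ("dept",)))           # False: not unique
```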
There are three criteria commonly applied as a guide to choosing a good key:
- Familiarity - keys should be meaningful and familiar to the people who use them
- Simplicity - keys should be as simple and concise as possible
- Stability - key values should change infrequently
These are good guidelines but not absolute requirements. In all cases, functional requirements and the needs of data integrity should determine which keys to use.
I use surrogate keys, often referred to as nonsensical keys, based on an autogenerated int/bigint column.
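A minimal sketch of this approach in SQLite (via Python's `sqlite3`). The `account` table and its values are hypothetical. The surrogate `id` is generated by the database, while the natural identifier is still protected by a `UNIQUE` constraint and remains free to change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Surrogate key: an auto-generated integer with no business meaning.
# The natural identifier is still enforced as UNIQUE, but it can change freely.
conn.execute("""
    CREATE TABLE account (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        account_number TEXT NOT NULL UNIQUE,
        owner TEXT NOT NULL
    )
""")
cur = conn.execute(
    "INSERT INTO account (account_number, owner) VALUES (?, ?)",
    ("NL91-0001", "Alice"),
)
new_id = cur.lastrowid  # the surrogate key the database just generated
print(new_id)  # 1
```

Other engines spell the auto-generated column differently (for example `AUTO_INCREMENT` in MySQL or `IDENTITY` in SQL Server), but the idea is the same.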
Here are some of the reasons I like using these keys.
- When deleting several items from a list (such as old email), you can supply a comma-separated list of integers instead of GUIDs or natural keys
- I find it makes writing your own cascade deletes easier
- I think inner joins are faster on integer fields
- It can make a new system easier to learn when there is no documentation.
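The first two points can be sketched together. The `email` and `attachment` tables and the `delete_emails` helper are hypothetical. The list of integer IDs is passed as bound parameters (one `?` per ID) rather than spliced into the SQL text, and the child rows are deleted by hand before the parents in a single transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE email (id INTEGER PRIMARY KEY, subject TEXT)")
conn.execute("CREATE TABLE attachment (id INTEGER PRIMARY KEY, email_id INTEGER NOT NULL)")
conn.executemany("INSERT INTO email (subject) VALUES (?)",
                 [(f"msg {i}",) for i in range(1, 6)])
conn.executemany("INSERT INTO attachment (email_id) VALUES (?)",
                 [(1,), (2,), (2,), (5,)])

def delete_emails(conn, ids):
    """Delete emails and their attachments by surrogate id (hand-written cascade)."""
    ids = [int(i) for i in ids]        # reject anything that is not an integer
    if not ids:
        return
    marks = ",".join("?" * len(ids))   # one placeholder per id: ?,?,?
    with conn:                         # commit both deletes together, or neither
        conn.execute(f"DELETE FROM attachment WHERE email_id IN ({marks})", ids)
        conn.execute(f"DELETE FROM email WHERE id IN ({marks})", ids)

delete_emails(conn, [1, 2, 3])
remaining = [r[0] for r in conn.execute("SELECT id FROM email ORDER BY id")]
print(remaining)  # [4, 5]
```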
Here are a couple of blog posts about primary keys:
http://www.mysqlperformanceblog.com/2006/10/03/long-primary-key-for-innodb-tables/
http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/
I have worked with many different data models in professional systems (mostly banking software), and they used different solutions. I have seen the GUID solution, and it did not seem to hurt performance much. I have seen a "system-wide unique number" provided by a service. I have seen algorithms that produce something like a GUID "but shorter". I have also seen the business key (such as the account number) used as the primary key, which is poor design, caused problems, and is not something I would recommend. And I have seen an auto-incremented key for each table.
What did I like most? The number provided by a service as a system-wide unique number. It works well. With a simple key translation table, a user key (such as an account number) can be used to look up the corresponding unique number and the kind of data object it identifies (not necessarily the table, because the same unique key may apply to several tables if a data object is split across different tables depending on its type).
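A minimal sketch of that setup, under loud assumptions: the "service" is stood in for by an in-process counter (`new_system_id`, hypothetical; a real system would use a shared sequence or ID service), and the `key_translation` table layout and object type names are illustrative only. Other tables would reference `object_id`, so renaming an account number touches only one row here.

```python
import itertools
import sqlite3

# Hypothetical stand-in for the service that hands out system-wide unique
# numbers; here it is just an in-process counter.
_next_number = itertools.count(1_000_000)

def new_system_id():
    return next(_next_number)

conn = sqlite3.connect(":memory:")
# Key translation table: maps a user-facing key to the system-wide number
# and the kind of data object it identifies.
conn.execute("""
    CREATE TABLE key_translation (
        object_id   INTEGER PRIMARY KEY,
        user_key    TEXT NOT NULL UNIQUE,
        object_type TEXT NOT NULL
    )
""")

def register(user_key, object_type):
    object_id = new_system_id()
    conn.execute(
        "INSERT INTO key_translation (object_id, user_key, object_type) VALUES (?, ?, ?)",
        (object_id, user_key, object_type),
    )
    return object_id

def resolve(user_key):
    """Return (object_id, object_type) for a user key, or None if unknown."""
    return conn.execute(
        "SELECT object_id, object_type FROM key_translation WHERE user_key = ?",
        (user_key,),
    ).fetchone()

register("ACC-0042", "savings_account")
register("ACC-0043", "loan_account")
print(resolve("ACC-0043"))  # (1000001, 'loan_account')
```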
So is there a blog or something? Well, I can recommend a book: "Data Modeling Essentials" by Graeme Simsion and Graham Witt. They might not suggest my preferred solution, but they give many real-life examples and show the different kinds of solutions that are possible.
I always choose a UUID as the primary key. Compared with an int/long key there is a slight overhead, but there are many benefits: you cannot run into type overflow, you can shard the database later without changing primary keys, you can integrate with other systems and be confident that your primary keys are unique, a random UUID cannot practically be guessed, etc.
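A minimal sketch of UUID keys with Python's `uuid` and `sqlite3` modules; the `customer` table and `insert_customer` helper are hypothetical. The key is a random (version 4) UUID created by the application rather than the database, and it is stored as 16 raw bytes instead of its 36-character text form to keep the index smaller.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# Store the UUID as 16 raw bytes rather than its 36-character text form.
conn.execute("CREATE TABLE customer (id BLOB PRIMARY KEY, name TEXT NOT NULL)")

def insert_customer(name):
    key = uuid.uuid4()  # random (version 4), generated without asking the database
    conn.execute("INSERT INTO customer (id, name) VALUES (?, ?)", (key.bytes, name))
    return key

# Keys like this can be produced independently (on different shards or in
# other systems) with a negligible chance of collision.
alice = insert_customer("Alice")
row = conn.execute("SELECT name FROM customer WHERE id = ?", (alice.bytes,)).fetchone()
print(alice, row[0])
```

One cost behind the "slight overhead": random keys land at random positions in a clustered B-tree index such as InnoDB's, so inserts are less sequential than with an auto-incremented integer.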