Using "varchar" as the primary key? bad idea? or ok?
Is it really that bad to use "varchar" as the primary key?
(will be storing user documents, and yes it can exceed 2+ billion documents)
It totally depends on the data. There are plenty of perfectly legitimate cases where you might use a VARCHAR
primary key, but if there's even the most remote chance that someone might want to update the column in question at some point in the future, don't use it as a key.
If you are going to be joining to other tables, a varchar, particularly a wide varchar, can be slower than an int.
Additionally if you have many child records and the varchar is something subject to change, cascade updates can causes blocking and delays for all users. A varchar like a car VIN number that will rarely if ever change is fine. A varchar like a name that will change can be a nightmare waiting to happen. PKs should be stable if at all possible.
Next many possible varchar Pks are not really unique and sometimes they appear to be unique (like phone numbers) but can be reused (you give up the number, the phone company reassigns it) and then child records could be attached to the wrong place. So be sure you really have a unique unchanging value before using.
If you do decide to use a surrogate key, then make a unique index for the varchar field. This gets you the benefits of the faster joins and fewer records to update if something changes but maintains the uniquess that you want.
Now if you have no child tables and probaly never will, most of this is moot and adding an integer pk is just a waste of time and space.
I realize I'm a bit late to the party here, but thought it would be helpful to elaborate a bit on previous answers.
It is not always bad to use a VARCHAR() as a primary key, but it almost always is. So far, I have not encountered a time when I couldn't come up with a better fixed size primary key field.
VARCHAR requires more processing than an integer (INT) or a short fixed length char (CHAR) field does.
In addition to storing extra bytes which indicate the "actual" length of the data stored in this field for each record, the database engine must do extra work to calculate the position (in memory) of the starting and ending bytes of the field before each read.
Foreign keys must also use the same data type as the primary key of the referenced parent table, so processing further compounds when joining tables for output.
With a small amount of data, this additional processing is not likely to be noticeable, but as a database grows you will begin to see degradation.
You said you are using a GUID as your key, so you know ahead of time that the column has a fixed length. This is a good time to use a fixed length CHAR(36) field, which incurs far less processing overhead.
I think int or bigint is often better.
- int can be compared with less CPU instructions (join querys...)
- int sequence is ordered by default -> balanced index tree -> no reorganisation if you use an PK as clustered index
- index need potentially less space
Use an ID (this will become handy if you want to show only 50 etc...). Than set a constraint UNIQUE on your varchar with the file-names (I assume, that is what you are storing).
This will do the trick and will increase speed.
精彩评论