开发者

Should I use integer primary IDs?

For example, I always generate an auto-increment field for the users table, but I also specify a UNIQUE index on their usernames. There are situations that I first need to get the userId for a given username and then execute the desired query, or use a JOIN in the de开发者_C百科sired query. It's 2 trips to the database or a JOIN vs. a varchar index.

Should I use integer primary IDs?

Is there a real performance benefit on INT over small VARCHAR indexes?


There are several advantages of having a surrogate primary key, including:

When you have a foreign key in another table, if it is an integer it takes up only a few bytes extra space and can be joined quickly. If you use the username as the primary key it will have to be stored in both tables - taking up more space and it takes longer to compare when you need to join.

If a user wishes to change their username, you will have big problems if you have used it as a primary key. While it is possible to update a primary key, it is very unwise to do so and can cause all sorts of problems as this key might have been sent out to all sorts of other systems, used in links, saved in backups, logs that have been archived, etc. You can't easily update all these places.


It's not just about performance. You should never key on a meaningful value, for reasons that are well documented elsewhere.

By the way, I often scale the type of int to the size of the table. When I know that a table will not exceed 255 rows, I use a tinyint key, and the same for smallint.


In addition to what others have said, you need to think about the clustering of the table.

In SQL Server for instance (and possibly other vendors), if the primary key is also used as the clustered index of the table (which is quote common), an incrementing integer benefits over other field types. This is because new rows are entered with a primary key that is always greater than the previous rows, meaning that the new row can be stored at the end of the table instead of in the middle (this same scenario can be created with other field types for the primary key, but an integer type lends itself better).

Compare this with a guid primary key - new rows have to be inserted into the middle of the table because guids are non-sequential, making inserts very inefficient.


First, as is obvious, on small tables, it will make no difference with respect to performance. Only on very large tables (how large depends on numerous factors), can it make a difference for a handful of reasons:

  1. Using a 32-bit will only consume 4 bytes of space. Presumably, your usernames will be longer than four non-Unicode characters and thus consume more than 4 bytes of space. The more space used, the few pieces of data fit on a page, the fatter the index and the more IO you incur.

  2. Your character columns are going to require the use of varchar over char unless you force everyone to have usernames of identical size. This too will have a tiny performance and storage impact.

  3. Unless you are using a binary sort collation, the system has to do relatively sophisticaed matching when comparing two strings. Do the two columns use the same colllation? For each character, are they cased the same? What are the casing and accent rules in terms of matching? and so on. While this can be done quickly, it is more work which, in a very large tables, can make a difference in comparison to matching on an integer.

I'm not sure why you would ever have to do two trips to the database or join on a varchar column. Why couldn't you do one trip to the database (where creation returns your new PK) where you join to the users table on the integer PK?

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜