How does the Data Type of an SQL table's PK impact query performance?
Specifically, I am interested in:
- What is the difference between string data types (e.g. nvarchar(n), varchar(n)) and non-string data types (int, bigint, uniqueidentifier)?
- What is the difference between the different string data types?
- How does the maximum length of a string data type affect performance? Is there a specific varchar or nvarchar length at which performance sharply declines?
- What is the difference between the different numeric data types?
- How do these variations impact:
  - equality comparison of primary keys?
  - joins on primary keys?
  - updates by primary key?
  - complex value comparisons by primary key (e.g. LIKE on a varchar, or <= on an int)?
- If there is a significant disparity between the options, what measures can be taken to optimize performance with the slower data types?
- How does a composite PK compare to the other options?
Update: To be clear, I understand this is a long question and I am not asking to be spoon-fed all this information. An answer that provides links to reliable online resources where I can find this information is completely sufficient.
Update 2:
I am using SQL Server Express 2008.
I don't have any hard numbers - but from experience and from everything I have learned over the years, I would say:
Try to use a fixed-length key (INT, BIGINT, or CHAR(x) for x <= 6 characters); those tend to be easier to deal with and give SQL Server less overhead to work with. Avoid larger VARCHAR values, since SQL Server limits each index key to 900 bytes, and don't even try to use a VARCHAR(MAX) (which cannot be an index key column at all) or something outrageous like that. Since the primary key in SQL Server is by default your clustering key, all the rules for the clustering key apply. A good clustering key is:
- narrow (4-8 bytes are perfect)
- static (never or hardly ever changes)
- unique (otherwise SQL Server has to add a hidden 4-byte uniquifier to rows with duplicate key values)
- ever-increasing (e.g. INT IDENTITY is perfect) to reduce index and page fragmentation caused by page splits in your index structures
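To see why an ever-increasing key helps, here is a toy model in Python. It is an illustration under simplifying assumptions, not SQL Server's actual storage engine: the leaf level is a list of sorted pages with a hypothetical capacity of 4 keys, a full page in the middle of the key range is split roughly 50/50, and a new key beyond the current maximum simply starts a fresh page. `simulate_inserts` is a hypothetical helper written for this illustration.

```python
import bisect
import random


def simulate_inserts(keys, page_capacity=4):
    """Toy B-tree leaf level: returns (mid_page_splits, page_count).

    Pages are sorted lists of keys, kept in key order. Appending a key larger
    than every existing key to a full last page allocates a new page without
    moving rows; any other insert into a full page splits it roughly 50/50.
    """
    pages = []
    splits = 0
    for key in keys:
        if not pages:
            pages.append([key])
            continue
        # Pick the last page whose smallest key is <= key (page 0 if none is).
        firsts = [page[0] for page in pages]
        i = max(bisect.bisect_right(firsts, key) - 1, 0)
        page = pages[i]
        if len(page) < page_capacity:
            bisect.insort(page, key)
        elif i == len(pages) - 1 and key > page[-1]:
            # Ever-increasing key hitting a full last page: start a new page.
            pages.append([key])
        else:
            # Full page in the middle of the key range: split it, moving rows.
            bisect.insort(page, key)
            mid = len(page) // 2
            pages.insert(i + 1, page[mid:])
            del page[mid:]
            splits += 1
    return splits, len(pages)


sequential = list(range(1, 1001))   # like INT IDENTITY values
shuffled = sequential[:]            # like random GUIDs: same keys, random order
random.Random(42).shuffle(shuffled)

print(simulate_inserts(sequential))  # (0, 250): no splits, every page full
print(simulate_inserts(shuffled))
```

In this model the sequential run never moves rows between pages, while the shuffled run triggers mid-page splits that leave pages partly empty, which is the fragmentation the "ever-increasing" guideline is meant to avoid.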
By far the best, most authoritative and most exhaustive resource on SQL Server indexing (and what kind of things to do and what to avoid) is Kimberly Tripp's blog, especially her Indexes category. Great stuff!
The narrower the data type (that is, the fewer bytes it takes), the better the performance will be.
For example, INT takes 4 bytes. A fully populated VARCHAR(4) of single-byte characters also holds 4 bytes of data on most databases (SQL Server adds 2 bytes of length overhead per variable-length value), while VARCHAR(5+) uses more bytes than INT and VARCHAR(less than 4) uses fewer. To reiterate: INT and VARCHAR(4) are [roughly] equivalent, but VARCHAR(less than 4) would be smaller (therefore "faster") and VARCHAR(5+) would be larger (therefore "slower") than INT.
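To make the byte counts concrete for SQL Server (the asker's platform), here is a small Python lookup based on the storage sizes in SQL Server's data-type documentation: int 4 bytes, bigint 8, uniqueidentifier 16, char(n) n bytes, nchar(n) 2n bytes, and varchar/nvarchar the data length plus 2 bytes. It assumes a single-byte code page for char/varchar and fully populated strings; `max_key_bytes` is a hypothetical helper.

```python
# Fixed storage sizes (bytes) per SQL Server data-type documentation.
FIXED_BYTES = {"int": 4, "bigint": 8, "uniqueidentifier": 16}
# varchar/nvarchar: documented storage is data length + 2 bytes.
VARIABLE_OVERHEAD = 2


def max_key_bytes(type_name, length=None):
    """Bytes used by a fully populated value of the given type.

    Assumes char/varchar use a single-byte code page (1 byte per character)
    and nchar/nvarchar use 2 bytes per character.
    """
    t = type_name.lower()
    if t in FIXED_BYTES:
        return FIXED_BYTES[t]
    if length is None:
        raise ValueError(f"{type_name} needs a length")
    if t == "char":
        return length
    if t == "nchar":
        return 2 * length
    if t == "varchar":
        return length + VARIABLE_OVERHEAD
    if t == "nvarchar":
        return 2 * length + VARIABLE_OVERHEAD
    raise ValueError(f"unsupported type: {type_name}")


for name, n in [("int", None), ("bigint", None), ("uniqueidentifier", None),
                ("char", 4), ("varchar", 4), ("nvarchar", 4), ("varchar", 50)]:
    label = f"{name}({n})" if n else name
    print(f"{label:18} {max_key_bytes(name, n):3} bytes")
```

Under these assumptions a full VARCHAR(4) comes to 6 bytes in SQL Server, slightly wider than INT, and an NVARCHAR key is roughly twice as wide as the VARCHAR of the same declared length.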
Honestly, I'm not going to address differences between data types, because:
- the database isn't defined (they aren't all the same)
- the data is available online
I will assume that by "primary key" you are referring to the clustered index on the table, since by default they are the same thing in SQL Server.
The size of the clustered index key is important, because every other (nonclustered) index uses the clustered index key to refer to individual rows within the table. Therefore, a wide clustered index key makes all other indexes large as well. Large indexes can harm performance, because each page holds fewer rows and more pages get swapped in and out of the working set.
Therefore, if given a choice you should use a smaller rather than a larger column or set of columns for the primary key.
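As a rough back-of-the-envelope sketch of that effect, the Python below estimates the leaf pages of a nonclustered index on an INT column for 10 million rows, where each index row also carries the clustering key. The 8,096 bytes of row space per 8 KB page and the 2-byte row-offset entry are SQL Server page-layout figures; `ROW_OVERHEAD_BYTES` is an assumed placeholder for per-row header bytes, and `leaf_pages` is a hypothetical helper, so treat the numbers as relative, not exact.

```python
import math

PAGE_ROW_SPACE = 8096    # 8 KB page minus its 96-byte header
SLOT_BYTES = 2           # row-offset array entry per row
ROW_OVERHEAD_BYTES = 7   # ASSUMPTION: rough per-row header overhead


def leaf_pages(row_count, index_key_bytes, clustering_key_bytes):
    """Estimate leaf-level pages of a nonclustered index.

    Each leaf row holds the index key plus the clustering key as its
    row locator, plus the assumed per-row overhead and a slot entry.
    """
    row_bytes = index_key_bytes + clustering_key_bytes + ROW_OVERHEAD_BYTES + SLOT_BYTES
    rows_per_page = PAGE_ROW_SPACE // row_bytes
    return math.ceil(row_count / rows_per_page)


ROWS = 10_000_000
for label, ck_bytes in [("INT", 4), ("UNIQUEIDENTIFIER", 16), ("full NVARCHAR(50)", 102)]:
    print(f"clustering key {label:18}: ~{leaf_pages(ROWS, 4, ck_bytes):,} leaf pages")
```

Under these assumptions the 16-byte GUID clustering key needs roughly 70% more leaf pages than the INT key in every nonclustered index, and a wide string key is far worse.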
What is the difference between the different string data types?
nvarchar can contain strings of varying length; nchar contains strings of a constant, predefined length. (There are also varchar and char data types, which store non-Unicode data in a code-page-based encoding; I would avoid them, since they require converting data to and from legacy character encodings whenever they are written or read.)
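The width and encoding difference can be illustrated in Python: nvarchar stores 2 bytes per character (UCS-2 in SQL Server 2008), while varchar stores one byte per character in the collation's code page. This sketch assumes a Latin1_General-style collation using code page 1252 (`cp1252` in Python); characters the code page lacks come back as `?`, much as SQL Server substitutes them when converting to varchar. Both helper names are hypothetical.

```python
def nvarchar_bytes(text):
    """Bytes for text stored as nvarchar: 2 bytes per UTF-16 code unit."""
    return len(text.encode("utf-16-le"))


def varchar_roundtrip(text, code_page="cp1252"):
    """Store text as varchar in a single-byte code page and read it back.

    Returns (bytes_used, text_read_back); unsupported characters become '?'.
    """
    stored = text.encode(code_page, errors="replace")
    return len(stored), stored.decode(code_page)


for value in ["ORDER-1042", "Müller", "日本"]:
    size, back = varchar_roundtrip(value)
    print(f"{value!r}: nvarchar {nvarchar_bytes(value)} bytes, "
          f"varchar {size} bytes, read back as {back!r}")
```

The varchar key is half as wide, which matters for the index-width arguments above, but only if the data never needs characters outside the code page.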
I highly recommend reading the SQL Server documentation on data types for the answers to your other questions.