INSERTs with sequential GUID key on clustered index not significantly faster
In SQL Server 2008 I have tried to reproduce the results from the experiments on clustered index on sequential vs. non-sequential GUID keys seen here http://sqlblog.com/blogs/denis_gobo/archive/2009/02/05/11743.aspx but I do not experience the significant speedup for insertions that I would expect (and the author experiences). The page utilization is clearly improved with the sequential GUID, but for some reasons, inserting 10,000 rows is only around 100 ms faster (out of 10,300 ms).
I use the following code:
CREATE TABLE TestGuid1 (Id UNIQUEIDENTIFIER not null DEFAULT newid(),
SomeDate DATETIME, batchNumber BIGINT)
CREATE TABLE TestGuid2 (Id UNIQUEIDENTIFIER not null DEFAULT newsequentialid(),
SomeDate DATETIME, batchNumber BIGINT)
CREATE CLUSTERED INDEX ix_id1 ON TestGuid1(id)
CREATE CLUSTERED INDEX ix_id2 ON TestGuid2(id)
go
SET NOCOUNT ON
INSERT TestGuid1 (SomeDate,batchNumber) VALUES (GETDATE(),3)
go 10000
SET NOCOUNT ON
INSERT TestGuid2 (SomeDate,batchNumber) VALUES (GETDATE(),3)
go 10000
DBCC showcontig ('TestGuid1') WITH tableresults
DBCC showcontig ('TestGuid2') WITH tableresults
SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate))
FROM TestGuid1
GROUP BY batchNumber
SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate))
FROM TestGuid2
GROUP BY batchNumber
Can anyone explain why I do not experience a more significant speedup for insertions on TestGuid2?
Follow-up: As requested in the thread below, I have expanded the test: the test results tend to vary significantly over time, so now the experiments are repeated N times, and the total and average time usage reported. I have also added a third test, namely for primary keys on sequential integer columns. This should be the fastest and most compact of all three methods as the integer type is smaller and IDENTITY(1,1) is (or at least should be) fast. At least by my intuition. The average execution time is now to the benefit of the sequential GUID, but surprisingly insertions in the third experiment (with sequential integer keys) is slower than sequential GUIDs. I have no explanation of this. Here is the code for the new experiments:
SET NOCOUNT ON
CREATE TABLE TestGuid1 (Id UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID() PRIMARY KEY,
SomeDate DATETIME, batchNumber BIGINT, FILLER CHAR(100))
CREATE TABLE TestGuid2 (Id UNIQUEIDENTIFIER NOT NULL DEFAULT NEWSEQUENTIALID() PRIMARY KEY,
SomeDate DATETIME, batchNumber BIGINT, FILLER CHAR(100))
CREATE TABLE TestInt (Id Int NOT NULL identity(1,1) PRIMARY KEY,
SomeDate DATETIME, batchNumber BIGINT, FILLER CHAR(100))
DECLARE @BatchCounter INT = 1
DECLARE @Numrows INT = 100000
WHILE (@BatchCo开发者_JAVA技巧unter <= 20)
BEGIN
BEGIN TRAN
DECLARE @LocalCounter INT = 0
WHILE (@LocalCounter <= @NumRows)
BEGIN
INSERT TestGuid1 (SomeDate,batchNumber) VALUES (GETDATE(),@BatchCounter)
SET @LocalCounter +=1
END
SET @LocalCounter = 0
WHILE (@LocalCounter <= @NumRows)
BEGIN
INSERT TestGuid2 (SomeDate,batchNumber) VALUES (GETDATE(),@BatchCounter)
SET @LocalCounter +=1
END
SET @LocalCounter = 0
WHILE (@LocalCounter <= @NumRows)
BEGIN
INSERT TestInt (SomeDate,batchNumber) VALUES (GETDATE(),@BatchCounter)
SET @LocalCounter +=1
END
SET @BatchCounter +=1
COMMIT
END
DBCC showcontig ('TestGuid1') WITH tableresults
DBCC showcontig ('TestGuid2') WITH tableresults
DBCC showcontig ('TestInt') WITH tableresults
SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate)) AS [NEWID()]
FROM TestGuid1
GROUP BY batchNumber
SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate)) AS [NEWSEQUENTIALID()]
FROM TestGuid2
GROUP BY batchNumber
SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate)) AS [IDENTITY()]
FROM TestInt
GROUP BY batchNumber
DROP TABLE TestGuid1
DROP TABLE TestGuid2
DROP TABLE TestInt
And the average execution times:
NEWID() 3064
NEWSEQUENTIALID() 1977
IDENTITY() 2223
The page usage is as follows:
Table Pages AveragePageDensity
----------------------------------------
TestGuid1 50871 68,4
TestGuid2 35089 99,2
TestInt 32259 98,7
I fail to see, why these page statistics (which are best for TestInt) doesn't imply that experiment three is fastest.
Can you try this modified script and post your results?
SET NOCOUNT ON
CREATE TABLE TestGuid1 (Id UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID() PRIMARY KEY,
SomeDate DATETIME, batchNumber BIGINT, FILLER CHAR(100))
CREATE TABLE TestGuid2 (Id UNIQUEIDENTIFIER NOT NULL DEFAULT NEWSEQUENTIALID() PRIMARY KEY,
SomeDate DATETIME, batchNumber BIGINT, FILLER CHAR(100))
DECLARE @BatchCounter INT = 1
WHILE (@BatchCounter <= 20)
BEGIN
BEGIN TRAN
DECLARE @LocalCounter INT = 0
WHILE (@LocalCounter <= 100000)
BEGIN
INSERT TestGuid1 (SomeDate,batchNumber) VALUES (GETDATE(),@BatchCounter)
SET @LocalCounter +=1
END
SET @LocalCounter = 0
WHILE (@LocalCounter <= 100000)
BEGIN
INSERT TestGuid2 (SomeDate,batchNumber) VALUES (GETDATE(),@BatchCounter)
SET @LocalCounter +=1
END
SET @BatchCounter +=1
COMMIT
END
DBCC showcontig ('TestGuid1') WITH tableresults
DBCC showcontig ('TestGuid2') WITH tableresults
SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate)) AS [NEWID()]
FROM TestGuid1
GROUP BY batchNumber
SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate)) AS [NEWSEQUENTIALID()]
FROM TestGuid2
GROUP BY batchNumber
DROP TABLE TestGuid1
DROP TABLE TestGuid2
I see quite wildly varying results between individual runs (on my laptop not a server!) but a definite trend for sequential to be faster.
NEWID()
Average 5168.9
batchNumber NEWID()
-------------------- -----------
1 4270
2 2480
3 2706
4 3333
5 7480
6 5346
7 4306
8 7713
9 7313
10 4760
11 4680
12 4113
13 3433
14 2686
15 4963
16 8040
17 5313
18 8160
19 9533
20 2750
NEWSEQUENTIALID()
Average 3000.85
batchNumber NEWSEQUENTIALID()
-------------------- -----------------
1 2016
2 1820
3 1886
4 1870
5 4873
6 3473
7 3730
8 3690
9 1983
10 2020
11 1906
12 5596
13 2100
14 1950
15 2096
16 1876
17 5196
18 2110
19 2113
20 7713
Since I wrote that original blog post, I decided to run your code, here is what I get
3 8726 -- newid()
3 12550 -- newsequantialID
Remember I am running this on a server with 32 GB of RAM and 8 procs, not on a laptop
on my local machine, I almost see no difference between the two
Remember, besides inserts, reads will be much slower because the table is fragmented
Here is what I get when running Martin's script on the server
batchNumber NEWID()
17 1696
19 1706
14 1680
16 1706
5 1660
6 1890
7 1650
8 1663
13 1673
15 1683
2 1656
9 1673
20 1750
1 2033
3 1673
10 1673
12 1670
4 1650
11 1690
18 1696
batchNumber NEWSEQUENTIALID()
2 1276
9 1260
20 1290
13 1266
15 1280
17 1266
19 1266
5 1260
6 1266
7 1260
8 1260
1 1243
3 1256
10 1270
12 1263
14 1266
16 1276
4 1256
11 1270
18 1270
Here is what happens on my desktop, files are not sized BTW
batchNumber NEWID()
1 9470
2 4446
3 5996
4 3860
5 4170
6 2403
7 3283
8 3573
9 1883
10 3980
11 2580
12 2780
13 1643
14 2836
15 3250
16 4303
17 3250
18 3376
19 8723
20 2616
batchNumber NEWSEQUENTIALID()
1 2566
2 1336
3 1256
4 3123
5 3023
6 1166
7 2396
8 1180
9 2386
10 3896
11 3790
12 3066
13 1396
14 2010
15 1183
16 3110
17 4060
18 4260
19 1896
20 2013
精彩评论