TSQL: Create a Custom Identity off a Custom Identity? (Managing Database Revisions)
I would like to create a custom identity based off an custom identity. Or perhaps something similar to an identity that functions like an auto-incrementing key.
For example if I have an primary key for a drawing I would like its revision to be based off the the drawing number.
Example
DRAWING ID | REV | INFO ------+-------+------ 1 | 0 | "Draw1" 2 | 0 | "Draw2" 2 | 1 | "Draw2Edit" 2 | 2 | "Draw2MoreEdit" 3 | 0 | "Draw3" 4 | 0 | "Draw4"
If I was to insert a few more records into my table such a:
INSERT INTO DRAWING (INFO) VALUES ("Draw5")
INSERT INTO DRAWING (ID,INFO) VALUES (3,"Draw3Edit")
My table would like:
DRAWING ID | REV | INFO ------+-------+------ 1 | 0 | "Draw1" 2 | 0 | "Draw2" 2 | 1 | "Draw2Edit" 2 | 2 | "Draw2MoreEdit" 3 | 0 | "Draw3" 3 | 1 | "Draw3Edit" --NEW ROW 4 | 0 | "Draw4" 5 | 0 | "Draw5" --NEW ROW
T-SQL
CREATE TABLE D开发者_C百科RAWING
(
ID INT,
REV INT,
INFO VARCHAR(50),
PRIMARY KEY (ID,REV)
);
CREATE TABLE CURRENT_DRAWING
(
ID INT IDENTITY (1,1),
DRAWING_ID INT,
DRAWING_REV INT,
PRIMARY KEY (ID),
FOREIGN KEY (DRAWING_ID,DRAWING_REV) REFERENCES DRAWING (ID,REV)
ON UPDATE CASCADE
ON DELETE CASCADE
);
I am using SQL Server Management Studio 2005 and working on a SQL Server 2000 Database.
I will also accept possible alternatives. The primary goal is for the ID to auto-increment for new drawings. The ID will remain the same and the REV will increment on new drawing revisions.
Update:
I think I have it close to what I want:
DROP TABLE DRAW
GO
CREATE TABLE DRAW
(
ID INT DEFAULT(0),
REV INT DEFAULT(-1),
INFO VARCHAR(10),
PRIMARY KEY(ID, REV)
)
GO
CREATE TRIGGER TRIG_DRAW ON DRAW
FOR INSERT
AS
BEGIN
DECLARE @newId INT,
@newRev INT,
@insId INT,
@insRev INT
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
BEGIN TRANSACTION
SELECT @insId = ID FROM inserted
SELECT @insRev = REV FROM inserted
PRINT 'BEGIN TRIG'
PRINT @insId
PRINT @insRev
PRINT @newId
PRINT @newRev
--IF ID=0 THEN IT IS A NEW ID
IF @insId <=0
BEGIN
--NEW DRAWING ID=MAX+1 AND REV=0
SELECT @newId = COALESCE(MAX(ID), 0) + 1 FROM DRAW
SELECT @newRev = 0
END
ELSE
--ELSE IT IS A NEW REV
BEGIN
--CHECK TO ENSURE ID EXISTS
IF EXISTS(SELECT * FROM DRAW WHERE ID=@insId AND REV=0)
BEGIN
PRINT 'EXISTS'
SELECT @newId = @insId
SELECT @newRev = MAX(REV) + 1 FROM DRAW WHERE ID=@insID
END
ELSE
--ID DOES NOT EXIST THEREFORE NO REVISION
BEGIN
RAISERROR 50000 'ID DOES NOT EXIST.'
ROLLBACK TRANSACTION
GOTO END_TRIG
END
END
PRINT 'END TRIG'
PRINT @insId
PRINT @insRev
PRINT @newId
PRINT @newRev
SELECT * FROM DRAW
UPDATE DRAW SET ID=@newId, REV=@newRev WHERE ID=@insId
COMMIT TRANSACTION
END_TRIG:
END
GO
INSERT INTO DRAW (INFO) VALUES ('DRAW1')
INSERT INTO DRAW (INFO) VALUES ('DRAW2')
INSERT INTO DRAW (ID,INFO) VALUES (2,'DRAW2EDIT1') --PROBLEM HERE
INSERT INTO DRAW (ID,INFO) VALUES (2,'DRAW2EDIT2')
INSERT INTO DRAW (INFO) VALUES ('DRAW3')
INSERT INTO DRAW (INFO) VALUES ('DRAW4')
GO
--SHOULD THROW
INSERT INTO DRAW (ID,INFO) VALUES (9,'DRAW9')
GO
SELECT * FROM DRAW
GO
However, I keep getting Violation of PRIMARY KEY constraint
.
I have put debug statements and it seems unlikely that I am violating my primary key:
BEGIN TRIG 0 -1 END TRIG 0 -1 1 0 (1 row(s) affected) (1 row(s) affected) (1 row(s) affected) BEGIN TRIG 0 -1 END TRIG 0 -1 2 0 (2 row(s) affected) (1 row(s) affected) (1 row(s) affected) BEGIN TRIG 2 -1 EXISTS END TRIG 2 -1 2 1 (3 row(s) affected) Msg 2627, Level 14, State 1, Procedure TRIG_DRAW, Line 58 Violation of PRIMARY KEY constraint 'PK__DRAW__56D3D912'. Cannot insert duplicate key in object 'DRAW'. The statement has been terminated.
It prints
ID | REV | INFO ----+--------+------------ 1 | 0 | DRAW1 2 | -1 | DRAW2EDIT1 --This row is being updated to 2 1 2 | 0 | DRAW2
Just before it fails and the row 2 -1 is being updated to 2 1. It should not violate my primary key.
I would actually recommend an alternative data design. This kind of key-and-sequence pattern is very difficult to implement properly in a relational database, and the drawbacks often outweigh the benefits.
You have quite a few options, but the simplest ones start with splitting the table in two:
CREATE TABLE DRAWING
(
ID INT IDENTITY(1, 1),
PRIMARY KEY (ID)
);
CREATE TABLE DRAWING_REVISION
(
ID INT IDENTITY(1, 1),
DRAWING_ID INT,
INFO VARCHAR(50),
PRIMARY KEY (ID),
CONSTRAINT FK_DRAWING_REVISION_DRAWING FOREIGN KEY (DRAWING_ID) REFERENCES DRAWING(ID)
);
This has the benefit of representing the data accurately and working correctly with no additional effort on your part. Simply add a row to the DRAWING_REVISION
table when you want to add a new revision to a drawing. Because the primary keys use the IDENTITY
specification, you don't have to do the work of finding the next ID
.
The Obvious Solution and Its Shortcoming
If you need a human-readable revision number, though, rather than for-your-server's-eyes-only ID
, that can be done in two ways. They both start by adding REV INT
to the data definition for DRAWING_REVISION
, along with a CONSTRAINT UK_DRAWING_REVISION_DRAWING_ID_REV UNIQUE (DRAWING_ID, REV)
. The trick then, of course, is to find out the next revision number for a given drawing.
If you expect to only every have a tiny number of concurrent users, you can simply SELECT MAX(REV) + 1 FROM DRAWING_REVISION WHERE DRAWING_ID = @DRAWING_ID
, either in your application code, or in an INSTEAD OF INSERT
trigger. With high concurrency or bad luck, though, users could end up blocking one another, because they could try to insert the same combination of DRAWING_ID
and REV
into DRAWING_REVISION
.
Some Background
There's really only one solution to this problem, though explaining why there's only one solution requires a little bit of background information. Consider the following code:
BEGIN TRAN
INSERT DRAWING DEFAULT VALUES;
INSERT DRAWING DEFAULT VALUES;
SELECT ID FROM DRAWING; -- Output: 1, 2
ROLLBACK TRAN
BEGIN TRAN
INSERT DRAWING DEFAULT VALUES;
SELECT ID FROM DRAWING; -- Output: 3
ROLLBACK TRAN
Of course, the output would differ on subsequent executions. Behind the scenes, SQL server is doling out IDENTITY
values and incrementing a counter. If you never actually commit the value, the server makes no attempt to "back-fill" holes in the sequence – the values are provided on a forward-only basis.
This is a feature, not a bug. IDENTITY
columns are designed to be ordered and unique, but not necessary tightly packed. The only way to guarantee tight-packing is to serialize all incoming requests, making sure that each one either completes or terminates before the next one begins; otherwise, the server could try to back-fill an IDENTITY
value that was issued a half hour ago, only to have a long-running transaction (i.e., the initial recipient of that IDENTITY
value) commit a row with a duplicate primary key.
(It's worth pointing out that when I say "transaction," that doesn't need to refer to a TSQL TRANSACTION
, though I would recommend their use. It could be absolutely any procedure on the application or SQL server side that might take any amount of time, even if that time is only the time it takes to SELECT
the next revision number and immediately thereafter INSERT
the new DRAWING_REVISION
.)
This attempt to back-fill values is just serialization in disguise, since, in a situation with two simultaneous INSERT
requests, it punishes the second request to commit. This forces the last-comer to try again (possibly several times, until it just happens that there is no conflict). There is one successful submission at a time: serialization, though without the benefit of a queue.
The SELECT MAX(REV) + 1
approach has the same disadvantage. Naturally, a MAX
approach doesn't make any attempt to back-fill values, but it does force every concurrent request to fight over the same revision number, with the same results.
Why is this bad? Database systems are designed for parallelism and currency: this ability is one of the primary advantages of a managed database over a flat-file format.
Faking It Right
So, after all that long-winded exposition, what can you do to solve the problem? You could cross your fingers and hope that your never see many concurrent users, but why would you wish against the wide-spread use of your own application? You don't want success to be your downfall, after all.
The solution is to do what SQL Server does with IDENTITY
columns: dole them out, and then throw them a way. You could use something like the following SQL code, or use equivalent application code:
ALTER TABLE DRAWING ADD REV INT NOT NULL DEFAULT(0);
GO
CREATE PROCEDURE GET_REVISION_NUMBER (@DRAWING_ID INT) AS
BEGIN
DECLARE @ATTEMPTS INT;
SET @ATTEMPTS = 0;
DECLARE @ATTEMPT_LIMIT INT;
SET @ATTEMPT_LIMIT = 5;
DECLARE @CURRENT_REV INT;
LOOP:
SET @CURRENT_REV = (SELECT REV FROM DRAWING WHERE DRAWING.ID = @DRAWING_ID);
UPDATE DRAWING SET REV = @CURRENT_REV + 1 WHERE DRAWING.ID = @DRAWING_ID AND REV = @CURRENT_REV;
SET @ATTEMPTS = @ATTEMPTS + 1;
IF (@@ROWCOUNT = 0)
BEGIN
IF (@ATTEMPTS >= @ATTEMPT_LIMIT) RETURN NULL;
GOTO LOOP;
END
RETURN @CURRENT_REV + 1;
END
The @@ROWCOUNT check is very important – this procedure needs to be non-transactional, because you don't want to hide conflicts from concurrent requests; you want to resolve them. The only way to make sure that your update definitely went through is to check whether any rows were updated.
Of course, you might have guessed that this approach isn't fool-proof. The only way to "resolve" conflicts is to try a few times before giving up. No home-brew solution will every be quite as good as the one hard-coded into the database server software. But it can get pretty close!
The stored procedure doesn't eliminate conflicts, but it does massively shrink the time span over which a conflict can occur. Rather than "reserving" a revision number for a pending INSERT
transaction, you receive the latest revision number and update the static counter as quickly as possible, getting out of the way for the next call to GET_REVISION_NUMBER
. (This is serialized, to be sure, but only for the very tiny portion of the procedure that needs to be executed in a serial fashion; unlike in many other methods, the rest of the algorithm is free to execute in parallel.)
My team used a solution similar to the one outlined above, and we found that the incidence of blocking conflicts went down by several orders of magnitude. We were able to submit thousands of back-to-back requests from a half dozen machines on the local network before one of them ended up stuck.
The stuck machine got trapped in a loop, requesting a new number from the SQL server, always getting a null result. It couldn't get a word in edgewise, so to speak. This is similar to the conflict behavior in the SELECT MAX
case, but much, much rarer. You trade the guaranteed consecutive numbering of the SELECT MAX
approach (and any related approach) for a thousand-fold increase in scalability. This trade-off is more or less fundamental: there is, to my knowledge, no guaranteed-consecutive, non-serialized solution.
The Takeaway
Of course, all of this goop is predicated upon the need for a localized, semi-consecutive number. If you can live with less-user-friendly revision numbers, you could simply expose DRAWING_REVISION.ID
. (Exposing surrogates keys is unsavory in its own way, though, if you ask me.)
The real takeaway here is that custom identity columns are harder to implement than it may first appear, and any application that may one day require scalability must be very careful about how it fetches new custom identity values.
You could create an insert trigger that sets the rev value
CREATE TRIGGER RevTrigger ON DRAWING
FOR INSERT
AS
WITH ins AS
(
SELECT ID, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY {another-column}) AS sequence
FROM inserted
WHERE REV IS NULL -- only update rows where REV is not included
),
draw AS
(
SELECT ID, MAX(REV) AS REV
FROM DRAWING
GROUP BY ID
)
UPDATE DRAWING
SET REV = COALESCE(draw.REV + ins.sequence, 0)
FROM DRAWING
JOIN ins ON DRAWING.ID = ins.ID AND DRAWING.{another-column} = ins.{another-column}
JOIN draw ON DRAWING.ID = draw.ID
You don't indicate how to assign a REV value if more than one row is inserted at the same time that have the same ID value. In other words, how would revision be assigned if more than one revision is added at the same time?
This solution assumes that there's an additional column that will determine revision sequence in this case (see {another-column} above). If you don't have a column like this, then change the ORDER BY {another-column}
with ORDER BY 0
in the ROW_NUMBER
function. And, remove the following AND DRAWING.{another-column} = ins.{another-column}
. After making this change, all rows in the insert with the same ID will get the same REV.
EDIT
The script above only works on SQL Server 2005 and later. Here's a solution that will work on SQL Server 2000, but doesn't address the issue of multiple revisions in one insert.
CREATE TRIGGER RevTrigger ON DRAWING
FOR INSERT
AS
UPDATE DRAWING
SET REV = COALESCE(draw.REV + 1, 0)
FROM DRAWING
JOIN inserted ON DRAWING.ID = inserted.ID AND DRAWING.{another-column} = inserted.{another-column}
AND inserted.REV IS NULL
JOIN
(
SELECT ID, MAX(REV) AS REV
FROM DRAWING
GROUP BY ID
) AS draw ON DRAWING.ID = draw.ID
精彩评论