开发者

TSQL: Create a Custom Identity off a Custom Identity? (Managing Database Revisions)

I would like to create a custom identity based off an custom identity. Or perhaps something similar to an identity that functions like an auto-incrementing key.

For example if I have an primary key for a drawing I would like its revision to be based off the the drawing number.

Example

DRAWING
ID    | REV   | INFO
------+-------+------
1     | 0     | "Draw1"
2     | 0     | "Draw2"
2     | 1     | "Draw2Edit"
2     | 2     | "Draw2MoreEdit"
3     | 0     | "Draw3"
4     | 0     | "Draw4"

If I was to insert a few more records into my table such a:

INSERT INTO DRAWING (INFO) VALUES ("Draw5")
INSERT INTO DRAWING (ID,INFO) VALUES (3,"Draw3Edit")

My table would like:

DRAWING
ID    | REV   | INFO
------+-------+------
1     | 0     | "Draw1"
2     | 0     | "Draw2"
2     | 1     | "Draw2Edit"
2     | 2     | "Draw2MoreEdit"
3     | 0     | "Draw3"
3     | 1     | "Draw3Edit"      --NEW ROW
4     | 0     | "Draw4"
5     | 0     | "Draw5"          --NEW ROW

T-SQL

CREATE TABLE D开发者_C百科RAWING
(
    ID INT,
    REV INT,  
    INFO VARCHAR(50),
    PRIMARY KEY (ID,REV)
);

CREATE TABLE CURRENT_DRAWING
(
    ID INT IDENTITY (1,1),
    DRAWING_ID INT,
    DRAWING_REV INT,
    PRIMARY KEY (ID),
    FOREIGN KEY (DRAWING_ID,DRAWING_REV) REFERENCES DRAWING (ID,REV)
        ON UPDATE CASCADE
        ON DELETE CASCADE
);

I am using SQL Server Management Studio 2005 and working on a SQL Server 2000 Database.

I will also accept possible alternatives. The primary goal is for the ID to auto-increment for new drawings. The ID will remain the same and the REV will increment on new drawing revisions.

Update:

I think I have it close to what I want:

DROP TABLE DRAW

GO

CREATE TABLE DRAW
(
    ID INT DEFAULT(0), 
    REV INT DEFAULT(-1), 
    INFO VARCHAR(10), 
    PRIMARY KEY(ID, REV)
)

GO

CREATE TRIGGER TRIG_DRAW ON DRAW
FOR INSERT
AS
BEGIN
    DECLARE @newId INT,
            @newRev INT,
            @insId INT,
            @insRev INT

    SET TRANSACTION ISOLATION LEVEL READ COMMITTED
    BEGIN TRANSACTION

    SELECT @insId = ID FROM inserted
    SELECT @insRev = REV FROM inserted

    PRINT 'BEGIN TRIG'
    PRINT @insId
    PRINT @insRev
    PRINT @newId
    PRINT @newRev


    --IF ID=0 THEN IT IS A NEW ID
    IF @insId <=0
    BEGIN
        --NEW DRAWING ID=MAX+1 AND REV=0
        SELECT @newId = COALESCE(MAX(ID), 0) + 1 FROM DRAW
        SELECT @newRev = 0
    END
    ELSE
    --ELSE IT IS A NEW REV
    BEGIN
        --CHECK TO ENSURE ID EXISTS
        IF EXISTS(SELECT * FROM DRAW WHERE ID=@insId AND REV=0)
        BEGIN
            PRINT 'EXISTS'
            SELECT @newId = @insId
            SELECT @newRev = MAX(REV) + 1 FROM DRAW WHERE ID=@insID
        END
        ELSE
        --ID DOES NOT EXIST THEREFORE NO REVISION
        BEGIN
            RAISERROR 50000 'ID DOES NOT EXIST.'
            ROLLBACK TRANSACTION
            GOTO END_TRIG
        END
    END

    PRINT 'END TRIG'
    PRINT @insId
    PRINT @insRev
    PRINT @newId
    PRINT @newRev

    SELECT * FROM DRAW

    UPDATE DRAW SET ID=@newId, REV=@newRev WHERE ID=@insId



    COMMIT TRANSACTION
    END_TRIG:
END

GO


INSERT INTO DRAW (INFO) VALUES ('DRAW1')
INSERT INTO DRAW (INFO) VALUES ('DRAW2')
INSERT INTO DRAW (ID,INFO) VALUES (2,'DRAW2EDIT1') --PROBLEM HERE
INSERT INTO DRAW (ID,INFO) VALUES (2,'DRAW2EDIT2')
INSERT INTO DRAW (INFO) VALUES ('DRAW3')
INSERT INTO DRAW (INFO) VALUES ('DRAW4')

GO

--SHOULD THROW
INSERT INTO DRAW (ID,INFO) VALUES (9,'DRAW9')

GO

SELECT * FROM DRAW

GO

However, I keep getting Violation of PRIMARY KEY constraint.

I have put debug statements and it seems unlikely that I am violating my primary key:

BEGIN TRIG
0
-1


END TRIG
0
-1
1
0

(1 row(s) affected)

(1 row(s) affected)

(1 row(s) affected)
BEGIN TRIG
0
-1


END TRIG
0
-1
2
0

(2 row(s) affected)

(1 row(s) affected)

(1 row(s) affected)
BEGIN TRIG
2
-1


EXISTS
END TRIG
2
-1
2
1

(3 row(s) affected)
Msg 2627, Level 14, State 1, Procedure TRIG_DRAW, Line 58
Violation of PRIMARY KEY constraint 'PK__DRAW__56D3D912'. Cannot insert duplicate key in object 'DRAW'.
The statement has been terminated.

It prints

ID  | REV    | INFO
----+--------+------------
1   |   0    |  DRAW1
2   |  -1    |  DRAW2EDIT1  --This row is being updated to 2 1 
2   |   0    |  DRAW2

Just before it fails and the row 2 -1 is being updated to 2 1. It should not violate my primary key.


I would actually recommend an alternative data design. This kind of key-and-sequence pattern is very difficult to implement properly in a relational database, and the drawbacks often outweigh the benefits.

You have quite a few options, but the simplest ones start with splitting the table in two:

CREATE TABLE DRAWING
(
    ID INT IDENTITY(1, 1),
    PRIMARY KEY (ID)
);

CREATE TABLE DRAWING_REVISION
(    
    ID INT IDENTITY(1, 1),
    DRAWING_ID INT,
    INFO VARCHAR(50),
    PRIMARY KEY (ID),
    CONSTRAINT FK_DRAWING_REVISION_DRAWING FOREIGN KEY (DRAWING_ID) REFERENCES DRAWING(ID)
);

This has the benefit of representing the data accurately and working correctly with no additional effort on your part. Simply add a row to the DRAWING_REVISION table when you want to add a new revision to a drawing. Because the primary keys use the IDENTITY specification, you don't have to do the work of finding the next ID.

The Obvious Solution and Its Shortcoming

If you need a human-readable revision number, though, rather than for-your-server's-eyes-only ID, that can be done in two ways. They both start by adding REV INT to the data definition for DRAWING_REVISION, along with a CONSTRAINT UK_DRAWING_REVISION_DRAWING_ID_REV UNIQUE (DRAWING_ID, REV). The trick then, of course, is to find out the next revision number for a given drawing.

If you expect to only every have a tiny number of concurrent users, you can simply SELECT MAX(REV) + 1 FROM DRAWING_REVISION WHERE DRAWING_ID = @DRAWING_ID, either in your application code, or in an INSTEAD OF INSERT trigger. With high concurrency or bad luck, though, users could end up blocking one another, because they could try to insert the same combination of DRAWING_ID and REV into DRAWING_REVISION.

Some Background

There's really only one solution to this problem, though explaining why there's only one solution requires a little bit of background information. Consider the following code:

BEGIN TRAN

INSERT DRAWING DEFAULT VALUES;
INSERT DRAWING DEFAULT VALUES;
SELECT ID FROM DRAWING; -- Output: 1, 2

ROLLBACK TRAN

BEGIN TRAN

INSERT DRAWING DEFAULT VALUES;
SELECT ID FROM DRAWING; -- Output: 3

ROLLBACK TRAN

Of course, the output would differ on subsequent executions. Behind the scenes, SQL server is doling out IDENTITY values and incrementing a counter. If you never actually commit the value, the server makes no attempt to "back-fill" holes in the sequence – the values are provided on a forward-only basis.

This is a feature, not a bug. IDENTITY columns are designed to be ordered and unique, but not necessary tightly packed. The only way to guarantee tight-packing is to serialize all incoming requests, making sure that each one either completes or terminates before the next one begins; otherwise, the server could try to back-fill an IDENTITY value that was issued a half hour ago, only to have a long-running transaction (i.e., the initial recipient of that IDENTITY value) commit a row with a duplicate primary key.

(It's worth pointing out that when I say "transaction," that doesn't need to refer to a TSQL TRANSACTION, though I would recommend their use. It could be absolutely any procedure on the application or SQL server side that might take any amount of time, even if that time is only the time it takes to SELECT the next revision number and immediately thereafter INSERT the new DRAWING_REVISION.)

This attempt to back-fill values is just serialization in disguise, since, in a situation with two simultaneous INSERT requests, it punishes the second request to commit. This forces the last-comer to try again (possibly several times, until it just happens that there is no conflict). There is one successful submission at a time: serialization, though without the benefit of a queue.

The SELECT MAX(REV) + 1 approach has the same disadvantage. Naturally, a MAX approach doesn't make any attempt to back-fill values, but it does force every concurrent request to fight over the same revision number, with the same results.

Why is this bad? Database systems are designed for parallelism and currency: this ability is one of the primary advantages of a managed database over a flat-file format.

Faking It Right

So, after all that long-winded exposition, what can you do to solve the problem? You could cross your fingers and hope that your never see many concurrent users, but why would you wish against the wide-spread use of your own application? You don't want success to be your downfall, after all.

The solution is to do what SQL Server does with IDENTITY columns: dole them out, and then throw them a way. You could use something like the following SQL code, or use equivalent application code:

ALTER TABLE DRAWING ADD REV INT NOT NULL DEFAULT(0);

GO

CREATE PROCEDURE GET_REVISION_NUMBER (@DRAWING_ID INT) AS
BEGIN
    DECLARE @ATTEMPTS INT;
    SET @ATTEMPTS = 0;
    DECLARE @ATTEMPT_LIMIT INT;
    SET @ATTEMPT_LIMIT = 5;
    DECLARE @CURRENT_REV INT;
    LOOP:
        SET @CURRENT_REV = (SELECT REV FROM DRAWING WHERE DRAWING.ID = @DRAWING_ID);
        UPDATE DRAWING SET REV = @CURRENT_REV + 1 WHERE DRAWING.ID = @DRAWING_ID AND REV = @CURRENT_REV;
        SET @ATTEMPTS = @ATTEMPTS + 1;
        IF (@@ROWCOUNT = 0)
        BEGIN
            IF (@ATTEMPTS >= @ATTEMPT_LIMIT) RETURN NULL;
            GOTO LOOP;
        END
    RETURN @CURRENT_REV + 1;
END

The @@ROWCOUNT check is very important – this procedure needs to be non-transactional, because you don't want to hide conflicts from concurrent requests; you want to resolve them. The only way to make sure that your update definitely went through is to check whether any rows were updated.

Of course, you might have guessed that this approach isn't fool-proof. The only way to "resolve" conflicts is to try a few times before giving up. No home-brew solution will every be quite as good as the one hard-coded into the database server software. But it can get pretty close!

The stored procedure doesn't eliminate conflicts, but it does massively shrink the time span over which a conflict can occur. Rather than "reserving" a revision number for a pending INSERT transaction, you receive the latest revision number and update the static counter as quickly as possible, getting out of the way for the next call to GET_REVISION_NUMBER. (This is serialized, to be sure, but only for the very tiny portion of the procedure that needs to be executed in a serial fashion; unlike in many other methods, the rest of the algorithm is free to execute in parallel.)

My team used a solution similar to the one outlined above, and we found that the incidence of blocking conflicts went down by several orders of magnitude. We were able to submit thousands of back-to-back requests from a half dozen machines on the local network before one of them ended up stuck.

The stuck machine got trapped in a loop, requesting a new number from the SQL server, always getting a null result. It couldn't get a word in edgewise, so to speak. This is similar to the conflict behavior in the SELECT MAX case, but much, much rarer. You trade the guaranteed consecutive numbering of the SELECT MAX approach (and any related approach) for a thousand-fold increase in scalability. This trade-off is more or less fundamental: there is, to my knowledge, no guaranteed-consecutive, non-serialized solution.

The Takeaway

Of course, all of this goop is predicated upon the need for a localized, semi-consecutive number. If you can live with less-user-friendly revision numbers, you could simply expose DRAWING_REVISION.ID. (Exposing surrogates keys is unsavory in its own way, though, if you ask me.)

The real takeaway here is that custom identity columns are harder to implement than it may first appear, and any application that may one day require scalability must be very careful about how it fetches new custom identity values.


You could create an insert trigger that sets the rev value

CREATE TRIGGER RevTrigger ON DRAWING
FOR INSERT
AS
WITH ins AS
    (
    SELECT ID, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY {another-column}) AS sequence
    FROM inserted
    WHERE REV IS NULL  -- only update rows where REV is not included
    ),
  draw AS
    (
    SELECT ID, MAX(REV) AS REV
    FROM DRAWING
    GROUP BY ID
    )

UPDATE DRAWING
SET REV = COALESCE(draw.REV + ins.sequence, 0)
FROM DRAWING
JOIN ins ON DRAWING.ID = ins.ID AND DRAWING.{another-column} = ins.{another-column}
JOIN draw ON DRAWING.ID = draw.ID

You don't indicate how to assign a REV value if more than one row is inserted at the same time that have the same ID value. In other words, how would revision be assigned if more than one revision is added at the same time?

This solution assumes that there's an additional column that will determine revision sequence in this case (see {another-column} above). If you don't have a column like this, then change the ORDER BY {another-column} with ORDER BY 0 in the ROW_NUMBER function. And, remove the following AND DRAWING.{another-column} = ins.{another-column}. After making this change, all rows in the insert with the same ID will get the same REV.

EDIT
The script above only works on SQL Server 2005 and later. Here's a solution that will work on SQL Server 2000, but doesn't address the issue of multiple revisions in one insert.

CREATE TRIGGER RevTrigger ON DRAWING
FOR INSERT
AS

UPDATE DRAWING
SET REV = COALESCE(draw.REV + 1, 0)
FROM DRAWING
JOIN inserted ON DRAWING.ID = inserted.ID AND DRAWING.{another-column} = inserted.{another-column}
            AND inserted.REV IS NULL
JOIN
    (
    SELECT ID, MAX(REV) AS REV
    FROM DRAWING
    GROUP BY ID
    ) AS draw ON DRAWING.ID = draw.ID
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜