开发者

How to keep a rolling checksum in SQL?

I am trying to keep a rolling checksum to account for order, so take the previous 'checksum' and xor it with the current one a开发者_如何学Cnd generate a new checksum.

Name      Checksum     Rolling Checksum
------    -----------  -----------------
foo       11829231     11829231
bar       27380135     checksum(27380135 ^ 11829231) = 93291803
baz       96326587     checksum(96326587 ^ 93291803) = 67361090

How would I accomplish something like this?

(Note that the calculations are completely made up and are for illustration only)


This is basically the running total problem.

Edit:

My original claim was that is one of the few places where a cursor based solution actually performs best. The problem with the triangular self join solution is that it will repeatedly end up recalculating the same cumulative checksum as a subcalculation for the next step so is not very scalable as the work required grows exponentially with the number of rows.

Corina's answer uses the "quirky update" approach. I've adjusted it to do the check sum and in my test found that it took 3 seconds rather than 26 seconds for the cursor solution. Both produced the same results. Unfortunately however it relies on an undocumented aspect of Update behaviour. I would definitely read the discussion here before deciding whether to rely on this in production code.

There is a third possibility described here (using the CLR) which I didn't have time to test. But from the discussion here it seems to be a good possibility for calculating running total type things at display time but out performed by the cursor when the result of the calculation must be saved back.

CREATE TABLE TestTable
(
PK int identity(1,1) primary key clustered,
[Name] varchar(50),
[CheckSum] AS CHECKSUM([Name]),
RollingCheckSum1 int NULL,
RollingCheckSum2 int NULL
)


/*Insert some random records (753,571 on my machine)*/
INSERT INTO TestTable ([Name])
SELECT newid() FROM sys.objects s1, sys.objects s2, sys.objects s3

Approach One: Based on the Jeff Moden Article

DECLARE @RCS int

 UPDATE TestTable
    SET @RCS = RollingCheckSum1 = 
                                 CASE WHEN @RCS IS NULL THEN 
                                                        [CheckSum] 
                                 ELSE 
                                             CHECKSUM([CheckSum] ^ @RCS) 
                                 END
   FROM TestTable WITH (TABLOCKX)
 OPTION (MAXDOP 1)

Approach Two - Using the same cursor options as Hugo Kornelis advocates in the discussion for that article.

SET NOCOUNT ON
BEGIN TRAN

DECLARE @RCS2 INT
DECLARE @PK INT, @CheckSum INT

DECLARE curRollingCheckSum CURSOR LOCAL STATIC READ_ONLY
    FOR
    SELECT PK, [CheckSum]
    FROM         TestTable
    ORDER BY PK

   OPEN curRollingCheckSum

  FETCH NEXT FROM curRollingCheckSum
   INTO @PK, @CheckSum

  WHILE @@FETCH_STATUS = 0
  BEGIN

  SET @RCS2 = CASE WHEN @RCS2 IS NULL THEN @CheckSum ELSE CHECKSUM(@CheckSum ^ @RCS2) END


 UPDATE dbo.TestTable 
    SET RollingCheckSum2 = @RCS2
  WHERE @PK = PK

  FETCH NEXT FROM curRollingCheckSum
   INTO @PK, @CheckSum

    END

COMMIT

Test they are the same

SELECT * FROM TestTable
WHERE RollingCheckSum1<> RollingCheckSum2


I'm not sure about a rolling checksum, but for a rolling sum for instance, you can do this using the UPDATE command:

declare @a table (name varchar(2), value int, rollingvalue int)
insert into @a
    select 'a', 1, 0 union all select 'b', 2, 0 union all select 'c', 3, 0 

select * from @a

declare @sum int
set @sum = 0

update @a
set @sum =  rollingvalue = value + @sum 

select * from @a


Select Name, Checksum
    , (Select T1.Checksum_Agg(Checksum)
        From Table As T1
        Where T1.Name < T.Name) As RollingChecksum
From Table As T
Order By T.Name

To do a rolling anything, you need some semblance of an order to the rows. That can be by name, an integer key, a date or whatever. In my example, I used name (even though the order in your sample data isn't alphabetical). In addition, I'm using the Checksum_Agg function in SQL.

In addition, you would ideally have a unique value on which you compare the inner and outer query. E.g., Where T1.PK < T.PK for an integer key or even string key would work well. In my solution if Name had a unique constraint, it would also work well enough.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜