How to keep a rolling checksum in SQL?
I am trying to keep a rolling checksum to account for order, so take the previous 'checksum' and xor it with the current one a开发者_如何学Cnd generate a new checksum.
Name Checksum Rolling Checksum
------ ----------- -----------------
foo 11829231 11829231
bar 27380135 checksum(27380135 ^ 11829231) = 93291803
baz 96326587 checksum(96326587 ^ 93291803) = 67361090
How would I accomplish something like this?
(Note that the calculations are completely made up and are for illustration only)
This is basically the running total problem.
Edit:
My original claim was that is one of the few places where a cursor based solution actually performs best. The problem with the triangular self join solution is that it will repeatedly end up recalculating the same cumulative checksum as a subcalculation for the next step so is not very scalable as the work required grows exponentially with the number of rows.
Corina's answer uses the "quirky update" approach. I've adjusted it to do the check sum and in my test found that it took 3 seconds rather than 26 seconds for the cursor solution. Both produced the same results. Unfortunately however it relies on an undocumented aspect of Update behaviour. I would definitely read the discussion here before deciding whether to rely on this in production code.
There is a third possibility described here (using the CLR) which I didn't have time to test. But from the discussion here it seems to be a good possibility for calculating running total type things at display time but out performed by the cursor when the result of the calculation must be saved back.
CREATE TABLE TestTable
(
PK int identity(1,1) primary key clustered,
[Name] varchar(50),
[CheckSum] AS CHECKSUM([Name]),
RollingCheckSum1 int NULL,
RollingCheckSum2 int NULL
)
/*Insert some random records (753,571 on my machine)*/
INSERT INTO TestTable ([Name])
SELECT newid() FROM sys.objects s1, sys.objects s2, sys.objects s3
Approach One: Based on the Jeff Moden Article
DECLARE @RCS int
UPDATE TestTable
SET @RCS = RollingCheckSum1 =
CASE WHEN @RCS IS NULL THEN
[CheckSum]
ELSE
CHECKSUM([CheckSum] ^ @RCS)
END
FROM TestTable WITH (TABLOCKX)
OPTION (MAXDOP 1)
Approach Two - Using the same cursor options as Hugo Kornelis advocates in the discussion for that article.
SET NOCOUNT ON
BEGIN TRAN
DECLARE @RCS2 INT
DECLARE @PK INT, @CheckSum INT
DECLARE curRollingCheckSum CURSOR LOCAL STATIC READ_ONLY
FOR
SELECT PK, [CheckSum]
FROM TestTable
ORDER BY PK
OPEN curRollingCheckSum
FETCH NEXT FROM curRollingCheckSum
INTO @PK, @CheckSum
WHILE @@FETCH_STATUS = 0
BEGIN
SET @RCS2 = CASE WHEN @RCS2 IS NULL THEN @CheckSum ELSE CHECKSUM(@CheckSum ^ @RCS2) END
UPDATE dbo.TestTable
SET RollingCheckSum2 = @RCS2
WHERE @PK = PK
FETCH NEXT FROM curRollingCheckSum
INTO @PK, @CheckSum
END
COMMIT
Test they are the same
SELECT * FROM TestTable
WHERE RollingCheckSum1<> RollingCheckSum2
I'm not sure about a rolling checksum, but for a rolling sum for instance, you can do this using the UPDATE command:
declare @a table (name varchar(2), value int, rollingvalue int)
insert into @a
select 'a', 1, 0 union all select 'b', 2, 0 union all select 'c', 3, 0
select * from @a
declare @sum int
set @sum = 0
update @a
set @sum = rollingvalue = value + @sum
select * from @a
Select Name, Checksum
, (Select T1.Checksum_Agg(Checksum)
From Table As T1
Where T1.Name < T.Name) As RollingChecksum
From Table As T
Order By T.Name
To do a rolling anything, you need some semblance of an order to the rows. That can be by name, an integer key, a date or whatever. In my example, I used name (even though the order in your sample data isn't alphabetical). In addition, I'm using the Checksum_Agg function in SQL.
In addition, you would ideally have a unique value on which you compare the inner and outer query. E.g., Where T1.PK < T.PK
for an integer key or even string key would work well. In my solution if Name
had a unique constraint, it would also work well enough.
精彩评论