开发者

How to create a relative range standard deviation (stdev) function in SQL Server (T-SQL) without using loops?

I'm trying to come up with a way to use a relative stdev range function in SQL Server 2005/8 without any looping. Assuming a test dataset (test_data) with 10 records and columns:

column_id identity(1,1), column_data, column_stdev

I would need my stdev() to work as follows:

stdev(column_data where id between i and i + 3) with i = column_id, autoincrementing.

So far I'm only able to acoomplish it via the following query, which is extra slow:

de开发者_StackOverflowclare @i int
set @i=1
while @i < 11
 begin
 update test_data
 set column_stdev = (select stdev(column_data) from test_data 
                     where column_id between @i and @i+2)
 set @i=@i+1
end

Could you, please, let me know if there is a way to avoid looping in order to set column_stdev to get stdev of the past 3 records in column_data.


One obvious improvement would be to get rid of the WHILE loop and the row by row processing. If you are not doing it in an explicit transaction you will need to wait for 10 individual commits to complete and the total amount of work required is greater anyway as shown by the stats below.

NB: I'm not claiming that my query can't be improved upon however. You may be able to adjust the "quirky update" approach for your needs and do it with a single scan through the data for example.

create table t (
column_id int identity(1,1) primary key, 
column_data float, 
column_stdev float)

insert into t (column_data)
select top 10 CHECKSUM(newid()) from sys.objects

SET STATISTICS IO ON

UPDATE t 
SET  column_stdev = (SELECT stdev(t2.column_data)
                     FROM t t2 
                     WHERE t2.column_id BETWEEN t.column_id AND t.column_id + 2)

/*Table 't'. Scan count 11, logical reads 42, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 1, logical reads 23, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.*/

declare @i int
set @i=1
while @i < 11
 begin
 update t
 set column_stdev = (select stdev(column_data) from t 
                     where column_id between @i and @i+2)
 set @i=@i+1
end


/* (Aggregated the 10 results)
Table 't'. Scan count 20, logical reads 240, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 10, logical reads 230, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.*/


This answer uses the (undocumented) "Quirky Update" approach from here. I couldn't figure out a way of doing it with a single scan so this approach uses 2 scans.

The stats are much better at Table 't'. Scan count 2, logical reads 4

It requires some schema changes though. An additional column and the clustered index needs to be sorted descending. I've assumed that your column_data is an int and 4 bytes. If this is not the case you will need to adjust the lengths of the binary column and substrings.

create table t(
column_id int identity(1,1), 
column_data int, 
column_stdev float,
prev3 varbinary(12))

ALTER TABLE [dbo].[t] ADD PRIMARY KEY CLUSTERED ([column_id] DESC)

GO

insert into t (column_data)
select top 10 CHECKSUM(newid()) from sys.objects

DECLARE @prev3 varbinary(12)
DECLARE @Anchor INT

UPDATE t
SET @prev3 = prev3 = SUBSTRING(CAST(column_data AS binary(4)) + ISNULL(@prev3,0x),1,12), 
@Anchor = column_id
   FROM t WITH (TABLOCKX)
 OPTION (MAXDOP 1)


UPDATE t
SET    column_stdev = 
              CASE Len(prev3)
                WHEN 12 THEN (SELECT STDEV(c)
                              FROM   (SELECT CAST(Substring(prev3, 1, 4) AS INT) AS c
                                      UNION ALL
                                      SELECT Substring(prev3, 5, 4)
                                      UNION ALL
                                      SELECT Substring(prev3, 9, 4)) t)
                WHEN 8 THEN (SELECT STDEV(c)
                             FROM   (SELECT CAST(Substring(prev3, 1, 4) AS INT) AS c
                                      UNION ALL
                                      SELECT Substring(prev3, 5, 4)
                                    ) t)
              END  
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜