Window moving average in sql server
I am trying to create a function that computes a windowed moving average in SQLServer 2008. I am quite new to SQL so I am having a fair bit of difficulty. The data that I am trying to perform the moving average on needs to be grouped by day (it is all timestamped data) and then a variable moving average window needs to be applied to it.
I already have a function that groups the data by day (and @id) which is shown at the bottom. I have a few questions:
Would it be better to call the grouping function inside the moving average function or should I do it all at once?
Is it possible to get the moving average for the dates input into the function, but go back n days to begin the moving average so that the first n days of the returned data will not have 0 for their average? (ie. if they want a 7 day moving average from 01-08-2011 to 02-08-2011 that I start the moving average calculation on 01-01-2011 so that the first day they defined has a value?)
I am in the process of looking into how to do the moving average, and know that a moving window seems to be the best option (currentSum = prevSum + todayCount - nthDayAgoCount) / nDays but I am still working on figuring out the SQL implementation of this.
I have a grouping 开发者_如何学运维function that looks like this (some variables removed for visibility purposes):
SELECT
'ALL' as GeogType,
CAST(v.AdmissionOn as date) as dtAdmission,
CASE WHEN @id IS NULL THEN 99 ELSE v.ID END,
COUNT(*) as nVisits
FROM dbo.Table1 v INNER JOIN dbo.Table2 t ON v.FSLDU = t.FSLDU5
WHERE v.AdmissionOn >= '01-01-2010' AND v.AdmissionOn < DATEADD(day,1,'02-01-2010')
AND v.ID = Coalesce(@id,ID)
GROUP BY
CAST(v.AdmissionOn as date),
CASE WHEN @id IS NULL THEN 99 ELSE v.ID END
ORDER BY 2,3,4
Which returns a table like so:
ALL 2010-01-01 1 103
ALL 2010-01-02 1 114
ALL 2010-01-03 1 86
ALL 2010-01-04 1 88
ALL 2010-01-05 1 84
ALL 2010-01-06 1 87
ALL 2010-01-07 1 82
EDIT: To answer the first question I asked:
I ended up creating a function which declared a temporary table and inserted the results from the count function into it, then used the example from user662852
to compute the moving average.
Take the hardcoded date range out of your query. Write the output (like your sample at the end) to a temp table (I called it #visits below).
Try this self join to the temp table:
Select list.dtadmission
, AVG(data.nvisits) as Avg
, SUM(data.nvisits) as sum
, COUNT(data.nvisits) as RollingDayCount
, MIN(data.dtadmission) as Verifymindate
, MAX(data.dtadmission) as Verifymaxdate
from #visits as list
inner join #visits as data
on list.dtadmission between data.dtadmission and DATEADD(DD,6,data.dtadmission) group by list.dtadmission
EDIT: I didn't have enough room in Comments to say this in response to your question:
My join is "kinda cartesian" because it uses a between in the join constraint. Each record in list is going up against every other record, and then I want the ones where the date I report is between a lower bound of (-7) days and today. Every data date is available to list date, this is the key to your question. I could have written the join condition as
list.dtadmission between DATEADD(DD,-6,data.dtadmission) and data.dtadmission
But what really happened was I tested it as
list.dtadmission between DATEADD(DD,6,data.dtadmission) and data.dtadmission
Which returns no records because the syntax is "Between LOW and HIGH". I facepalmed on 0 records and swapped the arguments, that's all.
Try the following, see what I mean: This is the cartesian join for just one listdate:
SELECT
list.[dtAdmission] as listdate
,data.[dtAdmission] as datadate
,data.nVisits as datadata
,DATEADD(dd,6,list.dtadmission) as listplus6
,DATEADD(dd,6,data.dtAdmission ) as datapplus6
from [sandbox].[dbo].[admAvg] as list inner join [sandbox].[dbo].[admAvg] as data
on
1=1
where list.dtAdmission = '5-Jan-2011'
Compare this to the actual join condition
SELECT
list.[dtAdmission] as listdate
,data.[dtAdmission] as datadate
,data.nVisits as datadata
,DATEADD(dd,6,list.dtadmission) as listplus6
,DATEADD(dd,6,data.dtAdmission ) as datapplus6
from [sandbox].[dbo].[admAvg] as list inner join [sandbox].[dbo].[admAvg] as data
on
list.dtadmission between data.dtadmission and DATEADD(DD,6,data.dtadmission)
where list.dtAdmission = '5-Jan-2011'
See how list date is between datadate and dataplus6 in all the records?
精彩评论