Can `ddply` (or similar) do a sliding window?
Something like
sliding = function(df, n, f)
ldply(1:(nrow(df) - n + 1), function(k)
f(df[k:(k + n - 1), ])
)
That would be used like
> df
n a
1 1 0.8021891
2 2 0.9446330
...
> sliding(df, 2, function(df) with(df,
+ data.frame(n = n[1], a = a[1], b = sum(n - a))
+ ))
n 开发者_开发技巧 a b
1 1 0.8021891 1.253178
...
Except straight inside ddply
, so that I could get the nice syntactic sugar
that comes with it?
Since there hasn't been an answer posted to this question, I thought I'd put one up to make the case that there is actually an even better way to go about this type of problem - one that can also be potentially thousands of times faster. (If this isn't helpful, please let me know, but I thought it would be better than nothing here)
Whenever I hear "moving average" or "sliding window", FFT convolution immediately pops into my mind. This is because it can handle these types of problems in an extremely efficient manner. Since all the "sliding" is done behind the scenes, I think that it also has all the syntactic beauty you could ever ask for.
(The following code is available in one file at https://gist.github.com/1320175)
We start by simulating some data (I'm using integers here for simplicity, but of course you don't need to).
require(plyr)
set.seed(12345)
n = 10
n.sum = 2
a = sample.int(10, n, replace=T)
df = data.frame(n=1:n, a)
> df
n a
1 1 8
2 2 9
3 3 8
4 4 9
5 5 5
6 6 2
7 7 4
8 8 6
9 9 8
10 10 10
Now, we will precompute n-a
all in one go.
n.minus.a = with(df, n - a)
Next, define a kernel k
that, when convolved with our input n.minus.a
, will do the summing (or averaging/smoothing/whatever else) to our data.
k = rep(0, n)
k[1:n.sum] = 1
With everything set up, we can define a function to do this convolution efficiently in the frequency domain via fft()
.
myConv <- function(x, k){
Fx = fft(x)
Fk = fft(k)
Fxk = Fx * Fk
xk = fft(Fxk, inverse=T)
(Re(xk) / n)[-(1:(n.sum-1))]
}
The syntax to execute this is nice and simple:
> myConv(n.minus.a, k)
[1] -14 -12 -10 -5 4 7 5 3 1
All this also happens under the hood when you use the convolve()
convenience function in R.
> convolve(n.minus.a, k)[1:(length(n.minus.a)-n.sum+1)]
[1] -14 -12 -10 -5 4 7 5 3 1
We now compare this to the manual method to show that the results are all equivalent:
> sliding(df, 2, function(df) with(df, data.frame(n = n[1], a = a[1], b = sum(n - a))))
n a b
1 1 8 -14
2 2 9 -12
3 3 8 -10
4 4 9 -5
5 5 5 4
6 6 2 7
7 7 4 5
8 8 6 3
9 9 8 1
Finally, we will make n=10^4
and test all these methods for speed:
> system.time(myConv(n.minus.a, k))
user system elapsed
0.002 0.000 0.002
> system.time(convolve(n.minus.a, k, type='circ')[1:(length(n.minus.a)-n.sum+1)])
user system elapsed
0.002 0.000 0.002
> system.time(sliding(df, 2, function(df) with(df, data.frame(n = n[1], a = a[1], b = sum(n - a)))))
user system elapsed
7.944 0.018 7.962
The FFT methods return almost instantaneously, and, even with this rough timing, are almost 4000 times faster than the manual method.
Of course not every sort of sliding problem can be pigeon-holed into this paradigm, but for numerical problems like this one using sum()
(and also means, weighted averages, etc.) it works perfectly. At any rate, it is usually well worth it to at least google a bit to see if there is a filter kernel available that will do the trick for a given probelm. Good luck!
精彩评论