Mimicking createFolds using time-series cross validation
The R package caret provides a handy function createFolds, which returns a list of indexes for training sets to be used in cross-validation:
set.seed(1)
require(caret)
x <- rnorm(10)
createFolds(x,k=5,returnTrain=TRUE)
$Fold1
[1] 1 2 5 6 7 8 9 10
$Fold2
[1] 1 3 4 5 6 8 9 10
$Fold3
[1] 1 2 3 4 5 7 8 10
$Fold4
[1] 1 2 3 4 6 7 8 9
$Fold5
[1] 2 3 4 5 6 7 9 10
I would like to create a similar function, except I want to return a list of indexes to be used in time-series cross validation. I found some example code in R, but I want to generalize and functionalize things more. Here's what I initially came up with:
createTSfolds <- function(y, Min=max(frequency(y),3)) {
  i <- seq(along=y)
  stops <- i[Min:(length(i)-1)]
  starts <- rep(1,length(stops))
  out <- mapply(seq,starts,stops)
  names(out) <- paste("Fold", gsub(" ", "0", format(seq(along = out))), sep = "")
  out
}
createTSfolds(x)
$Fold1
[1] 1 2 3
$Fold2
[1] 1 2 3 4
$Fold3
[1] 1 2 3 4 5
$Fold4
[1] 1 2 3 4 5 6
$Fold5
[1] 1 2 3 4 5 6 7
$Fold6
[1] 1 2 3 4 5 6 7 8
$Fold7
[1] 1 2 3 4 5 6 7 8 9
(Min is the minimum number of observations needed to fit a model.)
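The last fold stops at length(y) - 1, presumably so that at least one later observation is left to forecast. A minimal sketch of pairing each training fold with its one-step-ahead test index, assuming that is the intended use (the helper name nextIndex is hypothetical and not part of caret):

```r
# Training folds as createTSfolds(x) returns them for 10 observations, Min = 3
folds <- lapply(3:9, seq_len)
names(folds) <- paste0("Fold", seq_along(folds))

# Hypothetical helper: index of the observation right after a training set
nextIndex <- function(train) max(train) + 1L

# One test index per fold, in the same order and with the same names
testIdx <- lapply(folds, nextIndex)
str(testIdx)
```

A model would then be fitted on y[folds[[j]]] and evaluated on y[testIdx[[j]]] for each fold j.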
This function works pretty well for now, but I'd like to add two features that Rob Hyndman discusses:
- Windowing: Instead of the training set extending back to the 1st observation, it extends back n observations.
- Variable forecast horizons: Instead of adding 1 index to the training set each fold, add k indexes to the training set each fold.
Here is how I implemented windowing:
createTSfolds <- function(y, Min=max(frequency(y),3), lookback=NA) {
  i <- seq(along=y)
  stops <- i[Min:(length(i)-1)]
  if (is.na(lookback)) {
    starts <- as.list(rep(1,length(stops)))
    out <- mapply(seq,starts,stops)
  } else {
    starts <- stops-Min+1
    out <- mapply(seq,starts,stops)
    out <- split(t(out),1:nrow(t(out)))
  }
  names(out) <- paste("Fold", gsub(" ", "0", format(seq(along = out))), sep = "")
  out
}
createTSfolds(x,Min=4,lookback=4)
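In the function above the window start is computed from Min, so lookback only switches windowing on; the call works as intended because Min and lookback are both 4. A minimal sketch in which lookback itself sets the window width, assuming the window should be clipped at the first observation when it would reach further back (the name createTSfoldsWindow is hypothetical):

```r
createTSfoldsWindow <- function(y, Min = max(frequency(y), 3), lookback = NA) {
  stops <- Min:(length(y) - 1)
  if (is.na(lookback)) {
    starts <- rep(1, length(stops))          # expanding window
  } else {
    starts <- pmax(stops - lookback + 1, 1)  # fixed-width window, clipped at 1
  }
  # SIMPLIFY = FALSE always returns a list, even when all folds have equal length
  out <- mapply(seq, starts, stops, SIMPLIFY = FALSE)
  names(out) <- paste("Fold", gsub(" ", "0", format(seq(along = out))), sep = "")
  out
}

x <- rnorm(10)
createTSfoldsWindow(x, Min = 4, lookback = 4)
```

Using SIMPLIFY = FALSE also removes the need for the split(t(out), ...) step, since mapply no longer collapses equal-length folds into a matrix.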
I can't figure out how to implement variable forecast horizons. For example, with k=3 the result would look like this:
$Fold1
[1] 1 2 3
$Fold2
[1] 1 2 3 4 5 6
$Fold3
[1] 1 2 3 4 5 6 7 8 9
I'm looking for ways to improve my existing code, as well as ways to add variable increments to the training set each fold.
Thank you
Here is one approach. It is not entirely robust, as I am not sure about the output you seek when both lookback and k are present. Let me know if this is what you were looking for.
createTSfolds2 <- function(y, Min = max(frequency(y), 3), lookback = NA, k = NA){
  # expanding folds 1:Min, 1:(Min + 1), ..., 1:(length(y) - 1)
  out = plyr::llply(Min:(length(y) - 1), seq)
  # keep every k-th fold, starting from the first
  if (!is.na(k)) {out = out[seq(1, length(out), k)]}
  # keep only the last `lookback` indexes of each fold
  if (!is.na(lookback)) {
    out = plyr::llply(out, function(z) z[(length(z) - lookback + 1):length(z)])
  }
  names(out) <- paste("Fold", gsub(" ", "0", format(seq(along = out))), sep = "")
  return(out)
}
createTSfolds2(x, Min = 3, lookback = NA, k = 3)
$Fold1
[1] 1 2 3
$Fold2
[1] 1 2 3 4 5 6
$Fold3
[1] 1 2 3 4 5 6 7 8 9
createTSfolds2(x, Min = 3, lookback = 3, k = 3)
$Fold1
[1] 1 2 3
$Fold2
[1] 4 5 6
$Fold3
[1] 7 8 9
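For comparison, the same logic can be written without plyr. This is only a sketch under two assumptions: that k means "keep every k-th fold, starting from the first", as in createTSfolds2, and that a lookback longer than a fold should keep the whole fold rather than fail (in createTSfolds2 a lookback larger than a fold's length can produce an invalid index). The name createTSfolds3 is hypothetical.

```r
createTSfolds3 <- function(y, Min = max(frequency(y), 3), lookback = NA, k = NA) {
  stops <- Min:(length(y) - 1)
  # keep every k-th fold, starting from the first
  if (!is.na(k)) stops <- stops[seq(1, length(stops), by = k)]
  out <- lapply(stops, function(end) {
    # assumption: clip the window at observation 1 when lookback exceeds the fold
    start <- if (is.na(lookback)) 1 else max(end - lookback + 1, 1)
    seq(start, end)
  })
  names(out) <- paste("Fold", gsub(" ", "0", format(seq(along = out))), sep = "")
  out
}

x <- rnorm(10)
createTSfolds3(x, Min = 3, k = 3)                # expanding folds 1:3, 1:6, 1:9
createTSfolds3(x, Min = 3, lookback = 3, k = 3)  # windows 1:3, 4:6, 7:9
```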