Mimicking createFolds using time-series cross validation

The R package caret provides a handy function createFolds, which returns a list of indexes for training sets to be used in cross-validation:

set.seed(1)
require(caret)
x <- rnorm(10)
createFolds(x,k=5,returnTrain=TRUE)

$Fold1
[1]  1  2  5  6  7  8  9 10

$Fold2
[1]  1  3  4  5  6  8  9 10

$Fold3
[1]  1  2  3  4  5  7  8 10

$Fold4
[1] 1 2 3 4 6 7 8 9

$Fold5
[1]  2  3  4  5  6  7  9 10

I would like to create a similar function, except that it should return a list of indexes for time-series cross-validation. I found some example code in R, but I want to generalize it and wrap it in a function. Here's what I initially came up with:

createTSfolds <- function(y, Min=max(frequency(y),3)) {
    i <- seq(along=y)
    stops <- i[Min:(length(i)-1)]
    starts <- rep(1,length(stops))
    out <- mapply(seq,starts,stops)
    names(out) <- paste("Fold", gsub(" ", "0", format(seq(along = out))), sep = "")
    out
}
createTSfolds(x)

$Fold1
[1] 1 2 3

$Fold2
[1] 1 2 3 4

$Fold3
[1] 1 2 3 4 5

$Fold4
[1] 1 2 3 4 5 6

$Fold5
[1] 1 2 3 4 5 6 7

$Fold6
[1] 1 2 3 4 5 6 7 8

$Fold7
[1] 1 2 3 4 5 6 7 8 9

(Min is the minimum number of observations needed to fit a model.)

This function works pretty well for now, but I'd like to add two features that Rob Hyndman discusses:

  1. Windowing: Instead of the training set extending back to the 1st observation, it extends back n observations.
  2. Variable forecast horizons: Instead of adding 1 index to the training set each fold, add k indexes to the training set each fold.

Here is how I implemented windowing:

createTSfolds <- function(y, Min=max(frequency(y),3), lookback=NA) {
    i <- seq(along=y)
    stops <- i[Min:(length(i)-1)]
    if (is.na(lookback)) { 
        starts <- as.list(rep(1,length(stops)))
        out <- mapply(seq,starts,stops)
    } else {
        starts <- stops-Min+1
        out <- mapply(seq,starts,stops)
        out <- split(t(out),1:nrow(t(out)))
    }
    names(out) <- paste("Fold", gsub(" ", "0", format(seq(along = out))), sep = "")
    out
}
createTSfolds(x,Min=4,lookback=4)
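
One thing to note about the windowed branch above: the start positions are computed from Min, so lookback only acts as a switch and the window length always equals Min. Below is a minimal sketch, assuming the intended window length is lookback. The name createTSfoldsWindowed is hypothetical, and SIMPLIFY = FALSE is used so that mapply always returns a list and the transpose/split step is not needed.

```r
# Hypothetical helper (not part of caret): training indexes with an optional rolling window.
createTSfoldsWindowed <- function(y, Min = max(frequency(y), 3), lookback = NA) {
    # Last training index of each fold
    stops <- seq(Min, length(y) - 1)
    # Expanding window starts at 1; rolling window keeps the last `lookback` indexes
    starts <- if (is.na(lookback)) rep(1, length(stops)) else pmax(1, stops - lookback + 1)
    # SIMPLIFY = FALSE forces a list even when all folds have the same length
    out <- mapply(seq, starts, stops, SIMPLIFY = FALSE)
    names(out) <- paste("Fold", gsub(" ", "0", format(seq(along = out))), sep = "")
    out
}
folds <- createTSfoldsWindowed(rnorm(10), Min = 4, lookback = 2)
```

Here each fold holds only two indexes (3 4, then 4 5, and so on), while the first fold still ends at observation Min.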

I can't figure out how to implement variable forecast horizons. For example, with k=3 the output would look like this:

$Fold1
[1] 1 2 3

$Fold2
[1] 1 2 3 4 5 6

$Fold3
[1] 1 2 3 4 5 6 7 8 9

I'm looking for ways to improve my existing code, as well as ways to add variable increments to the training set each fold.

Thank you


Here is one approach. It is not entirely robust, as I am not sure about the output you seek when both lookback and k are present. Let me know if this is what you were looking for.

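createTSfolds2 <- function(y, Min = max(frequency(y), 3), lookback = NA, k = NA){
  # Expanding-window training sets: 1:Min, 1:(Min + 1), ..., 1:(length(y) - 1)
  out = plyr::llply(Min:(length(y) - 1), seq)
  # Keep every k-th fold so the training set grows by k observations per fold
  if (!is.na(k)) {out = out[seq(1, length(out), k)]}
  # Trim each fold to its last `lookback` observations (guarding against lookback > fold length)
  if (!is.na(lookback)) {
    out = plyr::llply(out, function(z) z[max(1, length(z) - lookback + 1):length(z)])
  }
  names(out) <- paste("Fold", gsub(" ", "0", format(seq(along = out))), sep = "")
  return(out)
}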

createTSfolds2(x, Min = 3, lookback = NA, k = 3)

$Fold1
[1] 1 2 3

$Fold2
[1] 1 2 3 4 5 6

$Fold3
[1] 1 2 3 4 5 6 7 8 9

createTSfolds2(x, Min = 3, lookback = 3, k = 3)

$Fold1
[1] 1 2 3

$Fold2
[1] 4 5 6

$Fold3
[1] 7 8 9
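
If you also need the held-out indexes for each fold, here is a rough base-R sketch. It assumes each training set is evaluated on the observations that immediately follow it. The function name createTSsplits and its horizon argument are hypothetical and not part of the code above.

```r
# Hypothetical helper: pair each training window with the next `horizon` observations.
createTSsplits <- function(n, Min = 3, lookback = NA, k = 1, horizon = 1) {
    # Last training index of each fold, stepping by k and leaving room for the test set.
    # Assumes n - horizon >= Min; otherwise seq() stops with an error.
    stops <- seq(Min, n - horizon, by = k)
    lapply(stops, function(s) {
        # Expanding window by default, rolling window when lookback is given
        start <- if (is.na(lookback)) 1 else max(1, s - lookback + 1)
        list(train = seq(start, s), test = seq(s + 1, s + horizon))
    })
}
splits <- createTSsplits(10, Min = 3, k = 3, horizon = 1)
```

With these arguments the training sets match the k = 3 example above, and the test index for each fold is the next observation after its training set.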