
R: Confusion with apply() vs for loop

I know that I should avoid for-loops, but I'm not exactly sure how to do what I want to do with an apply function.

Here is a slightly simplified model of what I'm trying to do. So, essentially I have a big matrix of predictors and I want to run a regression using a window of 5 predictors on each side of the indexed predictor (i in the case of a for loop). With a for loop, I can just say something like:

results<-NULL
window<-5
for(i in 1:ncol(g))
{
    first<-i-window #Set window boundaries
    if(first<1){
        1->first
    }
    last<-i+window #5 predictors on each side of i, as described above
    if(last>ncol(g)){
        ncol(g)->last
    }
    predictors<-g[,first:last]

    #Do regression stuff and return some result
    results[i]<-NA #placeholder: replace NA with the regression result
}

Is there a good way to do this with an apply function? My problem is that the vector that apply would be shoving into the function really doesn't matter. All that matters is the index.


This question touches several points that are made in 'The R Inferno' http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

There are some loops you should avoid, but not all of them, and using an apply function hides the loop more than it avoids it. This example seems like a good choice to leave in a 'for' loop.

Growing objects is generally bad form -- it can be extremely inefficient in some cases. If you are going to have a blanket rule, then "not growing objects" is a better one than "avoid loops".

You can create a list with the final length by:

result <- vector("list", ncol(g))
for(i in seq_len(ncol(g))) {
    # stuff
    result[[i]] <- NA  # placeholder: assign the actual result for column i here
}

In some circumstances you might think the command:

window<-5

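means "give me a logical vector stating which values of 'window' are less than -5". R reads it as an assignment of 5 to 'window' instead.

Spaces are good to use, mostly so as not to confuse humans, but in the case directly above so as not to confuse R: 'window <- 5' assigns, while 'window < -5' compares.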
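
To make the difference concrete, here is a minimal sketch. The vector `x` and the name `cmp` are made up for illustration and are not from the question.

```r
x <- c(-10, 0, 10)   # hypothetical values, for illustration only

cmp <- x < -5        # with spaces: element-wise comparison against -5
cmp
# [1]  TRUE FALSE FALSE

x<-5                 # without spaces: parsed as assignment, x is overwritten
x
# [1] 5
```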


Using an apply function to do your regression is mostly a matter of preference in this case; it can handle some of the bookkeeping for you (and so possibly prevent errors) but won't speed up the code.

I would suggest using vectorized functions to compute your firsts and lasts, though, perhaps something like:

window <- 5
ng <- 15 #or ncol(g)
xy <- data.frame(first = pmax((1:ng) - window, 1),
                 last  = pmin((1:ng) + window, ng))

Or be even smarter with

xy <- data.frame(first= c(rep(1, window), 1:(ng-window) ), 
                 last = c((window+1):ng, rep(ng, window)) )
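
As a quick sanity check, the two constructions can be compared directly. This is only a sketch under the values used above (`window = 5`, `ng = 15`); the names `xy1` and `xy2` are introduced here for the comparison. Note that the second form assumes `ng > window`, since `1:(ng - window)` would otherwise count downwards, while the `pmax`/`pmin` form has no such restriction.

```r
window <- 5
ng <- 15  # stand-in for ncol(g)

# Version 1: clamp the window boundaries with pmax/pmin
xy1 <- data.frame(first = pmax((1:ng) - window, 1),
                  last  = pmin((1:ng) + window, ng))

# Version 2: build the clamped runs directly (assumes ng > window)
xy2 <- data.frame(first = c(rep(1, window), 1:(ng - window)),
                  last  = c((window + 1):ng, rep(ng, window)))

# Compare the two data frames column by column
same <- isTRUE(all.equal(xy1, xy2))
same
```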

Then you could use this in a for loop like this:

results <- vector("list", nrow(xy))  # preallocate rather than grow the list
for(i in seq_len(nrow(xy))) {
  results[[i]] <- xy$first[i] : xy$last[i]
}
results

or with lapply like this:

results <- lapply(1:nrow(xy), function(i) {
  xy$first[i] : xy$last[i]
})

where in both cases I just return the sequence between first and last; you would substitute your actual regression code.
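
For a fuller picture, here is one way the pieces might fit together, as a sketch only. The data are simulated, and the response `y`, the choice of `lm()`, and returning R-squared as "the result" are all assumptions, since the question does not say what the regression is or what it should return.

```r
set.seed(1)                            # reproducible made-up data
n <- 50                                # hypothetical number of observations
g <- matrix(rnorm(n * 15), nrow = n)   # hypothetical predictor matrix
y <- rnorm(n)                          # hypothetical response

window <- 5
ng <- ncol(g)
first <- pmax(seq_len(ng) - window, 1)   # clamped left boundaries
last  <- pmin(seq_len(ng) + window, ng)  # clamped right boundaries

# One regression per column index; only the index i is passed in
r2 <- vapply(seq_len(ng), function(i) {
  predictors <- g[, first[i]:last[i], drop = FALSE]
  fit <- lm(y ~ predictors)
  summary(fit)$r.squared               # assumed result of interest
}, numeric(1))

length(r2)  # one value per column of g
```

Using `vapply` with `numeric(1)` checks that every iteration returns a single number; `lapply` would work the same way if each regression returned a larger object.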
