开发者

vector binding in R

I would like to implement a simulation program, which requires the following structure:

It has a for loop, the program will generate an vector in each iteration. I need each generated vector is appended 开发者_高级运维to the existing vector.

I do not how how to do this in R. Thanks for the help.


These answers work, but they all require a call to a non-deterministic function like sample() in the loop. This is not loop-invariant code (it is random each time), but it can still be moved out of the for loop. The trick is to use the n argument and generate all the random numbers you need beforehand (if your problem allows this; some may not, but many do). Now you make one call rather than n calls, which matters if your n is large. Here is a quick example random walk (but many problems can be phrased this way). Also, full disclosure: I haven't had any coffee today, so please point out if you see an error :-)

steps <- 30
n <- 100
directions <- c(-1, 1)

results <- vector('list', n)

for (i in seq_len(n)) {
  walk <- numeric(steps)
  for (s in seq_len(steps)) {
    walk[s] <- sample(directions, 1)
  }
  results[[i]] <- sum(walk)
}

We can rewrite this with one call to sample():

all.steps <- sample(directions, n*steps, replace=TRUE)
dim(all.steps) <- c(n, steps)
walks <- apply(all.steps, 1, sum)

Proof of speed increase (n=10000):

> system.time({
+ for (i in seq_len(n)) {
+   walk <- numeric(steps)
+   for (s in seq_len(steps)) {
+     walk[s] <- sample(directions, 1)
+   }
+   results[[i]] <- sum(walk)
+ }})
   user  system elapsed 
  4.231   0.332   4.758 

> system.time({
+ all.steps <- sample(directions, n*steps, replace=TRUE)
+ dim(all.steps) <- c(n, steps)
+ walks <- apply(all.steps, 1, sum)
+ })
   user  system elapsed 
  0.010   0.001   0.012 

If your simulation needs just one random variable per simulation function call, use sapply(), or better yet the multicore package's mclapply(). Revolution Analytics's foreach package may be of use here too. Also, JD Long has a great presentation and post about simulating stuff in R on Hadoop via Amazon's EMR here (I can't find the video, but I'm sure someone will know).

Take home points:

  • Preallocate with numeric(n) or vector('list', n)
  • Push invariant code out of for loops. Cleverly push stochastic functions out of code with their n argument.
  • Try hard for sapply() or lapply(), or better yet mclapply.
  • Don't use x <- c(x, rnorm(100)). Every time you do this, a member of R-core kills a puppy.


Probably the best thing you can do is preallocate a list of length n (n is number of iterations) and flatten out the list after you're done.

n <- 10
start <- vector("list", n)
for (i in 1:n) {
  a[[i]] <- sample(10)
}
start <- unlist(start)

You could do it the old nasty way. This may be slow for larger vectors.

start <- c()
for (i in 1:n) {
  add <- sample(10)
  start <- c(start, add)
}


x <- rnorm(100)
for (i in 100) {
   x <- c(x, rnorm(100))
}

This link should be useful: http://www.milbo.users.sonic.net/ra/


Assuming your simulation function -- call it func -- returns a vector with the same length each time, you can store the results in the columns of a pre-allocated matrix:

sim1 <- function(reps, func) {
  first <- func()
  result <- matrix(first, nrow=length(first), ncol=reps)
  for (i in seq.int(from=2, to=reps - 1)) {
    result[, i] <- func()
  }
  return(as.vector(result))
}

Or you could express it as follows using replicate:

sim2 <- function(reps, func) {
  return(as.vector(replicate(reps, func(), simplify=TRUE)))
}

> sim2(3, function() 1:3)
[1] 1 2 3 1 2 3 1 2 3
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜