Modified Bootstrapping

2023-02-21 02:56 Q&A author:

I'm interested in developing a modified bootstrap that samples some vector of length x, with replacement, but must meet a number of criteria before it stops sampling. I'm attempting to calculate confidence intervals for lambda, the population growth rate, using 10000 iterations, but in some groupings of individuals, say vector 13, there are very few individuals growing out of the group. Typical bootstrapping would lead to a fair number of instances where growth in this vector does not occur, and hence the model falls apart. Each vector consists of a certain number of 1's, 2's, and 3's, where 1 represents staying within a group, 2 growing out of a group, and 3 death. Here is what I have so far without the modification; it is likely not the best approach time-wise, but I am new to R.

st13 <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,  
          1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,3,3)
#runs
n <- 10000
stage <- st13
stagestay <- vector()
stagemoved <- vector()
stagedead <- vector()
for(i in 1:n){
      index <- sample(stage, replace=T)
      stay <- ((length(index[index==1]))/(length(index)))
      moved <- ((length(index[index==2]))/(length(index)))
      stagestay <- rbind(stagestay,stay)
      stagemoved <- rbind(stagemoved,moved)
}

Currently, this samples length(stage) values with replacement on each run. My question is then: in what way can I modify the sample function to continue sampling these numbers until the length of "index" is at least the same as st13 AND until at least 1 instance of a 2 is present in "index"?

Thanks very much, Kristopher Hennig, Masters Student, University of Mississippi, Oxford, MS, 38677

Update: The answer from @lselzer reminded me that the requirement was for the sample to be at least as long as st13. My original code (below) just keeps sampling until it finds a bootstrap sample that contains a 2. The code of @lselzer grows the sample, one new element at a time, until the sample contains a 2. This is quite inefficient, as you might have to call sample() many times until you get a 2. My code might also repeat for a long time before a 2 is returned in the sample. So can we do any better?

One way would be to draw a large sample with replacement using a single call to sample(). Check which entries are 2s and see if there is a 2 within the first length(st13) entries. If there is, return those entries; if not, find the first 2 in the large sample and return all entries up to and including that one. If there are no 2s, add on another large sample and repeat. Here is some code:

#runs
n <- 100 #00
stage <- st13
stagedead <- stagemoved <- stagestay <- Size <- vector()
sampSize <- 100 * (len <- length(stage)) ## sample size to try
for(i in seq_len(n)){
    ## take a large sample
    samp <- sample(stage, size = sampSize, replace = TRUE)
    ## check if there are any `2`s and which they are
    ## and if no 2s expand the sample
    while(length((twos <- which(samp == 2))) < 1) {
        samp <- c(samp, sample(stage, size = sampSize, replace = TRUE))
    }
    ## now we have a sample containing at least one 2
    ## so set index to the required set of elements
    if((min.two <- min(twos)) <= len) {
        index <- samp[seq_len(len)]
    } else {
        index <- samp[seq_len(min.two)]
    }
    stay <- length(index[index==1]) / length(index)
    moved <- length(index[index==2]) / length(index)
    stagestay[i] <- stay
    stagemoved[i] <- moved
    Size[i] <- length(index)
}

Here is a really degenerate vector with only a single 2 in 46 entries:

R> st14 <- sample(c(rep(1, 45), 2))
R> st14
 [1] 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[39] 1 1 1 1 1 1 1 1

If I use the above loop on it rather than st13, I get the following for the minimum sample size required to get a 2 on each of the 100 runs:

R> Size
  [1]  65  46  46  46  75  46  46  57  46 106  46  46  46  66  46  46  46  46
 [19]  46  46  46  46  46 279  52  46  63  70  46  46  90 107  46  46  46  87
 [37] 130  46  46  46  46  46  46  60  46 167  46  46  46  71  77  46  46  84
 [55]  58  90 112  52  46  53  85  46  59 302 108  46  46  46  46  46 174  46
 [73] 165 103  46 110  46  80  46 166  46  46  46  65  46  46  46 286  71  46
 [91] 131  61  46  46 141  46  46  53  47  83

So it would suggest that the sampSize I chose (100 * length(stage)) is a bit of overkill here, but as all the operators we are using are vectorised, we probably don't incur too much of a penalty for the overly long initial sample, and we certainly don't incur any extra sample() calls.
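The same idea can be wrapped in a small function so it is easy to reuse on other stage vectors. This is only a sketch: the name draw_with_target and its arguments target and batch are made up for illustration and are not part of the code above, and the seed is arbitrary.

```r
## Hypothetical helper (not from the original answer): draw one modified
## bootstrap sample with at least length(x) elements and at least one
## occurrence of `target`.
draw_with_target <- function(x, target = 2, batch = 100 * length(x)) {
    len <- length(x)
    ## guard against an endless loop and against too small a batch
    stopifnot(any(x == target), batch >= len)
    samp <- sample(x, size = batch, replace = TRUE)
    ## keep extending the sample until the target value shows up
    while (!any(samp == target)) {
        samp <- c(samp, sample(x, size = batch, replace = TRUE))
    }
    first <- which(samp == target)[1]
    ## keep len elements, or everything up to the first target if it comes later
    samp[seq_len(max(len, first))]
}

set.seed(1)  # arbitrary seed, only so the run is repeatable
st14 <- c(rep(1, 45), 2)  # one 2 in 46 entries, as in the degenerate example
sizes <- vapply(seq_len(100),
                function(i) length(draw_with_target(st14)),
                integer(1))
summary(sizes)
```

Every returned sample has at least length(x) elements; when it is longer than that, its last element is the first target value that was drawn.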

Original: If I understand you correctly, the problem is that sample() might not return any 2s at all. If so, we can continue sampling until it does, using the repeat control-flow construct.

I've altered your code accordingly, and optimised it a bit, because growing objects with rbind() inside a loop, as you were doing, is slow. There are other ways this could be improved, but I'll stick with the loop for now. Explanation comes below.

st13 <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,  
          1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,3,3)
#runs
n <- 10000
stage <- st13
stagedead <- stagemoved <- stagestay <- vector()
for(i in seq_len(n)){
    repeat {
        index <- sample(stage, replace = TRUE)
        if(any(index == 2)) {
            break
        }
    }
    stay <- length(index[index==1]) / length(index)
    moved <- length(index[index==2]) / length(index)
    stagestay[i] <- stay
    stagemoved[i] <- moved
}
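The loop above still extends the result vectors one element at a time. A minimal sketch of one possible further tidy-up is to preallocate them with numeric(n) and to use mean() on a logical vector to get each proportion. The smaller n and the filled-in stagedead are my own additions for illustration, not part of the code above.

```r
st13 <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
          1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,3,3)
n <- 1000  # assumption: fewer runs than the 10000 above, to keep this quick
stage <- st13
## preallocate the result vectors instead of growing them
stagestay <- stagemoved <- stagedead <- numeric(n)
for (i in seq_len(n)) {
    ## same repeat loop as above: resample until a 2 is present
    repeat {
        index <- sample(stage, replace = TRUE)
        if (any(index == 2)) break
    }
    ## mean() of a logical vector is the proportion of TRUE values
    stagestay[i]  <- mean(index == 1)
    stagemoved[i] <- mean(index == 2)
    stagedead[i]  <- mean(index == 3)
}
```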

This is the main change related to your Q:

    repeat {
        index <- sample(stage, replace = TRUE)
        if(any(index == 2)) {
            break
        }
    }

What this does is repeat the code contained in the braces until a break is triggered to jump us out of the repeat loop. So what happens is we take a bootstrap sample, then check if any of the sample contains the value 2. If there are any 2s, then we break out and carry on with the rest of the current for loop iteration. If the sample doesn't contain any 2s, the break is not triggered and we go round again, taking another sample. This will happen until we do get a sample with a 2 in it.
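As a rough check on how often the repeat loop has to go round again, the chance that a bootstrap sample of length(stage) draws contains no 2 at all is (1 - proportion of 2s)^length(stage), because the draws are independent. The helper name p_no_target below is hypothetical and used only for this sketch.

```r
## st13 copied from the question
st13 <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
          1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,3,3)

## Hypothetical helper: probability that a bootstrap sample of
## length(x) draws contains no `target` value at all
p_no_target <- function(x, target = 2) {
    (1 - mean(x == target))^length(x)
}

p13 <- p_no_target(st13)
p14 <- p_no_target(c(rep(1, 45), 2))  # a vector with a single 2 in 46 entries
c(st13 = p13, st14 = p14)

## expected number of sample() calls per for-loop iteration
## (geometric distribution with success probability 1 - p)
1 / (1 - c(st13 = p13, st14 = p14))
```

The fewer 2s a vector holds, the closer this probability gets to 1 and the more resampling the repeat loop does.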

For starters, sample() has a size argument, which you could use to match the length of st13. The second part of your question could be solved using a while loop.

st13 <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
          1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,3,3)
#runs
n <- 10000
stage <- st13
stagestay <- vector()
stagemoved <- vector()
stagedead <- vector()
for(i in 1:n){
    index <- sample(stage, length(stage), replace = TRUE)
    while(!any(index == 2)) {
        index <- c(index, sample(stage, 1, replace = TRUE))
    }
    stay <- length(index[index==1]) / length(index)
    moved <- length(index[index==2]) / length(index)
    stagestay[i] <- stay
    stagemoved[i] <- moved
}

While I was writing this, Gavin posted his answer, which is similar to mine, but I added the size argument to be sure index has at least the length of st13.
