Recursive sampling in R
I'm trying to simulate deaths over 7 years with the following cumulative probabilities:
tab <- data.frame(id=1:1000,char=rnorm(1000,7,4))
cum.prob <- c(0.05,0.07,0.08,0.09,0.1,0.11,0.12)
How can I sample from tab$id without replacement, in a vectorized fashion, according to the cumulative probabilities in cum.prob? The ids sampled in year 1 must not be sampled again in year 2. Hence lapply(cum.prob, function(x) sample(tab$id, x*1000)) will not work. Is it possible to vectorize this?
Here's one way. First get the probability that a given individual dies in a given year as probYrDeath, i.e. probYrDeath[i] = Prob(individual dies in year i), where i = 1, 2, ..., 7:
probYrDeath <- diff(c(0, cum.prob))
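As a quick sanity check on this step, the sketch below recomputes the per-year probabilities from the cum.prob vector in the question and confirms that their running sum recovers cum.prob. The name probSurvive is introduced here only for illustration; it is not part of the original answer.

```r
cum.prob <- c(0.05, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12)

# Per-year death probabilities are the successive differences of cum.prob
probYrDeath <- diff(c(0, cum.prob))
print(probYrDeath)  # approximately 0.05 0.02 0.01 0.01 0.01 0.01 0.01

# The running sum should give back the cumulative probabilities
stopifnot(isTRUE(all.equal(cumsum(probYrDeath), cum.prob)))

# Probability of surviving all 7 years (hypothetical helper name)
probSurvive <- 1 - sum(probYrDeath)
print(probSurvive)  # approximately 0.88
```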
Now generate a random sample of 1000 "death years", with replacement, from the sequence 1:8, according to the probabilities in probYrDeath, augmented by the probability of not dying by year 7:
set.seed(1) ## for reproducibility
tab$DeathYr <- sample(8, 1000, replace = TRUE,
                      prob = c(probYrDeath, 1 - sum(probYrDeath)))
We interpret DeathYr = 8 as "not dying within 7 years", and extract the subset of tab where DeathYr != 8:
tab_sample <- subset(tab, DeathYr != 8 )
You can verify that the cumulative proportions of deaths in each year approximate the values in cum.prob:
> cumsum(table(tab_sample$DeathYr)/1000)
    1     2     3     4     5     6     7
0.045 0.071 0.080 0.094 0.105 0.115 0.124
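If you need the result in the shape the question asks for (a set of ids per year, with no id appearing in more than one year), one possible follow-up is to split the sampled ids by death year. This is only a sketch: it rebuilds tab and cum.prob from the question so that it runs on its own, and deaths_by_year is a name made up for this example.

```r
set.seed(1)  # arbitrary seed, for reproducibility only
tab <- data.frame(id = 1:1000, char = rnorm(1000, 7, 4))
cum.prob <- c(0.05, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12)

probYrDeath <- diff(c(0, cum.prob))
tab$DeathYr <- sample(8, nrow(tab), replace = TRUE,
                      prob = c(probYrDeath, 1 - sum(probYrDeath)))
tab_sample <- subset(tab, DeathYr != 8)

# One vector of ids per year; the factor levels keep years with no deaths
deaths_by_year <- split(tab_sample$id, factor(tab_sample$DeathYr, levels = 1:7))

# Each row has exactly one DeathYr, so no id can show up in two years
stopifnot(anyDuplicated(unlist(deaths_by_year)) == 0)
```

Because every individual is assigned a single death year, the "cannot be sampled again" constraint holds by construction.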
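Does this work for you? It draws the number of deaths in each year from a multinomial distribution, then samples that many ids without replacement:
# first element: probability of surviving all 7 years; the rest: per-year death probabilities
prob.death.per.year <- c(1 - cum.prob[length(cum.prob)], cum.prob - c(0, cum.prob[-length(cum.prob)]))
# number of deaths in each year (the survivor count is dropped)
dead.in.years <- as.vector(rmultinom(1, length(tab$id), prob.death.per.year))[-1]
totsamp <- sum(dead.in.years)
# draw the dying ids without replacement and label each with its year of death
data.frame(id = sample(tab$id, totsamp), dead.after = rep(seq_along(dead.in.years), dead.in.years))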
Depending upon which form you want the result in, you can change the last step.
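For instance, here is one possible variant of the last step that attaches the year of death to the full table instead of returning only the dead ids. It is a sketch under the same setup as the question; the intermediate name dead and the seed are assumptions made for this example.

```r
set.seed(42)  # arbitrary seed, for reproducibility only
tab <- data.frame(id = 1:1000, char = rnorm(1000, 7, 4))
cum.prob <- c(0.05, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12)

prob.death.per.year <- c(1 - cum.prob[length(cum.prob)], diff(c(0, cum.prob)))
dead.in.years <- as.vector(rmultinom(1, nrow(tab), prob.death.per.year))[-1]

dead <- data.frame(id = sample(tab$id, sum(dead.in.years)),
                   dead.after = rep(seq_along(dead.in.years), dead.in.years))

# Variant of the last step: add the year of death as a column of tab,
# leaving NA for individuals who survive all 7 years
tab$dead.after <- dead$dead.after[match(tab$id, dead$id)]

stopifnot(sum(!is.na(tab$dead.after)) == sum(dead.in.years))
```

Keeping survivors as NA in the same data frame makes it easy to relate the year of death to other columns such as char.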