开发者

Sampling from a contingency table

I've managed as far as the code below in writing a function to sample from a contingency table - proportional to the frequencies in the cells.

It uses expand.grid and then table to get back to the original size table. Which works fine as long as the sample size is large enough that some categories are not completely missing. Otherwise the table command returns a开发者_StackOverflow中文版 table that is of smaller dimensions than the original one.

FunSample<- function(Full, n) {
  Frame <- expand.grid(lapply(dim(Full), seq))
  table(Frame[sample(1:nrow(Frame), n, prob = Full, replace = TRUE), ])
}
Full<-array(c(1,2,3,4), dim=c(2,2,2))
FunSample(Full, 100) # OK
FunSample(Full, 1) # not OK, I want it to still have dim=c(2,2,2)!

My brain has stopped working, I know it has to be a small tweak to get it back on track!?


A crosstab is also a multinomial distribution, so you can use rmultinom and reset the dimension on the output. This should give a substantial performance boost and cut down on the code you need to maintain.

> X <- rmultinom(1, 500, Full)
> dim(X) <- dim(Full)
> X
, , 1

     [,1] [,2]
[1,]   18   92
[2,]   45   92

, , 2

     [,1] [,2]
[1,]   28   72
[2,]   49  104

> X2 <-rmultinom(1, 4, Full)
> dim(X2) <- dim(Full)
> X2
, , 1

     [,1] [,2]
[1,]    0    1
[2,]    0    0

, , 2

     [,1] [,2]
[1,]    0    1
[2,]    1    1


If you don't want table() to "drop" missing combinations, you need to force the columns of Frame to be factors:

FunSample <- function(Full, n) {
  Frame <- as.data.frame( lapply( expand.grid(lapply(dim(Full), seq)), factor) )  
  table( Frame[sample(1:nrow(Frame), n, prob = Full, replace = TRUE), ])
}   

> dim( FunSample(Full, 1))
[1] 2 2 2
> dim( FunSample(Full, 100))
[1] 2 2 2


You could use tabulate instead of table; it works on integer-valued vectors, as you have here. You could also get the output into an array by using array directly, just like when you created the original data.

FunSample<- function(Full, n) {
  samp <- sample(1:length(Full), n, prob = Full, replace = TRUE)
  array(tabulate(samp), dim=dim(Full))
}
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜