Sampling from a contingency table
I've managed as far as the code below in writing a function to sample from a contingency table - proportional to the frequencies in the cells.
It uses expand.grid
and then table
to get back to the original size table. Which works fine as long as the sample size is large enough that some categories are not completely missing. Otherwise the table
command returns a开发者_StackOverflow中文版 table that is of smaller dimensions than the original one.
FunSample<- function(Full, n) {
Frame <- expand.grid(lapply(dim(Full), seq))
table(Frame[sample(1:nrow(Frame), n, prob = Full, replace = TRUE), ])
}
Full<-array(c(1,2,3,4), dim=c(2,2,2))
FunSample(Full, 100) # OK
FunSample(Full, 1) # not OK, I want it to still have dim=c(2,2,2)!
My brain has stopped working, I know it has to be a small tweak to get it back on track!?
A crosstab is also a multinomial distribution, so you can use rmultinom
and reset the dimension on the output. This should give a substantial performance boost and cut down on the code you need to maintain.
> X <- rmultinom(1, 500, Full)
> dim(X) <- dim(Full)
> X
, , 1
[,1] [,2]
[1,] 18 92
[2,] 45 92
, , 2
[,1] [,2]
[1,] 28 72
[2,] 49 104
> X2 <-rmultinom(1, 4, Full)
> dim(X2) <- dim(Full)
> X2
, , 1
[,1] [,2]
[1,] 0 1
[2,] 0 0
, , 2
[,1] [,2]
[1,] 0 1
[2,] 1 1
If you don't want table()
to "drop" missing combinations, you need to force the columns of Frame
to be factors:
FunSample <- function(Full, n) {
Frame <- as.data.frame( lapply( expand.grid(lapply(dim(Full), seq)), factor) )
table( Frame[sample(1:nrow(Frame), n, prob = Full, replace = TRUE), ])
}
> dim( FunSample(Full, 1))
[1] 2 2 2
> dim( FunSample(Full, 100))
[1] 2 2 2
You could use tabulate
instead of table
; it works on integer-valued vectors, as you have here. You could also get the output into an array by using array
directly, just like when you created the original data.
FunSample<- function(Full, n) {
samp <- sample(1:length(Full), n, prob = Full, replace = TRUE)
array(tabulate(samp), dim=dim(Full))
}
精彩评论