Margin totals in xtabs
If you have two cross-classifying variables, you can use rowSums and colSums to produce margin totals on an xtabs output. But how can it be done if you have three classifying variables (i.e., margin totals in each sub-table)?
Aniko mentioned this in a comment, but it was never provided as an answer.
I found this independently and then noticed it was here in a comment, so credit to Aniko for getting it first.
addmargins is the answer:
For a given table one can specify which of the classifying factors to expand by one or more levels to hold margins to be calculated. One may for example form sums and means over the first dimension and medians over the second. The resulting table will then have two extra levels for the first dimension and one extra level for the second. The default is to sum over all margins in the table. Other possibilities may give results that depend on the order in which the margins are computed. This is flagged in the printed output from the function.
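As a rough illustration of how this applies to the three-variable case in the question, the sketch below restricts addmargins to the first two dimensions, so that each a-by-b sub-table (one per level of d) gets its own row and column totals. The factors a, b and d are made-up example data, not anything from the original question.

```r
# Example data (assumed for illustration): three factors of length 20
a <- gl(2, 4, length = 20)
b <- gl(3, 2, length = 20)
d <- gl(4, 2, length = 20)
tt <- xtabs(~ a + b + d)

# Add a "Sum" level to dimensions 1 and 2 only; dimension 3 (d) is left alone,
# so totals are computed separately inside each sub-table
tt_m <- addmargins(tt, margin = c(1, 2))

dim(tt_m)     # 3 4 4: one extra level for a, one extra level for b
ftable(tt_m)  # flat printout, easier to read than the default array print
```

Leaving margin at its default would also add a "Sum" level for d, i.e. an extra sub-table holding the totals over all strata.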
The general approach is to use the apply function, but specifically for totals the margin.table function might be more convenient:
#create 3 factors
a <- gl(2,4, length=20)
b <- gl(3,2, length=20)
d <- gl(4,2, length=20)
# table
tt <- xtabs(~a+b+d)
# marginal sums
margin.table(tt, 1)
apply(tt, 1, sum) #same answer
#multi-way margins
margin.table(tt, 1:2)
apply(tt, 1:2, sum) #same answer
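For totals inside each sub-table, the same idea seems to work by keeping the stratifying dimension in the margin. The sketch below reuses the example factors from above; the names col_tot and row_tot are only illustrative.

```r
a <- gl(2, 4, length = 20)
b <- gl(3, 2, length = 20)
d <- gl(4, 2, length = 20)
tt <- xtabs(~ a + b + d)

# Column totals of each a-by-b sub-table: sum over a, keep b and d
col_tot <- margin.table(tt, c(2, 3))

# Row totals of each a-by-b sub-table: sum over b, keep a and d
row_tot <- margin.table(tt, c(1, 3))

col_tot[, 1]  # column totals for the first level of d
```

Each column of col_tot (or row_tot) corresponds to one level of d, so the totals are returned as a separate table rather than appended to the original one.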
If you are not tied to xtabs, the Deducer package has some nice functions for contingency tables:
> a <- gl(2,4, length=20)
> b <- gl(3,2, length=20)
> d <- rnorm(20)>0
> dat <- data.frame(a,b,d)
> tables<-contingency.tables(
+ row.vars=a,
+ col.vars=b,
+ stratum.var=d,data=dat)
> tables
================================================================================
==================================================
========== Table: a by b ==========
| -- Stratum = FALSE --
| b
a | 1 | 2 | 3 | Row Total |
-----------------------|-----------|-----------|-----------|-----------|
1 Count | 2 | 2 | 1 | 5 |
Row % | 40.000% | 40.000% | 20.000% | 55.556% |
Column % | 40.000% | 100.000% | 50.000% | |
Total % | 22.222% | 22.222% | 11.111% | |
-----------------------|-----------|-----------|-----------|-----------|
2 Count | 3 | 0 | 1 | 4 |
Row % | 75.000% | 0.000% | 25.000% | 44.444% |
Column % | 60.000% | 0.000% | 50.000% | |
Total % | 33.333% | 0.000% | 11.111% | |
-----------------------|-----------|-----------|-----------|-----------|
Column Total | 5 | 2 | 2 | 9 |
Column % | 55.556% | 22.222% | 22.222% | |
| -- Stratum = TRUE --
| b
a | 1 | 2 | 3 | Row Total |
-----------------------|-----------|-----------|-----------|-----------|
1 Count | 2 | 2 | 3 | 7 |
Row % | 28.571% | 28.571% | 42.857% | 63.636% |
Column % | 66.667% | 50.000% | 75.000% | |
Total % | 18.182% | 18.182% | 27.273% | |
-----------------------|-----------|-----------|-----------|-----------|
2 Count | 1 | 2 | 1 | 4 |
Row % | 25.000% | 50.000% | 25.000% | 36.364% |
Column % | 33.333% | 50.000% | 25.000% | |
Total % | 9.091% | 18.182% | 9.091% | |
-----------------------|-----------|-----------|-----------|-----------|
Column Total | 3 | 4 | 4 | 11 |
Column % | 27.273% | 36.364% | 36.364% | |
================================================================================
If I understand correctly, you could use ddply from the plyr package:
library(plyr)
ff <- data.frame(f1=c("a", "b", "b", "b", "b", "b", "b"),
                 f2=c("p", "p", "p", "q", "q", "q", "q"),
                 f3=c("x", "x", "x", "x", "y", "y", "y"),
                 val=1:7)
# sum the numeric column(s) within each level of one factor at a time
ddply(ff, .(f1), numcolwise(sum))
ddply(ff, .(f2), numcolwise(sum))
ddply(ff, .(f3), numcolwise(sum))
Comments aren't working above. Thanks for the answers, but they didn't do what I was expecting: individual totals in each subgrouping.
After a little digging around, I found that the xtabs output in this case is a three-dimensional array, and wrote the following function to achieve my desired result (note it's incomplete, but it works for column totals so far):
xtabTotals <- function(tabs, margin = 1) {
  # Takes a 3-dimensional xtabs array and adds margin totals to each sub-table.
  # Only column totals (an extra "Total" row) are implemented so far;
  # the margin argument is not used yet.
  out <- array(0, dim(tabs) + c(1, 0, 0))
  dnout <- dimnames(tabs)
  dnout[[1]] <- c(dnout[[1]], "Total")
  dimnames(out) <- dnout
  for (i in seq_len(dim(tabs)[3])) {
    # append the column sums of sub-table i as its last row
    out[, , i] <- rbind(tabs[, , i], colSums(tabs[, , i]))
  }
  out
}
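One way the missing row-total case might be filled in is sketched below. The name xtabTotals2 is hypothetical, and the meaning given to margin (1 appends a "Total" row of column sums, 2 appends a "Total" column of row sums) is an assumption, since the function above leaves that argument unused. The example factors are made up.

```r
# Hypothetical extension; the interpretation of `margin` is assumed:
#   margin = 1 -> append a "Total" row (column sums of each sub-table)
#   margin = 2 -> append a "Total" column (row sums of each sub-table)
xtabTotals2 <- function(tabs, margin = 1) {
  stopifnot(length(dim(tabs)) == 3, margin %in% c(1, 2))
  extra <- c(0, 0, 0)
  extra[margin] <- 1
  out <- array(0, dim(tabs) + extra)
  dnout <- dimnames(tabs)
  dnout[[margin]] <- c(dnout[[margin]], "Total")
  dimnames(out) <- dnout
  for (i in seq_len(dim(tabs)[3])) {
    sub <- tabs[, , i]
    out[, , i] <- if (margin == 1) {
      rbind(sub, colSums(sub))
    } else {
      cbind(sub, rowSums(sub))
    }
  }
  out
}

# Made-up example data
a <- gl(2, 4, length = 20)
b <- gl(3, 2, length = 20)
d <- gl(4, 2, length = 20)
tt <- xtabs(~ a + b + d)

xtabTotals2(tt, margin = 2)  # row totals within each level of d
```

Like the original, this sketch assumes the first two dimensions each have more than one level; otherwise tabs[, , i] drops to a vector and colSums or rowSums will fail. For margin = 1 the counts should agree with addmargins(tt, 1), apart from the label "Total" instead of "Sum".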