Calculating percent of row total with plyr
I am currently using cast on a melted table to calculate the sum of value for each combination of the ID variables ID1 (row names) and ID2 (column headers), along with a grand total for each row via margins="grand_col":
c <- cast(m, ID1 ~ ID2, sum, margins="grand_col")
ID1 ID2a ID2b ID2c ID2d ID2e (all)
1 ID1a 6459695 885473 648019 453613 1777308 10224108
2 ID1b 7263529 1411355 587785 612730 2458672 12334071
3 ID1c 7740364 1253524 682977 886897 3559283 14123045
So far, so R-like.
Then I divide each cell by its row total to get a percentage of the total.
c[,2:6]<-c[,2:6] / c[,7]
This looks kludgy. Is there something I should be doing in cast or maybe in plyr to handle the percent-of-margin calculation in the first command?
Thanks, Matt
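For comparison, the division step above can be sketched in base R with sweep(), assuming the counts sit in a plain numeric matrix rather than in the data frame returned by cast. The matrix counts below is hypothetical, filled in by hand with the numbers from the cast output; rowSums() plays the role of the "(all)" column.

```r
# Hypothetical matrix holding the counts from the cast output above
counts <- matrix(
  c(6459695,  885473, 648019, 453613, 1777308,
    7263529, 1411355, 587785, 612730, 2458672,
    7740364, 1253524, 682977, 886897, 3559283),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ID1a", "ID1b", "ID1c"),
                  c("ID2a", "ID2b", "ID2c", "ID2d", "ID2e"))
)

# Row totals: the same numbers cast puts in the "(all)" column
row_totals <- rowSums(counts)

# Divide every cell by its row total (MARGIN = 1 sweeps the totals across rows)
pct <- sweep(counts, 1, row_totals, "/")
print(round(pct, 4))
```

Working on a matrix avoids the positional column indexing (2:6 and 7) that makes the data frame version fragile.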
Assuming your source table looks something like this:
dfm <- structure(list(ID1 = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("ID1a", "ID1b", "ID1c"
), class = "factor"), ID2 = structure(c(1L, 1L, 1L, 2L,
2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L), .Label = c("ID2a",
"ID2b", "ID2c", "ID2d", "ID2e"), class = "factor"), value = c(6459695L,
7263529L, 7740364L, 885473L, 1411355L, 1253524L, 648019L, 587785L,
682977L, 453613L, 612730L, 886897L, 1777308L, 2458672L, 3559283L
)), .Names = c("ID1", "ID2", "value"), row.names = c(NA,
-15L), class = "data.frame")
> head(dfm)
ID1 ID2 value
1 ID1a ID2a 6459695
2 ID1b ID2a 7263529
3 ID1c ID2a 7740364
4 ID1a ID2b 885473
5 ID1b ID2b 1411355
6 ID1c ID2b 1253524
Use ddply first to calculate the percentages, and cast to present the data in the required format:
library(reshape)
library(plyr)
df1 <- ddply(dfm, .(ID1), summarise, ID2 = ID2, pct = value / sum(value))
dfc <- cast(df1, ID1 ~ ID2)
dfc
ID1 ID2a ID2b ID2c ID2d ID2e
1 ID1a 0.6318101 0.08660638 0.06338147 0.04436700 0.1738350
2 ID1b 0.5888996 0.11442735 0.04765539 0.04967784 0.1993399
3 ID1c 0.5480662 0.08875735 0.04835905 0.06279786 0.2520195
Unlike your example, this omits the row totals; these need to be added separately.
I'm not sure, though, whether this solution is more elegant than the one you currently have.
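One way to add the row totals back is sketched below using base R only, so it swaps in ave() for the ddply step rather than reproducing the plyr/reshape code. It assumes the same long-format data as dfm (recreated here so the snippet runs on its own); ave() divides each value by its ID1 group sum, xtabs() spreads the result into a wide table, and the totals are appended as an "(all)" column to mirror the layout in the question.

```r
# Long-format data from the question, recreated so this snippet runs standalone
dfm <- data.frame(
  ID1 = factor(rep(c("ID1a", "ID1b", "ID1c"), times = 5)),
  ID2 = factor(rep(c("ID2a", "ID2b", "ID2c", "ID2d", "ID2e"), each = 3)),
  value = c(6459695, 7263529, 7740364, 885473, 1411355, 1253524,
            648019, 587785, 682977, 453613, 612730, 886897,
            1777308, 2458672, 3559283)
)

# Within each ID1 group, divide value by the group sum (base R analogue of the ddply step)
dfm$pct <- ave(dfm$value, dfm$ID1, FUN = function(v) v / sum(v))

# Spread to one row per ID1 level and one column per ID2 level
wide <- xtabs(pct ~ ID1 + ID2, data = dfm)

# Row totals computed separately, then appended as an "(all)" column
totals <- tapply(dfm$value, dfm$ID1, sum)
result <- cbind(unclass(wide), "(all)" = totals)
print(result)
```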
Here is a one-liner using tapply and prop.table. It does not rely on any auxiliary packages:
prop.table(tapply(dfm$value, dfm[1:2], sum), 1)
giving:
ID2
ID1 ID2a ID2b ID2c ID2d ID2e
ID1a 0.6318101 0.08660638 0.06338147 0.04436700 0.1738350
ID1b 0.5888996 0.11442735 0.04765539 0.04967784 0.1993399
ID1c 0.5480662 0.08875735 0.04835905 0.06279786 0.2520195
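The second argument to prop.table() selects the margin: 1 divides each cell by its row sum (as in the one-liner above), 2 by its column sum, and leaving it out divides by the grand total. A small sketch on a toy 2 x 2 matrix, with values chosen only for illustration:

```r
# Toy matrix: rows are (1, 3) and (2, 2)
m <- matrix(c(1, 3,
              2, 2), nrow = 2, byrow = TRUE)

row_pct <- prop.table(m, 1)  # each row sums to 1: (0.25, 0.75), (0.5, 0.5)
col_pct <- prop.table(m, 2)  # each column sums to 1: column sums are 3 and 5
all_pct <- prop.table(m)     # whole matrix sums to 1: grand total is 8

print(row_pct)
print(col_pct)
print(all_pct)
```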
Or this, which is even shorter:
prop.table( xtabs(value ~., dfm), 1 )
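In the formula value ~ ., the dot stands for all remaining columns of the data frame (here ID1 and ID2), so xtabs() sums value over every ID1/ID2 combination, the same cross-tabulation tapply() builds. The sketch below checks that both routes give the same numbers, assuming the same dfm layout (recreated so the snippet runs standalone). The two results differ in class, so the comparison is on the underlying values and dimnames.

```r
# Long-format data from the question, recreated so this snippet runs standalone
dfm <- data.frame(
  ID1 = factor(rep(c("ID1a", "ID1b", "ID1c"), times = 5)),
  ID2 = factor(rep(c("ID2a", "ID2b", "ID2c", "ID2d", "ID2e"), each = 3)),
  value = c(6459695, 7263529, 7740364, 885473, 1411355, 1253524,
            648019, 587785, 682977, 453613, 612730, 886897,
            1777308, 2458672, 3559283)
)

# tapply() builds an ID1 x ID2 array of sums; prop.table(, 1) turns each row into shares
via_tapply <- prop.table(tapply(dfm$value, dfm[1:2], sum), 1)

# xtabs() with value ~ . cross-tabulates value by all other columns (ID1, ID2)
via_xtabs <- prop.table(xtabs(value ~ ., dfm), 1)

# Compare values only, since the two objects carry different classes
print(all.equal(as.vector(via_tapply), as.vector(via_xtabs)))
```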