Calculating percent of row total with plyr

I am currently using cast on a melted table to calculate the total of each value at the combination of ID variables ID1 (row names) and ID2 (column headers), along with grand totals for each row using margins="grand_col".

c <- cast(m, ID1 ~ ID2, sum, margins="grand_col")

  ID1      ID2a  ID2b     ID2c     ID2d   ID2e    (all)
1  ID1a  6459695  885473  648019  453613 1777308 10224108
2  ID1b  7263529 1411355  587785  612730 2458672 12334071
3  ID1c  7740364 1253524  682977  886897 3559283 14123045

So far, so R-like.

Then I divide each cell by its row total to get a percentage of the total.

c[,2:6]<-c[,2:6] / c[,7]

This looks kludgy. Is there something I should be doing in cast or maybe in plyr to handle the percent of margin calculation in the first command?

Thanks, Matt


Assuming your source table looks something like this:

dfm <- structure(list(ID1 = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("ID1a", "ID1b", "ID1c"
), class = "factor"), ID2 = structure(c(1L, 1L, 1L, 2L, 
2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L), .Label = c("ID2a", 
"ID2b", "ID2c", "ID2d", "ID2e"), class = "factor"), value = c(6459695L, 
7263529L, 7740364L, 885473L, 1411355L, 1253524L, 648019L, 587785L, 
682977L, 453613L, 612730L, 886897L, 1777308L, 2458672L, 3559283L
)), .Names = c("ID1", "ID2", "value"), row.names = c(NA, 
-15L), class = "data.frame")

> head(dfm)
   ID1  ID2   value
1 ID1a ID2a 6459695
2 ID1b ID2a 7263529
3 ID1c ID2a 7740364
4 ID1a ID2b  885473
5 ID1b ID2b 1411355
6 ID1c ID2b 1253524

You can use ddply first to calculate the percentages, and then cast to present the data in the required format:

library(reshape)
library(plyr)

df1 <- ddply(dfm, .(ID1), summarise, ID2 = ID2, pct = value / sum(value))
# Name the value column explicitly so cast does not have to guess it
dfc <- cast(df1, ID1 ~ ID2, value = "pct")

dfc
   ID1      ID2a       ID2b       ID2c       ID2d      ID2e
1 ID1a 0.6318101 0.08660638 0.06338147 0.04436700 0.1738350
2 ID1b 0.5888996 0.11442735 0.04765539 0.04967784 0.1993399
3 ID1c 0.5480662 0.08875735 0.04835905 0.06279786 0.2520195

Compared to your example, this is missing the row totals; these need to be added separately.

I am not sure, though, whether this solution is more elegant than the one you currently have.
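If the row totals are still wanted next to the percentages, they could be re-attached along the following lines. This is only a sketch in base R, not the ddply/cast code above: it uses ave for the per-group totals and xtabs for the reshaping, and the column name total and the object name wide are arbitrary choices made for illustration. The data frame is the same dfm as above, rebuilt compactly.

```r
# Same data as dfm above, rebuilt in a compact form.
dfm <- data.frame(
  ID1 = rep(c("ID1a", "ID1b", "ID1c"), times = 5),
  ID2 = rep(c("ID2a", "ID2b", "ID2c", "ID2d", "ID2e"), each = 3),
  value = c(6459695, 7263529, 7740364, 885473, 1411355, 1253524,
            648019, 587785, 682977, 453613, 612730, 886897,
            1777308, 2458672, 3559283)
)

# Total of 'value' within each ID1, repeated on every row of that group.
dfm$total <- ave(dfm$value, dfm$ID1, FUN = sum)
dfm$pct <- dfm$value / dfm$total

# Widen to one row per ID1 and one column per ID2.
wide <- as.data.frame.matrix(xtabs(pct ~ ID1 + ID2, data = dfm))

# Append the original row totals as a final column.
wide$total <- as.numeric(tapply(dfm$value, dfm$ID1, sum)[rownames(wide)])

wide
```

The percentage columns of each row sum to 1, while the last column keeps the absolute total that the grand_col margin provided in the question.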


Here is a one-liner using tapply and prop.table. It does not rely on any auxiliary packages:

prop.table(tapply(dfm$value, dfm[1:2], sum), 1)

giving:

      ID2
ID1         ID2a       ID2b       ID2c       ID2d      ID2e
  ID1a 0.6318101 0.08660638 0.06338147 0.04436700 0.1738350
  ID1b 0.5888996 0.11442735 0.04765539 0.04967784 0.1993399
  ID1c 0.5480662 0.08875735 0.04835905 0.06279786 0.2520195

Or this, which is even shorter:

prop.table( xtabs(value ~., dfm), 1 )
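The second argument to prop.table selects the margin to normalise over: 1 divides each cell by its row total, 2 by its column total. The sketch below, again in base R with the same data rebuilt compactly, compares the margin = 1 result with the manual division from the question. The object names counts, by_row, by_col and manual are made up for illustration.

```r
# Same data as dfm above, rebuilt in a compact form.
dfm <- data.frame(
  ID1 = rep(c("ID1a", "ID1b", "ID1c"), times = 5),
  ID2 = rep(c("ID2a", "ID2b", "ID2c", "ID2d", "ID2e"), each = 3),
  value = c(6459695, 7263529, 7740364, 885473, 1411355, 1253524,
            648019, 587785, 682977, 453613, 612730, 886897,
            1777308, 2458672, 3559283)
)

# Cross-tabulate the summed values: ID1 in rows, ID2 in columns.
counts <- xtabs(value ~ ID1 + ID2, data = dfm)

by_row <- prop.table(counts, margin = 1)  # each row sums to 1
by_col <- prop.table(counts, margin = 2)  # each column sums to 1

# The manual approach from the question: divide every cell by its row total.
# The row-total vector is recycled down the columns, so row i is divided
# by the total of row i.
manual <- counts / rowSums(counts)

# Largest absolute difference between the two approaches.
max(abs(by_row - manual))
```

Because both routes perform the same division, any difference should be at the level of floating-point rounding.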