R equivalent of GROUP BY WITH CUBE

2023-02-28 04:49

Some SQL databases support a WITH CUBE modifier for GROUP BY operations. Mine doesn't have this feature.

Basically if I have a dataset like:

+------+-----------+---------+---------+
| sum  | source_id | type_id | variety |
+------+-----------+---------+---------+
|  491 |         1 |       1 |       1 |
| 2008 |         1 |       2 |       1 |
|   33 |         1 |       3 |       1 |
|  483 |         1 |       4 |       1 |
|  482 |         1 |       5 |       1 |
|  343 |         1 |       6 |       1 |
| 4979 |         4 |       5 |       1 |
|  303 |         5 |       1 |       1 |
|  443 |         5 |       1 |       2 |
| 1295 |         5 |       2 |       1 |
...

I want to import this into a data frame in R and generate the combined sum for every sub-combination of (source_id, type_id, variety): the combined sum where source_id=1, where source_id=1 and type_id=1, where source_id=1 and variety=1, where type_id=1 and variety=1, where type_id=1, where source_id=2, and so on.

How can I best accomplish this?

You can use ddply() from plyr for this: build a list of the possible column combinations and run ddply() over each one, like this:

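library(plyr)

facs <- c("source_id", "type_id", "variety")

# every non-empty subset of the grouping columns (7 in total)
combs <- unlist(
  lapply(seq_along(facs), function(j) combn(facs, j, simplify = FALSE)),
  recursive = FALSE
)

# one summary per subset; lapply() always returns a list of data frames
datlist <- lapply(combs, function(j) ddply(Data, j, summarize, total = sum(Sum)))

# rbind.fill() comes from plyr; it pads the columns a subset lacks with NA
rbind.fill(datlist)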

Tested with:

Data <- data.frame(
  Sum=rpois(10,5),
  source_id=rep(1:2,each=5),
  type_id=rep(1:5,each=2),
  variety=rep(1:2,5)
)
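
For comparison, here is a minimal base R sketch of the same idea that needs no packages. It is only an illustration, not part of the answer above: the helper name cube_one is made up, and the test data uses fixed values instead of rpois() so the result is reproducible. It also appends a grand-total row, which SQL's WITH CUBE returns for the empty grouping set.

```r
# Fixed stand-in for the test data (assumed values, not from the answer)
Data <- data.frame(
  Sum       = c(3, 5, 2, 7, 4, 6, 1, 8, 9, 10),
  source_id = rep(1:2, each = 5),
  type_id   = rep(1:5, each = 2),
  variety   = rep(1:2, 5)
)

facs <- c("source_id", "type_id", "variety")

# every non-empty subset of the grouping columns
combs <- unlist(
  lapply(seq_along(facs), function(j) combn(facs, j, simplify = FALSE)),
  recursive = FALSE
)

# hypothetical helper: sum within one subset of columns,
# then add the unused grouping columns as NA so all results line up
cube_one <- function(vars) {
  out <- aggregate(Data["Sum"], by = Data[vars], FUN = sum)
  for (v in setdiff(facs, vars)) out[[v]] <- NA
  out[c(facs, "Sum")]
}

# grand total: the row for the empty grouping set
grand <- data.frame(source_id = NA, type_id = NA, variety = NA,
                    Sum = sum(Data$Sum))

cube <- do.call(rbind, c(lapply(combs, cube_one), list(grand)))
print(cube)
```

A row with NA in a grouping column is a subtotal over that column, so this assumes the real data has no genuine NA values in source_id, type_id or variety.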

This should do it:

# generate dummy data

df = data.frame(
       Sum = rnorm(10), 
       source_id = sample(10, 5, replace = TRUE), 
       type_id   = sample(10, 5, replace = TRUE), 
       variety   = sample(10, 5, replace = TRUE)
     )

index = names(df)[-1]
temp  = expand.grid(0:1, 0:1, 0:1)[-1,]

require(plyr)
cubedf = adply(temp, 1, function(x) 
   ddply(df, index[x == 1], summarize, SUM = sum(Sum)))
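
If it isn't obvious what the rows of temp do, the sketch below (base R only, with subsets and labels as illustrative names of my own) turns each 0/1 row into the column names it selects. Each row acts as a mask over index, and dropping the first row removes the all-zero mask, i.e. the empty grouping.

```r
index <- c("source_id", "type_id", "variety")

# each row is a 0/1 mask over the three columns;
# row 1 of expand.grid() is (0, 0, 0), so [-1, ] drops the empty selection
temp <- expand.grid(0:1, 0:1, 0:1)[-1, ]

# turn every mask row into the column names it selects
subsets <- lapply(seq_len(nrow(temp)), function(i) index[unlist(temp[i, ]) == 1])

# readable labels, one per grouping
labels <- sapply(subsets, paste, collapse = " + ")
print(labels)
```

This gives seven groupings, which is the same set the combn() approach produces, only in a different order.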

EDIT: ALTERNATE SOLUTION (using code borrowed from Joris)

library(plyr)
# list factor variables
index  = names(df)[-1]

# generate all combinations of factor variables
combs  = unlist(llply(1:3, combn, x = index, simplify = FALSE), recursive = FALSE)

# calculate sum for each combination
cubedf = ldply(combs, function(var) 
            ddply(df, var, summarize, SUM = sum(Sum)))

Joris's answer is right, but I must admit it wasn't intuitive to me at first blush. Before reading it, I would have solved this with multiple ddply() steps, something like this:

Data <- data.frame(
  Sum=rpois(10,5),
  source_id=rep(1:2,each=5),
  type_id=rep(1:5,each=2),
  variety=rep(1:2,5)
)

require(plyr)

myStuff1 <- ddply(Data, c("source_id"                      ), function(df) sum(df$Sum) )
myStuff2 <- ddply(Data, c("source_id", "type_id"           ), function(df) sum(df$Sum) )
myStuff3 <- ddply(Data, c("source_id", "type_id", "variety"), function(df) sum(df$Sum) )
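
Note that the three calls above cover only the nested groupings (source_id, then source_id + type_id, then all three), which is closer to what SQL calls a rollup than to a cube. A quick base R sketch, with names of my own choosing (stepwise, full_cube, key, not_covered), lists the groupings that would still need their own ddply() call:

```r
facs <- c("source_id", "type_id", "variety")

# groupings used by the three ddply() calls: prefixes of facs
stepwise <- lapply(seq_along(facs), function(j) facs[seq_len(j)])

# groupings of a full cube: every non-empty subset of facs
full_cube <- unlist(
  lapply(seq_along(facs), function(j) combn(facs, j, simplify = FALSE)),
  recursive = FALSE
)

# compare the two sets via a string key per grouping
key         <- function(v) paste(v, collapse = "+")
not_covered <- setdiff(sapply(full_cube, key), sapply(stepwise, key))
print(not_covered)
```

So the stepwise version needs four more calls (type_id; variety; source_id + variety; type_id + variety) to match the cube.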
