
Compute means of a group by factor

Is there a way that this can be improved, or done more simply?

means.by<-function(data,INDEX){
  b<-by(data,INDEX,function(d)apply(d,2,mean))
  return(structure(
    t(matrix(unlist(b),nrow=length(b[[1]]))),
      dimnames=list(names(b),col.names=names(b[[1]]))
  ))
}

The idea is the same as a SAS MEANS BY statement. The function 'means.by' takes a data.frame and an indexing variable, computes the mean of each column of the data.frame over the rows corresponding to each unique value of INDEX, and returns a new data frame whose row names are the unique values of INDEX.

I'm sure there must be a better way to do this in R but I couldn't think of anything.


Does the aggregate function do what you want?
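As a rough sketch of what that could look like, assuming every non-grouping column is numeric (the data frame `Data` and its columns `grp`, `x`, `y` are made up for illustration):

```r
# Hypothetical example data: one grouping column and two numeric columns.
set.seed(1)
Data <- data.frame(grp = sample(letters[1:3], 20, TRUE),
                   x = rnorm(20), y = runif(20))

# Mean of every non-grouping column within each level of grp.
# The result is a data frame with one row per group.
res <- aggregate(. ~ grp, data = Data, FUN = mean)
print(res)
```

Unlike the matrix built in the question, aggregate keeps the grouping variable as a regular column rather than as row names.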

If not, look at the plyr package; it gives several options for taking things apart, doing computations on the pieces, and then putting them back together again.

You may also be able to do this using the reshape package.


You want tapply or ave, depending on how you want your output:

> Data <- data.frame(grp=sample(letters[1:3],20,TRUE),x=rnorm(20))
> ave(Data$x, Data$grp)
 [1] -0.3258590 -0.5009832 -0.5009832 -0.2136670 -0.3258590 -0.5009832
 [7] -0.3258590 -0.2136670 -0.3258590 -0.2136670 -0.3258590 -0.3258590
[13] -0.3258590 -0.5009832 -0.2136670 -0.5009832 -0.3258590 -0.2136670
[19] -0.5009832 -0.2136670
> tapply(Data$x, Data$grp, mean)
         a          b          c 
-0.5009832 -0.2136670 -0.3258590 

# Example with more than one column:
> Data <- data.frame(grp=sample(letters[1:3],20,TRUE),x=rnorm(20),y=runif(20))
> do.call(rbind,lapply(split(Data[,-1], Data[,1]), colMeans))
             x         y
a -0.675195494 0.4772696
b  0.270891403 0.5091359
c  0.002756666 0.4053922
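The same split-and-combine idea can be wrapped up as a shorter stand-in for the function in the question. This is only a sketch: the name `means.by.split` is hypothetical, and it assumes all columns of `data` are numeric.

```r
# Hypothetical shorter variant of means.by: split the rows by INDEX,
# take column means of each piece, and stack the results into a matrix
# whose row names are the unique values of INDEX.
means.by.split <- function(data, INDEX) {
  do.call(rbind, lapply(split(data, INDEX), colMeans))
}

# Small made-up example with two groups.
df <- data.frame(x = c(1, 2, 3, 4), y = c(10, 20, 30, 40))
g <- c("a", "a", "b", "b")
m <- means.by.split(df, g)
print(m)
```

Using rbind rather than sapply keeps the result a matrix even when `data` has a single column.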


With plyr

library(plyr)
df <- ddply(x, .(id),function(x) data.frame(
mean=mean(x$var)
))
print(df)

Update:

data<-data.frame(I=as.factor(rep(letters[1:10],each=3)),x=rnorm(30),y=rbinom(30,5,.5))
ddply(data,.(I), function(x) data.frame(x=mean(x$x), y=mean(x$y)))

See, plyr is smart :)

Update 2:

In response to your comment, I believe cast and melt from the reshape package are much simpler for your purpose.

cast(melt(data),I ~ variable, mean)


Using only base R functions: fitting a linear model on the factor without an intercept returns the group means as its coefficients.

> d=data.frame(type=as.factor(rep(c("A","B","C"),each=3)),
+ x=rnorm(9),y=rgamma(9,2,1))
> d
  type           x         y
1    A -1.18077326 3.1428680
2    A -0.91930418 4.4606603
3    A  0.88345422 1.0979301
4    B  0.06964133 1.1429911
5    B -1.15380345 2.7609049
6    B  1.13637202 0.6668986
7    C -1.12052765 1.7352306
8    C -1.34803630 2.3099202
9    C -2.23135374 0.7244689
>
> cbind(lm(x~-1+type,data=d)$coef,lm(y~-1+type,data=d)$coef)
         [,1]     [,2]
typeA -0.4055411 2.900486
typeB  0.0174033 1.523598
typeC -1.5666392 1.589873
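To see why this works, the coefficients can be compared against means computed directly. The check below is a minimal sketch on freshly simulated data (the seed and the names `via_lm` and `direct` are arbitrary choices, not from the answer above):

```r
set.seed(42)
d <- data.frame(type = factor(rep(c("A", "B", "C"), each = 3)),
                x = rnorm(9), y = rgamma(9, 2, 1))

# With the intercept removed, each coefficient of a fit on a single
# factor is the mean of the response within that factor level.
via_lm <- cbind(x = coef(lm(x ~ -1 + type, data = d)),
                y = coef(lm(y ~ -1 + type, data = d)))

# The same means computed directly by splitting the columns on the factor.
direct <- t(sapply(split(d[, c("x", "y")], d$type), colMeans))

# Compare the values only; the row names differ ("typeA" vs "A").
print(all.equal(unname(via_lm), unname(direct)))
```

The lm route fits one model per column, so for many columns the split or aggregate approaches are likely the more direct choice.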