Compute means of a group by factor
Is there a way that this can be improved, or done more simply?
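means.by <- function(data, INDEX) {
  # by() yields one vector of column means per level of INDEX
  b <- by(data, INDEX, function(d) apply(d, 2, mean))
  # bind the per-group vectors into a groups x columns matrix
  return(structure(
    t(matrix(unlist(b), nrow = length(b[[1]]))),
    dimnames = list(names(b), col.names = names(b[[1]]))
  ))
}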
The idea is the same as a SAS MEANS BY statement. The function 'means.by' takes a data.frame and an indexing variable, computes the mean of each column of the data.frame over the set of rows corresponding to each unique value of INDEX, and returns the result (a matrix here) with the unique values of INDEX as row names.
I'm sure there must be a better way to do this in R but I couldn't think of anything.
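For illustration, here is a small usage sketch of the function above on made-up data. The data frame `d` and the grouping factor `g` are hypothetical names chosen for this example; the function body is the one from the question.

```r
means.by <- function(data, INDEX) {
  b <- by(data, INDEX, function(d) apply(d, 2, mean))
  return(structure(
    t(matrix(unlist(b), nrow = length(b[[1]]))),
    dimnames = list(names(b), col.names = names(b[[1]]))
  ))
}

# Hypothetical data: two numeric columns and a separate grouping factor
d <- data.frame(x = c(1, 3, 10, 20), y = c(2, 4, 6, 8))
g <- factor(c("a", "a", "b", "b"))

res <- means.by(d, g)
print(res)  # one row per level of g, one column per column of d
```

Row "a" holds the means of the first two rows of `d`, and row "b" the means of the last two.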
Does the aggregate function do what you want?
If not, look at the plyr package. It gives several options for taking things apart, doing computations on the pieces, and then putting them back together again.
You may also be able to do this using the reshape package.
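As a minimal sketch of the aggregate suggestion, assuming a data frame with one grouping column and numeric columns otherwise (the data frame `d` and its column names are made up for this example):

```r
# Hypothetical example data: one grouping column and two numeric columns
d <- data.frame(
  grp = c("a", "a", "b", "b", "c"),
  x   = c(1, 3, 10, 20, 5),
  y   = c(2, 4, 6, 8, 0)
)

# Formula interface: mean of every other column, grouped by grp
res <- aggregate(. ~ grp, data = d, FUN = mean)
print(res)

# Equivalent call with an explicit list of grouping variables
res2 <- aggregate(d[, c("x", "y")], by = list(grp = d$grp), FUN = mean)
print(res2)
```

Both calls return a data frame with one row per group. Note that the formula interface drops rows containing NA by default, so the two forms can differ on data with missing values.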
You want tapply or ave, depending on how you want your output:
> Data <- data.frame(grp=sample(letters[1:3],20,TRUE),x=rnorm(20))
> ave(Data$x, Data$grp)
[1] -0.3258590 -0.5009832 -0.5009832 -0.2136670 -0.3258590 -0.5009832
[7] -0.3258590 -0.2136670 -0.3258590 -0.2136670 -0.3258590 -0.3258590
[13] -0.3258590 -0.5009832 -0.2136670 -0.5009832 -0.3258590 -0.2136670
[19] -0.5009832 -0.2136670
> tapply(Data$x, Data$grp, mean)
a b c
-0.5009832 -0.2136670 -0.3258590
# Example with more than one column:
> Data <- data.frame(grp=sample(letters[1:3],20,TRUE),x=rnorm(20),y=runif(20))
> do.call(rbind,lapply(split(Data[,-1], Data[,1]), colMeans))
x y
a -0.675195494 0.4772696
b 0.270891403 0.5091359
c 0.002756666 0.4053922
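Because the examples above use random data, the difference between the two output shapes may be easier to see on a small fixed input. This is only a sketch; the vectors `x` and `grp` are made up for illustration.

```r
# Hypothetical fixed data so the results can be checked by hand
x   <- c(1, 3, 10, 20, 5)
grp <- c("a", "a", "b", "b", "c")

# tapply: one value per group, named by group
per_group <- tapply(x, grp, mean)
print(per_group)

# ave: one value per observation (FUN defaults to mean),
# so the result lines up with the original rows
per_row <- ave(x, grp)
print(per_row)
```

Here `per_group` has three elements (2, 15 and 5 for groups a, b and c), while `per_row` has five (2, 2, 15, 15, 5), which is convenient when the group mean is to be added as a new column.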
With plyr (here x stands for a data frame with a grouping column id and a numeric column var):
library(plyr)
df <- ddply(x, .(id),function(x) data.frame(
mean=mean(x$var)
))
print(df)
Update:
data<-data.frame(I=as.factor(rep(letters[1:10],each=3)),x=rnorm(30),y=rbinom(30,5,.5))
ddply(data,.(I), function(x) data.frame(x=mean(x$x), y=mean(x$y)))
See, plyr is smart :)
Update 2:
In response to your comment, I believe cast and melt from the reshape package are much simpler for your purpose.
library(reshape)
cast(melt(data),I ~ variable, mean)
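You can also do this using only base R functions: fitting a linear model on the factor with no intercept returns the group means as coefficients.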
>d=data.frame(type=as.factor(rep(c("A","B","C"),each=3)),
x=rnorm(9),y=rgamma(9,2,1))
> d
type x y
1 A -1.18077326 3.1428680
2 A -0.91930418 4.4606603
3 A 0.88345422 1.0979301
4 B 0.06964133 1.1429911
5 B -1.15380345 2.7609049
6 B 1.13637202 0.6668986
7 C -1.12052765 1.7352306
8 C -1.34803630 2.3099202
9 C -2.23135374 0.7244689
>
> cbind(lm(x~-1+type,data=d)$coef,lm(y~-1+type,data=d)$coef)
[,1] [,2]
typeA -0.4055411 2.900486
typeB 0.0174033 1.523598
typeC -1.5666392 1.589873
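As a quick check of the lm approach, the following sketch compares the no-intercept coefficients with group means from tapply on fixed data. The data frame `d` here is made up for the check and is not the random data shown above.

```r
# Hypothetical fixed data so the group means are easy to verify by hand
d <- data.frame(
  type = factor(rep(c("A", "B", "C"), each = 3)),
  x    = c(1, 2, 3, 10, 20, 30, 5, 5, 5)
)

# Coefficients of a no-intercept model on the factor
coefs <- coef(lm(x ~ -1 + type, data = d))

# Group means computed directly
means <- tapply(d$x, d$type, mean)

print(coefs)
print(means)
print(all.equal(unname(coefs), as.vector(means)))
```

The coefficients are named typeA, typeB and typeC and agree with the direct means up to numerical tolerance. Fitting a model is more work than tapply or aggregate, so this is mainly useful when a model is being fitted anyway.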