Substitute values with their mean in a data frame in R

I need to replace the values of the two replicates (A and B) in a data frame with their mean.

This is the data frame:

Sample.Name <- c("sample01","sample01","sample02","sample02","sample03","sample03")
Rep <- c("A", "B", "A", "B", "A", "B")
Rep <- as.factor(Rep)
joy <- sample(1000:50000000, size=120, replace=TRUE)
values <- matrix(joy, nrow=6, ncol=20)
df.data <- cbind.data.frame(Sample.Name, Rep, values)
names(df.data)[-c(1:2)] <- paste("V", 1:20, sep="")

And this is the loop I tried to write to replace the replicates with their mean:

Sample <- as.factor(Sample.Name)
livelli <- levels(Sample)
for (i in (1:(length(livelli)))){
    estrai.replica <- which(df.data == livelli[i])
    media.replica <- apply(values[estrai.replica,], 2, mean)
    foo <- rbind(media.replica)
}

The main problems are:

  1. this way I only get the last row in my new data frame (foo), and
  2. I don't have the sample name in any column.

Do you have any suggestions?
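For comparison, here is a minimal sketch of how the loop itself could be repaired (a sketch, not the asker's final code). It stores one row per sample in a list instead of overwriting foo on every iteration, and it keeps the sample name next to the means. It rebuilds the example data so it runs on its own; set.seed(1) and the list name righe are additions for illustration only.

```r
# Example data from the question; set.seed(1) is added only for reproducibility
set.seed(1)
Sample.Name <- c("sample01", "sample01", "sample02", "sample02", "sample03", "sample03")
Rep <- as.factor(c("A", "B", "A", "B", "A", "B"))
values <- matrix(sample(1000:50000000, size = 120, replace = TRUE), nrow = 6, ncol = 20)
df.data <- cbind.data.frame(Sample.Name, Rep, values)
names(df.data)[-c(1:2)] <- paste("V", 1:20, sep = "")

livelli <- unique(as.character(df.data$Sample.Name))
# One slot per sample, so earlier results are no longer overwritten
righe <- vector("list", length(livelli))
for (i in seq_along(livelli)) {
  # Compare against the Sample.Name column only, not the whole data frame
  estrai.replica <- which(df.data$Sample.Name == livelli[i])
  # Column means of the replicate rows; the V1..V20 names are kept
  media.replica <- colMeans(df.data[estrai.replica, -(1:2), drop = FALSE])
  # Store the sample name next to its means as a one-row data frame
  righe[[i]] <- data.frame(Sample.Name = livelli[i], t(media.replica))
}
# Stack the per-sample rows into a single data frame
foo <- do.call(rbind, righe)
foo[, 1:4]
```

Collecting the rows in a pre-allocated list and binding once at the end avoids growing a data frame inside the loop.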


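I think you want to aggregate your data frame. Leave out the non-numeric Sample.Name and Rep columns (otherwise mean() returns NA for them, with a warning), and name the grouping column so it shows up in the result. Try this:

aggregate(df.data[, -(1:2)], by = list(Sample.Name = df.data$Sample.Name), FUN = mean)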
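If, reading the question literally, you want to keep both replicate rows and overwrite each value with the mean of its sample, base R's ave() performs that grouped replacement. This is a minimal sketch; cols and df.mean are names introduced here for illustration, and set.seed(1) is added for reproducibility.

```r
# Example data from the question; set.seed(1) is added only for reproducibility
set.seed(1)
Sample.Name <- c("sample01", "sample01", "sample02", "sample02", "sample03", "sample03")
Rep <- as.factor(c("A", "B", "A", "B", "A", "B"))
values <- matrix(sample(1000:50000000, size = 120, replace = TRUE), nrow = 6, ncol = 20)
df.data <- cbind.data.frame(Sample.Name, Rep, values)
names(df.data)[-c(1:2)] <- paste("V", 1:20, sep = "")

cols <- paste0("V", 1:20)
df.mean <- df.data
# ave() returns a vector as long as x, with each element replaced by its group's mean
df.mean[cols] <- lapply(df.data[cols], function(x) ave(x, df.data$Sample.Name, FUN = mean))
df.mean[, 1:4]
```

Unlike aggregate(), this keeps all six rows (and the Rep column), so replicates A and B of the same sample end up with identical values.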


Out of curiosity I tried a tapply-based solution.

# Not correct: lapply(df.data[-(1:3)], tapply, INDEX=df.data$Sample.Name, FUN=mean)

It just needed as.data.frame to "clean it up".

# Not correct: as.data.frame(lapply(df.data[-(1:3)], tapply, INDEX=df.data$Sample.Name, FUN=mean))

EDIT: Like @daroczig, I got an error complaining that the trim argument to mean.default is not of length 1. (The named FUN=mean is matched by lapply's own FUN argument, so tapply is passed on to mean, where it lands in the trim argument.) Passing further arguments for mean and switching to the two-argument form of "[" satisfied the interpreter, but the function was still not applied over the right grouping. This version does work:

as.data.frame(lapply(df.data[, 3:22],
                     function(x) tapply(x, df.data$Sample.Name, FUN = mean)))
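That result carries the sample names only as row names, which was the second problem in the question. A small sketch of moving them into a proper Sample.Name column; medie is a name introduced here, and set.seed(1) is added for reproducibility.

```r
# Example data from the question; set.seed(1) is added only for reproducibility
set.seed(1)
Sample.Name <- c("sample01", "sample01", "sample02", "sample02", "sample03", "sample03")
Rep <- as.factor(c("A", "B", "A", "B", "A", "B"))
values <- matrix(sample(1000:50000000, size = 120, replace = TRUE), nrow = 6, ncol = 20)
df.data <- cbind.data.frame(Sample.Name, Rep, values)
names(df.data)[-c(1:2)] <- paste("V", 1:20, sep = "")

medie <- as.data.frame(lapply(df.data[, 3:22],
                              function(x) tapply(x, df.data$Sample.Name, FUN = mean)))
# tapply labels its output by group, and those labels end up as the row names;
# copy them into a real column and reset the row names
medie <- cbind(Sample.Name = rownames(medie), medie)
rownames(medie) <- NULL
medie[, 1:4]
```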


A data.table solution, for time and memory efficiency:

library(data.table)
DT <- as.data.table(df.data)
DT[,lapply(.SD, mean),by = Sample.Name, .SDcols = paste0('V',1:20)]

Note that .SD is the subset of the data for each group, and .SDcols selects which columns .SD contains, and hence which columns lapply is applied to.
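To see what .SD holds, a quick check like the following reports, for each group, how many rows .SD has and which columns it contains. This is a sketch that assumes the data.table package is installed; the n.rows, cols and sd.info names are made up for illustration.

```r
library(data.table)

# Example data from the question; set.seed(1) is added only for reproducibility
set.seed(1)
Sample.Name <- c("sample01", "sample01", "sample02", "sample02", "sample03", "sample03")
Rep <- as.factor(c("A", "B", "A", "B", "A", "B"))
values <- matrix(sample(1000:50000000, size = 120, replace = TRUE), nrow = 6, ncol = 20)
df.data <- cbind.data.frame(Sample.Name, Rep, values)
names(df.data)[-c(1:2)] <- paste("V", 1:20, sep = "")

DT <- as.data.table(df.data)
# For each Sample.Name, .SD has only that group's rows and only the .SDcols columns
sd.info <- DT[, .(n.rows = nrow(.SD), cols = paste(names(.SD), collapse = ",")),
              by = Sample.Name, .SDcols = c("V1", "V2")]
sd.info
```

Each of the three groups should report two rows (replicates A and B) and only the columns V1 and V2.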
