Calculate means of rows

2023-01-27 13:29 问答作者：

I have a dataframe called ants detailing multiple entries per site, looks like this:

  Site     Date     Time  Temp SpCond Salinity Depth Turbidity Chlorophyll
1   71 6/8/2010 14:50:35 14.32  49.88    32.66 0.397       0.0         1.3
2   71 6/8/2010 14:51:00 14.31  49.94    32.70 1.073       0.0         2.0
3   71 6/8/2010 14:51:16 14.32  49.95    32.71 1.034      -0.1         1.6
4   71 6/8/2010 14:51:29 14.31  49.96    32.71 1.030      -0.2         1.6
5   70 6/8/2010 14:53:55 14.30  50.04    32.77 1.002      -0.2         1.2
6   70 6/8/2010 14:54:09 14.30  50.03    32.77 0.993      -0.5         1.2

Sites have different numbers of entries, usually 3 but sometimes less or more. Where both date and site number match I would like to write a new dataframe with one entry per site detailing the average/mean readings for each parameter. I would like empty or "na" cells to be omitted from the calculation and 开发者_如何学JAVAsubsequent dataframe.

I'm not sure if this is an apply function or a version of rowMeans maybe? Very stuck, any help much appreciated!

Nico's answer looked like mine would have except that I would have added a named argument to be passed to mean() so that the NA's (in the aggregated columns) would not sabotage the results. (I could not tell whether the OP was asking that NA's in the by variables or in the otehr variables were known or suspected of having NA's) :

aggregate(df, by=list(df$Site, df$Date), FUN=mean, na.rm=TRUE)

You would probably need to also run aggregate or tapply calls in parallel to count the number of non-NA values.

The other method using aggregate's formula method might be different since na.action=na.omit is the default:

aggregate( . ~Site +Date, data=df,  FUN=mean, na.rm=TRUE)

Here is one way using the plyr package and its ddply() function:

R> df
  Site     Date     Time  Temp SpCond Salinity Depth Turbidity Chlorophyll
1   71 6/8/2010 14:50:35 14.32  49.88    32.66 0.397       0.0         1.3
2   71 6/8/2010 14:51:00 14.31  49.94    32.70 1.073       0.0         2.0
3   71 6/8/2010 14:51:16 14.32  49.95    32.71 1.034      -0.1         1.6
4   71 6/8/2010 14:51:29 14.31  49.96    32.71 1.030      -0.2         1.6
5   70 6/8/2010 14:53:55 14.30  50.04    32.77 1.002      -0.2         1.2
6   70 6/8/2010 14:54:09 14.30  50.03    32.77 0.993      -0.5         1.2
R> library(plyr)
R> ddply(df, .(Site,Date), function(x) mean(x[,-(1:3)], na.rm=TRUE))
  Site     Date   Temp SpCond Salinity  Depth Turbidity Chlorophyll
1   70 6/8/2010 14.300 50.035   32.770 0.9975    -0.350       1.200
2   71 6/8/2010 14.315 49.933   32.695 0.8835    -0.075       1.625
R>

I used a custom anonymous function to skip the first three columns.

You can also use aggregate

aggregate(df, by=list(df$Site, df$Date), FUN=mean, na.rm=TRUE)

Here is a complete new answer with a full log also covering your new specification:

R> Lines <- "  Site     Date     Time  Temp SpCond Salinity Depth Turbidity Chlorophyll
+ 71 6/8/2010 14:50:35 14.32  49.88    32.66 0.397       0.0         1.3
+ 71 6/8/2010 14:51:00 14.31  49.94    32.70 1.073       0.0         2.0
+ 71 6/8/2010 14:51:16 14.32  49.95    32.71 1.034      -0.1         1.6
+ 71 6/8/2010 14:51:29 14.31  49.96    32.71 1.030      -0.2         1.6
+ 70 6/8/2010 14:53:55 14.30  50.04    32.77 1.002      -0.2         1.2
+ 70 6/8/2010 14:54:09 14.30  50.03    32.77 0.993      -0.5         1.2
+ "
R> con <- textConnection(Lines)
R> df <- read.table(con, sep="", header=TRUE, stringsAsFactors=FALSE)
R> close(con)
R> df$pt <- as.POSIXct(strptime(paste(df$Date, df$Time), "%m/%d/%Y %H:%M:%S"))
R> library(plyr)
R> newdf <- ddply(df, .(Site,Date), function(x) mean(x[,-(1:3)], na.rm=TRUE))
R> newdf$pt <- as.POSIXct(newdf$pt, origin="1970-01-01")
R> newdf
  Site     Date  Temp SpCond Salinity  Depth Turbidity Chlorophyll                  pt
1   70 6/8/2010 14.30  50.03    32.77 0.9975    -0.350       1.200 2010-06-08 20:54:02
2   71 6/8/2010 14.32  49.93    32.70 0.8835    -0.075       1.625 2010-06-08 20:51:05
R>

You were close with rowMeans(), but you need colMeans() instead. The others have shown how to use built-in or add-on functionality and I would certainly recommend you use them. However, it might be useful to see how to do something like this by hand:

## using df from Dirk's answer, we split the data in Site Date combinations
df.sp <- with(df,
              split(data.frame(Temp, SpCond, Salinity, Depth, Turbidity,
                               Chlorophyll),
                    list(Site = Site, Date = Date)))
## The above gives  a list of data frames one per date-site combo,
## to which we apply the colMeans() function
df.mean <- data.frame(t(sapply(df.sp, colMeans)))

At this point we need to do some extra tidying if you want the output to be nice like the others' answers:

## Process the rownames on df.mean
name.parts <- strsplit(rownames(df.mean), "\\.")
## pull out the Site part (before the '.')
df.mean <- within(df.mean, Site <- as.numeric(sapply(name.parts, `[`, 1)))
## pull out the Date part (after the '.')
df.mean <- within(df.mean, Date <- sapply(name.parts, `[`, 2))
## rearrange the columns
df.mean <- df.mean[, c(7:8,1:6)]

Note again, for most cases you should use the canned functions as described by the other answers. Sometimes it might be quicker to cook your own solution however, and the above might act as a guide to achieving this.

继续阅读：average dataframe mean r row

Calculate means of rows

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？