Calculate group mean, sum, or other summary statistics, and assign the result as a column in the original data
I want to calculate mean (or any other summary statistic of length one, e.g. min, max, length, sum) of a numeric variable ("value") within each level of a grouping variable ("group").
The summary statistic should be assigned to a new variable which has the same length as the original data. That is, each row of the original data should get the statistic of the group that row belongs to - the data set should not be collapsed to one row per group. For example, consider the group mean:
Before
id group value
1 a 10
2 a 20
3 b 100
4 b 200
After
id group value grp.mean.values
1 a 10 15
2 a 20 15
3 b 100 150
4 b 200 150
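To try the answers below, the "Before" data can be built as follows. This is only a sketch: the question does not say how the data is stored, so the object name df is an assumption (some answers below call the same data dat or x).

```r
# Example data from the "Before" table; the name `df` is assumed
df <- data.frame(
  id    = 1:4,
  group = c("a", "a", "b", "b"),
  value = c(10, 20, 100, 200)
)

# Aliases for the answers that use other object names
dat <- df
x   <- df
```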
You may do this in dplyr using mutate:
library(dplyr)
df %>%
group_by(group) %>%
mutate(grp.mean.values = mean(value))
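A slightly fuller sketch of the same dplyr approach, assuming the example data is stored in df. The extra column names grp.sum and grp.n are made up for illustration. The final ungroup() is optional; it just drops the grouping so that later operations on the result are not done per group.

```r
library(dplyr)

# Example data from the question (object name `df` is assumed)
df <- data.frame(id = 1:4,
                 group = c("a", "a", "b", "b"),
                 value = c(10, 20, 100, 200))

res <- df %>%
  group_by(group) %>%
  mutate(grp.mean.values = mean(value),  # group mean, repeated on every row
         grp.sum = sum(value),           # hypothetical extra column
         grp.n = n()) %>%                # number of rows in the group
  ungroup()                              # remove the grouping afterwards
```

Because mutate (unlike summarise) keeps every row, res still has four rows.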
...or use data.table to assign the new column by reference (:=):
library(data.table)
setDT(df)[ , grp.mean.values := mean(value), by = group]
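Note that setDT converts df itself to a data.table. If the original data frame should stay untouched, one possible variant is to work on a copy made with as.data.table. The sketch below also adds several columns in one call using the functional form of :=; the names dt and grp.n are made up for illustration.

```r
library(data.table)

# Example data from the question (object name `df` is assumed)
df <- data.frame(id = 1:4,
                 group = c("a", "a", "b", "b"),
                 value = c(10, 20, 100, 200))

dt <- as.data.table(df)  # a copy; `df` itself is not modified

# Functional form of := adds several columns by reference in one step
dt[, `:=`(grp.mean.values = mean(value),
          grp.n = .N),              # .N is the number of rows per group
   by = group]
```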
Have a look at the ave function. Something like:
df$grp.mean.values <- ave(df$value, df$group)
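By default ave computes the mean. If you want to use ave to calculate something else per group, you need to specify FUN = your-desired-function, e.g. FUN = min. FUN must be passed by name, because unnamed arguments after the first are treated as grouping variables: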
df$grp.min <- ave(df$value, df$group, FUN = min)
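The same pattern works for other length-one statistics, and an anonymous function can be used to pass extra arguments such as na.rm. A small sketch, assuming the example data is stored in df; the column names grp.n, grp.sum and grp.mean.narm are made up for illustration.

```r
# Example data from the question (object name `df` is assumed)
df <- data.frame(id = 1:4,
                 group = c("a", "a", "b", "b"),
                 value = c(10, 20, 100, 200))

# Group size and group sum, repeated on every row of the group
df$grp.n   <- ave(df$value, df$group, FUN = length)
df$grp.sum <- ave(df$value, df$group, FUN = sum)

# Anonymous function, e.g. to ignore missing values within each group
df$grp.mean.narm <- ave(df$value, df$group,
                        FUN = function(v) mean(v, na.rm = TRUE))
```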
One option is to use plyr. ddply expects a data.frame (the first d) and returns a data.frame (the second d). Other XXply functions work in a similar way; e.g. ldply expects a list and returns a data.frame, dlply does the opposite, and so on. The second argument is the grouping variable(s). The third argument is the function applied to each group; here it is transform, which adds the new column to each group's rows.
library(plyr)
ddply(dat, "group", transform, grp.mean.values = mean(value))
id group value grp.mean.values
1 1 a 10 15
2 2 a 20 15
3 3 b 100 150
4 4 b 200 150
Here is another option using base functions aggregate and merge:
merge(x, aggregate(value ~ group, data = x, mean),
      by = "group")
group id value.x value.y
1 a 1 10 15
2 a 2 20 15
3 b 3 100 150
4 b 4 200 150
You can get "better" column names with the suffixes argument:
merge(x, aggregate(value ~ group, data = x, mean),
by = "group", suffixes = c("", ".mean"))
group id value value.mean
1 a 1 10 15
2 a 2 20 15
3 b 3 100 150
4 b 4 200 150
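As the output shows, merge puts the key column first and sorts the rows by it. If the original row and column order matter, one possible follow-up is to rename the aggregated column before merging and reorder afterwards. This is only a sketch: it assumes the data is stored in x and that id reflects the original row order; the names grp_means and res are made up for illustration.

```r
# Example data from the question (object name `x` as in this answer)
x <- data.frame(id = 1:4,
                group = c("a", "a", "b", "b"),
                value = c(10, 20, 100, 200))

# One row per group, with the statistic renamed before merging
grp_means <- aggregate(value ~ group, data = x, FUN = mean)
names(grp_means)[names(grp_means) == "value"] <- "grp.mean.values"

# Join back on the grouping variable; only `group` is shared, so no suffixes
res <- merge(x, grp_means, by = "group")

# Restore the original row order (via id) and column order
res <- res[order(res$id), c("id", "group", "value", "grp.mean.values")]
```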