R: using ddply to apply functions to subsets of data

I'm trying to use the ddply method to take a dataframe with various info about 3000 movies and then calculate the mean gross of each genre. I'm new to R, and I've read all the questions on here relating to ddply, but I still can't seem to get it right. Here's what I have now:

> attach(movies)
> ddply(movies, Genre, mean(Gross))
Error in llply(.data = .data, .fun = .fun, ..., .progress = .progress,  : 
.fun is not a function.

How am I supposed to write a function that takes the mean of the values in the "Gross" column for each set of movies, grouped by genre? I know this seems like a simple question, but the documentation is really confusing to me, and I'm not too familiar with R syntax yet.

Is there a method other than ddply that would make this easier?

Thanks!!


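Here is an example using the tips dataset, which ships with the reshape2 package (when this was written, loading ggplot2 made both the dataset and ddply available):

library(plyr)
data(tips, package = "reshape2")
# Mean tip as a fraction of the bill, for each day
mean_tip_by_day <- ddply(tips, .(day), summarize, mean_tip = mean(tip / total_bill))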

Hope this is useful
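
Applied to the question's data, the same pattern might look like the sketch below. Only the column names Genre and Gross come from the question; the small movies data frame and its values are invented for illustration. The original call fails because mean(Gross) is evaluated on the spot and produces a number, whereas ddply expects a function (here summarize) in that position.

```r
library(plyr)

# Hypothetical stand-in for the asker's data; the values are invented.
movies <- data.frame(
  Genre = c("Action", "Action", "Comedy", "Comedy", "Drama"),
  Gross = c(100, 200, 50, 70, 30)
)

# Split by Genre, then let summarize compute one mean per group.
mean_gross_by_genre <- ddply(movies, .(Genre), summarize,
                             mean_gross = mean(Gross))
print(mean_gross_by_genre)
```

The result is a data frame with one row per genre, so there is no need to attach(movies) first.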


You probably don't need plyr for a simple operation like that. tapply() does the job easily and you won't need to load additional packages. The syntax also seems simpler than Ramnath's:

tapply(tips$tip, tips$day, mean)

Note that plyr is a fantastic tool for many tasks. To me, it just seems like overkill here.
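
For the movies data in the question, a base R version could look roughly like this. The data frame below is a made-up stand-in that reuses only the column names Genre and Gross from the question; aggregate() is shown as an additional option for when a data frame is preferred over a named vector.

```r
# Hypothetical stand-in for the asker's data; the values are invented.
movies <- data.frame(
  Genre = c("Action", "Action", "Comedy", "Comedy", "Drama"),
  Gross = c(100, 200, 50, 70, 30)
)

# tapply: values, grouping factor, function. Returns a named array.
gross_by_genre <- tapply(movies$Gross, movies$Genre, mean)
print(gross_by_genre)

# aggregate: same computation, but the result is a data frame.
gross_df <- aggregate(Gross ~ Genre, data = movies, FUN = mean)
print(gross_df)
```

Note that the function is passed by name (mean), not called, which is the same point that trips up the ddply attempt in the question.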
