ddply aggregated column names

I am using ddply to aggregate my data but haven't found an elegant way to assign column names to the output data frame.

At the moment I am doing this:

agg_data <- ddply(raw_data, .(id, date, classification), nrow)
names(agg_data)[4] <- "no_entries"

and this

agg_data <- ddply(agg_data, .(classification, date), colwise(mean, .(no_entries)) )
names(agg_data)[3] <- "avg_no_entries"

Is there a better, more elegant way to do this?


The generic form I use a lot is:

 ddply(raw_data, .(id, date, classification), function(x) data.frame(no_entries = nrow(x)))

I use anonymous functions in my ddply statements almost all the time, so the idiom above fits well with how I work. It is not the most concise way to express a simple function like nrow(), but for functions where I pass multiple arguments, I like it a lot.
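As a rough sketch of why this idiom scales, the anonymous function can return several named columns at once. The sample data and the `value` and `mean_value` columns below are made up for illustration; only `raw_data`, `id`, `date`, `classification` and `no_entries` come from the question.

```r
library(plyr)

# Hypothetical sample data standing in for raw_data
raw_data <- data.frame(
  id             = c(1, 1, 1, 2, 2),
  date           = as.Date(c("2011-01-01", "2011-01-01", "2011-01-02",
                             "2011-01-01", "2011-01-01")),
  classification = c("a", "a", "a", "b", "b"),
  value          = c(10, 20, 30, 40, 50)  # assumed extra column
)

# Each piece x is the sub-data-frame for one id/date/classification group.
# The names given inside data.frame() become the output column names.
agg_data <- ddply(raw_data, .(id, date, classification), function(x) {
  data.frame(
    no_entries = nrow(x),
    mean_value = mean(x$value)
  )
})

print(agg_data)
```

The grouping columns come first in the result, followed by whatever columns the function returns, so no call to names() is needed afterwards.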


You can use summarise:

agg_data <- ddply(raw_data, .(id, date, classification), summarise, "no_entries" = nrow(piece))

Or, if nrow(piece) doesn't work, you can use length(<column_name>) instead. For instance, here's an example using the baseball data set that ships with plyr, so it should be runnable by anyone:

ddply(baseball, .(year), summarise, newColumn = nrow(piece))

or

ddply(baseball, .(year), summarise, newColumn = length(year))
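
Applied to the two steps from the question, the summarise approach might look like the sketch below. The sample `raw_data` is invented for illustration, and length(id) is used in place of nrow(piece).

```r
library(plyr)

# Hypothetical sample data standing in for raw_data
raw_data <- data.frame(
  id             = c(1, 1, 2, 2, 2, 3),
  date           = rep("2011-01-01", 6),
  classification = c("a", "a", "a", "a", "a", "b")
)

# Step 1: count rows per id/date/classification, naming the column directly
counts <- ddply(raw_data, .(id, date, classification), summarise,
                no_entries = length(id))

# Step 2: average those counts per classification/date, again naming directly
avg <- ddply(counts, .(classification, date), summarise,
             avg_no_entries = mean(no_entries))

print(avg)
```

Both output columns get their names inside the ddply call, which replaces the two names(agg_data)[n] <- ... assignments.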

EDIT

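Or, as Joshua points out in the comments, the all-caps version NROW() does the checking for you: it treats a vector as a one-column matrix, so it works on a single column as well as on a data frame.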
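
A small sketch of the difference, using a made-up vector and data frame (the names `x`, `df`, `g` and `n` are only for illustration):

```r
library(plyr)

x <- c(3, 1, 4)

# nrow() only understands objects with a dim attribute, so a plain vector gives NULL
vec_nrow <- nrow(x)

# NROW() treats the vector as a one-column matrix and returns its length
vec_NROW <- NROW(x)

# On a data frame the two agree
df <- data.frame(g = c("a", "a", "b"), v = 1:3)
df_NROW <- NROW(df)

# Inside summarise, NROW() can therefore be applied to a column
per_group <- ddply(df, .(g), summarise, n = NROW(g))

print(per_group)
```

This way the same call works whether it is handed a column or a whole piece of the data frame.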
