Convert a "by" object to a data frame in R
I'm using the "by" function in R to chop up a data frame and apply a function to different parts, like this:
pairwise.compare <- function(x) {
  Nright <- ...
  Nwrong <- ...
  Ntied <- ...
  return(c(Nright=Nright, Nwrong=Nwrong, Ntied=Ntied))
}
Z.by <- by(rankings, INDICES=list(rankings$Rater, rankings$Class), FUN=pairwise.compare)
The result (Z.by) looks something like this:
: 4
: 357
Nright Nwrong Ntied
3 0 0
------------------------------------------------------------
: 8
: 357
NULL
------------------------------------------------------------
: 10
: 470
Nright Nwrong Ntied
3 4 1
------------------------------------------------------------
: 11
: 470
Nright Nwrong Ntied
12 4 1
What I would like is to have this result converted into a data frame (with the NULL entries not present) so it looks like this:
Rater Class Nright Nwrong Ntied
1 4 357 3 0 0
2 10 470 3 4 1
3 11 470 12 4 1
How do I do that?
The by function returns a list, so you can do something like this:
data.frame(do.call("rbind", by(x, column, mean)))
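As a minimal sketch of that idea on made-up data (the rankings values, the Score column, and the toy_summary function are hypothetical stand-ins for the question's data and pairwise.compare), empty groups come back as NULL and rbind() simply skips them:

```r
# Hypothetical toy data standing in for the question's rankings data frame.
rankings <- data.frame(
  Rater = c(4, 4, 10, 10, 11),
  Class = c(357, 357, 470, 470, 470),
  Score = c(1, 2, 3, 4, 5)
)

# Hypothetical stand-in for pairwise.compare: returns a named numeric vector.
toy_summary <- function(d) c(N = nrow(d), Total = sum(d$Score))

Z.by <- by(rankings, list(Rater = rankings$Rater, Class = rankings$Class), toy_summary)

# rbind() drops the NULL cells, leaving one row per non-empty group.
Z.df <- data.frame(do.call("rbind", Z.by))
print(Z.df)
```

Note that the result holds only the summary columns; the Rater and Class labels are not carried over.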
Consider using ddply from the plyr package instead of by. It adds the grouping columns to the resulting data frame for you.
Old thread, but for anyone who searches for this topic:
analysis = by(...)
data.frame(t(vapply(analysis,unlist,unlist(analysis[[1]]))))
unlist() takes an element of a by() output (in this case, analysis) and expresses it as a named vector.
vapply() applies unlist to every element of analysis and collects the results. It requires a template argument to know the output type, which is what unlist(analysis[[1]]) is there for. You may need to add a check that analysis is not empty, if that can happen.
Each output becomes a column, so t() transposes the result to the desired orientation, where each analysis entry becomes a row.
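A rough sketch of this on made-up data follows. The rankings values, the Score column, and toy_summary are hypothetical, and the NULL filtering step is my own addition: with two grouping variables some cells can be NULL, and vapply() would stop on those because they do not match the template.

```r
# Hypothetical toy data and summary function, for illustration only.
rankings <- data.frame(
  Rater = c(4, 4, 10, 10, 11),
  Class = c(357, 357, 470, 470, 470),
  Score = c(1, 2, 3, 4, 5)
)
toy_summary <- function(d) c(N = nrow(d), Total = sum(d$Score))

analysis <- by(rankings, list(rankings$Rater, rankings$Class), toy_summary)

# Drop the by class, then remove the NULL cells (groups with no data).
cells <- unclass(analysis)
cells <- cells[!vapply(cells, is.null, logical(1))]

# The first remaining cell serves as the template for every result.
m <- vapply(cells, unlist, FUN.VALUE = unlist(cells[[1]]))

# vapply() puts each group in a column, so transpose to get one row per group.
result <- data.frame(t(m))
print(result)
```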
This expands upon Shane's solution of using rbind(), but it also adds columns identifying the groups and removes NULL groups, two features that were requested in the question. It uses only base package functions, so no other dependencies (e.g., plyr) are required.
simplify_by_output = function(by_output) {
  # by() returns NULL for combinations of grouping variables for which there
  # are no data. rbind() ignores those, so you have to keep track of them.
  null_ind = unlist(lapply(by_output, is.null))
  # Combine the per-group results by row.
  by_df = do.call(rbind, by_output)
  # Add columns identifying groups, discarding names of groups for which no data exist.
  return(cbind(expand.grid(dimnames(by_output))[!null_ind, ], by_df))
}
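A usage sketch on made-up data, to show the shape of the output. The rankings values, the Score column, and toy_summary are hypothetical stand-ins for the question's data and pairwise.compare; the function is repeated so the example runs on its own.

```r
simplify_by_output <- function(by_output) {
  null_ind <- unlist(lapply(by_output, is.null))
  by_df <- do.call(rbind, by_output)
  cbind(expand.grid(dimnames(by_output))[!null_ind, ], by_df)
}

# Hypothetical toy data and summary function, for illustration only.
rankings <- data.frame(
  Rater = c(4, 4, 10, 10, 11),
  Class = c(357, 357, 470, 470, 470),
  Score = c(1, 2, 3, 4, 5)
)
toy_summary <- function(d) c(N = nrow(d), Total = sum(d$Score))

# Naming the grouping list gives the identifying columns their names.
Z.by <- by(rankings, list(Rater = rankings$Rater, Class = rankings$Class), toy_summary)
Z.df <- simplify_by_output(Z.by)
print(Z.df)
```

If the list passed to by() is unnamed, as in the question, expand.grid() labels the identifying columns Var1 and Var2 instead. They also come back as factors, so convert them if you need numbers.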
I would do
x = by(data, list(data$x, data$y), function(d) whatever(d))
array(x, dim(x), dimnames(x))
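For what it is worth, when the function returns a single value per group, by() simplifies its result to an ordinary array with NA in the empty cells, and the array() call just strips the by class. A rough sketch on made-up data (the rankings values and the Score column are hypothetical), with one possible way to go on from there to a long data frame:

```r
# Hypothetical toy data, for illustration only.
rankings <- data.frame(
  Rater = c(4, 4, 10, 10, 11),
  Class = c(357, 357, 470, 470, 470),
  Score = c(1, 2, 3, 4, 5)
)

# One number per group, so by() returns a numeric array of class "by".
x <- by(rankings, list(Rater = rankings$Rater, Class = rankings$Class),
        function(d) sum(d$Score))

# Plain Rater-by-Class matrix; combinations with no data are NA.
m <- array(x, dim(x), dimnames(x))
print(m)

# Optional: reshape to one row per group and drop the empty combinations.
long <- na.omit(as.data.frame(as.table(m), responseName = "Total"))
print(long)
```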