break dataframe into subsets by factor values, send to function that returns glm class, how to recombine?
Thanks to Hadley's plyr package ddply function we can take a dataframe, break it down into subdataframes by factors, send each to a function, and then combine the function results for each subdataframe into a new dataframe.
But what if the function returns an object of a class like glm or in my case, a c("glm", "lm"). Then, these can't be combined into a dataframe can they? I get this error instead
Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) : cannot coerce class 'c("glm", "lm")' into a data.frame
Is there some more flexible data st开发者_开发知识库ructure that will accommodate all the complex glm class results of my function calls, preserving the information regarding the dataframe subsets?
Or should this be done in an entirely different way?
Just to expand my comment: plyr
has set of functions to combine input and output type. So when you function returns something inconvertible to data.frame
you should use list
as output. So instead of using ddply
use dlply
.
When you want to do something on each model and convert results to data.frame
then ldply
is the key.
Lets create some models using dlply
list_of_models <- dlply(warpbreaks, .(tension), function(X) lm(breaks~wool, data=X))
str(list_of_models, 1)
# List of 3
# $ L:List of 13
# ..- attr(*, "class")= chr "lm"
# $ M:List of 13
# ..- attr(*, "class")= chr "lm"
# $ H:List of 13
# ..- attr(*, "class")= chr "lm"
# - attr(*, "split_type")= chr "data.frame"
# - attr(*, "split_labels")='data.frame': 3 obs. of 1 variable:
It gives list
of three lm
models.
Using ldply
you could create a data.frame
, e.g.
with predictions of each model:
ldply(list_of_models, function(model) { data.frame(fit=predict(model, warpbreaks)) }) # tension fit # 1 L 44.5556 # 2 L 44.5556 # 3 L 44.5556
with statistics to each model:
ldply(list_of_models, function(model) { c( aic = extractAIC(model), deviance = deviance(model), logLik = logLik(model), confint = confint(model), coef = coef(model) ) }) # tension aic1 aic2 deviance logLik confint1 confint2 confint3 confint4 coef.(Intercept) coef.woolB # 1 L 2 98.3291 3397.78 -72.7054 34.2580 -30.89623 54.8531 -1.77044 44.5556 -16.33333 # 2 M 2 81.1948 1311.56 -64.1383 17.6022 -4.27003 30.3978 13.82559 24.0000 4.77778 # 3 H 2 76.9457 1035.78 -62.0137 18.8701 -13.81829 30.2411 2.26273 24.5556 -5.77778
精彩评论