break dataframe into subsets by factor values, send to function that returns glm class, how to recombine?
Thanks to Hadley's plyr package ddply function we can take a dataframe, break it down into subdataframes by factors, send each to a function, and then combine the function results for each subdataframe into a new dataframe.
But what if the function returns an object of a class like glm or in my case, a c("glm", "lm"). Then, these can't be combined into a dataframe can they? I get this error instead
Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) : cannot coerce class 'c("glm", "lm")' into a data.frame
Is there some more flexible data st开发者_开发知识库ructure that will accommodate all the complex glm class results of my function calls, preserving the information regarding the dataframe subsets?
Or should this be done in an entirely different way?
Just to expand my comment: plyr has set of functions to combine input and output type. So when you function returns something inconvertible to data.frame you should use list as output. So instead of using ddply use dlply.
When you want to do something on each model and convert results to data.frame then ldply is the key.
Lets create some models using dlply
list_of_models <- dlply(warpbreaks, .(tension), function(X) lm(breaks~wool, data=X))
str(list_of_models, 1)
# List of 3
#  $ L:List of 13
#   ..- attr(*, "class")= chr "lm"
#  $ M:List of 13
#   ..- attr(*, "class")= chr "lm"
#  $ H:List of 13
#   ..- attr(*, "class")= chr "lm"
#  - attr(*, "split_type")= chr "data.frame"
#  - attr(*, "split_labels")='data.frame':        3 obs. of  1 variable:
It gives list of three lm models.
Using ldply you could create a data.frame, e.g.  
- with predictions of each model: - ldply(list_of_models, function(model) { data.frame(fit=predict(model, warpbreaks)) }) # tension fit # 1 L 44.5556 # 2 L 44.5556 # 3 L 44.5556
- with statistics to each model: - ldply(list_of_models, function(model) { c( aic = extractAIC(model), deviance = deviance(model), logLik = logLik(model), confint = confint(model), coef = coef(model) ) }) # tension aic1 aic2 deviance logLik confint1 confint2 confint3 confint4 coef.(Intercept) coef.woolB # 1 L 2 98.3291 3397.78 -72.7054 34.2580 -30.89623 54.8531 -1.77044 44.5556 -16.33333 # 2 M 2 81.1948 1311.56 -64.1383 17.6022 -4.27003 30.3978 13.82559 24.0000 4.77778 # 3 H 2 76.9457 1035.78 -62.0137 18.8701 -13.81829 30.2411 2.26273 24.5556 -5.77778
 
         加载中,请稍侯......
 加载中,请稍侯......
      
精彩评论