
What are the key components and functions for standard model objects in R?

I have implemented a new statistical model in R and it works in my sandbox, but I would like to make it more standard. A good comparison is lm(), where I can take a model object and:

  • apply the summary() function
  • extract the coefficients of the model
  • extract residuals from the fitted (training) data
  • update the model
  • apply the predict() function
  • apply plot() to pre-selected descriptive plots
  • engage in many other kinds of joy

I've looked through the R manuals, searched online, and thumbed through several books, and, unless I'm overlooking something, I can't find a good tutorial on what should go into a new model package.

Although I'm most interested in thorough references or guides, I'll keep this post focused on a question with two components:

  1. What are the key components that are usually expected to be in a model object?
  2. What are typical functions that are usually implemented in a modeling package?

Answers could be from the R Core (or package developers) perspective or from the perspective of users, e.g. users expect to be able to use functions like summary, predict, residuals, coefficients, and often expect to pass a formula when fitting a model.


Put into the object what you think is useful and necessary. I think the more important question is how you store this information and how users access it.

At a minimum, provide a print() method so the entire object doesn't get dumped to the screen when you print the object. If you provide a summary() method, the convention is to have that method return an object of class summary.foo (where foo is your class) and then provide a print.summary.foo() method; you don't want your summary() method doing any printing in and of itself.
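The print/summary convention can be sketched as follows. This is a minimal illustration, not a template from any package: the class name foo, the fitter fit_foo() and the stored components are assumptions, and ordinary least squares stands in for the real model.

```r
# Hypothetical class "foo": a tiny least-squares fit used only for illustration.
fit_foo <- function(x, y) {
  X <- cbind(Intercept = 1, x = x)
  beta <- drop(solve(crossprod(X), crossprod(X, y)))
  structure(
    list(coefficients = beta,
         residuals = drop(y - X %*% beta),
         df.residual = length(y) - ncol(X)),
    class = "foo"
  )
}

# print() shows a compact view instead of dumping the whole list.
print.foo <- function(x, ...) {
  cat("foo model fit\n\nCoefficients:\n")
  print(x$coefficients, ...)
  invisible(x)
}

# summary() only computes: it returns a "summary.foo" object and prints nothing.
summary.foo <- function(object, ...) {
  sigma <- sqrt(sum(object$residuals^2) / object$df.residual)
  structure(
    list(coefficients = object$coefficients,
         sigma = sigma,
         df.residual = object$df.residual),
    class = "summary.foo"
  )
}

# The display logic lives in the print method for the summary class.
print.summary.foo <- function(x, ...) {
  cat("Summary of foo model fit\n\nCoefficients:\n")
  print(x$coefficients, ...)
  cat("\nResidual standard error:", format(x$sigma),
      "on", x$df.residual, "degrees of freedom\n")
  invisible(x)
}

set.seed(1)
x <- rnorm(20)
y <- 1 + 2 * x + rnorm(20)
fit <- fit_foo(x, y)
s <- summary(fit)   # nothing is printed here
s                   # auto-printing dispatches to print.summary.foo()
```

Keeping the computation in summary.foo() means users can work with the summary object programmatically (for example s$sigma) without any side effects on the console.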

If you have coefficients, fitted values and residuals and these are simple, then you can store them in your returned object as $coefficients, $fitted.values and $residuals respectively. The default methods for coef(), fitted() and resid() will then work without you needing to add your own bespoke methods. If these are not simple, then provide your own methods for coef(), fitted() and residuals() for your class. By "not simple" I mean, for example, that there are several types of residual and you need to process the stored residuals to get the requested type; in that case you need your own method that takes a type argument or similar to select from the available types of residual. See ?residuals.glm for an example.
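As a rough sketch of both cases: the object below stores its results under the conventional names so the default extractors work, and then adds a residuals() method with a type argument. The class foo and the "scaled" residual type are illustrative assumptions; lm.fit() merely supplies some numbers to store.

```r
set.seed(42)
x <- runif(30)
y <- 3 - x + rnorm(30, sd = 0.5)
ols <- lm.fit(cbind(Intercept = 1, x = x), y)   # low-level fitter from stats

# Hypothetical class "foo" using the conventional component names.
fit <- structure(
  list(coefficients = ols$coefficients,
       fitted.values = ols$fitted.values,
       residuals = ols$residuals,
       df.residual = ols$df.residual),
  class = "foo"
)

# No coef.foo(), fitted.foo() or residuals.foo() exists yet:
# the default methods pick up the conventional components.
coef(fit)
head(fitted(fit))
head(resid(fit))

# A bespoke method is only needed once extraction stops being simple,
# e.g. several residual types selected through a `type` argument.
residuals.foo <- function(object, type = c("response", "scaled"), ...) {
  type <- match.arg(type)
  r <- object$residuals
  if (type == "scaled") {
    # "scaled" is an illustrative type: residuals divided by the residual SD
    r <- r / sqrt(sum(r^2) / object$df.residual)
  }
  r
}

head(resid(fit, type = "scaled"))
```

Because resid() dispatches on the residuals generic, defining residuals.foo() is enough for both spellings to use the new method.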

If predictions are something that can be usefully provided, then provide a predict() method. Look at predict.lm() to see which arguments such a method is expected to take. Likewise, an update() method can be provided if it makes sense to update the model by adding or removing terms or by altering model parameters.
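A minimal sketch of a predict() method, assuming a hypothetical formula-based fitter foo() that stores its terms and matched call (least squares again stands in for the real estimation). It takes only the newdata argument; predict.lm() accepts many more. Note that no update() method is written here: with the call stored in $call, the default update() method in stats can re-evaluate it.

```r
# Hypothetical fitter with a formula interface.
foo <- function(formula, data) {
  cl <- match.call()
  mf <- model.frame(formula, data)
  tt <- attr(mf, "terms")
  ols <- lm.fit(model.matrix(tt, mf), model.response(mf))
  structure(
    list(coefficients = ols$coefficients,
         fitted.values = ols$fitted.values,
         terms = tt,
         call = cl),
    class = "foo"
  )
}

predict.foo <- function(object, newdata, ...) {
  # Without new data, return the fitted values, as predict.lm() does.
  if (missing(newdata)) return(fitted(object))
  # Rebuild the design matrix for the new data from the stored terms,
  # dropping the response so that `newdata` need not contain it.
  tt <- delete.response(object$terms)
  X <- model.matrix(tt, model.frame(tt, newdata))
  drop(X %*% object$coefficients)
}

fit <- foo(mpg ~ wt, data = mtcars)
predict(fit, newdata = data.frame(wt = c(2.5, 3.5)))

# The default update() method modifies and re-evaluates the stored call.
fit2 <- update(fit, . ~ . + hp)
coef(fit2)
```

A dedicated update.foo() would only be needed if refitting requires something the stored call cannot express.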

plot.lm() is an example of a method that provides several diagnostic plots of the fitted model. You could model your method on that function so that users can select from a set of predefined diagnostic plots.
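A stripped-down sketch in the spirit of plot.lm(), assuming a hypothetical class foo and just two predefined plots chosen through a which argument (plot.lm() offers more plots and more options):

```r
set.seed(3)
x <- runif(40)
y <- 1 + 2 * x + rnorm(40, sd = 0.3)
ols <- lm.fit(cbind(Intercept = 1, x = x), y)

# Hypothetical class "foo" holding fitted values and residuals.
fit <- structure(
  list(coefficients = ols$coefficients,
       fitted.values = ols$fitted.values,
       residuals = ols$residuals),
  class = "foo"
)

plot.foo <- function(x, which = 1:2, ...) {
  which <- intersect(1:2, which)
  if (!length(which)) stop("'which' must select plot 1 and/or plot 2")
  if (1 %in% which) {
    # Plot 1: residuals against fitted values
    plot(fitted(x), resid(x), xlab = "Fitted values", ylab = "Residuals",
         main = "Residuals vs fitted", ...)
    abline(h = 0, lty = 2)
  }
  if (2 %in% which) {
    # Plot 2: normal Q-Q plot of the residuals
    qqnorm(resid(x), main = "Normal Q-Q", ...)
    qqline(resid(x), lty = 2)
  }
  invisible(x)
}

# Draw only in interactive sessions so that sourcing the file has no side effects.
if (interactive()) plot(fit, which = 1)
```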

If your model has a likelihood, then providing a logLik() method to compute or extract it from the fitted model object would be standard. deviance() is a similar function, if such a thing is pertinent. For confidence intervals on parameters, confint() is the standard method.
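A sketch of a logLik() method for a hypothetical Gaussian model of class foo that estimates a mean and a variance. The class and its components are assumptions for illustration. Returning an object of class "logLik" with df and nobs attributes lets the default AIC() and BIC() methods work without further code.

```r
set.seed(7)
y <- rnorm(50, mean = 10, sd = 2)

# Hypothetical fitted object: intercept-only Gaussian model.
fit <- structure(
  list(coefficients = c(mean = mean(y)),
       residuals = y - mean(y)),
  class = "foo"
)

logLik.foo <- function(object, ...) {
  r <- object$residuals
  n <- length(r)
  sigma2 <- sum(r^2) / n                      # ML estimate of the variance
  val <- -n / 2 * (log(2 * pi * sigma2) + 1)  # Gaussian log-likelihood at the MLE
  # df counts the estimated parameters: the mean plus the variance
  structure(val, df = length(object$coefficients) + 1, nobs = n,
            class = "logLik")
}

logLik(fit)
AIC(fit)   # default method: -2 * logLik + 2 * df
BIC(fit)   # default method uses the nobs attribute
```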

If you have a formula interface, then a formula() method can extract the formula. If you store it in a place that the default method searches, your life will be made easier: the default method looks at the $formula and $terms components and at the formula in $call. A simple way to do this is to store the matched call (match.call()) in the $call component. model.frame() and model.matrix() are the standard extractor functions for the model frame (the data used in the fit) and the model matrix (the model frame expanded into a design matrix, with factors converted to variables using contrasts, plus any transformations or functions of the model frame data). Look at examples from standard R modelling functions for ideas on how to store and extract this information.
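One possible way to store and extract this information, sketched for a hypothetical fitter foo() (the component names call, terms and model follow what lm objects use; everything else is an assumption for illustration):

```r
foo <- function(formula, data) {
  cl <- match.call()
  mf <- model.frame(formula, data)
  tt <- attr(mf, "terms")
  ols <- lm.fit(model.matrix(tt, mf), model.response(mf))
  structure(
    list(coefficients = ols$coefficients,
         call = cl,     # used by update() and as a fallback by formula()
         terms = tt,    # found by the default formula() method
         model = mf),   # the model frame, kept for the extractors below
    class = "foo"
  )
}

# Extractors: return the stored frame, or rebuild the design matrix from it.
model.frame.foo <- function(formula, ...) formula$model
model.matrix.foo <- function(object, ...) {
  model.matrix(object$terms, object$model, ...)
}

fit <- foo(mpg ~ wt + factor(cyl), data = mtcars)
formula(fit)                  # no formula.foo() needed
head(model.frame(fit), 3)
colnames(model.matrix(fit))   # factor(cyl) expanded via contrasts
```

The first argument of model.frame.foo() is named formula only to match the signature of the model.frame() generic.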

If you do use a formula interface, try to follow the standard non-standard evaluation idiom used by most R modelling functions that have a formula interface or method. You can find details on the R Developer page, in particular the document by Thomas Lumley. It gives plenty of advice on making your function behave the way users expect an R modelling function to behave.

If you follow these standard (non-standard) rules, then extractors like na.action() should just work.
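The idiom in question, as used inside lm(), can be sketched like this. The fitter foo() is hypothetical and least squares stands in for the real estimation; the part that matters is how the matched call is turned into a call to model.frame() and evaluated in the caller's frame.

```r
foo <- function(formula, data, subset, na.action, ...) {
  cl <- match.call()
  # Keep only the arguments that model.frame() understands, turn the call
  # into a call to model.frame(), and evaluate it in the caller's frame.
  mf <- match.call(expand.dots = FALSE)
  m <- match(c("formula", "data", "subset", "na.action"), names(mf), 0L)
  mf <- mf[c(1L, m)]
  mf$drop.unused.levels <- TRUE
  mf[[1L]] <- quote(stats::model.frame)
  mf <- eval(mf, parent.frame())

  tt <- attr(mf, "terms")
  ols <- lm.fit(model.matrix(tt, mf), model.response(mf))
  structure(
    list(coefficients = ols$coefficients,
         call = cl, terms = tt, model = mf,
         # record which rows were dropped so na.action() can report them
         na.action = attr(mf, "na.action")),
    class = "foo"
  )
}

# airquality has missing Ozone and Solar.R values; `subset` is evaluated
# inside the data, just as with lm().
fit <- foo(Ozone ~ Wind + Solar.R, data = airquality, subset = Month > 5)
coef(fit)
length(na.action(fit))   # rows removed by the default na.omit handling
```

No na.action.foo() method is defined: the default method finds the $na.action component.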


Following up on Gavin's answer, I found this page, also on the developer site, with a long list of useful suggestions.

Also, "An R Companion to Applied Regression", by Fox and Weisberg, has a walk-through of some of the key methods, in Chapter 8. I found that by looking for mentions of model frames in various R books. This book also has a reference to the same page on the R developer site.


This might be another good source.


The following code:

library(hints)
hints(class="lm")

lists all the functions available for lm objects:

Functions for lm in package ‘base’:

kappa                   Compute or Estimate the Condition Number of a
                        Matrix
base-defunct            Defunct Functions in Package 'base'
rcond                   Compute or Estimate the Condition Number of a
                        Matrix

Functions for lm in package ‘gam’:

deviance.lm             Service functions and as yet undocumented
                        functions for the gam library

Functions for lm in package ‘gdata’:

nobs                    Compute the Number of Non-missing Observations

Functions for lm in package ‘methods’:

setOldClass             Register Old-Style (S3) Classes and Inheritance

Functions for lm in package ‘stats’:

add1                    Add or Drop All Possible Single Terms to a
                        Model
alias                   Find Aliases (Dependencies) in a Model
anova.lm                ANOVA for Linear Model Fits
case.names.lm           Case and Variable Names of Fitted Models
cooks.distance.lm       Regression Deletion Diagnostics
dfbeta.lm               Regression Deletion Diagnostics
dfbetas.lm              Regression Deletion Diagnostics
drop1.lm                Add or Drop All Possible Single Terms to a
                        Model
dummy.coef.lm           Extract Coefficients in Original Coding
effects                 Effects from Fitted Model
family.lm               Accessing Linear Model Fits
formula.lm              Accessing Linear Model Fits
hatvalues.lm            Regression Deletion Diagnostics
influence.lm            Regression Diagnostics
labels.lm               Accessing Linear Model Fits
logLik                  Extract Log-Likelihood
model.frame.lm          Extracting the Model Frame from a Formula or
                        Fit
model.matrix.lm         Construct Design Matrices
plot.lm                 Plot Diagnostics for an lm Object
print.lm                Fitting Linear Models
proj                    Projections of Models
residuals.lm            Accessing Linear Model Fits
rstandard.lm            Regression Deletion Diagnostics
rstudent.lm             Regression Deletion Diagnostics
summary.lm              Summarizing Linear Model Fits
variable.names.lm       Case and Variable Names of Fitted Models
vcov                    Calculate Variance-Covariance Matrix for a
                        Fitted Model Object
case.names              Case and Variable Names of Fitted Models
dummy.coef              Extract Coefficients in Original Coding
influence.measures      Regression Deletion Diagnostics
lm.influence            Regression Diagnostics
lm                      Fitting Linear Models
lm.fit                  Fitter Functions for Linear Models
model.frame             Extracting the Model Frame from a Formula or
                        Fit
model.matrix            Construct Design Matrices
stats-defunct           Defunct Functions in Package 'stats'
lm.glm                  Some linear and generalized linear modelling
                        examples from `An Introduction to Statistical
                        Modelling' by Annette Dobson

Functions for lm in package ‘unknown’:

confint.lm              NA
extractAIC.lm           NA
qr.lm                   NA
simulate.lm             NA

Functions for lm in package ‘VGAM’:

predict.lm              Undocumented and Internally Used Functions and
                        Classes

Functions for lm in package ‘xtable’:

xtable                  Create Export Tables