What does "S3 methods" mean in R?
Since I am fairly new to R, I do not know what S3 methods and objects are. I found that there are S3 and S4 object systems, and some recommend using S3 over S4 if possible (see Google's R Style Guide at http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html). However, I do not know the exact definition of S3 methods/objects.
Update: As of 2019, Google's R Style Guide hyperlink is now here.
Most of the relevant information can be found by looking at ?S3 or ?UseMethod, but in a nutshell:
S3 refers to a scheme of method dispatching. If you've used R for a while, you'll notice that there are print, predict and summary methods for a lot of different kinds of objects.
In S3, this works by:
- setting the class of objects of interest (e.g. the return value of a call to the function glm has class glm)
- providing a method with the general name (e.g. print), then a dot, and then the class name (e.g. print.glm)
- some preparation has to have been done to this general name (print) for this to work, but if you're simply looking to conform to existing method names, you don't need this (see the help I referred to earlier if you do).
To the eye of the beholder, and particularly, the user of your newly created funky model fitting package, it is much more convenient to be able to type predict(myfit, type="class") than predict.mykindoffit(myfit, type="class").
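As a minimal sketch of the steps above: the fitting function fit_mean and the class name mykindoffit are made up for illustration, and the "model" is just a stored mean. The point is only to show how setting the class and naming methods generic.class lets print() and predict() find the right code.

```r
# Hypothetical model-fitting function (an assumption for illustration):
# it stores the mean of y and tags the result with a class.
fit_mean <- function(y) {
  fit <- list(center = mean(y), n = length(y))
  class(fit) <- "mykindoffit"   # step 1: set the class of the object
  fit
}

# Step 2: provide methods named <generic>.<class>.
print.mykindoffit <- function(x, ...) {
  cat("<mykindoffit> center =", x$center, "from", x$n, "observations\n")
  invisible(x)
}

predict.mykindoffit <- function(object, newdata = NULL, ...) {
  # One prediction per row of newdata, or per training observation.
  n_out <- if (is.null(newdata)) object$n else NROW(newdata)
  rep(object$center, n_out)
}

myfit <- fit_mean(c(2, 4, 6))
print(myfit)                                             # dispatches to print.mykindoffit
preds <- predict(myfit, newdata = data.frame(x = 1:2))   # dispatches to predict.mykindoffit
```

The user only ever types print(myfit) or predict(myfit, ...); the class attribute decides which function runs.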
There is quite a bit more to it, but this should get you started. There are quite a few disadvantages to this way of dispatching methods based upon an attribute (class) of objects (and C purists probably lie awake at night in horror of it), but for a lot of situations, it works decently. With the current version of R, newer ways have been implemented (S4 and reference classes), but most people still (only) use S3.
To get you started with S3, look at the code for the median function. Typing median at the command prompt reveals that it has one line in its body, namely
UseMethod("median")
That means that it is an S3 generic. In other words, you can have a different median method for different S3 classes. To list all the possible median methods, type
methods(median) #actually not that interesting.
In this case, there's only one method, the default, which is called for anything. You can see the code for that by typing
median.default
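To see a second method appear next to the default, here is a small sketch that adds a median method for a made-up class. The class name money and its constructor are assumptions for illustration only; the argument list mirrors the generic's signature median(x, na.rm = FALSE, ...).

```r
# Hypothetical class "money": a numeric vector of amounts in cents.
money <- function(cents) structure(cents, class = "money")

# Method for the existing generic median(): compute on the raw numbers,
# then re-wrap the result so it keeps its class.
median.money <- function(x, na.rm = FALSE, ...) {
  money(median(unclass(x), na.rm = na.rm))
}

wallet <- money(c(100, 250, 400))
m <- median(wallet)   # median() sees class "money" and calls median.money
class(m)
```

Objects without a matching method still fall through to median.default.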
A much more interesting example is the print function, which has many different methods.
methods(print) #very exciting
Notice that some of the methods have *s next to their name. That means that they are hidden inside some package's namespace. Use find to find out which package they are in. For example
find("acf") #it's in the stats package
stats:::print.acf
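If you would rather not guess the package, the utils functions getS3method() and getAnywhere() can look up a hidden method directly. A short sketch, using the same print.acf method:

```r
# Look up a method by generic name and class, exported or not.
m1 <- getS3method("print", "acf")

# Search all loaded namespaces for an object by its full name.
m2 <- getAnywhere("print.acf")
m2$where          # the places where the function was found

is.function(m1)
```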
From http://adv-r.had.co.nz/OO-essentials.html:
R’s three OO systems differ in how classes and methods are defined:
S3 implements a style of OO programming called generic-function OO. This is different from most programming languages, like Java, C++ and C#, which implement message-passing OO. With message-passing, messages (methods) are sent to objects and the object determines which function to call. Typically, this object has a special appearance in the method call, usually appearing before the name of the method/message: e.g. canvas.drawRect("blue"). S3 is different. While computations are still carried out via methods, a special type of function called a generic function decides which method to call, e.g., drawRect(canvas, "blue"). S3 is a very casual system. It has no formal definition of classes.
S4 works similarly to S3, but is more formal. There are two major differences to S3. S4 has formal class definitions, which describe the representation and inheritance for each class, and has special helper functions for defining generics and methods. S4 also has multiple dispatch, which means that generic functions can pick methods based on the class of any number of arguments, not just one.
Reference classes, called RC for short, are quite different from S3 and S4. RC implements message-passing OO, so methods belong to classes, not functions. $ is used to separate objects and methods, so method calls look like canvas$drawRect("blue"). RC objects are also mutable: they don’t use R’s usual copy-on-modify semantics, but are modified in place. This makes them harder to reason about, but allows them to solve problems that are difficult to solve with S3 or S4.
There’s also one other system that’s not quite OO, but it’s important to mention here:
- base types, the internal C-level types that underlie the other OO systems. Base types are mostly manipulated using C code, but they’re important to know about because they provide the building blocks for the other OO systems.
I came to this question mostly wondering where the names came from. It appears from this Wikipedia article that the name refers to the version of the S Programming Language that R is based on. The method dispatching schemes described in the other answers come from S and are labelled according to version.
Try
methods(residuals)
which lists, among others, "residuals.lm" and "residuals.glm". This means when you have fitted a linear model, m, and type residuals(m), residuals.lm will be called. When you have fitted a generalized linear model, residuals.glm will be called.
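A quick way to check this yourself, sketched with the built-in mtcars data (the model formulas are arbitrary choices for illustration): the generic call and the directly fetched method give the same result.

```r
m <- lm(mpg ~ wt, data = mtcars)                      # linear model
g <- glm(am ~ wt, data = mtcars, family = binomial)   # generalized linear model

class(m)   # "lm"
class(g)   # "glm" "lm"

# residuals() dispatches on the first element of class() that has a method.
same_lm  <- all.equal(residuals(m), getS3method("residuals", "lm")(m))
same_glm <- all.equal(residuals(g), getS3method("residuals", "glm")(g))
```

Because class(g) starts with "glm", residuals(g) picks residuals.glm even though g also inherits from "lm".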
It's kind of the C++ object model turned upside down. In C++, you define a base class having virtual functions, which are overridden by derived classes.
In R you define a virtual (aka generic) function and then you decide which classes will override this function (aka define a method). Note that the classes doing this do not need to be derived from one common super class.
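A minimal sketch of that idea, with an invented generic speak and two invented, unrelated classes (none of these names come from a real package):

```r
# The "virtual" function: a generic that only dispatches.
speak <- function(x, ...) UseMethod("speak")

# Fallback for any object without a more specific method.
speak.default <- function(x, ...) "..."

# Two classes with no common superclass each define a method.
speak.dog   <- function(x, ...) "Woof"
speak.robot <- function(x, ...) "Beep"

d <- structure(list(), class = "dog")
r <- structure(list(), class = "robot")

speak(d)    # "Woof"
speak(r)    # "Beep"
speak(42)   # "..."
```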
I would not agree to generally prefer S3 over S4. S4 has more formalism (= more typing) and this may be too much for some applications. S4 classes, however, can be defined like a class or struct in C++. You can specify that an object of a certain class is made up of a string and two numbers for example:
setClass("myClass", representation(label = "character", x = "numeric", y = "numeric"))
Methods that are called with an object of that class can rely on the object having those members. That's very different from S3 classes, which are just a list of a bunch of elements.
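To make the contrast concrete, here is a small sketch using a separate, made-up class name (LabelledPoint) with the same slots. S4 refuses an object whose slots have the wrong type, while an S3 "class" is only a label on whatever you hand it.

```r
library(methods)

# Hypothetical S4 class with the same three slots as above.
setClass("LabelledPoint",
         representation(label = "character", x = "numeric", y = "numeric"))

p <- new("LabelledPoint", label = "a", x = 1, y = 2)
p@x   # slots are accessed with @

# A numeric label violates the class definition, so new() signals an error.
bad <- tryCatch(
  new("LabelledPoint", label = 1, x = 1, y = 2),
  error = function(e) "rejected"
)

# S3 performs no such check: any list can be given any class name.
s3_obj <- structure(list(label = 1), class = "s3_point")
```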
With S3 and S4, you call a member function by fun(object, args) and not by object$fun(args). If you are looking for something like the latter, have a look at the proto package.
Here is an updated fast rundown of the numerous R object systems according to "Advanced R, 2nd edition" (CRC Press, 2019) by Hadley Wickham (Chief Scientist at RStudio), which has a web representation here, based on the chapter about Object-Oriented Programming.
The first edition from 2015 has a web representation here, with the corresponding chapter on OO here.
Approaches to OO systems
Hadley defines the following to distinguish two distinct approaches to OO programming:
Functional OOP: methods (callable pieces of code) belong to generic functions (not to be confused with Java/C# generic methods). Think of the methods as being located in a global lookup table. The runtime system finds the method to execute based on the name of the function and the type (or object class) of one or more arguments passed to that function (this is called "method dispatch"). Syntax-wise, method calls may look like ordinary function calls: myfunc(object, arg1, arg2). This call would lead the runtime to look for the method associated with the pair ("myfunc", typeof(object)) or possibly ("myfunc", typeof(object), typeof(arg1), typeof(arg2)) if the language supports that. In R's S3, the full name of the method gives the (function-name, class) pair. For example: mean.Date is the method to compute the mean of Dates. Try methods("mean") to list the methods of the generic function mean. The Functional OOP approach is found for example in the Common Lisp Object System and Julia. Hadley notes that "Compared to R, Julia’s implementation is fully developed and extremely performant."
Encapsulated OOP: methods belong to objects or classes, and method calls typically look like object.method(arg1, arg2). This is called encapsulated because the object encapsulates both data (fields) and behaviour (methods). Think of the method as being located in a lookup table attached to the object or the object's class description. The runtime looks the method up based on method name and possibly the type of one or more arguments. This is the approach found in "popular" OO languages like C++, Java, C#.
In both cases, if inheritance is supported (it probably is), the runtime may traverse the class hierarchy upwards until it has found a match for the call lookup key.
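In S3, that upward traversal can be sketched as follows. The generic describe and the classes dog, cat and animal are invented for illustration. The class attribute is an ordered vector, and dispatch tries each entry in turn; NextMethod() hands over to the next class in line.

```r
describe <- function(x, ...) UseMethod("describe")

describe.animal <- function(x, ...) "an animal"

# The "dog" method adds its own part, then delegates to the parent class.
describe.dog <- function(x, ...) paste("a dog, which is", NextMethod())

rex <- structure(list(), class = c("dog", "animal"))
tom <- structure(list(), class = c("cat", "animal"))

describe(rex)   # "a dog, which is an animal"
describe(tom)   # no describe.cat exists, so describe.animal is used
```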
How to find out what system an R object belongs to
library(sloop) # formerly, "pryr"
otype(mtcars)
#> [1] "S3"
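If sloop is not installed, base R offers a rougher check. This is only a sketch of a partial substitute, not a replacement for otype(): is.object() tells you whether a class attribute is set, and isS4() separates S4 objects from S3 ones.

```r
is.object(mtcars)   # TRUE: a data frame carries a class attribute
isS4(mtcars)        # FALSE: so it is an S3 object rather than an S4 one
is.object(1:10)     # FALSE: a plain base-type vector, no class attribute
```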
The R object systems
S3
- Functional OOP approach.
- Most important system according to Hadley.
- Simplest, most common. First OO system used by R.
- Comes with base R, used throughout base R.
- Relies on conventions rather than enforced guarantees.
- See Chambers, John M, and Trevor J Hastie. 1992. "Statistical Models in S." Wadsworth & Brooks/Cole Advanced Books & Software.
- Details in "Advanced R, 2nd edition" here.
S4
- Functional OOP approach.
- Third most important system according to Hadley.
- Rewrite of S3, therefore similar to S3, but more formal and more strict: it forces you to think carefully about program design. Suited for building large systems (e.g. the Bioconductor project).
- Implemented in the base "methods" package.
- See: Chambers, John M. 1998. "Programming with Data: A Guide to the S Language." Springer.
- Details in "Advanced R, 2nd edition" here.
RC aka "Reference Classes"
- Encapsulated OOP approach.
- Comes with base R.
- Based on S4.
- RC objects are a special type of S4 object that is also "mutable", i.e. instead of using R's usual copy-on-modify semantics, they can be modified in place. Note that mutable state is hard to reason about and a source of ugly bugs, but can lead to more efficient code in certain applications.
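A minimal sketch of what "mutable" means here, using setRefClass() from the methods package. The Account class and its deposit method are invented for illustration.

```r
library(methods)

# Hypothetical reference class with one field and one method.
Account <- setRefClass("Account",
  fields  = list(balance = "numeric"),
  methods = list(
    deposit = function(amount) {
      balance <<- balance + amount   # modifies the object in place
      invisible(.self)
    }
  )
)

a <- Account$new(balance = 100)
b <- a               # b refers to the same object, it is not a copy
b$deposit(50)
a$balance            # 150: the change made through b is visible through a

a_copy <- a$copy()   # an explicit copy is independent
a_copy$deposit(1)
a$balance            # still 150
```

Note the object$method(args) call style, in contrast to the fun(object, args) style of S3 and S4.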
R6
- Encapsulated OOP approach.
- Second most important system according to Hadley.
- Can be found in the R6 package (install with install.packages("R6"), then load with library(R6))
- Similar to RC, but lighter & much faster: it does not depend on S4 or the methods package. Built on top of R environments. Also has:
- public and private methods
- active bindings (fields that, when accessed, actually call a method)
- class inheritance which works across packages
- both class methods (code that belongs to the class and can access an instance via self, private, super) and member functions (functions assigned to fields, but which are not methods, just functions)
- Provides a standardised way to escape R's "copy-on-modify" semantics
- See the package site: "R6: Encapsulated object-oriented programming for R".
- Details in "Advanced R, 2nd edition" here.
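A short sketch of these features, assuming the R6 package is installed (the code is skipped otherwise). The Counter class is invented for illustration; it uses one public method, one private field and one read-only active binding.

```r
if (requireNamespace("R6", quietly = TRUE)) {
  Counter <- R6::R6Class("Counter",
    public = list(
      add = function(n = 1) {
        private$count <- private$count + n   # private state, hidden from users
        invisible(self)                      # returning self allows chaining
      }
    ),
    private = list(count = 0),
    active = list(
      # Active binding: looks like a field, but runs this function on access.
      value = function() private$count
    )
  )

  c1 <- Counter$new()
  c2 <- c1             # same object: R6 objects have reference semantics
  c1$add(2)$add(3)
  c2$value             # 5
}
```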
Others
There are others, like R.oo (similar to RC), proto (prototype-based, think JavaScript) and Mutatr. However, "Advanced R" says:
Apart from R6, which is widely used, these systems are primarily of theoretical interest. They do have their strengths, but few R users know and understand them, so it is hard for others to read and contribute to your code.
Be sure to read the chapter on trade-offs in "Advanced R, 2nd edition", too.