A class is an R object with a formal structure; think of classes as nouns. A generic and associated methods are functions that transform nouns; think of generics and methods as verbs.

R has two main class systems, and these differ from class systems in many programming languages. The primary difference is that in R methods are associated with generics, whereas in many other programming languages methods are associated with classes.

S3

Consider this simple workflow

x <- rnorm(10)
y <- x + rnorm(10)
df <- data.frame(X=x, Y=y)
fit <- lm(Y ~ X, df)

x and y are examples of so-called ‘atomic’ vectors, the building blocks of R data representations. df is a data.frame, and is an example of an R class: an assembly of different atomic types (here, a list of numeric vectors) with an associated ‘class’ attribute.

class(df)
## [1] "data.frame"
attributes(df)
## $names
## [1] "X" "Y"
## 
## $row.names
##  [1]  1  2  3  4  5  6  7  8  9 10
## 
## $class
## [1] "data.frame"
dput(df)
## structure(list(X = c(0.316466059374151, -0.0290990768980746, 
## 0.233530610491406, -0.153293223624643, 0.159430839622362, 1.63223674585324, 
## 0.859170096614589, -0.412265948468438, 0.0306311062978289, -1.03914199775336
## ), Y = c(0.778510942506688, -2.13310593734469, 0.0325801294032117, 
## -0.566422028776406, 0.951729589533829, 2.29810146589824, 1.76793651271658, 
## -1.58211435993906, -0.613110945276851, 0.81776333410081)), .Names = c("X", 
## "Y"), row.names = c(NA, -10L), class = "data.frame")
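As a small sketch of this point, the following checks that a data.frame is an ordinary list with a class attribute layered on top. The variable names follow the text; the seed is an addition so that the example is reproducible, and plain is a hypothetical name used only here.

```r
set.seed(1)                      # assumption: seed added for reproducibility
x <- rnorm(10)
y <- x + rnorm(10)
df <- data.frame(X = x, Y = y)

typeof(df)                       # underlying storage type: "list"
is.list(df)                      # TRUE
inherits(df, "data.frame")       # TRUE

# Removing the class attribute leaves a plain named list
# (the names and row.names attributes remain)
plain <- unclass(df)
class(plain)                     # "list"
names(plain)                     # "X" "Y"
is.numeric(plain$X)              # TRUE: each column is an atomic vector
```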

There are several reasons to introduce classes, including

  1. Enforcing constraints on class members, e.g., vectors in a data.frame must be of equal length

  2. Providing functionality that would otherwise be tedious to maintain, e.g., row.names.

  3. Separating the implementation of the object from the way the user interacts with the object’s interface.

The last point is a primary reason for use of classes, and can be seen in the fit object: it has a complicated internal structure that is computationally convenient, but not really the business of the end user.

str(fit)
## List of 12
##  $ coefficients : Named num [1:2] -0.00979 1.15778
##   ..- attr(*, "names")= chr [1:2] "(Intercept)" "X"
##  $ residuals    : Named num [1:10] 0.422 -2.09 -0.228 -0.379 0.777 ...
##   ..- attr(*, "names")= chr [1:10] "1" "2" "3" "4" ...
##  $ effects      : Named num [1:10] -0.554 2.484 -0.363 -0.157 0.711 ...
##   ..- attr(*, "names")= chr [1:10] "(Intercept)" "X" "" "" ...
##  $ rank         : int 2
##  $ fitted.values: Named num [1:10] 0.3566 -0.0435 0.2606 -0.1873 0.1748 ...
##   ..- attr(*, "names")= chr [1:10] "1" "2" "3" "4" ...
##  $ assign       : int [1:2] 0 1
##  $ qr           :List of 5
##   ..$ qr   : num [1:10, 1:2] -3.162 0.316 0.316 0.316 0.316 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:10] "1" "2" "3" "4" ...
##   .. .. ..$ : chr [1:2] "(Intercept)" "X"
##   .. ..- attr(*, "assign")= int [1:2] 0 1
##   ..$ qraux: num [1:2] 1.32 1.11
##   ..$ pivot: int [1:2] 1 2
##   ..$ tol  : num 1e-07
##   ..$ rank : int 2
##   ..- attr(*, "class")= chr "qr"
##  $ df.residual  : int 8
##  $ xlevels      : Named list()
##  $ call         : language lm(formula = Y ~ X, data = df)
##  $ terms        :Classes 'terms', 'formula'  language Y ~ X
##   .. ..- attr(*, "variables")= language list(Y, X)
##   .. ..- attr(*, "factors")= int [1:2, 1] 0 1
##   .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. ..$ : chr [1:2] "Y" "X"
##   .. .. .. ..$ : chr "X"
##   .. ..- attr(*, "term.labels")= chr "X"
##   .. ..- attr(*, "order")= int 1
##   .. ..- attr(*, "intercept")= int 1
##   .. ..- attr(*, "response")= int 1
##   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. ..- attr(*, "predvars")= language list(Y, X)
##   .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
##   .. .. ..- attr(*, "names")= chr [1:2] "Y" "X"
##  $ model        :'data.frame':   10 obs. of  2 variables:
##   ..$ Y: num [1:10] 0.7785 -2.1331 0.0326 -0.5664 0.9517 ...
##   ..$ X: num [1:10] 0.3165 -0.0291 0.2335 -0.1533 0.1594 ...
##   ..- attr(*, "terms")=Classes 'terms', 'formula'  language Y ~ X
##   .. .. ..- attr(*, "variables")= language list(Y, X)
##   .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
##   .. .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. .. ..$ : chr [1:2] "Y" "X"
##   .. .. .. .. ..$ : chr "X"
##   .. .. ..- attr(*, "term.labels")= chr "X"
##   .. .. ..- attr(*, "order")= int 1
##   .. .. ..- attr(*, "intercept")= int 1
##   .. .. ..- attr(*, "response")= int 1
##   .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. .. ..- attr(*, "predvars")= language list(Y, X)
##   .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
##   .. .. .. ..- attr(*, "names")= chr [1:2] "Y" "X"
##  - attr(*, "class")= chr "lm"

Instead, the user can manipulate the object through its interface, defined in part by the methods that operate on the class.

methods(class=class(fit))
##  [1] add1           alias          anova          case.names    
##  [5] coerce         confint        cooks.distance deviance      
##  [9] dfbeta         dfbetas        drop1          dummy.coef    
## [13] effects        extractAIC     family         formula       
## [17] hatvalues      influence      initialize     kappa         
## [21] labels         logLik         model.frame    model.matrix  
## [25] nobs           plot           predict        print         
## [29] proj           qr             residuals      rstandard     
## [33] rstudent       show           simulate       slotsFromS3   
## [37] summary        variable.names vcov          
## see '?methods' for accessing help and source code
anova(fit)
## Analysis of Variance Table
## 
## Response: Y
##           Df  Sum Sq Mean Sq F value  Pr(>F)  
## X          1  6.1691  6.1691  4.1604 0.07571 .
## Residuals  8 11.8625  1.4828                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
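A brief sketch of what working through the interface looks like in practice. It repeats the workflow from the text (with a seed added for reproducibility) and compares the coef() and residuals() generics with reaching into the list directly. For a simple fit like this one the two routes should agree, but only the generics insulate user code from changes to the internal structure.

```r
set.seed(1)                      # assumption: seed added for reproducibility
x <- rnorm(10)
y <- x + rnorm(10)
df <- data.frame(X = x, Y = y)
fit <- lm(Y ~ X, df)

# Interface: generics that know how to handle an "lm" fit
coef(fit)
residuals(fit)

# Implementation: direct access to list components. This works, but it
# couples the calling code to the internal layout seen in str(fit).
fit$coefficients

# For this fit, both routes return the same values
all.equal(coef(fit), fit$coefficients)        # TRUE
all.equal(residuals(fit), fit$residuals)      # TRUE
```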

Some aspects of S3 classes and methods

  1. The class attribute determines what class an object is; there is no formal class definition.

  2. Classes can have linear inheritance. In the code that follows, every method that applies to an object of class lm can also be used on fit1; additional methods may apply only to class my.

    fit1 <- fit
    class(fit1)
    ## [1] "lm"
    class(fit1) = c("my", class(fit1))
    class(fit1)
    ## [1] "my" "lm"
  3. A generic is a plain old function that has UseMethod() in its body.

    fun <- function(object, ...)
        UseMethod("fun")
  4. A method is a plain old function whose name is constructed by pasting a generic function name and an S3 class name together, separated by a period.

    fun.lm <- function(object, ...)
        message("fun.lm method")
    fun(fit)
    ## fun.lm method
    fun(fit1)
    ## fun.lm method
    fun.my <- function(object, ...)
        message("fun.my method")
    fun(fit)
    ## fun.lm method
    fun(fit1)
    ## fun.my method
  5. Inheritance can be exploited in the function body using NextMethod(), which calls the method for the next class in the class vector.

    fun.my <- function(object, ...) {
        message("fun.my method")
        NextMethod()
    }
    fun(fit1)
    ## fun.my method
    ## fun.lm method
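The points above can be condensed into one self-contained sketch. The generic describe() and the classes base and derived are hypothetical names chosen for illustration; they are not part of R or of the example in the text. The last lines illustrate the first point: because there is no formal class definition, nothing prevents attaching a class attribute to an object that does not have the expected structure.

```r
# A generic: a plain function whose body calls UseMethod()
describe <- function(object, ...) UseMethod("describe")

# Methods: plain functions named <generic>.<class>
describe.default <- function(object, ...) "default"
describe.base    <- function(object, ...) "base"
describe.derived <- function(object, ...) c("derived", NextMethod())

# The class attribute alone determines dispatch, in left-to-right order
obj <- structure(list(), class = c("derived", "base"))
describe(obj)            # "derived" "base"
describe(1:3)            # "default" (no method for integer vectors)

# No formal definition: any object can claim to be of any class
fake <- structure(1:3, class = "lm")
inherits(fake, "lm")     # TRUE, although 'fake' is not a fitted model
```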

Classes, generics, and methods introduce some complexity, for instance getting help…

… or finding source code
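One possible way to explore both, sketched with functions from base R and the utils package. This is an illustration rather than the approach the text originally showed, and the methods listed will depend on which packages are attached in the session.

```r
# Which methods exist for a generic, or for a class?
methods("anova")
methods(class = "lm")

# Source code of a method. Many S3 methods are not exported from their
# package, so typing print.lm at the prompt may fail; these functions
# look the method up wherever it is registered.
getS3method("print", "lm")
getAnywhere("print.lm")

# Help pages exist for the generic and, often, for individual methods.
# Guarded so that the script also runs non-interactively.
if (interactive()) {
    help("anova")       # the generic
    help("anova.lm")    # the method for class "lm"
}
```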

S4

The S4 system introduces

Here’s an S4 class definition representing people with first and last names.

.Person <- setClass("Person",
    slots=c(
        first ="character",
        last ="character"
    )
)

setClass() defines the class. It returns a ‘generator’ function that can be used to create an instance of the class. My convention is to assign the generator to a variable named after the class and preceded by a period (.). The reason is that the argument signature of the generator is not informative for the user: it consists of ..., rather than named arguments. Thus my convention is to write a user-facing constructor

Person <- function(firstname=character(), lastname=character()) {
    .Person(first=firstname, last=lastname)
}
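The difference between the two signatures can be inspected directly. This sketch repeats the definitions from the text so that it runs on its own; p is a hypothetical variable used only for the check at the end.

```r
library(methods)   # usually attached already; explicit for scripts

.Person <- setClass("Person",
    slots = c(first = "character", last = "character")
)

Person <- function(firstname = character(), lastname = character()) {
    .Person(first = firstname, last = lastname)
}

# The generator accepts only '...', which tells the user nothing
names(formals(.Person))   # "..."

# The constructor documents its arguments through its signature
names(formals(Person))    # "firstname" "lastname"

p <- Person(firstname = "George", lastname = "Washington")
is(p, "Person")           # TRUE
```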

Here’s a people instance

people <- Person(
    firstname = c("George", "John", "Thomas"),
    lastname = c("Washington", "Adams", "Jefferson")
)

A new class often requires methods to work with the data. To separate the implementation from the interface, we’ll write a couple of ‘accessor’ functions that extract relevant components of the data. The accessors use knowledge of the class structure, but we will strive to make all other operations ignorant of implementation details.

firstname <- function(x)
    slot(x, "first")

lastname <- function(x)
    slot(x, "last")
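A short usage sketch, repeating the earlier definitions so that it is self-contained. The helper fullname() is hypothetical and not part of the text; it is included to show a function written purely in terms of the accessors, so it would keep working if the slot names changed.

```r
library(methods)   # usually attached already; explicit for scripts

.Person <- setClass("Person",
    slots = c(first = "character", last = "character")
)
Person <- function(firstname = character(), lastname = character()) {
    .Person(first = firstname, last = lastname)
}

# Accessors: the only functions that know the slot names
firstname <- function(x) slot(x, "first")
lastname  <- function(x) slot(x, "last")

people <- Person(
    firstname = c("George", "John", "Thomas"),
    lastname = c("Washington", "Adams", "Jefferson")
)

firstname(people)                            # "George" "John" "Thomas"
identical(firstname(people), people@first)   # TRUE; '@' is direct slot access

# Hypothetical helper that relies only on the accessors
fullname <- function(x) paste(firstname(x), lastname(x))
fullname(people)   # "George Washington" "John Adams" "Thomas Jefferson"
```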

We’ll now implement length() and show() methods, using existing generics. The generics can be discovered with getGeneric(). For instance,

getGeneric("length")
## standardGeneric for "length" defined from package "base"
## 
## function (x) 
## standardGeneric("length", .Primitive("length"))
## <bytecode: 0x36dbd80>
## <environment: 0x36d7e98>
## Methods may be defined for arguments: x
## Use  showMethods("length")  for currently available ones.

tells us that the method we write should have a single argument x. Thus

setMethod("length", "Person", function(x) {
    length(firstname(x))  # use length of first name vector
})
## [1] "length"
setMethod("show", "Person", function(object) {
    cat("class: ", class(object), "\n",
        "length: ", length(object), " individuals\n",
        sep="")
})
## [1] "show"
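To see the two methods at work, here is a self-contained sketch that repeats the definitions from the text and then calls the generics on the people instance. The explicit show() call is used so the example behaves the same whether it is typed at the prompt or run with source().

```r
library(methods)   # usually attached already; explicit for scripts

.Person <- setClass("Person",
    slots = c(first = "character", last = "character")
)
Person <- function(firstname = character(), lastname = character()) {
    .Person(first = firstname, last = lastname)
}
firstname <- function(x) slot(x, "first")

setMethod("length", "Person", function(x) {
    length(firstname(x))
})
setMethod("show", "Person", function(object) {
    cat("class: ", class(object), "\n",
        "length: ", length(object), " individuals\n",
        sep = "")
})

people <- Person(
    firstname = c("George", "John", "Thomas"),
    lastname = c("Washington", "Adams", "Jefferson")
)

length(people)    # 3
show(people)      # prints "class: Person" then "length: 3 individuals"

# Retrieve the method definition registered for this signature
getMethod("length", "Person")
```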

Note that we use accessors rather than direct slot access.

Here we implement a derived class, with an additional slot and accessor

.President <- setClass("President",
    contains = "Person",
    slots = c(party = "character")
)

party <- function(x)
    slot(x, "party")

There are two ways in which one can construct an object of this class

.President(    # use base class to initialize...
    people,
    party = c("Unaffiliated", "Federalist", "Democratic-Republican")
)
## class: President
## length: 3 individuals
.President(    # ... or initialize each slot
    first = c("George", "John", "Thomas"),
    last = c("Washington", "Adams", "Jefferson"),
    party = c("Unaffiliated", "Federalist", "Democratic-Republican")
)
## class: President
## length: 3 individuals

We’ll choose to implement a constructor that matches the latter

President <- function(firstname=character(), lastname=character(),
    party=character())
{
    .President(first=firstname, last=lastname, party=party)
}

Note that we did not need to define length() or show() methods for our derived class.
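The inheritance can be checked with a self-contained sketch that repeats the definitions from the text. The variable pres is a hypothetical name; the length() and show() methods are defined only for Person, yet they apply to the President instance.

```r
library(methods)   # usually attached already; explicit for scripts

.Person <- setClass("Person",
    slots = c(first = "character", last = "character")
)
firstname <- function(x) slot(x, "first")

# Methods defined for the base class only
setMethod("length", "Person", function(x) length(firstname(x)))
setMethod("show", "Person", function(object) {
    cat("class: ", class(object), "\n",
        "length: ", length(object), " individuals\n",
        sep = "")
})

.President <- setClass("President",
    contains = "Person",
    slots = c(party = "character")
)
President <- function(firstname = character(), lastname = character(),
    party = character())
{
    .President(first = firstname, last = lastname, party = party)
}

pres <- President(
    firstname = c("George", "John", "Thomas"),
    lastname = c("Washington", "Adams", "Jefferson"),
    party = c("Unaffiliated", "Federalist", "Democratic-Republican")
)

is(pres, "Person")                # TRUE: a President is a Person
extends("President", "Person")    # TRUE
length(pres)                      # 3, via the inherited Person method
show(pres)                        # inherited show(); reports class President
```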

There are many additional features of S4 classes. A simple example is the ‘validity’ method, which can be used to impose constraints on the data.

setValidity("Person", function(object) {
    msg <- character()   # describe how the object is invalid

    if (length(firstname(object)) != length(lastname(object)))
        msg <- c(msg, "firstname() and lastname() lengths differ")
    if (anyNA(firstname(object)) || anyNA(lastname(object)))
        msg <- c(msg, "NA values not allowed in firstname() or lastname()")

    if (length(msg)) msg else TRUE
})
## Class "Person" [in ".GlobalEnv"]
## 
## Slots:
##                           
## Name:      first      last
## Class: character character
## 
## Known Subclasses: "President"
setValidity("President", function(object) {
    ## test only properties of President
    msg <- character()

    if (length(party(object)) != length(object))
        msg <- c(msg, "party() length differs from person lengths")
    if (anyNA(party(object)))
        msg <- c(msg, "NA values not allowed in party()")

    if (length(msg)) msg else TRUE
})
## Class "President" [in ".GlobalEnv"]
## 
## Slots:
##                                     
## Name:      party     first      last
## Class: character character character
## 
## Extends: "Person"
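A sketch of the Person validity method in action, repeating the relevant definitions so that it runs on its own. The names ok, bad, err and res are hypothetical. The example assumes the usual behavior of the methods package: validity is checked when an object is created with slots supplied, but not when a slot is later replaced with @, so validObject() has to be called explicitly in that case.

```r
library(methods)   # usually attached already; explicit for scripts

.Person <- setClass("Person",
    slots = c(first = "character", last = "character")
)
firstname <- function(x) slot(x, "first")
lastname  <- function(x) slot(x, "last")

setValidity("Person", function(object) {
    msg <- character()
    if (length(firstname(object)) != length(lastname(object)))
        msg <- c(msg, "firstname() and lastname() lengths differ")
    if (anyNA(firstname(object)) || anyNA(lastname(object)))
        msg <- c(msg, "NA values not allowed in firstname() or lastname()")
    if (length(msg)) msg else TRUE
})

# A valid object: construction succeeds and validObject() raises no error
ok <- .Person(first = c("George", "John"), last = c("Washington", "Adams"))
validObject(ok)

# An invalid object is rejected at construction; capture the message
err <- tryCatch(
    .Person(first = c("George", "John"), last = "Washington"),
    error = function(e) conditionMessage(e)
)
err

# Replacing a slot directly does not re-run the validity method ...
bad <- ok
bad@last <- "Washington"

# ... so the problem only surfaces when validObject() is called
res <- tryCatch(validObject(bad), error = function(e) conditionMessage(e))
res
```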