
1 R Packages

All the functionality we have been using comes from packages that are automatically loaded when R starts. Attached packages appear on the search path, which search() displays.

search()
##  [1] ".GlobalEnv"           "package:RColorBrewer" "package:BiocStyle"   
##  [4] "package:stats"        "package:graphics"     "package:grDevices"   
##  [7] "package:utils"        "package:datasets"     "package:methods"     
## [10] "Autoloads"            "package:base"
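
As a rough illustration of how the search path is used, the sketch below asks R where it finds one function. The choice of median() is arbitrary and not taken from the original material; in a session where another attached package masks it, find() may report more than one entry.

```r
# Which search-path entry supplies a given function?
where <- find("median")
where

# The namespace in which the function is defined
environmentName(environment(median))

# R resolves a name by walking the search path in order;
# position 1 is always the global environment
search()[1]
```

Objects defined in the global environment therefore take precedence over package functions of the same name.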

Additional packages may be installed in R’s libraries. Use installed.packages() or the RStudio interface to see installed packages. To use these packages, it is necessary to attach them to the search path, e.g., for survival analysis:

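library("survival")         # attach the package to the search path
library(help="survival")    # overview of the package's contents
ls(2)                       # objects at position 2 of the search path (the package just attached)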
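
A minimal sketch of the difference between an installed package and an attached one. It uses the splines package only because it ships with every R installation and so should run anywhere; it is a stand-in for survival or any other installed package, not part of the original example.

```r
# 'splines' ships with R; it stands in for any installed package
pkg <- "splines"
entry <- paste0("package:", pkg)

# installed.packages() returns a matrix with one row per installed package
ip <- installed.packages()
pkg %in% rownames(ip)

# Is the package on the search path before and after library()?
before <- entry %in% search()
library(pkg, character.only = TRUE)
after <- entry %in% search()
c(before = before, after = after)

# Objects made available by the attached package
head(ls(entry))

# A function can also be used without attaching its package, via pkg::fun()
stats::median(c(3, 1, 2))
```

Using ls() with the entry name rather than a position such as 2 avoids depending on the order in which packages were attached.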

There are many thousands of R packages, and no single installation contains all of them. Important repositories include CRAN and Bioconductor.

Packages can be discovered in various ways, including CRAN Task Views, the Bioconductor web site, and the Bioconductor support site.

To install a package, use install.packages() or, for Bioconductor packages, follow the instructions on the package landing page (e.g., the landing page for GenomicRanges). Here we install the ggplot2 package.

install.packages("ggplot2", repos="https://cran.r-project.org")

For Bioconductor packages, we also recommend installing the BiocInstaller package. It provides biocLite(), an installation function for both CRAN and Bioconductor packages. (From Bioconductor 3.8 onward, BiocManager::install() replaces biocLite().)

So as an alternative to the above, if BiocInstaller has been installed:

BiocInstaller::biocLite("ggplot2")

A package needs to be installed once, and then can be used in any R session.
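
Because installation is needed only once, a script can check before installing. The sketch below is one possible way to do that; missing_packages() is a hypothetical helper written for this illustration, and the package names are base packages chosen so that the check runs without network access.

```r
# Hypothetical helper: return the subset of 'pkgs' that cannot be loaded
missing_packages <- function(pkgs) {
    ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
    pkgs[!ok]
}

# Base packages are used here so the check succeeds on any installation;
# in practice this would list packages such as "ggplot2"
wanted <- c("stats", "utils")
to_install <- missing_packages(wanted)

# Install only what is missing (nothing, for the base packages above)
if (length(to_install) > 0) {
    install.packages(to_install, repos = "https://cran.r-project.org")
}
```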

2 Graphics and Visualization

Load the BRFSS-subset.csv data

path <- file.choose()  # or file.path
brfss <- read.csv(path)

Clean it by coercing Year to factor

brfss$Year <- factor(brfss$Year)
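
A small sketch of what the coercion does, using a toy vector in place of brfss$Year (the values are made up for illustration; the real file is not read here).

```r
# Toy values standing in for brfss$Year
year <- c(1990, 2010, 2010, 1990, 2010)

year_f <- factor(year)
levels(year_f)                  # the distinct years, stored as character labels
table(year_f)                   # number of observations per level

# Caveat: as.numeric() on a factor returns integer codes, not the years
codes <- as.numeric(year_f)
years <- as.numeric(as.character(year_f))

# Comparing a factor with a number compares against the level labels,
# so expressions such as Year == 2010 still work after coercion
n2010 <- sum(year_f == 2010)
```

Treating Year as a factor tells plotting and modeling functions that it is a grouping variable with two categories rather than a continuous quantity.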

2.1 Base R Graphics

Useful for quick exploration during a normal workflow.

  • Main functions: plot(), hist(), boxplot(), …
  • Graphical parameters – see ?par, but often provided as arguments to plot(), etc.
  • Construct complicated plots by layering information, e.g., points, regression line, annotation.

    brfss2010Male <- subset(brfss, (Year == 2010) & (Sex == "Male"))
    fit <- lm(Weight ~ Height, brfss2010Male)
    
    plot(Weight ~ Height, brfss2010Male, main="2010, Males")
    abline(fit, lwd=2, col="blue")
    points(180, 90, pch=20, cex=3, col="red")

  • Approach to complicated graphics: create a grid of panels (e.g., par(mfrow=c(1, 2))), populate with plots, restore original layout.

    brfssFemale <- subset(brfss, Sex=="Female")
    
    opar <- par(mfrow=c(2, 1))    # layout: 2 'rows' and 1 'column'
    hist(                         # first panel -- 1990
        brfssFemale[ brfssFemale$Year == 1990, "Weight" ],
        main = "Female, 1990")
    hist(                         # second panel -- 2010
        brfssFemale[ brfssFemale$Year == 2010, "Weight" ],
        main = "Female, 2010")

    par(opar)                      # restore original layout

2.2 What makes for a good graphical display?

  • Common scales for comparison
  • Efficient use of space
  • Careful color choice – qualitative, gradient, divergent schemes; color blind aware; …
  • Emphasis on data rather than labels
  • Convey statistical uncertainty
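
The first point, common scales, can be sketched with base graphics. The data below are simulated and purely hypothetical (they are not the BRFSS values); the idea is only that both panels share x-axis limits and bin edges, so differences between panels reflect the data rather than the axes.

```r
set.seed(123)

# Simulated weights (kg); hypothetical stand-ins for two survey years
w1990 <- rnorm(500, mean = 65, sd = 12)
w2010 <- rnorm(500, mean = 73, sd = 15)

# Shared x-axis limits and bin edges covering both samples
xlim <- range(w1990, w2010)
breaks <- seq(floor(xlim[1]), ceiling(xlim[2]), length.out = 31)

opar <- par(mfrow = c(2, 1))    # two stacked panels
hist(w1990, breaks = breaks, xlim = xlim,
     main = "Simulated, 1990", xlab = "Weight")
hist(w2010, breaks = breaks, xlim = xlim,
     main = "Simulated, 2010", xlab = "Weight")
par(opar)                       # restore original layout
```

Without the shared breaks and xlim arguments, hist() would choose bins and limits separately for each panel, which makes a shift between the two distributions harder to see.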

2.3 Grammar of Graphics: ggplot2

library(ggplot2)

‘Grammar of graphics’

  • Specify data and ‘aesthetics’ (aes()) to be plotted
  • Add layers (geom_*()) of information

    ggplot(brfss2010Male, aes(x=Height, y=Weight)) +
        geom_point() +
        geom_smooth(method="lm")

  • Capture a plot and augment it

    plt <- ggplot(brfss2010Male, aes(x=Height, y=Weight)) +
        geom_point() +
        geom_smooth(method="lm")
    plt + labs(title = "2010 Male")

  • Use facet_*() for layouts

    ggplot(brfssFemale, aes(x=Height, y=Weight)) +
        geom_point() + geom_smooth(method="lm") +
        facet_grid(. ~ Year)

  • Choose display to emphasize relevant aspects of data

    ggplot(brfssFemale, aes(Weight, fill=Year)) +
        geom_density(alpha=.2)
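
As a sketch of how the choice of display changes what a reader notices, the following draws the same simulated data two ways. The data frame sim is hypothetical (a stand-in for brfssFemale), and geom_boxplot() is used here only as one alternative layer; it does not appear in the original examples.

```r
library(ggplot2)

set.seed(123)

# Simulated data; a hypothetical stand-in for brfssFemale
sim <- data.frame(
    Year = factor(rep(c(1990, 2010), each = 200)),
    Weight = c(rnorm(200, mean = 65, sd = 12),
               rnorm(200, mean = 73, sd = 15))
)

# Overlaid densities emphasize the shape of each distribution
dens <- ggplot(sim, aes(Weight, fill = Year)) +
    geom_density(alpha = .2)

# Box plots emphasize medians, quartiles, and outlying points
box <- ggplot(sim, aes(x = Year, y = Weight)) +
    geom_boxplot()

dens
box
```

Both plots rely on Year being a factor, so that ggplot2 treats it as a discrete grouping variable.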