All the functionality we have been using comes from packages that are automatically loaded when R starts. Loaded packages are on the search() path.
search()
## [1] ".GlobalEnv" "package:RColorBrewer" "package:BiocStyle"
## [4] "package:stats" "package:graphics" "package:grDevices"
## [7] "package:utils" "package:datasets" "package:methods"
## [10] "Autoloads" "package:base"
Additional packages may be installed in R’s libraries. Use installed.packages() or the RStudio interface to see installed packages. To use these packages, it is necessary to attach them to the search path, e.g., for survival analysis:
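library("survival")        # attach the package to the search path
library(help="survival")   # overview of the package and its help index
ls(2)                      # objects at position 2 of the search path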
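To see what attaching does, the following sketch compares the search path before and after a library() call. It uses splines only because that package is distributed with R itself; the choice of package is an assumption made for illustration, and any installed package should behave the same way.

```r
# 'splines' is distributed with R itself, so nothing needs installing.
before <- search()
library("splines")
after <- search()

# Entries added by library(); empty if splines was already attached
setdiff(after, before)

# Locate the package on the search path rather than assuming position 2
pos <- match("package:splines", after)
head(ls(pos))

# A function can also be called without attaching its package
tools::toTitleCase("search path")
```

The pkg::fun form is convenient for one-off calls, because it leaves the search path unchanged.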
There are many thousands of R packages, and not all of them are installed in a single installation. Important repositories include CRAN and Bioconductor.
Packages can be discovered in various ways, including CRAN Task Views and the Bioconductor web and Bioconductor support sites.
To install a package, use install.packages() or, for Bioconductor packages, the instructions on the package landing page, e.g., for GenomicRanges. Here we install the ggplot2 package.
install.packages("ggplot2", repos="https://cran.r-project.org")
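For Bioconductor packages, we also recommend installing the BiocInstaller package. It provides biocLite(), a function that installs both CRAN and Bioconductor packages. (Since Bioconductor 3.8, BiocInstaller has been superseded by the BiocManager package and its install() function.)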
So as an alternative to the above, if BiocInstaller has been installed:
BiocInstaller::biocLite("ggplot2")
A package needs to be installed once, and then can be used in any R session.
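Because installation is a one-time step, scripts often check for a package before trying to install it. The following is a minimal sketch, assuming a helper named install_if_missing() (a hypothetical name, not part of R or Bioconductor) and the CRAN URL used above.

```r
# Hypothetical helper: install 'pkg' from 'repos' only if it is not available.
install_if_missing <- function(pkg, repos = "https://cran.r-project.org") {
    if (!requireNamespace(pkg, quietly = TRUE)) {
        install.packages(pkg, repos = repos)
    }
    # TRUE if the package can now be loaded, FALSE otherwise
    requireNamespace(pkg, quietly = TRUE)
}

# 'stats' ships with R, so this call returns TRUE without installing anything
install_if_missing("stats")
```

The package still has to be attached with library(), or accessed with pkg::fun, in each session where it is used.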
Load the BRFSS-subset.csv data
path <- file.choose() # or file.path
brfss <- read.csv(path)
Clean it by coercing Year to factor
brfss$Year <- factor(brfss$Year)
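The effect of the coercion is easy to check on a small example. The sketch below does not use the BRFSS file; it writes a made-up four-row table (the values are invented for illustration only) to a temporary CSV file, reads it back, and inspects Year before and after the conversion.

```r
# Made-up values, for illustration only; not taken from BRFSS-subset.csv
toy <- data.frame(
    Weight = c(60.2, 85.1, 72.6, 55.0),
    Height = c(165, 180, 175, 160),
    Sex    = c("Female", "Male", "Male", "Female"),
    Year   = c(1990, 2010, 1990, 2010)
)
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

dat <- read.csv(path)
str(dat)                  # column types as read from the file
class(dat$Year)           # "integer": read.csv() treats Year as a number

dat$Year <- factor(dat$Year)
levels(dat$Year)          # "1990" "2010"
table(dat$Year)           # counts per level, now a categorical variable
```

Treating Year as a factor means that functions such as table(), plot(), and lm() handle it as a grouping variable and not as a continuous measurement.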
Base graphics are useful for quick exploration during a normal workflow.
Main functions: plot(), hist(), boxplot(), …
Graphical parameters: see ?par, but often provided as arguments to plot(), etc.
Construct complicated plots by layering information, e.g., points, regression line, annotation.
brfss2010Male <- subset(brfss, (Year == 2010) & (Sex == "Male"))
fit <- lm(Weight ~ Height, brfss2010Male)
plot(Weight ~ Height, brfss2010Male, main="2010, Males")
abline(fit, lwd=2, col="blue")
points(180, 90, pch=20, cex=3, col="red")
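Graphical parameters such as pch, cex, and col can be given in two ways, and the difference matters once several plots are drawn. The sketch below uses the built-in women data set (columns height and weight) as a stand-in for the BRFSS data; that substitution is an assumption made so that the example runs without the CSV file.

```r
# As arguments: the settings apply to this call only
plot(weight ~ height, women, pch = 20, cex = 1.5, col = "darkgreen",
     main = "Parameters as arguments")

# Via par(): the settings persist for later plots until they are reset.
# par() returns the previous values, which makes restoring them easy.
opar <- par(pch = 20, col = "darkgreen")
plot(weight ~ height, women, main = "Parameters set with par()")
par(opar)                 # restore the previous settings
```

Passing parameters as arguments is usually the simpler choice for a single plot; par() is more useful for settings that should apply to a whole figure.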
Approach to complicated graphics: create a grid of panels (e.g., par(mfrow=c(1, 2))), populate with plots, restore original layout.
brfssFemale <- subset(brfss, Sex=="Female")
opar <- par(mfrow=c(2, 1))           # layout: 2 'rows' and 1 'column'
hist(                                # first panel -- 1990
    brfssFemale[ brfssFemale$Year == 1990, "Weight" ],
    main = "Female, 1990")
hist(                                # second panel -- 2010
    brfssFemale[ brfssFemale$Year == 2010, "Weight" ],
    main = "Female, 2010")
par(opar)                            # restore original layout
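The create, populate, restore pattern can be wrapped in a function so that the layout is restored even if a plotting call fails. This is a sketch only: panel_hist() is a hypothetical helper name, and the built-in iris data stand in for the BRFSS data. A shared x-axis range is added so that the panels are directly comparable.

```r
# Hypothetical helper: one histogram panel per group, stacked vertically.
panel_hist <- function(values, groups) {
    lv <- unique(as.character(groups))
    opar <- par(mfrow = c(length(lv), 1))   # one row per group
    on.exit(par(opar))                      # restore layout on exit or error
    xlim <- range(values)                   # same x-axis in every panel
    for (g in lv) {
        hist(values[groups == g], xlim = xlim, main = g, xlab = "value")
    }
    invisible(lv)
}

panel_hist(iris$Sepal.Length, iris$Species)
```

Using on.exit() avoids leaving the graphics device in a multi-panel state when something goes wrong part way through.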
library(ggplot2)
‘Grammar of graphics’
Specify data and ‘aesthetics’ (aes()) to be plotted
Add layers (geom_*()) of information
ggplot(brfss2010Male, aes(x=Height, y=Weight)) +
    geom_point() +
    geom_smooth(method="lm")
Capture a plot and augment it
plt <- ggplot(brfss2010Male, aes(x=Height, y=Weight)) +
    geom_point() +
    geom_smooth(method="lm")
plt + labs(title = "2010 Male")
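A captured plot is an ordinary R object, so it can be extended step by step and written to a file. The sketch below assumes ggplot2 is installed and uses the built-in mtcars data in place of the BRFSS data; the output location is a temporary file chosen for illustration.

```r
library(ggplot2)

# Start with points only, then add layers and labels to the stored object
base <- ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point()
augmented <- base +
    geom_smooth(method = "lm", formula = y ~ x) +
    labs(title = "mtcars", x = "Weight (1000 lbs)", y = "Miles per gallon")

# '+' returns a new object; 'base' itself is unchanged and can be reused
out <- tempfile(fileext = ".pdf")
ggsave(out, plot = augmented, width = 5, height = 4)
file.exists(out)
```

In an interactive session, typing the object name (or calling print() on it) draws the plot on the current device.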
Use facet_*() for layouts
ggplot(brfssFemale, aes(x=Height, y=Weight)) +
    geom_point() + geom_smooth(method="lm") +
    facet_grid(. ~ Year)
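The formula given to facet_grid() has the form rows ~ columns, with . meaning "no faceting in this direction". The sketch below, again assuming ggplot2 is installed and using the built-in mtcars data as a stand-in, contrasts a two-variable grid with facet_wrap(), which lays out the panels for a single variable in a wrapped sequence.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point()

# Rows defined by 'am' (transmission), columns by 'cyl' (cylinders)
by_grid <- p + facet_grid(am ~ cyl)

# One panel per value of 'cyl', wrapped into as many rows as needed
by_wrap <- p + facet_wrap(~ cyl)

print(by_grid)
print(by_wrap)
```

facet_grid() tends to suit comparisons across two crossed variables, while facet_wrap() is often more compact when one variable has many levels.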
Choose display to emphasize relevant aspects of data
ggplot(brfssFemale, aes(Weight, fill=Year)) +
    geom_density(alpha=.2)
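The same data can support different displays, each emphasizing a different aspect. As a rough sketch (assuming ggplot2 is installed, and using the built-in iris data in place of the BRFSS data), overlaid densities show the shape of each distribution, while box plots make medians and spread easier to compare across groups.

```r
library(ggplot2)

# Overlaid densities: emphasizes distribution shape and overlap
dens <- ggplot(iris, aes(Sepal.Length, fill = Species)) +
    geom_density(alpha = .2)

# Box plots: emphasizes median, quartiles, and outliers per group
box <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
    geom_boxplot()

print(dens)
print(box)
```

Mapping the grouping variable to fill requires it to be categorical, which is why Year was coerced to a factor earlier.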