COMPASS - Combinatorial Polyfunctionality Analysis of Single Cells
==================================================================
**Kevin Ushey, Lynn Lin, and Greg Finak**

**`r Sys.Date()`**

Introduction
------------
Rapid advances in flow cytometry and other single-cell technologies have
enabled high-dimensional, multi-parameter, high-throughput measurements of
individual cells. Numerous questions about cell population heterogeneity can
now be addressed, as these novel technologies permit single-cell analysis of
antigen-specific T-cells. Unfortunately, there is a lack of computational tools
to take full advantage of these complex data.

COMPASS is a statistical framework that enables unbiased analysis of
antigen-specific T-cell subsets. COMPASS uses a Bayesian hierarchical framework
to model all observed cell subsets and select those most likely to be
antigen-specific, while regularizing the small cell counts that often arise in
multi-parameter space. The model provides a posterior probability of
specificity for each cell subset and each sample, which can be used to profile
a subject’s immune response to external stimuli such as infection or
vaccination.

Example
-------
We will use a simulated dataset to illustrate the use of the `COMPASS` package.
First, we outline the components used to construct the `COMPASSContainer`,
the data structure that holds data from an intracellular cytokine staining
(ICS) experiment. We begin by initializing some parameters used to generate
the simulated data.
```{r sim-init}
library(COMPASS)
set.seed(123)
n <- 100 ## number of samples
k <- 6 ## number of markers
sid_vec <- paste0("sid_", 1:n) ## sample ids; unique names used to denote samples
iid_vec <- rep_len( paste0("iid_", 1:(n/10) ), n ) ## individual ids
```
The `COMPASSContainer` is built out of three main components: the **data**,
the total **counts**, and the **metadata**. We will describe the R structure
of these components next.
### data
This is a list of matrices, with each matrix representing a population of
cells drawn from a particular sample. Each row is an individual cell, each
column is a marker (cytokine), and each matrix element is an intensity measure
associated with a particular cell and marker. Only cells that express at least
one marker are included.
```{r sim-data}
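data <- replicate(n, {
  nrow <- round(runif(1) * 1E4 + 1000)  ## number of cells simulated for this sample
  ncol <- k
  vals <- rexp( nrow * ncol, runif(1, 1E-5, 1E-3) )
  vals[ vals < 2000 ] <- 0              ## low intensities are treated as negative
  output <- matrix(vals, nrow, ncol)
  ## keep only cells positive for at least one marker
  output <- output[ rowSums(output) > 0, , drop = FALSE ]
  colnames(output) <- paste0("M", 1:k)
  return(output)
}, simplify = FALSE)  ## always return a list, one matrix per sample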
names(data) <- sid_vec
head( data[[1]] )
```
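Because COMPASS models cell subsets defined by which markers each cell expresses, it can help to see how a single matrix in `data` breaks down into such subsets. The helper `combo_counts()` below is a hypothetical illustration, not a COMPASS function: it assumes, as in the simulation above, that any nonzero intensity means the marker is expressed, and it tabulates the resulting marker combinations for a small made-up matrix.

```{r sketch-combo-counts}
## Hypothetical helper (not part of COMPASS): label each cell by the
## markers it expresses (nonzero intensity = positive) and count the
## cells observed for each marker combination.
combo_counts <- function(m) {
  pos <- m > 0
  labels <- apply(pos, 1, function(is_pos) {
    paste0(colnames(m), ifelse(is_pos, "+", "-"), collapse = "")
  })
  table(labels)
}

## Toy matrix with the same layout as one element of `data`:
## 5 cells (rows) x 3 markers (columns)
toy <- matrix(
  c(   0, 2500,    0,
    3100,    0,    0,
       0, 2500, 4000,
       0, 2200,    0,
    2800,    0, 3500),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("M1", "M2", "M3"))
)
combo_counts(toy)  ## "M1-M2+M3-" is seen twice, three other combinations once
```

Applied to a simulated sample, e.g. `combo_counts(data[[1]])`, it returns one count per observed combination of the markers `M1` to `M6`.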
### counts
This is a named integer vector, denoting the total number of cells that were
recovered from each sample in `data`.
```{r sim-counts}
counts <- sapply(data, nrow) + round( rnorm(n, 1E4, 1E3) )
counts <- setNames( as.integer(counts), names(counts) )
head(counts)
```
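Since `counts` records the total number of cells recovered from each sample, each total should be at least the number of marker-positive cells stored in the matching matrix of `data`. The sketch below checks this before the container is built; `check_counts()` is a hypothetical helper written for this vignette, not part of COMPASS, and the toy inputs only mimic the structure of `data` and `counts`.

```{r sketch-check-counts}
## Hypothetical check (not part of COMPASS): `counts` should be a named
## integer vector covering the samples in `data`, and each total must be
## at least the number of marker-positive cells stored for that sample.
check_counts <- function(data, counts) {
  stopifnot(
    is.integer(counts),
    setequal(names(counts), names(data))
  )
  n_pos <- vapply(data, nrow, integer(1))
  totals <- counts[names(data)]  ## align totals with the order of `data`
  if (any(totals < n_pos)) {
    stop("some totals are smaller than the number of stored cells")
  }
  n_pos / totals  ## fraction of recovered cells expressing >= 1 marker
}

## Toy inputs with the same structure as `data` and `counts`
toy_data <- list(
  s1 = matrix(1, nrow = 3, ncol = 2),
  s2 = matrix(1, nrow = 6, ncol = 2)
)
toy_counts <- c(s1 = 1000L, s2 = 1200L)
check_counts(toy_data, toy_counts)  ## s1 = 0.003, s2 = 0.005
```

The returned fractions give a quick way to spot samples whose share of marker-positive cells looks unusual.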
### metadata
This is a `data.frame` that associates metadata with each sample available in
`data` / `counts`. We will suppose that each sample was subjected to one of two
treatments, named `Control` and `Treatment`.
```{r sim-meta}
meta <- data.frame(
  sid = sid_vec,  ## sample ids, matching names(data)
  iid = iid_vec,  ## individual ids
  trt = sample( c("Control", "Treatment"), n, replace = TRUE )
)
head(meta)
```
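Each sample in `data` and `counts` needs a row in the metadata, identified by the column that is later passed as `sample_id`. A minimal check, assuming exactly one metadata row per sample, might look like this; `check_meta()` is our own hypothetical helper rather than part of the package.

```{r sketch-check-meta}
## Hypothetical check (not part of COMPASS): every sample in `samples`
## should have exactly one metadata row, keyed by the sample-id column.
check_meta <- function(meta, samples, sample_id = "sid") {
  ids <- as.character(meta[[sample_id]])
  if (anyDuplicated(ids)) stop("duplicated sample ids in metadata")
  missing <- setdiff(names(samples), ids)
  if (length(missing)) {
    stop("samples without metadata: ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

## Toy inputs: samples 'a' and 'b' have metadata, sample 'c' does not
toy_meta <- data.frame(sid = c("a", "b"), trt = c("Control", "Treatment"))
toy_samples <- list(a = matrix(1), b = matrix(1), c = matrix(1))
check_meta(toy_meta, toy_samples[c("a", "b")])  ## passes silently
tryCatch(check_meta(toy_meta, toy_samples), error = conditionMessage)
```

On the simulated objects, `check_meta(meta, data)` should pass silently.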
Once we have these components, we can construct the `COMPASSContainer`:
```{r sim-CC}
CC <- COMPASSContainer(
  data = data,
  counts = counts,
  meta = meta,
  individual_id = "iid",
  sample_id = "sid"
)
```
We can see some basic information about our `COMPASSContainer`:
```{r CC-basics}
CC
summary(CC)
```
Fitting the `COMPASS` model is straightforward once the data have been inserted
into the `COMPASSContainer` object. To specify the model, the user supplies
criteria identifying the samples that received a positive stimulation and the
samples that received a negative stimulation; these criteria are written as
expressions in terms of the metadata columns (here, `trt`). In our data, we can
specify them as follows. Because the posterior is sampled with MCMC, which is
slow, we limit ourselves to a small number of iterations for the purposes of
this vignette.

`COMPASS` is designed to give verbose output while the model is fitted, so that
users can check that what the model-fitting function is doing matches their
expectations about the data, minimizing potential errors.
```{r COMPASS-fit}
fit <- COMPASS( CC,
  treatment = trt == "Treatment",
  control = trt == "Control",
  iterations = 100
)
```
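The `treatment` and `control` expressions above split the samples into stimulated and unstimulated arms. As we understand the model, the comparison is made within each individual (identified by `individual_id`), so it is worth confirming that individuals contribute samples to both arms. `paired_individuals()` below is a hypothetical helper, not a COMPASS function, and its default column names simply mirror the `iid` and `trt` columns used in this vignette.

```{r sketch-paired}
## Hypothetical helper (not part of COMPASS): return the individuals that
## have at least one treatment sample and at least one control sample.
paired_individuals <- function(meta, individual_id = "iid", arm = "trt",
                               treatment = "Treatment", control = "Control") {
  ids <- unique(as.character(meta[[individual_id]]))
  has_both <- vapply(ids, function(id) {
    arms <- meta[[arm]][meta[[individual_id]] == id]
    any(arms == treatment) && any(arms == control)
  }, logical(1))
  ids[has_both]
}

## Toy metadata: individual i3 has no control sample
toy_design <- data.frame(
  iid = c("i1", "i1", "i2", "i2", "i3"),
  trt = c("Control", "Treatment", "Control", "Treatment", "Treatment")
)
paired_individuals(toy_design)  ## "i1" "i2"
```

On the simulated metadata, `paired_individuals(meta)` lists the individuals that have both kinds of sample.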
After fitting the `COMPASS` model, we can examine the output in many ways:
```{r COMPASS-examine}
## Extract the functionality and polyfunctionality scores described in
## the COMPASS paper -- these summarize the overall level of
## 'functionality' of each individual's antigen-specific T-cell response,
## and have been associated with clinical outcomes in that paper
FS <- FunctionalityScore(fit)
PFS <- PolyfunctionalityScore(fit)
## Obtain the posterior difference and posterior log ratio from a COMPASSResult
post <- Posterior(fit)
## Plot a heatmap of the mean probability of response, to visualize differences
## in expression for each category
plot(fit)
## Visualize the posterior difference and log difference with heatmaps
plot(fit, measure=PosteriorDiff(fit), threshold=0)
plot(fit, measure=PosteriorLogDiff(fit), threshold=0)
```
`COMPASS` also packages a Shiny application for interactive visualization of
the fits generated by a `COMPASS` call. The application can be launched with a
call to the `shinyCOMPASS` function:

    shinyCOMPASS(fit, stimulated="Treatment", unstimulated="Control")

Interoperation with flowWorkspace
---------------------------------
`flowWorkspace` is a package for managing and generating gates for data
obtained from flow cytometry experiments. Combined with `openCyto`, it allows
users to gate data automatically through flexibly defined gating templates.
`COMPASS` comes with a utility function for extracting data from a
`flowWorkspace` `GatingSet` object, providing a seamless workflow from data
management to analysis of polyfunctional cytokine data.

Data can be extracted using `COMPASSContainerFromGatingSet`; see
`?COMPASSContainerFromGatingSet` for the full documentation. As an example,
a researcher might first gate their data to find `CD4+` cells, and then gate
a number of markers (cytokines) within these `CD4+` cells in order to
identify cells expressing different combinations of markers.
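Gating `k` markers one at a time defines `2^k - 1` possible marker combinations once the all-negative subset is excluded, which is why small per-subset cell counts arise so quickly in multi-parameter space. The base R sketch below enumerates these combinations for a three-marker panel; the marker names are hypothetical, and this illustrates the combinatorics only, not how `COMPASSContainerFromGatingSet` builds its subsets.

```{r sketch-subsets}
## Enumerate every marker combination for a hypothetical 3-marker panel.
## Each row is one cell subset; TRUE means the marker is expressed.
panel <- c("IFNg", "IL2", "TNFa")
subsets <- expand.grid(rep(list(c(FALSE, TRUE)), length(panel)))
names(subsets) <- panel
subsets <- subsets[rowSums(subsets) > 0, ]  ## drop the all-negative subset
subsets$degree <- rowSums(subsets[panel])   ## number of markers expressed
nrow(subsets)  ## 2^3 - 1 = 7 subsets
```

With the six markers simulated earlier, the same enumeration yields `2^6 - 1 = 63` subsets.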
Users who have gated their data with `flowJo` and want to analyze it with
`COMPASS` can do so by first loading and parsing their workspace with
`flowWorkspace` (in recent Bioconductor releases, FlowJo workspace parsing is
provided by the companion `CytoML` package), and then generating the
`COMPASSContainer` through `COMPASSContainerFromGatingSet`.

Citations
---------
```{r citation, echo=FALSE, results='asis'}
cite_package <- function(...) {
  tryCatch({
    args <- unlist( list(...) )
    cites <- lapply(args, citation)
    txt <- sapply(cites, function(x) {
      ## prefer the package's own text version; fall back to formatting the entry
      tv <- attr( unclass(x)[[1]], "textVersion" )
      if (is.null(tv)) format(x, style = "text")[1] else tv
    })
    return(sort(txt))
  }, error = function(e) NULL
  )
}
citations <- cite_package("COMPASS", "flowWorkspace", "openCyto", "base")
citations <- c(
  "Lin L, Finak G, Ushey K, Seshadri C, et al. COMPASS identifies T-cell subsets correlated with clinical outcomes. Nature Biotechnology. 2015. doi:10.1038/nbt.3187",
  citations[1:2],
  "Finak G, McDavid A, Chattopadhyay P, Dominguez M, De Rosa S, Roederer M, Gottardo R. Mixture models for single-cell assays with applications to vaccine studies. Biostatistics. 2014 Jan;15(1):87-101",
  citations[3:4]
)
invisible(lapply(citations, function(x) cat(x, "\n\n")))
```