%\VignetteIndexEntry{An R Package for Analysis of High Throughput Flow Cytometry Data} \documentclass[12pt]{article} \usepackage{amsmath,pstricks} % With MikTeX 2.9, using pstricks also requires using auto-pst-pdf or running % pdflatex will fail. Note that using auto-pst-pdf requires to set environment % variable MIKTEX_ENABLEWRITE18=t on Windows, and to set shell_escape = t in % texmf.cnf for Tex Live on Unix/Linux/Mac. \usepackage{auto-pst-pdf} \usepackage[authoryear,round]{natbib} \usepackage{hyperref} \SweaveOpts{keep.source=TRUE} <>= library(gplots) @ \textwidth=6.2in \textheight=8.5in \oddsidemargin=.1in \evensidemargin=.1in \headheight=-.3in \newcommand{\scscst}{\scriptscriptstyle} \newcommand{\scst}{\scriptstyle} \newcommand{\Rfunction}[1]{{\texttt{#1()}}} \newcommand{\Rmethod}[1]{{\texttt{#1}}} \newcommand{\Rclass}[1]{{\texttt{#1}}} \newcommand{\Robject}[1]{{\texttt{#1}}} \newcommand{\Rpackage}[1]{{\textit{#1}}} \newcommand{\code}[1]{{\texttt{#1}}} \bibliographystyle{plainnat} \title{Analysis of High Throughput Flow Cytometry Data using \Rpackage{plateCore}} \author{Errol Strain, Florian Hahne, Perry Haaland} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{document} \maketitle \tableofcontents \section{Overview} \Rpackage{plateCore} is a Bioconductor packaged created to make processing and analysis of large, complex flow datasets in R easier. High throughput flow studies are often run in a 96 or 384-well plate format, with a number of different samples, controls, and antibodies-dye conjugates present on the plate. Analyzing the output from the cytometer requires keeping track of the contents of each well, matching sample wells with control wells, gating each well/channel separately, making the appropriate plots, and summarizing the results. \Rpackage{plateCore} extends the \Rpackage{flowCore} and \Rpackage{flowViz} packages to work on \Rclass{flowPlate} objects that represent these large flow datasets. For those familiar with \Rpackage{flowCore} and \Rpackage{flowViz}, the gating (filtering), transformation, and other data manipulations for \Rclass{flowPlates} are very similar to \Rclass{flowSets}. In this document we show how setup a \Rpackage{plateCore} analysis and provide examples for a publicly available dataset. The peripheral blood mononucleocyte (PBMC) dataset is a collection of five 96-well plates that have been stained with 189 different antibody-dye conjugates. The goal is to identify cells that are positively stained relative to some corresponding negative control. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Introduction} \subsection{Background} \Rpackage{plateCore} was created to address the need for robust methods to analyze data from \href{http://www.bd.com/technologies/discovery_platform/BD_FACS_CAP.asp}{BD FACS\texttrademark CAP} experiments. The current version of FACS\texttrademark CAP has 200+ antibody-dye conjugates on a single 96-well plate, where the antibodies are specific for different human cell surface markers or proteins. The output from an experiment is a per cell measurement of protein/marker expression for each antibody. FACS\texttrademark CAP is designed to be a tool for screening a large number of surface markers on different types of human tissue or blood samples. Analyzing the data requires objective, high throughput approaches to gating and summarizing large amounts of flow data. In addition to BD FACS\texttrademark CAP, \Rpackage{plateCore} also works for any other type of plate-based flow data, or any collection of samples that can be organized into a \Rclass{flowPlate} object. \Rclass{flowPlates} are simply a convenient way to structure the data and manage the annotation for a set of related flow samples. Once the data is in \Rpackage{plateCore}, it's also easy to summarize results across multiple experiments and to integrate flow results with other types of data using Bioconductor and access statistical methods in R. \subsection{Bioconductor Flow Tools} \Rpackage{plateCore} is built on top of \Rpackage{flowCore} and \Rpackage{flowViz}, and the functionality implemented for \Rclass{flowPlates} is a subset of what is available for \Rclass{flowSets}. Getting your data into a \Rclass{flowSet} is the best way to start exploratory analysis, and also gives you access to other flow packages such as \Rpackage{flowQ}, \Rpackage{flowStats}, and \Rpackage{flowClust}. Once the layout of the experiment has been standardized, loading the data into plateCore allows users to automate portions of the analysis, and makes summarizing the data and extracting the results easier. \clearpage %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Installation and Getting Started} Install the \Rpackage{plateCore} package and load the sample data containing a plate dataset, compensation files, and a table decsribing the layout of the experiment. <>= library(plateCore) data(plateCore) @ <<>>= ls() @ The \Robject{pbmcPlate} is actually one of the 5 plates that make up the Peripheral Blood Mononucleocyte Cell (PMBC) dataset in the Section ~\ref{section:PBMC}. This particular plate has 96 wells and is stained in the FL1.H (FITC), FL2.H (PE), FL3.H (PerCP.Cy5.5), and FL4.H (APC) channels. \Rpackage{flowCore} was used to read in the raw FCS files, and the \Robject{pbmcPlate} is a \Rclass{flowSet} resulting from \Rfunction{read.flowset} operation. \Rclass{flowSets} are modeled on the microarray data structures used in Bioconductor. The \Robject{pbmcPlate} is comprised of an \Rclass{exrs} expression matrix, along with the associated \Rclass{phenoData}. <>= pbmcPlate @ The wellAnnotation \Rclass{data.frame} describes the layout of the plate and the content of each well. Stained wells have a row for each of the channels of interest, while unstained wells only have a single row. The information in well annotation is similar to the \Robject{pData} matrix for \Rclass{phenoData(AffyBatch)}. For \Rclass{pData} all information about a sample is contained on a single row, which makes it difficult to handle a multiplexed flow dataset since each channel essentially requires multiple columns to describe the contents of a well. The wellAnnotation format is easier to create and maintain, and \Rpackage{plateCore} handles organizing the relevant information into a \Rclass{pData} object and incorporating it into a \Rclass{flowSet}. <>= head(wellAnnotation) @ The first column of the table contains a unique well identifier, which must correspond to one of of the samples from pbmcPlate. The FCS output from the cytometer usually contains the well id in the filename, and these filenames are what \Rpackage{flowCore} uses to assign sample names to a \Rclass{flowSet}. These well identifiers must be unique to a well, which usually means using 3 character codes like "A01", "B09", "H10", etc. \Rpackage{plateCore} assumes that the first character provides the row name and that second two characters give the column on the plate. The second column in the annotation table gives the sample type, currently either "Unstained", "Test","Negative.Control", or "Isotype". Each well can only be a single sample type. The Ab.Name column contains either the name of the antibody, or some other descriptor that will be useful for making plots. Looking at well B05 in the FL4.H, we see that it is stained with an antibody named CDbd14. <>= wellAnnotation[50,] @ The antibody names in this PBMC dataset were masked prior to public release, so these names will not correspond to the standard CD names. The Channel column tells which channel was used to detect fluoresence for the antibody-dye conjugate. Negative.Control information indicates which well is the negative control (e.g. isotype) for test samples. The current version of plateCore only supports a single negative control well for each test well, but multiple test wells can use the same negative control. Each negative control well should be assigned as it's own negative control for reasons that will be explained in following sections. Additional columns can be included in wellAnnotation and will be incorporated in later results, but the 5 columns described here are the minimum for a \Rclass{flowPlate}. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Creating a \Rclass{flowPlate}} Making a \Rclass{flowPlate} requires a \Rclass{flowset}, a well annotation table, and a name for the plate that is unique within the set of flow data under analysis. Later we may want to combine information from multiple plates, so having a unique plate name identifier (e.g. barcode) makes it easier to track the samples. Using the sample dataset from the \Rpackage{plateCore}, a \Rclass{flowPlate} can be built. <>= pbmcFP <- flowPlate(pbmcPlate,wellAnnotation,plateName="PBMC.001") @ Looking at the first \Rclass{pData} entry from the \Robject{pbmcPlate} and from \Robject{pbmcFP}, we can see how the \Robject{wellAnnotation} was incorporated into \Rclass{pData}. <>= pData(phenoData(pbmcPlate))[1,] pData(phenoData(pbmcFP))[1,] @ The following examples use the lymphocyte population from \Robject{pbmcFP}, so we select the cells of interest using a \Rclass{rectangleGate} and \Rclass{norm2Filter}. Details on how to use \Rfunction{Subset} can be found in the Gating section and in \Rpackage{flowCore}. The cells selected by this gate are shown in Figure ~\ref{fig:lymphGate}. <>= rectGate <- rectangleGate("FSC-H"=c(300,700),"SSC-H"=c(50,400)) normGate <- norm2Filter("SSC-H","FSC-H",scale.factor=2.5) pbmcFP.lymph <- Subset(pbmcFP, rectGate & normGate) @ \begin{figure} \begin{center} <>= print(xyplot(`SSC-H` ~ `FSC-H` | as.factor(Well.Id),pbmcFP[93:96],smooth=FALSE,displayFilter=TRUE,col=c("red","blue"), filter=rectangleGate("FSC-H"=c(300,700),"SSC-H"=c(50,400)) & norm2Filter("SSC-H","FSC-H",scale.factor=2.5))) @ \end{center} \caption{Forward (FSC) and side-scatter (SSC) dotplots for the first four wells of the PMBC plate. The lymphocyte population is shown in blue while monocytes are located in the upper right corner. Lymphocytes were selected using the \Rclass{rectangleGate} and \Rclass{norm2Filter} gates from \Rpackage{flowCore}.} \label{fig:lymphGate} \end{figure} \clearpage %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Compensation and Background Correction} The \Rfunction{compensation} function from \Rpackage{flowCore} is used to create a compensation matrix for the sample data. Channel names in the compensation \Rclass{flowSet} must match the dataset under analysis, otherwise the matrix will not work correctly. Details about how \Rfunction{compensation} works can be found in \Rpackage{flowCore}. We can create a compensation matrix using the sample data in \Rpackage{plateCore} and the \Rfunction{spillover} from \Rpackage{flowCore}. <>= comp.mat <- spillover(x=compensationSet,unstain=sampleNames(compensationSet)[5], patt=".*H",fsc="FSC-H",ssc="SSC-H",method="median") @ This \Robject{comp.mat} matrix can then be applied to a \Rclass{flowPate} to correct for the effects of spillover. The difference between compensating in \Rpackage{flowCore} and compensating in \Rpackage{plateCore} is that \Rpackage{plateCore} only compensates for the dyes listed in \Robject{wellAnnotation}, whereas \Rpackage{flowCore} compensates each sample the same way. Since \Rpackage{plateCore} does this custom compensation, it is important to list each dye or fluorophore in the wellAnnotation, even if the sample will not be used for further analysis. If all the samples are stained the same way, then compensating with either approach should give the same results. The \Robject{pbmcFP} can be compensated using the \Robject{comp.mat} from above. <>= pbmcFPcomp <- compensate(pbmcFP.lymph,comp.mat) @ In this particular case the \Robject{pbmcPlate} has already compensated on the cytometer, so no further compensation is necessary. The process of analyzing and gating (filtering) the large amount of data in a \Rclass{flowPlate} can be simplified by first correcting for the effects of cell size on background fluorescence. This step is not necessary in this PBMC lymphocyte example, since the level of autofluorescence is very low, but it can have large effects on other cell types that have a wider range of FSC values, such as fibroblasts and stem cells. This correction will later allows us to define a one-dimensional gate between positive and negative cells in only the channel interest, instead of having a two dimensional gate that includes the forward scatter (FSC) channel. The background correction uses the unstained wells and fits a log-log linear model to FSC versus each fluorescence channel (Hahne et al. 2006 Genome Biology). The correction is then applied to all the wells on a the plate. The \Rfunction{fixAutoFl} function takes a \Rclass{flowPlate}, and character variables indicating the FSC channel and the fluorsence channels (chanCols). <>= pbmcFPbgc <- fixAutoFl(pbmcFP.lymph,fsc="FSC-H",chanCols=rownames(comp.mat)) @ Figure ~\ref{fig:bgc1} shows the results of the autofluorescence correction for PBMC lymphoctes where the background fluorescence was artificially inflated. \begin{figure} \begin{center} <>= temp <- pbmcFP.lymph[96] fsMed <- log10(median(exprs(temp[[1]])[,"FSC-H"])) temp <- fsApply(plateSet(temp),function(x) { flVals <- log10(exprs(x)[,"FL1-H"]) fsc <- log10(exprs(x)[,"FSC-H"]) exprs(x)[,"FL1-H"] <- 10^(flVals+4*(fsc-fsMed)) x }) temp2 <- as(list("0877408774.H12"=temp[[1]]),"flowSet") wellTemp <- subset(wellAnnotation(pbmcFP),Well.Id=="H12") temp2 <- flowPlate(temp2,wellTemp,plateName="H12") temp2 <- fixAutoFl(temp2,fsc="FSC-H",chanCols=c("FL1-H","FL2-H"),unstain="0877408774.H12") temp <- as(list("H11"=temp[[1]],"H11 (corrected)"=temp2[[1]]),"flowSet") temp <- transform("FL1-H"=log10) %on% temp print(flowViz::levelplot(`FL1-H` ~ `FSC-H` | as.factor(name),data=temp, ylim=c(0,2.2))) @ \end{center} \caption{Levelplots showing the effects of cell size (FSC) on fluorescence for the unstained cells before (left) and after (right) the background correction. The fluorescence signal intensity generally increases with increasing cell size (FSC). The effect of cell size on background fluorescence for PBMC lymphocytes was exaggerated in this plot to demonstrate \Rfunction{fixAutoFl}.} \label{fig:bgc1} \end{figure} \clearpage %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Gating} Gating on \Rclass{flowplates} is mainly performed using \Rfunction{Subset}. The first argument to this function is a \Rclass{flowPlate} and the second is a valid \Rpackage{flowCore} filter(s). For exploratory analysis and setting up the initial gates, it is sometimes more convenient to work with a \Rclass{flowSet}, which can be extracted from a \Rclass{flowPlate} using the \Rfunction{plateSet} function. <>= fs <- plateSet(pbmcFP) @ In this PBMC example, we can use a \Rclass{rectangleGate} to separate the lymphocytes from the debris and monocytes. Looking at the plots in Figure ~\ref{fig:lymphGate}, the lymphocytes look like they are located from about 400 to 650 on the FSC scale, and 100 to 300 on the SSC scale. We can use \Rfunction{xyplot} from the \Rpackage{flowViz} package to display events inside the gates. <>= rectGate <- rectangleGate("FSC-H"=c(400,700),"SSC-H"=c(100,300)) xyplot(`SSC-H` ~ `FSC-H` | as.factor(Well.Id), pbmcFP[1:2], displayFilter=TRUE, smooth=FALSE, col=c("red","blue"),filter=rectGate) @ Since we want this gate to be applicable to all the wells, it may help to enlarge the \Rclass{rectangleGate} and then use a data-driven gate like \Rclass{norm2Filter} to pick out lymphocytes. The lymphocyte population may drift over the course of analyzing the plate. \Rclass{norm2Filter} will fit a bivariate normal to the FSC and SSC channels to identify an elliptical region of high density. <>= rectGate <- rectangleGate("FSC-H"=c(300,700),"SSC-H"=c(50,400)) normGate <- norm2Filter("SSC-H","FSC-H",scale.factor=1.5) xyplot(`SSC-H` ~ `FSC-H` | as.factor(Well.Id), pbmcFP[1:4], displayFilter=TRUE, smooth=FALSE, col=c("red","blue"),filter=normGate & rectGate) @ The results of applying the \Rclass{rectangleGate} and \Rclass{norm2filt} are shown in Figure~\ref{fig:lymphGate}. Once the population of interest has been identified, \Rfunction{Subset} the \Rclass{flowPlate} to select those cells. These same gates were used to select lymphocytes when we were initially creating the flowSet. If your sample is comprised of multiple cell populations, as in this PBMC example, then it is necessary to \Rfunction{Subset} before the compensation and background correction steps. The \Robject{compensationSet} should be processed with the \Rclass{rectangleGate} and \Rclass{norm2Filter} gates prior to constructing the compensation matrix. <>= rectGate <- rectangleGate("FSC-H"=c(300,700),"SSC-H"=c(50,400)) normGate <- norm2Filter("SSC-H","FSC-H",scale.factor=1.5) pbmcFP.lymph <- Subset(pbmcFP, rectGate & normGate) @ Now we're ready to set the isotype (Negative.Control) gates that will be used to idenfity positively and negatively stained cells. We will estimate the gates using the \Rclass{flowPlate} that has been subsetted for lymphocytes, compensate, and then background corrected. The default settings for the \Rfunction{setControlGates} assumes the data is on a linear scale. <>= pbmcFPbgc <- setControlGates(pbmcFPbgc,gateType="Negative.Control",numMads=5) pbmcFPbgc <- applyControlGates(pbmcFPbgc) @ The \Robject{numMads} parameter indicates how far above the isotype population the positive-negative threshold is set. The threshold is set by taking the median fluorescence intensity (MFI) and adding 5 median absolute deviations (MADs). The default value of 5 works well for a number of cell types but may need to be adjusted for specific applications. Future versions of \Rpackage{plateCore} will use more sophisticated methods of estimating kernel density to fit distrubutions to the data and have a more robust esimtate of the positive-negative threshold. Once the control gates have be created, they are applied to the test wells using the \Rfunction{applyControlGates}. We see how reasonable the gates look using \Rfunction{xyplot}. Examples of the test wells associated with the negative control well A03 are shown in Figure ~\ref{fig:isoGate1}. To check all the gates in the FL1.H channel we would use the following code. <>= xyplot(`FL1-H` ~ `FSC-H` | as.factor(Well.Id), transform("FL1-H"=log10) %on% pbmcFPbgc, displayFilter=TRUE, smooth=FALSE,col=c("red","blue"),filter="Negative.Control") @ \begin{figure} \begin{center} <>= wells <- unique(subset(pbmcFPbgc@wellAnnotation,Negative.Control=="A03",select="name")[,1]) print(xyplot(`FL1-H` ~ `FSC-H` | as.factor(Well.Id), transform("FL1-H"=log10) %on% pbmcFPbgc[wells], displayFilter=TRUE,smooth=FALSE, col=c("red","blue"), filter="Negative.Control")) @ \end{center} \caption{Dotplots showing the test wells associated with the negative control well A03. The gate was created automatically using \Rfunction{setControlGates} to estimate the threshold between positive and negative cells based on the staining in A03. \Rclass{rectangleGates} are drawn around the positive cells in each plot. Positive cells are shown in blue and negative cells in red.} \label{fig:isoGate1} \end{figure} \clearpage %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Displaying Data} Tools for visualizing \Rclass{flowPlates} are based on the collection of functions in \Rpackage{flowViz}, which is itself based on \Rpackage{lattice}. Currently on \Rfunction{xyplot} works directly on a \Rclass{flowPlate} object, but the other functions like \Rfunction{levelplot} (shown in Figure \ref{fig:bgc1}) can be used by accessing the \Rclass{flowSet} inside a \Rclass{flowPlate} via \Rfunction{plateSet}. \Rfunction{xyplot} is the primary tool for making dotplots, showing gates, and creating overlay plots. If the negative control gates have been created using \Rfunction{setControlGates}, then they can be shown using \Rfunction{xyplot}. The code used to generate Figure ~\ref{fig:isoGate1} is shown below. <>= wells <- unique(subset(pbmcFPbgc@wellAnnotation,Negative.Control=="A03", select="name")[,1]) xyplot(`FL1-H` ~ `FSC-H` | as.factor(Well.Id), transform("FL1-H"=log10) %on% pbmcFPbgc[wells], displayFilter=TRUE,smooth=FALSE, col=c("red","blue"), filter="Negative.Control") @ The wells of interest are identified by looking for all the wells that have A03 are their negative control. When used for plotting \Rclass{flowSets}, the filter argument takes a \Rpackage{flowCore} \Rclass{filter}. For \Rclass{flowPlates} the filter can also accept a character string indicating what type of gate to display. \Rpackage{plateCore} currently supports "Negative.Control" gates, and more options will be available in the future. In the plots shown in Figure ~\ref{fig:isoGate1}, a number of the test samples are heterogenous in expression for the markers in the FL1.H channel. For these samples it is clear that the gate created using \Rfunction{setControlGates} looks reasonable. In other situations where the difference between positive and negative cells is not as distinct, making overlay dotplots showing both the test and negative control samples on the same graph can help to determine the appropriate gate. An example of the code use to generate the overlay plots in Figure ~\ref{fig:isoGate2}is shown below. <>= wells <- unique(subset(pbmcFPbgc@wellAnnotation,Negative.Control=="A06", select="name")[,1]) xyplot(`FL1-H` ~ `FSC-H` | as.factor(Well.Id), transform("FL1-H"=log10) %on% pbmcFPbgc[wells], displayFilter=TRUE,smooth=FALSE, col=c("blue","green"), filter="Negative.Control",filterResults="Negative.Control") @ In order for the gates to display correctly, any transformations must be applied directly on the \Rclass{flowPlate}. This ensures that the Negative.Control gates, along with the fluorescence signal data, are on the same scale. If the transformation is in the formula, as in the following example, the the gates will not show up. <>= xyplot(log10(`FL1-H`) ~ `FSC-H` | as.factor(Well.Id), pbmcFPbgc[wells], displayFilter=TRUE,smooth=FALSE, col=c("blue","green"), filter="Negative.Control",filterResults="Negative.Control") @ \begin{figure} \begin{center} <>= wells <- unique(subset(pbmcFPbgc@wellAnnotation,Negative.Control=="A06",select="name")[,1]) print(xyplot(`FL1-H` ~ `FSC-H` | as.factor(Well.Id), transform("FL1-H"=log10) %on% pbmcFPbgc[wells], displayFilter=TRUE,smooth=FALSE, col=c("blue","green"), filter="Negative.Control",filterResults="Negative.Control")) @ \end{center} \caption{Overlay dotplots for the test wells associated with the negative control well A06. The cells from A06 (green) are shown in each dotplot along with the results from the test cells (blue). In cases where the test well population is close to the positive-negative threshold, such as well D08, these plots can help determine if the Negative.Control gates should be adjusted.} \label{fig:isoGate2} \end{figure} \clearpage %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Extracting Results} Once the Negative.Control gates have been created and applied, we can then use the \Rfunction{summaryStats} to calculate different metrics of interest from the \Rclass{flowPlate}. Running \Rfunction{summaryStats} on the \Robject{pbmcFPbgc}, <>= pbmcFPbgc <- summaryStats(pbmcFPbgc) @ will result in additional columns created in the associated \Robject{wellAnnotation} \Rclass{data.frame}. <>= colnames(pbmcFPbgc@wellAnnotation) @ These new columns include percentage of cells above the Negative.Control gate (Percent.Positive), the number of cells in the raw data (Total.Count), the number of positive cells (Positive Count), the median fluorescence intensity (MFI), and the ratio of the test well MFI to the MFI of the negative control well (MFI.Ratio). The Predict.PP and Gate.Score columns are explained in Section ~\ref{section:Quality}. In this PMBC example a number of the samples have multiple cell populations, so the MFI and MFI.Ratio may not be helpful since they are based on all the cells in a well, and not just the positive cells. Now that we have this information we can display it using \Rfunction{xyplot}. We can add the marker names and percent positive results to our dotplots (Figure ~\ref{fig:isoGate3}). The \Rfunction{getGroups} retrieves a list of wells associated with each negative control for a particular channel. In this case, the 3rd element in the controlGroups list corresponds to negative control well A03. <>= controlGroups <- getGroups(pbmcFPbgc,chan="FL1-H") print(xyplot(`FL1-H` ~ `FSC-H` | as.factor(Well.Id), transform("FL1-H"=log10) %on% pbmcFPbgc[controlGroups[[3]]], displayFilter=TRUE, smooth=FALSE, col=c("red","blue"), filter="Negative.Control", flowStrip=c("Well.Id","Ab.Name","Percent.Positive"))) @ The \Robject{wellAnnotation} \Rclass{data.frame} can be exported as a comma delimited for a record of the analysis. <>= write.csv(pbmcFPbgc@wellAnnotation,file="PMBC.001.csv") @ \clearpage \begin{figure} \begin{center} <>= controlGroups <- getGroups(pbmcFPbgc,chan="FL1-H") print(xyplot(`FL1-H` ~ `FSC-H` | as.factor(Well.Id), transform("FL1-H"=log10) %on% pbmcFPbgc[controlGroups[[3]]], displayFilter=TRUE, smooth=FALSE, col=c("red","blue"), filter="Negative.Control", flowStrip=c("Well.Id","Ab.Name","Percent.Positive"))) @ \end{center} \caption{Dotplots and Negative.Control gates for wells associated with control well A03. Negative cells are shown in read and positive cells in blue. The strip above each plot now contains the Antibody-Marker name, along with the percentage of positive cells in the well.} \label{fig:isoGate3} \end{figure} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \clearpage \section{Quality Assessment} \label{section:Quality} Quality assessment using Bioconductor flow tools is covered in detail in the flowQ publication (Le Meur et al. 2007, Cytometry Part A). This example briefly covers how to check the number of lymphocyte events in each well, scan for fluidic events, and check the consistency of the isotype gating. The number of gated lymphocytes in each well of the pbmcFPbgc dataset can be found in the Total.Count column of the wellAnnotation. <>= summary(wellAnnotation(pbmcFPbgc)$Total.Count) @ Since each well has at least 858 gated lymphocyte events there are no apparent problems with the sample acquisition on the plate. Fluidic events (bubbles, etc.) can cause the cytometer readings to shift. These type of events can often be identified by plotting FSC vs Time, or through ecdf plots (as described in Le Meur et al. 2007, Cytometry Part A). An example of an ecdf plot for FSC looking for a row effect, since the samples are acquired by row, is shown in Figure ~\ref{fig:ecdf}. The code used to produce the figure is shown below. <>= print(flowViz::ecdfplot(~`FSC-H` | as.factor(Column.Id), data=plateSet(pbmcFPbgc), groups=Row.Id, auto.key=TRUE)) @ One approach to evaluating the consistency of isotype-based gating is to plot the MFI ratio versus the percentage of positive cells. The MFI ratio is the ratio of the median fluorescence intensity of the test sample to its corresponding isotype control. If two sample have approximately the same percentage of positive cells, then their MFI ratios should be similar. The \Rfunction{summaryStats} function in \Rpackage{plateCore} performs a robust logistic regression on the MFI ratio to the percentage of positive cells, and results from the fit are stored in the Predict.PP and Gate.Score columns of wellAnnotation. Predict.PP gives the estimated percentage of positive cells based on the MFI ratio, and Gate.Score indicates how many standard residuals a sample data point is from the best fit line. Figure ~\ref{fig:mfiRatio} shows the MFI ratio versus percent positive plot for the PBMC lymphocyte example. The code used to generate the plot is shown below. The \Rfunction{glmrob} function from the \Rpackage{robustbase} package is used to perform the regression. <>= mfiPlot(fp,thresh=2,xlab="MFI Ratio (Test MFI / Isotype MFI)",xlim=c(0.1,250), ylab="Percentage of cells above the isotype gate",pch=23) @ \clearpage \begin{figure} \begin{center} <>= print(flowViz::ecdfplot(~`FSC-H` | as.factor(Column.Id), data=plateSet(pbmcFPbgc), groups=Row.Id, auto.key=TRUE)) @ \end{center} \caption{Empirical Cumulative Distribution Function (ecdf) plot for FSC from the pmbc lymphocyte example plate. } \label{fig:ecdf} \end{figure} \clearpage \begin{figure} \begin{center} <>= mfiPlot(pbmcFPbgc,thresh=2,xlab="MFI Ratio (Test MFI / Isotype MFI)",xlim=c(0.1,250), ylab="Percentage of cells above the isotype gate",pch=23) @ \end{center} \caption{Plot of the MFI ratio versus the percentage of positive cells for the example PBMC lymphocytes. The robust best fit line is shown in red, and samples that are more than 2 standard residuals away from the line are shown in red.} \label{fig:mfiRatio} \end{figure} \clearpage %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Example: PBMC Large Panel Study} \label{section:PBMC} The PBMC dataset used in this example is available for download from ficcs.org as the \href{http://www.ficcs.org/data/plateData.tar.gz}{plateData.tar.gz} file. The data consists of 5 different PBMC samples that were analyzed with 189 different antibodies on 96-well plates. Each plate has a set of unstained, isotype, and control wells. Antibodies and isotype controls are arrayed 3 per well, but the data was compensated on the cytometer so there is no need to correct for spillover. This example assumes that the plates have been unpacked and stored in the folder "data/pbmc". Each of the 5 subfolders in the PBMC directory contains 96 fcs files. The text delimited file describing the layout of the plate is named pmbcPlateLayout.csv. Processing all 5 plates is memory intensive, so each plate was loaded separately and the resulting \Robject{flowPlate} was stored in an R data image in the pbmcRData folder. The code used to process plate 8774 is shown below. Settings for the remaining 4 plates were identical to plate 8774. <>= plateName <- "lymph08774" plateDescription <- read.delim("pmbcPlateLayout.csv", as.is=TRUE,header=TRUE,stringsAsFactors=FALSE) platePBMCraw <- flowPlate(data=read.flowSet(path="data/pbmc/08774"), plateDescription,plateName=plateName) platePBMC <- Subset(platePBMCraw, rectangleGate("FSC-H"=c(300,700),"SSC-H"=c(50,400)) & norm2Filter("SSC-H","FSC-H",scale.factor=1.5)) platePBMC <- setControlGates(platePBMC,gateType="Negative.Control") platePBMC <- applyControlGates(platePBMC) platePBMC <- summaryStats(platePBMC) save.image(file=paste("pbmcRData/",plateName,".Rdata",sep="")) @ Once the 5 plates have been processed, the \Robject{flowPlates} can then be combined using \Rfunction{fpbind}. In this case the plates were saved using the same name, platePBMC. The \Robject{flowPlates} were read into a list, and then combined into a large "virtPlate". <>= fileNames <- list.files("pbmcRData",full.names=TRUE) plates <- lapply(fileNames,function(x){ load(x) platePBMC }) virtPlate <- fpbind(plates[[1]],plates[[2]],plates[[3]],plates[[4]],plates[[5]]) @ Once the plates have been combined, xyplots and densityplots can then be conditioned on "plateName" to create graphics like Figure ~\ref{fig:expDens}. The R code used to generate Figure ~\ref{fig:expDens} is shown below. <>= print(densityplot(~ `FL2-H` | as.factor(plateName), transform("FL2-H"=log10) %on% virtPlate[c("C02","C03","A05")], layout=c(3,2),xlim=c(-0.2,2.5), filterResult="Negative.Control",lty=c(1,2,3,4), col=c('blue','black','red'))) @ \clearpage \begin{figure} \centering \includegraphics{expDens.pdf} \caption{Density plot for two test antibodies (red,black) and their associated isotype control (blue). The isotype gate is indicated with a vertical black bar.} \label{fig:expDens} \end{figure} \end{document}