\documentclass{article} % Commands to recognize Rnw file in xemacs % M-x noweb-mode % M-x font-lock-mode % -*- mode: noweb; noweb-default-code-mode: R-mode; -*- %\VignetteIndexEntry{rflowcyt overview} %\VignetteKeywords{FACS, FCS, Flow Cytometry} %\VignetteDepends{rflowcyt,lattice,locfit} %\VignettePackage{rflowcyt} %% Postscript fonts \usepackage{hyperref} \usepackage{times} \usepackage{graphicx} \usepackage[authoryear,round]{natbib} %% layout \textwidth=6.2in \textheight=8.5in \oddsidemargin=.1in \evensidemargin=.1in \headheight=-.3in %% Special fonts \newcommand{\scscst}{\scriptscriptstyle} \newcommand{\scst}{\scriptstyle} \newcommand{\Rfunction}[1]{{\texttt{#1}}} \newcommand{\Rargument}[1]{{\textit{#1}}} \newcommand{\Robject}[1]{{\texttt{#1}}} \newcommand{\Rmethod}[1]{{\texttt{#1}}} \newcommand{\Rclass}[1]{{\texttt{#1}}} \newcommand{\Rpackage}[1]{{\textit{#1}}} \newcommand{\code}[1]{{\texttt{#1}}} \title{Using RFlowCyt} \author{A.J. Rossini \and J.Y. Wan \and N. Le Meur} \date{\today} \begin{document} \maketitle \begin{abstract} This manual describes the usage of the functions in the \Rpackage{rflowcyt} library package, part of the Bioconductor\footnote{\url{http://www.bioconductor.org/}} project. The main categories are Data Mananagement and Retrieval, Flow Cytometry Visualizations, Exploratory Analysis, Gating, and Flow Cytometry Hypothesis Testing and Statistical Inference. Examples are also shown for each category. \end{abstract} \tableofcontents \clearpage \section{Data Management and Retrieval} \label{SEC:data-man-intro} The \Rpackage{rflowcyt} tools in this category are used for the following tasks: \begin{enumerate} \item \Rfunction{read.FCS} to read in FCS binary files into R-objects of S4 class \Rclass{FCS} \item \Rfunction{[i,j]} to extract or subset information from the \Robject{data} (a \Robject{matrix} object) of the \Robject{FCS} R-object \item \Rfunction{[[i]]} to extract \Robject{metadata} (which is of S4 class \Rclass{FCSmetadata}) portion of the \Robject{FCS} R-object \item \Rfunction{[i,j]<-} to replace \Robject{data} information \item \Rfunction{addParameter} to add column variables in the \Robject{data} \item \Rfunction{[[i]]<-} to replace or add new information to the \Robject{metadata} portion of the \Robject{FCS} R-object \item \Rfunction{as} to coerce among \Rclass{FCS}, \Rclass{data.frame} and \Rclass{matrix} class objects \item \Rfunction{equals} to demonstrate equality between two FCS R objects \item \Rmethod{print-methods}, \Rmethod{show-methods} print or show methods for \Robject{FCS} objects \item \Rfunction{checkvars} to check for any discrepancies between the \Robject{metadata} and the \Robject{data} of the \Robject{FCS} object \item \Rfunction{fixvars} to fix the \Robject{metadata} to relect the information obtained from the \Robject{data} if there are discrepancies \item \Rfunction{summary-methods} to summarize the \Robject{FCS} R-object with Tukey's five number summary of the \Robject{data} and with slot information in the \Robject{metadata} and to output an \Robject{FCSsummary} S4 class object \end{enumerate} \subsection{Datasets} \label{SUBSEC:Datasets} There are four types of data sets that are available in the required data package \Rpackage{rfcdmin}. The first two types of data set consists of raw binary Flow Cytometry Standard (FCS) files, and the second type consists of R-objects of S4 class \Rclass{FCS} and is the result of reading in FCS binary files using \Rfunction{read.FCS} or \Rfunction{read.series.FCS} function. Table~\ref{tab:1} summarizes the current reading information for the raw binary files in the \Rpackage{rfcdmin} data package. For more information about the binary FCS files and the dataset contain in the \Rpackage{rfcdmin} package, look at the package documentation files using the commands in R: <>= help(package="rfcdmin") @ <>= table1<-rbind(c("2.0", "UW", "facscan", "8", "0-256"), c("3.0", "FHCRC", "DiVa", "10", "0-1024"), c("2.0", "BCCRC", "FACSCalibur", "10", "0-1024")) table1 <- as.data.frame(table1) ## column names are the summary variables colnames(table1)<-c("FCS Version", "Source", "Machine", "bit resolution", "Integer range") ## rownames are the names of the FCS binary files rownames(table1)<-c( "facscan256.fcs","SEB-NP22.fcs","A06-H06") save(table1,file="table1.Rda") @ <>= if (require(xtable)) { xtable(table1, caption="Example FCS binary files in 'rfcdmin' package that can be read in using read.FCS or read.series.FCS. (UW: University of Washington, Seattle; FHCRC: Fred Hutchinson Cancer Research Center, Seattle; BCCRC: British Columbia Cancer Research Center, Vancouver)", label="tab:1") }else { cat("XTABLE not present; please install to get the right table.") } @ %def xtable.table1 \subsubsection{Binary FCS data files} \label{SUBSUBSEC:BinaryFCSFiles} The Flow Cytometry Standard (FCS) binary files consist of a HEADER, a TEXT, a DATA, and an optional ANALYSIS segment. The HEADER in ASCII text gives information about the version of the FCS file and the byte offsets of the beginning and ending of the other segments within the FCS file. The FCS version 3.0 is currently used and has been updated from version 2.0 to accommodate data sets longer than 99,999,999 bytes and allowing for primary and supplemental portions within the TEXT segment, among other changes. Located after the HEADER, the TEXT segment in ASCII text includes summary information in keywords such as the number of observations and names of column variables in the DATA segment. The DATA segment that follows consists of the raw binary data. The optional ANALYSIS segment includes some results of earlier data analyses \citep{robin:2001}. When specific immunofluoroescence signals are received and digitized by the analog-to-digital converters (ADCs) of a flow cytometer machine, the measurements are grouped into a number of bins based on the bit-resolution of the ADC. Thus, a \textit{n}-bit resolution ADC will group the data into $2^n$ bins or "channels". Thus, each immunofluoroescence measurement variable is actually categorical and has an integer range from 0 to $2^n$, depending on the ADC bit-resolution, which is usually 10 or 8 \citep{robin:2001}. \subsubsection{Reading in the FCS binary file} \label{SUBSUBSEC:readFCS} The subsequent code allows us to call the \Rpackage{rflowcyt} library. If the \Rpackage{rflowcyt} library is in the working library location, then the \Rargument{library.location} is a character string identifying the location of the \Rpackage{rflowcyt} library. <>= library(rflowcyt) if (!require(rfcdmin)) { stop("rfcdmin not available?") } @ The raw binary FCS files can have the extension "\*.fcs" or no extension at all. The raw binary FCS files can be read one by one using the \Rfunction{read.FCS} function or in a set using the \Rfunction{read.series.FCS} function. The output is either a \Robject{FCS} R object of S4 or S3 class or a list of \Robject{FCS} R-objects. (\textit{NB: Currently, the \Rpackage{rflowcyt} package implements functions and methods with the S4 class.}) In order to read in the FCS binary file, the location of the FCS binary file in the fcs or bccrc directories of the \Rpackage{rfcdmin} package has to be input as a parameter in the calling for \Rfunction{read.FCS} or \Rfunction{read.series.FCS}. <>= fcs.loc <- system.file("fcs", package="rfcdmin") @ After finding the \*.fcs file location, we will read in the raw binary file "facscan256.fcs" using \Rfunction{read.FCS} and call it \Robject{FC.FCSRobj}. In order to demonstrate a S3-to-S4 class change, we will incorrectly read in the binary file as an S3 object. <>= file.location <- paste(fcs.loc, "facscan256.fcs", sep="/") FC.FCSRobj <- read.FCS(file.location, UseS3=TRUE, MY.DEBUG=FALSE) @ The following are \Robject{FCS} R-objects which are readily accessible in R and can be used for analysis using the tools in the \Rpackage{rflowcyt} package. Prior to this release, the FCS class has been S3. Now the FCS class among other classes (\Rclass{FCSmetadata}, \Rclass{FCSgate} (described in the Gating section), \Rclass{FCSggobi} (still in working progress) ) are S4. Use \textbf{convertS3toS4} to convert S3-class \Robject{FCS} to S4-class \Robject{FCS} R objects. To exemplify the conversion, the following demonstrates the conversion of an S3 \Robject{FCS} R-object to an S4 \Robject{FCS} R-object: <>= FC.FCSRobj<-convertS3toS4(FC.FCSRobj, myFCSobj.name="FC.FCSRobj", fileName=file.location) @ The \Rfunction{read.series.FCS} function allows to read multiple FCS files. The output is a list of \Robject{FCS} R-object. <>= pathFiles <- system.file("bccrc", package="rfcdmin") drugFiles <- dir(pathFiles) drugData <- read.series.FCS(drugFiles, path=pathFiles, MY.DEBUG=FALSE) @ A \Robject{FCS} R-object has slots \Robject{data} and \Robject{metadata}. The \Robject{data} component is a matrix in which the rows are the individual cells or observations, and the columns are the different immunofluoroescence measurements or variables. The metadata is of S4-class \Rclass{FCSmetadata} and has slots referring to keywords that are in the TEXT segment of the FCS raw binary file. Information such as variable names (\Robject{longnames} and \Robject{shortnames}) and ranges (\Robject{paramranges}) are also slots in the \Robject{metadata} component. For more details, see the help files for \Robject{FCS} and \Robject{FCSmetadata}. \subsubsection{Opening \Rpackage{rflowcyt} data with \Robject{FCS} R-objects} \label{SUBSUBSEC:dataFCSRobj} The \Rpackage{rfcdmin} data package also contains archived \Robject{FCS} R-object: <>= data(VRCmin) data(MC.053min) data(flowcyt.fluors) @ For more details of the \Robject{FCS} R-object available see \Rfunction{data(package="rfcdmin")}. \subsubsection{Other S4 class R-objects} \label{SUBSUBSEC:OtherS4objs} Besides the \Rclass{FCS} class, other S4 class R-objects include \Rclass{FCSmetadata}, \Rclass{FCSsummary}, and \Rclass{FCSgate}. The \Rclass{FCS} class is the class of all FCS files that are read into R using \Rfunction{read.FCS} or \Rfunction{read.series.FCS}. The \Rclass{FCSmetadata} is the class of the \Robject{metadata} slot of an \textit{FCS} R-object. The \textit{FCSsummary} class is the class of the output of the \Rmethod{summary} method implemented on a \Robject{FCS} R-object. The \Rclass{FCSgate} class contains the \Rclass{FCS} class and extends it to include gating information (ie, the information about the indexing of row observations for subsequent extraction). The following is a brief summary of the available, generic methods associated with each class object. \begin{itemize} \item \textbf{"new" Generic Method} Default objects can be made by using the \Rmethod{new(object-contents, S4-class-name)} method, where object-contents refer to the contents of each slot for the specified S4-class-name. The following commands produce default class objects with no slot information: <>= ## default S4 objects new.FCS <- new("FCS") new.FCSmetadata <- new("FCSmetadata") new.FCSsummary <- new("FCSsummary") new.FCSgate <- new("FCSgate") @ \item \textbf{"coercian" Generic Method} Currently, there are only coercian methods to and from the \Rclass{FCS} class and the \Rclass{matrix} and \Rclass{data.frame} classes. A user makes a \Robject{FCS} R-object by the coercian method \Rmethod{as} exemplified in the following code, where data2 is a \Robject{matrix} or \Robject{data.frame} identifying the rows as the cell observations and the columns as the different variables: <>= data2 <- rbind(1:10, 2:11, 3:12) ## coerce data into a matrix object data2.matrix <- as(data2, "matrix") ## coerce data into a data.frame object data2.df <- as.data.frame(data2) ## coercing matrix into FCS test.FCSRobj <- as(data2.matrix, "FCS") ## coercing data.frame into FCS test.FCSobj2 <- as(data2.df, "FCS") ## coercing a FCS object to a matrix original.matrix <- as(test.FCSobj2, "matrix") ## coercing a FCS object to a data.frame original.matrix <- as(test.FCSobj2, "data.frame") ## assigning the metadata metadata <- new("FCSmetadata", size=dim(data2)[1], nparam=dim(data2)[2], fcsinfo=list("comment"="This is a pseudo FCS-R object.")) test.FCSRobj@metadata<-metadata test.FCSRobj @ \item \textbf{"is" Generic Method} The S4 R-object class can be verified by using the \Rmethod{is} method. <>= is(MC.053, "matrix") is(MC.053, "FCS") is(MC.053@metadata, "FCSmetadata") is(MC.053, "FCSgate") @ The \Rclass{FCSsummary} class is exemplified below: <>= sum.FCS <- summary(MC.053) is(sum.FCS, "FCSsummary") @ \end{itemize} \subsection{Tools to access and manipulate \Robject{FCS} R-objects} \label{SUBSEC:DescriptTools} \subsubsection{Descriptive tools for \Rclass{FCSmetadata} class R-objects} \label{SUBSUBSEC:DescriptToolsFCSmetadata} To access or extract the metadata components, use either the \Rfunction{@} or \Rfunction{metaData}. <>= ## returns the same FCSmetadata object meta1<-st.1829@metadata meta1<-metaData(st.1829) @ To get a description of the \Rclass{FCSmetadata} class, use for example the \Rfunction{show}, \Rfunction{print}, and \Rfunction{summary} functions. <>= show(st.1829@metadata) @ The following code would output the \Robject{metadata} as a string and is not shown because of its lengthy output. <>= summary(st.1829@metadata) @ The slots for an \Rclass{FCSmetadata} are summarized in Table~\ref{tab:4}. <>= slotnames<-c( "mode", "size", "nparam", "shortnames", "longnames", "paramranges", "filename", "objectname", "original", "fcsinfo") description<-c("Mode", "number of cells/rows", "number of column parameters", "shortnames of column parameters", "longnames of column parameters", "Ranges/Max value of the columns", "original FCS filename", "name of the current object", "current object original status", "misc.metadata info") table4 <- data.frame(slotnames, description) save(table4, file="table4.Rda") @ <>= if (require(xtable)) { xtable(table4, caption="FCSmetadata slot descriptions", label="tab:4") }else { cat("XTABLE not present; please install to get the right table.") } @ %def xtable.table4 Slots and slot components of the metadata can be retreived by using \Rfunction{@}, \Rfunction{[i]}, or \Rfunction{[[i]]}. Currently to extract metadata information, we can use a single character string index being one of the slotnames in Table ~\ref{tab:4} or one of the slotnames in the \Robject{fcsinfo} slot. In the case that there is a common slotname that is also in the \Robject{fcsinfo} slot, only the slot from Table ~\ref{tab:4} will be retrieved. A single numeric index or a vector of numeric indices refers to only the slot positions of the \Robject{fcsinfo} slot. The following examples extract the column parameter ranges or maximum value. <>= ## extracting the ranges st.1829@metadata@paramranges st.1829@metadata["paramranges"] st.1829@metadata[["$PnR"]] @ A single component of the range can also be retrieved <>= st.1829@metadata[["$P1R"]] @ Items in the \Rclass{FCSmetadata} can be replaced by using \Rfunction{[...]<-} or \Rfunction{[[...]]<-}. The following example will replace the \Robject{longnames} with dummy names. <>= ## longnames before the change st.1829@metadata["longnames"] ## some longname changes st.1829@metadata["longnames"] <- rep("dummy", length(st.1829@metadata["longnames"])) ## name the third column longname as "wrongname" st.1829@metadata["$P3S"] <- "wrongname" ## longnames after the change st.1829@metadata["longnames"] @ To extract and replace slots of the metadata of a \Robject{FCS} object, use only \Rfunction{[[..]]} and \Rfunction{[[...]]<-}, respectively. <>= ## extraction shortnames.1829 <- st.1829[["shortnames"]] shortnames.1829 ##replacement st.1829[["$PnR"]] st.1829[["$P1R"]] <- 0 st.1829[["paramranges"]] st.1829[["newslot"]] st.1829[["newslot"]] <- "this is even cooler" st.1829[["newslot"]] @ When using the replacement method for a \Robject{FCSmetadata} R-object (\textit{i.e.},\Rfunction{[...]<-} or \Rfunction{[[...]]<-}), if the slotname is not found, then a new slot with the current character index is made under the \Robject{fcsinfo} slot. In the following example, we will add a new slot named \Robject{newslot} to the metadata. <>= ## making a newslot st.1829@metadata[["newslot"]]<- "wow this is cool" ## newslot is automatically made in the "fcsinfo" slot st.1829@metadata@fcsinfo[["newslot"]] @ \subsubsection{Descriptive tools for \Rclass{FCS} class R-objects} \label{SUBSUBSEC:DescriptToolsFCS} To access or extract the data components, use either \Rfunction{@} or \Rfunction{fluors}. <>= ## returns the same FCSmetadata object meta1<-st.1829@metadata meta1<-metaData(st.1829) @ <>= ## returns the same data matrix data1<-st.1829@data data1<-fluors(st.1829) summary(data1) @ A set of descriptive tools are attached to the FCS R-object. The method \Rmethod{print} (using an \Robject{FCS} object in its signature) will automatically give a short summary of the \Robject{FCS} R-object without printing out all the contents of the data and the metadata. The following examples are different incantations of the \Rmethod{print} method for FCS objects: <>= print(unst.1829) print(MC.053) @ A longer and more detailed summary with statistics for the column variables can be displayed by using the \Rmethod{summary} method, whose output is a \Rclass{FCSsummary} S4 class. To extract and replace components within the data matrix of a \Robject{FCS} object, use only \Rfunction{[..]} and \Rfunction{[...]<-}, respectively. <>= ## extraction first 10 rows firstten.1829 <- as(st.1829[1:10,], "matrix") firstten.1829 ## etraction of single element firstobs.1829 <- as(st.1829[1,1], "matrix") firstobs.1829 ##replacement of first element st.1829[1,1] <- 99999999 as(st.1829[1,1], "matrix") st.1829[1,1]<-firstobs.1829 as(st.1829[1,1], "matrix") st.1829[1,1] @ Note that the "original" slot within the metadata is only changed to FALSE when the data is changed. Changing the metadata itself will not alter the status of the "original" slot. <>= ## the data was changed so the original flag should be FALSE st.1829[["original"]] @ The function \Rfunction{dim.FCS} retrieves the dimensions of the data matrix of a \Robject{FCS} R-object. <>= dim.1829 <- dim.FCS(st.1829) dim.1829 @ A data parameter column can be appended to the data matrix of a \Robject{FCS} R-object by using the method \Rmethod{addParameter}, which will also result in the change of the "original" metadata slot to be FALSE. <>= column.to.add <- rep(0, dim.1829[1]) st.1829 <-addParameter(st.1829, colvar=column.to.add, shortname="test", longname="example", use.shortname=FALSE) @ \clearpage \section{Data Assessment} \subsection{Checking Validity of the \Robject{FCS} R-object and Fixing errors} \label{SUBSEC:checkfixvars} The method \Rmethod{checkvars} checks the ranges, dimensions, and the column variable names of the data against what is specified in the metadata. If details are not specified in the metadata, then the available information is added to the metadata. The output is a boolean as to whether the object passes the check. The option, \Rargument{MY.DEBUG=TRUE}, allows us to view the checking statments. <>= st.1829.checkstat <- checkvars(st.1829, MY.DEBUG=TRUE) st.1829.checkstat @ Because st.1829 has been altered such that there is a discrepancy between the metadata and the data portions of this \Robject{FCS} object, \Rfunction{fixvars} will be used to correct major errors. <>= if (st.1829.checkstat==FALSE){ ## fix the FCS R object st.1829 <- fixvars(st.1829, MY.DEBUG=TRUE) } @ The original FCS R-object can be retrieved by using the function \Rfunction{get}, if the original object is on the current workspace and has been unchanged. Alternatively, the original FCS R-object can be obtained by reading in the binary, fcs file from the /fcs directory (if this raw binary file exists) of the data package \Rpackage{rfcdmin}. <>= st.1829 <- get(st.1829[["objectname"]]) original.FC.FCSRobj <- read.FCS(FC.FCSRobj[["filename"]], MY.DEBUG=FALSE) @ \subsection{Equality between \Robject{FCS} objects} \label{SUBSEC:FCSequal} Two \Robject{FCS} objects can be checked for equality by using the \Rmethod{equals} method. The default check is to verify the equality of the metadata (except for the \Rargument{filename} and the \Rargument{objectname}) and all the elements of the data. <>= ## default is to not check the equality ## of filenames and objectnames and ## only check the equality of the data and ## the other metadata slots equals(st.1829, unst.1829) @ The \Rargument{check.filename} and \Rargument{check.objectname} set to TRUE will allow the equality verification of the \Robject{filename} and \Robject{objectname} slots in the metadata. <>= ## check equality of everything in the metadata ## and the data of the FCS objects equals(st.1829, st.1829, check.filename=TRUE, check.objectname=TRUE) @ \subsection{Data quality assessment between \Robject{FCS} objects} \label{SUBSEC:FCSassessment} Data assessment is the act of inspecting data, measuring the defects and analyzing the cause (and potentially the impact of those defects). You may have an experiment where one sample has been divided in several aliquots and therefore you should be able to compare some measured parameters like the size (FSC - Forward SCatter) or the granularity (SSC - Side SCatter) of the different aliquots. In order to assess the data quality, we propose 3 graphical methods to explore the data and to detect whether any aliquots or samples were substantially different from the others, in a way that were not likely to be biologically motivated and therefore misleading. \begin{enumerate} \item Boxplot (or box-and-whisker) plot is a graphical representation of dispersions and extreme scores. Represented in this graphic are minimum, maximum, and quartile scores in the form of a box with "whiskers." The box includes the range of scores falling into the middle 50\% of the distribution (median) (Inter Quartile Range = 75 th percentile - 25 th percentile) and the whiskers are lines extended to the minimum and maximum scores in the distribution or to mathematically defined (+/-1.5*IQR) upper and lower fences. Boxplot summarizes the location and the shape of the distribution. For more details, see \Rfunction{boxplot} and \Rfunction{boxplot.FCS} in the \Rpackage{graphics} and\Rpackage{rflowcyt} package, respectively . \item Empirical cumulative distribution function (\Rfunction{ecdf}) reveals differences in the distributions. For more details, see \Rfunction{ecdf} and \Rfunction{plotECDF.FCS} in the \Rpackage{stats} and\Rpackage{rflowcyt} package, respectively. \item Density plot reveals the shape of the distributions.For more details, see \Rfunction{density.lf} and \Rfunction{plotdensity.FCS} in the \Rpackage{locfit} and \Rpackage{rflowcyt} package, respectively. \end{enumerate} As an example, we draw the differents plot for two different datasets from the British Columbia Cancer Research Center, Vancouver (Canada): \begin{enumerate} \item Time course experiment: collection of weekly peripheral blood samples from 1 patient (R object \Robject{flowcyt.fluors} in the \Rpackage{rfcdmin} data package). \item Cell line dataset: Flow Cytometry High Content Screening (FC-HCS) of a 2000-compound library by the mean of one cell line (FCS files from A06 to H06 in the \Rpackage{rfcdmin} data package -in this experiment, the name of the sample corresponds to its position in a 96 wells plates) \citep{gasparetto:smith:2004}. \end{enumerate} For more details, see \Rfunction{help="rfcdmin"}. \subsubsection{Time course experiment} \label{SUBSUBSEC:FCSTimeCourse} This dataset is an abstract of a time couse experiment realised at the British Columbia Cancer Research Center (Thanks to R. Brinkman, C. Smith and M. Gasparetto). It is a collection of weekly peripheral blood samples from 1 patient, divided in 8 aliquots at each time point and labeled by 4 markers to identify 8 differents stains. First, we load the data: <>= require(rfcdmin) data(flowcyt.data) @ Then, we can draw a density plot for the Foward SCatter parameter (\Rargument{varpos}=1) at the time points 1 and 9. \clearpage \begin{figure}[htbp] \centering <>= ## Draw a density plot for the Foward scatter parameter old.par <- par(no.readonly=TRUE) mat <- matrix(c(1:2),1,2,byrow=TRUE) nf <- layout(mat,respect=TRUE) plotdensity.FCS(flowcyt.data[1:8], varpos=c(1), main="FSC density plot at time point 1", ylim=c(0,0.015), ylab="density of cells") legend(450,0.012,paste("stain",c(1:8),sep=""),col=c(1:8),pch=22) plotdensity.FCS(flowcyt.data[65:72], varpos=c(1), main="FSC density plot at time point 9", ylim=c(0,0.015), ylab="density of cells") legend(450,0.012,paste("stain",c(1:8),sep=""),col=c(1:8),pch=22) par(old.par) @ \caption{plotdensity.FCS: Density plots of the forward scatter parameter of the different stains, at different 2 different time points (\Robject{flowcyt.fluors} FCS R object)} \label{fig:plotdensitTimeCourse} \end{figure} At the time point 9, we can see that the data for the first stain looks peculiar. We can also observe the multimodal distribution of the data. \clearpage To minimize the effect of multimodal distribution on the shape of the distributions, we can represent the FSC data \textit{via} a ECDF plot. This representation eventially allows us to confirm the previously observed phenomenon. \begin{figure}[htbp] \centering <>= ##Draw an empirical cumulative density plot for the Foward scatter ##parameter of the different stains at a particular different time point ##(one panel per time point). print(plotECDF.FCS(flowcyt.data, varpos=c(1), var.list=c(paste("time",1:12,sep="")), group.list=paste("Stain",c(1:8),sep=""), main="ECDF of the FSC for different stains at a particular time point", lwd=2, cex=1.5)) @ \caption{plotECDF.FCS: Empirical Cumulative Distribution plot of the forward scatter parameter of the different stains, at different 12 different time points (\Robject{flowcyt.fluors} FCS R object)} \label{fig:plotECDFTimeCourse} \end{figure} \clearpage Finally, we can draw a boxplot for the Foward SCatter parameter (\Rargument{varpos}=1) at different time points (\textit{e.g.} time points 1, 3, 7 and 9) \begin{figure}[htbp] \centering <>= ## Draw a boxplot for the Foward SCatter parameter for the time points 1 ## and 6 (in this experiment, each time point corresponds to a column of ## a 96 wells plates) old.par <- par(no.readonly=TRUE) mat <- matrix(c(1:4),2,2,byrow=TRUE) nf <- layout(mat,respect=TRUE) print(boxplot.FCS(flowcyt.data[1:8], varpos=c(1),col=c(1:8), main="FSC across stains time point 1", names=paste("stain",c(1:8),sep=""))) print( boxplot.FCS(flowcyt.data[17:24], varpos=c(1), col=c(1:8), main="FSC across stains time point 3", names=paste("stain",c(1:8),sep=""))) print( boxplot.FCS(flowcyt.data[49:56], varpos=c(1), col=c(1:8), main="FSC across stains time point 7", names=paste("stain",c(1:8),sep=""))) print( boxplot.FCS(flowcyt.data[65:72], varpos=c(1), col=c(1:8), main="FSC across stains time point 9", names=paste("stain",c(1:8),sep=""))) par(old.par) @ \caption{boxplot.FCS: Boxplot of the Foward SCatter parameter of the different stains at different 4 different time points (\Robject{flowcyt.fluors} FCS R object)} \label{fig:BoxplotTimeCourse} \end{figure} \clearpage The boxplot representations also confirm that, at time point 9, the first stain looks peculiar. Because of the multimodal distribution, the boxplot representation can be criticized but it still give us a good overview of the location of the distribution . For example, we can also see that at time point 1 and 3 the median of the FSC parameter is around 200 and that this parameter falls to 100 at time point 7 (observations not easily made in the density plot and not available in the ECDF representation). \clearpage \subsubsection{Cell line experiment} \label{SUBSUBSEC:FCSCellLine} This dataset is an abstract of a Flow Cytometry High Content Screening (FC-HCS) of a 2000-compound library (the reagent being a cell line) to identify compounds that would enhance the anti-lymphoma activity of the therapeutic antibofy rituximab~\citep{gasparetto:smith:2004}. It is a collection of 8 FCS files. First, we read the 8 FCS files, <>= if (require(rfcdmin)) { ##Obtaining the location of the fcs files in the data pathFiles<-system.file("bccrc", package="rfcdmin") drugFiles<-dir(pathFiles) ##Reading in the FCS files drugData<-read.series.FCS(drugFiles,path=pathFiles,MY.DEBUG=FALSE) ##Extract fluorescent information from the serie of FCS files drug.fluors<-lapply(drugData,fluors) } @ Then, we can draw the different plots for the Foward SCatter parameter (\Rargument{varpos}=1) of the cell line aliquots treated with different compounds. As in the previous example, we can observe some noise in the data. \begin{figure}[htbp] \centering <>= ##Draw a density plot for the Foward SCatter parameter for the ##differents aliquots (of the same cell line) tested with different ##compounds. plotdensity.FCS(drugData, varpos=c(1), main="FSC for aliquots treated with different compounds", ylim=c(0,0.005), ylab="Density of cells") @ \caption{plotdensity.FCS: Density of the Foward SCatter parameter of the cell line aliquots treated with different compounds.} \label{fig:plotDensityCellLine} \end{figure} \clearpage \begin{figure}[htbp] \centering <>= ##Draw a boxplot for the Foward SCatter parameter ##for the differents aliquots (of the same cell line) ##tested with different compounds. print( boxplot.FCS(drugData, varpos=c(1), col=c(1:8), main="FSC of differents aliquots from a cell line treated with different compounds.")) @ \caption{boxplot.FCS: Boxplot of the Foward SCatter parameter of the cell line aliquots treated with different compounds.} \label{fig:plotECDFCellLine} \end{figure} \clearpage \begin{figure}[htbp] \centering <>= ##Draw a empirical cumulative density plot for the Foward scatter ##parameter for the differents aliquots (of the same cell line) ##treated with different compounds. print(plotECDF.FCS(drugData, varpos=c(1), var.list=c("Serie"), group.list=paste("compound",c(1:8),sep=""), main="ECDF for different aliquots treated with diffrent compounds.", lwd=2, cex=1.5)) @ \caption{plotECDF.FCS: Empirical cumulative distribution plot of the Foward SCatter parameter of the cell line aliquots treated with different compounds.} \label{fig:BoxplotCellLine} \end{figure} \clearpage \section{Data Visualizations} \label{SEC:flowcyt-vis-intro} In this section, we include visualization tools that help analyze the multivariate flow cytometry data. Because each cell has multiple immunofluoroescence and light scatter measurements, we have made alternatives to visualize, beyond the ordinary bivariate scatterplots, the cell distributions based on the different measurements. The common approach in the field circumvents the visualization of data on all variables by selecting a subset of "interesting" cells by a sequential progression of 1 and 2 dimensional gating steps. Gating refers to the selection of a region of cells or observations in a bivariate or univariate plot by placing boundaries around the region. These boundaries or thresholds based on a particular immunofluoroescence or light scatter measurement are refered to as \textbf{gates}. The sequence of gating steps is based on certain pairs of measurements or individual measurement, in which the gated region in a previous step is subsequently gated further in the next gating step. First we discuss the bivariate and multivariate plotting tools and then the gating tools. \subsection{Bivariate Plotting Tools} \label{SUBSEC:BivarPlotTools} The basic bivariate plots are the \Rfunction{ContourScatterPlot} with hexgonal binning without contours or rectangular binning with superimposed contour levels and the \Rfunction{parallelCoordinates} plot which is either an \Rfunction{ImageParCoord} or a \Rfunction{JointImageParCoord} plot. \subsubsection{ContourScatterPlot} \label{SUBSUBSEC:ContourScatterPlot} The \Rfunction{plotvar.FCS} has the options of plotting specified variables from an FCS R-object. A univariate histogram or ContourScatterPlot with hexgonal binning or rectangular binning can be shown with the appropriate specified options. Here we will demonstrate with the FCS R object \Robject{unst.1829} the uses of \Rfunction{plotvar.FCS}. \begin{figure}[htbp] \centering <>= plotvar.FCS(unst.1829, varpos=c(1)) @ \caption{plotvar.FCS: Plotting a single variable histogram with the \Robject{unst.1829} FCS R object} \label{fig:plotvarhist} \end{figure} \clearpage \begin{figure}[htbp] \centering <>= plotvar.FCS(unst.1829, varpos=c(3,4), hexbin.CSPlot=FALSE) @ \caption{plotvar.FCS: Plotting a bivariate ContourScatterPlot with rectangular binning with the \Robject{unst.1829} FCS R object} \label{fig:plotvarrectbin} \end{figure} The function \Rfunction{ContourScatterPlot} will make an image plot using rectangular bins of counts produced by the function \Rfunction{make.grid} by default. Also by default, there are superimposed contour levels that are also drawn on the plot with rectangular image binning. The \Rfunction{make.grid} function is used by the \Rfunction{ContourScatterPlot} function to make an count matrix for the number of observations in a two-dimensional grid layout. This function will output a matrix of counts ("z") as well as the total number of observations ("n.cells") within this matrix. The count matrix for the image plot has 25 unit cut-offs and can be changed by the \Rargument{x.grid} and \Rargument{y.grid} options. Alternatively, if there is a \Rargument{status} or binary response variable for the data, other values such as the difference in counts, proportions, normalized proportions, and z statistics can be calculated by \Rfunction{make.density} for the rectangular bins of the image plot. Currently, a roughly estimated color legend is available for this rectangular binning with the \Rfunction{legend.CSP} function. Alternatively, however, there is an option for hexgonal binning with an appropriate legend. Note that the Bioconductor \Rpackage{hexbin} package is necessary for this plot option. The hexagonal binning does not have superimposed contour levels nor does it have the option to estimate other values besides counts in its bins. We will demonstrate the use of \Rfunction{ContourScatterPlot} to make the same plots exemplified earlier with \Rfunction{plotvar.FCS}. These plots are not shown. The following code extracts the third and the fourth column variables of the FCS R object \Robject{unst.1829}. <>= ## obtain the two column variables xvar<-as(unst.1829[,3], "matrix") yvar<-as(unst.1829[,4], "matrix") @ The \Rfunction{ContourScatterPlot} function is implemented to make a plot with hexgonal binning and a legend. Other parameters such as binning style and number of bins can also be specified in the signature. <>= ## hexagon cells without contour lines; default n.hexbins=100 ContourScatterPlot(xvar, yvar, xlab=unst.1829[["longnames"]][3], ylab=unst.1829[["longnames"]][4], main="Individual unst.1829", hexbin.plotted=TRUE) @ A plot can be made that has rectangular binning. The color of the image map (via the \Rargument{image.col} option) can be changed as well as the size of the rectangular bins by \Rargument{x.grid} and \Rargument{y.grid} options. A legend can be displayed in a separate plot by setting the option \Rargument{plot.legend.CSP} = TRUE. <>= ## rectangular cells with the contour plot ContourScatterPlot(xvar, yvar, xlab = unst.1829[["longnames"]][3], ylab = unst.1829[["longnames"]][4], main = "Individual 042402c1.053", hexbin.plotted = FALSE, numlev = 25, image.col = heat.colors(15)) @ \subsection{Multivariate Plots} \label{SUBSEC:multivarplot} The FCS R-object can be plotted using the generic \Rfunction{plot.FCS} or \Rfunction{plot} command which will make a pairs plot (by default) or a parallel coordinates plot. Here we show a default pairs plot using rectangular binning : \begin{figure}[htbp] \centering <>= ## should be able to implement because it is a pairsplot print(plot(unst.1829)) @ \caption{unst.1829: Default Pairs plotting with rectangular bins} \label{fig:PlotFCSobj} \end{figure} The same plot can be made using hexagonal binning; the code is shown, but the plot will not be displayed. \textbf{This is currently broken.} <>= ## plot(st.1829, alternate.hexbinplot=TRUE) @ Additional parameters for the pairsplot of a data matrix can be referenced by the \Rfunction{pairs.CSP} function. Currently a color legend can be plotted in the lower panels for \Rfunction{pairs.CSP} only for the rectangular binning. There is currently no legend available for pairs.CSP using hexagonal binning. The parallel coordinates plot tracks each observation whose value is plotted on the vertical, y-axis through a series of variables on the horizontal, x-axis. The observation is tracked by a line from one variable to the next. The order of the column variables on the horizontal axis is the order that is presented in the input data matrix. Here we make a parallel coordinates plot for the data portion of the \Robject{st.1829} FCS R-object. Because there are too many cell or row observations, we only show the first 10 observations in this parallel coordinates plot. \begin{figure}[htbp] \centering <>= par(mfrow=c(1,1)) row.obs<-1:10 parallelCoordinates(as(unst.1829[row.obs,], "matrix")) @ \caption{Parallel Coordinates plot of the first ten observations in the data of unst.1829.} \label{fig:ParallelCoordinatesPlot} \end{figure} It is important to note that all column variables in this plot must have the same range and scaling. We can force scaling on a [0,1] scale by using the option \Rargument{scaled} set to TRUE. We can also give group certain observations by color (\Rargument{group.col}), type (\Rargument{group.lty}), and width (\Rargument{group.lwd}) of line. New observations can also be added at a time by setting \Rargument{superimpose} to be TRUE or by using the function \Rfunction{add.parallelCoordinates}. The following example shows these other options: \begin{figure}[htbp] \centering <>= row.obs<-1:10 parallelCoordinates(as(unst.1829[row.obs,], "matrix"), scaled=TRUE, group=c(rep(1, 5), rep(2, 5))) @ \caption{Scaled Parallel Coordinates plot of the first ten observations in the data of unst.1829, where the first 5 observations are in one group, and the next five observations are in the second group.} \label{fig:ParallelCoordinatesPlot2} \end{figure} Because there are many cell or row observations, an ImageParCoord or JointImageParCoord plot can be used to show all of the row observations by binning on the y-axis and having the different column variables as labels on the x-axis. There are superimposed parallelCoordinates lines on the colored binning that demonstrate the movement of observations from one bin of one variable to another bin of the next variable. In an \Robject{ImageParCoord}, these lines represent moves only between two adjacent variables, and in a JointImageParCoord, the lines represent movement among all of the variables. The plots are subject to change with the ordering of the column variables as labels on the x-axis of the plots. Additional parallelCoordinates lines can be added to any existing plot using the \Rfunction{add.parallelCoordinates} function. The following series of graphs exemplify the Image parallelCoordinates plots. Only the first 5 column variables and the first 1000 observations will be shown. \begin{figure}[htbp] \centering <>= ## need to separate legend plotting output1<-ImageParCoord(unst.1829@data[1:1000, 1:5], num.bins=16, title="1000 obs 16 bins 5 trans", ntrans=5, legend.plotted=FALSE, plotted=TRUE, image.plotted=TRUE, lines.plotted=TRUE, MY.DEBUG=FALSE) @ \caption{This plot is an Image Parallel Coordinates plot of the first 1000 observations and the first 5 column variables in the "data" of unst.1829.} \label{fig:ImageParCoordPlot} \end{figure} \clearpage The functions \Rfunction{ImageParCoord} and \Rfunction{JointImageParCoord} can also plot histograms and traditional parallel coordinates plots as diagnostics in addition to or separately from the image parallel coordinates plots when the option \Rargument{MY.DEBUG=TRUE}. <>= ## need to separate legend plotting output3<-JointImageParCoord(unst.1829@data[1:1000,1:5], num.bins=16, title="1000 obs 16 bins 5 trans", ntrans=5, legend.plotted=FALSE, MY.DEBUG=FALSE) @ \subsection{Dynamic Plotting Tools} \label{SEC:DynamPlot} Another multi-dimensional tool is \Rfunction{xgobi.FCS} which uses the \Rpackage{xgobi} library. We will leave the example for the user because the tool is interactive. Generally, by default \Rfunction{xgobi.FCS} will show the first 15 observations across all variables in the input data of the FCS R-object in a high-level multi-dimensional plot, in which the user is able to shift among sets of variables, color certain observations, and rotate visual perspectives of these observations amongst these variables. The function \Rfunction{xgobi.FCS} allows the user to input the FCS R-object, subset amongst the row observations, and subset amongst the column variables to show in an \Rpackage{xgobi} plot. Currently \Robject{ggobi} S4 objects are still being contructed and would extend \Rpackage{xgobi} with more dynamic plotting and subsetting features. The example code for the S3 \Rfunction{xgobi.FCS} is shown below but is left for the user to run separately. By default, only the first 15 rows and half of the column variables are shown. If \Rargument{subset.row} and \Rargument{subset.col} are specified, then these rows and columns will be displayed for the user to view interactively. In the second example, the first 6000 rows with the first 2 column variables are shown. <>= ## plots first 1/15 rows ## plots first 1/2 columns xgobi.FCS(unst.1829, title="unst.1829 default subset") ## plots all the rows ## plots only the first 3 columns xgobi.FCS(unst.1829, subset.row=1:6000, subset.col=1:2, title="unst.1829: 6000 rows, 2 vars") @ \section{Gating} \label{SEC:gating} <>= slotnames<-c("gate", "history", "extractGatedData.msg", "current.data.obs", "data", "metadata") description<-c("matrix of column indices for row selection", "vector of strings describing columns in gate", "vector of strings describing extraction of the data", "vector of the original row positions in current data", "matrix of column variables for rows denoting cells", "FCSmetadata object") table5 <- data.frame(slotnames, description) save(table5, file="table5.Rda") @ <>= if (require(xtable)) { xtable(table5, caption="FCSgate slot descriptions", label="tab:5") } else { cat("XTABLE not present; please install to get the right table.") } @ %def xtable.table5 <>= slotnames<-c("uniscut", "bipcut", "bidcut", "biscut", "biscut.quadrant", "") description<-c("univariate single cut", "bivariate polygonal cut", "bivariate double cut", "bivariate single cut", "values denoting the quadrant to be selected", "$+$/$+$, $+$/$-$, $-$/$-$, $+$/$-$") table6 <- data.frame(slotnames, description) save(table6, file="table6.Rda") @ <>= if (require(xtable)) { xtable(table6, caption="Types of Gating", label="tab:6") } else { cat("XTABLE not present; please install to get the right table.") } @ %def xtable.table6 <>= slotnames<-c("gateNum", "gateName", "type", "biscut.quadrant", "data.colpos", "data.colnames", "IndexValue.In", "gatingrange", "prev.gateNum", "prev.gateName", "comment") description<-c("column position in 'gate' matrix", "name of gate index", "type of gating", "quadrant selected, if gating type is 'biscut'", "'data' column variable positions used in gating", "'data' names of the column variables used in gating", "value of the gating index denoting inclusion", "vector of gating thresholds", "gateNum of previous gating, if any", "gateName of previous gating, if any", "comment by user for this gating index") table7 <- data.frame(slotnames, description) save(table7, file="table7.Rda") @ <>= if (require(xtable)) { xtable(table7, caption=paste("Description of 'extractGateHistory' output:", "Gating Details", sep =" "), label="tab:7") } else { cat("XTABLE not present; please install to get the right table.") } @ %def xtable.table7 The \Rclass{FCSgate} class extends the S4 FCS class. The slots of the S4 FCSgate class are summarized in Table ~\ref{tab:5}. There are three aspects to gating that are summarized below: \begin{description} \item[Create Gating Index] Initially, a gating index will be created. This binary index will denote the selection of row observations in the data and will be appended as a column to the gate matrix. The extension of the FCS object to a FCSgate object results from the S4 methods \Rfunction{createGate} and \Rfunction{icreateGate}, an interactive method with user prompts for option values. Table ~\ref{tab:6} summarizes the types of gates or cuts that can be used to select the data. Currently, there are only gates involving one (\textit{i.e.} univariate) or two (ie, bivariate) column variables of the data. A single or double cut refers to the number of thresholds for each variable. For an example, if there is a bidcut, then there are two thresholds for each of the two variables. The group of observations lying within these bivariate thresholds are chosen. In the bivariate polygonal cut "bipcut", the selection ranges describe a polygonal shape which could be a square or any other closed linear shape description. \item[Extract Gated Data] In order to collapse the data given the row selection index, the method \Rmethod{extractGatedData} will subset the data according to a specific value of the selection index (\textit{i.e.} \Rargument{IndexValue.In}) and to a particular column in the gate matrix. Information about the extraction will be updated in the corresponding element of the \Robject{extractGatedData.msg} vector. The metadata will also be updated in terms of row size and the "original" flag will be set as FALSE. The "current.data.obs" will also be subset according to the selection index. In summary, the S4 method \Rmethod{extractGatedData} handles data collapsing with a corresponding row selection index of a FCSgate class object. \item[Extract Gating Information] The \Rfunction{extractGateHistory} will output a list of values and details of a particular gating index. Table ~\ref{tab:7} summarizes the descriptions of the gating information that is extracted. \end{description} The following subsections exemplify the creation of a gating or selection binary index, the extraction or subsetting of the data using this newly created gating index, the extraction of gating details, a description of bivariate gating schemes, and other gating functions for high-dimensional plots. See ~\ref{SUBSEC:ProbBin} for details about subsequent analyses after gating \citep{roe:hardy:2001}. \subsection{Creating Gate Index} \label{SUBSEC:createGate} Using \Rfunction{createGate} or the interactive \Rfunction{icreateGate} will result in a binary index that will be appended to the gate matrix. We will use the FCS R-object \Robject{unst.1829} for a following demonstration of gating. First a bivariate double cut gate will be implemented and will capture the observations between 300 and 600 of the FSC-Height, first column variable of data, and the Side Scatter, second column variable of data. <>= gate.range.x <- c(300,600) gate.range.y <- c(300, 600) unst.1829.gate1 <- createGate(unst.1829, varpos=c(1,2), gatingrange=c(gate.range.x, gate.range.y), type="bidcut", comment="first gate") @ In order to see the gate, we use \Rfunction{plotvar.FCS} and \Rfunction{showgate.FCS}. Currently, the \Rfunction{showgate.FCS} does not work with \Rfunction{plotvar.FCS} with the\Rargument{hexbin.CSPlot=TRUE} option. The following is a hexbin \Robject{ContourScatterPlot} of the complete data before extraction on the created gate. Note that the gating thresholds are not shown. <>= par(mfrow=c(1,1)) data.vars<-1:2 plotvar.FCS(unst.1829.gate1, varpos=data.vars, plotType="ContourScatterPlot", hexbin.CSPlot=TRUE) @ (Again, Sweave errors cause the above not to work here). The gate for the can be shown with the original data with the following code: <>= data.vars<-1:2 plotvar.FCS(unst.1829.gate1, varpos=data.vars, plotType="ContourScatterPlot", hexbin.CSPlot=FALSE) showgate.FCS(unst.1829.gate1@data[,data.vars], gatingrange= c(gate.range.x, gate.range.y), Index = unst.1829.gate1@gate[,1], type="bidcut", pchtype=".") @ Alternatively, the corresponding \Rfunction{icreateGate} could be implemented that would make a plot and prompt the user for information about the type of gate desired. If parameters such as the type of gate and the gatingrange are known before looking at the data, these options can be input into\Rfunction{icreateGate}, and the plot will be shown. The following plot and implementation describes the use of setting a univariate single cut gate for selection of cells that are $\geq 500$ in value for the 4th data column variable from those selected by the first gate. The previous gate is the first column of gate and the selection value is 1 (\textit{i.e.} $prev.gateNum=1$ and $prev.IndexValue.In=1$). Setting \Rargument{prompt.all.options} to FALSE will surpress other interactive prompts for the title and gating color of the plot. \begin{figure}[htbp] \centering <>= unst.1829.gate2 <- icreateGate(unst.1829.gate1, varpos=4, gatingrange=500, type="uniscut", prev.gateNum=1, prev.IndexValue.In=1, comment="", MY.DEBUG=FALSE, prompt.all.options=FALSE) @ \caption{unst.1829: The gating index for fourth column variable of the data is shown. The row observations beyond the vertical gate of 500 of uniscut are selected with an \Rargument{IndexValue.In=1}.} \label{fig:showGate2} \end{figure} For a completely interactive gating session, the user can implement \Rfunction{icreateGate} on a \Robject{FCS} R-object and input all plotting and gating options after each prompt. \subsection{Data Extraction from Gate Index} \label{SUBSEC:extractGatedData} The extraction or row subsetting of the data matrix corresponding to a gating index is implemented by \Rfunction{extractGatedData}. The following extraction the data will use the first gating index (\textit{i.e.} the first column of the gate matrix specified with \Rargument{gateNum=1}) and the selection value of 1 (\textit{i.e.} selection of observations with \Rargument{IndexValue.In=1}). <>= unst.1829.subset1.1 <- extractGatedData(unst.1829.gate2, gateNum = 1, IndexValue.In = 1, MY.DEBUG = FALSE) unst.1829.subset1.2 <- extractGatedData(unst.1829.gate1, gateNum=1, IndexValue.In=1, MY.DEBUG=FALSE) @ Both the \Robject{unst.1829.gate1} and \Robject{unst.1829.gate2} are \Robject{FCSgate} objects with the same data but different gate matrices. The generic method \Rmethod{equals} will only evaluate the equality of the \Robject{data} and \Robject{metadata} slots and not of the gate matrix for \Robject{FCSgate} objects. <>= equals(unst.1829.subset1.1, unst.1829.subset1.2, check.filename=FALSE, check.objectname=FALSE) @ Extraction using the second column index of the gate matrix (\textit{i.e.} \Rargument{gateNum=2}) and selecting those with \Rargument{IndexValue.In=1} could be implemented on either a previously extracted \Robject{FCSgate} object or the \Robject{FCSgate} object without extraction. The output \Robject{unst.1829.subset.2.1} and \Robject{unst.1829.subset2.2} should have the same \Robject{data} and \Robject{metadata} slots evaluated by \Rfunction{equals}. <>= unst.1829.subset2.1 <- extractGatedData(unst.1829.subset1.1, gateNum = 2, IndexValue.In = 1, MY.DEBUG = FALSE) unst.1829.subset2.2 <- extractGatedData(unst.1829.gate2, gateNum = 2, IndexValue.In = 1, MY.DEBUG = FALSE) equals(unst.1829.subset2.1, unst.1829.subset2.2, check.filename=FALSE, check.objectname=FALSE) @ \subsection{Extraction of Gating Details from "history"} \label{SUBSEC:extractGateHistory} The use of \Rfunction{extractGateHistory} extracts information for a particular gate index. The list output provides an easy way to access the information that can be used as input for the functions \Rfunction{createGate}, \Rfunction{icreateGate}, and \Rfunction{extractGatedData} in subsequent gating implementations. The extraction of gating information before gated data extraction is shown in the for gates 1 and 2. <>= info.gate1 <- extractGateHistory(unst.1829.gate2, gateNum=1) info.gate1 info.gate2 <- extractGateHistory(unst.1829.gate2, gateNum=2) info.gate2 @ The extraction of gating information after implementing "extractGatedData" provides the following output for gates 1 and 2, respectively: <>= info.gate1.1 <- extractGateHistory(unst.1829.subset2.1, gateNum=1) info.gate1.1 info.gate2.1 <- extractGateHistory(unst.1829.subset2.1, gateNum=2) info.gate2.1 @ Suppose the next gate is a bivariate double cut on the 5th and 6th column variables of the "data" matrix. If this gate is implemented from the previous first gate, then this extracted information \Robject{info.gate1} is used as well as \Robject{info.gate1.1} to identify the previous gating information (\textit{i.e.} \Robject{previous.gateNum} and \Robject{previous.IndexValue.In} in the example). <>= gate.range.x <- c(200, 300) gate.range.y <- c(100, 500) previous.gateNum <- info.gate1$gateNum previous.IndexValue.In <-info.gate1$InexValue.In unst.1829.gate3 <- createGate(unst.1829.gate2, varpos = c(1,2), gatingrange = c(gate.range.x, gate.range.y), type="bidcut", prev.gateNum = previous.gateNum, prev.IndexValue.In = previous.IndexValue.In, comment="first gate") extractGateHistory(unst.1829.gate3, gateNum=3) @ Subsequent data extraction can be made on the FCSgate object \Robject{unst.1829.gate3} using \Rfunction{extractGatedData} given a particular gate index column in the gate matrix. \subsection{Gating Schemes} \label{SUBSEC:GatingSchemes} The \Rfunction{FHCRC.HTVNFCS} and the \Rfunction{VRC.HVTNFCS} are functions that implement \Rfunction{icreateGate} and \Rfunction{extractGatedData} as example gating procedures \citep{roe:hardy:2001}. The user will be prompted for gating and plotting input with the following examples and associated FCS R objects (shown and not demonstrated). <>= MC.053.gt <- FHCRC.HVTNFCS(MC.053) MC.054.gt <- FHCRC.HVTNFCS(MC.054) MC.055.gt <- FHCRC.HVTNFCS(MC.055) st.1829.gt <- VRC.HVTNFCS(st.1829) unst.1829.gt <- VRC.HVTNFCS(unst.1829) st.DRT.gt <- VRC.HVTNFCS(st.DRT) unst.DRT.gt <- VRC.HVTNFCS(unst.DRT) @ If the user decides to implement one of the example gating schemes on his or her own FCS R object, the column variable positions can be adjusted for each gate implementation such that the variables to be gated may remain the same. The following example shows that for gate 2, column variable positions 7 and 5 refer to cd3 and cd8, respectively for that data matrix of \Robject{MC.053}, the \Robject{FCS} object to be gated. Likewise, column variable positions that correspond to cd69 and INFgamma are 4 and 3. <>= data(MC.053min) MC.053[["longnames"]] FHCRC.HVTNFCS(MC.053, gate2.vars=c(7,5), gate3.vars=c(4,3)) @ \subsection{Other Image Gating} \label{SUBSEC:OtherImageGating} There are other gating procedures that can be implemented on high-dimensional plots. The \Rfunction{gate.IPC} interactive function allows the user to click on upper and lower bin boundaries for a particular variable to subset. The subsequent graphs represent this subset of points that move from one variable to the next. The following code will be left for the user to implement as an exercise. <>= st.DRT2 <- st.DRT st.DRT2@data <- st.DRT@data[1:1000,] gate.IPC(st.DRT2, 3, hist.plotted=FALSE, image.plotted=TRUE, para.plotted=FALSE, lines.plotted=TRUE, MY.DEBUG=FALSE) @ Currently, there is still work in progress to gate on the dynamic plots \Rfunction{ggobi} and \Rfunction{xgobi}. See Section ~\ref{SEC:DynamPlot} for basic plotting usage. \section{Exploratory Data Analysis} \label{SEC:expl-data-analys} The user may decide to use more qualitative means to investigate the data. The Patient Rule Induction Method (PRIM) allows the extraction of \textbf{rules} defined as subsets that maximizes or minimizes a target function which is usually specified as the mean of a binary label \citep{fried:fish:1998}. In the flow cytometry setting, this target function is the mean of binary HIV-protein stimulated (Y=1) or unstimulated status (Y=0) for a particular immunofluoroescence data subset or box, which ultimately estimates a rule through iterative trimmings of the box in the greedy, top-down Peeling Step and iterative additions into the box during the patient Expansion Step. A Cross-Validation Step implements the same Peeling and Expansion Steps on Testdata Sets. Hence, the estimated rules aim at finding distributional differences between the HIV-protein stimulated and unstimulated cells in a multi-dimensional setting where many different immunofluoroescence measurements are made on the same sample of cells from an individual in an HIV vaccine trial. Again, the results of PRIM are only exploratory because it is a qualitative process that needs subjective, sound judgments to arrive at conclusions for each step of PRIM. PRIM is regarded as a tool for hypothesis generation rather than for inference. Please refer to the "PRIM.pdf" manual in the \Rpackage{rfcprim} package for details regarding the functions used on the \Robject{data} component of the FCS R-objects. \clearpage \section{Flow Cytometry Statistical Testing and Inference} \label{SEC:flowcyt-test-intro} The testing tools in this section are used to evaluate differences between HIV-protein stimulated and unstimulated scenarios, particularly in the IFNgamma measurement after gating described by \cite{roe:hardy:2001}. Each subsection describes particular tests that are implemented by \textbf{runflowcytests} and other functions. \subsection{Probability Binning} \label{SUBSEC:ProbBin} The current S3-class object \Robject{ProbBin.FCS} describes the equal probability binning of a univariate, immunofluoroescence measurement (usually of IFN-gamma) after the implementation of a series of gating schemes across different immunofluorescence measurements. \textit{Equal probability binning} ensures that there are equal number of observations, \textbf{N}, within a bin across all bins constructed by cut-offs or integer breakpoints of the immunofluorescence measurement. The final bin may contain more or less than \textbf{N}, the pre-specified number within each bin. The function, \Rfunction{breakpoints.ProbBin.FCS}, makes the breakpoints or cut-offs for equal probability binning in two ways: \begin{description} \item[combined] based on the combination of the univariate distributions (usually of INF-gamma) of both the HIV-protein stimulated and unstimulated samples of cells \item[by.control] based on only the unstimulated HIV-protein sample. These breakpoints are then used to make histogram objects from both the HIV-protein stimulated and unstimulated cell samples from an individual \citep{roe:hardy:2001}. \end{description} <>= slotnames<-c("unst.hist", "st.hist", "PB", "N.in.bin", "varname") description<-c("unstimulated histogram", "stimulated histogram", "'combined'/'by.control'", "number per bin for cut-off construction", "name of distribution/variable") table8 <- data.frame(slotnames, description) save(table8, file="table8.Rda") @ <>= if (require(xtable)) { xtable(table8, caption=paste("Description of 'ProbBin.FCS' S3 list", "output", sep =" "), label="tab:8") } else { cat("XTABLE not present; please install to get the right table.") } @ %def xtable.table8 The \Rfunction{ProbBin.FCS} object is a S3 list of the following components in Table ~\ref{tab:8} We will construct two gated objects as described in Section ~\ref{SEC:gating}. The stimulated gated object is \Robject{st.DRT.gt} and the unstimulated gated object is \Robject{unst.DRT.gt}. Here we will only gate on the bivariate double cut that extracts the lymphocytes from the Forward Scatter and Side Scatter measurements. Then we will extract the "IFN-gamma" measurment from each sample and then construct a ProbBin.FCS object. The following implements a "bidcut" gate and plots the image with the gate. \begin{figure}[htbp] \centering <>= unst.DRT.gt <- icreateGate(unst.DRT, varpos=c(1,2), gatingrange=c(300,650, 300, 500), type="bidcut", comment="", MY.DEBUG=FALSE, prompt.all.options=FALSE) @ \caption{unst.DRT.gt: The gating index for first two column variables of the data is shown for the selection of the central cluster of lymphocytes. The colored points in the center of the bidcut are selected with an \Rargument{IndexValue.In = 1}.} \label{fig:FCSicreateGate1} \end{figure} \begin{figure}[htbp] \centering <>= st.DRT.gt <- icreateGate(st.DRT, varpos=c(1,2), gatingrange=c(300,650, 300, 500), type="bidcut", comment="", MY.DEBUG=FALSE, prompt.all.options=FALSE) @ \caption{st.DRT.gt: The gating index for first two column variables of the data is shown for the selection of the central cluster of lymphocytes. The colored points in the center of the bidcut are selected with an \Rargument{IndexValue.In=1.}} \label{fig:FCScreateGate2} \end{figure} \begin{figure}[htbp] \centering <>= unst.DRT.gt <- icreateGate(unst.DRT.gt, varpos=c(7,5), gatingrange=c(500,1024, 0, 1024), type="bidcut", prev.gateNum=1, prev.IndexValue.In=1, comment="", MY.DEBUG=FALSE, prompt.all.options=FALSE) @ \caption{unst.DRT.gt: The gating index for 7th and 5th column variables of the data is shown for the selection of cd3+ cells based on the previous gating and selection of lymphocytes (\textit{i.e.}\Rargument{prev.gateNum=1}, \Rargument{prev.IndexValue.In=1}). The colored points of the bidcut gate are selected with an \Rargument{IndexValue.In = 1}.} \label{fig:FCSicreateGate3} \end{figure} \begin{figure}[htbp] \centering <>= st.DRT.gt <- icreateGate(st.DRT.gt, varpos=c(7,5), gatingrange=c(500,1024, 0, 1024), type="bidcut", prev.gateNum=1, prev.IndexValue.In=1, comment="", MY.DEBUG=FALSE, prompt.all.options=FALSE) @ \caption{st.DRT.gt: The gating index for the 7th and 5th column variables of the data is shown for the selection of cd3+ based on the previous gating and selection of lymphocytes (ie, \Rargument{prev.gateNum=1}, \Rargument{prev.IndexValue.In = 1}). The colored points the bidcut gate are selected with an \Rargument{IndexValue.In = 1}.} \label{fig:FCSicreateGate4} \end{figure} We could choose to implement subsequent gates; each gate that is dependent on the selection of a previous gate. We leave further gating as an exercise for the user. Below is an extraction of the data from the cd3+ lymphocytes (\textit{i.e.} from the second gate of cd3+ cells based on the selection of lymphocytes in the first gate). <>= unst.DRT.ex <- extractGatedData(unst.DRT.gt, gateNum=2) st.DRT.ex <- extractGatedData(st.DRT.gt, gateNum=2) @ We decide to analyze the IFN-gamma distribution among the selected cells. We obtain this measurement, IFN.unst and IFN.st, from the HIV-protein unstimulated and stimulated samples of individual DRT, respectively. <>= IFN.unst <- unlist(as(unst.DRT.ex[,4], "matrix")) IFN.st <- unlist(as(st.DRT.ex[,4], "matrix")) @ These two distributions are used to implement probability binning \Rargument{by.control} with 100 observations in each bin based on the control, unstimulated group: <>= PB.by.control <- ProbBin.FCS(IFN.unst, IFN.st, 100, varname=unst.DRT[["longnames"]][4], PBspec="by.control", MY.DEBUG=FALSE) @ Alternatively, these two IFN distributions could have been used to implement probability binning constructed by the \Rargument{combined} data having 100 observations in each bin: <>= PB.combined <- ProbBin.FCS(IFN.unst, IFN.st, 100, varname=unst.DRT[["longnames"]][4], PBspec="combined", MY.DEBUG=FALSE) @ To verify the \Robject{ProbBin.FCS} class objects, the following code using \Rmethod{is} can be used: <>= is(PB.by.control, "ProbBin.FCS") is(PB.combined, "ProbBin.FCS") @ We show the following \Robject{ProbBin.FCS} plots of the \Robject{PB.by.control}" object. \begin{figure}[htbp] \centering <>= plot(PB.by.control, plots.made="unstimulated", freq=TRUE) @ \caption{PB.by.control: The histogram shows the equal probability that was implemented on the unstimulated or control IFNgamma data. Here the counts in each bin are about 100} \label{fig:plotProbBinFCSunstimul} \end{figure} \begin{figure}[htbp] \centering <>= plot(PB.by.control, plots.made="stimulated", freq=TRUE) @ \caption{PB.by.control: The histogram shows the equal probability that was implemented on the unstimulated or control IFNgamma data of which whose breaks are applied to the stimulated data (which is shown in the above histogram). Here the counts in each bin can be shown setting the options freq=TRUE and labels=TRUE, which will prompt a warning because the binning is not equidistant.} \label{fig:plotProbBinFCSstimul} \end{figure} The statistics associated with testing the two distributions for differences, assuming the null of no difference between the stimulated and unstimulated samples can be referenced in \citep{roe:tre:moore:her:2001, baggerly:2001}. The summary of a \Robject{ProbBin.FCS} object will produce statistics that test the difference between the distributions of the stimulated and unstimulated samples. See Section ~\ref{SUBSEC:testing:diff}. <>= summary(PB.by.control) summary(PB.combined) @ \subsection{Testing for the difference between two univariate distributions} \label{SUBSEC:testing:diff} This section describes the tools used to test for the difference between the HIV-protein stimulated sample and the HIV-protein unstimulated sample in terms of the distribution of an immunofluoroescence measurement and, in particular, of the IFN-gamma measurement. There have been four main testing approaches that are outlined belowed. The null hypothesis is the assumption that both samples originate from the same distribution (\textit{i.e.}, there is no difference in two distributions), and the alternative is that they are from different distributions (\textit{i.e.}, the stimulated scenario compared to the unstimulated scenario are different in terms of cell densities). \begin{description} \item[WLR.flowcytest] The weighted log rank test (by default when rho=0) tests the difference in survival curves of the stimulated and unstimulated scenarios when all measurements are regarded as having the "event" and "time" is considered to be the IFN-gamma or other immunofluorescence measurement. Thus, at every point on the immunofluorescence, the curves are tested for differences. A plot of the survival curves for both samples is also optionally output. \item[KS.flowcytest] Kolmogorov-Smirnoff test also evaluates the difference in distributions for the control and the stimulated samples, but may be more sensitive and result in a higher false positive rate when there are a larger number of data points. \item[ProbBin.flowcytest] Statistics proposed by Keith A. Baggerly and Mario Roederer include Chi-squared and Normal tests for the PB metric via probability binning (both based on the control data only ("by.control") and based on the combined dataset of both the stimulated and the control samples ("combined") \citep{roe:tre:moore:her:2001, baggerly:2001}. \item[pkci2.flowcytest] The method, proposed by Zoe Moodie, PhD, tests the difference of the upper tails of the two distributions rather than the range of the distribution for IFN-gamma or other univariate immunofluorescence measurement. \item[runflowcytests] This function will run all of the aforementioned tests either separately or together in one call. \end{description} As a single example implementing all of the testing tools, we will only demonstrate the testing with the \Rfunction{runflowcytests}. Further documentation for each individual test can be obtained in the help documentation for the following tests: \Rfunction{WLR.flowcytest}, \Rfunction{KS.flowcytest}, \Rfunction{ProbBin.flowcytest}, \Rfunction{pkci2.flowcytest}. Please note that \Rfunction{ProbBin.flowcytest} provides the same statistical output as \Rfunction{summary.ProbBin.FCS}. <>= output.runflowcytests <- runflowcytests(IFN.unst, IFN.st, KS.plotted=FALSE, WLR.plotted=FALSE, PBobj.plotted=FALSE) @ The plots and output for the \Rfunction{KS.flowcytest} and the \Rfunction{WLR.flowcytest} are shown with the code on the following pages. The plots for the \Rfunction{ProbBin.flowcytest} is similar to those shown in Figure ~\ref{fig:plotProbBinFCSunstimul} and Figure ~\ref{fig:plotProbBinFCSstimul}. \begin{figure}[htbp] \centering <>= output.KSflowcytest <- KS.flowcytest(IFN.unst, IFN.st, KS.plotted=TRUE, MY.DEBUG=FALSE) @ \caption{KS.flowcytest plot shows the distributions of the stimulated and unstimulated samples.} \label{fig:KSflowcytestPlot} \end{figure} \clearpage \begin{figure}[htbp] \centering <>= output.WLRflowcytest <- WLR.flowcytest(IFN.unst, IFN.st, WLR.plotted=TRUE, MY.DEBUG=FALSE) @ \caption{WLR.flowcytest plot shows the survival curves for the two distributions if every data point was regarded as being an event, and time was regarded as the IFN-gamma measurement.} \label{fig:WLRflowcytestPlot} \end{figure} \subsection{ROC curves for testing tails of two distributions} \label{SUBSEC:testing-ROC} For each individual there is a pair of data corresponding to a HIV-protein stimulated sample and a HIV-protein unstimulated/control sample. For each individual who is either HIV-positive or negative, the 99.9-th percentile for the unstimulated sample and the percent positive for the stimulated sample based on this control-based 99.9-th percentile was calculated. Here we exemplify the calculations for the \Robject{IFN.st} and the \Robject{IFN.unst} obtained from the gating for the HIV-negative individual 1829. First, using \Rfunction{percentile.FCS}, we obtain the 99.9-th percentile based on the control, unstimulated sample. <>= unst.percentile <- percentile.FCS(IFN.unst, percent=0.999) @ Now using \Rfunction{PercentPos.FCS}, we obtain the percent positives for both the unstimulated and the stimulated samples, respectively, using the \Rargument{unst.percentile}. Note that the percent positive for the control sample is about 1 - 0.999. <>= PercentPos.FCS(IFN.unst, percentile=unst.percentile)$percent.pos PercentPos.FCS(IFN.st, percentile = unst.percentile)$percent.pos @ To evaluate which HIV-protein stimulation results in the most sensitive detection of HIV-positive status as well as the lowest chance of falsely concluding HIV-positive status based on a stimulated sample's higher 99.9th percentile control-based percent positive (\textit{i.e.}, according to the approach used in \Rfunction{pcki2.flowcytest}). Zoe Moodie, PhD, constructed the ROC (Receiver Operating Characteristic) HIV-protein-specific curves in which the cut-offs are based on the combined stimulated and unstimulated percent positives obtained by the previous methods. The \Robject{PerPosROCmin} data in the \Rpackage{rfcdmin} package exemplifies the percent positives obtained to plot the ROC curve. Here we retrieve the example data provided by Zoe Moodie, PhD. <>= data(PerPosROCmin, package="rfcdmin") @ The function \Rfunction{ROC.FCS} shows the ROC curve and sensitivity, specificity output after the implementation of the functions \Rfunction{percentile.FCS} and \Rfunction{PercentPos.FCS} to obtain the percentiles and the percent positives, respectively, for each individual's HIV-protein stimulated and unstimulated pair for a particular immunofluorescence measurement. \begin{figure}[htbp] \centering <>= GAG<-ROC.FCS(hivpos.gag, hivneg.gag) #plotting the pola stimulated 100* percent positives POLA<-ROC.FCS(hivpos.pola, hivneg.pola, lineopt=2, colopt=2, overlay=TRUE) #plotting the polb stimulated 100* percent positives POLB<-ROC.FCS(hivpos.polb, hivneg.polb, lineopt=4, colopt=3, overlay=TRUE) legend(0.7, 0.7, c("gag", "polA", "polB"), col = c(1,2,3), lty=c(1,2,4)) @ \caption{The ROC curves are based on the different HIV-proteins used for the stimulation of immune responses. Here the GAG appears to achieve greater sensitivity at a lower 1-specificity when evaluating the difference in immune responses between an HIV-infected and HIV-noninfected profiles using the \Rfunction{pkci2.flowcytest} approach.} \label{fig:ROCexample1} \end{figure} \clearpage \section{Future Updates} \label{SEC:futureupdates} Most notable future updates include converting the testing and the gating into generic S4 class objects. Currently these objects are all S3. The dynamic plotting functions will also be converted to S4 generic objects with additional visualization tools and methods. Future work with PRIM include using the algorithm with real datasets and displaying the results with the tools provided in the \Rpackage{rflowcyt} package. \bibliographystyle{plainnat} \bibliography{rflowcyt} \end{document}