% NOTE -- ONLY EDIT THE .Rnw FILE!!! The .tex file is % likely to be overwritten. % %\VignetteIndexEntry{Fingerprinting for Flow Cytometry} %\VignetteDepends{flowFP} %\VignetteKeywords{} %\VignettePackage{flowFP} \documentclass[11pt]{article} \usepackage{times} \usepackage{hyperref} \usepackage[authoryear,round]{natbib} \usepackage{times} \usepackage{comment} \usepackage{graphicx} \usepackage{subfigure} \textwidth=6.2in \textheight=8.5in \oddsidemargin=.1in \evensidemargin=.1in \headheight=-.3in \newcommand{\scscst}{\scriptscriptstyle} \newcommand{\scst}{\scriptstyle} \newcommand{\Rfunction}[1]{{\texttt{#1}}} \newcommand{\Rcode}[1]{{\texttt{#1}}} \newcommand{\Robject}[1]{{\texttt{#1}}} \newcommand{\Rpackage}[1]{{\textsf{#1}}} \newcommand{\Rclass}[1]{{\textit{#1}}} \title{flowFP: Fingerprinting for Flow Cytometry} \author{H. Holyst and W. Rogers} \begin{document} \maketitle \begin{abstract} \noindent \textbf{Background} A new software package called \Rpackage{flowFP} for the analysis of flow cytometry data is introduced. The package, which is tightly integrated with other Bioconductor software for analysis of flow cytometry, provides tools to transform raw flow cytometry data into a form suitable for direct input into conventional statistical analysis and empirical modeling software tools. The approach of \Rpackage{flowFP} is to generate a description of the multivariate probability distribution function of flow cytometry data in the form of a ``fingerprint''. As such, it is independent of a presumptive functional form for the distribution, in contrast with model-based methods such as Gaussian Mixture Modeling. \Rpackage{flowFP} is computationally efficient, and able to handle extremely large flow cytometry data sets of arbitrary dimensionality. Algorithms and software implementation of the package are described.\\ \noindent \textbf{Methods} \Rpackage{flowFP} implements several S4 classes and methods, and rests upon the flowCore foundation classes for flow cytometry data.\\ \noindent \textbf{Results} Use of the software is exemplified with applications to data quality control and to the automated classification of Acute Myeloid Leukemia.\\ \noindent \textbf{keywords} Flow cytometry, fingerprinting, high throughtput, software \end{abstract} <>= library(flowFP) @ \section{Introduction} Flow cytometry (FC) produces multi-dimensional biological information at the level of the cellular compartment, and over very large numbers of cells. As such it is ideally suited for a wide variety of investigations for which cellular context and large sample observations are important. In recent years the technology of FC has undergone appreciable development \citep{Chattopadhyay2008,McCoy2002} with the introduction of digital signal processing electronics \citep{Murthi2005}, multiple lasers, increasing numbers of fluorescence detectors, and robotic automation, both in sample preparation \citep{Kelliher2005} and in instrumental data collection \citep{Gates2009}. The recent development of new reagents \citep{Chattopadhyay2006} that enable increasing assay complexity has also been rapid and accelerating. Given the scope and pace of these developments, the bottleneck in many FC experiments has shifted from the wet laboratory to the computer laboratory; that is to say, data analysis\citep{Chattopadhyay2008}.\\ FC data are typically analyzed using graphically-driven approaches. Subsets of cells (events) are delineated usually in one- or two-dimensional histograms or ``dotplots'' in a procedure termed ``gating''. The gating process is often applied in a sequential fashion, with the numbers of events inside successive gates falling monotonically from step to step. Subsets determined via gating are typically then quantified with respect to their expression patterns in the dimensions of multiparameter space not utilized for gating, often by simply counting proportions of the subsets that are positive or negative for each of the markers of interest for that subset. Several commercially available software packages have been extensively optimized to support this kind of visually-guided analysis workflow, for example, FlowJo (Treestar Inc, Ashland, OR), WinList (Verity Software House, Topsham, ME) and FCSExpress (De Novo Software, Los Angeles, CA).\\ Despite near-universality of this data analysis approach within the flow cytometry community, the procedure has three main drawbacks. First, the choice of gates is often subjective, particularly in the not-unusual situation where the gating distribution is broad and smooth. This leads to an inherent lack of reproducibility from sample-to-sample, or even for the re-analysis of the same sample, as well as presenting audit trail difficulties that compromise verifiability of results. Second, because gates are specified by manually drawing regions on a graph using a computer mouse, the process is very labor-intensive and time-consuming, and in most cases takes many times longer than the actual acquisition of the data. Finally, because gating and analytical regions are determined by the data analyst based on his or her experience, there may be interesting and informative features that exist within the full, un-gated multivariate distribution of events but that nevertheless escape detection in this analysis paradigm.\\ \Rpackage{flowFP} is designed to address these limitations in conventional approaches to the analysis of FC data. The broad aim of \Rpackage{flowFP} is to directly transform raw FC list-mode data into a form suitable for direct input to other statistical analysis and empirical modeling tools. Thus, it is useful to think of flowFP as an intermediate step between the acquisition of high- throughput FC data on the one hand, and empirical modeling, machine learning, and knowledge discovery on the other.\\ \section{Algorithm Description} \Rpackage{flowFP} implements and integrates ideas put forth in \citep{Roederer2001b} and \citep{Rogers2008}. FlowFP utilizes the Probability Binning (PB) algorithm \citep{Roederer2001b} to subdivide multivariate space into hyper-rectangular regions that contain nearly equal numbers of events. Regions (bins) are determined by (a) finding the parameter whose variance is highest, (b) dividing the population at the median of this parameter which results in two bins, each with half of the events, and (c) repeating this process for each subset in turn. Thus, at the first level of binning the population is divided into two bins. At the second level, each of the two ``parent'' bins is divided into two ``daughter'' bins, and so forth. The final number of bins \textit{n} is determined by the number of levels \textit{l} of recursive subdivision, such that $n=2^l$.\\ This binning procedure is typically carried out for a collection of samples (instances), called a ``training set''. The result of this process is in essence a description of the subdivision of a multiparametric space into sub-regions, and is thus termed a ``model'' of the space (not to be confused with modeling approaches that fit data to a parameterized model or set of models). The model is then applied to another set of samples (which may or may not include instances from the training set). This operation results in a feature vector of event counts in each bin of the model for each instance in the set. These feature vectors are, in the context of a specific model, a unique description of the multivariate probability distribution function for each instance in the set of samples, and thus are aptly referred to as ``fingerprints''.\\ Although flowFP generates bins using the PB algorithm, the way it utilizes the resulting fingerprints is similar to the methods described in \citep{Rogers2008}. Each element of a fingerprint represents the number of events in a particular sub-region of the model. Although it may not be known \textit{a priori} which of these regions are informative with respect to an experimental question, it is possible to determine this by using appropriate statistical tests, along with corrections for multiple comparisons, to ascertain which regions (if any) are differentially populated in two or more groups of samples. If we regard the number of events in a bin as one of $n$ features describing an instance, then the statistical determination of informative sub-regions is clearly seen to be a feature selection procedure.\\ Fingerprint features are useful in two distinct modes. First, all or a selected subset of features can be used in clustering or classification approaches to predict the class of an instance based on its similarity to groups of instances. Second, the events within selected, highly informative bins can be visualized within their broader multivariate context in order to interpret the output of the modeling process. This step is crucial in that it provides a means to develop new hypotheses for FC-derived biomarkers within the context of existing reagent panels.\\ \section{Fingerprint Representation} \Rpackage{FlowFP} is one of a growing number of Bioconductor packages integrated within the framework provided by \Rpackage{flowCore} and is thus able to interoperate with other \Rpackage{flowCore}-compliant tools as well as with the full range of downstream statistical analysis and machine learning tools available in R. This integration enables flexible creation of powerful high-throughput analysis procedures for large FC data sets.\\ \Rpackage{FlowFP} uses the S4 object-oriented facility of R. Computationally intensive parts are written in the C programming language for efficiency. FlowFP is built around a set of three S4 classes, each with a constructor of the same name as the class name. In addition there are a number of methods for accessing, manipulating and visualizing the data in each of the classes.\\ \subsection{The \Rclass{flowFPModel} Class} \Rclass{flowFPModel} is the fundamental class for the flowFP package. The \Rfunction{flowFPModel} constructor takes a collection of one or more list-mode instances which are represented in the flowCore framework as a \Rclass{flowFrame} (for a single instance) or a \Rclass{flowSet} (for a collection of instances), respectively (henceforth we shall refer to \Rclass{flowFrame}s and \Rclass{flowSet}s, the original list-mode data being implied). In addition to the required argument, \Rfunction{flowFPModel} has optional arguments that allow control over the number of levels of recursive subdivision and the set of parameters to be considered in the binning process. By default all parameters in the input \Rclass{flowSet} are considered, but if this argument is provided, any parameters not listed are ignored. The constructor emits an object of type \Rclass{flowFPModel}, which encapsulates a complete representation of the binning process that is used later to construct fingerprints.\\ To see how this works, let's build a \Rclass{flowFPModel} for a small data set. \textit{fs1} is a \Rclass{flowSet} comprised of 7 \Rclass{flowFrame}s, one for each tube in a sample. The tubes are stained with different antibody conjugates, but CD45-ECD is common to all of the tubes in order to support gating of the entire panel from one of the tubes using CD45 vs. SSC. <>= data(fs1) fs1 @ One of the tubes (the first one) looks like this: <>= fs1[[1]] @ Let's construct a model, using SSC (parameter 2) and CD45 (parameter 5). We will specify the number of recursions to be 7, resulting in $2^7=128$ bins in the model:\\ <>= mod <- flowFPModel(fs1, name="CD45/SS Model", parameters=c(2,5), nRecursions=7) show(mod) @ \begin{center} <>= plot(mod) @ \end{center} Usage and argument descriptions for \Rfunction{flowFPModel} are as follows:\\ \\ Usage: \begin{quote} \begin{verbatim} flowFPModel(obj, name="Default Model", parameters=NULL, nRecursions="auto", dequantize=TRUE, sampleSize=NULL, verbose=FALSE) \end{verbatim} \begin{tabular}{rp{4.75in}} \hline obj & Training data for model, either a \Rclass{flowFrame} or \Rclass{flowSet}.\\ parameters & A vector of parameters to be considered during model construction. You may provide these as a vector of parameter indices (as shown above) or as a character vector. For example, parameters = c("SS Log", "FL3 Log") would yield the same result as shown in the example.\\ nRecursions & Number of times the FCS training data will be sub divided. Each recursion doubles the number of bins, so that $n_{bins} = 2^{nRecursions}$. A warning will be generated if the number of expected events in each bin is < 1. (e.g. if your training set had 1000 events, and you specified ${nRecursions}=10$.)\\ dequantize & If TRUE, all of the event values in the training set will be made unique by adding a tiny value (proportional to the ordinal position of each event) to the data.\\ sampleSize & Used to specify the per-\Rclass{flowFrame} sample size of the data to use in model generation. If \emph{NULL}, all of the data in \emph{x} is used. Setting this to a smaller number will speed up processing, at the cost of accuracy.\\ name & A descriptive name assigned to the model.\\ verbose & If TRUE, prints out information as it constructs the model. Useful for debugging.\\ \end{tabular} \end{quote} \subsection{The \Rclass{flowFP} Class} The \Rfunction{flowFP} constructor takes a \Rclass{flowFrame} or a \Rclass{flowSet} as its only required argument, and an optional \Rclass{flowFPModel}. If no \Rclass{flowFPModel} is supplied, \Rfunction{flowFP} computes a model (by calling \Rfunction{flowFPModel} internally). Regardless the source of the model, \Rfunction{flowFP} applies the model to each of the instances in its input. The resulting \Rclass{flowFP} object extends the \Rclass{flowFPModel} class and contains two additional important slots to store a matrix of counts and a list of tags. The counts matrix has dimensions $m \times n$, where $m$ is the number of instances in the input \Rclass{flowSet} (or one if a \Rclass{flowFrame} is provided), and $n$ is the number of features in the model. The tags slot is a list of $m$ vectors, each of which has $e$ elements, where $e$ is the number of events in the corresponding frame in the input \Rclass{flowSet}. The value for each element of the tag vector represents the bin number into which the corresponding event fell during the fingerprinting procedure. This is useful for visualization or gating based on fingerprints, as will be illustrated below.\\ A set of fingerprints is obtained by applying the model to a \Rclass{flowSet}. In this case we will apply the model derived from \textit{fs1} to the same \Rclass{flowSet}:\\ \begin{center} <>= fp <- flowFP (fs1, mod) show(fp) plot (fp, type="stack") @ \end{center} Usage and argument descriptions for \Rfunction{flowFP} are as follows:\\ \\ Usage: \begin{quote} \begin{verbatim} flowFP(obj, model=NULL, sampleClasses=NULL, verbose=FALSE, ...) \end{verbatim} \begin{tabular}{rp{4.75in}} \hline fcs & A \Rclass{flowFrame} or \Rclass{flowSet} for which fingerprint(s) are desired.\\ model & A model generated with the \Rfunction{flowFPModel} constructor, or \emph{NULL}. If \emph{NULL}, a default model will be silently generated from all instances in \emph{x}.\\ sampleClasses & An optional character vector describing modeling classes. If supplied, there must be exactly one element for each \emph{flowFrame} in the \emph{flowSet} in \emph{x} (see Details).\\ verbose & If TRUE, prints out information as it constructs the fingerprint and possibly the model. Useful for debugging.\\ \dots & If model is \emph{NULL}, additional arguments are passed on to the model constructor. see \Rfunction{flowFPModel} for details.\\ \end{tabular} \end{quote} \subsection{The \Rclass{flowFPPlex} Class} The \Rclass{flowFPPlex} is a container object which facilitates combining, processing and visualizing large collections of flowFP objects which are all derived from the same set of instances. The \Rfunction{flowFPPlex} constructor takes a list of \Rclass{flowFP} objects. The \Rclass{flowFPPlex} manages the logical association of a set of \Rclass{flowFP} descriptions. In particular, it extends the counts matrices of its members ``horizontally'' so as to create a unified representation of the entire collection of fingerprints.\\ For example, let's load data for another sample, similar to \textit{fs1}. We will then use both \Rclass{flowSet}s as model bases, and fingerprint \textit{fs1} with respect to both of them. Then we'll load both sets of fingerprints into a \Rclass{flowFPPlex} and visualize the result:\\ \begin{center} <>= data(fs2) mod1 <- flowFPModel(fs1, name="CD45/SS Model vs fs1", parameters=c("SS Log", "FL3 Log"), nRecursions=7) mod2 <- flowFPModel(fs2, name="CD45/SS Model vs fs2", parameters=c("SS Log", "FL3 Log"), nRecursions=7) fp1_1 <- flowFP (fs1, mod1) fp1_2 <- flowFP (fs1, mod2) plex <- flowFPPlex(c(fp1_1, fp1_2)) plot (plex, type='grid', vert_scale=10) @ \end{center} In the figure, the light blue vertical lines show the division of the fingerprints resulting from the two models. Notice that, as one might expect, when fingerprinting \textit{fs1} against a model constructed from itself, the deviations from the norm are small, whereas when fingerprinting \textit{fs1} against a model constructed from a different \Rclass{flowSet}, the deviations in the fingerprint values are large.\\ We might also wish to use the \Rclass{flowFPPlex} to facilitate exploration of the effect on fingerprints due to variation of the number of levels of recursion.\\ \begin{center} <>= fp <- flowFP (fs1, param=c("SS Log", "FL3 Log"), nRecursions=8) plex <- flowFPPlex() for (levels in 8:5) { nRecursions(fp) <- levels plex <- append (plex, fp) } plot (plex, type="tangle", transformation="norm") @ \end{center} Notice that as the fingerprint resolution (determined by the number of recursions) is reduced from left to right, the number of features in each fingerprint falls, but the size of the variations from the norm also falls. Evidently, the more local the modeling, the larger the variations from instance to instance we can expect.\\ Also notice a couple of other features we illustrate in this example. First, we only computed the model once, but we can change the effective number of recursions (and thus the resolution of the fingerprints) to any integer less than that at which the model was computed, using the accessor function \Rfunction{nRecursions}. Second, we can initialize an empty \Rclass{flowFPPlex}, and then use the function \Rfunction{append} to add \Rclass{flowFP}s one at a time.\\ Usage and argument descriptions for \Rfunction{flowFPPlex} are as follows:\\ \\ Usage: \begin{quote} \begin{verbatim} flowFPPlex(fingerprints=NULL) \end{verbatim} \begin{tabular}{rp{4.75in}} \hline fingerprints & List of \Rclass{flowFP}s.\\ \end{tabular} \end{quote} \subsection{Generic functions} A number of other methods have been provided to facilitate interaction with and analysis of fingerprinting results. Chief among these are visualization methods that aid in the understanding and interpretation of fingerprinting results. They are provided as overloads to the generic \Rfunction{plot} function. In addition, a few other accessor methods deserve special mention.\\ \begin{description} \item[nRecursions(obj).] This generic function returns the number of levels of recursive subdivision of its argument. FlowFP, flowFPPlex and flowFPModel all implement the method. Furthermore, the flowFP class implements the ``set'' method. This enables the user to compute a model at some fairly high resolution, and then to derive fingerprints at that resolution or any lower resolution without re-computing the model. This is possible because fingerprinting is recursive, so that given any high-resolution model, all models of lower resolution can be derived from it.\\ \item[counts(obj).] This generic function returns a matrix of the number of events per instance and per bin. FlowFP and flowFPPlex classes implement this method, facilitating creation of fingerprint matrices suitable for processing by downstream methods outside of the flowFP package. The method has an optional argument ``transformation'' that can take on values ``raw'' (returns the actual event counts for each bin), ``normalize'' (normalizes by dividing raw counts by the expected number of events), or ``log2norm'' (like normalize except that it further takes the log2 of the result).\\ \item[sampleNames(obj) and sampleClasses(obj).] These generic functions set or get sample identifiers for objects of class flowFP or flowFPPlex. By default, for flowFPs, sample names are derived from the flowSet. However they can be overridden by the set method, providing flexibility to handle cases where the sample names in a flowSet are not appropriate. When adding fingerprints to a flowFPPlex, sample names, and if present sample classes, are compared, and the join operation is not permitted unless names and classes among all fingerprints in the flowFPPlex are identical.\\ \item[parameters(obj).] This generic function returns the light scatter and/or fluorescence parameters involved in binning, either for a flowFPModel or a flowFP. The function is able to report both the parameters that were considered for binning as well as those that actually participating (i.e. ones that were subdivided at during recursive subdivision).\\ \item[tags(fp).] This generic function returns the tags slot of a flowFP object. This is useful for visualization and gating operations.\\ \item[binBoundary(obj).] This generic function reports a list of multivariate rectangles corresponding to the limits of the bins. FlowFP and flowFPModel classes both implement this method. This information is also useful for visualization and gating operations.\\ \end{description} \section{Fingerprinting for Gating Quality Control} As alluded to in Section 3.1, a common practice, especially in some clinical settings, is the collection of data in several aliquots, each stained with different reagent cocktails in order to see all of the markers of interest, but including in all of the tubes at least one common marker. Using parameters common to all of the tubes (CD45 and SSC are frequently used for this purpose \citep{Borowitz1993}) subsets of cells can be delineated by drawing gates on one tube and then applying the gates to all of the tubes. This saves time, but relies on the assumption that the probability distribution is stationary over all of the tubes. If this assumption is not valid, subsetting errors will occur, but may not be readily apparent without careful study of the gating plots.\\ Using flowFP, in order to rapidly detect consistency of CD45 vs. SSC distributions without the need to look at dotplots, we can fingerprint a collection of tubes and look for outliers.\\ \begin{center} <>= fp1 <- flowFP (fs1, parameters=c("SS Log", "FL3 Log"), name="self model: fs1", nRecursions=5) plot (fp1, type="qc", main="Gate QC for Sample fs1") @ \end{center} In this plot the fingerprints for each of the 7 tubes is shown in a grid. The color of the grid square for a tube indicates the standard deviation of the normalized and log-transformed fingerprint feature vector for that tube according to the color scale at the top of the figure. The standard deviation value is printed in the grid square.\\ The following figure shows an example where the gate deviation is large. Note the fact that tube 4 and especially tube 5 are the outliers. Also note how easy it is to spot this problem.\\ \begin{center} <>= fp2 <- flowFP (fs2, parameters=c("SS Log", "FL3 Log"), name="self model: fs2", nRecursions=5) plot (fp2, type="qc", main="Gate QC for Sample fs2") @ \end{center} <>= phenoData(fs2)$Tube <- factor(1:7) #$ for (i in 1:7) { exprs(fs2[[i]]) = exprs(fs2[[i]])[sample(1:nrow(fs2[[i]]),size=2500),] } fp2 <- flowFP (fs2, param=c(2,5), nRec=5) @ <>= xyplot (`FL3 Log` ~ `SS Log` | Tube, data=fs2) @ % \begin{center} <>= plot(trellis.last.object()) @ \end{center} In the above \Rpackage{flowViz} plot it is certainly possible to spot the inconsistency, but it's not so easy as in the fingerprint-based QC picture. On the other hand \dots \\ \begin{center} <>= plot (fp2, fs2, hi=5, showbins=c(6,7,10,11), pch=20, cex=.3, transformation='norm') @ \end{center} In this figure we can follow the fingerprint bins containing excess events in Tube 5 by way of the color map shown below the fingerprint. By comparing the bin indices 6, 7, 10 and 11 corresponding to green to blue-green colors, it's easy now to localize the place where Tube 5 differs from the rest.\\ Quality control for individual multi-tube samples is tedious, but not crazily impossible. 96-well plate data will drive you nuts with the need to examine gating data for each well and for many plates. Try this instead: \begin{center} <>= data(plate) fp <- flowFP (plate, parameters=c("SSC-H","FL3-H","FL4-H"), nRecursions=5) plot (fp, type='plate') @ \end{center} This is a stimulation dataset, described in \citep{Inokuma2007} (data were drastically sampled down to 1000 events per well so that they could be included in the package for illustration purposes). The original data are available at \citep{Inokuma2008}.\\ \section{Limitations, Caveats and Comments} It is important to note that fingerprinting of FC data is not without limitations. First, we note that fingerprinting approaches are sensitive to differences in multivariate probability distributions no matter their origin. Thus, instrumental, reagent or other systematic variations may cause spurious signals as large, or larger than true biological effects. For this reason it is important to measure and control for these effects\citep{Chattopadhyay2008}. In fact, fingerprinting itself can be used to assess and to help control for systematic effects, as was illustrated in Section 4.\\ Second, because fingerprinting is, in essence, the creation of a multivariate histogram, it responds to factors that might artificially emphasize certain bins in preference to others. In particular, events may pile up on either the zero or full- scale axis for one or more parameters. This situation frequently results from values that would be negative due to compensation or background subtraction (causing pile-up on the zero axis) or at the other end of the scale, values that exceed the dynamic range of the signal detection apparatus causing pile-up at full scale. At either end this results falsely in an apparent high density of events. Fingerprinting bins are thus ``attracted'' to these locations, causing a distortion in the proper characterization of the true multivariate probability distribution function.\\ Just as scaling and transformation of data are important for visualization of multi- parameter distributions\citep{Parks2006, Tung2004, Novo2008}, so they are also important for fingerprinting. Data acquired using linear amplifiers such as exist in some modern instruments, or data that have been ``linearized'' from instruments with logarithmic amplifiers, tend to be heavily skewed to the left, since in most cases data distributions are quasi-log-normally distributed. Bins determined from such data thus have extreme variations in size. A good rule of thumb is to use a data transformation that produces the most spread-out distribution, which also is often the transformation most effective for clear visualization of the distribution. \\ A key limitation for fingerprinting approaches, including \Rpackage{flowFP}, relates to the number of events available for analysis. Since the objective of probability binning is to find bins containing equal numbers of events, it follows that once the number of bins is on the order of the number of events in an instance, the expected number of events per bin will be of order unity. In this case differences in bin counts will not be statistically significant. On the other hand, if the dimensionality of the data set is high, the average number of times any parameter will be divided in the binning process will be small. For example, in a dataset with 18 parameters, if we demand at least, say, 10 events per bin for statistical accuracy, about $2.6 \times 10^6$ events would be required in order that each parameter be divided on average into at least two bins. Thus, the spatial resolution of binning is limited by the number of events collected, and as the number of parameters increases, the number of events needed to maintain resolution increases geometrically.\\ Finally, although \Rpackage{flowFP} is computationally fast, because of the way that flow cytometric data are represented in R large datasets consume vast amounts of memory. If you need to process 96-well data for example, you will probably either need a machine with lots of memory (>4 GByte), or you will have to use some tricks, like sampling the data in order to reduce memory footprint. Fortunately, memory is cheap and 64-bit operating systems are becoming commonplace. For example, just reading in the data in \citep{Inokuma2008} consumed 3.1 GB on a Linux 64-bit machine with 32 GB of memory. Fingerprinting required (briefly) an additional 2.5 GB for a total of 5.6 GB. However the whole process (reading in the data, computing the fingerprints, and displaying the result similar to the 96-well figure above) only took about 1 minute.\\ With recent technological advances, FC is now capable of operating as a true high-throughput technique. A key enabling requirement is the need to automate data analysis for speed, much as automation in sample preparation and data acquisition have accelerated the rate of generation of data and thereby enabled high-throughput FC. This requirement inevitably drives movement away from human-drawn, visually-based gating which is the single most significant obstacle preventing a true high-throughput FC workflow. We hope you find \Rpackage{flowFP} a useful tool in your toolbox to help you achieve this goal.\\ \section{Acknowledgements} We wish to express our deep gratitude to Jonni Moore and all of the people of the University of Pennsylvania Flow Cytometry Resource. We thank Florian Hahne, Nolwenn Le Meur and Ryan Brinkman for advice and assistance in programming in R and integration with flowCore. We are most especially grateful to Clarient, Inc. for generously making available data sets used to illustrate the utility of \Rpackage{flowFP}. \clearpage \bibliographystyle{plainnat} \bibliography{flowFP} \end{document}