%
%\VignetteIndexEntry{Using SOMs for visualization of cytometry data}
%\VignetteDepends{FlowSOM}
%\VignetteKeywords{}
%\VignettePackage{FlowSOM}
%
\documentclass[english]{article}
\usepackage[T1]{fontenc}
\usepackage[latin9]{inputenc}
\usepackage{babel}

<<style, eval=TRUE, echo=FALSE, results=tex>>=
BiocStyle::latex()
@

\begin{document}
\SweaveOpts{concordance=TRUE}

\begin{center}
{\Large Using self-organizing maps for visualization and interpretation
of cytometry data}
\par\end{center}{\Large \par}

\begin{center}
Sofie Van Gassen, Britt Callebaut and Yvan Saeys
\par\end{center}

\begin{center}
Ghent University
\par\end{center}

\begin{center}
{\footnotesize September, 2014\bigskip{}
\bigskip{}
}
\par\end{center}{\footnotesize \par}

\begin{center}
\textbf{Abstract\bigskip{}
}
\par\end{center}

The \Biocpkg{FlowSOM} package provides new visualization opportunities
for cytometry data. A four-step algorithm is provided: first, the data
is read and preprocessed, then a self-organizing map is trained and a
minimal spanning tree is built, and finally, a meta-clustering is
computed. Several plotting options are available, using star charts to
visualize marker intensities and pie charts to visualize correspondence
with manual gating results or other automatic clustering results.

\bigskip{}
\bigskip{}

\textbf{1. Reading the data}

The FlowSOM package has several input options.

The first possibility is to use an array of character strings,
specifying paths to files or directories. When given a path to a
directory, all files in the directory will be considered.
Subdirectories are not searched recursively. You can specify a pattern
to use only a selection of the files. The default pattern is
\Rcode{".fcs"}, so that only fcs-files are selected.

When you are already working with your data in \R{}, it might be easier
to use a \Rclass{flowFrame} or \Rclass{flowSet} from the
\Biocpkg{flowCore} package as input. This is also supported.

If multiple paths or a \Rclass{flowSet} are provided, all data will be
concatenated.

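
As a minimal sketch of the directory input described above, the chunk
below points \Rfunction{ReadInput} at the \Rcode{extdata} directory of
the package and restricts the selection with a pattern. It first uses
base \R{} \Rfunction{list.files}, whose pattern is a regular
expression, to preview the selection. That \Rfunction{ReadInput}
interprets the pattern in the same way is an assumption, and the
variable names are purely illustrative. The chunk is not evaluated, so
it does not affect the objects used in the rest of this vignette.

<<eval=FALSE>>=
library(FlowSOM)

# Directory shipped with the package (contains lymphocytes.fcs)
dataDir <- system.file("extdata", package = "FlowSOM")

# Preview which files the pattern selects; like the directory input,
# this does not descend into subdirectories
fcsFiles <- list.files(dataDir, pattern = "lymphocytes\\.fcs$",
                       full.names = TRUE)
basename(fcsFiles)

# Read all matching files from the directory
# (assumption: pattern is interpreted as a regular expression)
fSOM_dir <- ReadInput(dataDir, pattern = "lymphocytes\\.fcs$")

# The same selection, given as an explicit vector of file paths
fSOM_files <- ReadInput(fcsFiles)
@

Passing an explicit vector of paths can be convenient when the
selection is hard to express as a single pattern.
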
When reading the data, several pre-processing options are available.
The data can be compensated automatically, using either a specified
matrix or the \Rcode{\$SPILL} variable from the fcs-file. A logicle
transformation can be applied to specified columns. If no columns are
provided, all columns from the spillover matrix will be transformed.
Finally, the data can be scaled. By default, the data is scaled to a
mean of zero and a standard deviation of one. However, specific
scaling parameters can be set (see the base \R{} \Rfunction{scale}
function for more detail).

\medskip{}

\noindent
<<>>=
set.seed(42)
library(flowCore)
library(FlowSOM)

# Option 1: read directly from a file path
fileName <- system.file("extdata","lymphocytes.fcs",
                        package="FlowSOM")
fSOM <- ReadInput(fileName,compensate = TRUE,transform = TRUE,
                  toTransform=c(8:18),scale = TRUE)

# Option 2: start from a flowFrame that is already loaded in R
ff <- read.FCS(fileName)
fSOM <- ReadInput(ff,compensate = TRUE,transform = TRUE, scale = TRUE)
@
\noindent \medskip{}

This function returns a FlowSOM object, which is a list containing
several elements. The data is stored as a matrix in \$data, and all
parameter settings used to read the data are also stored. The begin
and end indices of the subsets from the different files can be found
in \$metadata.

<<>>=
str(fSOM)
@

\bigskip{}

\textbf{2. Building the self-organizing map}

The next step in the algorithm is to build a self-organizing map.
Several parameters for the self-organizing map algorithm can be
provided, such as the dimensions of the grid, the learning rate, and
the number of times the dataset has to be presented. However, the most
important decision is which columns the self-organizing map should be
trained on. These should include all parameters that are useful to
identify cell types, and exclude parameters whose behaviour you want
to study across all cell types, such as activation markers.

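
The parameters mentioned above could be set as in the following
sketch. It assumes that \Rfunction{BuildSOM} forwards additional
arguments to the underlying \Rfunction{SOM} function, and that these
are named \Rcode{xdim} and \Rcode{ydim} (grid dimensions),
\Rcode{rlen} (number of times the dataset is presented) and
\Rcode{alpha} (start and end learning rate). Please check the help
page of \Rfunction{SOM} for the names used by your installed version.
The values shown are arbitrary illustrations, not recommendations, and
the chunk is not evaluated.

<<eval=FALSE>>=
library(FlowSOM)
set.seed(42)

fileName <- system.file("extdata", "lymphocytes.fcs",
                        package = "FlowSOM")
fSOM_small <- ReadInput(fileName, compensate = TRUE, transform = TRUE,
                        toTransform = c(8:18), scale = TRUE)

# Argument names below are assumed to be forwarded to SOM() (see ?SOM)
fSOM_small <- BuildSOM(fSOM_small,
                       colsToUse = c(9, 12, 14:18), # training columns
                       xdim = 7, ydim = 7,          # 7 x 7 grid
                       rlen = 20,                   # passes over the data
                       alpha = c(0.05, 0.01))       # learning rate range

# One row of codes per node in the grid
nrow(fSOM_small$map$codes)
@

A smaller grid gives fewer, coarser nodes, while a larger grid can
separate rarer populations at the cost of longer training.
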
The \Rfunction{BuildSOM} function expects a FlowSOM object as input
and returns a FlowSOM object in which all information about the
self-organizing map is added in the \$map element.

\medskip{}

<<>>=
fSOM <- BuildSOM(fSOM,colsToUse = c(9,12,14:18))
str(fSOM$map)
@

\bigskip{}

\textbf{3. Building the minimal spanning tree}

The third step of FlowSOM is to build the minimal spanning tree. This
again returns a FlowSOM object, with extra information contained in
the \$MST element.

\medskip{}

<<>>=
fSOM <- BuildMST(fSOM)
str(fSOM$MST)
@

Once this step is finished, the FlowSOM object can be used for
visualization.

\medskip{}

<<fig=TRUE>>=
PlotStars(fSOM)
@

If you do not want the node size to depend on the number of cells
assigned to a node, you can reset the node size.

<<fig=TRUE>>=
fSOM <- UpdateNodeSize(fSOM, reset=TRUE)
PlotStars(fSOM,MST=FALSE)
fSOM <- UpdateNodeSize(fSOM)
@

It might also be interesting to compare with a manual gating.

<<fig=TRUE>>=
library(flowUtils)
flowEnv <- new.env()

# Compensate the data and mark compensated channels with a "Comp-" prefix
ff_c <- compensate(ff,ff@description$SPILL)
colnames(ff_c)[8:18] <- paste("Comp-",colnames(ff_c)[8:18],sep="")

# Load the manual gates into flowEnv
gatingFile <- system.file("extdata","manualGating.xml",
                          package="FlowSOM")
read.gatingML(gatingFile, flowEnv)
filterList <- list( "B cells" = flowEnv$ID52300206,
                    "ab T cells" = flowEnv$ID785879196,
                    "yd T cells" = flowEnv$ID188379411,
                    "NK cells" = flowEnv$ID1229333490,
                    "NKT cells" = flowEnv$ID275096433
                  )

# Apply each gate; @subSet marks the cells that fall inside the gate
results <- list()
for(cellType in names(filterList)){
    results[[cellType]] <- filter(ff_c,filterList[[cellType]])@subSet
}

# Label each cell; cells outside all gates stay "Unknown"
manual <- rep("Unknown",nrow(ff))
for(cellType in names(results)){
    manual[results[[cellType]]] <- cellType
}

# Use a factor to define order of the cell types
manual <- factor(manual,levels = c("Unknown","B cells",
                                   "ab T cells","yd T cells",
                                   "NK cells","NKT cells"))
PlotPies(fSOM,cellTypes=manual)
@

\bigskip{}

\textbf{4. Metaclustering the data}

The fourth step of the FlowSOM algorithm is to perform a meta-clustering of the data.
This can be the first step in further analysis of the data, and often
gives a good approximation of manual gating results. If you have
background knowledge about the number of cell types you are looking
for, it might be best to provide this number to the algorithm.

<<fig=TRUE>>=
metaClustering <- metaClustering_consensus(fSOM$map$codes,k=7)
PlotPies(fSOM,cellTypes=manual,clusters = metaClustering)
@

You can also extract the meta-clustering for each cell individually.

<<>>=
# Column 1 of the mapping holds the node index of each cell
metaClustering_perCell <- metaClustering[fSOM$map$mapping[,1]]
@

\bigskip{}

\textbf{5. Summary}

In summary, the FlowSOM package provides new ways to look at cytometry
data. It can help to keep an overview of how all markers behave on
different cell types, and to reduce the chance of overlooking
interesting patterns that are present in the data.

\end{document}