---
title: "Quick start of CytoTree"
author: "Yuting Dai"
date: "`r Sys.Date()`"
output:
  prettydoc::html_pretty:
    highlight: github
    theme: cayman
    toc: yes
  pdf_document:
    toc: yes
  html_document:
    df_print: paged
    toc: yes
package: CytoTree
vignette: |
  %\VignetteIndexEntry{Quick_start}
  \usepackage[utf8]{inputenc}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
---

```{r echo = TRUE}
knitr::opts_chunk$set(echo = TRUE, cache = FALSE, eval = TRUE,
                      warning = TRUE, message = TRUE,
                      fig.width = 6, fig.height = 5)
```

## Introduction

Although multidimensional single-cell-based flow and mass cytometry have been increasingly applied to microenvironmental composition and stem-cell research, integrated analysis workflows to facilitate the interpretation of experimental cytometry data remain underdeveloped. We present CytoTree, a comprehensive R package designed for the analysis and interpretation of flow and mass cytometry data. We applied CytoTree to mass cytometry and time-course flow cytometry data to demonstrate the usage and practical utility of its computational modules. CytoTree is a reliable tool for multidimensional cytometry data workflows and produces compelling results for trajectory construction and pseudotime estimation.

## Overview of CytoTree workflow

The CytoTree package is developed to cover most of the standard analysis and visualization workflow for FCS data. In the CytoTree workflow, an S4 object is built to implement the statistical and computational approach, and all computational modules are integrated into a single pipeline that requires only one specified input data format.

`CytoTree` supports four main types of analysis:

- **Clustering**. `CytoTree` helps you discover and identify subtypes of cells.

- **Dimensionality Reduction**. Several dimensionality reduction methods are provided in the `CytoTree` package, such as Principal Components Analysis (PCA), t-distributed Stochastic Neighbor Embedding (tSNE), Diffusion Maps and Uniform Manifold Approximation and Projection (UMAP). CytoTree provides both cell-based and cluster-based dimensionality reduction.

- **Trajectory Inference**. `CytoTree` helps you construct the cellular differentiation trajectory based on the minimum spanning tree (MST) algorithm.

- **Pseudotime and Intermediate states definition**. The root cells need to be defined by the user. The trajectory value is calculated from the shortest paths between root cells and leaf cells using the R `igraph` package. You can then subset the FCS data set in `CytoTree` and find the key intermediate cell states based on the trajectory value.
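The difference between cell-based and cluster-based dimensionality reduction mentioned above can be illustrated with a small base R sketch. This is not CytoTree code: the expression matrix is simulated, the marker names are hypothetical, and `kmeans` and `prcomp` merely stand in for the clustering and reduction methods the package offers.

```r
# Toy data only: 300 simulated cells x 4 hypothetical markers (not real FCS data)
set.seed(1)
expr <- rbind(
  matrix(rnorm(400, mean = 0), ncol = 4),
  matrix(rnorm(400, mean = 4), ncol = 4),
  matrix(rnorm(400, mean = 8), ncol = 4)
)
colnames(expr) <- paste0("marker", 1:4)

# Clustering: k-means is an assumption standing in for the package's methods
cluster.id <- kmeans(expr, centers = 3, nstart = 10)$cluster

# Cell-based dimensionality reduction: PCA on all cells (one row per cell)
cell.pca <- prcomp(expr)

# Cluster-based dimensionality reduction: average each marker per cluster,
# then run PCA on the cluster centroids (one row per cluster)
centroids <- rowsum(expr, cluster.id) / as.vector(table(cluster.id))
cluster.pca <- prcomp(centroids)

dim(cell.pca$x)     # one row per cell
dim(cluster.pca$x)  # one row per cluster
```

The cluster-based embedding is much smaller than the cell-based one, which is why tree building on clusters stays cheap even for large data sets.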

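The trajectory and pseudotime ideas from the overview can be sketched directly with `igraph`. The example below is a minimal illustration under stated assumptions, not CytoTree's internal implementation: the five centroids are made-up coordinates, the root `c1` is chosen by hand, and pseudotime is simply the tree distance from the root scaled to the range 0 to 1.

```r
library(igraph)

# Hypothetical cluster centroids in a 2D embedding (toy values, not CytoTree output)
centroids <- rbind(
  c1 = c(0, 0),
  c2 = c(1, 0),
  c3 = c(2, 0),
  c4 = c(3, 1),
  c5 = c(3, -1)
)

# Fully connected graph weighted by Euclidean distance between centroids
dist.mat <- as.matrix(dist(centroids))
g <- graph_from_adjacency_matrix(dist.mat, mode = "undirected",
                                 weighted = TRUE, diag = FALSE)

# Trajectory inference: keep only the minimum spanning tree
tree <- mst(g)

# Pseudotime: shortest-path distance from the root along the tree, scaled to [0, 1]
root <- "c1"
d <- distances(tree, v = root)
pseudotime <- setNames(as.numeric(d) / max(d), colnames(d))

# Walk from the root to one leaf of the tree
path <- as_ids(shortest_paths(tree, from = root, to = "c4")$vpath[[1]])

print(round(pseudotime, 3))
print(path)
```

Clusters that lie on the walk between a root and a leaf are the candidates for intermediate states in this simplified picture.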
**Fig. 1 Workflow of CytoTree**

## Quick start

```{r eval = TRUE}
# Load packages
suppressMessages({
  library(ggplot2)
  library(CytoTree)
  library(flowCore)
  library(stringr)
})

# Read FCS files
fcs.path <- system.file("extdata", package = "CytoTree")
fcs.files <- list.files(fcs.path, pattern = '.FCS$', full.names = TRUE)

fcs.data <- runExprsMerge(fcs.files, comp = FALSE, transformMethod = "none")

# Rename the columns of the FCS data from channel names to marker names
recol <- c(`FITC-A` = "CD43", `APC-A` = "CD34",
           `BV421-A` = "CD90", `BV510-A` = "CD45RA",
           `BV605-A` = "CD31", `BV650-A` = "CD49f",
           `BV 735-A` = "CD73", `BV786-A` = "CD45",
           `PE-A` = "FLK1", `PE-Cy7-A` = "CD38")
colnames(fcs.data)[match(names(recol), colnames(fcs.data))] <- recol
fcs.data <- fcs.data[, recol]

# Build the metadata: the stage of each cell is parsed from its row name
day.list <- c("D0", "D2", "D4", "D6", "D8", "D10")
meta.data <- data.frame(cell = rownames(fcs.data),
                        stage = str_replace(rownames(fcs.data), regex(".FCS.+"), ""))
meta.data$stage <- factor(as.character(meta.data$stage), levels = day.list)

markers <- c("CD43", "CD34", "CD90", "CD45RA", "CD31",
             "CD49f", "CD73", "CD45", "FLK1", "CD38")

# Build the CYT object
cyt <- createCYT(raw.data = fcs.data, markers = markers,
                 meta.data = meta.data,
                 normalization.method = "log",
                 verbose = TRUE)

# Show a summary of the object
cyt
```

```{r eval = TRUE}
# Cluster cells by SOM algorithm
# Set random seed to make results reproducible
set.seed(1)
cyt <- runCluster(cyt, cluster.method = "som")

# Do not perform downsampling
set.seed(1)
cyt <- processingCluster(cyt)

# Run Principal Component Analysis (PCA)
cyt <- runFastPCA(cyt)

# Run t-Distributed Stochastic Neighbor Embedding (tSNE)
cyt <- runTSNE(cyt)

# Run Diffusion map
cyt <- runDiffusionMap(cyt)

# Run Uniform Manifold Approximation and Projection (UMAP)
cyt <- runUMAP(cyt)

# Build the minimum spanning tree based on tSNE
cyt <- buildTree(cyt, dim.type = "tsne", dim.use = 1:2)

# Differentially expressed markers between branches
diff.list <- runDiff(cyt)

# Define root cells
cyt <- defRootCells(cyt, root.cells = c(28,26))

# Run pseudotime
cyt <- runPseudotime(cyt,
                     verbose = TRUE, dim.type = "raw")

# Define leaf cells
cyt <- defLeafCells(cyt, leaf.cells = c(27, 13), verbose = TRUE)

# Run walk between root cells and leaf cells
cyt <- runWalk(cyt, verbose = TRUE)

# Save object
if (FALSE) {
  save(cyt, file = "Path to your output directory")
}

######################## Visualization

# Plot 2D tSNE, with cells colored by cluster id
plot2D(cyt, item.use = c("tSNE_1", "tSNE_2"), color.by = "cluster.id",
       alpha = 1, main = "tSNE", category = "categorical", show.cluser.id = TRUE)

# Plot 2D UMAP, with cells colored by cluster id
plot2D(cyt, item.use = c("UMAP_1", "UMAP_2"), color.by = "cluster.id",
       alpha = 1, main = "UMAP", category = "categorical", show.cluser.id = TRUE)

# Plot 2D tSNE, with cells colored by branch id
plot2D(cyt, item.use = c("tSNE_1", "tSNE_2"), color.by = "branch.id",
       alpha = 1, main = "tSNE", category = "categorical", show.cluser.id = TRUE)

# Plot 2D UMAP, with cells colored by branch id
plot2D(cyt, item.use = c("UMAP_1", "UMAP_2"), color.by = "branch.id",
       alpha = 1, main = "UMAP", category = "categorical", show.cluser.id = TRUE)

# Plot 2D tSNE, with cells colored by stage
plot2D(cyt, item.use = c("tSNE_1", "tSNE_2"), color.by = "stage",
       alpha = 1, main = "tSNE", category = "categorical") +
  scale_color_manual(values = c("#00599F","#009900","#FF9933",
                                "#FF99FF","#7A06A0","#FF3222"))

# Plot 2D UMAP, with cells colored by stage
plot2D(cyt, item.use = c("UMAP_1", "UMAP_2"), color.by = "stage",
       alpha = 1, main = "UMAP", category = "categorical") +
  scale_color_manual(values = c("#00599F","#009900","#FF9933",
                                "#FF99FF","#7A06A0","#FF3222"))

# Tree plot
plotTree(cyt, color.by = "D0.percent", show.node.name = TRUE, cex.size = 1) +
  scale_colour_gradientn(colors = c("#00599F", "#EEEEEE", "#FF3222"))

plotTree(cyt, color.by = "CD43", show.node.name = TRUE, cex.size = 1) +
  scale_colour_gradientn(colors = c("#00599F", "#EEEEEE", "#FF3222"))

# Plot clusters
plotCluster(cyt, item.use = c("tSNE_1", "tSNE_2"), category = "numeric",
            size = 100, color.by = "CD45RA") +
  scale_colour_gradientn(colors = c("#00599F", "#EEEEEE", "#FF3222"))

# Plot pie tree
plotPieTree(cyt, cex.size = 3, size.by.cell.number = TRUE) +
  scale_fill_manual(values = c("#00599F","#FF3222","#009900",
                               "#FF9933","#FF99FF","#7A06A0"))

# Plot pie cluster
plotPieCluster(cyt, item.use = c("tSNE_1", "tSNE_2"), cex.size = 40) +
  scale_fill_manual(values = c("#00599F","#FF3222","#009900",
                               "#FF9933","#FF99FF","#7A06A0"))

# Plot heatmaps of clusters and branches
plotClusterHeatmap(cyt)
plotBranchHeatmap(cyt)

# Violin plot
plotViolin(cyt, color.by = "cluster.id", marker = "CD45RA", text.angle = 90)
plotViolin(cyt, color.by = "branch.id", marker = "CD45RA", text.angle = 90)

# UMAP plot colored by pseudotime
plot2D(cyt, item.use = c("UMAP_1", "UMAP_2"), category = "numeric",
       size = 1, color.by = "pseudotime") +
  scale_colour_gradientn(colors = c("#F4D31D", "#FF3222","#7A06A0"))

# tSNE plot colored by pseudotime
plot2D(cyt, item.use = c("tSNE_1", "tSNE_2"), category = "numeric",
       size = 1, color.by = "pseudotime") +
  scale_colour_gradientn(colors = c("#F4D31D", "#FF3222","#7A06A0"))

# Density plot of pseudotime by stage
plotPseudotimeDensity(cyt, adjust = 1) +
  scale_color_manual(values = c("#00599F","#009900","#FF9933",
                                "#FF99FF","#7A06A0","#FF3222"))

# Tree plot colored by pseudotime
plotTree(cyt, color.by = "pseudotime", cex.size = 1.5) +
  scale_colour_gradientn(colors = c("#F4D31D", "#FF3222","#7A06A0"))

# Violin plot with clusters ordered by pseudotime
plotViolin(cyt, color.by = "cluster.id", order.by = "pseudotime",
           marker = "CD49f", text.angle = 90)

# Trajectory value
plotPseudotimeTraj(cyt, var.cols = TRUE) +
  scale_colour_gradientn(colors = c("#F4D31D", "#FF3222","#7A06A0"))

plotHeatmap(cyt, downsize = 1000, cluster_rows = TRUE,
            clustering_method = "ward.D",
            color = colorRampPalette(c("#00599F","#EEEEEE","#FF3222"))(100))

# Plot clusters colored by trajectory value
plotCluster(cyt, item.use = c("tSNE_1", "tSNE_2"), color.by = "traj.value.log",
            size = 10, show.cluser.id = TRUE, category = "numeric") +
  scale_colour_gradientn(colors = c("#EEEEEE", "#FF3222", "#CC0000", "#CC0000"))
```

## Announcement

`CytoTree` was previously released as `flowSpy` (**[link to GitHub](https://github.com/JhuangLab/CytoTree) and [link to Bioconductor](https://bioconductor.org/packages/flowSpy/)**). To make the package easier to identify and to avoid awkward duplication of names in some situations, we renamed `flowSpy` to `CytoTree`, which better fits the functional orientation of this software. We apologize for the inconvenience.