---
title: "scAlign Tutorial"
author: "Nelson Johansen, Gerald Quon"
date: "`r Sys.Date()`"
output: pdf_document
package: scAlign
vignette: >
  %\VignetteIndexEntry{alignment_tutorial}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Introduction

This tutorial provides a guided alignment for two groups of cells from [cellbench](https://github.com/LuyiTian/CellBench_data/blob/master/cellbench.md) RNA mixture experiments. In this tutorial we demonstrate the unsupervised alignment strategy of `scAlign` described in [Johansen et al., 2018](https://www.biorxiv.org/content/10.1101/504944v2) along with typical analysis utilizing the aligned dataset, and show how `scAlign` can identify and match cell types across platforms without using the labels as input.

## Alignment goals

The following is a walkthrough of a typical alignment problem for `scAlign` and has been designed to provide an overview of data preprocessing, alignment and finally analysis in our joint embedding space. Here, our primary goals include:

1. Learning a low-dimensional cell state space in which cells group by function and type, regardless of condition (platform).
2. Accurately labeling old cells with cell cycle and cell type information using only the young cell annotations.

## Installation

```R
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("scAlign")
```

```
Guide to installing Python and TensorFlow on different operating systems.

On Windows:
Download Python 3.6.8. Note, newer versions of Python (e.g. 3.7) cannot use TensorFlow at this time.
Make sure pip is included in the installation.
Open Windows Command Prompt, or PowerShell.
Navigate to your Python installation directory (for Windows 10, the default seems to be C:\Users\userid\AppData\Local\Programs\Python\Python36\Scripts, where userid is your own username).
Run .\pip install --upgrade tensorflow

On Ubuntu:
sudo apt update
sudo apt install python3-dev python3-pip
pip3 install --user --upgrade tensorflow # install in $HOME
python3 -c "import tensorflow as tf; tf.enable_eager_execution(); print(tf.reduce_sum(tf.random_normal([1000, 1000])))"

On MacOS (homebrew):
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
export PATH="/usr/local/bin:/usr/local/sbin:$PATH"
brew update
brew install python # Python 3
pip3 install --user --upgrade tensorflow # install in $HOME
python3 -c "import tensorflow as tf; tf.enable_eager_execution(); print(tf.reduce_sum(tf.random_normal([1000, 1000])))"

Further details at: https://www.tensorflow.org/install
```

## Data setup

The gene count matrices used for this tutorial are hosted on the CellBench GitHub: [here](https://github.com/LuyiTian/CellBench_data/tree/master/data).

First, we load in the normalized cellbench data. The data was normalized following the procedures defined on the CellBench GitHub.

```{r, message=FALSE}
library(scAlign)
library(SingleCellExperiment)
library(ggplot2)

## Load in cellbench data
data("cellbench", package = "scAlign", envir = environment())

## Extract RNA mixture cell types (second "-"-delimited field of each column name)
mix.types = unlist(lapply(strsplit(colnames(cellbench), "-"), "[[", 2))

## Extract platform, assigned per column so the result does not depend on column order
batch = ifelse(grepl("sortseq", colnames(cellbench)), "SORT", "CEL")
```

## scAlign setup

The general design of `scAlign` makes it agnostic to the input RNA-seq data representation. Thus, the input data can be gene-level counts, transformations of those gene-level counts, or the result of a preliminary dimensionality reduction step such as canonical correlates or principal component scores. Here we create the `scAlign` object from the previously normalized cellbench data and perform CCA on the unaligned data.
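Before building the object, it may be worth checking that the label and platform vectors line up with the columns of the data. The following is a minimal sketch on toy column names; the names in `toy.names` are hypothetical and the real cellbench column names may be formatted differently. It applies the same two rules as the data setup step (second `-`-delimited field as mixture label, `sortseq` in the name as platform marker), with the platform assigned per column.

```r
## Hypothetical column names, for illustration only
toy.names <- c("celseq-mixA-c1", "celseq-mixB-c2",
               "sortseq-mixA-c3", "sortseq-mixB-c4")

## Mixture label: second "-"-delimited field of each name
toy.types <- vapply(strsplit(toy.names, "-"), `[[`, character(1), 2)

## Platform: assigned per column, independent of column order
toy.batch <- ifelse(grepl("sortseq", toy.names), "SORT", "CEL")

## Each vector should carry exactly one entry per column
stopifnot(length(toy.types) == length(toy.names),
          length(toy.batch) == length(toy.names))

## Cross-tabulate platform against mixture label
print(table(toy.batch, toy.types))
```

Running the same `table()` call on `batch` and `mix.types` gives a quick view of how many cells of each mixture type each platform contributes.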
```{r}
## Create SCE objects to pass into scAlignCreateObject, one per platform
celSCE <- SingleCellExperiment(
    assays = list(scale.data = cellbench[,batch=='CEL'])
)

sortSCE <- SingleCellExperiment(
    assays = list(scale.data = cellbench[,batch=='SORT'])
)

## Build the scAlign class object and compute CCs
scAlignCB = scAlignCreateObject(sce.objects = list("CEL"=celSCE,
                                                   "SORT"=sortSCE),
                                labels = list(mix.types[batch=='CEL'],
                                              mix.types[batch=='SORT']),
                                data.use="scale.data",
                                pca.reduce = FALSE,
                                cca.reduce = TRUE,
                                ccs.compute = 5,
                                project.name = "scAlign_cellbench")
```

## Alignment of cellbench RNAmixture

Now we align the cell populations from both protocols. `scAlign` returns a low-dimensional joint embedding space where the effect of platform is removed, allowing us to use the complete dataset for downstream analyses such as clustering or differential expression.

```{r}
## Run scAlign with all_genes
scAlignCB = scAlign(scAlignCB,
                    options=scAlignOptions(steps=1000,
                                           log.every=1000,
                                           norm=TRUE,
                                           early.stop=TRUE),
                    encoder.data="scale.data",
                    supervised='none',
                    run.encoder=TRUE,
                    run.decoder=FALSE,
                    log.dir=file.path('~/models_temp','gene_input'),
                    device="CPU")

# ## Additional run of scAlign with CCA
# scAlignCB = scAlign(scAlignCB,
#                     options=scAlignOptions(steps=1000,
#                                            log.every=1000,
#                                            norm=TRUE,
#                                            early.stop=TRUE),
#                     encoder.data="CCA",
#                     supervised='none',
#                     run.encoder=TRUE,
#                     run.decoder=FALSE,
#                     log.dir=file.path('~/models','cca_input'),
#                     device="CPU")

## Plot aligned data in tSNE space, when the data was processed in two different ways:
## 1) either using the original gene inputs,
## 2) after CCA dimensionality reduction for preprocessing.
## Cells here are colored by input labels
set.seed(5678)
gene_plot = PlotTSNE(scAlignCB,
                     "ALIGNED-GENE",
                     title="scAlign-Gene",
                     perplexity=30)

## Show plot
gene_plot

# cca_plot = PlotTSNE(scAlignCB,
#                     "ALIGNED-CCA",
#                     title="scAlign-CCA",
#                     perplexity=30)
#
# multi_plot_labels = grid.arrange(gene_plot, cca_plot, nrow = 1)
```

# Session Info

```{r}
sessionInfo()
```