--- title: "Retrofit Simulation Vignette" author: "Adam Keebum Park, Roopali Singh" output: BiocStyle::html_document vignette: > %\VignetteIndexEntry{Retrofit Simulation Vignette} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE, comment = '##', results = 'markup', warning = FALSE) ``` # Introduction RETROFIT is a statistical method for reference-free deconvolution of spatial transcriptomics data to estimate cell type mixtures. In this Vignette, we will estimate cell type composition of a sample synthetic dataset. We will annotate cell types using an annotated single cell reference. # Package Installation Install and load the package using the following steps: ```{r, eval=FALSE} if(!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("retrofit") ``` The main functionalities of RETROFIT are covered in this tutorial. ```{r, load_library, echo = FALSE} library(retrofit) ``` # Spatial Transcriptomics Data First load the ST data, using the following command: ```{r, data} data("vignetteSimulationData") x = vignetteSimulationData$n10m3_x ``` The ST data matrix will consist of G = 500 genes and S = 1000 spots i.e., a matrix of order G x S. # Deconvolution Initialize the following parameters for deconvolution: - iterations: Number of iterations (default = 4000) - L: Number of components required ```{r, decompose initialization} iterations = 10 L = 20 ``` After initialization, run RETROFIT on the data (X) as follows: ```{r, decompose} result = retrofit::decompose(x, L=L, iterations=iterations, verbose=TRUE) H = result$h W = result$w Th = result$th ``` After deconvolution of ST data, we have our estimates of W (a matrix of order G x L), H (a matrix of order L x S) and Theta (a vector of L components). Here, we are using number of iterations as 10 for demonstration purposes. For reproducing results in the paper, we need to run RETROFIT for iterations = 4000. The whole computation is omitted here due to time complexity (> 10min). We will load the results from 4000 iterations for the rest of the analysis. ```{r, load_retrofit_results} H = vignetteSimulationData$results_4k_iterations$decompose$h W = vignetteSimulationData$results_4k_iterations$decompose$w Th = vignetteSimulationData$results_4k_iterations$decompose$th ``` The above results are obtained by running the code below. ```{r, retrofit reproducibility, eval = FALSE} iterations = 4000 set.seed(12) result = retrofit::decompose(x, L=L, iterations=iterations) ``` # Cell-type Annotation Next, we need to annotate the components, to get the proportion of, say K, cell types. We can do this in two ways: (a) using an annotated single cell reference or (b) using the known marker genes. Here, we will annotate using single cell reference. Get the single cell reference data: ```{r, reference} sc_ref_w = vignetteSimulationData$sc_ref_w ``` This file contains average gene expression values for G = 500 genes in K = 10 cell types i.e., a matrix of order G x K. Run the following command to get K cell-type mixtures from the ST data X: ```{r, annotate} K = 10 result = retrofit::annotateWithCorrelations(sc_ref=sc_ref_w, K=K, decomp_w=W, decomp_h=H) H_annotated = result$h W_annotated = result$w ranked_cells = result$ranked_cells ``` Above code assigns components to the cell type it has maximum correlation with. # Deconvolution and annotation in one step (Optional) We can also deconvolve the ST data matrix along with cell-type annotation all in one step, as follows: ```{r, retrofit} iterations = 10 L = 20 K = 10 result = retrofit::retrofit(x, sc_ref=sc_ref_w, iterations=iterations, L=L, K=K) ``` # Results Visualize the correlation values between each cell type and the chosen component, as follows: ```{r, visualize_correlations} # correlation between true and estimated cell-type proportions correlations = stats::cor(sc_ref_w[ranked_cells], W_annotated[,ranked_cells]) ranked_correlations = sort(diag(correlations), decreasing=TRUE) df = data.frame(x=1:length(ranked_correlations), y=ranked_correlations, label_x1=1, label_x2=2, label_y=seq(from=0.5, by=-0.05, length.out=10), label_cell=ranked_cells, label_corr=format(round(ranked_correlations, digits=4))) gg <- ggplot2::ggplot(df,ggplot2::aes(x=x, y=y, group=1)) + ggplot2::geom_line(ggplot2::aes(x=x, y=y)) + ggplot2::geom_point(ggplot2::aes(x=x, y=y)) + ggplot2::theme_bw() + ggplot2::theme(axis.text.x=ggplot2::element_blank()) + ggplot2::ylim(0, 1.05) + ggplot2::ylab(expression(paste("Correlation (",W^0,",",widetilde(W),")"))) + ggplot2::geom_text(data=df, ggplot2::aes(x=label_x1, y=label_y, label=label_corr), size=4, hjust=0) + ggplot2::geom_text(data=df, ggplot2::aes(x=label_x2, y=label_y, label=label_cell), size=4, hjust=0) plot(gg) ``` All mapped cell types have greater than 0.75 correlation with the component they are matched with. Since this is synthetic data, true cell type proportions are known. We can visualize the RMSE between the true and the estimated cell-type proportions as follows: ```{r, visualize_rmse} H_true = vignetteSimulationData$sc_ref_h H_est = H_annotated corrH = sort(diag(stats::cor(H_true,H_est)), decreasing=TRUE, na.last=TRUE) df = data.frame(x=seq(0,1,length.out = 1000), y=corrH) df_text = data.frame(x=0.2, y=0.6, label = c(paste("RETROFIT:", round(DescTools::AUC(x=seq(0,1,length.out = 1000), y=corrH),digits=3)))) gg <- ggplot2::ggplot(df, ggplot2::aes(x=x,y=y)) + ggplot2::geom_line() + ggplot2::scale_color_manual('gray30') + ggplot2::xlab("Normalized Rank") + ggplot2::ylab(expression(paste("Correlation (",H,",",widetilde(H),")"))) + ggplot2::theme_bw() + ggplot2::geom_text(data = df_text, ggplot2::aes(label = label)) plot(gg) ``` In case of real data, where ground truth is not available, we can do certain diagnostics on the quality of results. For example, compute the correlation between marker expression (if known) and cell type proportion in ST data and visualize the agreement of the spatial pattern between markers and estimated proportion (read the paper for more information). # Session information ```{r} sessionInfo() ```