--- title: "Retrofit Colon Vignette" author: "Adam Keebum Park, Roopali Singh" date: "`r Sys.Date()`" output: BiocStyle::html_document vignette: > %\VignetteIndexEntry{Retrofit Colon Vignette} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE, comment = '##', results = 'markup', warning = FALSE) ``` # Introduction RETROFIT is a statistical method for reference-free deconvolution of spatial transcriptomics data to estimate cell type mixtures. In this Vignette, we will estimate cell type composition of a Colon dataset. We will annotate cell types using marker gene lists as well as single cell reference. We will also reproduce some of the results from the paper for illustration. # Package Installation and other requirements Install and load the packages using the following steps: ```{r, eval=FALSE} if(!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("retrofit") ``` ```{r, load_library, message=FALSE} library(retrofit) ``` # Spatial Transcriptomics Data First load the Colon ST data from fetal 12 PCW tissue, using the following command: ```{r, data} data("vignetteColonData") X = vignetteColonData$a3_x ``` # Reference-free Deconvolution Initialize the required parameters for deconvolution. - iterations: Number of iterations (default = 4000) - L: Number of components required ```{r, decompose initialization} iterations = 10 L = 20 ``` After initialization, run retrofit on the data (X) as follows: ```{r, decompose} result = retrofit::decompose(X, L=L, iterations=iterations, verbose=TRUE) H = result$h W = result$w Th = result$th ``` After deconvolution of ST data, we have our estimates of W (a matrix of order G x L), H (a matrix of order L x S) and Theta (a vector of L components). Here, we are using number of iterations as 20 for demonstration purposes. For reproducing results in the paper, we need to run RETROFIT for iterations = 4000. The whole computation is omitted here due to time complexity (> 10min). We will load the results from 4000 iterations for the rest of the analysis. ```{r, load_retrofit_results} H = vignetteColonData$a3_results_4k_iterations$decompose$h W = vignetteColonData$a3_results_4k_iterations$decompose$w Th = vignetteColonData$a3_results_4k_iterations$decompose$th ``` Above results are obtained by running the code below. ```{r, reproducible codes, eval=FALSE} iterations = 4000 set.seed(1) result = retrofit::decompose(x, L=L, iterations=iterations) ``` Next, we need to annotate the components, to get the proportion of K cell types. We can do this in two ways: (a) using an annotated single cell reference or (b) using the known marker genes. # Cell-type Annotation via annotated single-cell reference Here, we will annotate components using single-cell reference. Load the single-cell reference data: ```{r, sc} sc_ref = vignetteColonData$sc_ref ``` This file contains average gene expression values for K=8 cell types. Run the following command to get K cell-type mixtures from the ST data X: ```{r, annotate with sc} K = ncol(sc_ref) # number of cell types result = annotateWithCorrelations(sc_ref, K, W, H) H_sc = result$h W_sc = result$w H_sc_prop = result$h_prop W_sc_prop = result$w_prop ``` We assign components to the cell type it has maximum correlation with as shown in figure above. Finally, cell-type proportions for each spot in the ST data (X) are stored in H_mark_prop of order K x S. You can verify that the mappings from both methods are largely similar. ```{r, visualize correlation matrix between the sc reference and W} W_norm <- W for(i in 1:nrow(W)){ W_norm[i,]=W[i,]/sum(W[i,]) } corrplot::corrplot(stats::cor(sc_ref, W_norm), is.corr=FALSE, mar=c(0,0,1,0), col = colorRampPalette(c("white", "deepskyblue", "blue4"))(100), main="Correlation-based Mapping Matrix") ``` # Cell-type Annotation via known marker genes Here, we will annotate using known marker genes for cell types. Load the marker gene list: ```{r, marker} marker_ref = vignetteColonData$marker_ref ``` This file contains the list of marker genes for K = 8 cell types. Run the following command to get K cell-type mixtures from the ST data X: ```{r, annotate with marker} K = length(marker_ref) # number of cell types result = retrofit::annotateWithMarkers(marker_ref, K, W, H) H_mark = result$h W_mark = result$w H_mark_prop = result$h_prop W_mark_prop = result$w_prop gene_sums = result$gene_sums ``` ```{r, visualize correlation matrix marker} corrplot::corrplot(gene_sums, is.corr=FALSE, mar=c(0,0,1,0), col=colorRampPalette(c("white", "deepskyblue", "blue4"))(100), main="Marker-based Mapping Matrix") ``` We assign components to the cell type it has maximum average marker expression in, as shown in figure above. Finally, cell-type proportions for each spot in the ST data (X) are stored in H_est of order K x S. # Results and visualization Hereon, we will be reproducing some of the analysis from the paper using marker-based mapping results. ## Figure 4A: Proportion of different cell types in the tissue. Proportion of cell-types in this tissue: ```{r} rowSums(H_mark_prop)/ncol(H_mark_prop) ``` ## Figure 4C: Localization of cell types with the dominant cell type in each spot Load the coordinates for the spots in the tissue and the tissue image: ```{r} coords=vignetteColonData$a3_coords img <- png::readPNG(RCurl::getURLContent("https://user-images.githubusercontent.com/90921267/210159136-96b56551-f414-4b0b-921e-98f05a98c8bc.png")) t <- grid::rasterGrob(img, width=ggplot2::unit(1,"npc"), height=ggplot2::unit(1,"npc"), interpolate = FALSE) ``` Find the cell-type with maximum proportion in each spot and plot: ```{r, fig.dim = c(4.5, 4.5*(456/480))} dat=as.data.frame(cbind(coords,t(H_mark_prop))) colnames(dat)=c(colnames(coords),rownames(H_mark_prop)) df=dat[,-c(1:3)] df$CellType=NA for(i in 1:nrow(df)){ df$CellType[i]=names(which.max(df[i,-c(1:2)])) } df$CellType=factor(df$CellType,levels=c("Epithelial", "Immune", "Myo.Meso","Muscle", "Neural", "Endothelium", "Pericytes","Fibroblasts")) ggplot2::ggplot(df, ggplot2::aes(x=imagerow, y=imagecol)) + ggplot2::geom_point(size=1.8, shape=21, ggplot2::aes(fill=CellType))+ ggplot2::scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9","#009E73", "#CC79A7", "#0072B2", "#D55E00" ,"#F0E442"), labels=c("Epithelial", "Immune","Myofibroblasts/\nMesothelium", "Muscle","Neural","Endothelium", "Pericytes","Fibroblasts"))+ ggplot2::xlab("") + ggplot2::ylab("") + ggplot2::theme_classic()+ ggplot2::theme(legend.position = "none", axis.line = ggplot2::element_blank(), plot.margin = ggplot2::margin(-0.2, -0.2, 0.2, -1, "cm"), axis.text.x = ggplot2::element_blank(), axis.text.y = ggplot2::element_blank(), axis.ticks.x = ggplot2::element_blank(), axis.ticks.y = ggplot2::element_blank()) ``` ## Figure 4D: Gene expression of Epithelial marker genes across spots Plot the spatial expression for Epithelial markers: ```{r, fig.dim = c(4.5, 4.5*(456/480))} df = as.data.frame(cbind(coords, t(X[marker_ref$Epithelial,]))) df$metagene = unname(rowSums(df[,-c(1:5)])) df$metagene[df$metagene>quantile(df$metagene,0.95)]=quantile(df$metagene,0.95) ggplot2::ggplot(df, ggplot2::aes(x=imagerow, y=imagecol)) + # ggplot2::annotation_custom(t, 700, 6400, 990, 6550) + ggplot2::geom_point(size=1.5, ggplot2::aes(color=metagene))+ ggplot2::scale_color_gradientn(name="Epithelial Markers ", colors=colorspace::adjust_transparency("grey20", alpha = c(0,0.5,1)), n.breaks=3) + ggplot2::xlab("") + ggplot2::ylab("") + ggplot2::theme_classic()+ ggplot2::theme(legend.key.height = ggplot2::unit(0.2, 'cm'), #change legend key size legend.key.width = ggplot2::unit(0.8, 'cm'), legend.title = ggplot2::element_text(size=10), legend.position = "bottom", legend.box.margin=ggplot2::margin(-20,-10,-5,-10), plot.margin = ggplot2::margin(-0.2, -0.2, 0.2, -1, "cm"), axis.line=ggplot2::element_blank(), legend.text=ggplot2::element_text(size=8), axis.text.x = ggplot2::element_blank(), axis.text.y = ggplot2::element_blank(), axis.ticks.x=ggplot2::element_blank(), axis.ticks.y=ggplot2::element_blank()) ``` ## Figure 4E: Proportion of Epithelial cells across spots Plot the estimated proportions for Epithelial cell-type: ```{r, fig.dim = c(4.5, 4.5*(456/480))} df=dat[,-c(1:3)] ggplot2::ggplot(df, ggplot2::aes(x=imagerow, y=imagecol)) + # ggplot2::annotation_custom(t, 700, 6400, 990, 6550) + ggplot2::geom_point(size=1.5, ggplot2::aes(color=Epithelial))+ ggplot2::scale_color_gradientn(name="Epithelial ", colors=colorspace::adjust_transparency("grey20", alpha = c(0,0.5,1)), n.breaks=3, limits=c(0,1)) + ggplot2::xlab("") + ggplot2::ylab("") + ggplot2::theme_classic()+ ggplot2::theme(legend.key.height = ggplot2::unit(0.2, 'cm'), #change legend key size legend.key.width = ggplot2::unit(0.8, 'cm'), legend.title = ggplot2::element_text(size=10), legend.position = "bottom", legend.box.margin=ggplot2::margin(-20,-10,-5,-10), plot.margin = ggplot2::margin(-0.2, -0.2, 0.2, -1, "cm"), axis.line=ggplot2::element_blank(), legend.text=ggplot2::element_text(size=8), axis.text.x = ggplot2::element_blank(), axis.text.y = ggplot2::element_blank(), axis.ticks.x=ggplot2::element_blank(),axis.ticks.y=ggplot2::element_blank()) ``` Verify that the spatial pattern exhibited by the epithelial markers is largely similar to that of the estimated proportion for epithelial cells. ## Figure 5A: Proportion of different cell types in different spots Visualize the proportion of different cell types and colocalization patterns: ```{r, fig.dim = c(10,5)} ## Structure Plots plotfn <- function(df=NULL) { #reshape to long format df$num <- 1:nrow(df) df1 <- reshape2::melt(df,id.vars = "num") #reversing order for cosmetic reasons df1 <- df1[rev(1:nrow(df1)),] #plot p <- ggplot2::ggplot(df1, ggplot2::aes(x=num,y=value,fill=variable))+ ggplot2::geom_bar(stat="identity",position="fill",width = 1, space = 0)+ ggplot2::scale_x_continuous(expand = c(0, 0))+ ggplot2::scale_y_continuous(expand = c(0, 0))+ ggplot2::labs(x = NULL, y = NULL)+ ggplot2::theme_grey(base_size=7)+ ggplot2::theme(legend.text = ggplot2::element_text(size = 20),legend.position = "bottom", legend.key.size = ggplot2::unit(0.4, "cm"), legend.title = ggplot2::element_blank(), axis.ticks = ggplot2::element_blank(), axis.text.y = ggplot2::element_text(size = 15), axis.text.x = ggplot2::element_blank())+ ggplot2::scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9", "#009E73", "#CC79A7", "#0072B2", "#D55E00", "#F0E442")) p } df=as.data.frame(t(H_mark_prop[c(2,4,6,5,7,1,8,3),])) #pick max cluster, match max to cluster maxval <- apply(df,1,max) matchval <- vector(length=nrow(df)) for(j in 1:nrow(df)) matchval[j] <- match(maxval[j],df[j,]) #add max and match to df df_q <- df df_q$maxval <- maxval df_q$matchval <- matchval #order dataframe ascending match and decending max df_q <- df_q[with(df_q, order(matchval,-maxval)), ] #remove max and match df_q$maxval <- NULL df_q$matchval <- NULL plotfn(df=df_q)+ ggplot2::theme(legend.position="bottom")+ ggplot2::ggtitle("Fetal 12 PCW (B)")+ ggplot2::theme(plot.title = ggplot2::element_text(hjust=0.5,size=25)) ``` ## Figure 5D: Spots with 1 dominant cell type i.e., proportion > 0.5 Plot the proportion of only dominant cell types i.e., cell types with proportion > 0.5 in a spot. ```{r, fig.dim = c(4.5, 4.5*(456/480))} df=dat df_new=df[df$Epithelial>0.5 | df$Fibroblasts>0.5 | df$Myo.Meso>0.5 | df$Endothelium>0.5 | df$Pericytes>0.5 | df$Neural>0.5 | df$Immune>0.5 | df$Muscle>0.5,-c(1:3)] df_new$CellType=1*(df_new$Epithelial>0.5) + 2*(df_new$Immune>0.5) + 3*(df_new$Myo.Meso>0.5) + 4*(df_new$Muscle>0.5) + 5*(df_new$Neural>0.5)+ 6*(df_new$Endothelium>0.5) + 7*(df_new$Pericytes>0.5) + 8*(df_new$Fibroblasts>0.5) df_new$CellType=factor(df_new$CellType) ggplot2::ggplot(df_new, ggplot2::aes(x=imagerow, y=imagecol)) + # ggplot2::annotation_custom(t, 700, 6400, 990, 6550) + ggplot2::geom_point(size=1.8, shape=21, ggplot2::aes(fill=CellType))+ ggplot2::scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9", "#009E73", "#CC79A7", "#0072B2", "#D55E00", "#F0E442"), labels=c("Epithelial","Immune", "Myofibroblasts\nMesothelium", "Muscle", "Neural", "Endothelium", "Pericytes", "Fibroblasts"))+ ggplot2::xlab("") + ggplot2::ylab("") + ggplot2::theme_classic()+ ggplot2::theme(legend.key.height = ggplot2::unit(0.5, 'cm'), #change legend key size legend.key.width = ggplot2::unit(1, 'cm'), legend.title = ggplot2::element_blank(), legend.position = "none", plot.margin = ggplot2::margin(-0.2, -0.2, 0.2, -1, "cm"), axis.line=ggplot2::element_blank(), legend.text=ggplot2::element_text(size=9), axis.text.x = ggplot2::element_blank(), axis.text.y = ggplot2::element_blank(), axis.ticks.x=ggplot2::element_blank(),axis.ticks.y=ggplot2::element_blank()) ``` ## Figure 5E: Co-localized spots for Epithelial cells Visualize the spots that contain cell types co-localized with Epithelial cells: ```{r, fig.dim = c(4.5, 4.5*(456/480))} #Epithelial df=dat df_new=df[df$Epithelial>0.25,] df_new=df_new[df_new$Fibroblasts>0.25 | df_new$Myo.Meso>0.25 | df_new$Endothelium>0.25 | df_new$Pericytes>0.25 | df_new$Neural>0.25 | df_new$Immune>0.25 | df_new$Muscle>0.25,-c(1:3)] df_new$CellType=NA for(i in 1:nrow(df_new)){ df_new$CellType[i]=names(which.max(df_new[i,-c(1:2,4)])) } df_new$CellType=factor(df_new$CellType,c("Immune", "Myo.Meso", "Muscle","Neural", "Endothelium", "Pericytes","Fibroblasts")) ggplot2::ggplot(df_new, ggplot2::aes(x=imagerow, y=imagecol)) + ggplot2::ylim(min(df$imagecol),max(df$imagecol)) + ggplot2::xlim(min(df$imagerow),max(df$imagerow))+ ggplot2::geom_point(size=1.8, shape=21, ggplot2::aes(fill=CellType))+ ggplot2::scale_fill_manual(values=c("#E69F00", "#56B4E9", "#0072B2","#F0E442"), labels=c("Immune", "Myofibroblasts\nMesothelium", "Endothelium", "Fibroblasts"))+ ggplot2::xlab("") + ggplot2::ylab("") + ggplot2::theme_classic()+ ggplot2::theme(legend.key.height = ggplot2::unit(0.2, 'cm'), #change legend key size legend.key.width = ggplot2::unit(1, 'cm'), legend.title = ggplot2::element_blank(), legend.position = "bottom", plot.margin = ggplot2::margin(-0.15, -0.5, 0.2, -0.55, "cm"), #A3 axis.line = ggplot2::element_blank(), axis.text.x = ggplot2::element_blank(), axis.text.y = ggplot2::element_blank(), axis.ticks.x=ggplot2::element_blank(),axis.ticks.y=ggplot2::element_blank()) ``` ## Figure 6C: Concordance between expression profiles of found genes obtained from RETROFIT and scRNA-seq data ```{r, fig.dim = c(12, 12)} df=vignetteColonData$marker_ref_df gene1=vignetteColonData$fetal12_genes gene1=gene1[!(gene1 %in% df$Gene)] #common genes W1=vignetteColonData$a3_results_4k_iterations$annotateWithCorrelations$w W12=vignetteColonData$intestine_w_12pcw W12=W12[gene1,] W12_prop=W12 w_rowsums1 = rowSums(W12) for (i in 1:length(w_rowsums1)){ if(w_rowsums1[i] > 0){ W12_prop[i,] = W12_prop[i,]/w_rowsums1[i] } } W1_prop=W1[gene1,-c(1:2)] dat=data.frame(gene=c(rep(gene1,8)), celltype=factor(c(rep("Epithelial",length(gene1)),rep("Fibroblasts",length(gene1)), rep("Myofibroblasts\nMesothelium",length(gene1)),rep("Endothelium",length(gene1)), rep("Pericytes",length(gene1)),rep("Neural",length(gene1)), rep("Immune",length(gene1)),rep("Muscle",length(gene1)) ),levels=c("Epithelial","Immune","Fibroblasts","Endothelium", "Neural","Pericytes", "Myofibroblasts\nMesothelium","Muscle")), stage=c(rep("Fetal 12 PCW (B)",length(gene1)*8)), gene_exp1=c(W1_prop[gene1,"Epithelial"],W1_prop[gene1,"Fibroblasts"], W1_prop[gene1,"Myo.Meso"],W1_prop[gene1,"Endothelium"], W1_prop[gene1,"Pericytes"],W1_prop[gene1,"Neural"], W1_prop[gene1,"Immune"],W1_prop[gene1,"Muscle"] ), gene_exp2=c(W12_prop[gene1,"Epithelial"],W12_prop[gene1,"Fibroblasts"], W12_prop[gene1,"Myo.Meso"],W12_prop[gene1,"Endothelium"], W12_prop[gene1,"Pericytes"],W12_prop[gene1,"Neural"], W12_prop[gene1,"Immune"],W12_prop[gene1,"Muscle"] )) ggplot2::ggplot(dat, ggplot2::aes(celltype, gene)) + ggplot2::geom_point(shape=21, ggplot2::aes(fill=gene_exp1, size=gene_exp2)) + ggplot2::theme_classic() + ggplot2::scale_fill_gradientn(colors = pals::brewer.blues(20)[2:20], name = "Expression (RETROFIT)") + ggplot2::facet_grid(cols=ggplot2::vars(stage))+ ggplot2::theme(axis.text.x=ggplot2::element_text(size=20, angle = 45, hjust = 1), axis.text.y=ggplot2::element_text(size=20), legend.title=ggplot2::element_text(size=20), legend.text=ggplot2::element_text(size=20), legend.key.width=ggplot2::unit(0.3,"cm"), legend.position="right" ) + ggplot2::guides(size=ggplot2::guide_legend("Expression (scRNA-seq)"))+ ggplot2::xlab('')+ ggplot2::ylab('Genes')+ ggplot2::theme() ``` # Session information ```{r} sessionInfo() ```