--- title: "SingleCellSignalR" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{my-vignette} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) options(mc.cores=2) ``` # User's Guide ### Simon Cabello-Aguilar1, Jacques Colinge1 1 Institut de Recherche en Cancérologie de Montpellier, Inserm, Montpellier, France ; Institut régional du Cancer Montpellier, Montpellier, France ; Université de Montpellier, Montpellier, France ---- ## Introduction This guide provides an overview of the SingleCellSignalR package, a comprehensive framework to obtain cellular network maps from scRNA-seq data. SingleCellSignalR comes with a complete pipeline integrating existing methods to cluster individual cell transcriptomes and identify cell subpopulations as well as novel cellular network-specific algorithms. More advanced users can substitute their own logic or alternative tools at various stages of data processing. SingleCellSignalR also maps cell subpopulation internal network linked to genes of interest through the integration of regulated KEGG and Reactome pathways together with ligands and receptors involved in inferred cell-cell interactions. The cellular networks can be exported in text files and graphML objects to be further explored with Cytoscape (www.cytoscape.org), yEd (www.yworks.com), or similar software tools. ----- ## Quick Start Independent of the chosen scRNA-seq platform, deep or shallower, data comes as a table of read or unique molecule identifier (UMI) counts, one column per individual cell and one row per gene. Initial processing is required to prepare such data for subsequent analysis and we decided to propose a generic solution for the sake of convenience, though users can easily substitute their own computations. Gene names (HUGO symbols) are provided in the first column of the table. Each analysis is organized around a working directory (or project folder): ![][id0] [id0]: ./directory.png The file containing the read counts should be placed in the working directory. ```r > file <- "example_data.txt" ``` Data processing can then start: ```r > data <- data_prepare(file = file) log-Normalization 14223 genes 400 cells Zero rate = 89.6% ``` ```{r, echo=FALSE, eval=TRUE, results='hide'} library(SingleCellSignalR) data(example_dataset, package = "SingleCellSignalR") data = example_dataset genes = data$genes rownames(data) = genes data = data[,-1] ``` The **data_prepare()** function eliminates non expressed genes before performing read count normalization. Normalized data are submitted to a clustering algorithm to identify cell subpopulations: ```{r, echo=TRUE, eval=FALSE} clust <- clustering(data = data, n.cluster = 4, n = 10, method = "simlr",write = FALSE,pdf=FALSE) ``` ```{r, echo=FALSE, eval=TRUE} clust <- clustering(data = data,n.cluster = 4, n = 10, method = "kmeans",write = FALSE,pdf=FALSE) ``` We set the method argument to `simlr`, which caused the **SIMLR()** function of the SIMLR package [1] to be used. The **SIMLR_Estimate_Number_of_Clusters()** function determined the number of clusters, between 2 and n (n=10 above). Next, differentially expressed genes in one cluster compared to the others are identified using the **cluster_analysis()** function, which relies on **edgeR**. A result table is automatically created in the *cluster-analysis* folder: ```{r, eval=TRUE, results='hide'} clust.ana <- cluster_analysis(data = data, genes = rownames(data), cluster = clust$cluster, write = FALSE) ``` Once the preliminary steps illustrated above are completed, **SingleCellSignalR** can be used to generate cellular interaction lists using the **cell_signaling()** function: ```{r, eval=TRUE} signal <- cell_signaling(data = data, genes = rownames(data), cluster = clust$cluster, write = FALSE) ``` An intercellular network can also be generated to map the overall ligand/receptor interactions invoking the **inter_network()** function: ```{r, eval=TRUE} inter.net <- inter_network(data = data, signal = signal, genes = genes, cluster = clust$cluster, write = FALSE) ``` At this point the intercellular network have been generated and exported in text and graphML formats in the *networks* folder. A summary of the interactions between cell clusters can be output in the form of a chord diagram by the **visualize_interactions()** function: ```{r, eval=TRUE} visualize_interactions(signal = signal) ``` This function will create a *plot* in the R plot window. The details of the interactions between two clusters, for example cluster 1 and 2, can also be shown in the plot window with the **visualize_interactions()** function. Note that in the example below we ask for the display of two pairs of cell clusters, pair 1 that contains interactions from cluster 1 to 2, and pair 4 from cluster 2 to 1. (`names(signal)` returns the cell cluster names in each pair, see function **visualize_interactions()** details.) ```{r, eval=TRUE} visualize_interactions(signal = signal,show.in=c(1,4)) ``` And these plots can be saved into pdf files in the *images* folder using the `write.in` argument of the **visualize_interactions()** function. ```r > visualize_interactions(signal = signal,write.in=c(1,4)) ``` red red red ------ ## Examples of use **SingleCellSignalR** package functions have many arguments parameters that can be changed by the user to fit her needs (see Reference Manual for more details). Furthermore, several handy functions that were not illustrated above are provided to generate additional plots or reports. --- ### **Exploiting the cell_classifier clustering** ```{r, echo=FALSE, eval=TRUE, results='hide'} data(example_dataset, package = "SingleCellSignalR") data = example_dataset genes = data$genes rownames(data) = genes data = data[,-1] ``` After running the example in the **Quick Start** section, the user can define cell clusters after the output of the **cell_classifier()**. The demo data set is comprised of a subset of the 10x PBMC dataset [3], i.e. immune cells. The t-SNE map calculated with the **clustering()** function will also be used. For this example we will set the `plot.details` argument to TRUE to monitor the choice of the threshold of gene signature scores. ```{r, eval=TRUE, results='hide'} class = cell_classifier(data=data, genes=rownames(data), markers = markers(c("immune")), tsne=clust$`t-SNE`,plot.details=TRUE,write = FALSE) ``` Let us use the cell clustering obtained with the **cell_classifier()** function. Although "undefined" cells may be interesting in some cases, here they form a heterogeneous cluster because they represent cells that seem to be in a transition between two states ("T-cells" and "Cytotoxic cells", or "Neutrophils" and "Macrophages", see heatmap above). We discard these cells. ```{r, eval=TRUE, results='hide'} # Define the cluster vector and the cluster names cluster <- class$cluster c.names <- class$c.names # Remove undefined cells data <- data[,cluster!=(max(cluster))] tsne <- clust$`t-SNE`[cluster!=(max(cluster)),] c.names <- c.names[-max(cluster)] cluster <- cluster[cluster!=(max(cluster))] ``` Then the analysis can be carried on. ```{r, eval=TRUE} clust.ana <- cluster_analysis(data = data, genes = rownames(data), cluster = cluster, c.names = c.names, write = FALSE) ``` Once the cluster analysis is done, the **cell_signaling()**, **inter_network()** functions can be used. ```{r, eval=TRUE} signal <- cell_signaling(data = data, genes = genes, cluster = cluster, c.names = c.names, write = FALSE) inter.net <- inter_network(data = data, signal = signal, genes = genes, cluster = cluster, write = FALSE) ``` If we take a look at `signal[[6]]` (or `signal[["B-cells-Macrophages"]]`) ```{r, eval=TRUE} signal[[6]] ``` We can be interested in genes participating in pathways with a receptor of interest inside a cluster of interest. Let us say *ASGR1* in "Macrophages". ```{r, eval=FALSE} intra = intra_network(goi = "ASGR1",data = data,genes = rownames(data),cluster = cluster, coi = "Macrophages", c.names = c.names, signal = signal,write=FALSE) ``` Now, let us take an overview of the signaling between the cell types. ```{r, eval=TRUE} visualize_interactions(signal) ``` Let us get deeper and look at the signaling between "T-cells" and "B-cells" for example. ```{r, eval=TRUE} visualize_interactions(signal, show.in=c(1,6)) ``` The following command will save these plots in the *images* folder. ```r > visualize_interactions(signal, write.in=c(1,6)) ``` --- ### **Marker analysis on a cancer dataset** For this example we use the scRNAseq dataset from Tirosh et al. [4]. We use only the data from patient 80. ```r > file <- "patient_80.txt" > data <- data_prepare(file = file) log-Normalization 19452 genes 480 cells Zero rate = 75.5% ``` *Remark: One can notice that the zero rate is lower than in the previous example which reflects the fact that the sequencing is deeper.* We know that this dataset is composed of melanoma cells and their microenvironment, we hence define our markers table using the **markers()** function. ```r > my.markers <- markers(category = c("immune", "tme", "melanoma")) > head(my.markers) T-cells B-cells Macrophages Cytotoxic cells DC Mast cells Neutrophils NK cells Treg Endothelial cells CAFs melanoma 1 CD2 CD19 CD163 PRF1 CCL13 TPSB2 FPR1 XCL1 FOXP3 PECAM1 FAP MIA 2 CD3D CD79A CD14 GZMA CD209 TPSAB1 SIGLEC5 XCL2 VWF THY1 TYR 3 CD3E CD79B CSF1R GZMB HSD11B1 CPA3 CSF3R NCR1 CDH5 DCN SLC45A2 4 CD3G BLK C1QC NKG7 MS4A2 FCAR KIR2DL3 CLDN5 COL1A1 CDH19 5 CD8A MS4A1 VSIG4 GZMH HDC FCGR3B KIR3DL1 PLVAP COL1A2 PMEL 6 SIRPG BANK1 C1QA KLRK1 CEACAM3 KIR3DL2 ECSCR COL6A1 SLC24A5 ``` Let us perform the clustering. For this example, we set the *method* argument to "kmeans" and the *n* argument to 12. ```r > clust <- clustering(data = data,n = 12, method = "kmeans") Estimating the number of clusters Estimated number of clusters = 6 6 clusters detected cluster 1 -> 157 cells cluster 2 -> 15 cells cluster 3 -> 137 cells cluster 4 -> 6 cells cluster 5 -> 114 cells cluster 6 -> 51 cells ``` ![][id10] [id10]: ./Rplot10.png Now we take advantage of the `markers` argument of the **cluster_analysis()** function using *my.markers* obtained above with the **markers()** function. ```r > clust.ana <- cluster_analysis(data = data, genes = rownames(data), cluster = clust$cluster, markers = my.markers) edgeR differential gene expression (dge) processing: Looking for differentially expressed genes in cluster 1 Looking for differentially expressed genes in cluster 2 Looking for differentially expressed genes in cluster 3 Looking for differentially expressed genes in cluster 4 Looking for differentially expressed genes in cluster 5 Looking for differentially expressed genes in cluster 6 ``` ![][id11] [id11]: ./Rplot11.png We can see that the clusters 2 and 5 are well defined, they are respectively cancer associated fibroblasts (CAFs) and melanoma cells. The cluster 6 is also clearly composed of endothelial cells. Clusters 1 and 2 are immune cells but the clustering did not succeed in sorting them correctly and cluster 4 counts only 6 cells. Those do not seem to be homogeneous and we decide to remove them. ```r > data <- data[,clust$cluster!=4] > cluster <- clust$cluster[clust$cluster!=4] > cluster[cluster>4] <- cluster[cluster>4] - 1 ``` Then we can name our clusters manually before pursuing the analysis. ```r > c.names <- c("Immune 1", "CAFs", "Immune 2", "melanoma", "endothelial") > signal <- cell_signaling(data = data, genes = rownames(data), cluster = cluster, c.names = c.names) Paracrine signaling: Checking for signaling between cell types 78 interactions from Immune 1 to CAFs 30 interactions from Immune 1 to melanoma 11 interactions from Immune 1 to endothelial 67 interactions from CAFs to Immune 1 85 interactions from CAFs to Immune 2 86 interactions from CAFs to melanoma 78 interactions from CAFs to endothelial 84 interactions from Immune 2 to CAFs 55 interactions from Immune 2 to melanoma 44 interactions from Immune 2 to endothelial 12 interactions from melanoma to Immune 1 33 interactions from melanoma to CAFs 19 interactions from melanoma to Immune 2 12 interactions from melanoma to endothelial 53 interactions from endothelial to Immune 1 33 interactions from endothelial to CAFs 69 interactions from endothelial to Immune 2 28 interactions from endothelial to melanoma ``` *Remark: the names of the dge tables in the* cluster_analysis *folder must be changed according to the cluster names (c.names).* And now visualize! ```r > visualize_interactions(signal, show.in = c(12,18)) ``` ![][id16] [id16]: ./Rplot16.png ![][id13] [id13]: ./Rplot13.png ![][id12] [id12]: ./Rplot12.png *Remark: We observe that in the chord diagrams above, the "specific" interactions were highlighted with a thick black line.* Let us look at one of these specific interactions using the **expression_plot_2()** function. ```r > expression_plot_2(data,"NID1","COL13A1",clust$`t-SNE`) ``` ![][id17] [id17]: ./Rplot17.png red red red Thank you for reading this guide and for using **SingleCellSignalR**. ---- ## *References* 1. Wang B, Zhu J, Pierson E, Ramazzotti D, Batzoglou S. Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning. Nat Methods. 2017;14:414-6. 2. McCarthy DJ, Chen Y, Smyth GK. Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Res. 2012;40:4288-97. 3. 8k PBMCs from a Healthy Donor [Internet]. 2017. Available from: https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.1.0/pbmc8k 4. Tirosh I, Izar B, Prakadan SM, Wadsworth MH, Treacy D, Trombetta JJ, et al. Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq. Science. 2016;352:189-96.