---
title: "Vignette for Dune: merging clusters to improve replicability through ARI merging"
author: "Hector Roux de Bézieux"
output:
  rmarkdown::html_document:
    toc: true
    toc_depth: 3
vignette: >
  %\VignetteIndexEntry{Dune Vignette}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# Installation

```{r, eval = FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("Dune")
```

We use a subset of the Allen Smart-Seq nuclei dataset. Run `?Dune::nuclei` for more details on pre-processing.

```{r setup}
suppressPackageStartupMessages({
  library(RColorBrewer)
  library(dplyr)
  library(ggplot2)
  library(tidyr)
  library(knitr)
  library(purrr)
  library(Dune)
})
data("nuclei", package = "Dune")
theme_set(theme_classic())
```

# Initial visualization

We have a dataset of $1744$ cells, with the results from 3 clustering algorithms: Seurat3, Monocle3 and SC3. The Allen Institute also produced hand-picked cluster and subclass labels. Finally, we included the coordinates from a t-SNE representation, for visualization.

```{r}
ggplot(nuclei, aes(x = x, y = y, col = subclass_label)) +
  geom_point()
```

We can also see how the three clustering algorithms partitioned the dataset initially:

```{r, fig.show='hold', out.width="33%", fig.height=9}
# Draw one t-SNE plot per clustering algorithm, colored by cluster label
walk(c("SC3", "Seurat", "Monocle"), function(clus_algo){
  df <- nuclei
  df$clus_algo <- nuclei[, clus_algo]
  p <- ggplot(df, aes(x = x, y = y, col = as.character(clus_algo))) +
    geom_point(size = 1.5) +
    # guides(color = FALSE) +
    labs(title = clus_algo, col = "clusters") +
    theme(legend.position = "bottom")
  print(p)
})
```

# Merging with **Dune**

## Initial ARI

The adjusted Rand Index (ARI) between each pair of methods can be computed and plotted.

```{r}
plotARIs(nuclei %>% select(SC3, Seurat, Monocle))
```

As we can see, the ARI between the three methods is initially quite low.

## Actual merging

We can now try to merge clusters with the `Dune` function.
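
To give some intuition for why merging can raise the ARI, here is a minimal base-R sketch on toy labels. The `ari` helper and the `toy_*` vectors are hypothetical names introduced for illustration; they are not part of the **Dune** API and this is not a description of how `Dune` is implemented internally. The helper computes the standard adjusted Rand Index from a contingency table and does not handle the degenerate case where both partitions contain a single cluster.

```r
# Hypothetical helper (not part of Dune): adjusted Rand Index of two label
# vectors, computed from their contingency table using base R only.
ari <- function(x, y) {
  tab <- table(x, y)
  pairs_both <- sum(choose(tab, 2))        # pairs grouped together in both partitions
  pairs_x <- sum(choose(rowSums(tab), 2))  # pairs grouped together in x
  pairs_y <- sum(choose(colSums(tab), 2))  # pairs grouped together in y
  expected <- pairs_x * pairs_y / choose(length(x), 2)
  max_index <- (pairs_x + pairs_y) / 2
  (pairs_both - expected) / (max_index - expected)
}

# Toy partitions: toy_a splits into two clusters what toy_b keeps as one
toy_a <- c(1, 1, 2, 2, 3, 3)
toy_b <- c("a", "a", "a", "a", "b", "b")
ari_before <- ari(toy_a, toy_b)

# Merge clusters 1 and 2 of toy_a, then recompute the ARI
toy_a_merged <- ifelse(toy_a == 2, 1, toy_a)
ari_after <- ari(toy_a_merged, toy_b)

c(before = ari_before, after = ari_after)
```

In this toy case the ARI goes from 4/9 to 1, because after the merge the two partitions are identical up to relabeling.
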
At each step, the algorithm will print which set of cluster labels is being merged (identified by its column number, so `1` for `SC3` and so on), as well as the pair of clusters that get merged.

```{r}
merger <- Dune(clusMat = nuclei %>% select(SC3, Seurat, Monocle), verbose = TRUE)
```

The output from `Dune` is a list with four components:

```{r}
names(merger)
```

`initialMat` is the initial matrix of cluster labels. `currentMat` is the final matrix of cluster labels. `merges` is a matrix that recapitulates what has been printed above, while `ImpARI` lists the ARI improvement over the merges.

## ARI improvement

We can now see how much the ARI has improved:

```{r}
plotARIs(clusMat = merger$currentMat)
```

The methods now look much more similar, as can be expected.

We can also see how the number of clusters was reduced.

```{r}
plotPrePost(merger)
```

For SC3, for example, we can visualize how the clusters were merged:

```{r}
ConfusionPlot(merger$initialMat[, "SC3"], merger$currentMat[, "SC3"]) +
  labs(x = "Before merging", y = "After merging")
```

Finally, the `ARItrend` function plots the mean ARI improvement as pairs of clusters get merged.

```{r}
ARItrend(merger)
```

# Session

```{r}
sessionInfo()
```