---
title: "Vignette for Dune: merging clusters to improve replicability through ARI merging"
author: "Hector Roux de BÃ©zieux"
output: 
  rmarkdown::html_document:
    toc: true
    toc_depth: 3
vignette: >
  %\VignetteIndexEntry{Dune Vignette}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# Installation

```{r, eval = FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("Dune")
```


We use a subset of the Allen Smart-Seq nuclei dataset. Run `?Dune::nuclei` for more details on pre-processing.

```{r setup}
suppressPackageStartupMessages({
  library(RColorBrewer)
  library(dplyr)
  library(ggplot2)
  library(tidyr)
  library(knitr)
  library(purrr)
  library(Dune)
})
data("nuclei", package = "Dune")
theme_set(theme_classic())
```

# Initial visualization

We have a dataset of $1744$ cells, with the results from 3 clustering algorithms: Seurat3, Monocle3 and SC3. The Allen Institute also produce hand-picked cluster and subclass labels. Finally, we included the coordinates from a t-SNE representation, for visualization.


```{r}
ggplot(nuclei, aes(x = x, y = y, col = subclass_label)) +
  geom_point()
```

We can also see how the three clustering algorithm partitioned the dataset initially:

```{r, fig.hold='hold', out.width="33%", fig.height=9}
walk(c("SC3", "Seurat", "Monocle"), function(clus_algo){
  df <- nuclei
  df$clus_algo <- nuclei[, clus_algo]
  p <- ggplot(df, aes(x = x, y = y, col = as.character(clus_algo))) +
    geom_point(size = 1.5) +
    # guides(color = FALSE) +
    labs(title = clus_algo, col = "clusters") +
    theme(legend.position = "bottom")
  print(p)
})
```

# Merging with **Dune**
## Initial ARI

The adjusted Rand Index between the three methods can be computed.

```{r}
plotARIs(nuclei %>% select(SC3, Seurat, Monocle))
```

As we can see, the ARI between the three methods is initially quite low.

## Actual merging

We can now try to merge clusters with the `Dune` function. At each step, the algorithm will print which clustering label is merged (by its number, so `1~SC3` and so on), as well as the pair of clusters that get merged.

```{r}
merger <- Dune(clusMat = nuclei %>% select(SC3, Seurat, Monocle), verbose = TRUE)
```

The output from `Dune` is a list with four components:

```{r}
names(merger)
```

`initialMat` is the initial matrix. of cluster labels. `currentMat` is the final matrix of cluster labels. `merges` is a matrix that recapitulates what has been printed above, while `ImpARI` list the ARI improvement over the merges.

## ARI improvement

We can now see how much the ARI has improved:

```{r}
plotARIs(clusMat = merger$currentMat)
```

The methods now look much more similar, as can be expected.

We can also see how the number of clusters got reduced.

```{r}
plotPrePost(merger)
```

For SC3 for example, we can visualize how the clusters got merged:

```{r}
ConfusionPlot(merger$initialMat[, "SC3"], merger$currentMat[, "SC3"]) +
  labs(x = "Before merging", y = "After merging")
```

Finally, the __ARIImp__ function tracks mean ARI improvement as pairs of clusters get merged down.

```{r}
ARItrend(merger)
```

# Session

```{r}
sessionInfo()
```