---
title: "scDotPlot"
author: 
  - name: "Ben Laufer"
output: 
  BiocStyle::html_document:
    toc_float: TRUE
date: "`r doc_date()`"
package: "`r pkg_ver('scDotPlot')`"
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{scDotPlot}
  %\VignetteEncoding{UTF-8}  
---

```{r Setup, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE,
                      message = FALSE,
                      warning = FALSE,
                      crop = NULL)
```

# Introduction

Dot plots of single-cell RNA-seq data allow for an examination of the relationships between cell groupings (e.g. clusters) and marker gene expression. The scDotPlot package offers a unified approach to perform a hierarchical clustering analysis and add annotations to the columns and/or rows of a scRNA-seq dot plot. It works with SingleCellExperiment and Seurat objects as well as data frames. The `scDotPlot()` function uses data from `scater::plotDots()` or `Seurat::DotPlot()` along with the `aplot` package to add dendrograms from `ggtree` and optional annotations.

# Installation

```{r Install, eval = FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("scDotPlot")
```

To install the development version directly from GitHub:

```{r Install GitHub, eval = FALSE}
if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
}

remotes::install_github("ben-laufer/scDotPlot")
```

# SingleCellExperiment

## Prepare object

First, we normalize the object and then, for the purpose of this example, subset to remove cells without cell-type labels.

```{r Prepare SingleCellExperiment}
library(scRNAseq)
library(scuttle)

sce <- ZeiselBrainData()

sce <- sce |> 
    logNormCounts() |>  
    subset(x = _, , level2class != "(none)")
```

## Get features

The features argument accepts a character vector with the gene IDs. For this example, we quickly obtain the top markers of for each cell type and then add them to the rowData of the object.

```{r Get features SingleCellExperiment}
library(scran)
library(purrr)
library(dplyr)
library(AnnotationDbi)

features <- sce |>
    scoreMarkers(sce$level1class) |>
    map(~ .x |>
            as.data.frame() |>
            arrange(desc(mean.AUC))|>
            dplyr::slice(1:6) |>
            rownames()) |> 
    unlist2()

rowData(sce)$Marker <- features[match(rownames(sce), features)] |>
    names()
```

## Plot logcounts

Finally, we create the plot. The `group` arguments utilize the colData, while the `features` arguments use the rowData. The `paletteList` argument can be used to manually specify the colors for the annotations specified through `groupAnno` and `featureAnno`. The clustering of the columns shows that cell the cell sub-types cluster by cell-type, while the clustering of the rows shows that most of the markers clusters by their cell type.

```{r scePlot1, fig.cap = "scDotPlot of SingleCellExperiment logcounts", fig.width=12, fig.height=12, dpi=50}
library(scDotPlot)
library(ggsci)

sce |>
    scDotPlot(features = features,
              group = "level2class",
              groupAnno = "level1class",
              featureAnno = "Marker",
              groupLegends = FALSE,
              annoColors = list("level1class" = pal_d3()(7),
                                "Marker" = pal_d3()(7)),
              annoWidth = 0.02)
```

## Plot Z-scores

Plotting by Z-score through `scale = TRUE` improves the clustering result for the rows.

```{r scePlot2, fig.cap = "scDotPlot of SingleCellExperiment Z-scores", fig.width=12, fig.height=12, dpi=50}
sce |>
    scDotPlot(scale = TRUE,
              features = features,
              group = "level2class",
              groupAnno = "level1class",
              featureAnno = "Marker",
              groupLegends = FALSE,
              annoColors = list("level1class" = pal_d3()(7),
                                "Marker" = pal_d3()(7)),
              annoWidth = 0.02)
```

# Seurat

## Get features

After loading the example Seurat object, we find the top markers for each cluster and add them to the assay of interest.

```{r Get features Seurat}
library(Seurat)
library(SeuratObject)
library(tibble)

data("pbmc_small")

features <- pbmc_small |>
    FindAllMarkers(only.pos = TRUE, verbose = FALSE) |>
    group_by(cluster) |>
    dplyr::slice(1:6) |>
    dplyr::select(cluster, gene)

pbmc_small[[DefaultAssay(pbmc_small)]][[]] <- pbmc_small[[DefaultAssay(pbmc_small)]][[]] |>
    rownames_to_column("gene") |> 
    left_join(features, by = "gene") |> 
    column_to_rownames("gene")

features <- features |> 
    deframe()
```

## Plot logcounts

Plotting a Seurat object is similar to plotting a SingleCellExperiment object. 

```{r SeuratPlot1, fig.cap = "scDotPlot of Seurat logcounts", fig.width=4, fig.height=5, out.width="75%", out.height="75%", dpi=50}
pbmc_small |>
    scDotPlot(features = features,
              group = "RNA_snn_res.1",
              groupAnno = "RNA_snn_res.1",
              featureAnno = "cluster",
              annoColors = list("RNA_snn_res.1" = pal_d3()(7),
                                "cluster" = pal_d3()(7)),
              groupLegends = FALSE,
              annoWidth = 0.075)
```

## Plot Z-scores

Again, we see that plotting by Z-score improves the clustering result for the rows.

```{r SeuratPlot2, fig.cap = "scDotPlot of Seurat Z-scores", fig.width=4, fig.height=5, out.width="75%", out.height="75%", dpi=50}
pbmc_small |>
    scDotPlot(scale = TRUE,
              features = features,
              group = "RNA_snn_res.1",
              groupAnno = "RNA_snn_res.1",
              featureAnno = "cluster",
              annoColors = list("RNA_snn_res.1" = pal_d3()(7),
                                "cluster" = pal_d3()(7)),
              groupLegends = FALSE,
              annoWidth = 0.075)
```

# Package support

The [Bioconductor support site](https://support.bioconductor.org/) is the preferred method to ask for help. Before posting, it's recommended to check [previous posts](https://support.bioconductor.org/tag/scDotPlot/) for the answer and look over the [posting guide](http://www.bioconductor.org/help/support/posting-guide/). For the post, it's important to use the `scDotPlot` tag and provide both a minimal reproducible example and session information.

# Acknowledgement 

This package was inspired by the [single-cell example from aplot](https://yulab-smu.top/pkgdocs/aplot.html#a-single-cell-example).

# Session info

```{r Session info, echo=FALSE}
sessionInfo()
```