---
title: "Getting Started with `escheR`"
author: 
  - name: Boyi Guo
    affiliation: &id1 "Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA"
  - name: Stephanie C. Hicks
    affiliation: *id1
package: escheR
output: BiocStyle::html_document
vignette: >
  %\VignetteIndexEntry{Getting Start with `escheR`}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  message = FALSE,
  comment = "#>",
  # fig.width = 8,
  # fig.height = 8,
  fig.small = TRUE,
  fig.retina = NULL
)
```

# Introduction

The goal of `escheR` is to create an unified multi-dimensional spatial visualizations for spatially-resolved transcriptomics data following Gestalt principles.

Our manuscript describing the innovative visualization is published in [_Bioinformatics Advances_](https://doi.org/10.1093/bioadv/vbad179).


## Installation

You can install the latest release version of `escheR` from Bioconductor via the following code. Additional details are shown on the [Bioconductor](https://bioconductor.org/packages/escheR) page.

```{r install_bioc, eval=FALSE}
if (!require("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("escheR")
```

The latest development version can also be installed from the `devel` version of Bioconductor or from [GitHub](https://github.com/boyiguo1/escheR) following

```{r install_github, eval = FALSE}
if (!require("devtools")) install.packages("devtools")
devtools::install_github("boyiguo1/escheR")

# `devel` version from Bioconductor
BiocManager::install(version='devel')
```

# Input data format
Starting from Version 1.2.0, `escheR` package supports three data structures, including [`SpatialExperiment`](https://bioconductor.org/packages/release/bioc/html/SpatialExperiment.html), [`SingleCellExperiment`](https://bioconductor.org/packages/release/bioc/html/SingleCellExperiment.html), and `data.frame` from `base` R.


In the following example, we demonstrate the how to use `escheR` with a [`SpatialExperiment`](https://bioconductor.org/packages/release/bioc/html/SpatialExperiment.html) object. Please visit our other tutorials for [TODO: add items and list]. 

# Making escheR Plot
## Load Packages

To run the demonstration, there are two necessary packages to load, `escheR` and `STexampleData`. [`STexampleData`](https://bioconductor.org/packages/STexampleData) contains a pre-processed 10x Visium dataset.

To note, `escheR`  will automatically load `ggplot2` package. Hence, explicitly loading `ggplot2` is not required. 
```{r setup}
library(escheR)
library(STexampleData)
```


## Preparing example data

In this step, we will find one 10xVisium sample from [`STexampleData`](https://bioconductor.org/packages/release/data/experiment/html/STexampleData.html) package, indexed by brain number of "151673". For more information, please see the vignettes of [`STexampleData`](https://bioconductor.org/packages/release/data/experiment/vignettes/STexampleData/inst/doc/STexampleData_overview.html).

```{r data.import}
spe <- Visium_humanDLPFC()

# Subset in-tissue spots
spe <- spe[, spe$in_tissue == 1]
spe <- spe[, !is.na(spe$ground_truth)]
```

Here is a summary of the `SpatialExperiment` object called `spe`.
```{r spe_summary}
spe
```

## Set up an `escheR` plot
Similar to `ggplot2::ggplot()`, we first use the function `make_escheR()` to create an empty plot. The input of `make_escheR()` is a [`SpatialExperiment`](https://bioconductor.org/packages/release/bioc/html/SpatialExperiment.html) object. The output of the function is a `ggplot` object with no layer in it. 
```{r create_plot}
p <- make_escheR(spe)
```

## Adding layers

Unlike `ggplot2`, we use piping `|>` instead of `+` to apply layers the figure. Mainly, we have three functions `add_fill`, `add_ground`, `add_symbol`. The inputs of these `add_*` functions include the plots created using `make_scheR()` and the variable name for the layer. Currently, the variable name should be in the the column data of the `spe` object, i.e. `colData(spe)`.

Here we first apply the `add_fill` to add the spots color-coded by the total number of cells all spots(`sum_umi`).  

```{r creat_fill}
(p1 <- p |>
   add_fill(var = "cell_count"))
```

It is okay to use any combination of the `add_*` functions. For example, we want to show the spatial domains of the samples as the ground of the figure and use symbols to denote if each spot is within the outline of the tissue slice. In this example, all plotted spots are in the outlines of the tissue slice and hence marked with dots.
```{r create_ground}
(p2 <- p |>
   add_ground(var = "ground_truth")) # round layer
```


```{r add_symbol}
p2 |>
  add_symbol(var = "ground_truth", size = 0.2) # Symbol layer
```

It is okay to change the ordering of these `add_*` functions. However, we advise to always have the `add_fill` as the first step to achieve the best visual effect due to the laying mechanism. 


# Customize escheR Plot
## Choosing Color Palette for `add_fill` and `add_ground`
To maximize the utility of the multi-dimensional plot by applying both color-coded layers using `add_fill()` and `add_ground()`, it is important to choose minimally interfering color-palette for the `fill` and `ground` to avoid visualization confusion. The following demonstration provide some examples for simultaneously visualization of two variables regardless of their types (continuous vs categorical, or categorical vs categorical.)

### Coninuous variable (gene expression) vs Categorical variable (Spatial Domains)
The following example visualizes the differential gene expression of _MOBP_, a marker gene for white matter, across different spatial domains. The default color palette, `viridis`, are not easily visible with color-coded spatial domains as there are overlapping in the color space, which could lead to possible confusion.

```{r default_con_cat_eg}
# Prep data
# Adding gene counts for MBP to the colData
spe$counts_MOBP <- counts(spe)[which(rowData(spe)$gene_name=="MOBP"),]

(p <- make_escheR(spe) |> 
    add_fill(var = "counts_MOBP") |> 
    add_ground(var = "ground_truth", stroke = 0.5))
```

To improve the visualization, we choose to use a color that is not included in the color palette for `ground_truth`, which is the color _black_. Specifically, we use a color gradient from white (no expression) to black (maximum of gene counts) to represent the expression of MOBP in each spot. By using the white-black color gradient for the gene expression, we minimize the overlapping of the choice of color for spatial domains.


```{r improved_con_cat_eg}
(p2 <- p + 
   scale_fill_gradient(low = "white", high = "black"))
```

After customizing the color palettes to be minimally overlapping, it is easier to observe that _MOBP_ has higher raw gene counts in the white matter (`WM`) region than other regions.

### Categorical variable Vs Categorical variable
In this example, we demonstrate how to optimize color palettes for visualizing two categorical variables. We first create an arbitrary 5-level categorical variable called `tmp_group`, representing different horizontal regions of the tissue section. 

```{r creat_group}
spe$tmp_group <- cut(
  spe$array_row, 
  breaks = c(min(spe$array_row)-1 ,
             fivenum(spe$array_row))
)

table(spe$tmp_group)
```

Following the principle to avoid overlapping of two color palettes, we use gradients of blue for different levels of `tmp_group`. 

```{r cat_cat_eg}
make_escheR(spe) |> 
  add_fill(var = "tmp_group") |> 
  add_ground(var = "ground_truth", stroke = 0.5) +
  scale_fill_brewer() +
  theme_void()
```

Here is another example where we try another manually-curated color-palette. We follow the same principle, minimize overlapping of two color-palettes for ground (`scale_color_manual`) and fill (`scale_fill_brewer`) respectively. Specifically, we use gradients of blue to show `tmp_group` and other colors for spatial domains `ground_truth`.
```{r cat_cat_eg2}
make_escheR(spe) |> 
  add_fill(var = "tmp_group") |> 
  add_ground(var = "ground_truth", stroke = 0.5) +
  scale_fill_brewer() +
  scale_color_manual(
    name = "", # turn off legend name for ground_truth
    values = c(
      "Layer1" = "#F0027F",
      "Layer2" = "transparent",
      "Layer3" = "#4DAF4A",
      "Layer4" = "#984EA3",
      "Layer5" = "#FFD700",
      "Layer6" = "#FF7F00",
      "WM" = "#1A1A1A")
  ) 
```

> NOTE: It would be intrinsically difficult to construct a clear, intuitive and precise visualization when the number of categories/levels is large.


### Guidance in choosing bivariate color palette 

In this vignettes, we don't provide or recommend specific color palettes, because the selection of color palettes is highly relevant to the underlying message and heterogeneous across analysis and studies, e.g. sequential palettes, qualitative palette, and divergent palette. Instead, we direct interested user to explore the topic on _bivariate color palette_. The [blog post](https://jakubnowosad.com/posts/2020-08-25-cbc-bp2/) by Jakub Nowosad and R package [`biscale`](https://cran.r-project.org/web/packages/biscale/vignettes/bivariate_palettes.html) could be helpful to optimize your color palette for bivariate visualization.


In addition, if color palette is extremely to curate, e.g. large number of levels, it is possible to use symbols ([`add_symbol()`](##Adding-layers)) to annotate specific levels to avoid clutter in the color space.


## Adjusting aesthetics
Given that the `escheR` package is developed based on `ggplot2`, aesthetics can be easily adjusted following the `ggplot2` syntax. For example, given a `escheR` plot object, one can use `+` with `theme_*`, `scale_*` functions.

For example, to change the aesthetics of each layer, one can simply use the `scale_*` from `ggplot2` to optimize the final visual presentation. For example, to optimize `add_fill`, one can use `scale_fill_*`; to optimize `add_ground`, one can use `scale_color_*`; to optimize `add_sumbol`, one use `scale_shape_*`. Here, we demonstrate how to change the color for the ground layer ( `add_ground`) using `scale_color_manual`.

```{r customize, eval = FALSE}
(p_final <- p2 +
  scale_color_manual(
    name = "", # No legend name
    values = c(
      "Layer1" = "#F0027F",
      "Layer2" = "#377EB8",
      "Layer3" = "#4DAF4A",
      "Layer4" = "#984EA3",
      "Layer5" = "#FFD700",
      "Layer6" = "#FF7F00",
      "WM" = "#1A1A1A")
  ) +
  labs(title = "Example Title"))
```


## Show a subset of levels
The easiest way to show only a subset of levels of a categorical variable is to
create a new variable where all the unwanted levels will be set to NA values.
Please see the example below

```{r subset}
table(spe$ground_truth, useNA = "ifany")

spe$tmp_fac <- factor(spe$ground_truth,
                      levels = c("Layer1", "Layer2"))
table(spe$tmp_fac, useNA = "ifany")

make_escheR(spe) |> 
  add_ground(var = "ground_truth") |> 
  add_symbol(var = "tmp_fac", size = 0.4) + 
  scale_shape_manual(
    values=c(3, 16),    #> Set different symbols for the 2 levels
    breaks = c("Layer1", "Layer2") #> Remove NA from legend
  )
```


## Multi-sample Figure
By design, `make_escheR` operates on only one sample. In order to create a figure compiling the spatial plots for multiple samples, individual plots are required via a series of calls to `make_escheR`, possibly via a `for` loop or an iterator function (e.g. `lapply`). 

```{r make_plot_list, eval = FALSE}
# Create a list of `escheR` plots
plot_list <- unique(spe$sample_id) |> # Create a list of sample names
  lapply(FUN = function(.sample_id){ # Iterate over all samples
    spe_single <- spe[, spe$sample_id == .sample_id]
    make_escheR(spe_single) |> 
      add_fill(var = "counts_MOBP") |> 
      add_ground(var = "ground_truth", stroke = 0.5))
# Customize theme
  })
```

Given all plots made for individual samples are stored in a preferred data structure 
(e.g. a `list`), one can use many functions, e.g. [`cowplot::plot_grid`](https://wilkelab.org/cowplot/articles/plot_grid.html), [`patchwork`](https://patchwork.data-imaginist.com/index.html), to compile and arrange
individual plots to a paneled figure. The following example uses 
[`ggpubr::ggarrange`](https://rpkgs.datanovia.com/ggpubr/reference/ggarrange.html) to create a figure from a list of `escheR` plots.


```{r paneling, fig.width=8, fig.height=5}
library(ggpubr)
plot_list <- list(p2, p2)
ggarrange(
  plotlist = plot_list,
  ncol = 2, nrow = 1,
  common.legend = TRUE)
```

# Save escheR plot
The procedure to save `escheR` plots is exactly the same as saving a `ggplot` object.
In the example below, we use the function `ggplot2::ggsave()` to save `escheR` plots in the `pdf` format.

```{r save_escheR, eval = FALSE}
ggsave(
  filename = "path/file_name.pdf",
  plot = p_final
)
```


# Session information

```{r}
utils::sessionInfo()
```