---
title: "SEtools"
author:
- name: Pierre-Luc Germain
  affiliation:
  - D-HEST Institute for Neurosciences, ETH Zürich
  - Laboratory of Statistical Bioinformatics, University Zürich
package: SEtools
output:
  BiocStyle::html_document:
        fig_height: 3.5
abstract: |
  Showcases the use of SEtools to merge objects of the SummarizedExperiment class, melt them, and plot annotated heatmaps from them.
vignette: |
  %\VignetteIndexEntry{SEtools}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include=FALSE}
library(BiocStyle)
```

# Getting started

The `r Rpackage("SEtools")` package is a set of convenience functions for the _Bioconductor_ class `r Biocpkg("SummarizedExperiment")`. It facilitates merging, melting, and plotting `SummarizedExperiment` objects.

## Package installation

```{r, eval=FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("SEtools")
```

Or, until the next Bioconductor release, install from GitHub:

```{r, eval=FALSE}
devtools::install_github("plger/SEtools")
```

## Example data

To showcase the main functions, we will use an example object containing (a subset of) whole-hippocampus RNA-seq data from mice exposed to different stressors:

```{r}
suppressPackageStartupMessages({
  library(SummarizedExperiment)
  library(SEtools)
})
data("SE", package="SEtools")
SE
```

This is taken from [Floriou-Servou et al., Biol Psychiatry 2018](https://doi.org/10.1016/j.biopsych.2018.02.003).

## Heatmaps

### sehm

The `sehm` function simplifies the generation of heatmaps from `SummarizedExperiment` objects. It uses `r CRANpkg("pheatmap")`, so any argument supported by `pheatmap` can in principle be passed:

```{r}
g <- c("Egr1", "Nr4a1", "Fos", "Egr2", "Sgk1", "Arc", "Dusp1", "Fosb", "Sik1")
sehm(SE, genes=g)
sehm(SE, genes=g, scale="row")
```

Annotations from the object's `rowData` and `colData` can be plotted simply by specifying the column names (some are shown by default if found):

```{r}
sehm(SE, genes=g, scale="row", anno_rows="meanTPM")
```

These annotations can also be used to create gaps:

```{r}
sehm(SE, genes=g, scale="row", anno_rows="meanTPM", gaps_at="Condition")
```
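
`pheatmap` itself expects gaps as numeric positions (its `gaps_col` and `gaps_row` arguments, which are used only when the corresponding dimension is not clustered). As a minimal sketch, assuming the samples are already sorted by the annotation, a column annotation could be turned into such positions as follows; `gapsFromAnnotation` is a hypothetical helper, not an SEtools function:

```{r}
# Hypothetical helper (not part of SEtools): given an annotation vector for
# samples already sorted by it, return the index of the last sample before
# each change of value, i.e. pheatmap-style gap positions.
gapsFromAnnotation <- function(x) {
  which(head(x, -1) != tail(x, -1))
}
cond <- c("Homecage", "Homecage", "Handling", "Handling", "Handling", "Swim")
gapsFromAnnotation(cond)
```

Here, gaps would be drawn after the 2nd and 5th samples.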

The specific assay to use for plotting can be specified with the `assayName` argument.

#### Row/column ordering

By default, rows are not ordered by hierarchical clustering but by their angle on an MDS plot, which tends to give nicer results than bottom-up hierarchical clustering. This sorting can be disabled using `sortRowsOn=NULL` or `cluster_rows=TRUE` (to avoid any row reordering and keep the order given, use `sortRowsOn=NULL, cluster_rows=FALSE`). Column clustering is disabled by default, but can be enabled with `cluster_cols=TRUE`.
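
To give an idea of what angle-based sorting means, here is a minimal base-R sketch, assuming classical MDS (`cmdscale`) on Euclidean distances between rows and ordering by the angle `atan2(y, x)` of each row's 2D coordinates. `sortByMDSAngle` is a hypothetical helper; SEtools' own implementation may differ in the distance used, the scaling, or where the angle starts.

```{r}
# Hypothetical helper (not SEtools' implementation): order the rows of a
# matrix by their angle on a two-dimensional classical MDS plot.
sortByMDSAngle <- function(x) {
  coords <- cmdscale(dist(x), k = 2)           # 2D MDS coordinates of the rows
  angles <- atan2(coords[, 2], coords[, 1])    # angle around the origin, in radians
  order(angles)                                # row indices, by increasing angle
}
set.seed(1)
mat <- matrix(rnorm(60), nrow = 10,
              dimnames = list(paste0("gene", 1:10), paste0("s", 1:6)))
ord <- sortByMDSAngle(mat)
rownames(mat)[ord]
```

Unlike a dendrogram, this yields a single circular ordering, so rows with similar profiles end up next to each other without a tree structure being imposed.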

It is common to cluster features into groups, and such a clustering can be combined with the row sorting through the `toporder` argument. For instance:

```{r}
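# log-fold-changes of each sample relative to the mean of the Homecage samples
ctrl <- which(SE$Condition == "Homecage")
lfcs <- assays(SE)$logcpm - rowMeans(assays(SE)$logcpm[, ctrl])
# k-means clustering of the genes into 4 groups (seeded, since k-means is random)
set.seed(123)
rowData(SE)$cluster <- as.character(kmeans(lfcs, 4)$cluster)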
sehm(SE, scale="row", anno_rows="cluster", toporder="cluster", gaps_at="Condition")
```
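
One plausible way to combine a grouping with an existing row order is to sort on the group first and on the within-group position second, which base R's `order()` does when given several keys. This is only a sketch of the idea with made-up `cluster` and `withinRank` vectors; that `toporder` works exactly this way internally is an assumption.

```{r}
# Made-up example: a cluster label per row, and each row's position in some
# pre-existing sorted order (e.g. the MDS-angle order).
cluster <- c("2", "1", "2", "1", "3")
withinRank <- c(5, 3, 1, 4, 2)
# sort by cluster first, then by the pre-existing order within each cluster
rowOrder <- order(cluster, withinRank)
rowOrder
```

Rows 2 and 4 (cluster "1") come first, followed by the rows of cluster "2" and then "3", each group internally ordered by `withinRank`.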

### crossHm

Heatmaps from multiple `SummarizedExperiment` objects can be created either by merging the objects (see below) or with the `crossHm` function, which uses the `r Biocpkg("ComplexHeatmap")` package:

```{r}
crossHm( list(se1=SE, se2=SE), g, 
         anno_colors = list( Condition=c( Homecage="green",
                                          Handling="orange",
                                          Restraint="red",
                                          Swim="blue")
                            )
        )
```
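
Drawing heatmaps of several objects side by side requires them to share the same features in the same order. The following base-R sketch shows one way to align a list of matrices on their common row names; the toy matrices are hypothetical, and this is not `crossHm`'s code.

```{r}
# Two toy expression matrices with partially overlapping genes (hypothetical data)
mA <- matrix(1:6, nrow = 3, dimnames = list(c("Fos", "Arc", "Egr1"), c("a1", "a2")))
mB <- matrix(1:4, nrow = 2, dimnames = list(c("Egr1", "Fos"), c("b1", "b2")))
mats <- list(se1 = mA, se2 = mB)
# genes present in every matrix, in the order of the first one
shared <- Reduce(intersect, lapply(mats, rownames))
# subset and reorder each matrix to the shared genes
aligned <- lapply(mats, function(m) m[shared, , drop = FALSE])
aligned
```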

### Default arguments

For some arguments (for instance, colors), if they are not specified in the function call, `SEtools` checks whether the corresponding global options have been set before falling back to the default values. This means that if the same colors are used repeatedly within a project, they can be specified a single time, and all subsequent plots will use them:

```{r}
options("SEtools_def_hmcols"=c("white","grey","black"))
ancols <- list( Condition=c( Homecage="green",
                             Handling="orange",
                             Restraint="red",
                             Swim="blue" ) )
options("SEtools_def_anno_colors"=ancols)
sehm(SE, g, do.scale = TRUE)
```

At the moment, the following arguments can be set as global options:
`assayName`, `hmcols`, `anno_columns`, `anno_rows`, `anno_colors`, `gaps_at`, `breaks`.
Options must be set with the prefix `SEtools_def_`, followed by the name of the argument.
You may use `resetAllSEtoolsOptions()` to remove all global options related to the package.
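
The lookup described above follows a common R pattern: read an option with `getOption()` and fall back to a default when it is unset. A minimal sketch, with `getSEtoolsDefault` as a hypothetical helper (the package's internal function may be named and behave differently), which also lists the `SEtools_def_` options currently set in the session:

```{r}
# Hypothetical helper (not SEtools' internal function): return the value of
# the global option "SEtools_def_<argName>" if set, otherwise `fallback`.
getSEtoolsDefault <- function(argName, fallback = NULL) {
  getOption(paste0("SEtools_def_", argName), default = fallback)
}
# names of the SEtools_def_* options currently set in this session
grep("^SEtools_def_", names(options()), value = TRUE)
# an option that has not been set falls back to the supplied default
getSEtoolsDefault("breaks", fallback = "none set")
```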

***

## Merging SEs

```{r}
se1 <- SE[,1:10]
se2 <- SE[,11:20]
se3 <- mergeSEs( list(se1=se1, se2=se2) )
se3
```

All assays were merged, along with the `rowData` and `colData` slots.

By default, row z-scores are calculated separately for each object when merging. This can be disabled with `do.scale=FALSE`:

```{r}
se3 <- mergeSEs( list(se1=se1, se2=se2), do.scale=FALSE)
```
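
Scaling each object separately before merging centres each block of the merged matrix on its own row means, so global offsets between objects do not dominate the merged values. A minimal base-R illustration with toy matrices, using `t(scale(t(m)))` for row z-scores; `rowZ` is a hypothetical helper, and `mergeSEs` may handle details such as zero-variance rows differently:

```{r}
# Hypothetical helper: row z-scores, i.e. (x - row mean) / row sd, row by row
rowZ <- function(m) t(scale(t(m)))
set.seed(42)
feats <- c("a", "b", "c")
# two toy "objects" measured on very different scales
m1 <- matrix(rnorm(12, mean = 5),  nrow = 3, dimnames = list(feats, paste0("s", 1:4)))
m2 <- matrix(rnorm(12, mean = 10), nrow = 3, dimnames = list(feats, paste0("t", 1:4)))
merged <- cbind(rowZ(m1), rowZ(m2))
# each object's block now has row means of ~0, despite the different offsets
round(rowMeans(merged[, 1:4]), 10)
round(rowMeans(merged[, 5:8]), 10)
```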

If more than one assay is present, a different scaling behavior can be specified for each assay:

```{r}
se3 <- mergeSEs( list(se1=se1, se2=se2), use.assays=c("counts", "logcpm"), do.scale=c(FALSE, TRUE))
```
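
In the call above, `use.assays` and `do.scale` are presumably paired position by position. That pairing can be sketched in base R with `Map()`, applying a hypothetical `scaleIf` helper to each toy assay with its own flag (this is not the package's code):

```{r}
# Hypothetical helper: row z-scores if doScale is TRUE, else the matrix unchanged
scaleIf <- function(m, doScale) if (doScale) t(scale(t(m))) else m
toyAssays <- list(counts = matrix(c(1, 5, 9, 2, 6, 10), nrow = 2),
                  logcpm = matrix(c(0.1, 2.3, 1.7, 0.4, 2.0, 1.1), nrow = 2))
# pair each assay with its own flag, position by position
scaled <- Map(scaleIf, toyAssays, c(FALSE, TRUE))
rowMeans(scaled$counts)             # untouched
round(rowMeans(scaled$logcpm), 10)  # centred on 0
```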


## Melting SE

To facilitate plotting features with `r CRANpkg("ggplot2")`, the `meltSE` function combines assay values with the row and column data into a single long-format table:

```{r, fig.cap="An example ggplot created from a melted SE.", fig.height=5}
d <- meltSE(SE, genes=g[1:4])
head(d)
suppressPackageStartupMessages(library(ggplot2))
ggplot(d, aes(Condition, counts)) + geom_violin() + facet_wrap(~feature, scales="free")
```
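
Conceptually, melting turns a features-by-samples matrix into a long table with one row per (feature, sample) pair, to which the sample annotation can then be joined. A base-R sketch of that reshaping with made-up values; the column names used here (`feature`, `sample`, `value`) are assumptions and need not match exactly those returned by `meltSE`:

```{r}
# Toy expression matrix (features x samples) and sample annotation (made-up data)
expr <- matrix(1:6, nrow = 2,
               dimnames = list(c("Fos", "Arc"), c("s1", "s2", "s3")))
colAnno <- data.frame(sample = c("s1", "s2", "s3"),
                      Condition = c("Homecage", "Swim", "Swim"))
# one row per (feature, sample) pair; as.vector() reads the matrix column by column
long <- data.frame(feature = rep(rownames(expr), times = ncol(expr)),
                   sample  = rep(colnames(expr), each = nrow(expr)),
                   value   = as.vector(expr))
# join the sample annotation onto each row
long <- merge(long, colAnno, by = "sample")
long
```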


# Session info {.unnumbered}

```{r sessionInfo, echo=FALSE}
sessionInfo()
```