---
title: "SEtools"
author:
- name: Pierre-Luc Germain
  affiliation:
  - D-HEST Institute for Neurosciences, ETH Zürich
  - Laboratory of Statistical Bioinformatics, University Zürich
package: SEtools
output:
  BiocStyle::html_document:
    fig_height: 3.5
abstract: |
  Showcases the use of SEtools to merge objects of the SummarizedExperiment class, melt them, and plot annotated heatmaps from them.
vignette: |
  %\VignetteIndexEntry{SEtools}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include=FALSE}
library(BiocStyle)
```

# Getting started

The `r Rpackage("SEtools")` package is a set of convenience functions for the _Bioconductor_ class `r Biocpkg("SummarizedExperiment")`. It facilitates merging, melting, and plotting `SummarizedExperiment` objects.

## Package installation

```{r, eval=FALSE}
# Install BiocManager first if it is not yet available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("SEtools")
```

Or, until the new Bioconductor release, from GitHub:

```{r, eval=FALSE}
devtools::install_github("plger/SEtools")
```

## Example data

To showcase the main functions, we will use an example object which contains (a subset of) whole-hippocampus RNAseq data from mice after different stressors:

```{r}
suppressPackageStartupMessages({
  library(SummarizedExperiment)
  library(SEtools)
})
# Load the example SummarizedExperiment shipped with the package
data("SE", package="SEtools")
SE
```

This is taken from [Floriou-Servou et al., Biol Psychiatry 2018](https://doi.org/10.1016/j.biopsych.2018.02.003).

## Heatmaps

### sehm

The `sehm` function simplifies the generation of heatmaps from `SummarizedExperiment` objects.
It uses `r CRANpkg("pheatmap")`, so any argument supported by `pheatmap` can in principle be passed:

```{r}
# Genes to plot
g <- c("Egr1", "Nr4a1", "Fos", "Egr2", "Sgk1", "Arc", "Dusp1", "Fosb", "Sik1")
sehm(SE, genes=g)
# Same heatmap, with rows scaled
sehm(SE, genes=g, scale="row")
```

Annotation from the object's `rowData` and `colData` can be plotted simply by specifying the column name (some will be shown by default if found):

```{r}
sehm(SE, genes=g, scale="row", anno_rows="meanTPM")
```

These can also be used to create gaps:

```{r}
sehm(SE, genes=g, scale="row", anno_rows="meanTPM", gaps_at="Condition")
```

The specific assay to use for plotting can be specified with the `assayName` argument.

#### Row/column ordering

By default, rows are not sorted with hierarchical clustering, but according to their angle on an MDS plot, which tends to give nicer results than bottom-up hierarchical clustering. This can be disabled using `sortRowsOn=NULL` or `cluster_rows=TRUE` (to avoid any row reordering and use the order given, use `sortRowsOn=NULL, cluster_rows=FALSE`). Column clustering is disabled by default, but this can be changed with `cluster_cols=TRUE`.

It is common to cluster features into groups, and such a clustering can be used simultaneously with row sorting using the `toporder` argument.
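
The default row ordering described above (sorting by angle on an MDS plot) can be illustrated with a minimal base-R sketch. The toy matrix `x`, the use of two MDS dimensions, and the use of `atan2` to obtain the angle are assumptions made for illustration; the package may compute the ordering differently.

```{r}
# Toy data: 8 features (rows) by 5 samples (columns); hypothetical values
set.seed(42)
x <- matrix(rnorm(40), nrow = 8,
            dimnames = list(paste0("feature", 1:8), paste0("s", 1:5)))

# Scale rows so that the distances reflect patterns rather than magnitude
z <- t(scale(t(x)))

# Classical multidimensional scaling of the features into two dimensions
mds <- cmdscale(dist(z), k = 2)

# Angle of each feature around the origin of the MDS plot
ang <- atan2(mds[, 2], mds[, 1])

# Sort the rows by that angle
ord <- order(ang)
x_sorted <- x[ord, ]
rownames(x_sorted)
```

Features with similar profiles end up at similar angles, so they appear next to each other without building a dendrogram. When features are also grouped into clusters, the `toporder` argument mentioned above combines that grouping with the row sorting.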
For instance:

```{r}
# Log-fold changes relative to the mean of the Homecage samples
lfcs <- assays(SE)$logcpm-rowMeans(assays(SE)$logcpm[,which(SE$Condition=="Homecage")])
# Cluster features into 4 groups and store the assignment in rowData
rowData(SE)$cluster <- as.character(kmeans(lfcs,4)$cluster)
sehm(SE, scale="row", anno_rows="cluster", toporder="cluster", gaps_at="Condition")
```

### crossHm

Heatmaps from multiple SEs can be created either by merging the objects (see below), or using the `crossHm` function, which uses the `r CRANpkg("ComplexHeatmap")` package:

```{r}
crossHm( list(se1=SE, se2=SE), g,
         anno_colors = list( Condition=c( Homecage="green",
                                          Handling="orange",
                                          Restraint="red",
                                          Swim="blue") ) )
```

### Default arguments

For some arguments (for instance colors), if they are not specified in the function call, `SEtools` will check whether the corresponding global options have been set before falling back on default colors. This means that if the same colors are used repeatedly within a given project, they can be specified a single time, and all subsequent plots will be affected:

```{r}
# Set project-wide heatmap colors and annotation colors once
options("SEtools_def_hmcols"=c("white","grey","black"))
ancols <- list( Condition=c( Homecage="green",
                             Handling="orange",
                             Restraint="red",
                             Swim="blue" ) )
options("SEtools_def_anno_colors"=ancols)
sehm(SE, g, do.scale = TRUE)
```

At the moment, the following arguments can be set as global options: `assayName`, `hmcols`, `anno_columns`, `anno_rows`, `anno_colors`, `gaps_at`, `breaks`. Options must be set with the prefix `SEtools_def_`, followed by the name of the argument. You may use `resetAllSEtoolsOptions()` to remove all global options related to the package.

***

## Merging SEs

```{r}
# Split the example object in two, then merge the halves back together
se1 <- SE[,1:10]
se2 <- SE[,11:20]
se3 <- mergeSEs( list(se1=se1, se2=se2) )
se3
```

All assays were merged, along with the `rowData` and `colData` slots.

By default, row z-scores are calculated for each object when merging.
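
To make this default concrete, here is a minimal base-R sketch of per-object row z-scoring on two toy matrices. The matrices `m1` and `m2` and the helper `rowZ` are hypothetical and only illustrate the idea; this is not the code that `mergeSEs` runs internally.

```{r}
# Two toy matrices standing in for the assays of two objects (hypothetical data)
set.seed(1)
genes <- paste0("gene", 1:3)
m1 <- matrix(rnorm(12, mean = 5), nrow = 3,
             dimnames = list(genes, paste0("a", 1:4)))
m2 <- matrix(rnorm(12, mean = 50, sd = 10), nrow = 3,
             dimnames = list(genes, paste0("b", 1:4)))

# Row z-score: center each row on its mean and divide by its standard deviation
rowZ <- function(x) t(scale(t(x)))

# Scale each matrix on its own, then bind the columns together
merged <- cbind(rowZ(m1), rowZ(m2))

# Within each original matrix, every row now has mean 0 and standard deviation 1
round(rowMeans(merged[, colnames(m1)]), 10)
round(apply(merged[, colnames(m2)], 1, sd), 10)
```

Because each matrix is scaled on its own, the very different ranges of `m1` and `m2` no longer dominate the combined matrix. This per-object scaling is the default behaviour described above.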
This can be prevented with:

```{r}
se3 <- mergeSEs( list(se1=se1, se2=se2), do.scale=FALSE)
```

If more than one assay is present, one can specify a different scaling behavior for each assay:

```{r}
# Leave "counts" unscaled, scale "logcpm"
se3 <- mergeSEs( list(se1=se1, se2=se2), use.assays=c("counts", "logcpm"), do.scale=c(FALSE, TRUE))
```

## Melting SE

To facilitate plotting features with `r CRANpkg("ggplot2")`, the `meltSE` function combines assay values with row/column data:

```{r, fig.cap="An example ggplot created from a melted SE.", fig.height=5}
# Long-format data.frame for the first four genes
d <- meltSE(SE, genes=g[1:4])
head(d)
suppressPackageStartupMessages(library(ggplot2))
ggplot(d, aes(Condition, counts)) + geom_violin() + facet_wrap(~feature, scales="free")
```

# Session info {.unnumbered}

```{r sessionInfo, echo=FALSE}
sessionInfo()
```