---
title: >
  The `mosdef` User's Guide
author:
- name: Leon Dammer
  affiliation: 
  - Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), Mainz
  email: dammerle@uni-mainz.de
- name: Federico Marini
  affiliation: 
  - Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), Mainz
  - Research Center for Immunotherapy (FZI), Mainz
  email: marinif@uni-mainz.de
date: "`r BiocStyle::doc_date()`"
package: "`r BiocStyle::pkg_ver('mosdef')`"
output: 
  BiocStyle::html_document:
    toc_float: true
vignette: >
  %\VignetteIndexEntry{The mosdef User's Guide}
  %\VignetteEncoding{UTF-8}  
  %\VignettePackage{mosdef}
  %\VignetteKeywords{GeneExpression, RNASeq, FunctionalAnnotation, Sequencing, Visualization}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  chunk_output_type: console
bibliography: mosdef.bib
---
  
```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.height = 6,
  fig.width = 8
)
```

# Introduction {#introduction}

This vignette describes how to use the `r BiocStyle::Biocpkg("mosdef")` package for performing tasks commonly associated to your Differential Expression workflows.

This includes functionality for

* plotting your expression values and differential expression results, both individually and as summary overviews (`gene_plot`, `de_volcano`, `go_volcano`, `plot_ma`, `get_expr_values`)

* running different methods for functional enrichment analysis, providing a unified API that simplifies the calls to the individual methods, ensuring the results are also provided in a standardized format (`run_cluPro`, `run_topGO`, `run_goseq`)

* decorating and improving your analysis reports (assuming these are generated as Rmarkdown documents). 
  This can enhance the experience of browsing the results by automatical linking to external databases (ENSEMBL, GTEX, HPA, NCBI, ... via `create_link_` functions, wrapped into a `buttonifier` to seamlessly multiply the information in a simple tabular representation). 
  Additional info on frequently used features such as genes and Gene Ontology terms can also be embedded with `geneinfo_to_html` and `go_to_html`

The `r BiocStyle::Biocpkg("mosdef")` package as a whole aims to collect the MOSt frequently used DE-related Functions, and is open to further contributions from the community.

All in all, the objective for `r BiocStyle::Biocpkg("mosdef")` is to streamline the generation of comprehensive, information-rich analysis reports.

Historically, many of these functions (at least conceptually) have been developed in some of our other Bioconductor packages, such as `r BiocStyle::Biocpkg("pcaExplorer")`, `r BiocStyle::Biocpkg("ideal")` and `r BiocStyle::Biocpkg("GeneTonic")`. 
`r BiocStyle::Biocpkg("mosdef")` is the attempt to achieve a better modularization for the most common tasks in the DE workflow.

## Required input

In order to use `r BiocStyle::Biocpkg("mosdef")` in your workflow, two main inputs are required: 

- `de_container`, a dataset containing the expression matrix for example a `DESeqDataSet`
- `res_de`, a dataset, such as a `DESeqResults` storing the results of the differential expression analysis

Additionally, the `mapping` parameters, shared by a number of functions, refers to the annotation of your species provided by `r BiocStyle::Biocpkg("AnnotationDbi")`-like packages, which are commonplace in the Bioconductor environment. 
For human, this would be `r BiocStyle::Biocannopkg("org.Hs.eg.db")`, and for mouse `r BiocStyle::Biocannopkg("org.Mm.eg.db")`.

Currently, `r BiocStyle::Biocpkg("mosdef")` is able to feed on the classes used throughout the `r BiocStyle::Biocpkg("DESeq2")` approach, but can easily be extended for the corresponding implementations in `r BiocStyle::Biocpkg("edgeR")` and `r BiocStyle::Biocpkg("limma")`.

## Demonstrating `mosdef` on the `macrophage data`

In the remainder of this vignette, we will illustrate the main features of `r BiocStyle::Biocpkg("mosdef")` on a publicly available dataset from Alasoo et al.,  "Shared genetic effects on chromatin and gene expression indicate a role for enhancer priming in immune response", published in Nature Genetics, January 2018 [@Alasoo2018]
[doi:10.1038/s41588-018-0046-7](https://doi.org/10.1038/s41588-018-0046-7).

The data is made available via the `r BiocStyle::Biocpkg("macrophage")` Bioconductor package, which contains the files output from the Salmon quantification (version 0.12.0, with Gencode v29 reference), as well as the values summarized at the gene level, which we will use to exemplify the analysis steps.

In the `macrophage` experimental setting, the samples are available from 6 different donors, in 4 different conditions (naive, treated with Interferon gamma, with SL1344, or with a combination of Interferon gamma and SL1344).
For simplicity, we will restrict our attention on the comparison between Interferon gamma treated samples vs naive samples.

# Installation {#installation}

To install `r BiocStyle::Biocpkg("mosdef")`, you can run the following commands:

```{r install, eval = FALSE}
if (!require("BiocManager")) {
  install.packages("BiocManager")
}
BiocManager::install("mosdef")
```

If you want to install the development version from GitHub, you can alternatively
run this command:

```{r installgh, eval = FALSE}
BiocManager::install("imbeimainz/mosdef")
```

# Getting started {#gettingstarted}

Once installed, the `r BiocStyle::Biocpkg("mosdef")` package can be loaded and attached to your current workspace as follows:

```{r loadlib, eval = TRUE}
library("mosdef")
```

# Load the data

We will use the well known `macrophage` data as an example.  
Notably, we correctly specify the experimental design as `~ line + condition`, to obtain the effect due to the `condition`, while accounting for the cell `line`.

If you are familiar with the DE workflow, you can skim over this section and read more on the functionality of `r BiocStyle::Biocpkg("mosdef")` in Section \@ref(mosdefenrich).

```{r loaddata, warning=FALSE}
suppressPackageStartupMessages({
  library("macrophage")
  library("DESeq2")
  library("org.Hs.eg.db")
})
data(gse, package = "macrophage")
dds_macrophage <- DESeqDataSet(gse, design = ~ line + condition)
rownames(dds_macrophage) <- substr(rownames(dds_macrophage), 1, 15)
```

We perform some filtering on the features to be kept, and define the set of differentially expressed genes contrasting the `IFNg` and the `naive` samples.

Notably, we correctly specify the `lfcThreshold` parameter instead of a post-hoc approach to filter the DE table based on the log2 fold change values - see https://support.bioconductor.org/p/101504/ for an excellent explanation on why to prefer the more rigorous (yet, likely conservative) method defined in the chunk below.

```{r processdata}
keep <- rowSums(counts(dds_macrophage) >= 10) >= 6
dds_macrophage <- dds_macrophage[keep, ]

dds_macrophage <- DESeq(dds_macrophage)
res_macrophage_IFNg_vs_naive <- results(dds_macrophage,
                                        contrast = c("condition", "IFNg", "naive"),
                                        lfcThreshold = 1, 
                                        alpha = 0.05)
```

Please refer to the vignette of the `r BiocStyle::Biocpkg("DESeq2")` or `r BiocStyle::Biocpkg("edgeR")` packages for more complex experimental designs and/or additional options in each workflow.  
The aim for this section was simply to generate exemplary objects to work with and provide to the `r BiocStyle::Biocpkg("mosdef")` functions.

# Generating enrichment results with a unified API {#mosdefenrich}

`r BiocStyle::Biocpkg("mosdef")` allows you to create your enrichment results right from your `DESeqDataset` and `DESeqResults` objects using 3 possible algorithms, widely used:

* `r BiocStyle::Biocpkg("topGO")`
* `r BiocStyle::Biocpkg("goseq")`
* `r BiocStyle::Biocpkg("clusterProfiler")`

For more information on the differences between these algorithms we refer to their individual vignettes and publications.

All of these algorithms require an annotation to function properly, so make sure you have installed and use the correct one for your experimental data. 
The default is `org.Mm.eg.db` (Mus musculus). The `macrophage` data however stems from human, so we need `org.Hs.eg.db`, and we load this package in the following chunk:

```{r orgdbs}
library("AnnotationDbi")
library("org.Hs.eg.db")
```

We also want to add a symbol column for later use - and, in order to add a human readable name for our features of interest:

```{r addsymbols}
res_macrophage_IFNg_vs_naive$symbol <- 
  AnnotationDbi::mapIds(org.Hs.eg.db,
                        keys = row.names(res_macrophage_IFNg_vs_naive),
                        column = "SYMBOL",
                        keytype = "ENSEMBL",
                        multiVals = "first"
)
```

We indeed recommend to use identifiers as row names that are machine-readable and stable over time, such as ENSEMBL or GENCODE.  
To ensure that we are using objects that would work out-of-the-box into `r BiocStyle::Biocpkg("mosdef")`, we provide some utilities to check that in advance - this can relax the need of specifying a number of parameters in the other functions.

```{r checkfunctions}
mosdef_de_container_check(dds_macrophage)
mosdef_res_check(res_macrophage_IFNg_vs_naive)
```


## `mosdef` and `topGO`

`r BiocStyle::Biocpkg("topGO")` is a widely used option to obtain a set of spot-on Gene Ontology terms, removing some of the more generic ones and therefore also reducing the redundancy which is inherent in the GO database [@Ashburner2000].

```{r runtopgo}
library("topGO")
res_enrich_macrophage_topGO <- run_topGO(
  de_container = dds_macrophage,
  res_de = res_macrophage_IFNg_vs_naive,
  ontology = "BP",
  mapping = "org.Hs.eg.db",
  FDR_threshold = 0.05,
  gene_id = "symbol",
  de_type = "up_and_down",
  add_gene_to_terms = TRUE,
  topGO_method2 = "elim",
  min_counts = 20,
  top_de = 700,
  verbose = TRUE
)
```

The `run_topGO` function will return a table with the analysis for all possible GO terms (currently when using the "BP" ontology on the macrophage dataset that is `r nrow(res_enrich_macrophage_topGO)` terms).
Not all of these results are significant, and this list can (should) be further subset/filtered. For example by using a p-value cutoff. 

The key parameters for `run_topGO()` are defined as follows:

* `de_container`: Your `DESeqDataset` object
* `res_de`: Your `DESeqResults` object
* `ontology`: Which gene ontology to analyse, default is "BP"
* `mapping`: The annotation/species
* `gene_id`: Which format the genes are provided. If you provide a `DESeqDataset` and `DESeqResults`, then `r BiocStyle::Biocpkg("mosdef")` does it for you and uses symbols. If you provide vectors please specify a value.
* `FDR_threshold`: The pvalue to use to count a gene as significant. The default is 0.05 but if you want a stricter analysis you could set this to 0.01 for example
* `de_type`: Which genes to analyse. The default is all ("up_and_down")
Other possibilities are only up-/down-regulated ("up"/"down") genes.
* `add_gene_to_terms`: Logical, whether to add a column with all genes annotated to each GO term.
* `topGO_method2`: Character, specifying which of the methods implemented by `topGO` is to be used. The default is elim. For more info look at the documentation of `r BiocStyle::Biocpkg("topGO")`.
* `min_counts`: Minimum number of counts a gene has to have to be considered for the background. 
  The default is 0 and we advise this parameter is only used by expert users that understand the impact of selecting "non-standard" backgrounds. 
* `top_de`: The number of genes to be considered in the enrich analysis.
  The default is all genes, this can be reduced to reduce redundancy.
  In this case, we take the top 700 highest DE genes based of padj score.
  If this number is bigger than the total amount of de genes the parameter defaults back to all genes.
* `verbose`: Whether or not to summarise the analysis in a message.

```{r resultstopgo}
head(res_enrich_macrophage_topGO)
```

This will return a table with `r ncol(res_enrich_macrophage_topGO)` columns. They contain:

* `GO.ID`: The computer readable GO term ID.
* `Term`: The human readable GO term name.
* `Annotated`: The overall number of detected genes associated with this term (as defined in the `de_container` or in the `bg_genes` parameters).
* `Significant`: The number of DE genes associated with this term (as defined in the `res_de` or in the `de_genes` parameters)
* `Expected`: The expected number of genes associated with this term.
* `Rank in p.value_classic`: specifies the rank of each term, if using the ascending order of `p.value_classic` to sort the entries.
* `p.value_elim`: p-value for the enrichment test of each term (recommended for sorting the table), based on the `elim` algorithm in `topGO`.
* `p.value_classic`: p-value for the enrichment, using the "classical" Fisher test.
* `genes`: Specifies which DE genes are associated with that term (separated by a comma). This column will later on be used to label the genes in the `go_volcano()` function (see below in Section \@ref(volcanoplots))


## `mosdef` and `goseq`

The method implemented in the `r BiocStyle::Biocpkg("goseq")` package is able to handle the selection bias inherent in assays such as RNA-seq, whereas highly expressed genes have a higher probability of being detected as differentially expressed.

Parameters like `top_de`, `min_counts`, `verbose`, `FDR_threshold` and `de_type` can also be used here (for more detail on these parameters see above), as they are a part of the shared API with `run_topGO` and `run_cluPro`.

Importantly, the feature length retrieval is based on the `goseq()` function, and this requires that the corresponding TxDb packages are installed and available. So make sure one is installed on your machine. For human samples, the recommended one is `r BiocStyle::Biocannopkg("TxDb.Hsapiens.UCSC.hg38.knownGene")`.


```{r rungoseq, eval = TRUE}
goseq_macrophage <- run_goseq(
  de_container = dds_macrophage,
  res_de = res_macrophage_IFNg_vs_naive,
  mapping = "org.Hs.eg.db",
  testCats = "GO:BP" # which categories to test of ("GO:BP, "GO:MF", "GO:CC")
)
head(goseq_macrophage)
```

The function will return a table with `r ncol(goseq_macrophage)` columns

* `category`: The computer readable GO term ID.
* `over_represented_pvalue`: p-value for the overrepresentation test for enrichment (as specified by the `goseq` algorithm)
* `under_represented_pvalue`: p-value for the underrepresentation test for enrichment
* `numDEInCat`: The number of DE genes associated with this term (as defined in the `res_de` or in the `de_genes` parameters) (similar to the `Annotated` column of the output from `run_topGO()`)
* `numInCat` The overall number of detected genes associated with this term (as defined in the `de_container` or in the `bg_genes` parameters).
* `term`: The human readable GO term name.
* `ontology`: The ontology used to run the enrichment tests (BP/MF/CC)
* `p.adj`: FDR of your findings adjusted for multiple testing
* `genes`: Lists the DE gene identifiers associated with that term (separated by a comma)
* `genesymbols`: Lists the DE gene symbols associated with that term (separated by a comma)

## `mosdef` and `clusterProfiler`

Parameters like `top_de`, `min_counts`, `verbose`, `FDR_threshold` and `de_type` 
can also be used here (For more detail on these parameters see above).
If you want to further customize the call of `enrichGO()` inside the function, have a look at the documentation for `enrichGO()` from `r BiocStyle::Biocpkg("clusterProfiler")` Those parameters can be added to the `run_cluPro()` function call within the ellipsis (`...`).
For example, as we are doing in the chunk that follows, we set Biological Process as the ontology to be used, by specifying `ont = "BP"`

```{r runclupro, eval = FALSE}
clupro_macrophage <- run_cluPro(
  de_container = dds_macrophage,
  res_de = res_macrophage_IFNg_vs_naive,
  mapping = "org.Hs.eg.db",
  keyType = "SYMBOL",
  ont = "BP"
)
head(clupro_macrophage)
```

Importantly, `keyType` is relevant for the `enrichGO()` function that is wrapped in this routine.
If using `DESeqDataset` and `DESeqResults`, this has to be "SYMBOL" which is also the default. 
If you use vectors please specify here what type of IDs you provide.

Again, to save time when rendering the vignette, we load the objects provided alongside this package to demonstrate the output (see also [this script included in the package](https://github.com/imbeimainz/mosdef/blob/devel/inst/scripts/create_mosdef_data.R) to inspect the code used to generate the objects):

```{r resultsclupro}
data(res_enrich_macrophage_cluPro, package = "mosdef")
```

The function will return a large enrich result containing some metadata and the enrichment results with `r ncol(as.data.frame(res_enrich_macrophage_cluPro))` columns.

```{r cluproview}
head(res_enrich_macrophage_cluPro)
```

The definitions of these columns are included in the extensive `r BiocStyle::Biocpkg("clusterProfiler")` package documentation, please refer to that for more details.

## Alternative ways to run enrichment analyses, within `mosdef`

All of these functions tailored to run enrichment methods also work if you only have/provide a vector of differentially expressed genes and a vector of background genes. 
Most of the above mentioned parameters work here as well (`top_de`, `verbose`), however parameters like `min_counts` and `de_type` will not affect the result, since they need further information which can only be found in the `DESeqDataset` and `DESeqResults` (in this case, access to the counts from the `DESeqDataset` object `de_container` and the Log2FoldChange from the `DESeqResults` object passed to `res_de`).

```{r topgoalt}
res_subset <- deresult_to_df(res_macrophage_IFNg_vs_naive)[1:100, ] 
myde <- res_subset$id
myassayed <- rownames(res_macrophage_IFNg_vs_naive)
## Here keys are Ensembl not symbols
res_enrich_macrophage_topGO_vec <- run_topGO(
  de_genes = myde,
  bg_genes = myassayed,
  mapping = "org.Hs.eg.db",
  gene_id = "ensembl"
)
head(res_enrich_macrophage_topGO_vec)
```


# Plotting expression values in the context of DE

`r BiocStyle::Biocpkg("mosdef")` provides some wrappers to commonly used visualizations of individual genes, as well as summary visualizations for all features at once.

## Individual genes - `gene_plot()`

An elegant way to plot the expression values (by default the normalized counts) of a certain gene of interest, split up by a covariate of interest - for example, the `condition`, IFNg vs naive.

```{r geneplot}
gene_plot(
  de_container = dds_macrophage,
  gene = "ENSG00000125347",
  intgroup = "condition"
)
```

Key parameters are in this case:

* `de_container`: Your `DESeqDataset` 
* `gene`: The gene of interest
* `intgroup`: A character vector of names in `colData(de_container)` used for grouping the expression values. 

Notably, `gene_plot()` also has some heuristics to suggest an appropriate layer of plotting the data points, depending on the number of samples included in each individual group - this include the simple jittered points, a boxplot, a violin plot, or a sina plot. 
This automatic behavior can be suppressed by specifying a different value for the `plot_type` parameter.

## All genes at once - Volcano plots {#volcanoplots}

Volcano plots are one of the most well known and used plots to display differentially expressed genes. 
These functions return a basic `ggplot` object including the most important parts when creating a volcano plot. 

```{r volcanoplot}
volcPlot <- de_volcano(
  res_de = res_macrophage_IFNg_vs_naive,
  mapping = "org.Hs.eg.db",
  logfc_cutoff = 1,
  FDR = 0.05, 
  labeled_genes = 25
)
volcPlot
```

Again, an overview on the key parameters:

* `res_de`: Your `DESeqResults`
* `mapping`: The annotation/species: Important to generate symbols for labeling.
* `labeled_genes`: The number of the top DE genes to be labeled. Default is 30.
* `logfc_cutoff` and `FDR`: Where to draw the lines in the plot and which genes to mark as significant. The default is 1 (meaning L2FC +/-1): So genes with a FoldChange higher than 1 or lower -1 and a padj value below 0.05.  

This plot can be later modified by the user, like any regular `ggplot` object. For example:

```{r volcanocustom}
library("ggplot2")

volcPlot +
  ggtitle("macrophage Volcano") +
  ylab("-log10 PValue") +
  xlab("Log 2 FoldChange (Cutoff 1/-1)")
```

For further possibilities please look at the `r BiocStyle::CRANpkg("ggplot2")` documentation.

In addition to only focusing on differentially expressed genes, in a volcano plot the user can also highlight genes associated with a certain GO term of interest.
This can be done with the `go_volcano()` function:

```{r volcanogo}
Volc_GO <- go_volcano(
  res_de = res_macrophage_IFNg_vs_naive,
  res_enrich = res_enrich_macrophage_topGO,
  term_index = 1,
  logfc_cutoff = 1,
  FDR = 0.05,
  mapping = "org.Hs.eg.db",
  n_overlaps = 50,
  col_to_use = "symbol",
  enrich_col = "genes",
  down_col = "black",
  up_col = "black",
  highlight_col = "tomato"
)

Volc_GO
```

The key parameters:

* `res_de`: Your `DESeqResults`
* `res_enrich`: Your enrichment results
* `term_index`: The index(row) where your term of interest is located in your enrichment result.
* `logfc_cutoff` and `FDR`: Where to draw the lines in the plot and which genes to mark as significant. The default is one (meaning L2FC +/-1): So genes with a FoldChange higher than 1 or lower -1 and a padj value below 0.05. 
* `mapping`: The annotation/species: Important to generate symbols for labeling.
* `n_overlaps`: The number of overlaps `ggrepel` is supposed to allow for labeling (increases  number of labeled genes).
* `col_to_use`: Name of the column in your res_de containing the gene symbols.
* `enrich_col`: Name of the column in your res_enrich containing the gene symbols. For an example see `run_topGO` data provided in mosdef: `data(res_enrich_macrophage_topGO, package = "mosdef")`.
* `down_col`: Colour for your downregulated genes (genes with a `logfc_cutoff` below the value specified).
* `up_col`: Colour for your upregulated genes (genes with a `logfc_cutoff` above the value specified).
* `highlight_col`: Colour for your genes associated with the given term of interest.


## All genes at once - MA plots

An alternative to the volcano plot, less focused on the individual significance values and more focused on the combination of mean expression and changes in expression levels, is the MA plot. It can be considered an extension of the Bland-Altman plot for genomics data.
This grants an overview of the differentially expressed genes across the different levels of expression.

`plot_ma()` also allows you to set x and y labels right away, but we provide some default values.
However, similar to `de_volcano()`, these can also be set later on by directly modifying the returned `ggplot` object.

```{r maplot}
maplot <- plot_ma(
  res_de = res_macrophage_IFNg_vs_naive,
  FDR = 0.05,
  hlines = 1
)
# For further parameters please check the function documentation
maplot
```

All key parameters at a glance:

* `res_de`: Your `DESeqResults` object
* `FDR`: Which padj cutoff value to set for genes to be counted as DE (default < 0.05)
* `hlines`: whether or not (and where) to draw the horizontal line (optional)

Further control on the aspect of the output plot is enabled via the other possible parameters; please refer to the documentation of the `plot_ma()` function itself.

If desired, `plot_ma()` further allows you to highlight certain genes of interest to you, if providing them via the `intgenes` parameter. 

```{r maplotannotated}
maplot_genes <- plot_ma(
  res_de = res_macrophage_IFNg_vs_naive,
  FDR = 0.1,
  add_rug = FALSE,
  intgenes = c(
    "SLC7A2",
    "CFLAR",
    "PDK4",
    "IFNGR1"
  ), # suggested genes of interest
  hlines = 1,
  intgenes_color = "darkblue"
)
maplot_genes
```

# Beautifying and enhancing analysis reports

Analysis reports, often generated via Rmarkdown, are a great way of handing over data, results, and output to collaborators, colleagues, PIs, ... 

`r BiocStyle::Biocpkg("mosdef")` provides a set of functions aiming to enhance the report quality, e.g. by turning normal tables into interactive tables, linking to a number of additional external databases - thus simplifying the search & exploration steps which naturally follow the inspection of a DE table.

## More information on features/genes

The life of a bioinformatician, but also the life of a biologist and a medical scientist, often contains a fair amount of searches into external databases, in order to obtain additional information on the shortlisted features.

Simplifying the time to reach these resources and embedding them into one info-rich analysis report is `r BiocStyle::Biocpkg("mosdef")`'s proposal to streamline this as a whole. 

All of these (except ENSEMBL, using their internal identifier system) require gene symbols as the input.
Currently, `r BiocStyle::Biocpkg("mosdef")` has functions to create automated links to:

* ENSEMBL (https://www.ensembl.org/index.html)
* GeneCards (https://www.genecards.org/): For information/overview on the gene
* Pubmed (https://pubmed.ncbi.nlm.nih.gov/): For gene/GOterm related publications
* NCBI (https://www.ncbi.nlm.nih.gov/): For overview and chromosomal information on the gene
* dbPTM (https://awi.cuhk.edu.cn/dbPTM/): For post-translational modifications
* GTEX (https://www.gtexportal.org/home/): For data on expression of the gene in different tissues
* UniProt ("https://www.uniprot.org/"): For information on the protein related to the gene
* Human Protein Atlas ("https://www.proteinatlas.org/"): For information on the protein related to the gene for humans specifically

You can access all of these easily by using one function that uses a `data.frame` as input:

```{r runbuttonifier}
# creating a smaller subset for visualization purposes and to keep the main res_de
res_subset <- deresult_to_df(res_macrophage_IFNg_vs_naive, FDR = 0.05)[1:100, ]

buttonifier(
  df = res_subset,
  col_to_use = "symbol",
  create_buttons_to = c("GC", "NCBI", "GTEX", "UNIPROT", "dbPTM", "HPA", "PUBMED"),
  ens_col = "id",
  ens_species = "Homo_sapiens"
)
```

Again, an overview of the key parameters:

* `df`: A data.frame object containing your data. To get one from your `DESeqResults` data use the function: `deresult_to_df()`.
* `col_to_use`: Column where the gene names are stored, default is "SYMBOL", in this example however the column is named "symbol".
* `create_buttons_to`: All of the supported websites. You can pick however many you want.
* `ens_col`: Where to find the Ensembl IDs in case you want to turn those into buttons too. If not this defaults to NULL and that part is skipped.
* `ens_species`: The species you are working on. Only needed if you want to turn the Ensembl IDs into buttons.


Importantly, the `buttonifier()` function directly returns a `DT::datatable` by default (not a `data.frame`). 
This is to ensure that the `escape = FALSE` parameter of `datatable` is set and not forgotten as the links/buttons will not work otherwise (or at least, will displayed very oddly as "simple text", not interpreted as the code to generate buttons).

Advanced users that want further customization options to their `datatable` can ensure a `data.frame` is returned using the `output_format` parameter (then the `escape = FALSE` must be set by hand):

```{r dtcustomized}
res_subset <- deresult_to_df(res_macrophage_IFNg_vs_naive, FDR = 0.05)[1:100, ]

res_subset <- buttonifier(res_subset,
  col_to_use = "symbol",
  create_buttons_to = c("GC", "NCBI", "HPA"),
  output_format = "DF"
)

DT::datatable(res_subset,
  escape = FALSE,
  rownames = FALSE,
  # other parameters specifically controlling the look of the DT...
  options = list(
    scrollX = TRUE
  )
)
```

As an additional prettifying element, the information on the log2 fold change can be also encoded with small transparent colored bars, representing the underlying effect sizes.
This can be done with the `de_table_painter()` function, displayed in the following chunk:

```{r tablepainter}
de_table_painter(res_subset,
                 rounding_digits = 3,
                 signif_digits = 5)

## This also works directly on the DESeqResults objects:
de_table_painter(res_macrophage_IFNg_vs_naive[1:50, ],
                 rounding_digits = 3,
                 signif_digits = 5)


```


All of the functions included inside the `buttonifier()` function are also available as singular functions in case you are only interested in a subset of them.
As a reminder: all functions, except the one related to the ENSEMBL database, can use/need gene symbols as input, so that the call to build up the table from its individual columns could be specified as in the following chunk:

```{r createlinks}
res_subset <- deresult_to_df(res_macrophage_IFNg_vs_naive, FDR = 0.05)[1:100, ]

row.names(res_subset) <- create_link_ENSEMBL(row.names(res_subset), species = "Homo_sapiens")
res_subset$symbol_GC <-  create_link_GeneCards(res_subset$symbol)
res_subset$symbol_PubMed <- create_link_PubMed(res_subset$symbol)
res_subset$symbol_NCBI <- create_link_NCBI(res_subset$symbol)
res_subset$symbol_dbPTM <- create_link_dbPTM(res_subset$symbol)
res_subset$symbol_GTEX <- create_link_GTEX(res_subset$symbol)
res_subset$symbol_UniProt <- create_link_UniProt(res_subset$symbol)
res_subset$symbol_HPA <- create_link_HPA(res_subset$symbol)

DT::datatable(res_subset, escape = FALSE,
              options = list(
                scrollX = TRUE
              )
)
```

For information on singular genes you can use:

```{r geneinfo}
geneinfo_to_html("IRF1",
  res_de = res_macrophage_IFNg_vs_naive,
  col_to_use = "symbol"
)
```

It can however also be used without a res_de for a general overview.

```{r geneinfocompact}
geneinfo_to_html("ACTB")
```

This can be a practical way to generate some HTML content to be embedded e.g. in other contexts such as dashboards, as it is currently implemented in `r BiocStyle::Biocpkg("pcaExplorer")`, `r BiocStyle::Biocpkg("ideal")` and `r BiocStyle::Biocpkg("GeneTonic")`.

## More information on GO terms

We display an interactive table for a subset of GO terms - in this case, we select the first 100 rows.

```{r golink}
res_enrich_macrophage_topGO$GO.ID <- create_link_GO(res_enrich_macrophage_topGO$GO.ID)

DT::datatable(res_enrich_macrophage_topGO[1:100, ], 
              escape = FALSE,
              options = list(
                scrollX = TRUE
              )
)
```

Setting `escape = FALSE` is important here to ensure the link is turned into a button - since we are dealing with a `datatable` where we need to interpret some content directly as HTML code. 

To get information on a singular GO term of interest you can use:

```{r goinfo}
go_to_html("GO:0001525")
```

This not only creates a link to the AmiGO database, but also extracts some information about the term itself from the `r BiocStyle::Biocpkg("GO.db")` package.

This approach can be extended to link to additional external resources on genesets, such as MSigDB or Reactome.

# Session Info {-}

```{r sessioninfo}
sessionInfo()
```

# References {-}