---
title: "Visualize Differential Expression results"
author: "David Barrios and Carlos Prieto"
date: "`r BiocStyle::doc_date()`"
package: "`r BiocStyle::pkg_ver('Rvisdiff')`"
abstract: >
  `Rvisdiff` is an R/Bioconductor package which generates an interactive
  interface for the interpretation of differential expression results. It
  generates a local Web page which enables the exploration of statistical
  analysis results with the generation of auto-analytical visualizations. The
  package supports as input the output of popular differential expression
  packages such as `r BiocStyle::Biocpkg("DESeq2")`,
  `r BiocStyle::Biocpkg("edgeR")` and `r BiocStyle::Biocpkg("limma")`.
output:
  BiocStyle::html_document:
    toc_float: true
vignette: >
  %\VignetteIndexEntry{Visualize Differential Expression results}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}  
---

# Getting started

`Rvisdiff` is an R package distributed as part of the Bioconductor project.
To install the package, start R and enter:

```{r , echo=TRUE, message=FALSE, warning=FALSE, eval=FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("Rvisdiff")
```
The GitHub repository for `Rvisdiff` is https://github.com/BioinfoUSAL/Rvisdiff.
This is the place to file an issue, report a bug, or provide a pull request.

Once `Rvisdiff` is installed, it can be loaded by the following command.
```{r , echo=TRUE, message=FALSE, warning=FALSE, eval=TRUE}
library("Rvisdiff")
```

# Introduction

Differential expression analysis generates a big report which needs a manual
inspection for the optimization and interpretation of results. Researchers have
designed visualization techniques to facilitate these tasks but their generation
with code or statistics packages avoids the quick and massive exploration of
results. We have designed `Rvisdiff` to integrate graphs in an easy to use and
interactive web page.The user can explore the differential expression results
and the source expression data in the same view.

As input data the package receives two tables with the differential expression
results and the raw/normalized expression data. It detects the default output of
`r BiocStyle::Biocpkg("DESeq2")`, `r BiocStyle::Biocpkg("edgeR")` and
`r BiocStyle::Biocpkg("limma")` packages and no data conversion is needed. The
user can also generate a custom data frame which integrates a statistical
testing output with a fold change and mean calculation for each variable. 

As output the package generates a local HTML page that can be seen in a Web
browser. It is not necessary the installation of additional software such as
application servers or programming languages. This feature ensures portability
and ease of use. Moreover, results are stored in the local computer, avoiding
any network sharing or data upload to external servers, which ensures the data
privacy.

# Input data

In this example we use as input the `r BiocStyle::Biocpkg("airway")` data
package which contains the read counts in genes for an RNA-Seq experiment on
four human airway smooth muscle cell lines treated with dexamethasone. The code
below shows how to load the package and the data extraction of main data
features that we need for the differential expression analysis and the posterior
visualization with `Rvisdiff`. The `countdata` variable contains a data frame
with the number of sequence counts for each gene (rows) and sample (columns).
The `coldata` variable contains input phenotypes for the differential expression
analysis and its posterior representation.

The following code loads the necessary libraries and formats the input sample
conditions.
```{r , echo=TRUE, message=FALSE, warning=FALSE, eval=TRUE}
library(Rvisdiff)
library(airway)
data("airway")
se <- airway
se$dex <- relevel(se$dex, ref="untrt")
countdata <- assay(se)
coldata <- colData(se)
```

# Generating the Report

## Generating Report From `r BiocStyle::Biocpkg("DESeq2")` results

The code below shows how to perform a differential expression analysis with
`r BiocStyle::Biocpkg("DESeq2")` and its representation with `Rvisdiff`.
```{r , echo=TRUE, message=FALSE, warning=FALSE, eval=TRUE}
library(DESeq2)
dds <- DESeqDataSet(se, design = ~ cell + dex)
dds <- DESeq(dds)
dres <- results(dds, independentFiltering = FALSE)
DEreport(dres, countdata, coldata$dex)
```

## Generating Report From `r BiocStyle::Biocpkg("edgeR")` results

The code below shows how to perform a differential expression analysis with
`r BiocStyle::Biocpkg("edgeR")` and its representation with `Rvisdiff`.
```{r , echo=TRUE, message=FALSE, warning=FALSE, eval=TRUE}
library(edgeR)
design <- model.matrix(~ cell + dex, data = coldata)
dl <- DGEList(counts = countdata, group = coldata$dex)
dl <- calcNormFactors(dl)
dl <- estimateDisp(dl, design=design)
de <- exactTest(dl,pair=1:2)
tt <- topTags(de, n = Inf, adjust.method = "BH", sort.by = "none")
DEreport(tt, countdata, coldata$dex) 
```

## Generating Report From `r BiocStyle::Biocpkg("limma")` results

The code below shows how to perform a differential expression analysis with
`r BiocStyle::Biocpkg("limma")` and its representation with `Rvisdiff`.
```{r , echo=TRUE, message=FALSE, warning=FALSE, eval=TRUE}
library(limma)
design <- model.matrix(~ 0 + dex + cell, data = coldata)
contr <- makeContrasts(dextrt - dexuntrt,levels=colnames(design))
limmaexprs <- voom(countdata, design)
fit <- lmFit(limmaexprs, design)
fit <- contrasts.fit(fit, contrasts=contr)
fit <- eBayes(fit)
limmares <- topTable(fit, coef = 1, number = Inf, sort.by = "none",
    adjust.method = "BH")
DEreport(limmares, countdata, coldata$dex) 
```

## Generating Report From Differential test results

The code below shows how to perform a Wilcoxon test with expression data and its
representation with `Rvisdiff`. This example can be also followed for the
representation of resulting analysis from differential means tests.
```{r , echo=TRUE, message=FALSE, warning=FALSE, eval=TRUE}
untrt <- countdata[,coldata$dex=="untrt"]
trt <- countdata[,coldata$dex=="trt"]

library(matrixTests)
wilcox <- col_wilcoxon_twosample(t(untrt), t(trt))
stat <- wilcox$statistic
p <- wilcox$pvalue
log2FoldChange <- log2(rowMeans(trt)+1) - log2(rowMeans(untrt)+1)
wilcox <- cbind(stat = stat, pvalue = round(p, 6),
    padj = p.adjust(wilcox[,2], method = "BH"),
    baseMean = rowMeans(countdata),
    log2FoldChange = log2FoldChange)
rownames(wilcox) <- rownames(countdata)

DEreport(wilcox, countdata, coldata$dex)
```

# Resulting Graphical User Interface
Figure 1 shows the resulting Web page generated with the `DEreport` function.
The user can select which genes appear in the graphs selecting them in the
results table. It contains the following graphs:

* Volcano Plot: It is a scatter plot in which the values of rate of change are
plotted in logarithmic scale (log2foldchange) versus the p-value resulting from
the contrast test is scale minus logarithm 10 (-log10pvalue). Points are
highlighted when the mouse is hovered over the results table. Variable name
appears on point mouse over.
* MA-Plot: is a scatter plot showing mean expression values versus rate of
change, both are plotted in logarithmic scale to avoid excessive scatter. It has
the same interactivity features as Volcano plots.
* Line diagram: the gene expression levels (ordinates) in each sample
(abscissae) are represented as a line. Diagram is divided based on input
phenotype.
* Box plot: they allow us to visualize the distribution, degree of asymmetry,
extreme values and value of the median. It is also useful for comparing two
distributions if we represent them in the same graph. The resulting graphs show
the difference in expression between genes or conditions.
* Cluster Heatmap: expression data are displayed in a grid where each row
represents a gene and each column represents a sample. The color and intensity
of the boxes are used to represent changes (usually scaled per gene, avoiding
absolute values) in gene expression. The heatmap shows also a clustering tree
that groups genes and samples based on the similarity of their gene expression
pattern. The user can change the color scale and toggle rendering from raw to
scaled values. Moreover, the graph provides a zoom feature which enables to set
the focus on a set of samples or genes.

```{r echo=FALSE}
knitr::include_graphics("figure1.png")
```
**Figure 1** web interface

# SessionInfo {-}

```{r sessionInfo}
sessionInfo()
```