--- title: "Visualize Differential Expression results" author: "David Barrios and Carlos Prieto" date: "`r BiocStyle::doc_date()`" package: "`r BiocStyle::pkg_ver('Rvisdiff')`" abstract: > `Rvisdiff` is an R/Bioconductor package which generates an interactive interface for the interpretation of differential expression results. It generates a local Web page which enables the exploration of statistical analysis results with the generation of auto-analytical visualizations. The package supports as input the output of popular differential expression packages such as `r BiocStyle::Biocpkg("DESeq2")`, `r BiocStyle::Biocpkg("edgeR")` and `r BiocStyle::Biocpkg("limma")`. output: BiocStyle::html_document: toc_float: true vignette: > %\VignetteIndexEntry{Visualize Differential Expression results} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- # Getting started `Rvisdiff` is an R package distributed as part of the Bioconductor project. To install the package, start R and enter: ```{r , echo=TRUE, message=FALSE, warning=FALSE, eval=FALSE} if (!requireNamespace("BiocManager", quietly = TRUE)) { install.packages("BiocManager") } BiocManager::install("Rvisdiff") ``` The GitHub repository for `Rvisdiff` is https://github.com/BioinfoUSAL/Rvisdiff. This is the place to file an issue, report a bug, or provide a pull request. Once `Rvisdiff` is installed, it can be loaded by the following command. ```{r , echo=TRUE, message=FALSE, warning=FALSE, eval=TRUE} library("Rvisdiff") ``` # Introduction Differential expression analysis generates a big report which needs a manual inspection for the optimization and interpretation of results. Researchers have designed visualization techniques to facilitate these tasks but their generation with code or statistics packages avoids the quick and massive exploration of results. We have designed `Rvisdiff` to integrate graphs in an easy to use and interactive web page.The user can explore the differential expression results and the source expression data in the same view. As input data the package receives two tables with the differential expression results and the raw/normalized expression data. It detects the default output of `r BiocStyle::Biocpkg("DESeq2")`, `r BiocStyle::Biocpkg("edgeR")` and `r BiocStyle::Biocpkg("limma")` packages and no data conversion is needed. The user can also generate a custom data frame which integrates a statistical testing output with a fold change and mean calculation for each variable. As output the package generates a local HTML page that can be seen in a Web browser. It is not necessary the installation of additional software such as application servers or programming languages. This feature ensures portability and ease of use. Moreover, results are stored in the local computer, avoiding any network sharing or data upload to external servers, which ensures the data privacy. # Input data In this example we use as input the `r BiocStyle::Biocpkg("airway")` data package which contains the read counts in genes for an RNA-Seq experiment on four human airway smooth muscle cell lines treated with dexamethasone. The code below shows how to load the package and the data extraction of main data features that we need for the differential expression analysis and the posterior visualization with `Rvisdiff`. The `countdata` variable contains a data frame with the number of sequence counts for each gene (rows) and sample (columns). The `coldata` variable contains input phenotypes for the differential expression analysis and its posterior representation. The following code loads the necessary libraries and formats the input sample conditions. ```{r , echo=TRUE, message=FALSE, warning=FALSE, eval=TRUE} library(Rvisdiff) library(airway) data("airway") se <- airway se$dex <- relevel(se$dex, ref="untrt") countdata <- assay(se) coldata <- colData(se) ``` # Generating the Report ## Generating Report From `r BiocStyle::Biocpkg("DESeq2")` results The code below shows how to perform a differential expression analysis with `r BiocStyle::Biocpkg("DESeq2")` and its representation with `Rvisdiff`. ```{r , echo=TRUE, message=FALSE, warning=FALSE, eval=TRUE} library(DESeq2) dds <- DESeqDataSet(se, design = ~ cell + dex) dds <- DESeq(dds) dres <- results(dds, independentFiltering = FALSE) DEreport(dres, countdata, coldata$dex) ``` ## Generating Report From `r BiocStyle::Biocpkg("edgeR")` results The code below shows how to perform a differential expression analysis with `r BiocStyle::Biocpkg("edgeR")` and its representation with `Rvisdiff`. ```{r , echo=TRUE, message=FALSE, warning=FALSE, eval=TRUE} library(edgeR) design <- model.matrix(~ cell + dex, data = coldata) dl <- DGEList(counts = countdata, group = coldata$dex) dl <- calcNormFactors(dl) dl <- estimateDisp(dl, design=design) de <- exactTest(dl,pair=1:2) tt <- topTags(de, n = Inf, adjust.method = "BH", sort.by = "none") DEreport(tt, countdata, coldata$dex) ``` ## Generating Report From `r BiocStyle::Biocpkg("limma")` results The code below shows how to perform a differential expression analysis with `r BiocStyle::Biocpkg("limma")` and its representation with `Rvisdiff`. ```{r , echo=TRUE, message=FALSE, warning=FALSE, eval=TRUE} library(limma) design <- model.matrix(~ 0 + dex + cell, data = coldata) contr <- makeContrasts(dextrt - dexuntrt,levels=colnames(design)) limmaexprs <- voom(countdata, design) fit <- lmFit(limmaexprs, design) fit <- contrasts.fit(fit, contrasts=contr) fit <- eBayes(fit) limmares <- topTable(fit, coef = 1, number = Inf, sort.by = "none", adjust.method = "BH") DEreport(limmares, countdata, coldata$dex) ``` ## Generating Report From Differential test results The code below shows how to perform a Wilcoxon test with expression data and its representation with `Rvisdiff`. This example can be also followed for the representation of resulting analysis from differential means tests. ```{r , echo=TRUE, message=FALSE, warning=FALSE, eval=TRUE} untrt <- countdata[,coldata$dex=="untrt"] trt <- countdata[,coldata$dex=="trt"] library(matrixTests) wilcox <- col_wilcoxon_twosample(t(untrt), t(trt)) stat <- wilcox$statistic p <- wilcox$pvalue log2FoldChange <- log2(rowMeans(trt)+1) - log2(rowMeans(untrt)+1) wilcox <- cbind(stat = stat, pvalue = round(p, 6), padj = p.adjust(wilcox[,2], method = "BH"), baseMean = rowMeans(countdata), log2FoldChange = log2FoldChange) rownames(wilcox) <- rownames(countdata) DEreport(wilcox, countdata, coldata$dex) ``` # Resulting Graphical User Interface Figure 1 shows the resulting Web page generated with the `DEreport` function. The user can select which genes appear in the graphs selecting them in the results table. It contains the following graphs: * Volcano Plot: It is a scatter plot in which the values of rate of change are plotted in logarithmic scale (log2foldchange) versus the p-value resulting from the contrast test is scale minus logarithm 10 (-log10pvalue). Points are highlighted when the mouse is hovered over the results table. Variable name appears on point mouse over. * MA-Plot: is a scatter plot showing mean expression values versus rate of change, both are plotted in logarithmic scale to avoid excessive scatter. It has the same interactivity features as Volcano plots. * Line diagram: the gene expression levels (ordinates) in each sample (abscissae) are represented as a line. Diagram is divided based on input phenotype. * Box plot: they allow us to visualize the distribution, degree of asymmetry, extreme values and value of the median. It is also useful for comparing two distributions if we represent them in the same graph. The resulting graphs show the difference in expression between genes or conditions. * Cluster Heatmap: expression data are displayed in a grid where each row represents a gene and each column represents a sample. The color and intensity of the boxes are used to represent changes (usually scaled per gene, avoiding absolute values) in gene expression. The heatmap shows also a clustering tree that groups genes and samples based on the similarity of their gene expression pattern. The user can change the color scale and toggle rendering from raw to scaled values. Moreover, the graph provides a zoom feature which enables to set the focus on a set of samples or genes. ```{r echo=FALSE} knitr::include_graphics("figure1.png") ``` **Figure 1** web interface # SessionInfo {-} ```{r sessionInfo} sessionInfo() ```