--- title: "Publication-ready volcano plots with enhanced colouring and labeling" author: "Kevin Blighe" date: "`r Sys.Date()`" package: "`r packageVersion('EnhancedVolcano')`" output: html_document: toc: true toc_depth: 3 number_sections: true theme: united highlight: tango fig_width: 7 bibliography: library.bib vignette: > %\VignetteIndexEntry{Publication-ready volcano plots with enhanced colouring and labeling} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} %\usepackage[utf8]{inputenc} --- # Introduction Volcano plots represent a useful way to visualise the results of differential expression analyses. Here, we present a highly-configurable function that produces publication-ready volcano plots. EnhancedVolcano [@EnhancedVolcano] will attempt to fit as many transcript names in the plot window as possible, thus avoiding 'clogging' up the plot with labels that could not otherwise have been read. Other functionality allows the user to identify up to 3 different types of attributes in the same plot space via colour, shape, and shade parameter configurations. ```{r, echo=FALSE} library(knitr) opts_chunk$set(tidy = FALSE, message = FALSE, warning = FALSE) ``` # Installation ## 1. Download the package from Bioconductor ```{r getPackage, eval=FALSE} if (!requireNamespace('BiocManager', quietly = TRUE)) install.packages('BiocManager') BiocManager::install('EnhancedVolcano') ``` Note: to install development version: ```{r getPackageDevel, eval=FALSE} devtools::install_github('kevinblighe/EnhancedVolcano') ``` ## 2. Load the package into R session ```{r Load, message=FALSE} library(EnhancedVolcano) ``` # Quick start For this example, we will follow the tutorial (from Section 3.1) of [RNA-seq workflow: gene-level exploratory analysis and differential expression](http://master.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html). Specifically, we will load the 'airway' data, where different airway smooth muscle cells were treated with dexamethasone. ```{r} library(airway) library(magrittr) data('airway') airway$dex %<>% relevel('untrt') ``` Conduct differential expression using DESeq2 in order to create 2 sets of results: ```{r} library('DESeq2') dds <- DESeqDataSet(airway, design = ~ cell + dex) dds <- DESeq(dds, betaPrior=FALSE) res1 <- results(dds, contrast = c('dex','trt','untrt')) res1 <- lfcShrink(dds, contrast = c('dex','trt','untrt'), res=res1) res2 <- results(dds, contrast = c('cell', 'N061011', 'N61311')) res2 <- lfcShrink(dds, contrast = c('cell', 'N061011', 'N61311'), res=res2) ``` ## Plot the most basic volcano plot For the most basic volcano plot, only a single data-frame or -matrix of test results is required, containing transcript names, log2FC, and adjusted or unadjusted P values. The default cut-off for log2FC is >|2|; the default cut-off for P value is 0.05. ```{r ex1, fig.height = 8.5, fig.width = 7, fig.cap = "Plot the most basic volcano plot."} EnhancedVolcano(res1, lab = rownames(res1), x = 'log2FoldChange', y = 'pvalue', xlim = c(-5, 8)) ``` # Advanced features Virtually all aspects of an EnhancedVolcano plot can be configured for the purposes of accommodating all types of statistical distributions and labelling preferences. By default, EnhancedVolcano will only attempt to label genes that pass the thresholds that you set for statistical significance, i.e., 'pCutoff' and 'FCcutoff'. In addition, it will only label as many of these that can reasonably fit in the plot space. The user can optionally supply a vector of transcript names (as 'selectLab') that s/he wishes to label in the plot. ## Modify cut-offs for log2FC and P value; specify title; adjust point and label size The default P value cut-off of 10e-6 may be too relaxed for most studies, which may therefore necessitate increasing this threshold by a few orders of magnitude. Equally, the log2FC cut-offs may be too stringent, given that moderated 'shrunk' estimates of log2FC differences in differential expression analysis can now be calculated. In this example, we also modify the point and label size, which can help to improve clarity where many transcripts went into the differential expression analysis. ```{r ex2, fig.height = 8.5, fig.width = 7, fig.cap = "Modify cut-offs for log2FC and P value; specify title; adjust point and label size."} EnhancedVolcano(res2, lab = rownames(res2), x = 'log2FoldChange', y = 'pvalue', xlim = c(-8, 8), title = 'N061011 versus N61311', pCutoff = 10e-16, FCcutoff = 1.5, transcriptPointSize = 1.5, transcriptLabSize = 3.0) ``` ## Adjust colour and alpha for point shading The default colour scheme may not be to everyone's taste. Here we make it such that only the transcripts passing both the log2FC and P value thresholds are coloured red, with everything else black. We also adjust the value for 'alpha', which controls the transparency of the plotted points: 1 = 100% opaque; 0 = 100% transparent. ```{r ex3, fig.height = 8.5, fig.width = 7, fig.cap = "Adjust colour and alpha for point shading."} EnhancedVolcano(res2, lab = rownames(res2), x = 'log2FoldChange', y = 'pvalue', xlim = c(-8, 8), title = 'N061011 versus N61311', pCutoff = 10e-16, FCcutoff = 1.5, transcriptPointSize = 1.5, transcriptLabSize = 3.0, col=c('black', 'black', 'black', 'red3'), colAlpha = 1) ``` ## Adjust shape of plotted points It can help, visually, to also plot different points as different shapes. The default shape is a circle. The user can specify their own shape encoding via the 'shape' parameter, which accepts either a single or four possible values: if four values, these then map to the standard designation that is also assigned by the colours; if a single value, all points are shaped with this value. For more information on shape encoding search online at [ggplot2 Quick Reference: shape](http://sape.inf.usi.ch/quick-reference/ggplot2/shape) ```{r ex4, fig.height = 8.5, fig.width = 7, fig.cap = "Adjust shape of plotted points."} EnhancedVolcano(res2, lab = rownames(res2), x = 'log2FoldChange', y = 'pvalue', xlim = c(-8, 8), title = 'N061011 versus N61311', pCutoff = 10e-16, FCcutoff = 1.5, transcriptPointSize = 3.0, transcriptLabSize = 3.0, shape = 8, colAlpha = 1) EnhancedVolcano(res2, lab = rownames(res2), x = 'log2FoldChange', y = 'pvalue', xlim = c(-8, 8), title = 'N061011 versus N61311', pCutoff = 10e-16, FCcutoff = 1.5, transcriptPointSize = 2.0, transcriptLabSize = 3.0, shape = c(1, 4, 23, 25), colAlpha = 1) ``` ## Adjust cut-off lines and add extra threshold lines The lines that are drawn to indicate cut-off points are also modifiable. The parameter 'cutoffLineType' accepts the following values: "blank", "solid", "dashed", "dotted", "dotdash", "longdash", and "twodash". The colour and thickness of these can also be modified with 'cutoffLineCol' and 'cutoffLineWidth'. To disable the lines, set either cutoffLineType="blank" or cutoffLineWidth=0. Extra lines can also be added via 'hline' and 'vline' to display other cut-offs. To make these more visible, we will also remove the default gridlines. ```{r ex5, fig.height = 8.5, fig.width = 7, fig.cap = "Adjust cut-off lines and add extra threshold lines."} EnhancedVolcano(res2, lab = rownames(res2), x = 'log2FoldChange', y = 'pvalue', xlim = c(-6, 6), title = 'N061011 versus N61311', pCutoff = 10e-12, FCcutoff = 1.5, transcriptPointSize = 1.5, transcriptLabSize = 3.0, colAlpha = 1, cutoffLineType = 'blank', cutoffLineCol = 'black', cutoffLineWidth = 0.8, hline = c(10e-12, 10e-36, 10e-60, 10e-84), hlineCol = c('grey0', 'grey25','grey50','grey75'), hlineType = 'longdash', hlineWidth = 0.8, gridlines.major = FALSE, gridlines.minor = FALSE) ``` ## Adjust legend position, size, and text The position of the legend can also be changed to "left" or "right" (and stacked vertically), or 'top' or "bottom" (stacked horizontally). The legend text, label size, and icon size can also be modified. ```{r ex6, fig.height = 8.5, fig.width = 12, fig.cap = "Adjust legend position, size, and text."} EnhancedVolcano(res2, lab = rownames(res2), x = 'log2FoldChange', y = 'pvalue', xlim = c(-6, 6), pCutoff = 10e-12, FCcutoff = 1.5, cutoffLineType = 'twodash', cutoffLineWidth = 0.8, transcriptPointSize = 3.0, transcriptLabSize = 4.0, colAlpha = 1, legend=c('NS','Log (base 2) fold-change','P value', 'P value & Log (base 2) fold-change'), legendPosition = 'right', legendLabSize = 16, legendIconSize = 5.0) ``` Note: to make the legend completely invisible, specify: ```{r eval=FALSE} legendVisible = FALSE ``` ## Plot adjusted p-values Volcano plots do not have to be produced with nominal (unadjusted P values), even if this is the common practice. Simply provide a column name relating to adjusted P values and you can also generate a volcano with these. In this case, the cutoff for the P value then relates to the adjusted P value. Here, we also modify the axis titles by supplying an expression via the bquote function. ```{r ex7, fig.height = 8.5, fig.width = 7, fig.cap = "Plot adjusted p-values."} EnhancedVolcano(res2, lab = rownames(res2), x = 'log2FoldChange', y = 'padj', xlim=c(-6,6), xlab = bquote(~Log[2]~ 'fold change'), ylab = bquote(~-Log[10]~adjusted~italic(P)), pCutoff = 0.0001, FCcutoff = 1.0, transcriptLabSize = 4.0, colAlpha = 1, legend=c('NS','Log2 FC','Adjusted p-value', 'Adjusted p-value & Log2 FC'), legendPosition = 'bottom', legendLabSize = 10, legendIconSize = 3.0) ``` ## Fit more labels by adding connectors In order to maximise free space in the plot window, one can fit more transcript labels by adding connectors from labels to points, where appropriate. The width and colour of these connectors can also be modified with 'widthConnectors' and 'colConnectors', respectively. Further configuration is achievable via 'typeConnectors' ("open", "closed"), 'endsConnectors' ("last", "first", "both"), and lengthConnectors (default = unit(0.01, 'npc')). The result may not always be desirable as it can make the plot look overcrowded. ```{r ex8, fig.height = 8.5, fig.width = 11, fig.cap = "Fit more labels by adding connectors."} EnhancedVolcano(res2, lab = rownames(res2), x = 'log2FoldChange', y = 'pvalue', xlim = c(-6,6), xlab = bquote(~Log[2]~ 'fold change'), pCutoff = 10e-14, FCcutoff = 2.0, transcriptPointSize = 3.0, transcriptLabSize = 4.0, colAlpha = 1, legend=c('NS','Log (base 2) fold-change','P value', 'P value & Log (base 2) fold-change'), legendPosition = 'right', legendLabSize = 12, legendIconSize = 4.0, drawConnectors = TRUE, widthConnectors = 0.2, colConnectors = 'grey30') ``` ## Only label key transcripts In many situations, people may only wish to label their key transcripts / transcripts of interest. One can therefore supply a vector of these transcripts via the 'selectLab' parameter, the contents of which have to also be present in the vector passed to 'lab'. In addition, only those transcripts that pass both the cutoff for log2FC and P value will be labelled. ```{r ex9, fig.height = 8.5, fig.width = 11, fig.cap = "Only label key transcripts."} EnhancedVolcano(res2, lab = rownames(res2), x = 'log2FoldChange', y = 'pvalue', selectLab = c('ENSG00000106565','ENSG00000187758'), xlim = c(-6,7), xlab = bquote(~Log[2]~ 'fold change'), pCutoff = 10e-14, FCcutoff = 2.0, transcriptPointSize = 3.0, transcriptLabSize = 5.0, shape = c(4, 35, 17, 18), colAlpha = 1, legend=c('NS','Log (base 2) fold-change','P value', 'P value & Log (base 2) fold-change'), legendPosition = 'right', legendLabSize = 14, legendIconSize = 5.0) ``` ## Draw labels in boxes To improve label clarity, we can draw simple boxes around the plots labels. This works much better when drawConnectors is also TRUE. ```{r ex10, fig.height = 8.5, fig.width = 12, fig.cap = "Draw labels in boxes."} EnhancedVolcano(res2, lab = rownames(res2), x = 'log2FoldChange', y = 'pvalue', selectLab = c('ENSG00000106565','ENSG00000187758', 'ENSG00000230795', 'ENSG00000164530', 'ENSG00000143153'), xlim = c(-5.5,8), xlab = bquote(~Log[2]~ 'fold change'), pCutoff = 10e-14, FCcutoff = 2.0, transcriptPointSize = 3.0, transcriptLabSize = 5.0, transcriptLabCol = 'black', transcriptLabFace = 'bold', boxedlabels = TRUE, colAlpha = 4/5, legend=c('NS','Log (base 2) fold-change','P value', 'P value & Log (base 2) fold-change'), legendPosition = 'right', legendLabSize = 14, legendIconSize = 4.0, drawConnectors = TRUE, widthConnectors = 1.0, colConnectors = 'black') ``` ## Over-ride colouring scheme with custom key-value pairs In certain situations, one may wish to over-ride the default colour scheme with their own colour-scheme, such as colouring transcripts by pathway, cell-type or group. This can be achieved by supplying a named vector as 'colCustom'. In this example, we just wish to colour all transcripts with log2FC > 2.5 as 'high' and those with log2FC < -2.5 as 'low'. ```{r eval = TRUE} # create custom key-value pairs for 'high', 'low', 'mid' expression by fold-change # set the base colour as 'black' keyvals <- rep('black', nrow(res2)) # set the base name/label as 'Mid' names(keyvals) <- rep('Mid', nrow(res2)) # modify keyvals for transcripts with fold change > 2.5 keyvals[which(res2$log2FoldChange > 2.5)] <- 'gold' names(keyvals)[which(res2$log2FoldChange > 2.5)] <- 'high' # modify keyvals for transcripts with fold change < -2.5 keyvals[which(res2$log2FoldChange < -2.5)] <- 'royalblue' names(keyvals)[which(res2$log2FoldChange < -2.5)] <- 'low' unique(names(keyvals)) unique(keyvals) keyvals[1:20] ``` ```{r ex11, fig.height = 9.5, fig.width = 15, fig.cap = "Over-ride colouring scheme with custom key-value pairs."} p1 <- EnhancedVolcano(res2, lab = rownames(res2), x = 'log2FoldChange', y = 'pvalue', selectLab = rownames(res2)[which(names(keyvals) %in% c('high', 'low'))], xlim = c(-6.5,6.5), xlab = bquote(~Log[2]~ 'fold change'), title = 'Custom colour over-ride', pCutoff = 10e-14, FCcutoff = 1.0, transcriptPointSize = 3.5, transcriptLabSize = 4.5, shape = c(6, 4, 2, 11), colCustom = keyvals, colAlpha = 1, legendPosition = 'top', legendLabSize = 15, legendIconSize = 5.0, drawConnectors = TRUE, widthConnectors = 0.5, colConnectors = 'grey50', gridlines.major = TRUE, gridlines.minor = FALSE, border = 'partial', borderWidth = 1.5, borderColour = 'black') p2 <- EnhancedVolcano(res2, lab = rownames(res2), x = 'log2FoldChange', y = 'pvalue', selectLab = rownames(res2)[which(names(keyvals) %in% c('high', 'low'))], xlim = c(-6.5,6.5), xlab = bquote(~Log[2]~ 'fold change'), title = 'No custom colour over-ride', pCutoff = 10e-14, FCcutoff = 1.0, transcriptPointSize = 3.5, transcriptLabSize = 4.5, colCustom = NULL, colAlpha = 1, legendPosition = 'top', legendLabSize = 15, legendIconSize = 5.0, drawConnectors = FALSE, widthConnectors = 0.5, colConnectors = 'grey50', gridlines.major = TRUE, gridlines.minor = FALSE, border = 'full', borderWidth = 1.0, borderColour = 'black') library(gridExtra) library(grid) grid.arrange(p1, p2, ncol=2, top = textGrob('EnhancedVolcano', just = c('center'), gp = gpar(fontsize = 32))) grid.rect(gp=gpar(fill=NA)) ``` ## Over-ride colour and/or shape scheme with custom key-value pairs In this example, we first over-ride the existing shape scheme and then both the colour and shape scheme at the same time. ```{r eval = TRUE} # define different cell-types that will be shaded celltype1 <- c('ENSG00000106565', 'ENSG00000002933', 'ENSG00000165246', 'ENSG00000224114') celltype2 <- c('ENSG00000230795', 'ENSG00000164530', 'ENSG00000143153', 'ENSG00000169851', 'ENSG00000231924', 'ENSG00000145681') # create custom key-value pairs for different cell-types # set the base shape as '3' keyvals.shape <- rep(3, nrow(res2)) # set the base name/label as 'PBC' names(keyvals.shape) <- rep('PBC', nrow(res2)) # modify the keyvals for cell-type 1 keyvals.shape[which(rownames(res2) %in% celltype1)] <- 17 names(keyvals.shape)[which(rownames(res2) %in% celltype1)] <- 'Cell-type 1' # modify the keyvals for cell-type 2 keyvals.shape[which(rownames(res2) %in% celltype2)] <- 64 names(keyvals.shape)[which(rownames(res2) %in% celltype2)] <- 'Cell-type 2' unique(names(keyvals.shape)) unique(keyvals.shape) keyvals.shape[1:20] ``` ```{r ex12, fig.height = 9, fig.width = 17, fig.cap = "Over-ride colour and/or shape scheme with custom key-value pairs."} p1 <- EnhancedVolcano(res2, lab = rownames(res2), x = 'log2FoldChange', y = 'pvalue', selectLab = rownames(res2)[which(names(keyvals) %in% c('high', 'low'))], xlim = c(-6.5,6.5), xlab = bquote(~Log[2]~ 'fold change'), title = 'Custom shape over-ride', pCutoff = 10e-14, FCcutoff = 1.0, transcriptPointSize = 3.5, transcriptLabSize = 4.5, shapeCustom = keyvals.shape, colCustom = NULL, colAlpha = 1, legendLabSize = 15, legendPosition = 'left', legendIconSize = 5.0, drawConnectors = TRUE, widthConnectors = 0.5, colConnectors = 'grey50', gridlines.major = TRUE, gridlines.minor = FALSE, border = 'partial', borderWidth = 1.5, borderColour = 'black') # create custom key-value pairs for 'high', 'low', 'mid' expression by fold-change # set the base colour as 'black' keyvals.colour <- rep('black', nrow(res2)) # set the base name/label as 'Mid' names(keyvals.colour) <- rep('Mid', nrow(res2)) # modify keyvals for transcripts with fold change > 2.5 keyvals.colour[which(res2$log2FoldChange > 2.5)] <- 'gold' names(keyvals.colour)[which(res2$log2FoldChange > 2.5)] <- 'high' # modify keyvals for transcripts with fold change < -2.5 keyvals.colour[which(res2$log2FoldChange < -2.5)] <- 'royalblue' names(keyvals.colour)[which(res2$log2FoldChange < -2.5)] <- 'low' unique(names(keyvals.colour)) unique(keyvals.colour) p2 <- EnhancedVolcano(res2, lab = rownames(res2), x = 'log2FoldChange', y = 'pvalue', selectLab = rownames(res2)[which(names(keyvals) %in% c('High', 'Low'))], xlim = c(-6.5,6.5), xlab = bquote(~Log[2]~ 'fold change'), title = 'Custom shape & colour over-ride', pCutoff = 10e-14, FCcutoff = 1.0, transcriptPointSize = 5.5, transcriptLabSize = 0.0, shapeCustom = keyvals.shape, colCustom = keyvals.colour, colAlpha = 1, legendPosition = 'top', legendLabSize = 15, legendIconSize = 5.0, drawConnectors = TRUE, widthConnectors = 0.5, colConnectors = 'grey50', gridlines.major = TRUE, gridlines.minor = FALSE, border = 'full', borderWidth = 1.0, borderColour = 'black') library(gridExtra) library(grid) grid.arrange(p1, p2, ncol=2, top = textGrob('EnhancedVolcano', just = c('center'), gp = gpar(fontsize = 32))) grid.rect(gp=gpar(fill=NA)) ``` ## Shade certain transcripts In this example we add an extra level of highlighting key transcripts by shading. This feature works best for shading just 1 or 2 key transcripts. It is expected that the user can use the 'shapeCustom' parameter for more in depth identification of different types of transcripts. ```{r eval = TRUE} # define different cell-types that will be shaded celltype1 <- c('ENSG00000106565', 'ENSG00000002933') celltype2 <- c('ENSG00000230795', 'ENSG00000164530') ``` ```{r ex13, fig.height = 9, fig.width = 16, fig.cap = "Shade certain transcripts."} p1 <- EnhancedVolcano(res2, lab = rownames(res2), x = 'log2FoldChange', y = 'pvalue', selectLab = celltype1, xlim = c(-6.5,6.5), xlab = bquote(~Log[2]~ 'fold change'), title = 'Shading cell-type 1', pCutoff = 10e-14, FCcutoff = 1.0, transcriptPointSize = 8.0, transcriptLabSize = 5.0, transcriptLabCol = 'purple', transcriptLabFace = 'bold', boxedlabels = TRUE, shape = 42, colCustom = keyvals, colAlpha = 1, legendPosition = 'top', legendLabSize = 15, legendIconSize = 5.0, shade = celltype1, shadeLabel = 'Cell-type I', shadeAlpha = 1/2, shadeFill = 'purple', shadeSize = 1, shadeBins = 5, drawConnectors = TRUE, widthConnectors = 1.0, colConnectors = 'grey30', gridlines.major = TRUE, gridlines.minor = FALSE, border = 'partial', borderWidth = 1.5, borderColour = 'black') p2 <- EnhancedVolcano(res2, lab = rownames(res2), x = 'log2FoldChange', y = 'pvalue', selectLab = celltype2, xlim = c(-6.5,6.5), xlab = bquote(~Log[2]~ 'fold change'), title = 'Shading cell-type 2', pCutoff = 10e-14, FCcutoff = 1.0, transcriptLabSize = 5.0, transcriptLabCol = 'forestgreen', transcriptLabFace = 'bold', shapeCustom = keyvals.shape, colCustom = keyvals.colour, colAlpha = 1, legendPosition = 'top', transcriptPointSize = 4.0, legendLabSize = 15, legendIconSize = 5.0, shade = celltype2, shadeLabel = 'Cell-type II', shadeAlpha = 1/2, shadeFill = 'forestgreen', shadeSize = 1, shadeBins = 5, drawConnectors = TRUE, widthConnectors = 1.0, colConnectors = 'grey30', gridlines.major = TRUE, gridlines.minor = FALSE, border = 'full', borderWidth = 1.0, borderColour = 'black') library(gridExtra) library(grid) grid.arrange(p1, p2, ncol=2, top = textGrob('EnhancedVolcano', just = c('center'), gp = gpar(fontsize = 32))) grid.rect(gp=gpar(fill=NA)) ``` # Acknowledgments The development of *EnhancedVolcano* has benefited from contributions and suggestions from: Sharmila Rana, [Myles Lewis](https://www.qmul.ac.uk/whri/people/academic-staff/items/lewismyles.html), Luke Dow - Assistant Professor at Weill Cornell Medicine, Tokhir Dadaev - Institute of Cancer Research, Alina Frolova # Session info ```{r} sessionInfo() ``` ## References @EnhancedVolcano