1 Basic DESeq2 results exploration

Project: SRP009615.

2 Introduction

This report is meant to help explore DESeq2 (Love, Huber, and Anders, 2014) results and was generated using the regionReport (Collado-Torres, Jaffe, and Leek, 2016) package. While the report is rich, it is only a starting point for exploring the results and an illustration of some of the code used to do so. If you need a more in-depth analysis of your specific data set, you might want to use the customCode argument. This report is based on the vignette of the DESeq2 (Love, Huber, and Anders, 2014) package.

2.1 Code setup

This section contains the code for setting up the rest of the report.

## knitrBootstrap and device chunk options
load_install('knitr')
opts_chunk$set(bootstrap.show.code = FALSE, dev = device)
if(!outputIsHTML) opts_chunk$set(bootstrap.show.code = FALSE, dev = device, echo = FALSE)
#### Libraries needed

## Bioconductor
load_install('DESeq2')
if(isEdgeR) load_install('edgeR')

## CRAN
load_install('ggplot2')
if(!is.null(theme)) theme_set(theme)
load_install('knitr')
if(is.null(colors)) {
    load_install('RColorBrewer')
}
load_install('pheatmap')
load_install('DT')
load_install('devtools')

## Working behind the scenes
# load_install('knitcitations')
# load_install('rmarkdown')
## Optionally
# load_install('knitrBootstrap')

#### Code setup

## For ggplot
res.df <- as.data.frame(res)

## Sort results by adjusted p-values
ord <- order(res.df$padj, decreasing = FALSE)
res.df <- res.df[ord, ]
res.df <- cbind(data.frame(Feature = rownames(res.df)), res.df)
rownames(res.df) <- NULL

3 PCA

## Transform count data
rld <- tryCatch(rlog(dds), error = function(e) { rlog(dds, fitType = 'mean') })

## Perform PCA analysis and make plot
plotPCA(rld, intgroup = intgroup)

## Get percent of variance explained
data_pca <- plotPCA(rld, intgroup = intgroup, returnData = TRUE)
percentVar <- round(100 * attr(data_pca, "percentVar"))

The above plot shows the first two principal components, which explain most of the variability in the data, computed from the regularized log count data. If you are unfamiliar with principal component analysis, you might want to check the Wikipedia entry or this interactive explanation. In this case, the first and second principal components explain 65 and 15 percent of the variance, respectively.
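The percentages quoted above come from the variances of the principal components. The following is a minimal base R sketch of that calculation on a made-up matrix (the object names `toy`, `pca`, and `percentVar_toy` are hypothetical and not part of the report). It uses prcomp() directly on all features, whereas plotPCA() by default first restricts the data to the most variable features (its ntop argument).

```r
## Toy matrix: 50 features (rows) x 6 samples (columns); hypothetical data
set.seed(20170811)
toy <- matrix(rnorm(300), nrow = 50, ncol = 6,
    dimnames = list(paste0('feature', 1:50), paste0('sample', 1:6)))
## Shift some features in half of the samples to create a dominant axis
toy[1:10, 4:6] <- toy[1:10, 4:6] + 3

## PCA with samples as observations (hence the transpose)
pca <- prcomp(t(toy))

## Percent of variance explained by each principal component
percentVar_toy <- 100 * pca$sdev^2 / sum(pca$sdev^2)
round(percentVar_toy[1:2])
```

The percentages always sum to 100 and are sorted in decreasing order, so the first two values describe how much of the total variability a two-dimensional PCA plot can display.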

4 Sample-to-sample distances

## Obtain the sample euclidean distances
sampleDists <- dist(t(assay(rld)))
sampleDistMatrix <- as.matrix(sampleDists)
## Add names based on intgroup
rownames(sampleDistMatrix) <- apply(as.data.frame(colData(rld)[, intgroup]), 1,
    paste, collapse = ' : ')
colnames(sampleDistMatrix) <- NULL

## Define colors to use for the heatmap if none were supplied
if(is.null(colors)) {
    colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
}

## Make the heatmap
pheatmap(sampleDistMatrix, clustering_distance_rows = sampleDists,
    clustering_distance_cols = sampleDists, color = colors)

This plot shows how the samples cluster hierarchically based on the Euclidean distances between their regularized log transformed counts. It is a complementary figure to the PCA plot.
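To make the distance and clustering steps concrete, here is a small base R sketch on made-up data (the names `toy`, `toy_dists`, `toy_clust`, and `groups` are hypothetical). It uses hclust() directly instead of pheatmap(), so it only illustrates the idea behind the heatmap's dendrogram rather than reproducing the figure.

```r
## Toy matrix: 10 features (rows) x 4 samples (columns); hypothetical data
set.seed(1)
toy <- matrix(rnorm(40), nrow = 10, ncol = 4,
    dimnames = list(NULL, c('ctrl_1', 'ctrl_2', 'trt_1', 'trt_2')))
## Make the treated samples clearly different from the controls
toy[, 3:4] <- toy[, 3:4] + 5

## dist() works on rows, so transpose to get sample-to-sample distances
toy_dists <- dist(t(toy))
toy_matrix <- as.matrix(toy_dists)

## Hierarchical clustering on the same distances, cut into two groups
toy_clust <- hclust(toy_dists)
groups <- cutree(toy_clust, k = 2)
groups
```

In this toy case the two control samples end up in one cluster and the two treated samples in the other, which is the kind of structure the heatmap is meant to reveal.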

5 MA plots

This section contains three MA plots (see Wikipedia) that compare the mean of the normalized counts against the log fold change. They show one point per feature. The points are shown in red if the feature has an adjusted p-value less than alpha, that is, the statistically significant features are shown in red.

## MA plot with alpha used in DESeq2::results()
plotMA(res, alpha = metadata(res)$alpha, main = paste('MA plot with alpha =',
    metadata(res)$alpha))

This first plot uses alpha = 0.1, which is the alpha value used to determine which resulting features were significant when running the function DESeq2::results().

## MA plot with alpha = 1/2 of the alpha used in DESeq2::results()
plotMA(res, alpha = metadata(res)$alpha / 2,
    main = paste('MA plot with alpha =', metadata(res)$alpha / 2))

This second MA plot uses alpha = 0.05 and can be compared against the first MA plot to identify which features have adjusted p-values between 0.05 and 0.1.
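The comparison between the two plots can also be done numerically. The sketch below uses a made-up vector of adjusted p-values (`padj_toy` and the gene names are hypothetical) to list the features that are highlighted at the larger alpha but not at the smaller one.

```r
## Hypothetical adjusted p-values; NA mimics a feature without a padj value
padj_toy <- c(geneA = 0.001, geneB = 0.04, geneC = 0.07, geneD = 0.099,
    geneE = 0.3, geneF = NA)
alpha_toy <- 0.1

## Features highlighted in each plot (NA values are never highlighted)
sig_first <- !is.na(padj_toy) & padj_toy < alpha_toy
sig_second <- !is.na(padj_toy) & padj_toy < alpha_toy / 2

## Highlighted with alpha but not with alpha / 2
only_first <- names(padj_toy)[sig_first & !sig_second]
only_first
```

With the real results, the same logic could be applied to res.df$padj, bearing in mind that the padj column is converted to character strings later in this report.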

## MA plot with alpha corresponding to the one that gives the nBest features
nBest.actual <- min(nBest, nrow(head(res.df, n = nBest)))
nBest.alpha <- head(res.df, n = nBest)$padj[nBest.actual]
plotMA(res, alpha = nBest.alpha * 1.00000000000001,
    main = paste('MA plot for top', nBest.actual, 'features'))

The third and final MA plot uses an alpha such that the top 10 features are highlighted in the plot. These are the features whose details are included in the top features interactive table.
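The code for this plot multiplies the cutoff by a number barely above 1. A short sketch with made-up values (`padj_toy`, `nBest_toy`, and `cutoff` are hypothetical names) shows why, assuming, as the text of this section states, that a feature is highlighted when its adjusted p-value is strictly less than alpha.

```r
## Hypothetical adjusted p-values, sorted as in res.df
padj_toy <- sort(c(0.2, 0.0004, 0.03, 0.5, 0.012, 0.0009, 0.08))
nBest_toy <- 3

## Adjusted p-value of the last feature we want highlighted
cutoff <- padj_toy[nBest_toy]

## A strict comparison against the cutoff itself misses that feature
n_strict <- sum(padj_toy < cutoff)

## Nudging the cutoff up by a tiny relative amount includes it
n_nudged <- sum(padj_toy < cutoff * 1.00000000000001)

c(strict = n_strict, nudged = n_nudged)
```

If several features are tied at the cutoff value, more than nBest features would be highlighted, since all of them fall below the nudged alpha.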

6 P-values distribution

## P-value histogram plot
ggplot(res.df[!is.na(res.df$pvalue), ], aes(x = pvalue)) +
    geom_histogram(alpha=.5, position='identity', bins = 50) +
    labs(title='Histogram of unadjusted p-values') +
    xlab('Unadjusted p-values') +
    xlim(c(0, 1.0005))

This plot shows a histogram of the unadjusted p-values. It might be skewed right or left, or flat as shown in the Wikipedia examples. The shape depends on the percent of features that are differentially expressed. For further information on how to interpret a histogram of p-values check David Robinson’s post on this topic.
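As an illustration of how the share of differentially expressed features changes the shape of the histogram, the following sketch simulates two made-up scenarios. All names and the choice of a beta distribution for the non-null p-values are assumptions for illustration only; they are not derived from this data set.

```r
set.seed(42)
n_toy <- 10000L

## Scenario 1: no feature is differentially expressed, p-values are uniform
p_null <- runif(n_toy)

## Scenario 2: 20% of features have p-values concentrated near zero
p_mixed <- c(runif(8000L), rbeta(2000L, shape1 = 0.5, shape2 = 10))

## Bin counts without drawing the plots
breaks <- seq(0, 1, by = 0.05)
counts_null <- hist(p_null, breaks = breaks, plot = FALSE)$counts
counts_mixed <- hist(p_mixed, breaks = breaks, plot = FALSE)$counts

## Fraction of p-values in the first bin, [0, 0.05]
frac_null <- counts_null[1] / n_toy
frac_mixed <- counts_mixed[1] / n_toy
c(null = frac_null, mixed = frac_mixed)
```

Under the first scenario the histogram is roughly flat, with about 5% of the p-values in each bin, while the second scenario shows a peak near zero on top of a flat background.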

## P-value distribution summary
summary(res.df$pvalue)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.207   0.540   0.504   0.798   1.000    7979

This is the numerical summary of the distribution of the p-values.

## Split features by different p-value cutoffs
pval_table <- lapply(c(1e-04, 0.001, 0.01, 0.025, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5,
    0.6, 0.7, 0.8, 0.9, 1), function(x) {
    data.frame('Cut' = x, 'Count' = sum(res.df$pvalue <= x, na.rm = TRUE))
})
pval_table <- do.call(rbind, pval_table)
if(outputIsHTML) {
    kable(pval_table, format = 'markdown', align = c('c', 'c'))
} else {
    kable(pval_table)
}
Cut Count
0.0001 243
0.0010 776
0.0100 2371
0.0250 3817
0.0500 5470
0.1000 8065
0.2000 12280
0.3000 16038
0.4000 19670
0.5000 23433
0.6000 27397
0.7000 32537
0.8000 38161
0.9000 44657
1.0000 50058

This table shows the number of features with p-values less than or equal to some commonly used cutoff values.
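The counts in such a table are cumulative, which a small example with made-up p-values makes easier to see (`pvalue_toy`, `cuts`, and `counts` are hypothetical names). This sketch uses vapply() instead of the lapply() and do.call(rbind, ...) combination above; both approaches give the same counts.

```r
## Hypothetical p-values, including one missing value
pvalue_toy <- c(0.00005, 0.0008, 0.02, 0.02, 0.3, 0.75, 1, NA)
cuts <- c(1e-04, 0.001, 0.01, 0.05, 0.5, 1)

## Number of non-missing p-values at or below each cutoff
counts <- vapply(cuts, function(x) sum(pvalue_toy <= x, na.rm = TRUE),
    integer(1))
data.frame(Cut = cuts, Count = counts)
```

Because every non-missing p-value is at most 1, the last row equals the number of features with a non-missing p-value, and the counts never decrease as the cutoff grows.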

7 Adjusted p-values distribution

## Adjusted p-values histogram plot
ggplot(res.df[!is.na(res.df$padj), ], aes(x = padj)) +
    geom_histogram(alpha=.5, position='identity', bins = 50) +
    labs(title=paste('Histogram of', elementMetadata(res)$description[grep('adjusted', elementMetadata(res)$description)])) +
    xlab('Adjusted p-values') +
    xlim(c(0, 1.0005))

This plot shows a histogram of the BH adjusted p-values. It might be skewed right or left, or flat as shown in the Wikipedia examples.

## Adjusted p-values distribution summary
summary(res.df$padj)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00    0.31    0.60    0.57    0.84    1.00   33963

This is the numerical summary of the distribution of the BH adjusted p-values.
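For readers unfamiliar with the BH (Benjamini-Hochberg) adjustment, the sketch below computes it by hand on made-up p-values and checks the result against p.adjust() from base R. The names `p_toy`, `ord_toy`, and `bh_manual` are hypothetical, and this is only meant to show the arithmetic, not how DESeq2 organizes its own computation.

```r
## Hypothetical unadjusted p-values
p_toy <- c(0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205)
n <- length(p_toy)

## Walk from the largest to the smallest p-value
ord_toy <- order(p_toy, decreasing = TRUE)

## Scale each p-value by n / rank, then enforce monotonicity with cummin()
bh_manual <- numeric(n)
bh_manual[ord_toy] <- pmin(1, cummin(p_toy[ord_toy] * n / (n:1)))

## Compare against the reference implementation in base R
bh_reference <- p.adjust(p_toy, method = 'BH')
all.equal(bh_manual, bh_reference)
```

An adjusted p-value is never smaller than the unadjusted one, which is why the counts at each cutoff are lower for the adjusted p-values than for the unadjusted ones.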

## Split features by different adjusted p-value cutoffs
padj_table <- lapply(c(1e-04, 0.001, 0.01, 0.025, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5,
    0.6, 0.7, 0.8, 0.9, 1), function(x) {
    data.frame('Cut' = x, 'Count' = sum(res.df$padj <= x, na.rm = TRUE))
})
padj_table <- do.call(rbind, padj_table)
if(outputIsHTML) {
    kable(padj_table, format = 'markdown', align = c('c', 'c'))
} else {
    kable(padj_table)
}
Cut Count
0.0001 9
0.0010 36
0.0100 234
0.0250 608
0.0500 1114
0.1000 2141
0.2000 3961
0.3000 5893
0.4000 7937
0.5000 9901
0.6000 11994
0.7000 14323
0.8000 16814
0.9000 19854
1.0000 24074

This table shows the number of features with BH adjusted p-values less than or equal to some commonly used cutoff values.

8 Top features

This interactive table shows the top 10 features ordered by their BH adjusted p-values. Use the search function to find your feature of interest or sort by one of the columns.

## Add search url if appropriate
if(!is.null(searchURL) && outputIsHTML) {
    res.df$Feature <- paste0('<a href="', searchURL, res.df$Feature, '">',
        res.df$Feature, '</a>')
}

for(i in which(colnames(res.df) %in% c('pvalue', 'padj'))) res.df[, i] <- format(res.df[, i], scientific = TRUE)

if(outputIsHTML) {
    datatable(head(res.df, n = nBest), options = list(pagingType='full_numbers', pageLength=10, scrollX='100%'), escape = FALSE, rownames = FALSE) %>% formatRound(which(!colnames(res.df) %in% c('pvalue', 'padj', 'Feature')), digits)
} else {
    res.df_top <- head(res.df, n = 20)
    for(i in which(!colnames(res.df) %in% c('pvalue', 'padj', 'Feature'))) res.df_top[, i] <- round(res.df_top[, i], digits)
    kable(res.df_top)
}

9 Count plots top features

This section contains plots showing the normalized counts per sample for each group of interest. Only the best 2 features are shown, ranked by their BH adjusted p-values. The Y axis is on the log10 scale and the feature name is shown in the title of each plot.

plotCounts_gg <- function(i, dds, intgroup) {
    group <- if (length(intgroup) == 1) {
        colData(dds)[[intgroup]]
    } else if (length(intgroup) == 2) {
        lvls <- as.vector(t(outer(levels(colData(dds)[[intgroup[1]]]), 
            levels(colData(dds)[[intgroup[2]]]), function(x, 
                y) paste(x, y, sep = " : "))))
        droplevels(factor(apply(as.data.frame(colData(dds)[, 
            intgroup, drop = FALSE]), 1, paste, collapse = " : "), 
            levels = lvls))
    } else {
        factor(apply(as.data.frame(colData(dds)[, intgroup, drop = FALSE]), 
            1, paste, collapse = " : "))
    }
    data <- plotCounts(dds, gene=i, intgroup=intgroup, returnData = TRUE)
    data <- cbind(data, data.frame('group' = group))
    main <- rownames(dds)[i]

    ggplot(data, aes(x = group, y = count)) + geom_point() + ylab('Normalized count') + ggtitle(main) + coord_trans(y = "log10")
}
for(i in head(ord, nBestFeatures)) {
    print(plotCounts_gg(i, dds = dds, intgroup = intgroup))
}
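When intgroup contains more than one variable, as in this report, the group shown on the X axis is built by pasting the variables together with " : ". A minimal sketch with a made-up sample table (`coldata_toy` and its values are hypothetical) shows the labels this produces.

```r
## Hypothetical sample information with two grouping variables
coldata_toy <- data.frame(
    group = factor(c('a', 'a', 'b', 'b')),
    gene_target = factor(c('x', 'y', 'x', 'y'))
)
intgroup_toy <- c('group', 'gene_target')

## One combined label per sample, in the same style as plotCounts_gg()
labels_toy <- unname(apply(coldata_toy[, intgroup_toy, drop = FALSE], 1,
    paste, collapse = ' : '))
labels_toy
```

Each distinct combination of the variables becomes its own group on the X axis of the count plots.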

10 Reproducibility

The input for this report was generated with DESeq2 (Love, Huber, and Anders, 2014) using version 1.16.1 and the resulting features were called significantly differentially expressed if their BH adjusted p-values were less than alpha = 0.1. This report was generated in path /tmp/RtmpW7WKLD/Rbuild60fe33fe1737/recount/vignettes using the following call to DESeq2Report():

## DESeq2Report(dds = dds, project = "SRP009615", intgroup = c("group", 
##     "gene_target"), res = res, nBest = 10, nBestFeatures = 2, 
##     outdir = ".", output = "SRP009615-results", device = "png", 
##     template = "SRP009615-results-template.Rmd")

Date the report was generated.

## [1] "2017-08-11 20:36:12 EDT"

Wallclock time spent generating the report.

## Time difference of 51.5 secs

R session information.

## Session info ----------------------------------------------------------------------------------------------------------
##  setting  value                       
##  version  R version 3.4.1 (2017-06-30)
##  system   x86_64, linux-gnu           
##  ui       X11                         
##  language (EN)                        
##  collate  C                           
##  tz       posixrules                  
##  date     2017-08-11
## Packages --------------------------------------------------------------------------------------------------------------
##  package              * version  date       source        
##  acepack                1.4.1    2016-10-29 CRAN (R 3.4.1)
##  annotate               1.54.0   2017-08-11 Bioconductor  
##  AnnotationDbi          1.38.2   2017-08-11 Bioconductor  
##  backports              1.1.0    2017-05-22 CRAN (R 3.4.1)
##  base                 * 3.4.1    2017-07-06 local         
##  base64enc              0.1-3    2015-07-28 CRAN (R 3.4.1)
##  bibtex                 0.4.2    2017-06-30 CRAN (R 3.4.1)
##  Biobase              * 2.36.2   2017-08-11 Bioconductor  
##  BiocGenerics         * 0.22.0   2017-08-11 Bioconductor  
##  BiocParallel           1.10.1   2017-08-11 Bioconductor  
##  BiocStyle            * 2.4.1    2017-08-11 Bioconductor  
##  biomaRt                2.32.1   2017-08-11 Bioconductor  
##  Biostrings             2.44.2   2017-08-11 Bioconductor  
##  bit                    1.1-12   2014-04-09 CRAN (R 3.4.1)
##  bit64                  0.9-7    2017-05-08 CRAN (R 3.4.1)
##  bitops                 1.0-6    2013-08-17 CRAN (R 3.4.1)
##  blob                   1.1.0    2017-06-17 CRAN (R 3.4.1)
##  bookdown               0.4      2017-05-20 CRAN (R 3.4.1)
##  BSgenome               1.44.0   2017-08-11 Bioconductor  
##  bumphunter             1.16.0   2017-08-11 Bioconductor  
##  checkmate              1.8.3    2017-07-03 CRAN (R 3.4.1)
##  cluster                2.0.6    2017-03-10 CRAN (R 3.4.1)
##  codetools              0.2-15   2016-10-05 CRAN (R 3.4.1)
##  colorspace             1.3-2    2016-12-14 CRAN (R 3.4.1)
##  compiler               3.4.1    2017-07-06 local         
##  data.table             1.10.4   2017-02-01 CRAN (R 3.4.1)
##  datasets             * 3.4.1    2017-07-06 local         
##  DBI                    0.7      2017-06-18 CRAN (R 3.4.1)
##  DEFormats              1.4.0    2017-08-11 Bioconductor  
##  DelayedArray         * 0.2.7    2017-08-11 Bioconductor  
##  derfinder              1.10.5   2017-08-11 Bioconductor  
##  derfinderHelper        1.10.0   2017-08-11 Bioconductor  
##  DESeq2               * 1.16.1   2017-08-11 Bioconductor  
##  devtools             * 1.13.3   2017-08-02 CRAN (R 3.4.1)
##  digest                 0.6.12   2017-01-27 CRAN (R 3.4.1)
##  doRNG                  1.6.6    2017-04-10 CRAN (R 3.4.1)
##  downloader             0.4      2015-07-09 CRAN (R 3.4.1)
##  DT                   * 0.2      2016-08-09 CRAN (R 3.4.1)
##  edgeR                  3.18.1   2017-08-11 Bioconductor  
##  evaluate               0.10.1   2017-06-24 CRAN (R 3.4.1)
##  foreach                1.4.3    2015-10-13 CRAN (R 3.4.1)
##  foreign                0.8-69   2017-06-22 CRAN (R 3.4.1)
##  Formula                1.2-2    2017-07-10 CRAN (R 3.4.1)
##  genefilter             1.58.1   2017-08-11 Bioconductor  
##  geneplotter            1.54.0   2017-08-11 Bioconductor  
##  GenomeInfoDb         * 1.12.2   2017-08-11 Bioconductor  
##  GenomeInfoDbData       0.99.0   2017-07-06 Bioconductor  
##  GenomicAlignments      1.12.1   2017-08-11 Bioconductor  
##  GenomicFeatures        1.28.4   2017-08-11 Bioconductor  
##  GenomicFiles           1.12.0   2017-08-11 Bioconductor  
##  GenomicRanges        * 1.28.4   2017-08-11 Bioconductor  
##  GEOquery               2.42.0   2017-08-11 Bioconductor  
##  ggplot2              * 2.2.1    2016-12-30 CRAN (R 3.4.1)
##  graphics             * 3.4.1    2017-07-06 local         
##  grDevices            * 3.4.1    2017-07-06 local         
##  grid                   3.4.1    2017-07-06 local         
##  gridExtra              2.2.1    2016-02-29 CRAN (R 3.4.1)
##  gtable                 0.2.0    2016-02-26 CRAN (R 3.4.1)
##  highr                  0.6      2016-05-09 CRAN (R 3.4.1)
##  Hmisc                  4.0-3    2017-05-02 CRAN (R 3.4.1)
##  htmlTable              1.9      2017-01-26 CRAN (R 3.4.1)
##  htmltools              0.3.6    2017-04-28 CRAN (R 3.4.1)
##  htmlwidgets            0.9      2017-07-10 CRAN (R 3.4.1)
##  httr                   1.2.1    2016-07-03 CRAN (R 3.4.1)
##  IRanges              * 2.10.2   2017-08-11 Bioconductor  
##  iterators              1.0.8    2015-10-13 CRAN (R 3.4.1)
##  jsonlite               1.5      2017-06-01 CRAN (R 3.4.1)
##  knitcitations        * 1.0.8    2017-07-04 CRAN (R 3.4.1)
##  knitr                * 1.17     2017-08-10 CRAN (R 3.4.1)
##  knitrBootstrap         1.0.1    2017-07-19 CRAN (R 3.4.1)
##  labeling               0.3      2014-08-23 CRAN (R 3.4.1)
##  lattice                0.20-35  2017-03-25 CRAN (R 3.4.1)
##  latticeExtra           0.6-28   2016-02-09 CRAN (R 3.4.1)
##  lazyeval               0.2.0    2016-06-12 CRAN (R 3.4.1)
##  limma                  3.32.5   2017-08-11 Bioconductor  
##  locfit                 1.5-9.1  2013-04-20 CRAN (R 3.4.1)
##  lubridate              1.6.0    2016-09-13 CRAN (R 3.4.1)
##  magrittr               1.5      2014-11-22 CRAN (R 3.4.1)
##  markdown               0.8      2017-04-20 CRAN (R 3.4.1)
##  Matrix                 1.2-10   2017-05-03 CRAN (R 3.4.1)
##  matrixStats          * 0.52.2   2017-04-14 CRAN (R 3.4.1)
##  memoise                1.1.0    2017-04-21 CRAN (R 3.4.1)
##  methods              * 3.4.1    2017-07-06 local         
##  munsell                0.4.3    2016-02-13 CRAN (R 3.4.1)
##  nnet                   7.3-12   2016-02-02 CRAN (R 3.4.1)
##  parallel             * 3.4.1    2017-07-06 local         
##  pheatmap             * 1.0.8    2015-12-11 CRAN (R 3.4.1)
##  pkgmaker               0.22     2014-05-14 CRAN (R 3.4.1)
##  plyr                   1.8.4    2016-06-08 CRAN (R 3.4.1)
##  qvalue                 2.8.0    2017-08-11 Bioconductor  
##  R6                     2.2.2    2017-06-17 CRAN (R 3.4.1)
##  RColorBrewer         * 1.1-2    2014-12-07 CRAN (R 3.4.1)
##  Rcpp                   0.12.12  2017-07-15 CRAN (R 3.4.1)
##  RCurl                  1.95-4.8 2016-03-01 CRAN (R 3.4.1)
##  recount              * 1.2.3    2017-08-12 Bioconductor  
##  RefManageR             0.14.12  2017-07-04 CRAN (R 3.4.1)
##  regionReport         * 1.10.2   2017-08-11 Bioconductor  
##  registry               0.3      2015-07-08 CRAN (R 3.4.1)
##  rentrez                1.1.0    2017-06-01 CRAN (R 3.4.1)
##  reshape2               1.4.2    2016-10-22 CRAN (R 3.4.1)
##  rlang                  0.1.2    2017-08-09 CRAN (R 3.4.1)
##  rmarkdown              1.6      2017-06-15 CRAN (R 3.4.1)
##  rngtools               1.2.4    2014-03-06 CRAN (R 3.4.1)
##  rpart                  4.1-11   2017-03-13 CRAN (R 3.4.1)
##  rprojroot              1.2      2017-01-16 CRAN (R 3.4.1)
##  Rsamtools              1.28.0   2017-08-11 Bioconductor  
##  RSQLite                2.0      2017-06-19 CRAN (R 3.4.1)
##  rtracklayer            1.36.4   2017-08-11 Bioconductor  
##  S4Vectors            * 0.14.3   2017-08-11 Bioconductor  
##  scales                 0.4.1    2016-11-09 CRAN (R 3.4.1)
##  splines                3.4.1    2017-07-06 local         
##  stats                * 3.4.1    2017-07-06 local         
##  stats4               * 3.4.1    2017-07-06 local         
##  stringi                1.1.5    2017-04-07 CRAN (R 3.4.1)
##  stringr                1.2.0    2017-02-18 CRAN (R 3.4.1)
##  SummarizedExperiment * 1.6.3    2017-08-11 Bioconductor  
##  survival               2.41-3   2017-04-04 CRAN (R 3.4.1)
##  tibble                 1.3.3    2017-05-28 CRAN (R 3.4.1)
##  tools                  3.4.1    2017-07-06 local         
##  utils                * 3.4.1    2017-07-06 local         
##  VariantAnnotation      1.22.3   2017-08-11 Bioconductor  
##  withr                  2.0.0    2017-07-28 CRAN (R 3.4.1)
##  XML                    3.98-1.9 2017-06-19 CRAN (R 3.4.1)
##  xml2                   1.1.1    2017-01-24 CRAN (R 3.4.1)
##  xtable                 1.8-2    2016-02-05 CRAN (R 3.4.1)
##  XVector                0.16.0   2017-08-11 Bioconductor  
##  yaml                   2.1.14   2016-11-12 CRAN (R 3.4.1)
##  zlibbioc               1.22.0   2017-08-11 Bioconductor

Pandoc version used: 1.19.1.

11 Bibliography

This report was created with regionReport (Collado-Torres, Jaffe, and Leek, 2016) using rmarkdown (Allaire, Cheng, Xie, McPherson, et al., 2017) while knitr (Xie, 2014) and DT (Xie, 2016) were running behind the scenes. pheatmap (Kolde, 2015) was used to create the sample distances heatmap. Several plots were made with ggplot2 (Wickham, 2009).

Citations made with knitcitations (Boettiger, 2017). The BibTeX file can be found here.

[1] J. Allaire, J. Cheng, Y. Xie, J. McPherson, et al. rmarkdown: Dynamic Documents for R. R package version 1.6. 2017. URL: https://CRAN.R-project.org/package=rmarkdown.

[2] C. Boettiger. knitcitations: Citations for ‘Knitr’ Markdown Files. R package version 1.0.8. 2017. URL: https://CRAN.R-project.org/package=knitcitations.

[3] R. Kolde. pheatmap: Pretty Heatmaps. R package version 1.0.8. 2015. URL: https://CRAN.R-project.org/package=pheatmap.

[4] H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2009. ISBN: 978-0-387-98140-6. URL: http://ggplot2.org.

[5] Y. Xie. DT: A Wrapper of the JavaScript Library ‘DataTables’. R package version 0.2. 2016. URL: https://CRAN.R-project.org/package=DT.

[6] Y. Xie. “knitr: A Comprehensive Tool for Reproducible Research in R”. In: Implementing Reproducible Computational Research. Ed. by V. Stodden, F. Leisch and R. D. Peng. ISBN 978-1466561595. Chapman and Hall/CRC, 2014. URL: http://www.crcpress.com/product/isbn/9781466561595.