---
title: "Introduction to regionReport"
author: "L Collado-Torres"
date: "`r doc_date()`"
package: "`r pkg_ver('regionReport')`"
output: BiocStyle::html_document
vignette: >
  %\VignetteIndexEntry{Introduction to regionReport}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

HTML reports for a set of regions or DESeq2 results
===================================================

```{r vignetteSetup, echo = FALSE, message = FALSE, warning = FALSE}
## Track time spent on making the vignette
startTimeVignette <- Sys.time()

## Bib setup
library('knitcitations')

## Load knitcitations with a clean bibliography
cleanbib()
cite_options(hyperlink = 'to.doc', citation_format = 'text', style = 'html')
# Note links won't show for now due to the following issue
# https://github.com/cboettig/knitcitations/issues/63

## Write bibliography information
bibs <- c(knitcitations = citation('knitcitations'),
    derfinder = citation('derfinder')[1],
    regionReport = citation('regionReport')[1],
    knitrBootstrap = citation('knitrBootstrap'),
    #BiocStyle = citation('BiocStyle'),
    BiocStyle = RefManageR::BibEntry(bibtype = 'unpublished', key = 'BiocStyle',
        title = 'BiocStyle: Standard styles for vignettes and other Bioconductor documents',
        author = 'Martin Morgan and Andrzej Oleś and Wolfgang Huber',
        note = 'R package version 1.8.0', year = '2015'),
    ggbio = citation('ggbio'),
    ggplot2 = citation('ggplot2'),
    knitr = citation('knitr')[3],
    rmarkdown = citation('rmarkdown'),
    DT = citation('DT'),
    R = citation(),
    IRanges = citation('IRanges'),
    devtools = citation('devtools'),
    #GenomeInfoDb = citation('GenomeInfoDb'),
    GenomeInfoDb = RefManageR::BibEntry(bibtype = 'unpublished', key = 'GenomeInfoDb',
        title = "GenomeInfoDb: Utilities for manipulating chromosome and other 'seqname' identifiers",
        author = 'Sonali Arora and Martin Morgan and Marc Carlson and H. Pages',
        note = 'R package version 1.7.3', year = '2015'),
    GenomicRanges = citation('GenomicRanges'),
    #biovizBase = citation('biovizBase'),
    biovizBase = RefManageR::BibEntry(bibtype = 'unpublished', key = 'biovizBase',
        title = 'biovizBase: Basic graphic utilities for visualization of genomic data.',
        author = 'Tengfei Yin and Michael Lawrence and Dianne Cook',
        note = 'R package version 1.19.0', year = '2015'),
    #TxDb.Hsapiens.UCSC.hg19.knownGene = citation('TxDb.Hsapiens.UCSC.hg19.knownGene'),
    TxDb.Hsapiens.UCSC.hg19.knownGene = RefManageR::BibEntry(bibtype = 'unpublished',
        key = 'TxDb.Hsapiens.UCSC.hg19.knownGene',
        title = 'TxDb.Hsapiens.UCSC.hg19.knownGene: Annotation package for TxDb object(s)',
        author = 'Marc Carlson and Bioconductor Package Maintainer',
        note = 'R package version 3.2.2', year = '2015'),
    derfinderPlot = citation('derfinderPlot')[1],
    grid = citation('grid'),
    gridExtra = citation('gridExtra'),
    mgcv = citation('mgcv'),
    RColorBrewer = citation('RColorBrewer'),
    Cairo = citation('Cairo'),
    whisker = citation('whisker'),
    bumphunter = citation('bumphunter')[1],
    pheatmap = citation('pheatmap'),
    DESeq2 = citation('DESeq2'),
    edgeR1 = citation('edgeR')[1],
    edgeR2 = citation('edgeR')[2],
    edgeR5 = citation('edgeR')[5],
    edgeR6 = RefManageR::BibEntry('inbook', key = 'edgeR6',
        author = 'Chen, Yunshun and Lun, Aaron T. L. and Smyth, Gordon K.',
        title = 'Differential expression analysis of complex RNA-seq experiments using edgeR',
        booktitle = 'Statistical Analysis of Next Generation Sequencing Data',
        year = 2014, editor = 'Datta, Somnath and Nettleton, Dan',
        publisher = 'Springer', location = 'New York', pages = '51-74'),
    DEFormats = citation('DEFormats')
)

write.bibtex(bibs, file = 'regionReportRef.bib')
bib <- read.bibtex('regionReportRef.bib')

## Assign short names
names(bib) <- names(bibs)
```

`r Biocpkg('regionReport')` `r citep(bib[['regionReport']])` creates HTML or PDF reports for a set of genomic regions such as `r Biocpkg('derfinder')` `r citep(bib[['derfinder']])` results or for feature-level analyses performed with `r Biocpkg('DESeq2')` `r citep(bib[['DESeq2']])` or `r Biocpkg('edgeR')` `r citep(bib[[c('edgeR1', 'edgeR2', 'edgeR5', 'edgeR6')]])`. The HTML reports are styled with `r CRANpkg('rmarkdown')` `r citep(bib[['rmarkdown']])` by default but can optionally be styled with `r CRANpkg('knitrBootstrap')` `r citep(bib[['knitrBootstrap']])`.

This package includes a basic exploration for a general set of genomic regions which can be easily customized to include the appropriate conclusions and/or further exploration of the results. Such a report can be generated using `renderReport()`. `r Biocpkg('regionReport')` has a separate template for running a basic exploration analysis of `r Biocpkg('derfinder')` results by using `derfinderReport()`. `derfinderReport()` is specific to `r Biocpkg('derfinder')` results from the single base-level approach. A third template, used by `DESeq2Report()` and `edgeReport()`, is included for exploring `r Biocpkg('DESeq2')` or `r Biocpkg('edgeR')` differential expression results. All reports are written in [R Markdown](http://rmarkdown.rstudio.com/) format and include all the code for making the plots and explorations in the report itself.

For all templates, `r Biocpkg('regionReport')` relies on `r CRANpkg('knitr')` `r citep(bib[['knitr']])`, `r CRANpkg('rmarkdown')` `r citep(bib[['rmarkdown']])`, `r CRANpkg('DT')` `r citep(bib[['DT']])` and optionally `r CRANpkg('knitrBootstrap')` `r citep(bib[['knitrBootstrap']])` for generating the report. The reports can be either in HTML or PDF format and can be easily customized.

# Using `r Biocpkg('regionReport')` for `r Biocpkg('DESeq2')` results

The plots in `r Biocpkg('regionReport')` for exploring `r Biocpkg('DESeq2')` results are powered by `r CRANpkg('ggplot2')` `r citep(bib[['ggplot2']])` and `r CRANpkg('pheatmap')` `r citep(bib[['pheatmap']])`.

## Examples

The `r Biocpkg('regionReport')` supplementary website [regionReportSupp](http://leekgroup.github.io/regionReportSupp/) has examples of using `r Biocpkg('regionReport')` with `r Biocpkg('DESeq2')` results. In particular, please look at [DESeq2.html](http://leekgroup.github.io/regionReportSupp/DESeq2.html) which has the code for generating some `r Biocpkg('DESeq2')` results based on the `r Biocpkg('DESeq2')` vignette. It then uses those results to create HTML and PDF versions of the same report. The resulting reports are available in the following locations:

* [HTML version](http://leekgroup.github.io/regionReportSupp/DESeq2-example/index.html)
* [PDF version](http://leekgroup.github.io/regionReportSupp/DESeq2-example/DESeq2Report.pdf)

Note that in both examples we changed the `r CRANpkg('ggplot2')` theme to `theme_bw()`. Also, in the PDF version we used the option `device = 'pdf'` instead of the default `device = 'png'` in `DESeq2Report()`, since PDF figures are vector graphics and look sharper than PNG figures in PDF reports.

If you want to create an HTML report similar to the one linked in this section, simply run `example('DESeq2Report', 'regionReport', ask=FALSE)`. The only difference will be the `r CRANpkg('ggplot2')` theme for the plots.

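
The following is a minimal sketch of the workflow described in this section, assuming the `r Biocpkg('DESeq2')`, `r Biocpkg('regionReport')` and `r CRANpkg('ggplot2')` packages are installed, plus a LaTeX distribution for the PDF version. Rather than the `r Biocpkg('DESeq2')` vignette data used on the supplementary website, it simulates counts with `makeExampleDESeqDataSet()`; the project titles and the `DESeq2Report-sketch` output directory are arbitrary names chosen for illustration. It applies the two changes mentioned above: `theme = theme_bw()` and, for the PDF report, `device = 'pdf'`. It also assumes that, like `renderReport()` and `derfinderReport()`, `DESeq2Report()` accepts the advanced `output_format` argument.

```{r, eval = FALSE}
## Load the packages used in this sketch
library('DESeq2')
library('regionReport')
library('ggplot2')

## Simulate a small data set; its colData has a 'condition' column
dds <- makeExampleDESeqDataSet(n = 1000, m = 8, betaSD = 1)
## Run the standard DESeq2 differential expression pipeline
dds <- DESeq(dds)

## HTML report with the default device = 'png' and a black and white theme
report <- DESeq2Report(dds, project = 'DESeq2 HTML report sketch',
    intgroup = 'condition', outdir = 'DESeq2Report-sketch',
    output = 'index', theme = theme_bw(), browse = FALSE)

## PDF version of the same report, using PDF figures instead of PNG ones
## (output_format is assumed to be accepted as an advanced argument)
reportPDF <- DESeq2Report(dds, project = 'DESeq2 PDF report sketch',
    intgroup = 'condition', outdir = 'DESeq2Report-sketch',
    output = 'DESeq2Report', theme = theme_bw(), device = 'pdf',
    output_format = 'pdf_document', browse = FALSE)
```

As with `derfinderReport()` later in this vignette, the returned objects should be the paths to the rendered files, which you can open with `browseURL()`.
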

# Using `r Biocpkg('regionReport')` for `r Biocpkg('edgeR')` results

`r Biocpkg('regionReport')` has the `edgeReport()` function that takes as input a `DGEList` object and the results from the differential expression analysis using `r Biocpkg('edgeR')`. `edgeReport()` internally uses `r Biocpkg('DEFormats')` to convert the results to `r Biocpkg('DESeq2')`'s format and then uses `DESeq2Report()` to create the final report. This way the report looks nearly the same whether you performed the differential expression analysis with `r Biocpkg('DESeq2')` or `r Biocpkg('edgeR')`, which keeps the exploratory data analysis step consistent across both.

## Examples

The `r Biocpkg('regionReport')` supplementary website [regionReportSupp](http://leekgroup.github.io/regionReportSupp/) has examples of using `r Biocpkg('regionReport')` with `r Biocpkg('edgeR')` results. In particular, please look at [edgeR.html](http://leekgroup.github.io/regionReportSupp/edgeR.html) which has the code for generating some random data with `r Biocpkg('DEFormats')` and performing the differential expression analysis with `r Biocpkg('edgeR')`. It then uses those results to create HTML and PDF versions of the same report. The resulting reports are available in the following locations:

* [HTML version](http://leekgroup.github.io/regionReportSupp/edgeR-example/index.html)
* [PDF version](http://leekgroup.github.io/regionReportSupp/edgeR-example/edgeReport.pdf)

Note that in both examples we changed the `r CRANpkg('ggplot2')` theme to `theme_linedraw()`. Also, in the PDF version we used the option `device = 'pdf'` instead of the default `device = 'png'` in `edgeReport()`, since PDF figures are vector graphics and look sharper than PNG figures in PDF reports.

If you want to create an HTML report similar to the one linked in this section, simply run `example('edgeReport', 'regionReport', ask=FALSE)`. The only difference will be the `r CRANpkg('ggplot2')` theme for the plots and the amount of data simulated with `r Biocpkg('DEFormats')`.

# Using `r Biocpkg('regionReport')` for region results

The plots in `r Biocpkg('regionReport')` for region reports are powered by `r Biocpkg('derfinderPlot')` `r citep(bib[['derfinderPlot']])`, `r Biocpkg('ggbio')` `r citep(bib[['ggbio']])`, and `r CRANpkg('ggplot2')` `r citep(bib[['ggplot2']])`.

## Examples

The `r Biocpkg('regionReport')` supplementary website [regionReportSupp](http://leekgroup.github.io/regionReportSupp/) has examples of using `r Biocpkg('regionReport')` with results from `r Biocpkg('DiffBind')` and `r Biocpkg('derfinder')`. Included as a vignette, this package also has an example using a small data set derived from `r Biocpkg('bumphunter')`. These represent different uses of `r Biocpkg('regionReport')` for results from ChIP-seq, methylation, and RNA-seq data. In particular, the `r Biocpkg('DiffBind')` example illustrates how to expand a basic report created with `renderReport()`.

## General case

For a general use case, you first have to identify a set of genomic regions of interest and store it as a `GRanges` object. In a typical workflow you will have some variables measured for each of the regions, such as p-values and scores. `renderReport()` uses the set of regions and three main arguments:

* `pvalueVars`: a character vector (optionally named) with the names of the variables that are bounded between 0 and 1, such as p-values. For each of these variables, `renderReport()` explores the distribution by chromosome, the overall distribution, and makes a table with commonly used cutoffs.
* `densityVars`: another character vector (optionally named) with a set of variables you wish to explore by making density graphs. This is commonly used for scores and other similar numerical variables.
* `significantVar`: a logical vector separating the regions by whether they are statistically significant. For example, this information is used to explore the width of all the regions and compare it to the width of the significant ones.

Other parameters control the name of the report, where it will be located, the transcript database used to annotate the nearest genes, graphical parameters, etc.

Here is a short example of how to use `renderReport()`. Note that we are using regions produced by `r Biocpkg('derfinder')` just for convenience. You can also run this example by using `example('renderReport', 'regionReport', ask=FALSE)`.

```{r, eval = FALSE}
## Load derfinder
library('derfinder')
regions <- genomeRegions$regions

## Assign chr length
library('GenomicRanges')
seqlengths(regions) <- c('chr21' = 48129895)

## The output will be saved in the 'renderReport-example' directory
dir.create('renderReport-example', showWarnings = FALSE, recursive = TRUE)

## Generate the HTML report
library('regionReport')
report <- renderReport(regions, 'Example run', pvalueVars = c(
    'Q-values' = 'qvalues', 'P-values' = 'pvalues'), densityVars = c(
    'Area' = 'area', 'Mean coverage' = 'meanCoverage'),
    significantVar = regions$qvalues <= 0.05, nBestRegions = 20,
    outdir = 'renderReport-example')
```

For `r Biocpkg('derfinder')` results created via the expressed regions-level approach you can use `renderReport()` to explore the results. If you use `r Biocpkg('DESeq2')` to perform the differential expression analysis of the expressed regions, you can then use `DESeq2Report()`.

## `r Biocpkg('derfinder')` single base-level case

### Run `r Biocpkg('derfinder')`

Prior to using `regionReport::derfinderReport()` you must use `r Biocpkg('derfinder')` to analyze a specific data set. While there are many ways to do so, we recommend running __analyzeChr()__ for each chromosome with the same _prefix_ (output directory) and then merging the results with __mergeResults()__. This is the recommended pipeline for the single base-level approach.

Below, we run `r Biocpkg('derfinder')` for the example data included in the package. The steps are:

1. Load derfinder
1. Create a directory where we'll store the results
1. Generate the pre-requisites for the models to use with the example data
1. Generate the statistical models
1. Analyze the example data for chr21
1. Merge the results (only one chr in this case, but in practice there'll be more)

```{r loadDerfinder}
## Load derfinder
library('derfinder')

## The output will be saved in the 'report' directory
dir.create('report', showWarnings = FALSE, recursive = TRUE)
```

The following code runs `r Biocpkg('derfinder')`.

```{r runDerfinderFake, eval=FALSE}
## Save the current path
initialPath <- getwd()
setwd(file.path(initialPath, 'report'))

## Generate output from derfinder

## Collapse the coverage information
collapsedFull <- collapseFullCoverage(list(genomeData$coverage), verbose=TRUE)

## Calculate library size adjustments
sampleDepths <- sampleDepth(collapsedFull, probs=c(0.5), nonzero=TRUE,
    verbose=TRUE)

## Build the models
group <- genomeInfo$pop
adjustvars <- data.frame(genomeInfo$gender)
models <- makeModels(sampleDepths, testvars=group, adjustvars=adjustvars)

## Analyze chromosome 21
analysis <- analyzeChr(chr='21', coverageInfo=genomeData, models=models,
    cutoffFstat=1, cutoffType='manual', seeds=20140330, groupInfo=group,
    mc.cores=1, writeOutput=TRUE, returnOutput=TRUE)

## Save the stats options for later
optionsStats <- analysis$optionsStats

## Change the directory back to the original one
setwd(initialPath)
```

For convenience, the results of this analysis are included in `r Biocpkg('derfinder')`, so here we copy them instead of running the code above. Note that the above functions are routinely checked as part of `r Biocpkg('derfinder')`.

```{r runDerfinderReal}
## Copy previous results
file.copy(system.file(file.path('extdata', 'chr21'), package='derfinder',
    mustWork=TRUE), 'report', recursive=TRUE)
```

Next, proceed to merging the results.

```{r mergeResults}
## Merge the results from the different chromosomes. In this case, there's
## only one: chr21
mergeResults(chrs = 'chr21', prefix = 'report',
    genomicState = genomicState$fullGenome)

## Load optionsStats
load(file.path('report', 'chr21', 'optionsStats.Rdata'))
```

### Create report

Once the `r Biocpkg('derfinder')` output has been generated and merged, use __derfinderReport()__ to create the HTML report.

```{r loadLib, message=FALSE}
## Load regionReport
library('regionReport')
```

```{r createReport}
## Generate the HTML report
report <- derfinderReport(prefix='report', browse=FALSE,
    nBestRegions=15, makeBestClusters=TRUE,
    outdir='html', fullCov=list('21'=genomeDataRaw$coverage),
    optionsStats=optionsStats)
```

Once the output is generated, you can browse the report from `R` using __browseURL()__ as shown below.

```{r vignetteBrowse, eval=FALSE}
## Browse the report
browseURL(report)
```

### Notes

Note that the reports require an active Internet connection to render correctly.

The report is self-explanatory and will change some of the text depending on the input options.

If the report is taking too long to compile (say more than 3 hours), you might want to consider setting _nBestClusters_ to a small number or even setting _makeBestClusters_ to `FALSE`.

# Advanced arguments

If you are interested in using the advanced arguments, use `derfinder::advancedArg()` as shown below:

```{r 'advancedArg'}
## URLs to advanced arguments
derfinder::advancedArg('derfinderReport', package = 'regionReport',
    browse = FALSE)
## Set browse = TRUE if you want to open them in your browser
```

In particular, you might be interested in specifying the `output_format` argument in either `renderReport()` or `derfinderReport()`. For example, setting `output_format = 'pdf_document'` will generate a PDF file instead. However, you will lose the interactive toggles for showing or hiding the code, and the tables will be static.

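
For instance, a PDF version of the region report from the General case section could be created as sketched below. This assumes `r Biocpkg('derfinder')`, `r Biocpkg('GenomicRanges')` and `r Biocpkg('regionReport')` are installed along with a LaTeX distribution; the `renderReport-example-pdf` output directory and the project title are arbitrary names chosen for illustration. Following the approach used in the `r Biocpkg('DESeq2')` and `r Biocpkg('edgeR')` PDF examples, it also sets `device = 'pdf'` so that the figures are PDF rather than PNG files.

```{r, eval = FALSE}
## Load the packages used in this sketch
library('derfinder')
library('GenomicRanges')
library('regionReport')

## Example regions included in derfinder, as in the renderReport() example
regions <- genomeRegions$regions
seqlengths(regions) <- c('chr21' = 48129895)

## Same exploration as before, rendered as a PDF document with PDF figures
report <- renderReport(regions, 'Example run (PDF)',
    pvalueVars = c('Q-values' = 'qvalues', 'P-values' = 'pvalues'),
    densityVars = c('Area' = 'area', 'Mean coverage' = 'meanCoverage'),
    significantVar = regions$qvalues <= 0.05, nBestRegions = 20,
    outdir = 'renderReport-example-pdf', device = 'pdf',
    output_format = 'pdf_document', browse = FALSE)
```

The only changes with respect to the HTML example are `device`, `output_format` and the output directory, so both versions of the report can coexist on disk.
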

# Reproducibility

This package was made possible thanks to:

* R `r citep(bib[['R']])`
* `r Biocpkg('BiocStyle')` `r citep(bib[['BiocStyle']])`
* `r Biocpkg('DEFormats')` `r citep(bib[['DEFormats']])`
* `r Biocpkg('derfinder')` `r citep(bib[['derfinder']])`
* `r Biocpkg('derfinderPlot')` `r citep(bib[['derfinderPlot']])`
* `r Biocpkg('DESeq2')` `r citep(bib[['DESeq2']])`
* `r CRANpkg('devtools')` `r citep(bib[['devtools']])`
* `r CRANpkg('DT')` `r citep(bib[['DT']])`
* `r Biocpkg('edgeR')` `r citep(bib[[c('edgeR1', 'edgeR2', 'edgeR5', 'edgeR6')]])`
* `r Biocpkg('GenomeInfoDb')` `r citep(bib[['GenomeInfoDb']])`
* `r Biocpkg('GenomicRanges')` `r citep(bib[['GenomicRanges']])`
* `r Biocpkg('ggbio')` `r citep(bib[['ggbio']])`
* `r CRANpkg('ggplot2')` `r citep(bib[['ggplot2']])`
* `r CRANpkg('grid')` `r citep(bib[['grid']])`
* `r CRANpkg('gridExtra')` `r citep(bib[['gridExtra']])`
* `r Biocpkg('IRanges')` `r citep(bib[['IRanges']])`
* `r CRANpkg('knitcitations')` `r citep(bib[['knitcitations']])`
* `r CRANpkg('knitr')` `r citep(bib[['knitr']])`
* `r CRANpkg('knitrBootstrap')` `r citep(bib[['knitrBootstrap']])`
* `r CRANpkg('mgcv')` `r citep(bib[['mgcv']])`
* `r CRANpkg('pheatmap')` `r citep(bib[['pheatmap']])`
* `r CRANpkg('RColorBrewer')` `r citep(bib[['RColorBrewer']])`
* `r CRANpkg('rmarkdown')` `r citep(bib[['rmarkdown']])`
* `r Biocpkg('biovizBase')` `r citep(bib[['biovizBase']])`
* `r CRANpkg('Cairo')` `r citep(bib[['Cairo']])`
* `r Biocannopkg('TxDb.Hsapiens.UCSC.hg19.knownGene')` `r citep(bib[['TxDb.Hsapiens.UCSC.hg19.knownGene']])`
* `r CRANpkg('whisker')` `r citep(bib[['whisker']])`
* `r Biocpkg('bumphunter')` `r citep(bib[['bumphunter']])`

Code for creating the vignette

```{r createVignette, eval=FALSE}
## Create the vignette
library('rmarkdown')
system.time(render('regionReport.Rmd', 'BiocStyle::html_document'))

## Extract the R code
library('knitr')
knit('regionReport.Rmd', tangle = TRUE)
```

```{r createVignette2}
## Clean up
file.remove('regionReportRef.bib')
unlink('report', recursive = TRUE)
```

Date the vignette was generated.

```{r vignetteReproducibility1, echo=FALSE}
## Date the report was generated
Sys.time()
```

Wallclock time spent generating the vignette.

```{r vignetteReproducibility2, echo=FALSE}
## Processing time (a difftime object; R picks the units automatically)
totalTimeVignette <- diff(c(startTimeVignette, Sys.time()))
round(totalTimeVignette, digits=3)
```

`R` session information.

```{r vignetteReproducibility3, echo=FALSE}
## Session info
library('devtools')
options(width = 120)
session_info()
```

# Bibliography

This vignette was generated using `r Biocpkg('BiocStyle')` `r citep(bib[['BiocStyle']])` with `r CRANpkg('knitr')` `r citep(bib[['knitr']])` and `r CRANpkg('rmarkdown')` `r citep(bib[['rmarkdown']])` running behind the scenes.

Citations made with `r CRANpkg('knitcitations')` `r citep(bib[['knitcitations']])`.

```{r vignetteBiblio, results='asis', echo=FALSE, warning = FALSE}
## Print bibliography
bibliography()
```