---
title: "RNA-Seq Workflow Template"
author: "Author: Daniela Cassol (danielac@ucr.edu) and Thomas Girke (thomas.girke@ucr.edu)"
date: "Last update: `r format(Sys.time(), '%d %B, %Y')`"
output:
  BiocStyle::html_document:
    toc_float: true
    code_folding: show
  BiocStyle::pdf_document: default
package: systemPipeR
vignette: |
  %\VignetteIndexEntry{RNA-Seq Workflow Template}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
fontsize: 14pt
bibliography: bibtex.bib
---

```{css, echo=FALSE}
pre code {
  white-space: pre !important;
  overflow-x: scroll !important;
  word-break: keep-all !important;
  word-wrap: initial !important;
}
```

```{r style, echo = FALSE, results = 'asis'}
BiocStyle::markdown()
options(width=60, max.print=1000)
knitr::opts_chunk$set(
    eval=as.logical(Sys.getenv("KNITR_EVAL", "TRUE")),
    cache=as.logical(Sys.getenv("KNITR_CACHE", "TRUE")),
    tidy.opts=list(width.cutoff=60), tidy=TRUE)
```

```{r setup, echo=FALSE, message=FALSE, warning=FALSE}
suppressPackageStartupMessages({
    library(systemPipeR)
    library(BiocParallel)
    library(Biostrings)
    library(Rsamtools)
    library(GenomicRanges)
    library(ggplot2)
    library(GenomicAlignments)
    library(ShortRead)
    library(ape)
    library(batchtools)
})
```

# RNA-Seq Workflow

This workflow demonstrates how to use various utilities for building and running automated end-to-end analysis workflows for _`RNA-Seq`_ data. The full workflow can be found here: [HTML](http://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeRNAseq.html), [.Rmd](http://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeRNAseq.Rmd), and [.R](http://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeRNAseq.R).

## Loading package and workflow template

Load the _`RNA-Seq`_ sample workflow into your current working directory.

```{r genRna_workflow_single, eval=FALSE}
library(systemPipeRdata)
genWorkenvir(workflow="rnaseq")
setwd("rnaseq")
```

The working environment of the sample data loaded in the previous step contains the following preconfigured directory structure. Directory names are indicated in _**grey**_. Users can change this structure as needed, but must then adjust the code in their workflows accordingly.

* _**rnaseq/**_
    + This is the directory of the R session running the workflow.
    + Run script (_\*.Rmd_) and sample annotation (_targets.txt_) files are located here.
    + Note, this directory can have any name (_e.g._ _**rnaseq**_). Changing its name does not require any modifications in the run script(s).
    + Important subdirectories:
        + _**param/**_
            + Stores parameter files such as: _\*.param_, _\*.tmpl_ and _\*\_run.sh_.
        + _**data/**_
            + FASTQ samples
            + Reference FASTA file
            + Annotations
            + etc.
        + _**results/**_
            + Alignment, variant and peak files (BAM, VCF, BED)
            + Tabular result files
            + Images and plots
            + etc.

The following parameter files are included in each workflow template:

1. _`targets.txt`_: initial one provided by user; downstream _`targets_*.txt`_ files are generated automatically
2. _`*.param`_: defines parameters for input/output file operations, _e.g._ _`trim.param`_, _`bwa.param`_, _`hisat2.param`_, ...
3. _`*_run.sh`_: optional bash script, _e.g._ _`gatk_run.sh`_
4. Compute cluster environment (skip on single machine):
    + _`.batchtools.conf.R`_: defines the type of scheduler for _`batchtools`_. Note that it must point to the template that matches the cluster in use.
    + _`*.tmpl`_: specifies parameters of the scheduler used by a system, _e.g._ Torque, SGE, Slurm, etc.

## Run workflow

Next, run the chosen sample workflow _`systemPipeRNAseq`_ ([.Rmd](http://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeRNAseq.Rmd)) by executing _`make -B`_ from the command line within the _`rnaseq`_ directory.
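
Before launching the workflow, it can help to confirm that the working directory has the layout described above. The following is a minimal sketch in base R only; `check_workenvir()` is a hypothetical helper written for this illustration and is not part of _`systemPipeR`_ or _`systemPipeRdata`_. The directory and file names it checks are taken from the structure listed earlier, and the demonstration runs against a throwaway directory rather than a real workflow environment.

```r
# Hypothetical helper (not a systemPipeR function): report which of the
# expected workflow directories and files exist under `path`.
check_workenvir <- function(path = ".",
                            dirs = c("param", "data", "results"),
                            files = "targets.txt") {
  dir_paths <- file.path(path, dirs)
  file_paths <- file.path(path, files)
  data.frame(
    entry = c(dirs, files),
    type = c(rep("directory", length(dirs)), rep("file", length(files))),
    # A file entry counts as present only if it exists and is not a directory
    present = c(dir.exists(dir_paths),
                file.exists(file_paths) & !dir.exists(file_paths)),
    stringsAsFactors = FALSE
  )
}

# Demonstration on a temporary directory that mimics part of the layout;
# 'results' is deliberately left out so the check reports it as missing.
demo <- file.path(tempdir(), "rnaseq_demo")
dir.create(file.path(demo, "param"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(demo, "data"), showWarnings = FALSE)
invisible(file.create(file.path(demo, "targets.txt")))

status <- check_workenvir(demo)
print(status)
```

Calling `check_workenvir(".")` from within the _`rnaseq`_ directory should, under the layout described above, report every entry as present; any missing entry points to a structure change that the run script needs to reflect.
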
As an alternative to _`make -B`_, one can run the code from the provided _`*.Rmd`_ template file interactively from within R.

The workflow includes the following steps:

1. Read preprocessing
    + Quality filtering (trimming)
    + FASTQ quality report
2. Alignments: _`HISAT2`_ (or any other RNA-Seq aligner)
3. Alignment stats
4. Read counting
5. Sample-wise correlation analysis
6. Analysis of differentially expressed genes (DEGs)
7. GO term enrichment analysis
8. Gene-wise clustering

# Version Information

```{r sessionInfo}
sessionInfo()
```

# Funding

This project was supported by funds from the National Institutes of Health (NIH).