---
title: "RNA-Seq Workflow Template" 
author: "Author: Daniela Cassol (danielac@ucr.edu) and Thomas Girke (thomas.girke@ucr.edu)"
date: "Last update: `r format(Sys.time(), '%d %B, %Y')`" 
output:
  BiocStyle::html_document:
    toc_float: true
    code_folding: show
  BiocStyle::pdf_document: default
package: systemPipeR
vignette: |
  %\VignetteIndexEntry{RNA-Seq Workflow Template}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
fontsize: 14pt
bibliography: bibtex.bib
---

```{css, echo=FALSE}
pre code {
white-space: pre !important;
overflow-x: scroll !important;
word-break: keep-all !important;
word-wrap: initial !important;
}
```

<!--
- Compile from command-line
Rscript -e "rmarkdown::render('systemPipeRNAseq.Rmd', c('BiocStyle::html_document'), clean=F); knitr::knit('systemPipeRNAseq.Rmd', tangle=TRUE)"; Rscript ../md2jekyll.R systemPipeRNAseq.knit.md 12; Rscript -e "rmarkdown::render('systemPipeRNAseq.Rmd', c('BiocStyle::pdf_document'))"
-->

```{r style, echo = FALSE, results = 'asis'}
BiocStyle::markdown()
options(width=60, max.print=1000)
knitr::opts_chunk$set(
    eval=as.logical(Sys.getenv("KNITR_EVAL", "TRUE")),
    cache=as.logical(Sys.getenv("KNITR_CACHE", "TRUE")), 
    tidy.opts=list(width.cutoff=60), tidy=TRUE)
```

```{r setup, echo=FALSE, messages=FALSE, warnings=FALSE}
suppressPackageStartupMessages({
    library(systemPipeR)
    library(BiocParallel)
    library(Biostrings)
    library(Rsamtools)
    library(GenomicRanges)
    library(ggplot2)
    library(GenomicAlignments)
    library(ShortRead)
    library(ape)
    library(batchtools)
})
```

# RNA-Seq Workflow

This workflow demonstrates how to use various utilities for building and running automated end-to-end analysis workflows for _`RNA-Seq`_ data. The full workflow can be found here:
[HTML](http://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeRNAseq.html), [.Rmd](http://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeRNAseq.Rmd), and [.R](http://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeRNAseq.R).

## Loading package and workflow template

Load the _`RNA-Seq`_ sample workflow into your current working directory.

```{r genRna_workflow_single, eval=FALSE}
library(systemPipeRdata)
genWorkenvir(workflow="rnaseq")
setwd("rnaseq")
```

The working environment of the sample data loaded in the previous step contains the following preconfigured directory structure. Directory names are indicated in  <span style="color:grey">_**grey**_</span>. Users can change this structure as needed, but need to adjust the code in their workflows accordingly. 

* <span style="color:grey">_**rnaseq/**_</span> 
    + This is the directory of the R session running the workflow.
    + Run script ( _\*.Rmd_) and sample annotation (_targets.txt_) files are located here.
    + Note, this directory can have any name (_e.g._ <span style="color:grey">_**rnaseq**_</span>). Changing its name does not require any modifications in the run script(s).
    + Important subdirectories: 
        + <span style="color:grey">_**param/**_</span> 
            + Stores parameter files such as: _\*.param_, _\*.tmpl_ and _\*\_run.sh_.
        + <span style="color:grey">_**data/**_ </span>
            + FASTQ samples 
            + Reference FASTA file
            + Annotations
            + etc.
        + <span style="color:grey">_**results/**_</span>
            + Alignment, variant and peak files (BAM, VCF, BED) 
            + Tabular result files
            + Images and plots
            + etc.

The following parameter files are included in each workflow template: 

1. _`targets.txt`_: initial one provided by user; downstream _`targets_*.txt`_ files are generated automatically
2. _`*.param`_: defines parameter for input/output file operations, _e.g._ _`trim.param`_, _`bwa.param`_, _`hisat2.param`_, ...
3. _`*_run.sh`_: optional bash script, _e.g._: _`gatk_run.sh`_
4. Compute cluster environment (skip on single machine):
    + _`.batchtools.conf.R`_: defines type of scheduler for _`batchtools`_. Note, it is necessary to point the right template accordingly to the cluster in use.
    + _`*.tmpl`_: specifies parameters of scheduler used by a system, _e.g._ Torque, SGE, Slurm, etc.

## Run workflow

Next, run the chosen sample workflow _`systemPipeRNAseq`_ ([.Rmd](http://www.bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeRNAseq.Rmd)) by executing from the command-line _`make -B`_ within the _`rnaseq`_ directory. Alternatively, one can run the code from the provided _`*.Rmd`_ template file from within R interactively. 

Workflow includes following steps:

1. Read preprocessing
    + Quality filtering (trimming)
    + FASTQ quality report
2. Alignments: _`HISAT2`_ (or any other RNA-Seq aligner)
3. Alignment stats 
4. Read counting 
5. Sample-wise correlation analysis
6. Analysis of differentially expressed genes (DEGs)
7. GO term enrichment analysis
8. Gene-wise clustering

# Version Information

```{r sessionInfo}
sessionInfo()
```

# Funding

This project was supported by funds from the National Institutes of Health (NIH).