Note: the most recent version of this vignette can be found here and a short overview slide show here.

1 Introduction

systemPipeRdata is a helper package that generates next-generation sequencing (NGS) workflow templates with a single command. The templates are intended to be used by its parent package systemPipeR (Girke 2014). The latter is an environment for building end-to-end analysis pipelines with automated report generation for NGS applications such as RNA-Seq, Ribo-Seq, ChIP-Seq, VAR-Seq and many others. The directory structure of the workflow templates and the sample data used by systemPipeRdata are described here.

2 Getting Started

2.1 Installation

The R software for using systemPipeRdata can be downloaded from CRAN. The systemPipeRdata package can be installed from within R as follows:

source("http://bioconductor.org/biocLite.R") # Sources the biocLite.R installation script
biocLite("tgirke/systemPipeRdata", build_vignettes=TRUE, dependencies=TRUE) # Installs from GitHub
biocLite("systemPipeRdata") # Installs from Bioconductor once available there

2.2 Loading package and documentation

library("systemPipeRdata") # Loads the package
library(help="systemPipeRdata") # Lists package info
vignette("systemPipeRdata") # Opens vignette

2.3 Generate workflow template

Load one of the available NGS workflows into your current working directory. The following does this for the varseq template. The name of the resulting workflow directory can be set with the mydirname argument. The default, NULL, uses the name of the chosen workflow. An error is issued if a directory with the same name already exists at that path.

genWorkenvir(workflow="varseq", mydirname=NULL)
setwd("varseq")
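
Because genWorkenvir stops with an error when the target directory already exists, it can help to check for that directory first. The following is a minimal sketch under stated assumptions: safe_workenvir and its parent argument are hypothetical names introduced here for illustration and are not part of systemPipeRdata. Only the call genWorkenvir(workflow, mydirname) is taken from the text above.

```r
# Hypothetical helper (not part of systemPipeRdata): generate a workflow
# template only if the target directory does not exist yet.
safe_workenvir <- function(workflow, mydirname = NULL, parent = getwd()) {
  # genWorkenvir names the directory after the workflow when mydirname is NULL
  dirname_used <- if (is.null(mydirname)) workflow else mydirname
  target <- file.path(parent, dirname_used)

  if (dir.exists(target)) {
    message("Directory already exists, nothing generated: ", target)
    return(invisible(FALSE))
  }
  if (!requireNamespace("systemPipeRdata", quietly = TRUE)) {
    message("systemPipeRdata is not installed, nothing generated")
    return(invisible(FALSE))
  }

  # Work inside 'parent' and restore the previous working directory on exit
  old <- setwd(parent)
  on.exit(setwd(old), add = TRUE)
  systemPipeRdata::genWorkenvir(workflow = workflow, mydirname = mydirname)
  invisible(TRUE)
}
```

Calling safe_workenvir("varseq") a second time in the same directory would then print a message and return FALSE instead of raising an error.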

On Linux and OS X systems, the same can be achieved from the command line of a terminal with the following command.

$ Rscript -e "systemPipeRdata::genWorkenvir(workflow='varseq', mydirname=NULL)"

The workflow templates generated by genWorkenvir contain the following preconfigured directory structure:

workflow_name/            # *.Rnw/*.Rmd scripts and targets file
                param/    # parameter files for command-line software 
                data/     # inputs e.g. FASTQ, reference, annotations
                results/  # analysis result files
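
As a quick sanity check after generating a template, one can confirm that the subdirectories listed above are present. This is a minimal sketch: check_workflow_layout is a hypothetical helper name, and the expected directory names are simply those shown in the listing above.

```r
# Hypothetical helper (not part of systemPipeRdata): report which of the
# expected subdirectories exist inside a generated workflow directory.
check_workflow_layout <- function(path,
                                  expected = c("param", "data", "results")) {
  present <- dir.exists(file.path(path, expected))
  names(present) <- expected
  present
}

# Example usage, assuming a 'varseq' template exists in the working directory:
# check_workflow_layout("varseq")
```

The function returns a named logical vector, so missing directories can be listed with names(which(!check_workflow_layout("varseq"))).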

2.4 Run workflows

Next, run the chosen sample workflow from within R by executing the code provided in the corresponding *.Rnw template file. If preferred, the corresponding *.Rmd or *.R versions can be used instead. Alternatively, an entire workflow can be run from start to finish with a single command by executing 'make -B' from the command line within the workflow directory (here 'varseq'). More detailed information on running and customizing systemPipeR workflows is available in its overview vignette here. This vignette can also be opened from R with the following commands.

library("systemPipeR") # Loads systemPipeR which needs to be installed via biocLite() from Bioconductor 
vignette("systemPipeR", package = "systemPipeR")
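
To see which template scripts a generated workflow directory provides (*.Rnw, *.Rmd or *.R, as mentioned above), the directory can be listed from R. This is only a sketch: find_workflow_scripts is a hypothetical helper name, and the actual file names depend on the chosen workflow.

```r
# Hypothetical helper (not part of systemPipeR or systemPipeRdata): list the
# *.Rnw, *.Rmd and *.R scripts found at the top level of a workflow directory.
find_workflow_scripts <- function(path = ".") {
  list.files(path, pattern = "\\.(Rnw|Rmd|R)$")
}

# Example usage from within the workflow directory:
# find_workflow_scripts()
```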

2.5 Return paths to sample data

The location of the sample data provided by systemPipeRdata can be returned as a list.

pathList()
## $targets
## [1] "/tmp/RtmpEaX81d/Rinst4bed77851ec6/systemPipeRdata/extdata/param/targets.txt"
## 
## $targetsPE
## [1] "/tmp/RtmpEaX81d/Rinst4bed77851ec6/systemPipeRdata/extdata/param/targetsPE.txt"
## 
## $annotationdir
## [1] "/tmp/RtmpEaX81d/Rinst4bed77851ec6/systemPipeRdata/extdata/annotation/"
## 
## $fastqdir
## [1] "/tmp/RtmpEaX81d/Rinst4bed77851ec6/systemPipeRdata/extdata/fastq/"
## 
## $bamdir
## [1] "/tmp/RtmpEaX81d/Rinst4bed77851ec6/systemPipeRdata/extdata/bam/"
## 
## $paramdir
## [1] "/tmp/RtmpEaX81d/Rinst4bed77851ec6/systemPipeRdata/extdata/param/"
## 
## $workflows
## [1] "/tmp/RtmpEaX81d/Rinst4bed77851ec6/systemPipeRdata/extdata/workflows/"
## 
## $chipseq
## [1] "/tmp/RtmpEaX81d/Rinst4bed77851ec6/systemPipeRdata/extdata/workflows/chipseq/"
## 
## $rnaseq
## [1] "/tmp/RtmpEaX81d/Rinst4bed77851ec6/systemPipeRdata/extdata/workflows/rnaseq/"
## 
## $riboseq
## [1] "/tmp/RtmpEaX81d/Rinst4bed77851ec6/systemPipeRdata/extdata/workflows/riboseq/"
## 
## $varseq
## [1] "/tmp/RtmpEaX81d/Rinst4bed77851ec6/systemPipeRdata/extdata/workflows/varseq/"
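
Since pathList() returns a named list of paths, it can be checked in one step whether each entry exists on the current system. The sketch below assumes only that each list element is a single path string, as in the output above. path_status is a hypothetical helper name, and the call to pathList() is skipped when systemPipeRdata is not installed.

```r
# Hypothetical helper (not part of systemPipeRdata): test whether each path
# in a named list exists, returning a named logical vector.
path_status <- function(paths) {
  vapply(paths, function(p) file.exists(p), logical(1))
}

# Apply it to the sample data paths if the package is available
if (requireNamespace("systemPipeRdata", quietly = TRUE)) {
  print(path_status(systemPipeRdata::pathList()))
}
```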

3 Version information

sessionInfo()
## R Under development (unstable) (2016-09-29 r71410)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.1 LTS
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
##  [4] LC_COLLATE=C               LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
## [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] systemPipeRdata_1.5.0      systemPipeR_1.9.1          ShortRead_1.33.0          
##  [4] GenomicAlignments_1.11.0   SummarizedExperiment_1.5.1 Biobase_2.35.0            
##  [7] BiocParallel_1.9.1         Rsamtools_1.27.2           Biostrings_2.43.0         
## [10] XVector_0.15.0             GenomicRanges_1.27.2       GenomeInfoDb_1.11.1       
## [13] IRanges_2.9.1              S4Vectors_0.13.1           BiocGenerics_0.21.0       
## [16] BiocStyle_2.3.3           
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.7            locfit_1.5-9.1         lattice_0.20-34        GO.db_3.4.0           
##  [5] assertthat_0.1         digest_0.6.10          plyr_1.8.4             BatchJobs_1.6         
##  [9] backports_1.0.4        RSQLite_1.0.0          evaluate_0.10          ggplot2_2.1.0         
## [13] zlibbioc_1.21.0        GenomicFeatures_1.27.0 annotate_1.53.0        Matrix_1.2-7.1        
## [17] checkmate_1.8.1        rmarkdown_1.1          GOstats_2.41.0         splines_3.4.0         
## [21] stringr_1.1.0          pheatmap_1.0.8         RCurl_1.95-4.8         biomaRt_2.31.1        
## [25] munsell_0.4.3          sendmailR_1.2-1        rtracklayer_1.35.1     base64enc_0.1-3       
## [29] BBmisc_1.10            htmltools_0.3.5        fail_1.3               tibble_1.2            
## [33] edgeR_3.17.1           codetools_0.2-15       XML_3.98-1.4           AnnotationForge_1.17.1
## [37] bitops_1.0-6           grid_3.4.0             RBGL_1.51.0            xtable_1.8-2          
## [41] GSEABase_1.37.0        gtable_0.2.0           DBI_0.5-1              magrittr_1.5          
## [45] formatR_1.4            scales_0.4.0           graph_1.53.0           stringi_1.1.2         
## [49] hwriter_1.3.2          genefilter_1.57.0      limma_3.31.2           latticeExtra_0.6-28   
## [53] brew_1.0-6             rjson_0.2.15           RColorBrewer_1.1-2     tools_3.4.0           
## [57] Category_2.41.0        survival_2.39-5        yaml_2.1.13            AnnotationDbi_1.37.0  
## [61] colorspace_1.2-7       knitr_1.14

4 Funding

This project was supported by funds from the National Institutes of Health (NIH) and the National Science Foundation (NSF).

References

Girke, Thomas. 2014. “systemPipeR: NGS Workflow and Report Generation Environment.” UC Riverside. https://github.com/tgirke/systemPipeR.