---
title: "Generic Workflow Template"
author: "Author: FirstName LastName"
date: "Last update: `r format(Sys.time(), '%d %B, %Y')`"
output:
  BiocStyle::html_document:
    toc_float: true
    code_folding: show
  BiocStyle::pdf_document: default
package: systemPipeR
vignette: |
  %\VignetteEncoding{UTF-8}
  %\VignetteIndexEntry{WF: Basic Generic Template}
  %\VignetteEngine{knitr::rmarkdown}
fontsize: 14pt
bibliography: bibtex.bib
---

```{css, echo=FALSE}
pre code {
white-space: pre !important;
overflow-x: scroll !important;
word-break: keep-all !important;
word-wrap: initial !important;
}
```

```{r style, echo = FALSE, results = 'asis'}
BiocStyle::markdown()
options(width=60, max.print=1000)
knitr::opts_chunk$set(
    eval=as.logical(Sys.getenv("KNITR_EVAL", "TRUE")),
    cache=as.logical(Sys.getenv("KNITR_CACHE", "TRUE")),
    tidy.opts=list(width.cutoff=60), tidy=TRUE)
```

```{r setup, echo=FALSE, message=FALSE, warning=FALSE, eval=FALSE}
suppressPackageStartupMessages({
    library(systemPipeR)
})
```

# Workflow environment

This is the _Generic_ workflow template of the [systemPipeRdata](https://bioconductor.org/packages/devel/data/experiment/html/systemPipeRdata.html) package, a companion package to [systemPipeR](https://www.bioconductor.org/packages/devel/bioc/html/systemPipeR.html) [@H_Backman2016-bt]. Like other workflow templates, it can be loaded with a single command. Once loaded, users can use the templates as they are or modify them as needed. More in-depth information can be found in the main vignette of [systemPipeRdata](https://bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeRdata.html).

The _Generic_ template presented here is special in that it provides a workflow skeleton intended to be used as a starting point for building new workflows. Basic workflow steps are included to illustrate how to design command-line (CL) and R-based workflow steps, as well as R Markdown code chunks that are not part of a workflow.
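
As a rough illustration of the distinction between chunks that are workflow steps and chunks that are not, the following sketch lists the labels of chunks whose header sets `spr=TRUE`, the flag carried by the workflow-step chunks in this template. The helper `list_spr_chunks` and the `example_rmd` input are hypothetical and written only for this illustration; this is not how `importWF` is implemented.

```r
# Hypothetical helper (not part of systemPipeR): return the labels of
# R Markdown chunks whose header contains the option spr=TRUE.
list_spr_chunks <- function(rmd_lines) {
  # Chunk headers start with three backticks followed by {r and a space or comma
  headers <- grep("^```\\{r[ ,]", rmd_lines, value = TRUE)
  # Keep only headers that set spr=TRUE (spaces around '=' are tolerated)
  is_spr <- grepl("spr[[:space:]]*=[[:space:]]*TRUE", headers)
  # The chunk label is the first token after "{r "
  labels <- sub("^```\\{r[[:space:]]+([^,}[:space:]]+).*$", "\\1", headers)
  labels[is_spr]
}

# Made-up input: one workflow-step chunk and one ordinary chunk
example_rmd <- c(
  "```{r load_library, eval=FALSE, spr=TRUE}",
  "library(systemPipeR)",
  "```",
  "```{r view_sal, message=FALSE, eval=FALSE}",
  "sal",
  "```"
)

list_spr_chunks(example_rmd)
# [1] "load_library"
```

In this made-up input only `load_library` is flagged, so `view_sal` would be treated as an ordinary R Markdown chunk.
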
__Note:__ Details about constructing workflow steps are explained in the [Detailed Tutorial](https://www.bioconductor.org/packages/devel/bioc/vignettes/systemPipeR/inst/doc/systemPipeR.html#5_Detailed_tutorial) section of `systemPipeR`'s main vignette, which uses the same workflow steps as the _Generic_ workflow template.

The _Generic_ workflow template includes the following four data processing steps.

1. R step: export tabular data to files
2. CL step: compress files
3. CL step: uncompress files
4. R step: import files and plot summary statistics

The topology graph of this workflow template is shown in Figure 1.

```{r spblast-toplogy, eval=TRUE, warning=FALSE, echo=FALSE, out.width="100%", fig.align = "center", fig.cap= "Topology graph of this workflow template."}
knitr::include_graphics("results/plotwf_new.png")
```

## Create workflow environment

The environment of the chosen workflow is generated with the `genWorkenvir` function. After this, the user’s R session needs to be directed into the resulting directory (here `new`).

```{r genNew_wf, eval=FALSE}
systemPipeRdata::genWorkenvir(workflow = "new", mydirname = "new")
setwd("new")
```

The `SPRproject` function initializes a new workflow project instance. This function call creates an empty `SAL` workflow container and, at the same time, a linked project log directory (default name `.SPRproject`) that acts as a flat-file database of the workflow. For additional details, please visit this [section](https://www.bioconductor.org/packages/devel/bioc/vignettes/systemPipeR/inst/doc/systemPipeR.html#5_Detailed_tutorial) in `systemPipeR`'s main vignette.
```{r create_workflow, message=FALSE, eval=FALSE}
library(systemPipeR)
sal <- SPRproject()
sal
```

## Construct workflow

This section illustrates how to load the following workflow steps into a `SAL` workflow container (`SYSargsList`), either one by one in interactive mode (see [here](#stepwise)) or all at once with the `importWF` command (see [here](#importwf)), and then run the workflow with the `runWF` command.

### Step 1: Load packages {#stepwise}

First, the `systemPipeR` package needs to be loaded in the workflow.

```{r load_library, eval=FALSE, spr=TRUE}
appendStep(sal) <- LineWise(
    code = {
        library(systemPipeR)
    },
    step_name = "load_library"
)
```

After adding the R code, `sal` now contains one workflow step.

```{r view_sal, message=FALSE, eval=FALSE}
sal
```

### Step 2: Export tabular data to files

This is the first data processing step. It is an R step that uses the `LineWise` function to define the workflow step and appends it to the `SAL` workflow container.

```{r export_iris, eval=FALSE, spr=TRUE}
appendStep(sal) <- LineWise(code={
    mapply(
      FUN = function(x, y) write.csv(x, y),
      x = split(iris, factor(iris$Species)),
      y = file.path("results", paste0(names(split(iris, factor(iris$Species))), ".csv"))
    )
    },
    step_name = "export_iris",
    dependency = "load_library"
)
```

### Step 3: Compress data

The following adds a CL step that uses the `gzip` software to compress the files that were generated in the previous step.
```{r gzip, eval=FALSE, spr=TRUE, spr.dep=TRUE}
targetspath <- system.file("extdata/cwl/gunzip", "targets_gunzip.txt", package = "systemPipeR")
appendStep(sal) <- SYSargsList(
    targets = targetspath, dir = TRUE,
    wf_file = "gunzip/workflow_gzip.cwl", input_file = "gunzip/gzip.yml",
    dir_path = "param/cwl",
    inputvars = c(FileName = "_FILE_PATH_", SampleName = "_SampleName_"),
    step_name = "gzip",
    dependency = "export_iris"
)
```

### Step 4: Uncompress data

Next, the output files of the previous `gzip` step (here compressed `gz` files) are uncompressed with the `gunzip` software.

```{r gunzip, eval=FALSE, spr=TRUE}
appendStep(sal) <- SYSargsList(
    targets = "gzip", dir = TRUE,
    wf_file = "gunzip/workflow_gunzip.cwl", input_file = "gunzip/gunzip.yml",
    dir_path = "param/cwl",
    inputvars = c(gzip_file = "_FILE_PATH_", SampleName = "_SampleName_"),
    rm_targets_col = "FileName",
    step_name = "gunzip",
    dependency = "gzip"
)
```

### Step 5: Import tabular files and visualize data

This step imports the tabular files from the previous step back into R, computes summary statistics and plots the results as bar diagrams.
```{r stats, eval=FALSE, spr=TRUE}
appendStep(sal) <- LineWise(code={
    # combine all files into one data frame
    df <- lapply(getColumn(sal, step="gunzip", 'outfiles'), function(x) read.delim(x, sep=",")[-1])
    df <- do.call(rbind, df)
    # calculate mean and sd of each measurement column (all species combined);
    # the column names are stored in 'species' and used as plot categories
    stats <- data.frame(cbind(mean=apply(df[,1:4], 2, mean), sd=apply(df[,1:4], 2, sd)))
    stats$species <- rownames(stats)
    # plot
    plot <- ggplot2::ggplot(stats, ggplot2::aes(x=species, y=mean, fill=species)) +
        ggplot2::geom_bar(stat = "identity", color="black", position=ggplot2::position_dodge()) +
        ggplot2::geom_errorbar(
            ggplot2::aes(ymin=mean-sd, ymax=mean+sd),
            width=.2,
            position=ggplot2::position_dodge(.9)
        )
    plot
    },
    step_name = "stats",
    dependency = "gunzip",
    run_step = "optional"
)
```

### Version Information

```{r sessionInfo, eval=FALSE, spr=TRUE}
appendStep(sal) <- LineWise(
    code = {
        sessionInfo()
    },
    step_name = "sessionInfo",
    dependency = "stats")
```

# Automated routine {#importwf}

Once the above steps have been loaded into `sal`, the workflow can be executed from start to finish (or partially) with the `runWF` command. Subsequently, scientific and technical workflow reports can be generated with the `renderReport` and `renderLogs` functions, respectively.

The following code section also demonstrates how the above workflow steps can be imported with the `importWF` function from the associated `Rmd` workflow script (here `new.Rmd`). Constructing workflow instances with this automated approach is usually preferred since it is more convenient and less error-prone than the manual approach described earlier.

__Note:__ To demonstrate `systemPipeR`'s automation routines without regenerating a new workflow environment from scratch, the first line below uses the `overwrite=TRUE` option of the `SPRproject` function. This option is generally discouraged as it erases the existing workflow project and `sal` container.
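
As a minimal sketch of how one might avoid `overwrite=TRUE`, the hypothetical helper `project_mode` below checks whether a project log directory already holds a `SYSargsList.yml` file, the same check used by the CL tools chunk of this template, and reports whether the project could be resumed. The helper name and its return values are assumptions made for this illustration; the `systemPipeR` calls are shown only as comments.

```r
# Hypothetical helper (not part of systemPipeR): report whether an existing
# workflow project is present in the given log directory.
project_mode <- function(logdir = ".SPRproject") {
  if (file.exists(file.path(logdir, "SYSargsList.yml"))) "resume" else "new"
}

mode <- project_mode()

# Possible use in a workflow session (requires systemPipeR, not run here):
# if (mode == "resume") {
#     sal <- systemPipeR::SPRproject(resume = TRUE)
# } else {
#     sal <- systemPipeR::SPRproject()
#     sal <- systemPipeR::importWF(sal, file_path = "new.Rmd")
# }
```
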
For information on resuming and restarting workflow runs, users should consult the relevant section of the main vignette (see [here](https://www.bioconductor.org/packages/devel/bioc/vignettes/systemPipeR/inst/doc/systemPipeR.html#10_Restarting_and_resetting_workflows)).

```{r import_run_routine, eval=FALSE}
sal <- SPRproject(overwrite = TRUE) # Avoid 'overwrite=TRUE' in real runs.
sal <- importWF(sal, file_path = "new.Rmd") # Imports above steps from new.Rmd.
sal <- runWF(sal) # Runs workflow.
plotWF(sal) # Plots workflow topology graph.
sal <- renderReport(sal) # Renders scientific report.
sal <- renderLogs(sal) # Renders technical report from log files.
```

## CL tools used

The `listCmdTools` and `listCmdModules` functions return the CL tools and modules that are used by a workflow. To include a CL tool list in a workflow report, one can use the following code. Additional details on this topic can be found in the main vignette [here](https://www.bioconductor.org/packages/devel/bioc/vignettes/systemPipeR/inst/doc/systemPipeR.html#111_Accessor_methods).

```{r list_tools}
if(file.exists(file.path(".SPRproject", "SYSargsList.yml"))) {
    local({
        sal <- systemPipeR::SPRproject(resume = TRUE)
        systemPipeR::listCmdTools(sal)
        systemPipeR::listCmdModules(sal)
    })
} else {
    cat(crayon::blue$bold("Tools and modules required by this workflow are:\n"))
    cat(c("gzip", "gunzip"), sep = "\n")
}
```

## Session Info

This is the session information that will be included when rendering this report.

```{r report_session_info, eval=TRUE}
sessionInfo()
```

# References