---
title: "Generic Workflow Template" 
author: "Author: FirstName LastName"
date: "Last update: `r format(Sys.time(), '%d %B, %Y')`" 
output:
  BiocStyle::html_document:
    toc_float: true
    code_folding: show
  BiocStyle::pdf_document: default
package: systemPipeR
vignette: |
  %\VignetteEncoding{UTF-8}
  %\VignetteIndexEntry{WF: Basic Generic Template}
  %\VignetteEngine{knitr::rmarkdown}
fontsize: 14pt
bibliography: bibtex.bib
---

```{css, echo=FALSE}
pre code {
white-space: pre !important;
overflow-x: scroll !important;
word-break: keep-all !important;
word-wrap: initial !important;
}
```

```{r style, echo = FALSE, results = 'asis'}
BiocStyle::markdown()
options(width=60, max.print=1000)
knitr::opts_chunk$set(
    eval=as.logical(Sys.getenv("KNITR_EVAL", "TRUE")),
    cache=as.logical(Sys.getenv("KNITR_CACHE", "TRUE")), 
    tidy.opts=list(width.cutoff=60), tidy=TRUE)
```

```{r setup, echo=FALSE, message=FALSE, warning=FALSE, eval=FALSE}
suppressPackageStartupMessages({
    library(systemPipeR)
})
```

# Workflow environment

This is the _Generic_ workflow template of the 
[systemPipeRdata](https://bioconductor.org/packages/devel/data/experiment/html/systemPipeRdata.html) package, 
a companion package to [systemPipeR](https://www.bioconductor.org/packages/devel/bioc/html/systemPipeR.html) [@H_Backman2016-bt]. 
Like other workflow templates, it can be loaded with a single command. Once loaded, users have the
flexibility to utilize the templates as they are or modify them as needed. More
in-depth information can be found in the main vignette of [systemPipeRdata](https://bioconductor.org/packages/devel/data/experiment/vignettes/systemPipeRdata/inst/doc/systemPipeRdata.html). The
_Generic_ template presented here is special in that it provides a workflow
skeleton intended to be used as a starting point for building new workflows.
Basic workflow steps are included to illustrate how to design command-line (CL)
and R-based workflow steps, as well as R Markdown code chunks that are not part
of a workflow. __Note:__ the details about constructing workflow steps are explained in the 
[Detailed Tutorial](https://www.bioconductor.org/packages/devel/bioc/vignettes/systemPipeR/inst/doc/systemPipeR.html#5_Detailed_tutorial) section 
of `systemPipeR`'s main vignette, which uses the same workflow steps as the _Generic_ workflow template.

The _Generic_ workflow template includes the following four data processing steps:

1. R step: export tabular data to files 
2. CL step: compress files
3. CL step: uncompress files 
4. R step: import files and plot summary statistics

The topology graph of this workflow template is shown in Figure 1.

```{r spblast-toplogy, eval=TRUE, echo=FALSE, out.width="100%", fig.align="center", fig.cap="Topology graph of this workflow template.", warning=FALSE}
knitr::include_graphics("results/plotwf_new.png")
```

## Create workflow environment

The environment of the chosen workflow is generated with the `genWorkenvir` 
function. After this, the user’s R session needs to be directed into the resulting directory
(here `new`).

```{r genNew_wf, eval=FALSE}
systemPipeRdata::genWorkenvir(workflow = "new", mydirname = "new")
setwd("new")
```
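
As a quick sanity check after changing into the new directory, one can confirm
that the expected subdirectories are present. The sketch below uses base R
only. The helper name `check_workenv` is hypothetical, and the directory names
`param` and `results` are assumptions based on the paths used later in this
template (for example `param/cwl` and `results/plotwf_new.png`).

```{r sketch_check_workenv, eval=FALSE}
# Hypothetical helper: report which expected subdirectories exist in a
# workflow directory. The names 'param' and 'results' are assumptions
# based on the paths used elsewhere in this template.
check_workenv <- function(path = ".", expected = c("param", "results")) {
    found <- dir.exists(file.path(path, expected))
    names(found) <- expected
    found
}

# Demonstration on a throwaway directory that mimics the layout
demo_dir <- file.path(tempdir(), "new_demo")
dir.create(file.path(demo_dir, "param", "cwl"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(demo_dir, "results"), showWarnings = FALSE)
check_workenv(demo_dir)
```

Inside the real workflow directory, `check_workenv()` can be called without
arguments since `path` defaults to the current working directory.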

The `SPRproject` function initializes a new workflow project instance. This function
call creates an empty `SAL` workflow container and, at the same time, a
linked project log directory (default name `.SPRproject`) that acts as a flat-file 
database of a workflow. For additional details, please visit this
[section](https://www.bioconductor.org/packages/devel/bioc/vignettes/systemPipeR/inst/doc/systemPipeR.html#5_Detailed_tutorial)
in `systemPipeR`'s main vignette.

```{r create_workflow, message=FALSE, eval=FALSE}
library(systemPipeR)
sal <- SPRproject()
sal
```

## Construct workflow

This section illustrates how to load the five workflow steps below, plus a final
version information step, into a `SAL` workflow container (`SYSargsList`). The steps
can be loaded either one by one in interactive mode (see [here](#stepwise)) or all at
once with the `importWF` command (see [here](#importwf)). Afterwards, the workflow is
run with the `runWF` command. 


### Step 1: Load packages {#stepwise}

First, the `systemPipeR` package needs to be loaded within the workflow. 

```{r load_library, eval=FALSE, spr=TRUE}
appendStep(sal) <- LineWise(
    code = {
    library(systemPipeR)
    }, 
    step_name = "load_library"
)
```

After adding the R code, `sal` now contains one workflow step.

```{r view_sal, message=FALSE, eval=FALSE}
sal
```

### Step 2: Export tabular data to files

This is the first data processing step. In this case it is an R step that uses the `LineWise` 
function to define the workflow step, and appends it to the `SAL` workflow container.

```{r export_iris, eval=FALSE, spr=TRUE}
appendStep(sal) <- LineWise(code={
    mapply(
      FUN = function(x, y) write.csv(x, y),
      x = split(iris, factor(iris$Species)),
      y = file.path("results", paste0(names(split(iris, factor(iris$Species))), ".csv"))
    )
    }, 
  step_name = "export_iris", 
  dependency = "load_library"
)
```
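
To see what the code inside this step does without touching a workflow
project, the same export logic can be tried on its own. The following is a
minimal stand-alone sketch that writes to a temporary directory instead of the
workflow's `results` directory. The names `out_dir`, `by_species` and
`out_files` are introduced here for illustration only.

```{r sketch_export_iris, eval=FALSE}
# Stand-alone version of the export logic, writing to a temporary
# directory instead of the workflow's 'results' directory.
out_dir <- file.path(tempdir(), "export_iris_demo")
dir.create(out_dir, showWarnings = FALSE)

# Split the iris data into one data frame per species
by_species <- split(iris, factor(iris$Species))
out_files <- file.path(out_dir, paste0(names(by_species), ".csv"))

# write.csv() is called for its side effect; invisible() hides the NULLs
invisible(mapply(FUN = function(x, y) write.csv(x, y), x = by_species, y = out_files))

basename(out_files)
```

Each species ends up in its own CSV file, named after the corresponding level
of `iris$Species`.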

### Step 3: Compress data

The following adds a CL step that uses the `gzip` software to compress the files that were 
generated in the previous step.

```{r gzip, eval=FALSE, spr=TRUE, spr.dep=TRUE}
targetspath <- system.file("extdata/cwl/gunzip", "targets_gunzip.txt", package = "systemPipeR")
appendStep(sal) <- SYSargsList(
    targets = targetspath, dir = TRUE,
    wf_file = "gunzip/workflow_gzip.cwl", input_file = "gunzip/gzip.yml",
    dir_path = "param/cwl",
    inputvars = c(FileName = "_FILE_PATH_", SampleName = "_SampleName_"), 
    step_name = "gzip", 
    dependency = "export_iris"
)
```
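
The `inputvars` argument maps columns of the targets file (here `FileName` and
`SampleName`) to placeholders in the input `yml` file (here `_FILE_PATH_` and
`_SampleName_`), so that one command is rendered per row of the targets file.
The base R sketch below illustrates only the idea of this substitution. It is
not `systemPipeR`'s implementation, and the targets table, the template lines
and the helper `fill_placeholders` are made up for illustration.

```{r sketch_inputvars, eval=FALSE}
# Toy targets table (illustrative values, not the real targets_gunzip.txt)
targets <- data.frame(
    FileName = c("results/setosa.csv", "results/versicolor.csv"),
    SampleName = c("SE", "VE"),
    stringsAsFactors = FALSE
)

# Toy parameter template containing the two placeholders (hypothetical content)
template <- c("file: _FILE_PATH_", "SampleName: _SampleName_")

# Same mapping as in the step above: targets column -> placeholder
inputvars <- c(FileName = "_FILE_PATH_", SampleName = "_SampleName_")

# Hypothetical helper: fill the template for one row of the targets table
fill_placeholders <- function(row, template, inputvars) {
    for (col in names(inputvars)) {
        template <- gsub(inputvars[[col]], row[[col]], template, fixed = TRUE)
    }
    template
}

# Render one filled-in template per target row
rendered <- lapply(seq_len(nrow(targets)), function(i) {
    fill_placeholders(targets[i, ], template, inputvars)
})
names(rendered) <- targets$SampleName
rendered
```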

### Step 4: Uncompress data

Next, the output files generated by the previous `gzip` step (here compressed
`gz` files) are uncompressed in the current step with the `gunzip`
software. 

```{r gunzip, eval=FALSE, spr=TRUE}
appendStep(sal) <- SYSargsList(
    targets = "gzip", dir = TRUE,
    wf_file = "gunzip/workflow_gunzip.cwl", input_file = "gunzip/gunzip.yml",
    dir_path = "param/cwl",
    inputvars = c(gzip_file = "_FILE_PATH_", SampleName = "_SampleName_"), 
    rm_targets_col = "FileName", 
    step_name = "gunzip", 
    dependency = "gzip"
)
```
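
Taken together, steps 3 and 4 form a compress/uncompress round trip that
should return the original file content. The sketch below mimics this round
trip in base R with `gzfile` connections instead of the `gzip` and `gunzip`
command-line tools, so it illustrates the expected outcome rather than the
workflow's actual mechanism. All file names are temporary and hypothetical.

```{r sketch_gzip_roundtrip, eval=FALSE}
# Write one species table to a temporary CSV file
csv_in <- tempfile(fileext = ".csv")
write.csv(iris[iris$Species == "setosa", ], csv_in)
original <- readLines(csv_in)

# "Compress": write the lines through a gzip connection
gz_file <- paste0(csv_in, ".gz")
con <- gzfile(gz_file, open = "wt")
writeLines(original, con)
close(con)

# "Uncompress": read the lines back through a gzip connection
con <- gzfile(gz_file, open = "rt")
restored <- readLines(con)
close(con)

# TRUE if the round trip preserved the content
identical(original, restored)
```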

### Step 5: Import tabular files and visualize data

This step imports the tabular files from the previous step back into R, computes
summary statistics and plots the results as bar diagrams.

```{r stats, eval=FALSE, spr=TRUE}
appendStep(sal) <- LineWise(code={
    # combine all files into one data frame
    df <- lapply(getColumn(sal, step="gunzip", 'outfiles'), function(x) read.delim(x, sep=",")[-1])
    df <- do.call(rbind, df)
    # calculate mean and sd for each of the four measurement columns (all species pooled)
    stats <- data.frame(cbind(mean=apply(df[,1:4], 2, mean), sd=apply(df[,1:4], 2, sd)))
    stats$species <- rownames(stats)
    # plot
    plot <- ggplot2::ggplot(stats, ggplot2::aes(x=species, y=mean, fill=species)) + 
        ggplot2::geom_bar(stat = "identity", color="black", position=ggplot2::position_dodge()) +
        ggplot2::geom_errorbar(
            ggplot2::aes(ymin=mean-sd, ymax=mean+sd), 
            width=.2,
            position=ggplot2::position_dodge(.9)
        )
    plot
    }, 
    step_name = "stats", 
    dependency = "gunzip", 
    run_step = "optional"
)
```
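
Note that `apply(df[, 1:4], 2, mean)` summarizes each of the four measurement
columns across all imported rows, so the `species` column of `stats` holds the
measurement names taken from the row names. The sketch below reproduces that
table directly from the built-in `iris` data, which is an assumption standing
in for the imported files. For comparison, it also shows one possible
per-species summary with `aggregate`. This second table is an alternative and
not what the template step computes.

```{r sketch_stats_tables, eval=FALSE}
# Stand-in for the data frame assembled from the uncompressed files
df <- iris

# Same computation as in the step above: one row per measurement column
stats <- data.frame(cbind(mean = apply(df[, 1:4], 2, mean), sd = apply(df[, 1:4], 2, sd)))
stats$species <- rownames(stats)  # holds "Sepal.Length", "Sepal.Width", ...
stats

# Alternative (not part of the template): mean of each measurement per species
per_species <- aggregate(. ~ Species, data = df, FUN = mean)
per_species
```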

### Version Information

```{r sessionInfo, eval=FALSE, spr=TRUE}
appendStep(sal) <- LineWise(
    code = {
    sessionInfo()
    }, 
    step_name = "sessionInfo", 
    dependency = "stats")
```

# Automated routine {#importwf}

Once the above steps have been loaded into `sal`, the workflow can be executed from start to
finish (or partially) with the `runWF` command. Subsequently, scientific and technical workflow 
reports can be generated with the `renderReport` and `renderLogs` functions, respectively.

The following code section also demonstrates how the above workflow steps can be imported with 
the `importWF` function from the associated `Rmd` workflow script (here `new.Rmd`). Constructing 
workflow instances with this automated approach is usually preferred since it is much more convenient 
and reliable compared to the manual approach described earlier. 

__Note:__ To demonstrate `systemPipeR`'s automation routines without regenerating a new workflow 
environment from scratch, the first line below uses the `overwrite=TRUE` option of the `SPRproject` function. 
This option is generally discouraged as it erases the existing workflow project and `sal` container. 
For information on resuming and restarting workflow runs, users should consult the relevant section of 
the main vignette (see [here](https://www.bioconductor.org/packages/devel/bioc/vignettes/systemPipeR/inst/doc/systemPipeR.html#10_Restarting_and_resetting_workflows)).

```{r import_run_routine, eval=FALSE}
sal <- SPRproject(overwrite = TRUE) # Avoid 'overwrite=TRUE' in real runs.
sal <- importWF(sal, file_path = "new.Rmd") # Imports above steps from new.Rmd.
sal <- runWF(sal) # Runs workflow.
plotWF(sal) # Plots workflow topology graph.
sal <- renderReport(sal) # Renders scientific report.
sal <- renderLogs(sal) # Renders technical report from log files.
```
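
Because the project log directory stores the state of the workflow, a later R
session can pick up an existing project instead of rebuilding it. The
following sketch resumes a project with `SPRproject(resume = TRUE)` and runs
only a subset of steps. It assumes that `systemPipeR` is installed, that the
working directory is the workflow directory, and that the `steps` argument of
`runWF` accepts step indices as used here. The guard turns the code into a
no-op anywhere else.

```{r sketch_resume_partial, eval=FALSE}
# Only proceed if systemPipeR is available and a project exists here
can_resume <- requireNamespace("systemPipeR", quietly = TRUE) &&
    file.exists(file.path(".SPRproject", "SYSargsList.yml"))

if (can_resume) {
    sal <- systemPipeR::SPRproject(resume = TRUE)  # reload the existing project
    sal <- systemPipeR::runWF(sal, steps = 1:2)    # run the first two steps only
    print(sal)                                     # show the workflow overview
}
```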

## CL tools used 

The `listCmdTools` (and `listCmdModules`) functions return the CL tools that 
are used by a workflow. To include a CL tool list in a workflow report, 
one can use the following code. Additional details on this topic 
can be found in the main vignette [here](https://www.bioconductor.org/packages/devel/bioc/vignettes/systemPipeR/inst/doc/systemPipeR.html#111_Accessor_methods).

```{r list_tools}
if(file.exists(file.path(".SPRproject", "SYSargsList.yml"))) {
    local({
        sal <- systemPipeR::SPRproject(resume = TRUE)
        systemPipeR::listCmdTools(sal)
        systemPipeR::listCmdModules(sal)
    })
} else {
    cat(crayon::blue$bold("Tools and modules required by this workflow are:\n"))
    cat(c("gzip", "gunzip"), sep = "\n")
}
```
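
Before running a workflow on a new system, it can be useful to confirm that
the listed CL tools are actually on the `PATH`. A minimal base R sketch using
`Sys.which` is shown below. The tool names are those of this template, and the
helper name `missing_tools` is hypothetical.

```{r sketch_check_tools, eval=FALSE}
# Hypothetical helper: return the tools that cannot be found on the PATH
missing_tools <- function(tools) {
    paths <- Sys.which(tools)      # empty string for tools that are not found
    names(paths)[!nzchar(paths)]
}

# An empty character vector means both tools were found
missing_tools(c("gzip", "gunzip"))
```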

## Session Info

This is the session information that will be included when rendering this report. 

```{r report_session_info, eval=TRUE}
sessionInfo()
```

# References