---
title: "Introduction to decoupleR"
author:
  - name: Jesús Vélez
    affiliation:
    - National Autonomous University of Mexico
    email: jvelezmagic@gmail.com
output: 
  BiocStyle::html_document:
    self_contained: true
    toc: true
    toc_float: true
    toc_depth: 3
    code_folding: show
package: "`r pkg_ver('decoupleR')`"
vignette: >
  %\VignetteIndexEntry{Introduction to decoupleR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r chunk_setup, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)
```

```{r vignette_setup, echo=FALSE, message=FALSE, warning = FALSE}
# Track time spent on making the vignette.
start_time <- Sys.time()

# Bib setup.
library(RefManageR)

# Write bibliography information
bib <- c(
    R = citation(),
    BiocStyle = citation("BiocStyle")[1],
    knitr = citation("knitr")[1],
    rmarkdown = citation("rmarkdown")[1],
    sessioninfo = citation("sessioninfo")[1],
    testthat = citation("testthat")[1],
    RefManageR = citation("RefManageR")[1],
    decoupleR = citation("decoupleR")[1],
    GSVA = citation("GSVA")[1],
    viper = citation("viper")[1]
)
```

# Basics

## Install `decoupleR`

`R` is an open-source statistical environment which can be easily modified to
enhance its functionality via packages. `r Biocpkg("decoupleR")` is an `R`
package available via the [Bioconductor](http://bioconductor.org) repository 
for packages. `R` can be installed on any operating system from
[CRAN](https://cran.r-project.org/) after which you can install
`r Biocpkg("decoupleR")` by using the following commands in your `R` session:

```{r bioconductor_install, eval=FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("decoupleR")

# Check that you have a valid Bioconductor installation
BiocManager::valid()
```

You can install the development version from [GitHub](https://github.com/) with:

```{r github_install, eval=FALSE}
# install.packages("devtools")
devtools::install_github("saezlab/decoupleR")
```

## Required knowledge

`r Biocpkg("decoupleR")` is based on many other packages and in particular in 
those that have implemented the infrastructure needed for dealing with
functional genomic analysis. That is, packages like `r Biocpkg("viper")` or 
`r Biocpkg("GSVA")`, among others. This in order to have a centralized place
from which to apply different statistics to the same data set without the need
and work that would require testing in isolation. Opening the possibility of
developing benchmarks that can grow easily.

## Asking for help

As package developers, we try to explain clearly how to use our packages and in
which order to use the functions. But `R` and `Bioconductor` have a steep
learning curve so it is critical to learn where to ask for help.
We would like to highlight the
[Bioconductor support site](https://support.bioconductor.org/) as the main
resource for getting help: remember to use the `decoupleR` tag and check 
[the older posts](https://support.bioconductor.org/t/decoupleR/).
Other alternatives are available such as creating GitHub issues and tweeting.
However, please note that if you want to receive help you should adhere to the
[posting guidelines](http://www.bioconductor.org/help/support/posting-guide/).
It is particularly critical that you provide a small reproducible example and
your session information so package developers can track down the source of
the error.

## Citing `decoupleR`

We hope that `r Biocpkg("decoupleR")` will be useful for your research.
Please use the following information to cite the package and the overall
approach. Thank you!

```{r decoupleR citation}
citation("decoupleR")
```


# Quick start to using to `decoupleR`

## Libraries

`r Biocpkg("decoupleR")` provides different statistics to calculate the
regulatory activity given an expression `matrix` and a `network`.
It incorporates pre-existing methods to avoid recreating the wheel while
implementing its own methods under an evaluation standard.
Therefore, it provides flexibility when evaluating a data set with different
statistics.
  
Since inputs and outputs are always tibbles (i.e. special data frames),
incorporating `r Githubpkg("tidyverse/dplyr","dplyr")` into your workflow can
be useful for manipulating results, but it is not necessary.

```{r load_library, message=FALSE}
library(decoupleR)
library(dplyr)
```

## Input data

In order to use it, you first need to have a `matrix` where the rows represent
the target nodes and the columns the different conditions in which they were
evaluated. In addition, it is necessary to provide a `network` that contains at
least two columns corresponding to the source and target nodes. It is noteworthy
that certain methods will require specifying additional metadata columns. For
instance, the mode of regulation (MoR) or the likelihood of the interaction.

```{r read_example_data}
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- file.path(inputs_dir, "input-expr_matrix.rds") %>%
    readRDS() %>%
    glimpse()

network <- file.path(inputs_dir, "input-dorothea_genesets.rds") %>%
    readRDS() %>%
    glimpse()
```

## How to decouple?

Once the data is loaded, you are one step away from achieving decoupling.
This step corresponds to specifying which statistics you want to run.
For more information about the defined statistics and their parameters,
you can execute `?decouple()`.

```{r usage-decouple_function, message=TRUE}
decouple(
    mat = mat,
    network = network,
    .source = "tf",
    .target = "target",
    statistics = c("gsva", "mean", "pscira", "scira", "viper", "ora"),
    args = list(
        gsva = list(verbose = FALSE),
        mean = list(.mor = "mor", .likelihood = "likelihood"),
        pscira = list(.mor = "mor"),
        scira = list(.mor = "mor"),
        viper = list(
            .mor = "mor",
            .likelihood = "likelihood",
            verbose = FALSE
        ),
        ora = list()
    )
) %>%
    glimpse()
```

Done, you have applied different statistics to the same data set, now you can
analyze them at your convenience, for example, by performing a [benchmark]().

## How it works?

### Mapping statistics with arguments

Internally, `decouple()` works through `purrr::map2_dfr()` to perform
statistics and argument mapping. This comes with important points:

- `statistics` and `args` can be vectors of the same length. A vector of length
  1 will be recycled. So, **match** is performed by **position** not by name.
  Using named vectors could be a good idea to make clear your intentions.
- You will lose track of which statistic you are running on a certain problem.
  To get around this, `decouple()` works with expressions that are later
  evaluated. For example, it generates a toy call that represents what it is
  trying to run. You can show it with the option `show_toy_call = TRUE`.
- If an error occurs, copy the last line that was displayed with
  `show_toy_call = TRUE` and execute it locally. Try to fix it and correct it
  on the original call.

Based on the previous points, they can take the generated toy calls and execute
them independently, obtaining the same results as if you were executing
`decouple()`.

See internal gsva calls and save results.

```{r see gsvas individual calls}
gsvas_res <- decouple(
    mat = head(mat, 5000),
    network = network,
    .source = "tf",
    .target = "target",
    statistics = c("gsva"),
    args = list(
        gsva_default = list(verbose = FALSE),
        gsva_minsize = list(verbose = FALSE, ssgsea.norm = FALSE)
    ),
    show_toy_call = TRUE
)
```

Run same calls as provided by setting `show_toy_call = TRUE`.

```{r run_individual_gsvas}
gsva_1 <- run_gsva(
    mat = head(mat, 5000),
    network = network,
    .source = "tf",
    .target = "target",
    verbose = FALSE
)

gsva_2 <- run_gsva(
    mat = head(mat, 5000),
    network = network,
    .source = "tf",
    .target = "target",
    verbose = FALSE,
    ssgsea.norm = FALSE
)

gsvas_res_2 <- bind_rows(gsva_1, gsva_2, .id = "run_id")
```

Now compare results and see there is not difference.

```{r see_not_differences}
all.equal(gsvas_res, gsvas_res_2)
```

### Mapping network columns

To carry out the column mapping, `decoupleR` relies on the selection provided
by the `r Githubpkg("tidyverse/tidyselect","tidyselect")` package.
Some of the selection it provides are:

- Symbols
- Strings
- Position

Let's see an example. Input network has the following columns:
```{r show_columns}
network %>%
    colnames()
```

We can use the way we like to do the mapping, even a combination of ways to do
it. This applies not only to the decouple function, but to all functions of
the `decoupleR statistics` family, identifiable by the `run_` prefix.

```{r}
this_column <- "target"
viper_res <- decouple(
    mat = mat,
    network = network,
    .source = tf,
    .target = !!this_column,
    statistics = c("viper"),
    args = list(
        viper = list(
            .mor = 4,
            .likelihood = "likelihood",
            verbose = FALSE
        )
    ),
    show_toy_call = TRUE
)
```

# Reproducibility

## Special thanks

The `r Biocpkg("decouopleR")` package `r Citep(bib[["decoupleR"]])` was made
possible thanks to:

* R `r Citep(bib[["R"]])`
* `r Biocpkg("BiocStyle")` `r Citep(bib[["BiocStyle"]])`
* `r CRANpkg("knitcitations")` `r Citep(bib[["knitcitations"]])`
* `r CRANpkg("knitr")` `r Citep(bib[["knitr"]])`
* `r CRANpkg("rmarkdown")` `r Citep(bib[["rmarkdown"]])`
* `r CRANpkg("sessioninfo")` `r Citep(bib[["sessioninfo"]])`
* `r CRANpkg("testthat")` `r Citep(bib[["testthat"]])`
* `r Biocpkg("GSVA")` `r Citep(bib[["GSVA"]])`
* `r Biocpkg("viper")` `r Citep(bib[["viper"]])`

This package was developed using
`r BiocStyle::Githubpkg("lcolladotor/biocthis")`.

## Vignette

### Create

```{r create_vignette, eval=FALSE}
# Create the vignette
library(rmarkdown)
system.time(render("decoupleR.Rmd", "BiocStyle::html_document"))

# Extract the R code
library(knitr)
knit("decoupleR.Rmd", tangle = TRUE)
```

### Wallclock time spent generating the vignette

```{r reproduce_time, echo=FALSE}
# Processing time in seconds
total_time <- diff(c(start_time, Sys.time()))
round(total_time, digits = 3)
```

## Session information

```{r session_info, echo=FALSE}
options(width = 120)
sessioninfo::session_info()
```

# Bibliography

This vignette was generated using `r Biocpkg("BiocStyle")` 
`r Citep(bib[["BiocStyle"]])` with `r CRANpkg("knitr")`
`r Citep(bib[["knitr"]])` and `r CRANpkg("rmarkdown")`
`r Citep(bib[["rmarkdown"]])` running behind the scenes.

Citations made with `r CRANpkg('RefManageR')` `r Citep(bib[['RefManageR']])`.

```{r vignetteBiblio, results = "asis", echo = FALSE, warning = FALSE, message = FALSE}
## Print bibliography
PrintBibliography(bib, .opts = list(hyperlink = "to.doc", style = "html"))
```