---
title: "Using the ReactomeGSA package"
author: "Johannes Griss"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Using the ReactomeGSA package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Introduction

The ReactomeGSA package is a client to the web-based Reactome Analysis System. Essentially, it performs a gene set analysis using the latest version of the Reactome pathway database as a backend.

The main advantages of using the Reactome Analysis System are:

  * Simultaneous analysis and visualization of different types of 'omics data
  * Support for direct comparison across different species
  * Directly linked with Reactome's powerful pathway browser producing publication-ready figures of your gene set analysis

### Citation

To cite this package, use 

```
Griss J. ReactomeGSA, https://github.com/reactome/ReactomeGSA (2019)
```

## Installation

The `ReactomeGSA` package can be directly installed from Bioconductor:

```{r}
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

if (!require(ReactomeGSA))
  BiocManager::install("ReactomeGSA")
```

For more information, see https://bioconductor.org/install/.

## Getting available methods

The Reactome Analysis System will be continuously updated. Before starting your analysis it is therefore a good approach to check which methods are available.

This can simply be done by using:

```{r show_methods}
library(ReactomeGSA)

available_methods <- get_reactome_methods(print_methods = FALSE, return_result = TRUE)

# only show the names of the available methods
available_methods$name
```

To get more information about a specific method, set `print_details` to `TRUE` and specify the `method`:

```{r get_method_details}
# Use this command to print the description of the specific method to the console
# get_reactome_methods(print_methods = TRUE, print_details = TRUE, method = "PADOG", return_result = FALSE)

# show the parameter names for the method
padog_params <- available_methods$parameters[available_methods$name == "PADOG"][[1]]

paste0(padog_params$name, " (", padog_params$type, ", ", padog_params$default, ")")
```

## Creating an analysis request

To start a gene set analysis, you first have to create an analysis request. This is a simple S4 class that takes care of submitting multiple datasets simultaneously to the analysis system.

When creating the request object, you already have to specify the analysis method you want to use:

```{r create_request}
# Create a new request object using 'Camera' for the gene set analysis
my_request <-ReactomeAnalysisRequest(method = "Camera")

my_request
```

## Setting parameters

To get a list of supported parameters for each method, use the `get_reactome_methods` function (see above).

Parameters are simply set using the `set_parameters` function:

```{r set_parameters}
# set the maximum number of allowed missing values to 50%
my_request <- set_parameters(request = my_request, max_missing_values = 0.5)

my_request
```

Multiple parameters can by set simulataneously by simply adding more name-value pairs to the function call.

## Adding datasets

One analysis request can contain multiple datasets. This can be used to, for example, visualize the results of an RNA-seq and Proteomics experiment (of the same / similar samples) side by side:

```{r add_dataset}
library(ReactomeGSA.data)
data("griss_melanoma_proteomics")
```

This is a limma `EList` object with the sample data already added

```{r}
class(griss_melanoma_proteomics)
head(griss_melanoma_proteomics$samples)
```

The dataset can now simply be added to the request using the `add_dataset` function:

```{r}
my_request <- add_dataset(request = my_request, 
                          expression_values = griss_melanoma_proteomics, 
                          name = "Proteomics", 
                          type = "proteomics_int",
                          comparison_factor = "condition", 
                          comparison_group_1 = "MOCK", 
                          comparison_group_2 = "MCM",
                          additional_factors = c("cell.type", "patient.id"))
my_request
```

Several datasets (of the same experiment) can be added to one request. This RNA-seq data is stored as an `edgeR` `DGEList` object:

```{r}
data("griss_melanoma_rnaseq")

# only keep genes with >= 100 reads in total
total_reads <- rowSums(griss_melanoma_rnaseq$counts)
griss_melanoma_rnaseq <- griss_melanoma_rnaseq[total_reads >= 100, ]

# this is a edgeR DGEList object
class(griss_melanoma_rnaseq)
head(griss_melanoma_rnaseq$samples)
```

Again, the dataset can simply be added using `add_dataset`. Here, we added an additional parameter to the `add_dataset`
call. Such additional parameters are treated as additional dataset-level parameters.
```{r}
# add the dataset
my_request <- add_dataset(request = my_request, 
                          expression_values = griss_melanoma_rnaseq, 
                          name = "RNA-seq", 
                          type = "rnaseq_counts",
                          comparison_factor = "treatment", 
                          comparison_group_1 = "MOCK", 
                          comparison_group_2 = "MCM",
                          additional_factors = c("cell_type", "patient"),
                          # This adds the dataset-level parameter 'discrete_norm_function' to the request
                          discrete_norm_function = "TMM")
my_request
```

### Sample annotations

Datasets can be passed as limma `EList`, edgeR `DGEList`, any implementation of the Bioconductor `ExpressionSet`, or simply a `data.frame`. 

For the first three, sample annotations are simply read from the respective slot. When supplying the expression values as a `data.frame`, the `sample_data` parameter has to be set using a `data.frame` where each row represents one sample and each column one proptery. If the the `sample_data` option is set while providing the expression data as an `EList`, `DGEList`, or `ExpressionSet`, the data in `sample_data` will be used instead of the sample annotations in the expression data object.

### Name

Each dataset has to have a name. This can be anything but has to be unique within one analysis request.

### Type

The ReactomeAnalysisSystem supports different types of 'omics data. To get a list of supported types, use the `get_reactome_data_types` function:

```{r get_data_types}
get_reactome_data_types()
```

### Defining the experimental design

Defining the experimental design for a `ReactomeAnalysisRequest` is very simple. Basically, it only takes three parameters:

  * `comparison_factor`: Name of the property within the sample data to use
  * `comparison_group_1`: The first group to compare
  * `comparison_group_2`: The second group to compare
  
The value set in `comparison_factor` must match a column name in the sample data (either the slot in an `Elist`, `DGEList`, or `ExpressionSet` object or in the `sample_data` parameter).

Additionally, it is possible to define blocking factors. These are supported by all methods that rely on linear models in the backend. Some methods though might simply ignore this parameter. For more information on whether a method supports blocking factors, please use `get_reactome_methods`.

Blocking factors can simply be set `additional_factors` to a vector of names. These should again reference properties (or columns) in the sample data.

## Submitting the request

Once the `ReactomeAnalysisRequest` is created, the complete analysis can be run using `perform_reactome_analysis`:

```{r perform_analysis}
result <- perform_reactome_analysis(request = my_request, compress = F)
```

## Investigating the result

The result object is a `ReactomeAnalysisResult` S4 class with several helper functions to access the data.

To retrieve the names of all available results (generally one per dataset), use the `names` function:

```{r}
names(result)
```

For every dataset, different result types may be available. These can be shown using the `result_types` function:

```{r}
result_types(result)
```

The `Camera` analysis method returns two types of results, pathway-level data and gene- / protein-level fold changes.

A specific result can be retrieved using the `get_result` method:

```{r}
# retrieve the fold-change data for the proteomics dataset
proteomics_fc <- get_result(result, type = "fold_changes", name = "Proteomics")
head(proteomics_fc)
```

Additionally, it is possible to directly merge the pathway level data for all result sets using the `pathways` function:

```{r}
combined_pathways <- pathways(result)

head(combined_pathways)
```

## Visualising results

The ReactomeGSA package includes several basic plotting functions to visualise the pathway results. For comparative gene set analysis like the one presented here, two functions are available: `plot_correlations` and `plot_volcano`.

`plot_correlations` can be used to quickly assess how similar two datasets are on the pathway level:

```{r}
plot_correlations(result)
```

Individual datasets can further be visualised using volcano plots of the pathway data:

```{r}
plot_volcano(result, 2)
```

## Session Info

```{r}
sessionInfo()
```