---
title: "MOMA - Multi Omic Master Regulator Analysis"
author: "Sunny Jones, Evan Paull, Mariano Alvarez, Federico Giorgi and Andrea Califano"
output: BiocStyle::html_document
#bibliography: bibliography.bib
vignette: >
  %\VignetteIndexEntry{MOMA - Multi Omic Master Regulator Analysis}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Introduction and Background

MOMA is a tool for inferring connections between Master Regulator proteins and 
genomic driver events in cancer. Master regulators are regulatory proteins 
(primarily transcription factors and co-transcription factors) that control cell
state. In the case of cancer and other disease states transcription factors have 
been shown to be key drivers of maintaining the disease state and can be targets 
for interventions. Often these master regulators are not mutated themselves but 
are downstream of mutations and other genomic alterations that ultimately 
dysregulate the normal activity of that regulator. MOMA uses multiple inputs of 
information to infer these connections and to improve the predictive value of 
the master regulator analysis.  


*For more information on Master Regulators and our tool for calculating their values see the paper published on VIPER in Nature Genetics in 2016* 
[paper](https://www.nature.com/articles/ng.3593) // [package](https://bioconductor.org/packages/release/bioc/html/viper.html).

*** 

# Running MOMA

## Getting Started

First install and load the library into the R session. Make sure all the other 
dependent packages have already been installed to ensure full functionality, 
including graphics and plotting functions.  


```{r setup, message = FALSE, results="hide", warning=FALSE, eval=FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("MOMA")
```


```{r load}
library(MOMA)
```


Explore the test data saved in the `MultiAssayExperiment` object. Confirm that we 
have 50 test samples and 2505 VIPER inferred proteins in the test viper matrix. 
This was generated by using `VIPER` on the GBM gene expression signature. The 
matrix has the samples across the columns and the regulators in the rows.  

The other assays needed for the experiment are:

- **SNP matrix**: binary 0/1 values where 1 indicates the prescence of a mutation
- **CNV matrix**: copy number variant values
- **Fusion matrix**: binary 0/1 values (optional)


```{r explore data, message = FALSE}
example.gbm.mae

MultiAssayExperiment::assays(example.gbm.mae)$viper[1:3, 1:3]
```

Next, we select the biological pathways we'd like to use to aid in our inference 
task. In this case, we're using the CINDy modulator inference algorithm [[paper](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0109569)] 
as well as protein-protein interactions predicted by the PrePPI structure-based 
algorithm [[paper](https://academic.oup.com/nar/article/41/D1/D828/1070064)]. 
The output of the CINDy algorithm is a likelihood (p value) for
the association of an upstream modulator with a particular regulators activity.
The relevant output for PrePPI is a likelihood (p value) that a modulator 
structurally binds to a regulator.

To make the MOMA object they need to be a named list of lists, with the indexes
of each being a regulator and their associated partners.

*note: CINDy values are context specific and need to be obtained for a particular 
tissue/expression set. Values calculated from the tumor types in TCGA can be found
[[here](https://figshare.com/articles/CINDy_Networks_from_TCGA/12198510)] for use in applying this analysis to other datasets. The PREPPi values are not context specific
and the values from this example analysis can be used with other datasets.* 

```{r pathways}
names(gbm.pathways)

gbm.pathways$cindy[1:3]
```

## Generating the **MOMA** object

Initialize the Moma object with the assays and pathway data in order to start
the analysis. The required inputs are the following:

- **Assays**: a MultiAssayExperiment object with the previously mentioned assays.
Alternatively a named list with the assays as matrices can be supplied. In both 
cased the names of the assays must be "viper", "cnv", "mut" and "fusion"
- **Pathway list**: see above

Other optional inputs include:

- **Gene Blacklist**: genes to not include in the analysis. By default the Broad
Institute's list of highly mutated genes are excluded [[link](https://software.broadinstitute.org/cancer/cga/mutsig)]
- **Output Folder** : Directory to save results to.
- **Gene Map** : Dataframe mapping Gene Names to Entrez IDs and Cytoband locations.
One is already provided by default. 

```{r moma object}
momaObj <- MomaConstructor(example.gbm.mae, gbm.pathways)
```

## MOMA Analysis on GBM Data

The first step, `runDIGGIT()`  will run the DIGGIT inference algorithm [[paper](https://www.cell.com/cell/fulltext/S0092-8674(14)01170-2)] to
find statistical interactions between VIPER-inferred proteins and genomic events.  

The `makeInteractions()` function will infer robust computational predictions
using all the provided data, including the Conditional Inference of Network Dynamics 
(CINDy) algorithm.  

The `Rank()` function will create a final ranking of candidate Master Regulators
for this cohort of patient samples.

```{r interactions, results = "hide", message = FALSE}
momaObj$runDIGGIT()

momaObj$makeInteractions()

momaObj$Rank()
```

Clustering of the samples, using the protein ranks computed in the last step, 
can then be performed using `Cluster()`. Multiple cluster solutions will be 
calculated, ranging from 2 to 15 clusters by default. The reliability or 
average silhouette scores of each can be assessed to determine an optimal 'k' 
number of clusters. By default the clustering solution with the maximum 
reliability will be saved to the object, but any solution can be saved in afterwards.

```{r clustering}
momaObj$Cluster()

# Explore the reliability scores
momaObj$clustering.results$all.cluster.reliability

# Save in the 3 cluster solution
momaObj$sample.clustering <- momaObj$clustering.results$`3clusters`$clustering

```

  
Genomic saturation analysis is then performed on each cluster with the 
`saturationCalculation()` function, allowing us to find the key proteins that are 
downstream of the majority of genomic events in the samples within a particular 
cluster. These regulators make up that cluster's `checkpoint`.  

The results of this analysis can be accessed directly in the following result 
fields:  
  
  - `momaObj$checkpoints` 
  - `momaObj$genomic.saturation`
  - `momaObj$coverage.summaryStats` 
  
These will be used for plotting the genomic saturation curves as well.
  
```{r saturation, message=FALSE, results="hide"}
momaObj$saturationCalculation()
```

```{r checkpoints}
cluster1.checkpoint <- momaObj$checkpoints[[1]]
print (cluster1.checkpoint[1:5])
```

***

# Plotting the Results

## Visualizing the VIPER matrix  

The primary results of the analysis are the master regululators of each particular
cluster's checkpoint, as displayed above. You can plot the original VIPER matrix 
subset down to only these regulators using whatever heatmap function you like. 


## Visualizing the Genomic Saturation and Events Plots

The other important results of the analysis are the statisically significant
genomic events found to be upstream of each of the checkpoint master regulators. 
The `makeSaturationPlots()` function takes the subtype specific genomic event 
interactions calculated in the previous step and makes plots for viewing the 
results. Because there are usually quite a large number of events that are 
detected only the most frequently occurring events will be plotted. 
(Note: to account for amplifications and deletions being events that can occur
across multiple genes, these events are considered on a cytoband basis).

```{r make plots, warning=FALSE, include=FALSE}
plots <- makeSaturationPlots(momaObj)
```


The two plot types are: 

- **Genomic Event Oncoprint Plots** : these plot the different types of 
genomic events found to be upstream of the regulators in the checkpoint,
using the style of cBioPortal's plotting. The samples in that cluster are in
the columns and the rows are the top significant events to be plotted (max
number can be changed). Summary bar plots are on the top and right side.
- **Saturation Curves** : these plot the number of regulators needed to capture
a certain percentage of genomic events. The inflection points plotted on
this graph match up with the cut off for which regulators make up a particular
cluster's checkpoint.   

The data for these plots are stored as grobs and ggplot objects so layers or other 
graphic modifications can be added afterwards for further customization.

```{r visualize plots}
library(ggplot2)
library(grid)
grid.draw(plots$oncoprint.plots[[3]])
plots$curve.plots[[3]] +
  ggtitle("Genomic Saturation Curve for GBM Subtype 3")
  
```

# Saving Data

Any of the results fields can be saved by using the `saveData()` function. 
Pass in any of the names of the results fields and they will be saved to files
in the designated output folder. If no names are passed all results will be saved.

```{r saving data, eval=FALSE}
momaObj$saveData(output.folder = "outputs", "hypotheses", "checkpoints")
```


# Session Info 

Here is the output of sessionInfo() on the system on which this package and
documentation was compiled.

```{r sessionInfo}
sessionInfo()
```