--- title: "MOSClip vignette" date: "`r BiocStyle::doc_date()`" output: BiocStyle::html_document: toc: true toc_depth: 3 vignette: > %\VignetteIndexEntry{MOSClip} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} Package: MOSClip bibliography: ../inst/mosclip.bib --- # Abstract {.unnumbered} In the last years we have witnessed a dramatic change in the clinical treatment of patients thanks to molecular and personalized medicine. Many medical institutes are starting to adopt routine genome-wide screenings to complement and help diagnosis and treatment choices. As the number of datasets grows, we need to adapt and improve the methods to cope with the complexity, amount and multi-level structure of available information. Integrating these type of data remains challenging due to their dimensionality and diverse features. Moreover, focusing on dysregulated biological processes rather than individual genes can offer deeper insights into complex diseases like cancer. `MOSClip` is a multi-omic statistical approach based on pathway topology that can deal with this complexity. It integrates multiple omics - such as expression, methylation, mutation, or copy number variation - using various dimensionality reduction strategies. The analysis can be performed at the level of entire pathways or pathway modules, allowing for a more detailed examination of the dysregulated mechanisms within large biological processes. `MOSClip` pathway analysis serves two primary purposes: testing the survival association of pathways or modules using the Cox proportional hazard model, and conducting a two-class analysis with a generalized linear model. Additionally, the package offers valuable graphical tools to visualize and interpret the results. ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` # Introduction In recent years many efforts have been dedicated on multi-omic data integration, with several tools available for pathway analysis in a multi-omic framework; however, their purpose is mainly limited to two-class comparison. `MOSClip` (Multi-Omic Survival Clip) was originally developed for survival pathway analysis, combining both multi-omic data integration and graphical model theory to keep track of gene interactions among a pathway [@martini_mosclip_2019]. With this purpose, `MOSClip` allows to test survival association of pathways or their connected components, that we called modules, in a multi-omic framework. A second test has been implemented to perform also two-class comparison, to investigate pathways or modules association with a specific condition. Multi-omics gene measurements are tested as covariates of a Cox proportional hazard model or a generalized linear model, after dimensionality reduction of data. `MOSClip` is highly flexible thanks to its modular structure, allowing the use of one or multiple different omics, as well as different data reduction strategies and tests. In brief, `MOSClip` comprises four main components: pathway topology, multi-omic data and survival or two-class analysis. The final goal is to find biological processes impacting patient's survival or patient's association with a specific condition. Furthermore, several graphical tools have been implemented in `MOSClip` to browse, manage and provide help in the interpretation of the results. In this vignette, we will show an example of survival analysis on four omics: transcriptome, methylome, genomic mutations and genomic copy number variations, testing if these omics can be synergically involved in pathways with survival or two-class prognostication power. # Installation ```{r, eval = FALSE} if (!require("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("MOSClip") ``` # How to use MOSClip for survival analysis We start by loading example data provided by `MOSClip` package to run our analysis. `reactSmall` object is a list of three interesting Reactome pathways in `graphite::Pathway` format that we will use in our analysis. More information can be found on the object documentation (`?reactSmall`). `multiOmics` object is a multi-omic dataset containing matrices, genes per patients, of methylation status, somatic mutations, CNVs, and transcripts expression levels, for 50 ovarian cancer patients from TCGA. Genes and patients were manually selected to generate this small dataset intended to illustrate the functionality of the package, allowing users to explore its methods and tools in a simplified context. Information on how the dataset was generated can be found in package documentation with `?multiOmics`. Moreover, additional information on how to pre-process datasets for a `MOSClip` analysis are available in this [GitHub tutorial](https://caluralab.github.io/MOSClipTutorials/) or in the Materials and Methods section of the paper [@martini_mosclip_2019]. ```{r message=FALSE} library(MOSClip) library(MultiAssayExperiment) library(kableExtra) data("reactSmall") data("multiOmics") ``` `multiOmics` is an `Omics` class object. This `MOSClip` specific object derives from a `MultiAssayExperiment` object. It contains standard slots `ExperimentList`, `sampleMap`, `colData`, and specific slots for `MOSClip` analysis: `modelInfo`, where the user should specify the dimensionality reduction method to process each omic dataset, and `specificArgs`, where additional parameters can be set for each method, as described in the help of each reduction function. An `Omics` class object can be generated with the function `makeOmics` starting from data matrices and additional annotations. Available methods for dimensionality reduction can be conveniently visualized with `availableOmicMethods` function. ```{r} availableOmicMethods() ``` In this vignette, we chose to use PCA for expression data, cluster analysis for methylation data, vote counting for mutations and CNVs (for detail see `MOSClip` paper [@martini_mosclip_2019]). This data transformations are easily applied calling `MOSClip` functions, thus, here we need only to provide the name of the needed function. Given the nature of the methylation data, we expect to have more than a CpG cluster associated to a gene. For this reason, `MOSClip` provides the possibility to include a dictionary to associate the methylation level of multiple CpG clusters to a single gene. Thus, in the methylation specific arguments you need to provide the dictionary to convert cluster names into genes. ## Module analysis We are now ready to perform the survival analysis on modules using the function `multiOmicsSurvivalModuleTest`. Required inputs are a multi-omic dataset and a pathway in `Pathway` or `graphNEL` format. Alternatively, it is possible to run the analysis also on gene-sets; in this case the topological information is lost, thus, only the function for pathway test must be used. In this example, we choose to test only genes that have at least expression data, as specified with `useTheseGenes`. This a priori filter, however, is not mandatory. Moreover, it could be useful to set a seed in order to have reproducible results. ```{r, warning = FALSE} genesToConsider <- row.names(experiments(multiOmics)$exp) moduleSurv <- lapply(reactSmall, function(g) { set.seed(1234) fcl = multiOmicsSurvivalModuleTest(multiOmics, g, useTheseGenes = genesToConsider) fcl }) ``` Once the analysis is complete, we can plot the tabular summary of the top 10 modules selected by p-value of the Cox proportional hazard model using the function `multiPathwayModuleReport`. ```{r} moduleSummary <- multiPathwayModuleReport(moduleSurv) ``` ```{r, echo=FALSE} kable(moduleSummary[1:10,-1]) %>% kable_styling() %>% scroll_box(width = "80%", height = "400px") ``` ### Graphical exploration of MOSClip module results `MOSClip` has plenty of functions to visually explore the results. In the following paragraphs we will show some examples. We can visualize the table of module test results for a specific pathway, e.g., _Activation of Matrix Metalloproteinases_. ```{r fig.height=6, fig.weight=6} plotModuleReport(moduleSurv[["Activation of Matrix Metalloproteinases"]], MOcolors = c(exp = "red", met = "green", cnv = "yellow", mut = "blue")) ``` As you can see, among others, the module number 4 of this pathway is significant, and, in particular, covariates expPC2, met2k2 and cnvPOS are driving this significance. We can have a look at the pathway graph and the module position in the pathway using `plotModuleInGraph`. ```{r, message=FALSE} plotModuleInGraph(moduleSurv[["Activation of Matrix Metalloproteinases"]], pathList = reactSmall, moduleNumber = 4) ``` Then, we can check at the differences in survival using Kaplan-Meier curves dividing patients in groups with different omics patterns. Here we consider only covariates expPC2 and met2k. ```{r, fig.height=6, fig.width=7} plotModuleKM(moduleSurv[["Activation of Matrix Metalloproteinases"]], moduleNumber = 4, formula = "Surv(days, status) ~ expPC2 + met2k", paletteNames = "Paired", risk.table = TRUE, inYears = TRUE) ``` We can explore the most important genes of the pathway and their original profiles across the omics using an heatmap plot of the original values. The most important genes are selected as described in Martini et al. [@martini_mosclip_2019]. ```{r, message=FALSE, results='hide', fig.keep='all', fig.height=7, fig.width=7} additionalA <- colData(multiOmics) additionalA$status[additionalA$status == 1] <- "event" additionalA$status[additionalA$status == 0] <- "no_event" additionalA$PFS <- as.factor(additionalA$status) additionalA$status <- NULL additionalA$years <- round(additionalA$days/365.24, 0) additionalA$days <- NULL plotModuleHeat(moduleSurv[["Activation of Matrix Metalloproteinases"]], moduleNumber = 4, paletteNames = c(exp = "red", met = "green", cnv = "yellow", mut = "blue"), additionalAnnotations = additionalA, additionalPaletteNames = list(PFS = "violet", years = "teal"), sortBy = c("expPC2", "met2k", "PFS", "years"), withSampleNames = FALSE) ``` In second instance, we can ask if two or more omics are significant in the same module simultaneously and if this omic interaction is more frequent than those expected by chance. To perform this test we use the `runSupertest` function. A circle plot is returned with the frequency of all significant omic combinations and their significance levels. ```{r, fig.height=9, fig.width=10} runSupertest(moduleSummary, pvalueThr = 0.05, zscoreThr = 0.05, excludeColumns = c("pathway", "module")) ``` ## Pathway analysis In pathway test the pathway topology (the in and out connections of the pathway genes) can be exploited to guide the data reduction step. To do that, we suggest to use the topological PCA instead of the sparse PCA changing the settings in the omics object. Then, we are ready to run the analysis using the function `multiOmicsSurvivalPathwayTest`. ```{r warning=FALSE} data("multiOmicsTopo") pathwaySurv <- lapply(reactSmall, function(g) { set.seed(1234) fcl = multiOmicsSurvivalPathwayTest(multiOmicsTopo, g, useTheseGenes = genesToConsider) }) ``` ### Graphical exploration of MOSClip pathway results We can plot a report of the first 10 pathways, sorted by pvalue of the Cox proportional hazard model. ```{r fig.width=6, fig.height=4} plotMultiPathwayReport(pathwaySurv, MOcolors = c(exp = "red", mut = "blue", cnv = "yellow", met = "green")) ``` Finally, we can look at the predictive genes using a heatmap with patient additional annotations. ```{r, message=FALSE, results='hide', fig.keep='all', fig.height=7, fig.width=7} plotPathwayHeat(pathwaySurv[["Activation of Matrix Metalloproteinases"]], sortBy = c("expPC1", "cnvPOS", "PFS"), paletteNames = c(exp = "red", met = "green", mut = "blue", cnv = "yellow"), additionalAnnotations = additionalA, additionalPaletteNames = list(PFS = "violet", years = "teal"), withSampleNames = FALSE) ``` Then, we can also check for differences in survival using Kaplan-Meier curves dividing patients in groups with different omics patterns (e.g. patients with different methylation pattern and high and low levels of PC2 in expression). ```{r, fig.height=9, fig.width=10} plotPathwayKM(pathwaySurv[["Activation of Matrix Metalloproteinases"]], formula = "Surv(days, status) ~ expPC1 + cnvPOS", paletteNames = "Paired") ``` # Additional functionalities {.unnumbered} `MOSClip` gives the possibility to prioritize most important and stable pathway or module results, running a resampling procedure that can be found on the [extended tutorial on GitHub](https://caluralab.github.io/MOSClipTutorials/). More tutorials are also available on how to perform a two-class analysis with `MOSClip`, as well as examples of more plots that were not shown in this vignette. # Session Information {.unnumbered} ```{r} sessionInfo() ``` # References {.unnumbered}