--- title: "MetaPhOR" author: "Emily Isenhart" output: BiocStyle::html_document: toc: FALSE bibliography: "`r system.file('extdata/REFERENCES.bib', package = 'MetaPhOR')`" vignette: > %\VignetteIndexEntry{MetaPhOR} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r style, include = FALSE, results = 'asis'} BiocStyle::html_document() ``` ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, dpi=200 ) library(kableExtra) ``` ## Introduction MetaPhOR was developed to enable users to assess metabolic dysregulation using transcriptomic-level data (RNA-sequencing and Microarray data) and produce publication-quality figures. A list of differentially expressed genes (DEGs), which includes fold change and p value, from DESeq2 (@DESeq2) or limma (@limma), can be used as input, with sample size for MetaPhOR, and will produce a data frame of scores for each KEGG pathway. These scores represent the magnitude and direction of transcriptional change within the pathway, along with estimated p-values (@MP). MetaPhOR then uses these scores to visualize metabolic profiles within and between samples through a variety of mechanisms, including: bubble plots, heatmaps, and pathway models. ## Installation This command line can be used to install and load MetaPhOR. if (!require("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("MetaPhOR") ```{r, message=FALSE} library(MetaPhOR) ``` ## Data Preparation Minimal data preparation is required to run MetaPhOR. DEGs may be loaded into R in the form of .csv or .tsv for use with this package. The DEG file must contain columns for log fold change, adjusted p-value, and HUGO gene names. By default, MetaPhOR assumes DESeq2 header stylings: “log2FoldChange” and “padj”. In any function that assumes these headers, however, the user can define column names for these values. Below is a sample DEG file resulting from limma: ```{r} exdegs <- read.csv(system.file("extdata/exampledegs.csv", package = "MetaPhOR"), header = TRUE) ``` ```{r, echo=FALSE} kable(head(exdegs), format="html") %>% kable_material() ``` ## Pathway Analysis “pathwayAnalysis” first assigns scores and their absolute values using log fold change and p value to each gene (@MP). These transcript-level scores, along with sample size, are then utilized to calculate both scores (directional change) and absolute value scores (magnitude of change) (@MP) for each KEGG Pathway (@KEGG). We then utilize a bootstrapping method, to randomly calculate 100,000 scores per pathway, based on the number of genes in that pathway and model the distribution. This distribution can then be used to evaluate where the actual score for that pathway sits in relation to the distribution, and can assign a p-value to the achieved score. For example, if the polyamine biosynthetic pathway contains 13 genes, we can get a score for the sum of the 13 genes that exist within that pathway. Using bootstrapping, we can then randomly sample, with replacement, 13 genes to create scores, 100,000 times. We use these random samples (100,000) to generate a distribution, and we can calculate a p-value dependent on where the score that consists of the 13 genes that actually exist within the pathway falls within the distribution. Taken together, the scores and p-values resulting from “pathwayAnalysis” provide a measure for both the biological and statistical significance of metabolic dysregulation. **Note: A seed MUST be set before utilizing this function to ensure reproducible results by bootstrapping. It is NECESSARY that the seed remain the same throughout an analysis.** **Note: Bootrapping is performed, by default, with 100,000 iterations of resampling for optimal power. The number of iterations can be decreased for improved speed. We use 50,000 iterations for improved speed of examples.*** pathwayAnalysis() requires: * The file path to the DEG list of interest * The name of the column containing HUGO gene names * The sample size of the DEG analysis * The number of iterations of resampling to be performed during bootstrapping * Correct headers for fold change and p value columns (as indicated above) A partial output of the pathway analysis function can be seen as follows: ```{r, include=FALSE} # BRCA, OVCA, PRAD # sampsize <- c(1095, 378, 497) ``` ```{r} set.seed(1234) brca <- pathwayAnalysis(DEGpath = system.file("extdata/BRCA_DEGS.csv", package = "MetaPhOR"), genename = "X", sampsize = 1095, iters = 50000, headers = c("logFC", "adj.P.Val")) ``` ```{r, echo = FALSE} kable(head(brca), format="html", booktabs = TRUE) %>% kable_material() ``` ## bubblePlot The metabolic profile determined by pathway analysis can be easily visualized using “bubblePlot.” Scores are plotted on the x-axis, while absolute value scores are plotted on the y-axis. Each point represents a KEGG pathway, where point size represents p-value (the smaller the p value, the larger the point) and point color is dictated by scores. Negative scores, which indicate transcriptional downregulation, are blue, and positive scores, which indicate transcriptional upregulation, are red. The top ten points, either by smallest p value or greatest dysregulation by score, are labeled with their pathway names. The plot demonstrates which pathways are the most statistically and biologically relevant. bubblePlot() requires: * The output of pathwayAnalysis(), as a data frame * An indication which values to use, in order to label points: either “Pval” or “LogFC” * Optional: Numeric value for point label text size (default = .25) \newpage **Bubble Plot Labeled by P Value** ```{r} pval <- bubblePlot(scorelist = brca, labeltext = "Pval", labelsize = .85) plot(pval) ``` **Bubble Plot Labeled by LogFC** ```{r} logfc <- bubblePlot(scorelist = brca, labeltext = "LogFC", labelsize = .85) plot(logfc) ``` \newpage ## metaHeatmap “metaHeatmap” provides a useful visualization for comparing metabolic profiles between groups, including only significantly dysregulated pathways, and highlighting those which are most highly changed. This function should be used when you have multiple groups/DEGs being compared, e.g. if you have 4 conditions all being compared to each other. This will not be useful if you have a single DEG list. This function can be used only when multiple DEG comparisons are scored by “pathwayAnalysis.” The absolute pathway scores are scaled across outputs and plotted via pheatmap, selecting only those which have absolute score p values below the level of significance. Note: A heatmap cannot be produced if there are no pathways significantly dysregulated below the p value cut off. metaHeatmap() requires: * A list of outputs from pathway analysis, as data frames * A character vector of names for labeling each output * Optional: The p value cut off to be used (default = 0.05) ```{r} ##read in two additional sets of scores, ##run in the same manner as brca for comparison ovca <- read.csv(system.file("extdata/OVCA_Scores.csv", package = "MetaPhOR"), header = TRUE, row.names = 1) prad <- read.csv(system.file("extdata/PRAD_Scores.csv", package = "MetaPhOR"), header = TRUE, row.names = 1) all.scores <- list(brca, ovca, prad) names <- c("BRCA", "OVCA", "PRAD") metaHeatmap(scorelist = all.scores, samplenames = names, pvalcut = 0.05) ``` ## cytoPath “cytoPath” models metabolic pathways, sourced from WikiPathway (@Wikipathways), using the transcriptional change of individual genes in the pathway. The resulting figure can be used to identify candidate genes for further study by providing a detailed look at transcriptional dysregulation within and between “pathwayAnalysis” outputs. Pathways of interest can be readily identified from “bubblePlot” or “metaHeatmap.” This function utilizes Cytoscape (@Cytoscape); the package *RCy3* and a local instance of Cytoscape are required to render this plot. For details regarding legends, see the Cytoscape Legend Creator Tutorial (https://cytoscape.org/cytoscape-tutorials/protocols/legend-creator/#/ introduction). cytoPath() requires: * The name of the pathway to be plotted * The file path to the DEG list of interest * The file path to which the figure will be saved * The name of the column containing HUGE gene names * Correct headers for fold change and p value columns (as indicated above) ```{r, eval = FALSE} cytoPath(pathway = "Tryptophan Metabolism", DEGpath = "BRCA_DEGS.csv", figpath = paste(getwd(), "BRCA_Tryptophan_Pathway", sep = "/"), genename = "X", headers = c("logFC", "adj.P.Val")) ``` ```{r, echo = F, fig.wide = TRUE} knitr::include_graphics(c(system.file("extdata", "BRCA_Tryptophan_Pathway.png", package = "MetaPhOR"))) ``` ## pathwayList The WikiPathways available within MetaPhOR have been restricted to metabolic pathways. The pathwayList function will provide a complete list of WikiPathways for mapping with “cytoPath.” ```{r, eval = FALSE} pathwayList() ``` ```{r, echo = F} kable(head(pathwayList()), format = "html", booktabs = TRUE) %>% kable_material() ``` ## Conclusion MetaPhOR contains five functions for the analysis of metabolic dysregulation using transcriptomic-level data and the production of publication-quality figures. This version of MetaPhOR, 0.99.0, is available at https://github.com/emilyise/MetaPhOR. ## SessionInfo ```{r} sessionInfo() ``` ## References