--- title: "PROGENy pathway signatures: Application to Bulk transcriptomics" author: "Alberto Valdeolivas" date: "`r Sys.Date()`" output: BiocStyle::html_document vignette: | %\VignetteIndexEntry{PROGENy pathway signatures: Application to Bulk transcriptomics} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, echo=FALSE, results='hide', warning=FALSE, error=FALSE, message=FALSE, cache=FALSE} library(knitr) opts_chunk$set( cache = FALSE, echo = TRUE, warning = FALSE, error = FALSE, message = FALSE ) ``` ## Introduction This is the third part in a series of transcriptomics tutorials. We present here how to estimate pathway activity from transcriptomics data using **PROGENy**. Conventional pathway analysis methods rely on the gene expression of the pathway members. However, this approach overlooks the effect of post-translational modifications and only captures very specific experimental conditions. To overcome these limitations, **PROGENy** (Pathway RespOnsive GENes) estimates the activity of relevant signaling pathways based on consensus gene signatures obtained from perturbation experiments [(Schubert et al. 2018)](https://www.nature.com/articles/s41467-017-02391-6) **PROGENy** initially contained 11 pathways and was developed for the application to human transcriptomics data. It has been recently shown that **PROGENy** is also applicable to mouse data and it has been expanded to 14 pathways [(Holland et al., 2019)](https://doi.org/10.1016/j.bbagrm.2019.194431). In addition, **PROGENy** can be applied to scRNA-seq data, as described in [(Holland et al., 2020)](https://doi.org/10.1186/s13059-020-1949-z) **PROGENy** is available as a [Bioconductor package](http://bioconductor.org/packages/release/bioc/html/progeny.html). For additional information about the **PROGENy** method, visit its website: https://saezlab.github.io/progeny/ ## Getting Started We first load the required libraries. ```{r libraries, message=FALSE} library(progeny) library(tibble) library(tidyr) library(dplyr) library(pheatmap) library(readr) library(ggplot2) ``` ### Import the raw count dataframe The data can be downloaded from: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE119931 The file name is: GSE119931_PANC1.FOXA2KO.genes.counts.txt.gz ```{r loadInput, message=FALSE} ## We assign the normalised counts and the experimental design data("vignette_data") Normalised_counts <- vignette_data$counts Experimental_design <- vignette_data$design ## We read the results from the differential analysis. ttop_KOvsWT <- vignette_data$limma_ttop ``` We have to slightly modify the format of the input files to make it suitable for running Progeny. ```{r} Normalised_counts_matrix <- Normalised_counts %>% dplyr::mutate_if(~ any(is.na(.x)),~ if_else(is.na(.x),0,.x)) %>% tibble::column_to_rownames(var = "gene") %>% as.matrix() ttop_KOvsWT_matrix <- ttop_KOvsWT %>% dplyr::select(ID, t) %>% dplyr::filter(!is.na(t)) %>% column_to_rownames(var = "ID") %>% as.matrix() ``` ## Pathway activity with Progeny We first compute **Progeny** scores for every sample (with the replicates) using the normalised counts. It is worth noting that we are going to use the 100 most responsive genes per pathway. This number can be increased depending on the coverage of your experiments. For instance, the number of quantified genes for single-cell RNA-seq is smaller than for Bulk RNA-seq or microarray. In those cases, we suggest to increase the number of responsive genes to 200-500. ```{r ProgenyCounts, message=FALSE} PathwayActivity_counts <- progeny(Normalised_counts_matrix, scale=TRUE, organism="Human", top = 100) Activity_counts <- as.vector(PathwayActivity_counts) ``` We present the results in a heatmap: ```{r HeatmapProgeny, dpi=300} paletteLength <- 100 myColor <- colorRampPalette(c("darkblue", "whitesmoke","indianred"))(paletteLength) progenyBreaks <- c(seq(min(Activity_counts), 0, length.out=ceiling(paletteLength/2) + 1), seq(max(Activity_counts)/paletteLength, max(Activity_counts), length.out=floor(paletteLength/2))) progeny_hmap <- pheatmap(t(PathwayActivity_counts),fontsize=14, fontsize_row = 10, fontsize_col = 10, color=myColor, breaks = progenyBreaks, main = "PROGENy (100)", angle_col = 45, treeheight_col = 0, border_color = NA) ``` Now, we run an enrichment analysis using a competitive permutation approach to assess the significance of the pathway activity. We end up with Normalised Enrichment Scores (NES) for each pathway. ```{r ProgenyStat, message=FALSE} PathwayActivity_zscore <- progeny(ttop_KOvsWT_matrix, scale=TRUE, organism="Human", top = 100, perm = 1000, z_scores = TRUE) %>% t() colnames(PathwayActivity_zscore) <- "NES" ``` ```{r BarplotProgeny, dpi=300} PathwayActivity_zscore_df <- as.data.frame(PathwayActivity_zscore) %>% rownames_to_column(var = "Pathway") %>% dplyr::arrange(NES) %>% dplyr::mutate(Pathway = factor(Pathway)) ggplot(PathwayActivity_zscore_df,aes(x = reorder(Pathway, NES), y = NES)) + geom_bar(aes(fill = NES), stat = "identity") + scale_fill_gradient2(low = "darkblue", high = "indianred", mid = "whitesmoke", midpoint = 0) + theme_minimal() + theme(axis.title = element_text(face = "bold", size = 12), axis.text.x = element_text(angle = 45, hjust = 1, size =10, face= "bold"), axis.text.y = element_text(size =10, face= "bold"), panel.grid.major = element_blank(), panel.grid.minor = element_blank()) + xlab("Pathways") ``` The MAPK pathway is the most active pathway upon the perturbation that we are studying (KO of the FOXA2 gene versus the wild-type). We can therefore visualise the MAPK most responsive genes (progeny weights) along with their t_values to interpret the results. In the scatterplot, we can see the genes that are contributing the most to the pathway enrichment. ```{r ProgenySccater_1, message=FALSE, warning=FALSE} prog_matrix <- getModel("Human", top=100) %>% as.data.frame() %>% tibble::rownames_to_column("GeneID") ttop_KOvsWT_df <- ttop_KOvsWT_matrix %>% as.data.frame() %>% tibble::rownames_to_column("GeneID") scat_plots <- progeny::progenyScatter(df = ttop_KOvsWT_df, weight_matrix = prog_matrix, statName = "t_values", verbose = FALSE) ``` ```{r ProgenySccater_2, dpi=300} plot(scat_plots[[1]]$`MAPK`) ``` Progeny results can be used as an optional input for **CARNIVAL**. CARNIVAL sets weights based on **PROGENy** scores in each pathway-related node in order to find more relevant solutions. We therefore run **PROGENy** again with slightly different parameters, setting `z_scores = FALSE` so that **PROGENy** returns pathway activity values between 1 and -1, rather than converting to Z-Scores. ```{r ProgenyCARNIVAL} PathwayActivity_CARNIVALinput <- progeny(ttop_KOvsWT_matrix, scale=TRUE, organism="Human", top = 100, perm = 10000, z_scores = FALSE) %>% t () %>% as.data.frame() %>% tibble::rownames_to_column(var = "Pathway") colnames(PathwayActivity_CARNIVALinput)[2] <- "score" ``` PathwayActivity_CARNIVALinput can then be used as input for CARNIVAL. ## References > Schubert M, Klinger B, Klünemann M, Sieber A, Uhlitz F, Sauer S, Garnett MJ, Blüthgen N, Saez-Rodriguez J. “Perturbation-response genes reveal signaling footprints in cancer gene expression.” _Nature Communications_: [10.1038/s41467-017-02391-6](https://doi.org/10.1038/s41467-017-02391-6) > Holland CH, Szalai B, Saez-Rodriguez J. "Transfer of regulatory knowledge from human to mouse for functional genomics analysis." _Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms._ 2019. DOI: [10.1016/j.bbagrm.2019.194431](https://doi.org/10.1016/j.bbagrm.2019.194431). > Holland CH, Tanevski J, Perales-Patón J, Gleixner J, Kumar MP, Mereu E, Joughin BA, Stegle O, Lauffenburger DA, Heyn H, Szalai B, Saez-Rodriguez, J. "Robustness and applicability of transcription factor and pathway analysis tools on single-cell RNA-seq data." _Genome Biology._ 2020. DOI: [10.1186/s13059-020-1949-z](https://doi.org/10.1186/s13059-020-1949-z). ## Session Info Details ```{r sessionInfo, echo=FALSE, eval=TRUE} sessionInfo() ```