--- title: "Integration with other panels" author: - name: Kevin Rue-Albrecht affiliation: - University of Oxford email: kevin.rue-albrecht@imm.ox.ac.uk output: BiocStyle::html_document: self_contained: yes toc: true toc_float: true toc_depth: 2 code_folding: show date: "`r doc_date()`" package: "`r pkg_ver('iSEEpathways')`" vignette: > %\VignetteIndexEntry{Integration with other panels} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", crop = NULL ## Related to https://stat.ethz.ch/pipermail/bioc-devel/2020-April/016656.html ) ``` ```{r, eval=!exists("SCREENSHOT"), include=FALSE} SCREENSHOT <- function(x, ...) knitr::include_graphics(x) ``` ```{r vignetteSetup, echo=FALSE, message=FALSE, warning = FALSE} ## Track time spent on making the vignette startTime <- Sys.time() ## Bib setup library("RefManageR") ## Write bibliography information bib <- c( R = citation(), BiocStyle = citation("BiocStyle")[1], knitr = citation("knitr")[1], RefManageR = citation("RefManageR")[1], rmarkdown = citation("rmarkdown")[1], sessioninfo = citation("sessioninfo")[1], testthat = citation("testthat")[1], iSEEpathways = citation("iSEEpathways")[1] ) ``` # Scenario In this vignette, we use the example of a differential expression and pathway analysis workflow on a real data set to demonstrate how a selection made in a panel of pathway analysis results may be transmitted to other row-oriented panels in the `r BiocStyle::Biocpkg("iSEE")` application. # Demonstration ## Example data ## Experimental metadata We use the `?airway` data set. We briefly adjust the reference level of the treatment factor to the untreated condition. ```{r, message=FALSE, warning=FALSE} library("airway") data("airway") airway$dex <- relevel(airway$dex, "untrt") ``` ## Feature identifiers We also map the Ensembl gene identifiers to more recognisable gene symbols, setting row names to a unique identifier composed of either gene symbol, gene identifier, of a concatenate of both. Although not essential, this implicitly defines the primary piece of information displayed for genes in the live app. No information is lost in the process, as the original Ensembl identifier and the corresponding gene symbol are both stored in the `rowData()` of the object. ```{r, message=FALSE, warning=FALSE} library("org.Hs.eg.db") library("scater") rowData(airway)[["ENSEMBL"]] <- rownames(airway) rowData(airway)[["SYMBOL"]] <- mapIds(org.Hs.eg.db, rownames(airway), "SYMBOL", "ENSEMBL") rowData(airway)[["uniquifyFeatureNames"]] <- uniquifyFeatureNames( ID = rowData(airway)[["ENSEMBL"]], names = rowData(airway)[["SYMBOL"]] ) rownames(airway) <- rowData(airway)[["uniquifyFeatureNames"]] ``` ## Gene expression We also compute log-transformed counts, for a better visualisation of differential expression in the live app. ```{r} library("scuttle") airway <- logNormCounts(airway) ``` ## Differential expression analysis We run a standard Limma-Voom analysis using `limma::voom()`, `limma::lmFit()`, `limma::makeContrasts()`, and `limma::eBayes()`. ```{r, message=FALSE, warning=FALSE} library("edgeR") counts <- assay(airway, "counts") design <- model.matrix(~ 0 + dex + cell, data = colData(airway)) keep <- filterByExpr(counts, design) v <- voom(counts[keep,], design, plot=FALSE) fit <- lmFit(v, design) contr <- makeContrasts("dextrt - dexuntrt", levels = colnames(coef(fit))) tmp <- contrasts.fit(fit, contr) tmp <- eBayes(tmp) res_limma <- topTable(tmp, sort.by = "P", n = Inf) head(res_limma) ``` Then, we embed this set of differential expression results in the `?airway` object using the `iSEEde::embedContrastResults()` method. ```{r} library("iSEEde") airway <- iSEEde::embedContrastResults(res_limma, airway, name = "Limma-Voom", class = "limma") rowData(airway) ``` ## Pathways We prepare Gene Ontology gene sets of biological pathways using `r BiocStyle::Biocpkg("org.Hs.eg.db")`. Due to the use of `uniquifyFeatureNames()` above, we must first map pathway identifiers to the unique Ensembl gene identifier, to accurately perform pathway analysis using the feature identifiers matching those of the embedded differential expression results. ```{r} library("org.Hs.eg.db") pathways <- select(org.Hs.eg.db, keys(org.Hs.eg.db, "ENSEMBL"), c("GOALL"), keytype = "ENSEMBL") pathways <- subset(pathways, ONTOLOGYALL == "BP") pathways <- unique(pathways[, c("ENSEMBL", "GOALL")]) pathways <- merge(pathways, rowData(airway)[, c("ENSEMBL", "uniquifyFeatureNames")]) pathways <- split(pathways$uniquifyFeatureNames, pathways$GOALL) ``` ## Mapping pathways to genes Separately, we define and register a function that fetches the gene identifiers associated with a given pathway identifier. This function is required to transmit selections from pathway-level panels to feature-level panels. Due to the use of `uniquifyFeatureNames()` above, the function must first map to the unique Ensembl gene identifier, to accurately identify the corresponding value in `rownames(airway)`. ```{r} map_GO <- function(pathway_id, se) { pathway_ensembl <- mapIds(org.Hs.eg.db, pathway_id, "ENSEMBL", keytype = "GOALL", multiVals = "CharacterList")[[pathway_id]] pathway_rownames <- rownames(se)[rowData(se)[["gene_id"]] %in% pathway_ensembl] pathway_rownames } airway <- registerAppOptions(airway, Pathways.map.functions = list(GO = map_GO)) ``` ## Gene set enrichment analysis We run a standard GSEA analysis using `r BiocStyle::Biocpkg("fgsea")`. ```{r} library("fgsea") set.seed(42) stats <- na.omit(log2FoldChange(contrastResults(airway, "Limma-Voom"))) fgseaRes <- fgsea(pathways = pathways, stats = stats, minSize = 15, maxSize = 500) head(fgseaRes[order(pval), ]) ``` Then, we embed this set of pathway analysis results in the `airway` object, using the `?iSEEpathways::embedPathwaysResults` method. But first, we reorder the results by increasing p-value. Although not essential, this implicitly defines the default ordering of the table in the live app. ```{r} library("iSEEpathways") fgseaRes <- fgseaRes[order(pval), ] airway <- embedPathwaysResults( fgseaRes, airway, name = "fgsea (p-value)", class = "fgsea", pathwayType = "GO", pathwaysList = pathways, featuresStats = stats) airway ``` To showcase a choice of pathway analysis results in the live app, we repeat the process above, this time sorting by a different score that combines the log-transformed p-value and the absolute log-transformed fold-change. ```{r, warning=FALSE} stats <- na.omit( log2FoldChange(contrastResults(airway, "Limma-Voom")) * -log10(pValue(contrastResults(airway, "Limma-Voom"))) ) set.seed(42) fgseaRes <- fgsea(pathways = pathways, stats = na.omit(stats), minSize = 15, maxSize = 500) fgseaRes <- fgseaRes[order(pval), ] airway <- embedPathwaysResults( fgseaRes, airway, name = "fgsea (p-value & fold-change)", class = "fgsea", pathwayType = "GO", pathwaysList = pathways, featuresStats = stats) airway ``` ## Displaying additional pathway information For further user-friendliness in the live app, we define and register a function that displays details for the selected Gene Ontology gene set using the `r BiocStyle::Biocpkg("GO.db")` package. ```{r, warning=FALSE} library("GO.db") library("shiny") library("iSEE") go_details <- function(x) { info <- select(GO.db, x, c("TERM", "ONTOLOGY", "DEFINITION"), "GOID") html <- list(p(strong(info$GOID), ":", info$TERM, paste0("(", info$ONTOLOGY, ")"))) if (!is.na(info$DEFINITION)) { html <- append(html, list(p(info$DEFINITION))) } tagList(html) } airway <- registerAppOptions(airway, PathwaysTable.select.details = go_details) ``` ## Live app Finally, we configure the initial state and launch the live app. ```{r "start", message=FALSE, warning=FALSE} app <- iSEE(airway, initial = list( PathwaysTable(ResultName="fgsea (p-value)", Selected = "GO:0046324", PanelWidth = 4L), VolcanoPlot(RowSelectionSource = "PathwaysTable1", ColorBy = "Row selection", PanelWidth = 4L), ComplexHeatmapPlot(RowSelectionSource = "PathwaysTable1", PanelWidth = 4L, PanelHeight = 700L, CustomRows = FALSE, ColumnData = "dex", ClusterRows = TRUE, ClusterRowsDistance = "euclidean", AssayCenterRows = TRUE), FgseaEnrichmentPlot(ResultName="fgsea (p-value)", PathwayId = "GO:0046324", PanelWidth = 12L) )) if (interactive()) { shiny::runApp(app) } ``` ```{r, echo=FALSE, out.width="100%"} SCREENSHOT("screenshots/integration.png", delay=30) ``` # Trading off memory usage for speed {#speed-mapping} The function `map_GO()` that we defined earlier above -- to map a pathway identifier to a set of gene identifiers -- uses the `r BiocStyle::Biocpkg("org.Hs.eg.db")` package and the `?AnnotationDbi::select()` function. While memory-efficient, the repeated calls to the database of gene annotations introduce a bottleneck that limits the reactivity of the app. We can improve the speed of the app rendering by trading off an increased memory usage. Specifically, we can use the object `pathways` that we created earlier as a named list of pathway identifiers and character vectors of gene identifiers associated with each pathway, instead of querying the `GO.db` database. First, for the app to be self-contained, the list of pathways should be stored within the `airway` object itself. For instance, we store those in the `metadata()` of the `airway` object. ```{r} metadata(airway)[["pathways"]] <- list(GO = pathways) ``` Then, we can write a new, faster, function that fetches gene identifiers directly from that list rather than the database. The function should take as first argument a single pathway identifier, and the second argument must be called `se` to match the name of the `SummarizedExperiment` object used within the app. As a a trade off, the app now relies on the list of pathway annotations being available in the metadata of the `se` object, using additional memory for the benefit of dramatically faster access. ```{r} map_GO_v2 <- function(pathway_id, se) { pathway_list <- metadata(se)[["pathways"]][["GO"]] if (!pathway_id %in% names(pathway_list)) { warning("Pathway identifier %s not found.", sQuote(pathway_id)) return(character(0)) } pathway_list[[pathway_id]] } airway <- registerAppOptions(airway, Pathways.map.functions = list(GO = map_GO_v2)) ``` We can then launch a new instance of the app, using the same initial configuration, but the update `airway` object. ```{r "faster", message=FALSE, warning=FALSE} app <- iSEE(airway, initial = list( PathwaysTable(ResultName="fgsea (p-value)", Selected = "GO:0046324", PanelWidth = 4L), VolcanoPlot(RowSelectionSource = "PathwaysTable1", ColorBy = "Row selection", PanelWidth = 4L), ComplexHeatmapPlot(RowSelectionSource = "PathwaysTable1", PanelWidth = 4L, PanelHeight = 700L, CustomRows = FALSE, ColumnData = "dex", ClusterRows = TRUE, ClusterRowsDistance = "euclidean", AssayCenterRows = TRUE), FgseaEnrichmentPlot(ResultName="fgsea (p-value)", PathwayId = "GO:0046324", PanelWidth = 12L) )) if (interactive()) { shiny::runApp(app) } ``` ```{r, echo=FALSE, out.width="100%"} SCREENSHOT("screenshots/integration-faster.png", delay=20) ``` # Reproducibility The `r Biocpkg("iSEEpathways")` package `r Citep(bib[["iSEEpathways"]])` was made possible thanks to: * R `r Citep(bib[["R"]])` * `r Biocpkg("BiocStyle")` `r Citep(bib[["BiocStyle"]])` * `r CRANpkg("knitr")` `r Citep(bib[["knitr"]])` * `r CRANpkg("RefManageR")` `r Citep(bib[["RefManageR"]])` * `r CRANpkg("rmarkdown")` `r Citep(bib[["rmarkdown"]])` * `r CRANpkg("sessioninfo")` `r Citep(bib[["sessioninfo"]])` * `r CRANpkg("testthat")` `r Citep(bib[["testthat"]])` This package was developed using `r BiocStyle::Biocpkg("biocthis")`. Code for creating the vignette ```{r createVignette, eval=FALSE} ## Create the vignette library("rmarkdown") system.time(render("integration.Rmd", "BiocStyle::html_document")) ## Extract the R code library("knitr") knit("integration.Rmd", tangle = TRUE) ``` Date the vignette was generated. ```{r reproduce1, echo=FALSE} ## Date the vignette was generated Sys.time() ``` Wallclock time spent generating the vignette. ```{r reproduce2, echo=FALSE} ## Processing time in seconds totalTime <- diff(c(startTime, Sys.time())) round(totalTime, digits = 3) ``` `R` session information. ```{r reproduce3, echo=FALSE} ## Session info library("sessioninfo") options(width = 120) session_info() ``` # Bibliography This vignette was generated using `r Biocpkg("BiocStyle")` `r Citep(bib[["BiocStyle"]])` with `r CRANpkg("knitr")` `r Citep(bib[["knitr"]])` and `r CRANpkg("rmarkdown")` `r Citep(bib[["rmarkdown"]])` running behind the scenes. Citations made with `r CRANpkg("RefManageR")` `r Citep(bib[["RefManageR"]])`. ```{r vignetteBiblio, results = "asis", echo = FALSE, warning = FALSE, message = FALSE} ## Print bibliography PrintBibliography(bib, .opts = list(hyperlink = "to.doc", style = "html")) ```