--- title: "Get started" author: "Kim Philipp Jablonski, Martin Pirkl" date: "`r Sys.Date()`" graphics: yes output: BiocStyle::html_document bibliography: bibliography.bib vignette: > %\VignetteIndexEntry{Get started} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- # Overview One cause of diseases like cancer is the deregulation of signalling pathways. The interaction of two or more genes is changed and cell behaviour is changed in the malignant tissue. The estimation of causal effects from observational data has previously been used to elucidate gene interactions. We extend this notion to compute Differential Causal Effects (DCE). We compare the causal effects between two conditions, such as a malignant tissue (e.g., from a tumor) and a healthy tissue to detect differences in the gene interactions. However, computing causal effects solely from given observations is difficult, because it requires reconstructing the gene network beforehand. To overcome this issue, we use prior knowledge from literature. This largely improves performance and makes the estimation of DCEs more accurate. Overall, we can detect pathways which play a prominent role in tumorigenesis. We can even pinpoint specific interaction in the pathway that make a large contribution to the rise of the disease. # Installation ```{r eval=FALSE} if (!require("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("dce") ``` # Load required packages Load `dce` package and other required libraries. ```{r message=FALSE} # fix "object 'guide_edge_colourbar' of mode 'function' was not found" # when building vignettes # (see also https://github.com/thomasp85/ggraph/issues/75) library(ggraph) library(curatedTCGAData) library(TCGAutils) library(SummarizedExperiment) library(tidyverse) library(cowplot) library(graph) library(dce) set.seed(42) ``` # Introductory example To demonstrate the basic idea of Differential Causal Effects (DCEs), we first artificially create a wild-type network by setting up its adjacency matrix. The specified edge weights describe the direct causal effects and total causal effects are defined accordingly [@pearl2010causal]. In this way, the detected dysregulations are endowed with a causal interpretation and spurious correlations are ignored. This can be achieved by using valid adjustment sets, assuming that the underlying network indeed models causal relationships accurately. In a biological setting, these networks correspond, for example, to a KEGG pathway [@kanehisa2004kegg] in a healthy cell. Here, the edge weights correspond to proteins facilitating or inhibiting each others expression levels. ```{r} graph_wt <- matrix(c(0, 0, 0, 1, 0, 0, 1, 1, 0), 3, 3) rownames(graph_wt) <- colnames(graph_wt) <- c("A", "B", "C") graph_wt ``` In case of a disease, these pathways can become dysregulated. This can be expressed by a change in edge weights. ```{r} graph_mt <- graph_wt graph_mt["A", "B"] <- 2.5 # dysregulation happens here! graph_mt cowplot::plot_grid( plot_network(graph_wt, edgescale_limits = c(-3, 3)), plot_network(graph_mt, edgescale_limits = c(-3, 3)), labels = c("WT", "MT") ) ``` By computing the counts based on the edge weights (root nodes are randomly initialized), we can generate synthetic expression data for each node in both networks. Both `X_wt` and `X_mt` then induce causal effects as defined in their respective adjacency matrices. ```{r} X_wt <- simulate_data(graph_wt) X_mt <- simulate_data(graph_mt) X_wt %>% head ``` Given the network topology (without edge weights!) and expression data from both WT and MT conditions, we can estimate the difference in causal effects for each edge between the two conditions. These are the aforementioned Differential Causal Effects (DCEs). ```{r} res <- dce(graph_wt, X_wt, X_mt, solver = "lm") res %>% as.data.frame %>% drop_na ``` Visualizing the result shows that we can recover the dysregulation of the edge from `A` to `B`. Note that since we are computing total causal effects, the causal effect of `A` on `C` has changed as well. ```{r} plot(res) + ggtitle("Differential Causal Effects between WT and MT condition") ``` # Application to real data Pathway dysregulations are a common cancer hallmark [@hanahan2011hallmarks]. It is thus of interest to investigate how the causal effect magnitudes in relevant pathways vary between normal and tumor samples. ## Retrieve gene expression data As a showcase, we download breast cancer (BRCA) RNA transcriptomics profiling data from TCGA [@tomczak2015cancer]. ```{r} brca <- curatedTCGAData( diseaseCode = "BRCA", assays = c("RNASeq2*"), version = "2.0.1", dry.run = FALSE ) ``` This will retrieve all available samples for the requested data sets. These samples can be classified according to their site of origin. ```{r} sampleTables(brca) data(sampleTypes, package = "TCGAutils") sampleTypes %>% filter(Code %in% c("01", "06", "11")) ``` We can extract Primary Solid Tumor and matched Solid Tissue Normale samples. ```{r} # split assays brca_split <- splitAssays(brca, c("01", "11")) # only retain matching samples brca_matched <- as(brca_split, "MatchedAssayExperiment") brca_wt <- assay(brca_matched, "01_BRCA_RNASeq2GeneNorm-20160128") brca_mt <- assay(brca_matched, "11_BRCA_RNASeq2GeneNorm-20160128") ``` ## Retrieve biological pathway of interest KEGG [@kanehisa2004kegg] provides the breast cancer related pathway `hsa05224`. It can be easily retrieved using `dce`. ```{r} pathways <- get_pathways(pathway_list = list(kegg = c("Breast cancer"))) brca_pathway <- pathways[[1]]$graph ``` Luckily, it shares all genes with the cancer data set. ```{r} shared_genes <- intersect(nodes(brca_pathway), rownames(brca_wt)) glue::glue( "Covered nodes: {length(shared_genes)}/{length(nodes(brca_pathway))}" ) ``` ## Estimate Differential Causal Effects We can now estimate the differences in causal effects between matched tumor and normal samples on a breast cancer specific pathway. ```{r warning=FALSE} res <- dce::dce(brca_pathway, t(brca_wt), t(brca_mt), solver = "lm") ``` Interpretations may now begin. ```{r} res %>% as.data.frame %>% drop_na %>% arrange(desc(abs(dce))) %>% head plot(res, nodesize = 20, labelsize = 1, use_symlog = TRUE) ``` # Session information ```{r} sessionInfo() ``` # References