--- title: "GRaNIE Workflow Example" author: "Christian Arnold" date: "`r BiocStyle::doc_date()`" package: "`r BiocStyle::pkg_ver('GRaNIE')`" abstract: > In this vignette, we present GRaNIE (**G**ene **R**egul**a**tory **N**etwork **I**nference including **E**nhancers), a framework to reconstruct predictive enhancer-mediated regulatory network models that are based on integrating of expression and chromatin accessibility/activity pattern across individuals, and provide a comprehensive resource of cell-type specific gene regulatory networks for particular cell types. For an extended biological motivation, see the first section below. In the following, we summarize how to use the `GRaNIE` package in a real-world example, illustrate most features and how to work with a `GRaNIE` object. Importantly, this vignette will be continuously updated whenever new functionality becomes available or when we receive user feedback. vignette: > %\VignetteIndexEntry{Workflow example} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} output: BiocStyle::html_document: code_folding: hide --- ```{css, echo=FALSE} pre { max-height: 300px; overflow-y: auto; } pre[class] { max-height: 100px; } ``` ```{css, echo=FALSE} .scroll-100 { max-height: 100px; overflow-y: auto; background-color: inherit; } ``` ```{css, echo=FALSE} .scroll-200 { max-height: 200px; overflow-y: auto; background-color: inherit; } ``` ```{css, echo=FALSE} .scroll-300 { max-height: 300px; overflow-y: auto; background-color: inherit; } ``` # Motivation and Summary Genetic variants associated with diseases often affect non-coding regions, thus likely having a regulatory role. To understand the effects of genetic variants in these regulatory regions, identifying genes that are modulated by specific regulatory elements (REs) is crucial. The effect of gene regulatory elements, such as enhancers, is often cell-type specific, likely because the combinations of transcription factors (TFs) that are regulating a given enhancer have cell-type specific activity. This TF activity can be quantified with existing tools such as `diffTF` and captures differences in binding of a TF in open chromatin regions. Collectively, this forms a enhancer-mediated gene regulatory network (`eGRN`) with cell-type and data-specific TF-RE and RE-gene links. Here, we reconstruct such a `eGRN` using bulk RNA-seq and open chromatin (e.g., using ATAC-seq or ChIP-seq for open chromatin marks) and optionally TF activity data. Our network contains different types of links, connecting TFs to regulatory elements, the latter of which is connected to genes in the vicinity or within the same chromatin domain (*TAD*). We use a statistical framework to assign empirical FDRs and weights to all links using a background-based approach. **For a more detailed description of the package, its mode of action, guidelines, recommendations, limitations, scope, etc,please see the [Package Details Vignette on the GRaNIE website](https://grp-zaugg.embl-community.io/GRaNIE/articles/GRaNIE_packageDetails.html).** # Example data Before we start with the package, let's retrieve some example data! For the purpose of this vignette, the data we will use is taken from [here](https://zenodo.org/record/1188300#.X370PXUzaSN) [1](https://doi.org/10.1016/j.celrep.2019.10.106 "The original publication explaining the method and motivation in detail"), has been minimally processed to meet the requirements of the `GRaNIE` package and consists of the following files: - ATAC-seq peaks, raw counts (originally around 75,000, genome-wide, here filtered to around 60,500) - RNA-Seq data, raw counts (originally for around 35,000 genes, here filtered to around 19,000) - sample metadata with additional sample-specific information In general, the dataset is from human macrophages (both naive and IFNg primed) of healthy individuals and various stimulations / infections (naive vs primed and infected with SL1344 vs not), with 4 groups in total: control/infected(SL1344) and naive/primed(IFNg). However, here for the example data, all \~30 samples are from IFNg primed and infected cells (as summarized as `IFNg_SL1344` in the sample metadata column `condition`). Furthermore, the example dataset is accompanied by the following files: - genome-wide transcription factor binding site predictions for 6 selected TFs, along with a translation table to link TF names to their corresponding Ensembl IDs. For each TF, a gzipped BED file has been created with predicted TF binding sites. The files have been generated with `PWMScan` and the `HOCOMOCO` database, see [2](https://difftf.readthedocs.io "the *ReadTheDocs* help for *diffTF*") for details. # Example Workflow In the following example, you will use the example data to construct a `eGRN` from ATAC-seq, RNA-seq data as well transcription factor (TF) data. ```{r Summary of the community enrichment", class.output="scroll-200"} GRN = plotCommunitiesEnrichment(GRN, plotAsPDF = FALSE, pages = c(5)) ``` ### TF enrichment analyses In analogy to community enrichment, we can also calculate enrichment based on TFs via their target genes they are connected to. Running a TF enrichment analyses is straight forward, with the parameter `n` we can control the number of TFs to run the enrichment for - the function runs the enrichment for the top connected TFs. Thus, `n=3` equals running the enrichment for the top 3 connected TFs. Here, we show the results for one of the TFs, *EGR1.0.A*, as well as a summary across all top 3 connected TFs, in analogy the results for the community enrichment. Let's first calculate the TF enrichment.This may take a while. We omit the output here. ```{r TFEnrichmentCal, echo=FALSE, eval = FALSE, class.output="scroll-200"} GRN = calculateTFEnrichment(GRN, ontology = "GO_BP") ``` Now, we can plot it, similar to the enrichment results before: ```{r TFEnrichment, echo=TRUE, fig.cap="Enrichment summary for EGR1.0.A", fig.height=7, fig.width = 12, class.output="scroll-200"} GRN = plotTFEnrichment(GRN, plotAsPDF = FALSE, n = 3, pages = c(1)) ``` ```{r TFEnrichment2, echo=TRUE, fig.cap="Enrichment summary for selected TFs and the whole eGRN network", fig.height=7, fig.width = 15, class.output="scroll-200"} GRN = plotTFEnrichment(GRN, plotAsPDF = FALSE, n = 3, pages = c(5)) ``` ## Wrapping up We are now finished with the main workflow, all that is left to do is to save our `GRaNIE` object to disk so we can load it at a later time point without having to repeat the analysis. We recommend to run the convenience function `deleteIntermediateData()` beforehand that aims to reduce its size by deleting some intermediate data that may still be stored within the object. For more parameter details, see the R help (`?deleteIntermediateData`). Finally, as we did already in the middle of the workflow, we save the object finally in rds format. ```{r saveObject3, echo=TRUE, include=TRUE, eval=FALSE, class.output="scroll-200"} GRN = deleteIntermediateData(GRN) saveRDS(GRN, GRN_file_outputRDS) ``` # How to continue? From here on, possibilities are endless, and you can further investigate patterns and trends in the data! We hope that the `GRaNIE` package is useful for your research and encourage you to contact us if you have any question or feature request! # Session Info ```{r, class.output="scroll-200"} sessionInfo() ```