--- title: "TxRegInfra: support for TxRegQuery" author: "Vincent J. Carey, stvjc at channing.harvard.edu" date: "`r format(Sys.time(), '%B %d, %Y')`" vignette: > %\VignetteEngine{knitr::rmarkdown} %\VignetteIndexEntry{TxRegInfra -- classes and methods for TxRegQuery} %\VignetteEncoding{UTF-8} output: BiocStyle::html_document: highlight: pygments number_sections: yes theme: united toc: yes --- ```{r setup,echo=FALSE,results="hide", eval=TRUE} suppressPackageStartupMessages({ library(TxRegInfra) library(GenomicFiles) }) ``` # Introduction TxRegQuery addresses exploration of transcriptional regulatory networks by integrating data on eQTL, digital genomic footprinting (DGF), DnaseI hypersensitivity binding data (DHS), and transcription factor binding site (TFBS) data. Owing to the volume of emerging tissue-specific data, special data modalities are used. # Managing bed file content with mongodb ## Querying the `txregnet` database We have a long-running server that will respond to queries. We focus on `r CRANpkg("mongolite")` as the interface. ### The connection ```{r lkmong, eval=TRUE} suppressPackageStartupMessages({ library(TxRegInfra) library(mongolite) library(Gviz) library(EnsDb.Hsapiens.v75) library(BiocParallel) register(SerialParam()) }) con1 = mongo(url=URL_txregInAWS(), db="txregnet") con1 ``` We will write methods that work with the 'fields' of this object. There is not much explicit reflectance in the mongolite API. The following is improvised and may be fragile: ```{r lkpar, eval=TRUE} parent.env(con1)$orig ``` ### Queries and aggregation If the `mongo` utility is available as a system command, we can get a list of collections in the database as follows. ```{r getl, eval=TRUE} if (verifyHasMongoCmd()) { head(c1 <- listAllCollections(url=URL_txregInAWS(), db="txregnet")) } ``` Otherwise, as long as `r CRANpkg("mongolite")` is installed, as long as we know the collection names of interest, we can use them as noted throughout this vignette. We can get a record from a given collection: ```{r getl2, eval=TRUE} mongo(url=URL_txregInAWS(), db="txregnet", collection="Adipose_Subcutaneous_allpairs_v7_eQTL")$find(limit=1) ``` Queries can be composed using JSON. We have a tool to generate queries that employ the mongodb aggregation method. Here we demonstrate this by computing, for each chromosome, the count and minimum values of the footprint statistic on CD14 cells. ```{r doagg, eval=TRUE} m1 = mongo(url = URL_txregInAWS(), db = "txregnet", collection="CD14_DS17215_hg19_FP") newagg = makeAggregator( by="chr", vbl="stat", op="$min", opname="min") ``` The JSON layout of this aggregating query is ``` [ { "$group": { "_id": ["$chr"], "count": { "$sum": [1] }, "min": { "$min": ["$stat"] } } } ] ``` Invocation returns a data frame: ```{r lkagggg, eval=TRUE} head(m1$aggregate(newagg)) ``` # An integrative container We need to bind the metadata and information about the mongodb. ## Sample metadata The following turns a very ad hoc filtering of the collection names into a DataFrame. ```{r getcold, eval=TRUE} # cd = makeColData() # works when mongo does cd = TxRegInfra::basicColData head(cd,2) ``` ## Extended RaggedExperiment ```{r domor1, eval=TRUE} rme0 = RaggedMongoExpt(con1, colData=cd) rme1 = rme0[, which(cd$type=="FP")] ``` A key method in development is subsetting the archive by genomic coordinates. ```{r lksb, cache=TRUE, eval=TRUE} s1 = sbov(rme1, GRanges("chr17", IRanges(38.07e6,38.09e6))) s1 dim(sa <- sparseAssay(s1, 3)) # compact gives segfault sa[953:956,c("fLung_DS14724_hg19_FP", "fMuscle_arm_DS17765_hg19_FP")] ``` # Visualizing coincidence ```{r mym, eval=TRUE} ormm = txmodels("ORMDL3", plot=FALSE, name="ORMDL3") sar = strsplit(rownames(sa), ":|-") an = as.numeric gr = GRanges(seqnames(ormm)[1], IRanges(an(sapply(sar,"[", 2)), an(sapply(sar,"[", 3)))) gr1 = gr gr1$score = 1-sa[,1] gr2 = gr gr2$score = 1-sa[,2] sc1 = DataTrack(gr1, name="Lung FP") sc2 = DataTrack(gr2, name="Musc/Arm FP") plotTracks(list(GenomeAxisTrack(), sc1, sc2, ormm), showId=TRUE) ```