--- title: "Continuous Data Analysis" shorttitle: "Continuous Data" package: knowYourCG output: rmarkdown::html_vignette fig_width: 6 fig_height: 5 vignette: > %\VignetteEngine{knitr::rmarkdown} %\VignetteIndexEntry{"3. Continuous Variable Enrichment Analysis"} %\VignetteEncoding{UTF-8} --- There are four testing scenarios depending on the type format of the query set and database sets. They are shown with the respective testing scenario in the table below. `testEnrichment`, `testEnrichmentSEA` are for Fisher's exact test and Set Enrichment Analysis respectively. ```{r ky9, echo = FALSE, results="asis"} library(knitr) df = data.frame( c("Correlation-based","Set Enrichment Analysis"), c("Set Enrichment Analysis","Fisher's Exact Test") ) colnames(df) <- c("Continuous Database Set", "Discrete Database Set") rownames(df) <- c("Continuous Query", "Discrete Query") kable(df, caption="Four knowYourCG Testing Scenarios") ``` # CONTINUOUS VARIABLE ENRICHMENT The query may be a named continuous vector. In that case, either a gene enrichment score will be calculated (if the database is discrete) or a Spearman correlation will be calculated (if the database is continuous as well). The three other cases are shown below using biologically relevant examples. To display this functionality, let's load two numeric database sets individually. One is a database set for CpG density and the other is a database set corresponding to the distance of the nearest transcriptional start site (TSS) to each probe. ```{r ky21, run-test-data, echo=TRUE, eval=TRUE, message=FALSE} library(knowYourCG) query <- getDBs("KYCG.MM285.designGroup")[["TSS"]] ``` ```{r ky22, echo=TRUE, eval=TRUE, message=FALSE} sesameDataCache(data_titles = c("KYCG.MM285.seqContextN.20210630")) res <- testEnrichmentSEA(query, "MM285.seqContextN") main_stats <- c("dbname", "test", "estimate", "FDR", "nQ", "nD", "overlap") res[,main_stats] ``` The estimate here is enrichment score. > **NOTE:** Negative enrichment score suggests enrichment of the categorical database with the higher values (in the numerical database). Positive enrichment score represent enrichment with the smaller values. As expected, the designed TSS CpGs are significantly enriched in smaller TSS distance and higher CpG density. Alternatively one can test the enrichment of a continuous query with discrete databases. Here we will use the methylation level from a sample as the query and test it against the chromHMM chromatin states. ```{r ky23, warning=FALSE, eval=TRUE,message=FALSE} library(sesame) sesameDataCache(data_titles = c("MM285.1.SigDF")) beta_values <- getBetas(sesameDataGet("MM285.1.SigDF")) res <- testEnrichmentSEA(beta_values, "MM285.chromHMM") main_stats <- c("dbname", "test", "estimate", "FDR", "nQ", "nD", "overlap") res[,main_stats] ``` As expected, chromatin states `Tss`, `Enh` has negative enrichment score, meaning these databases are associated with small values of the query (DNA methylation level). On the contrary, `Het` and `Quies` states are associated with high methylation level. # SESSION INFO ```{r} sessionInfo() ```