---
title: "Introduction to CTCF"
author:
  - name: Mikhail Dozmorov
    affiliation:
    - Virginia Commonwealth University
    email: mikhail.dozmorov@gmail.com
output:
  BiocStyle::html_document:
    self_contained: yes
    toc: true
    toc_float: true
    toc_depth: 2
    code_folding: show
date: "`r doc_date()`"
package: "`r pkg_ver('CTCF')`"
vignette: >
  %\VignetteIndexEntry{Introduction to CTCF}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>",
    crop = NULL, ## Related to https://stat.ethz.ch/pipermail/bioc-devel/2020-April/016656.html
    warning = FALSE
)
```

`CTCF` defines an AnnotationHub resource representing genomic coordinates of FIMO-predicted CTCF binding sites with motif MA0139.1 (JASPAR).

- Human (hg19, hg38) and mouse (mm9, mm10) genomes.
- The binding sites were detected with the FIMO tool of the MEME suite using default settings.
- Extra columns include motif name (MA0139.1), score, p-value, q-value, and the motif sequence.

## Installation instructions

Get the latest stable `R` release from [CRAN](http://cran.r-project.org/). Then install `CTCF` from [Bioconductor](http://bioconductor.org/) using the following code:

```{r 'install', eval = FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("CTCF")
```

## Example

```{r get-data}
suppressMessages(library(AnnotationHub))
ah <- AnnotationHub()
query_data <- query(ah, "CTCF")
query_data
```

The FIMO-predicted CTCF sites are named "CTCF_" followed by the genome assembly, e.g., "CTCF_hg38". Use `query_data <- query(ah, "CTCF_hg38")` for a more targeted search.

We can check the details about the object.

```{r}
query_data["AH95566"]
```

And retrieve the object.

```{r}
CTCF_hg38 <- query_data[["AH95566"]]
CTCF_hg38
```

Note that the default q-value cutoff is 0.5.
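
As a rough illustration of how the choice of cutoff changes the number of retained sites, the sketch below counts q-values under several thresholds. The helper `count_below()` and the toy values are hypothetical and are not part of the `CTCF` package; with the real object, `CTCF_hg38$q.value` would take the place of `toy_q`.

```{r cutoff-sketch}
# Hypothetical helper (not part of the CTCF package): count how many
# q-values fall strictly below each candidate cutoff.
count_below <- function(q, cutoffs) {
    vapply(cutoffs, function(cutoff) sum(q < cutoff), integer(1))
}

# Toy q-values for illustration only; these are not real FIMO output.
toy_q <- c(0.01, 0.05, 0.12, 0.28, 0.31, 0.40, 0.45, 0.49)
cutoffs <- c(0.1, 0.3, 0.5)

# Number of toy sites that would be kept at each cutoff
n_kept <- count_below(toy_q, cutoffs)
names(n_kept) <- cutoffs
n_kept
```
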
Looking at the q-value distribution:

```{r echo=FALSE}
knitr::include_graphics("../man/figures/CTCF_hg38_qvalue.png")
```

one may decide to use a more stringent cutoff. For example, keeping only sites with a q-value below 0.3 removes more than half of the predicted sites. The remaining sites may be considered high-confidence CTCF sites.

```{r}
# Check length before filtering
length(CTCF_hg38)
# Filter and check length after filtering
CTCF_hg38 <- CTCF_hg38[CTCF_hg38$q.value < 0.3]
length(CTCF_hg38)
```

## CTCF GRanges for other organisms

```{r eval = FALSE}
# hg19 CTCF coordinates
CTCF_hg19 <- query_data[["AH95565"]]
# mm9 CTCF coordinates
CTCF_mm9 <- query_data[["AH95567"]]
# mm10 CTCF coordinates
CTCF_mm10 <- query_data[["AH95568"]]
```

See [../inst/scripts/make-data.R](inst/scripts/make-data.R) for how to create the CTCF GRanges objects.

## Citation

Below is the citation output from running `citation('CTCF')` in R. Please run this yourself to check for any updates on how to cite __CTCF__.

```{r 'citation', eval = requireNamespace('CTCF')}
print(citation("CTCF"), bibtex = TRUE)
```

Date the vignette was generated.

```{r reproduce1, echo=FALSE}
## Date the vignette was generated
Sys.time()
```

`R` session information.

```{r reproduce3, echo=FALSE}
## Session info
library("sessioninfo")
options(width = 120)
session_info()
```