---
title: "JASPAR2020"
output: BiocStyle::html_document
author:
  name: Damir Baranasic
  affiliation: Imperial College London, Faculty of Medicine, Institute of Clinical Sciences, Hammersmith Campus, Du Cane Road, W12 0NN, London
bibliography: JASPAR2020.bib
vignette: >
  %\VignetteIndexEntry{JASPAR2020}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r, echo=FALSE, results="hide", warning=FALSE}
suppressPackageStartupMessages({
  library(JASPAR2020)
  library(TFBSTools)
})
```

# Introduction

JASPAR (http://jaspar.genereg.net) is an open-access database of curated, non-redundant transcription factor (TF)-binding profiles stored as position frequency matrices (PFMs) for TFs across multiple species in six taxonomic groups. In this 8th release of JASPAR, the CORE collection has been expanded with 245 new PFMs (169 for vertebrates, 42 for plants, 17 for nematodes, 10 for insects, and 7 for fungi), and 156 PFMs were updated (125 for vertebrates, 28 for plants, and 3 for insects). These new profiles represent an 18% expansion compared to the previous release. JASPAR 2020 comes with a novel collection of unvalidated TF-binding profiles for which our curators did not find orthogonal supporting evidence in the literature. This collection has a dedicated web form to engage the community in the curation of unvalidated TF-binding profiles.

The easiest way to use the JASPAR2020 data package [@Fornes:2019] is through the `TFBSTools` package interface [@Tan:2016], which provides functions to retrieve and manipulate data from the JASPAR database. This vignette demonstrates how to use those functions.

```{r setup}
library(JASPAR2020)
library(TFBSTools)
```

# Retrieving matrices from JASPAR2020 by ID or name

Matrices from JASPAR can be retrieved using either the `getMatrixByID` or the `getMatrixByName` function by providing a matrix ID or a matrix name from JASPAR, respectively.
These functions accept either a single element as the ID/name parameter or a vector of values. The former case returns a `PFMatrix` object, while the latter returns a `PFMatrixList` containing multiple `PFMatrix` objects.

```{r example_name_id, tidy=TRUE}
## the user assigns a single matrix ID to the argument ID
pfm <- getMatrixByID(JASPAR2020, ID="MA0139.1")
## the function returns a PFMatrix object
pfm
```

The user can utilise the `PFMatrix` object for further analysis and visualisation. Here is an example of how to plot a sequence logo of a given matrix using functions available in the `TFBSTools` package.

```{r seq_logo}
seqLogo(toICM(pfm))
```

```{r multiple_matrix_id}
## the user assigns multiple matrix IDs to the argument ID
pfmList <- getMatrixByID(JASPAR2020, ID=c("MA0139.1", "MA1102.1"))
## the function returns a PFMatrixList object
pfmList
## PFMatrixList can be subsetted to extract enclosed PFMatrix objects
pfmList[[2]]
```

`getMatrixByName` retrieves matrices by name. If multiple matrix names are supplied, the function returns a `PFMatrixList` object.

```{r getMatrixByName_example}
pfm <- getMatrixByName(JASPAR2020, name="Arnt")
pfm

pfmList <- getMatrixByName(JASPAR2020, name=c("Arnt", "Ahr::Arnt"))
pfmList
```

# The use of filtering criteria

The `getMatrixSet` function fetches all matrices that match the criteria defined by the named elements of an options list, and it returns a `PFMatrixList` object.
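The criteria are passed as a named list. As a minimal sketch (a plain code block, not evaluated when the vignette is built), the list can also be written in a single `list()` call. The field names are the ones used in this vignette; the `requireNamespace()` guard is an addition for illustration only, so that the snippet can be run even where the packages are not installed.

```r
# Build the filtering criteria in one call instead of element by element.
opts <- list(
  species = 9606,       # NCBI taxonomy ID for Homo sapiens
  type = "SELEX",       # experiment type the profiles were derived from
  all_versions = TRUE   # keep every version of each matrix
)

# Query the database only if both packages are available (illustrative guard).
if (requireNamespace("JASPAR2020", quietly = TRUE) &&
    requireNamespace("TFBSTools", quietly = TRUE)) {
  library(JASPAR2020)
  library(TFBSTools)
  matrices <- getMatrixSet(JASPAR2020, opts)
  print(length(matrices))  # number of matching matrices
}
```

Writing the list in one call keeps all criteria visible together; assigning elements one at a time is equivalent.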
```{r example_set, tidy=TRUE}
## select all matrices for a specific species that were derived from SELEX experiments
opts <- list()
opts[["species"]] <- 9606
opts[["type"]] <- "SELEX"
opts[["all_versions"]] <- TRUE
PFMatrixList <- getMatrixSet(JASPAR2020, opts)
PFMatrixList

## retrieve all matrices derived from SELEX experiments, regardless of species
opts2 <- list()
opts2[["type"]] <- "SELEX"
PFMatrixList2 <- getMatrixSet(JASPAR2020, opts2)
PFMatrixList2
```

Additional details about TFBS matrix analysis can be found in the [TFBSTools](https://bioconductor.org/packages/release/bioc/html/TFBSTools.html) documentation.

# Session Info

Here is the output of `sessionInfo()` on the system on which this document was compiled:

```{r session_info, echo=FALSE}
sessionInfo()
```

# Bibliography