---
title: "JASPAR2022"
output: BiocStyle::html_document
author:
name: Damir Baranasic
affiliation: Imperial College London, Faculty of Medicine, Institute of Clinical Sciences, Hammersmith Campus, Du Cane Road, W12 0NN, London
bibliography: JASPAR2022.bib
vignette: >
%\VignetteIndexEntry{JASPAR2022}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
```{r, echo=FALSE, results="hide", warning=FALSE}
suppressPackageStartupMessages({
library(JASPAR2022)
library(TFBSTools)
})
```
# Introduction
JASPAR (http://jaspar.genereg.net/) is an open-access database containing manually curated, non-redundant transcription factor (TF) binding profiles for TFs across six taxonomic groups. In this 9th release, we expanded the CORE collection with 341 new profiles (148 for plants, 101 for vertebrates, 85 for urochordates, and 7 for insects), which corresponds to a 19% expansion over the previous release. We added 298 new profiles to the Unvalidated collection when no orthogonal evidence was found in the literature. All the profiles were clustered to provide familial binding profiles for each taxonomic group. Moreover, we revised the structural classification of DNA binding domains to consider plant-specific TFs. This release introduces word clouds to represent the scientific knowledge associated with each TF. We updated the genome tracks of TFBSs predicted with JASPAR profiles in eight organisms; the human and mouse TFBS predictions can be visualized as native tracks in the UCSC Genome Browser. Finally, we provide a new tool to perform JASPAR TFBS enrichment analysis in user-provided genomic regions. All the data is accessible through the JASPAR website, its associated RESTful API, the R/Bioconductor data package, and a new Python package, pyJASPAR, that facilitates serverless access to the data.
The easiest way to use the JASPAR2022 data package [@10.1093/nar/gkab1113] is by `TFBSTools` package interface [@Tan:2016], which provides functions to retrieve and manipulate data from the JASPAR database. This vignette demonstrates how to use those functions.
```{r setup}
library(JASPAR2022)
library(TFBSTools)
```
# Retrieving matrices from JASPAR2022 by ID or name
Matrices from JASPAR can be retrieved using either `getMatrixByID` or ` getMatrixByName` function by providing a matrix ID or a matrix name from JASPAR, respectively. These functions accept either a single element as the ID/name parameter or a vector of values. The former case returns a `PFMatrix` object, while the later one returns a `PFMatrixList` with multiple `PFMatrix` objects.
```{r example_name_id, tidy=TRUE}
## the user assigns a single matrix ID to the argument ID
pfm <- getMatrixByID(JASPAR2022, ID="MA0139.1")
## the function returns a PFMatrix object
pfm
```
The user can utilise the PFMatrix object for further analysis and visualisation. Here is an example of how to plot a sequence logo of a given matrix using functions available in `TFBSTools` package.
```{r seq_logo}
seqLogo(toICM(pfm))
```
```{r multiple_matrix_id}
## the user assigns multiple matrix IDs to the argument ID
pfmList <- getMatrixByID(JASPAR2022, ID=c("MA0139.1", "MA1102.1"))
## the function returns a PFMatrix object
pfmList
## PFMatrixList can be subsetted to extract enclosed PFMatrix objects
pfmList[[2]]
```
`getMatrixByName` retrieves matrices by name. If multiple matrix names are supplied, the function returns a PFMatrixList object.
```{r getMatrixByName_example}
pfm <- getMatrixByName(JASPAR2022, name="Arnt")
pfm
pfmList <- getMatrixByName(JASPAR2022, name=c("Arnt", "Ahr::Arnt"))
pfmList
```
# The use of filtering criteria
The `getMatrixSet` function fetches all matrices that match criteria defined by the named arguments, and it returns a `PFMatrixList` object.
```{r example_set, tidy=TRUE}
## select all matrices found in a specific species and constructed from the SELEX experiment
opts <- list()
opts[["species"]] <- 9606
opts[["type"]] <- "SELEX"
opts[["all_versions"]] <- TRUE
PFMatrixList <- getMatrixSet(JASPAR2022, opts)
PFMatrixList
## retrieve all matrices constructed from SELEX experiment
opts2 <- list()
opts2[["type"]] <- "SELEX"
PFMatrixList2 <- getMatrixSet(JASPAR2022, opts2)
PFMatrixList2
```
Additional details about TFBS matrix analysis can be found in the [TFBSTools](https://bioconductor.org/packages/release/bioc/html/TFBSTools.html) documantation.
# Session Info
Here is the output of `sessionInfo()` on the system on which this document was compiled:
```{r session_info, echo=FALSE}
sessionInfo()
```
# Bibliography