--- title: "VplotR" author: "Jacques Serizay" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{VplotR} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, eval = TRUE, echo=FALSE, results="hide", warning=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) suppressPackageStartupMessages({ library(GenomicRanges) library(ggplot2) library(VplotR) }) ``` - [Introduction](#introduction) - [Overview](#overview) - [Installation](#installation) - [Importing sequencing datasets](#importing-sequencing-datasets) - [Using `importPEBamFiles()` function](#using-importpebamfiles-function) - [Provided datasets](#provided-datasets) - [Fragment size distribution](#fragment-size-distribution) - [Vplot(s)](#vplots) - [Single Vplot](#single-vplot) - [Multiple Vplots](#multiple-vplots) - [Vplots normalization](#vplots-normalization) - [Footprints](#footprints) - [Local fragment distribution](#local-fragment-distribution) - [Session Info](#session-info) ## Introduction ### Overview VplotR is an R package streamlining the process of generating V-plots, i.e. two-dimensional paired-end fragment density plots. It contains functions to import paired-end sequencing bam files from any type of DNA accessibility experiments (e.g. ATAC-seq, DNA-seq, MNase-seq) and can produce V-plots and one-dimensional footprint profiles over single or aggregated genomic loci of interest. The R package is well integrated within the Bioconductor environment and easily fits in standard genomic analysis workflows. Integrating V-plots into existing analytical frameworks has already brought additional insights in chromatin organization (Serizay et al., 2020). The main user-level functions of VplotR are `getFragmentsDistribution()`, `plotVmat()`, `plotFootprint()` and `plotProfile()`. * `getFragmentsDistribution()` computes the distribution of fragment sizes over sets of genomic ranges; * `plotVmat()` is used to compute fragment density and generate V-plots; * `plotFootprint()` generates the MNase-seq or ATAC-seq footprint at a set of genomic ranges. * `plotProfile()` is used to plot the distribution of paired-end fragments at a single locus of interest. ### Installation VplotR can be installed from Bioconductor: ```{r eval = FALSE} if(!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("VplotR") library("VplotR") ``` ## Importing sequencing datasets ### Using `importPEBamFiles()` function Paired-end .bam files are imported using the `importPEBamFiles()` function as follows: ```{r eval = TRUE} library(VplotR) bamfile <- system.file("extdata", "ex1.bam", package = "Rsamtools") fragments <- importPEBamFiles( bamfile, shift_ATAC_fragments = TRUE ) fragments ``` ### Provided datasets Several datasets are available for this package: * Sets of tissue-specific ATAC-seq experiments in young adult C. elegans (Serizay et al., 2020): ```{r, eval = TRUE} data(ce11_proms) ce11_proms data(ATAC_ce11_Serizay2020) ATAC_ce11_Serizay2020 ``` * MNase-seq experiment in yeast (Henikoff et al., 2011) and ABF1 binding sites: ```{r, eval = TRUE} data(ABF1_sacCer3) ABF1_sacCer3 data(MNase_sacCer3_Henikoff2011) MNase_sacCer3_Henikoff2011 ``` ## Fragment size distribution A preliminary control to check the distribution of fragment sizes (regardless of their location relative to genomic loci) can be performed using the `getFragmentsDistribution()` function. ```{r, eval = TRUE} df <- getFragmentsDistribution( MNase_sacCer3_Henikoff2011, ABF1_sacCer3 ) p <- ggplot(df, aes(x = x, y = y)) + geom_line() + theme_ggplot2() p ``` ## Vplot(s) ### Single Vplot Once data is imported, a V-plot of paired-end fragments over loci of interest is generated using the `plotVmat()` function: ```{r, eval = TRUE} p <- plotVmat(x = MNase_sacCer3_Henikoff2011, granges = ABF1_sacCer3) p ``` ### Multiple Vplots The generation of multiple V-plots can be parallelized using a list of parameters: ```{r, eval = TRUE} list_params <- list( "MNase\n@ ABF1" = list(MNase_sacCer3_Henikoff2011, ABF1_sacCer3), "MNase\n@ random loci" = list( MNase_sacCer3_Henikoff2011, sampleGRanges(ABF1_sacCer3) ) ) p <- plotVmat( list_params, cores = 1 ) p ``` For instance, ATAC-seq fragment density can be visualized at different classes of ubiquitous and tissue-specific promoters in *C. elegans*. ```{r, eval = TRUE} list_params <- list( "Germline ATACseq\n@ Ubiq. proms" = list( ATAC_ce11_Serizay2020[['Germline']], ce11_proms[ce11_proms$which.tissues == 'Ubiq.'] ), "Germline ATACseq\n@ Germline proms" = list( ATAC_ce11_Serizay2020[['Germline']], ce11_proms[ce11_proms$which.tissues == 'Germline'] ), "Neuron ATACseq\n@ Ubiq. proms" = list( ATAC_ce11_Serizay2020[['Neurons']], ce11_proms[ce11_proms$which.tissues == 'Ubiq.'] ), "Neuron ATACseq\n@ Neuron proms" = list( ATAC_ce11_Serizay2020[['Neurons']], ce11_proms[ce11_proms$which.tissues == 'Neurons'] ) ) p <- plotVmat( list_params, cores = 1, nrow = 2, ncol = 5 ) p ``` ### Vplots normalization Different normalization approaches are available using the `normFun` argument. * Un-normalized raw counts can be plotted by specifying `normFun = 'none'`. ```{r, eval = TRUE} # No normalization p <- plotVmat( list_params, cores = 1, nrow = 2, ncol = 5, verbose = FALSE, normFun = 'none' ) p ``` * By default, plots are normalized by the library depth of the sequencing run and by the number of loci used to compute fragment density. ```{r, eval = TRUE} # Library depth + number of loci of interest (default) p <- plotVmat( list_params, cores = 1, nrow = 2, ncol = 5, verbose = FALSE, normFun = 'libdepth+nloci' ) p ``` * Alternatively, heatmaps can be internally z-scored or scaled to a specific quantile. ```{r, eval = TRUE} # Zscore p <- plotVmat( list_params, cores = 1, nrow = 2, ncol = 5, verbose = FALSE, normFun = 'zscore' ) p # Quantile p <- plotVmat( list_params, cores = 1, nrow = 2, ncol = 5, verbose = FALSE, normFun = 'quantile', s = 0.99 ) p ``` ## Footprints VplotR also implements a function to profile the footprint from MNase or ATAC-seq over sets of genomic loci. For instance, CTCF is known for its ~40-bp large footprint at its binding loci. ```{r} p <- plotFootprint( MNase_sacCer3_Henikoff2011, ABF1_sacCer3 ) p ``` ## Local fragment distribution VplotR provides a function to plot the distribution of paired-end fragments over an individual genomic window. ```{r} data(MNase_sacCer3_Henikoff2011_subset) genes_sacCer3 <- GenomicFeatures::genes(TxDb.Scerevisiae.UCSC.sacCer3.sgdGene:: TxDb.Scerevisiae.UCSC.sacCer3.sgdGene ) p <- plotProfile( fragments = MNase_sacCer3_Henikoff2011_subset, window = "chrXV:186,400-187,400", loci = ABF1_sacCer3, annots = genes_sacCer3, min = 20, max = 200, alpha = 0.1, size = 1.5 ) p ``` ## Session Info ```{r echo = TRUE, collapse = TRUE, eval = TRUE} sessionInfo() ```