--- title: "Identifying DMCs using Bayesian functional regressions in BS-Seq data" author: - name: Farhad Shokoohi affiliation: Department of Mathematical Sciences, University of Nevada, Las Vegas email: shokoohi@icloud.com date: "`r doc_date()`" package: DMCFB abstract: "`Instructions on using the DMCFB package.`" vignette: > %\VignetteIndexEntry{Identifying DMCs using Bayesian functional regressions in BS-Seq data} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} output: BiocStyle::html_document --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` The `DMCFB` package is a pipeline to identify differentially methylated cytosine (DMC) in bisulfite sequencing data using Bayesian functional regression models. In what follows we provides some guidelines on how to read your data and analyze them. # Reading data ## Reading bisulfite data (using files) The R-method `readBismark()` is used to read bisulfite data files that are created by `Bismark`. Each file must include six columns, with no header, that represent - Chromosome - Start position in the chromosome - End position in the chromosome - Methylation level (m/(m+u)) - Number of methylated reads (m) - Number of un-methylated reads (u) and each row is a cytosine (or a small region) in DNA. The function `readBismark(, )` has two inputs: 'the paths of the files' and 'the names of the files'. Using this function an object of class `BSDMC` is created. Extra information about data such as Age, Gender, Group, etc, must be assigned to the object using `DataFrame` function. As an example, we have provided three files in the package that can be read as follows: ```{r, eval=TRUE} library(DMCFB) fn <- list.files(system.file("extdata",package = "DMCFB")) fn.f <- list.files(system.file("extdata",package="DMCFB"), full.names=TRUE) OBJ <- readBismark(fn.f, fn, mc.cores = 2) cdOBJ <- DataFrame(Cell = factor(c("BC", "TC","Mono"), levels = c("BC", "TC", "Mono")), row.names = c("BCU1568","BCU173","BCU551")) colData(OBJ) <- cdOBJ OBJ ``` ## Reading bisulfite data (using matrices) Alternatively, one can use two integer matrices and a `DataFrame` to create `BSDMC` object using `cBSDMC()` function. One matrix includes the read-depth and the other one includes methylation reads. The columns of these matrices represent samples and the rows represent cytosine positions. Additional information about the genomic positions and covariates must be stored in a `DataFrame` and then assign to the object. The following exampel shows the details. ```{r, eval=TRUE, message=FALSE} library(DMCFB) set.seed(1980) nr <- 1000 nc <- 8 metht <- matrix(as.integer(runif(nr * nc, 0, 100)), nr) methc <- matrix(rbinom(n=nr*nc,c(metht),prob = runif(nr*nc)),nr,nc) methl <- methc/metht r1 <- GRanges(rep('chr1', nr), IRanges(1:nr, width=1), strand='*') names(r1) <- 1:nr cd1 <- DataFrame(Group=rep(c('G1','G2'),each=nc/2),row.names=LETTERS[1:nc]) OBJ2 <- cBSDMC(rowRanges=r1,methReads=methc,totalReads=metht, methLevels=methl,colData=cd1) OBJ2 ``` # Identifying DMCs To identify DMCs, one need to use the function `findDMCFB()` function. The function ```{r, eval=FALSE} library(DMCFB) start.time <- Sys.time() path0 <- "..//BCData/" # provide the path to the files namelist.new <- list.files(path0,pattern="blk",full.names=F) namelist.new.f <- list.files(path0,pattern="blk",full.names=T) type <- NULL for(i in seq_along(namelist.new)){ type[i] <- unlist(strsplit(namelist.new[i], split=c('_'), fixed=TRUE))[2] } type table(type) indTC <- which(type=="TC") indBC <- which(type=="BC") indMono <- which(type=="Mono") namelist.new <- namelist.new[c(indBC,indMono,indTC)] namelist.new.f <- namelist.new.f[c(indBC,indMono,indTC)] BLKDat <- readBismark(namelist.new.f, namelist.new, mc.cores = 2) colData1 <- DataFrame(Group = factor( c(rep("BC",length(indBC)), rep("Mono",length(indMono)), rep("TC", length(indTC))), levels = c("BC", "Mono", "TC")), row.names = colnames(BLKData)) colData(BLKDat) <- colData1 BLK.BC.Mono.TC <- sort(BLKDat) DMC.obj = findDMCFB(object = BLKDat, bwa = 30, bwb = 30, nBurn = 300, nMC = 300, nThin = 1, alpha = 5e-5, pSize = 500, sfiles = FALSE) ``` # Figures To plot DMCs one can use the `plotDMCFB()` function to plot an `BSDMC` object that resulted from running `findDMCFB()` function. To illustrate use the following example: ```{r, fig.width=6} library(DMCFB) set.seed(1980) nr <- 1000 nc <- 8 metht <- matrix(as.integer(runif(nr * nc, 0, 100)), nr) methc <- matrix(rbinom(n=nr*nc,c(metht),prob = runif(nr*nc)),nr,nc) methl <- methc/metht r1 <- GRanges(rep('chr1', nr), IRanges(1:nr, width=1), strand='*') names(r1) <- 1:nr cd1 <- DataFrame(Group=rep(c('G1','G2'),each=nc/2),row.names=LETTERS[1:nc]) OBJ1 <- cBSDMC(rowRanges=r1,methReads=methc,totalReads=metht, methLevels=methl,colData=cd1) OBJ2 = findDMCFB(object = OBJ1, bwa = 30, bwb = 30, nBurn = 10, nMC = 10, nThin = 1, alpha = 0.05, pSize = 500, sfiles = FALSE) plotDMCFB(OBJ2, region = c(1,400), nSplit = 2) ``` # Session info ```{r} sessionInfo() ```