--- title: "AlphaBeta" author: "Y.Shahryary, Rashmi Hazarika, Frank Johannes " date: "`r Sys.Date()`" output: pdf_document: toc: true toc_depth: 4 number_sections: true urlcolor: blue geometry: margin=0.5in vignette: > %\VignetteIndexEntry{AlphaBeta} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} options(width=120) knitr::opts_chunk$set( collapse = FALSE, eval = FALSE, comment = " ", tidy.opts =list(width.cutoff=80), tidy = TRUE, size="small" ) ``` \newpage ```{r, include=FALSE} library(AlphaBeta) library(data.table) ``` \section{Introduction} **AlphaBeta** is a computational method for estimating epimutation rates and spectra from high-throughput DNA methylation data in plants. The method has been specifically designed to: **1.** Analyze 'germline' epimutations in the context of multi-generational mutation accumulation lines (MA-lines). **2.** Analyze 'somatic' epimutations in the context of plant development and aging. Heritable changes in cytosine methylation can arise stochastically in plant genomes independently of DNA sequence alterations. These so-called ‘spontaneous epimutations’ appear to be a byproduct of imperfect DNA methylation maintenance during mitotic and meitotic cell divisions. Accurate estimates of the rate and spectrum of these stochastic events are necessary to be able to quantify how epimutational processes shape methylome diversity in the context of plant evolution, development and aging. Here we describe AlphaBeta, a computational method for estimating epimutation rates and spectra from pedigree-based high-throughput DNA methylation data in plants. The method requires that the topology of the pedigree is known, which is typically the case in the construction of mutation accumulation lines (MA-lines) in sexually or clonally reproducing plant species. However, the method also works for inferring somatic epimutations in long-lived perrenials, such as trees, using leaf methylomes and coring data as input. In this case, AlphaBeta treats the tree branching structure as an intra-organismal phylogeny of somatic lineages that carry information about the epimutational history of each branch. \section{Preparing Files} *NOTE: In this tutorial we are reading methylome files generated using Bioconductor package Methimpute:* You can find more information here: [Methimpute package](https://bioconductor.org/packages/release/bioc/html/methimpute.html) ## List of files List of filenames containing generations and lineage. *User must define "generations.fn" file.* *The structure of "generations.fn" should be same as in our example* ```{r} # Sample "generations.fn" file generation.fn <- system.file("extdata","generations.fn", package="AlphaBeta") file <- fread(generation.fn) head(file) ``` ```{r,include=FALSE} df<-read.csv(generation.fn) df$filename<-sub("^",paste0(dirname(generation.fn),"/"),df$filename ) write.csv(df, file = paste0(dirname(generation.fn),"/tmp_generation.fn"),row.names=FALSE,quote=FALSE) generation.fn<- system.file("extdata","tmp_generation.fn", package="AlphaBeta") ``` ## Generate divergence matrix Estimating epimutation rates from high-throughput DNA methylation data. Generation of divergence matrix and calculation of methylation levels. ```{r, eval=FALSE, r,paged.print=FALSE} dMatrix(genTable =generation.fn, cytosine = "CG", posteriorMaxFilter = 0.99 ) ``` ```{r paged.print=FALSE} #Sample output from dMatrix function head(fread(system.file("extdata/dm","AB-dMatrix-CG-0.99.csv", package="AlphaBeta"))) ``` ## Generate methylation proportions ```{r,eval=FALSE, paged.print=FALSE} rc.meth.lvl(genTable = generation.fn , cytosine = "CG", posteriorMaxFilter = 0.99 ) ``` ```{r , paged.print=FALSE} #Sample output from proportions function head(fread(system.file("extdata/dm","AB-methprop-CG-0.99.csv", package="AlphaBeta"))) ``` ## Information about Sample file. This file containing information on generation times and pedigree lineages. (should be provide manually) ```{r , paged.print=FALSE} #Sample file head(fread(system.file("extdata/dm","sampleInfo.csv", package="AlphaBeta"))) ``` ## File containing lineage branch points. (should be provide manually) ```{r , paged.print=FALSE} #Sample file head(fread(system.file("extdata/dm","branchPoints.csv", package="AlphaBeta"))) ``` \section{Germline epimutations} Models ABneutral, ABselectMM and ABselectUU can be used to estimate the rate of spontaneous epimutations from pedigree-based high-throughput DNA methylation data. The models are generally designed for pedigree data arising from selfing diploid species. ## Calculate divergence times Divergence time (delta t) is calculated as follows: delta t = t1 + t2 - 2*t0, where t1 is the time of sample 1 (in generations), t2 is the time of sample 2 (in generations) and t0 is the time (in generations) of the most recent common founder of samples 1 and 2. To calculate divergence times of the pedigree should be provided in the form of 4 files as shown below. ```{r } props.name <- read.table(system.file("extdata/dm","AB-methprop-CG-0.99.csv", package="AlphaBeta"), sep="\t", header=TRUE) sample.info <-read.table(system.file("extdata/dm","sampleInfo.csv", package="AlphaBeta"), sep="\t", header=TRUE) branch.points <-read.table(system.file("extdata/dm","branchPoints.csv", package="AlphaBeta"), sep="\t", header=TRUE) dmatrix <-read.table(system.file("extdata/dm","AB-dMatrix-CG-0.99.csv", package="AlphaBeta"), sep="\t", header=TRUE) context <- "CG" ``` calculate divergence times of the pedigree: ```{r } pedigree <- convertDMATRIX(sample.info=sample.info, branch.points=branch.points, dmatrix=dmatrix, design="sibling") head(pedigree) ``` This is a manual step for inspecting the divergence data and removing outlier samples (if any): ```{r echo=FALSE} dt <- pedigree[,2] + pedigree[,3] - 2 * pedigree[,1] plot(dt, pedigree[,"D.value"], ylab="Divergence value", xlab=expression(paste(Delta, " t"))) ``` Read in the proportions data: ```{r} outliers <- "none" dmatrix <- dmatrix[which(dmatrix[,1] != outliers), ] dmatrix <- dmatrix[which(dmatrix[,2] != outliers), ] pedigree <- pedigree[c(as.numeric(rownames(dmatrix))),] props <- props.name[which(as.character(props.name[,2]) == context),] props <- props.name[which(!is.element(props.name[,1], outliers) == TRUE),] ``` Calculate initial proportions of unmethylated cytosines after removal of outliers: ```{r} p0uu_in <- 1-mean(as.numeric(as.character(props[,3]))) p0uu_in ``` ## Run Models ### Run Model with no selection (ABneutral) This model assumes that heritable gains and losses in cytosine methylation are selectively neutral. ```{r,output.lines=5 } # output directory output.data.dir<-paste0(getwd(),"/") output <- ABneutral( pedigree.data = pedigree, p0uu = p0uu_in, eqp = p0uu_in, eqp.weight = 1, Nstarts = 2, out.dir = output.data.dir, out.name = "CG_global_estimates_ABneutral" ) ``` *NOTE: it is recommended to use at least 50 Nstarts to achieve best solutions* Showing summary output of only output: ```{r,output.lines=5 } summary(output) ``` ```{r,output.lines=5} head(output$pedigree) ``` ### Run model with selection against spontaneous gain of methylation (ABselectMM) This model assumes that heritable losses of cytosine methylation are under negative selection. The selection parameter is estimated. ```{r} output <- ABselectMM( pedigree.data = pedigree, p0uu = p0uu_in, eqp = p0uu_in, eqp.weight = 1, Nstarts = 2, out.dir = output.data.dir, out.name = "CG_global_estimates_ABselectMM" ) summary(output) ``` ### Run model with selection against spontaneous loss of methylation (ABselectUU) This model assumes that heritable gains of cytosine methylation are under negative selection. The selection parameter is estimated. ```{r} output <- ABselectUU( pedigree.data = pedigree, p0uu = p0uu_in, eqp = p0uu_in, eqp.weight = 1, Nstarts = 2, out.dir = output.data.dir, out.name = "CG_global_estimates_ABselectUU" ) summary(output) ``` ### Run model that considers no accumulation of epimutations (ABnull) This is the null model of no accumulation. ```{r} output <- ABnull(pedigree.data = pedigree, out.dir = output.data.dir, out.name = "CG_global_estimates_ABnull") summary(output) ``` ## Comparison of different models and selection of best model ### Testing ABneutral vs. ABnull ```{r} file1 <- system.file("extdata/models/", "CG_global_estimates_ABneutral.Rdata", package = "AlphaBeta") file2 <- system.file("extdata/models/", "CG_global_estimates_ABnull.Rdata", package = "AlphaBeta") out <- FtestRSS(pedigree.select = file1, pedigree.null = file2) out$Ftest ``` ### Testing ABselectMM vs.ABneutral ```{r} file1 <- system.file("extdata/models/", "CG_global_estimates_ABselectMM.Rdata", package = "AlphaBeta") file2 <- system.file("extdata/models/", "CG_global_estimates_ABnull.Rdata", package = "AlphaBeta") out <- FtestRSS(pedigree.select = file1, pedigree.null = file2) out$Ftest ``` ### Testing ABselectUU vs.ABneutral ```{r} file1 <- system.file("extdata/models/", "CG_global_estimates_ABselectUU.Rdata", package = "AlphaBeta") file2 <- system.file("extdata/models/", "CG_global_estimates_ABnull.Rdata", package = "AlphaBeta") out <- FtestRSS(pedigree.select = file1, pedigree.null = file2) out$Ftest ``` ## Bootstrap analysis with the best model i.e ABneutral in our case ```{r} inputModel <- system.file("extdata/models/", "CG_global_estimates_ABneutral.Rdata", package = "AlphaBeta") # Bootstrapping models CG output.data.dir <-paste0(getwd(),"/") Boutput <- BOOTmodel( pedigree.data = inputModel, Nboot = 2, out.dir = output.data.dir, out.name = "Boot_CG_global_estimates_ABneutral" ) summary(Boutput) ``` ```{r} Boutput$standard.errors ``` \section{Somatic epimutations} Models ABneutralSOMA, ABselectMMSOMA and ABselectUUSOMA can be used to estimate the rate of spontaneous epimutations from pedigree-based high-throughput DNA methylation data. The models are generally designed for pedigree data arising from clonally or asexually propagated diploid species. The models can also be applied to long-lived perrenials, such as trees, using leaf methylomes and coring data as input. In this case, the tree branching structure is treated as an intra-organismal pedigree (or phylogeny) of somatic lineages. ## Loading data and generation of pedigree ```{r } props.name <- read.table(system.file("extdata/soma/","AB-methprop-CG-0.99.csv", package="AlphaBeta"), sep="\t", header=TRUE,stringsAsFactors = FALSE) sample.info <-read.table(system.file("extdata/soma/","sampleInfo.csv", package="AlphaBeta"), sep="\t", header=TRUE,stringsAsFactors = FALSE) dmatrix <-read.table(system.file("extdata/soma/","AB-dMatrix-CG-0.99.csv", package="AlphaBeta"), sep="\t", header=TRUE,stringsAsFactors = FALSE) ``` Samples: ```{r } head(props.name) head(sample.info) head(dmatrix) ``` ## Generate pedigree from the input files *NOTE: Here in our example "makePHYLO" function calculates divergence times of a tree branching structure with total age of 330 years derived from coring-based measurements.* ```{r} # 'tall' is total age of tree pedigree.out <- makePHYLO(tall=330, pedigree = dmatrix, sample.info = sample.info) pedigree.out <- pedigree.out[[1]] head(pedigree.out) ``` ## Calculate the proportion of unmethylated cytosines ```{r} p0uu_in <- mean(props[,3]) p0uu_in ``` ## Run Models ### Run Model with no selection (ABneutralSOMA) This model assumes that somatically heritable gains and losses in cytosine methylation are selectively neutral. ```{r} outneutral <- ABneutralSOMA( pedigree.data = pedigree.out, p0uu = p0uu_in, eqp = p0uu_in, eqp.weight = 0.001, Nstarts = 2, out.dir = output.data.dir, out.name = "ABneutralSOMA_CG_estimates" ) summary(outneutral) ``` ```{r} head(outneutral$pedigree) ``` ### Run model with selection against spontaneous gain of methylation (ABselectMMSOMA) This model assumes that somatically heritable losses of cytosine methylation are under negative selection. The selection parameter is estimated. ```{r} outselectMM <- ABselectMMSOMA( pedigree.data = pedigree.out, p0uu = p0uu_in, eqp = p0uu_in, eqp.weight = 0.001, Nstarts = 2, out.dir = output.data.dir, out.name = "ABselectMMSOMA_CG_estimates" ) summary(outselectMM) ``` ### Run model with selection against spontaneous loss of methylation (ABselectUUSOMA) This model assumes that somatically heritable gains of cytosine methylation are under negative selection. The selection parameter is estimated. ```{r} outselectUU <- ABselectUUSOMA( pedigree.data = pedigree.out, p0uu = p0uu_in, eqp = p0uu_in, eqp.weight = 0.001, Nstarts = 2, out.dir = output.data.dir, out.name = "ABselectUUSOMA_CG_estimates" ) summary(outselectUU) ``` \section{R session info } ```{r} sessionInfo() ```