--- title: "Updating methylSig code" author: "Raymond G. Cavalcante" date: "`r Sys.Date()`" output: BiocStyle::html_document vignette: > %\VignetteIndexEntry{Updating methylSig code} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup} library(methylSig) ``` # Introduction The purpose of this vignette is to show users how to retrofit their `methylSig` < 0.99.0 code to work with the refactor in version 0.99.0 and later. # Reading Data ## Old methylSig In versions < 0.99.0 of `methylSig`, the `methylSigReadData()` function read Bismark coverage files, Bismark genome-wide CpG reports, or MethylDackel bedGraphs. Additionally, users could destrand the data, filter by coverage, and filter SNPs. ```{r eval = FALSE} meth = methylSigReadData( fileList = files, pData = pData, assembly = 'hg19', destranded = TRUE, maxCount = 500, minCount = 10, filterSNPs = TRUE, num.cores = 1, fileType = 'cytosineReport') ``` ## New methylSig In versions >= 0.99.0 of `methylSig`, the user should read data with `bsseq::read.bismark()` and then apply functions that were once bundled within `methylSigReadData()`. ```{r read} files = c( system.file('extdata', 'bis_cov1.cov', package='methylSig'), system.file('extdata', 'bis_cov2.cov', package='methylSig') ) bsseq_stranded = bsseq::read.bismark( files = files, colData = data.frame(row.names = c('test1','test2')), rmZeroCov = FALSE, strandCollapse = FALSE ) ``` After reading data, filter by coverage. Note, we are changing our dataset to something we can use with the downstream functions. ```{r filter_by_coverage} # Load data for use in the rest of the vignette data(BS.cancer.ex, package = 'bsseqData') bs = BS.cancer.ex[1:10000] bs = filter_loci_by_coverage(bs, min_count = 5, max_count = 500) ``` If the locations of C-to-T and G-to-A SNPs are known, or some other set of location should be removed: ```{r filter_by_location} # Construct GRanges object remove_gr = GenomicRanges::GRanges( seqnames = c('chr21', 'chr21', 'chr21'), ranges = IRanges::IRanges( start = c(9411552, 9411784, 9412099), end = c(9411552, 9411784, 9412099) ) ) bs = filter_loci_by_location(bs = bs, gr = remove_gr) ``` # Tiling Data ## Old methylSig In versions < 0.99.0 of `methylSig`, the `methylSigTile()` function combined aggregating CpG data over pre-defined tiles and genomic windows. ```{r eval = FALSE} # For genomic windows, tiles = NULL windowed_meth = methylSigTile(meth, tiles = NULL, win.size = 10000) # For pre-defined tiles, tiles should be a GRanges object. ``` ## New methylSig In versions >= 0.99.0 of `methylSig`, tiling is separated into two functions, `tile_by_regions()` and `tile_by_windows()`. Users should chooose one or the other. ```{r tile_by_windows} windowed_bs = tile_by_windows(bs = bs, win_size = 10000) ``` ```{r tile_by_regions} # Collapsed promoters on chr21 and chr22 data(promoters_gr, package = 'methylSig') promoters_bs = tile_by_regions(bs = bs, gr = promoters_gr) ``` # Testing ## MethylSig Test ### Old methylSig In versions < 0.99.0 of `methylSig`, the `methylSigCalc` function had a `min.per.group` parameter to determine how many samples per group had to have coverage in order to be tested. ```{r eval = FALSE} result = methylSigCalc( meth = meth, comparison = 'DR_vs_DS', dispersion = 'both', local.info = FALSE, local.winsize = 200, min.per.group = c(3,3), weightFunc = methylSig_weightFunc, T.approx = TRUE, num.cores = 1) ``` ### New methylSig In versions >= 0.99.0 of `methylSig`, the `min.per.group` functionality is performed by a separate function `filter_loci_by_group_coverage()`. Also note the change in form to define dispersion calculations, and the use of local information. ```{r filter_by_group_coverage} # Look a the phenotype data for bs bsseq::pData(bs) # Require at least two samples from cancer and two samples from normal bs = filter_loci_by_group_coverage( bs = bs, group_column = 'Type', c('cancer' = 2, 'normal' = 2)) ``` After removing loci with insufficient information, we can now use the `diff_methylsig()` test. ```{r diff_methylsig} # Test cancer versus normal with dispersion from both groups diff_gr = diff_methylsig( bs = bs, group_column = 'Type', comparison_groups = c('case' = 'cancer', 'control' = 'normal'), disp_groups = c('case' = TRUE, 'control' = TRUE), local_window_size = 0, t_approx = TRUE, n_cores = 1) ``` ## DSS Test ### Old methylSig In versions < 0.99.0 of `methylSig`, the `methylSigDSS` function also had a `min.per.group` parameter to determine how many samples per group had to have coverage. Users also couldn't specify which methylation groups to recover. The form of `design`, `formula`, and `contrast`, remain the same in versions >= 0.99.0. ```{r eval = FALSE} contrast = matrix(c(0,1), ncol = 1) result_dss = methylSigDSS( meth = meth, design = design1, formula = '~ group', contrast = contrast, group.term = 'group', min.per.group=c(3,3)) ``` ### New methylSig In versions >= 0.99.0 of `methylSig`, the single `methylSigDSS()` function is replaced by a fit function `diff_dss_fit()` and a test functiotn `diff_dss_test()`. As with `diff_methylsig()`, users should ensure enough samples have sufficient coverage with the `filter_loci_by_group_coverage()` function. The `design` and `formula` are unchanged in their forms. If a continuous covariate is to be tested, `filter_loci_by_group_coverage()` should be skipped, as there are no groups. In prior versions of `methylSigDSS()`, this was not possible, and the group constraints were incorrectly applied prior to testing on a continuous covariate. ```{r filter_by_group_coverage2, eval = FALSE} # IF NOT DONE PREVIOUSLY # Require at least two samples from cancer and two samples from normal bs = filter_loci_by_group_coverage( bs = bs, group_column = 'Type', c('cancer' = 2, 'normal' = 2)) ``` ```{r diff_dss_fit_simple} # Test the simplest model with an intercept and Type diff_fit_simple = diff_dss_fit( bs = bs, design = bsseq::pData(bs), formula = as.formula('~ Type')) ``` The `contrast` parameter is also changed in its form. Note the, additional parameters to specify how to recover group methylation. `methylation_group_column` and `methylation_groups` should be specified for group versus group comparisons. For continuous covariates, `methylation_group_column` is sufficient, and the samples will be grouped into top/bottom 25 percentile based on the continuous covariate column name given in `methylation_group_column`. ```{r diff_dss_test_simple} # Test the simplest model for cancer vs normal # Note, 2 rows corresponds to 2 columns in diff_fit_simple$X simple_contrast = matrix(c(0,1), ncol = 1) diff_simple_gr = diff_dss_test( bs = bs, diff_fit = diff_fit_simple, contrast = simple_contrast, methylation_group_column = 'Type', methylation_groups = c('case' = 'cancer', 'control' = 'normal')) ``` # Session Info ```{r sessionInfo} sessionInfo() ```