%\VignetteIndexEntry{charm Vignette} %\VignetteDepends{charmData, BSgenome.Hsapiens.UCSC.hg18} %\VignetteKeywords{} %\VignettePackage{charm} \documentclass{article} \newcommand{\Rfunction}[1]{{\texttt{#1}}} \newcommand{\Rmethod}[1]{{\texttt{#1}}} \newcommand{\Rcode}[1]{{\texttt{#1}}} \newcommand{\Robject}[1]{{\texttt{#1}}} \newcommand{\Rpackage}[1]{{\textsf{#1}}} \newcommand{\Rclass}[1]{{\textit{#1}}} \usepackage{graphicx} \usepackage{hyperref} \usepackage{pdfpages} \begin{document} \title{Using the charm package to estimate DNA methylation levels and find differentially methylated regions} \date{March, 2011} \author{Martin Aryee\footnote{aryee@jhu}, Peter Murakami, Rafael Irizarry} \maketitle \begin{center} Johns Hopkins School of Medicine / Johns Hopkins School of Public Health\\Baltimore, MD, USA \end{center} <>= options(width=60) options(continue=" ") options(prompt="R> ") @ \section{Introduction} The Bioconductor package \Rpackage{charm} can be used to analyze DNA methylation data generated using McrBC fractionation and two-color Nimblegen microarrays. It is customized for use with data the from the custom CHARM microarray \cite{IrizarryGenomeRes2008}, but can also be applied to many other Nimblegen designs. The preprocessing and normalization methods are described in detail in \cite{AryeeBiostatistics2011}. Functions include: \begin{itemize} \item Quality control \item Finding suitable control probes for normalization \item Percentage methylation estimates \item Identification of differentially methylated regions \end{itemize} As input we will need raw Nimblegen data (.xys) files and a corresponding annotation package built with pdInfoBuilder. This vignette uses the following packages: \begin{itemize} \item \Rpackage{charm}: contains the analysis functions \item \Rpackage{charmData}: an example dataset \item \Rpackage{pd.charm.hg18.example}: the annotation package for the example dataset \item \Rpackage{BSgenome.Hsapiens.UCSC.hg18}: A BSgenome object containing genomic sequence used for finding non-CpG control probes \end{itemize} Each sample is represented by two xys files corresponding to the untreated (green) and methyl-depleted (red) channels. The 532.xys and 635.xys suffixes indicate the green and red channels respectively. \section{Analyzing data from the custom CHARM microarray} Load the \Rpackage{charm} package: <>= library(charm) library(charmData) @ \section{Read in raw data} Get the name of your data directory (in this case, the example data): <>= dataDir <- system.file("data", package="charmData") dataDir @ First we read in the sample description file: <>= phenodataDir <- system.file("extdata", package="charmData") pd <- read.delim(file.path(phenodataDir, "phenodata.txt")) phenodataDir pd @ A valid sample description file should contain at least the following (arbitrarily named) columns: \begin{itemize} \item a filename column \item a sample ID column \item a group label column (optional) \end{itemize} The sample ID column is used to pair the methyl-depleted and untreated data files for each sample. The group label column is used when identifying differentially methylated regions between experimental groups. The \Rcode{validatePd} function can be used to validate the sample description file. When called with only a sample description data frame and no further options \Rcode{validatePd} will try to guess the contents of the columns. <>= res <- validatePd(pd) @ Now we read in the raw data. The \Rcode{readCharm} command makes the assumption (unless told otherwise) that the two xys files for a sample have the same file name up to the suffixes 532.xys (untreated) and 635.xys (methyl-depleted). <>= rawData <- readCharm(files=pd$filename, path=dataDir, sampleKey=pd) rawData @ \section{Array quality assessment} We can calculate array quality scores and generate a pdf report with the \Rcode{qcReport} command. A useful quick way of assessing data quality is to examine the untreated channel where we expect every probe to have signal. Very low signal intensities on all or part of an array can indicate problems with hybridization or scanning. The CHARM array and many other designs include background probes that do not match any genomic sequence. Any signal at these background probes can be assumed to be the result of optical noise or cross-hybridization. Since the untreated channel contains total DNA a successful hybridization would have strong signal for all untreated channel genomic probes. The array signal quality score (pmSignal) is calculated as the average percentile rank of the signal robes among these background probes. A score of 100 means all signal probes rank above all background probes (the ideal scenario). <>= qual <- qcReport(rawData, file="qcReport.pdf") qual @ The PDF quality report is shown in Appendix A. Three quality metrics are calculated for each array: \begin{enumerate} \item Average signal strength: the average percentile rank of untreated channel signal probes among the background (anti-genomic) probes. \item Untreated channel signal standard deviation. The array is divided into a series of rectangular blocks and the average signal level calculated for each. Since probes are arranged randomly on the array there should be no large differences between blocks. Arrays with spatial artifacts have a larg standard deviation between blocks. \item Methyl-depleted channel signal standard deviation. \end{enumerate} \section{Percentage methylation estimates and differentially methylated regions (DMRs)} We now calculate probe-level percentage methylation estimates for each sample. As a first step we need to identify a suitable set of unmethylated control probes from CpG-free regions to be used in normalization. <>= library(BSgenome.Hsapiens.UCSC.hg18) ctrlIdx <- getControlIndex(rawData, subject=Hsapiens) @ The minimal code required to estimate methylation would be \Rcode{p <- methp(rawData, controlIndex=ctrlIdx)}. However, it is often useful to get \Rcode{methp} to produce a series of diagnostic density plots to help identify non-hybridization quality issues. The \Rcode{plotDensity} option specifies the name of the output pdf file, and the optional \Rcode{plotDensityGroups} can be used to give groups different colors. <>= grp <- pData(rawData)$tissue p <- methp(rawData, controlIndex=ctrlIdx, plotDensity="density.pdf", plotDensityGroups=grp) head(p) @ The density plots are shown in Appendix B. We can now identify differentially methylated regions using \Rcode{dmrFinder}: <>= dmr <- dmrFinder(rawData, p=p, groups=grp, compare=c("colon", "liver", "colon", "spleen")) @ <>= names(dmr) names(dmr$tabs) head(dmr$tabs[[1]]) @ When called without the \Rcode{compare} option, \Rcode{dmrFinder} performs all pairwise comparisons between the groups. \bibliography{charmVignette}{} \bibliographystyle{plain} \section{Appendix A: Quality report} \includepdf[pages=-]{qcReport.pdf} \section{Appendix B: Density plots} Each row corresponds to one stage of the normalization process (Raw data, After spatial and background correction, after within-sample normalization, after between-sample normalization, percentage methylation estimates). The left column shows all probes, while the right column shows control probes. \includepdf{density.pdf} \section{Details} This document was written using: <<>>= sessionInfo() @ \end{document}