%\VignetteIndexEntry{crlmmDownstream} %\VignetteKeywords{copy number, genotype, SNP} %\VignettePackage{VanillaICE} \documentclass{article} \usepackage{amsmath} \usepackage{graphicx} \usepackage[numbers]{natbib} \usepackage{color} \usepackage[margin=1in]{geometry} \newcommand{\scscst}{\scriptscriptstyle} \newcommand{\scst}{\scriptstyle} \newcommand{\Rpackage}[1]{\textit{#1}} \newcommand{\Rfunction}[1]{\texttt{#1}} \newcommand{\Robject}[1]{\texttt{#1}} \newcommand{\Rclass}[1]{\texttt{#1}} \newcommand{\R}{\textsf{R}} \newcommand{\hmmoptions}{\Robject{HmmOptions}} \newcommand{\hmmparam}{\Robject{HmmParameter}} \newcommand{\crlmm}{\Rpackage{crlmm}} \newcommand{\oligo}{\Rpackage{oligo}} \title{Integration with the crlmm package for copy number inference} \author{Robert Scharpf} \begin{document} \maketitle <>= library(oligoClasses) library(VanillaICE) library(crlmm) library(IRanges) library(foreach) @ We load a portion of chromosome 8 from 2 HapMap samples that were processed using the \Rpackage{crlmm} package. <>= data(cnSetExample, package="crlmm") @ The data \Robject{cnSetExample} is an object of class \Rclass{CNSet}. We coerce the \Rclass{CNSet} object to a \Rclass{SnpArrayExperiment} that contains information on copy number (log R ratios) and B allele frequencies. <>= se <- as(cnSetExample, "SnpArrayExperiment") @ \section*{Wave correction} To correct for genomic waves that correlate with GC content [refs], we use the \R{} package \Rpackage{ArrayTV} -- an approach adapted from the wave correction methods proposed by Benjamini and Speed for next generation sequencing platforms \cite{Benjamini2012}. In the following code-chunk, we select a subset of the samples in the study to evaluate the genomic window for wave correction. See the ArrayTV vignette for details. For large datasets, one could randomly select 20 or 25 samples to compute the window, and then use a pre-selected window for wave correction on the remaining samples. <>= library(ArrayTV) i <- seq_len(ncol(se)) increms <- c(10,1000,100e3) wins <- c(100,10e3,1e6) res <- gcCorrect(lrr(se), increms=increms, maxwins=wins, returnOnlyTV=FALSE, verbose=TRUE, build="hg18", chr=chromosome(se), starts=start(se)) se2 <- se assays(se2)[["cn"]] <- res$correctedVals @ <>= ## TODO: correct for GC bias by loess se2 <- se @ \section*{HMM} To identify CNVs, we fit a 6-state hidden markov model from estimates of the B allele frequency and log R ratios. A \Rfunction{hmm} method is defined for the \Rclass{BafLrrSetList} class, and we apply the method directly with a few parameters that change the arguments from their default values. For example, the \texttt{TAUP} parameter scales the transition probability matrix. Larger values of \texttt{TAUP} makes it more expensive to transition from the normal copy number state to states with altered copy number. <>= res <- hmm2(se2) @ The object \Robject{res} can be filtered and putative CNVs can be visually inspected with the low-level summaries. Further details on such post-hoc analyses are provided in the section 'Inspecting,Filtering, and plotting HMM results' in the \texttt{VanillaICE} vignette. \section*{Session Information} The version number of R and packages loaded for generating the vignette were: <>= toLatex(sessionInfo()) @ \bibliography{ice}{} \bibliographystyle{plain} \end{document}