%\VignetteIndexEntry{RefPlus Manual}
%\VignetteKeywords{Preprocessing, Affymetrix}
%\VignetteDepends{RefPlus}
%\VignettePackage{RefPlus}
\documentclass[a4paper]{article}
\usepackage{amsmath,amssymb}
\usepackage{graphicx}

%\setlength{\parindent}{0cm}
\setlength{\parskip}{18pt}
%\renewcommand{\baselinestretch}{1.1}

\newcommand{\RMA}{\texttt{RMA}}
\newcommand{\RMAp}{\texttt{RMA+}}
\newcommand{\RMApp}{\texttt{RMA++}}
\newcommand{\refRMA}{\texttt{refRMA}}

\begin{document}

\title{\RMAp\ and \RMApp\ using the \texttt{RefPlus} package}
\author{Kai-Ming Chang, Chris Harbron, Marie C South\\
kaiming@kfsyscc.org\\
Chris.Harbron@astrazeneca.com\\
Marie.C.South@astrazeneca.com
}
\date{Dec 23, 2008}
\maketitle
%\setlength{\parskip}{0pt}
%\vspace{12pt}
\begin{abstract}
\noindent In this vignette, we introduce the ideas behind the Extrapolation Strategy ({\RMAp}) and Extrapolation Averaging ({\RMApp}) methods, and give examples of using the functions in this package.
\end{abstract}
\thispagestyle{empty}

\section{Introduction}

The Extrapolation Strategy and Extrapolation Averaging are Affymetrix GeneChip microarray data pre-processing methods proposed by Goldstein (2006). These methods were independently developed by Chang, Harbron and South (2006), who termed them {\RMAp} and {\RMApp}. Katz et al.\ (2006) also independently developed the {\RMAp} method, terming it {\refRMA}. This vignette uses the ``{\RMAp}'' and ``{\RMApp}'' nomenclature for these algorithms.

\RMAp\ is an extension of the \RMA\ algorithm by Irizarry et al.\ (2003), and \RMApp\ is a further extension based on the \RMAp\ method. The \RMAp\ algorithm calculates the microarray intensities using a pre-stored \RMA\ model trained on a reference microarray set (which can consist of standard reference microarrays, microarrays from an independent study, or an incomplete set of the microarrays in a study).
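
For orientation, the \RMA\ model referred to here can be sketched in the additive form used by Irizarry et al. The symbols below are introduced only for this sketch and are not used elsewhere in this vignette. For a given probeset, let $y_{ij}$ denote the background-corrected, quantile-normalized, $\log_2$-transformed perfect-match intensity of probe $j$ on microarray $i$. The model is
\[
y_{ij} = \mu_i + \alpha_j + \varepsilon_{ij},
\]
where $\mu_i$ is the log-scale expression level of the probeset on microarray $i$, $\alpha_j$ is the probe effect, and $\varepsilon_{ij}$ is an error term. Fitting this model on the reference set yields estimates $\hat{\alpha}_j$, which are stored together with the normalizing quantiles. For a new microarray with values $y^{\ast}_{j}$ normalized to those quantiles, an \RMAp-type measurement can then be written as
\[
\hat{\mu}^{\ast} = \operatorname*{median}_{j} \left( y^{\ast}_{j} - \hat{\alpha}_j \right),
\]
where the median is taken over the probes of the probeset. The choice of the median as the summary is an assumption made for this illustration; the exact summary used by the package functions should be checked against their documentation.
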
\RMAp\ measurements of a microarray can be considered an approximation to the \RMA\ measurements that would be obtained if the microarray were processed with \RMA\ in one batch together with the reference set microarrays. \RMApp\ measurements of a microarray are the average of multiple \RMAp\ measurements of that microarray, each based on a different reference set. If the reference sets together cover more information about the microarrays to be pre-processed than a single reference set does, the \RMApp\ measurements will provide a better approximation to the \RMA\ measurements.

\section{\RMAp}

\RMAp\ procedure:
\begin{enumerate}
\item Fit the \RMA\ model on the reference set and store the normalizing quantiles and the estimated probe effects;
\item Background correct the probe intensities of the microarrays to be pre-processed;
\item Normalize the background-corrected probe intensities to the normalizing quantiles (reference quantiles);
\item Derive the probeset intensity using the estimated probe effects and the normalized background-corrected probe intensity data.
\end{enumerate}

Step 1 can be done using the \texttt{rma.para} function in the package, which returns the normalizing quantiles and the estimated probe effects. Steps 2--4 can be done using the \texttt{rmaplus} function. Both functions provide an option to skip the background correction step, in which case the microarrays can be background-corrected independently.

\section{\RMApp}

\RMApp\ procedure:
\begin{enumerate}
\item Fit multiple \RMA\ models on several reference sets and store the normalizing quantiles and the estimated probe effects of these reference sets;
\item Calculate the \RMAp\ measurements of the microarrays of interest for each reference set;
\item Average the multiple \RMAp\ measurements of each microarray over these reference sets.
\end{enumerate}

\section{Example}
\subsection{\RMAp}

The Dilution dataset in the \texttt{affydata} package consists of 4 microarray samples.
<<>>=
##Use Dilution in affydata package
library(RefPlus)
library(affydata)
data(Dilution)
sampleNames(Dilution)
@

First, we calculate the \RMA\ measurements of the 4 microarrays, \texttt{Ex0}:
<<>>=
##Calculate RMA intensities using the rma function.
Ex0 <- exprs(rma(Dilution))
@

Second, we form a reference set using the first 3 samples and derive the reference quantiles and the reference probe effects:
<<>>=
##Background correct, estimate the probe effects, and calculate the
##RMA intensities using rma.para function.
Para <- rma.para(Dilution[,1:3], bg=TRUE, exp=TRUE)
##The third element holds the RMA intensities of the reference set.
Ex1 <- Para[[3]]
@

Then, we calculate the \RMAp\ measurements of all microarrays, \texttt{Ex2}. Figure 1 compares the \RMA\ measurements and the \RMAp\ measurements of these 4 microarrays.
<<>>=
##Calculate the RMA+ intensity using rmaplus function.
Ex2 <- rmaplus(Dilution, rmapara=Para, bg = TRUE)
@

\begin{figure}[htbp]
\begin{center}
<<fig=TRUE>>=
par(mfrow=c(2,2))
plot(Ex0[,1],Ex2[,1],pch=".",main=sampleNames(Dilution)[1])
plot(Ex0[,2],Ex2[,2],pch=".",main=sampleNames(Dilution)[2])
plot(Ex0[,3],Ex2[,3],pch=".",main=sampleNames(Dilution)[3])
plot(Ex0[,4],Ex2[,4],pch=".",main=sampleNames(Dilution)[4])
@
\caption{\RMA\ (Ex0) vs. \RMAp\ (Ex2).}
\end{center}
\end{figure}

\subsection{\RMApp}

Now, we form another reference set using samples 2--4 and calculate a new set of \RMAp\ measurements, \texttt{Ex3}.
<<>>=
##Second reference set: samples 2 to 4.
Para2 <- rma.para(Dilution[,2:4], bg=TRUE, exp=TRUE)
Ex3 <- rmaplus(Dilution, rmapara=Para2, bg = TRUE)
@

We can then obtain a set of \RMApp\ measurements, \texttt{Ex4}, by averaging these two sets of \RMAp\ measurements. Figure 2 compares the \RMA\ measurements and the \RMApp\ measurements of these 4 microarrays.
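
In general terms, the averaging step can be written as follows; the symbols are introduced here for illustration only. If $K$ reference sets are used and $\hat{\mu}^{+(k)}_{gi}$ denotes the \RMAp\ measurement of probeset $g$ on microarray $i$ based on reference set $k$, the \RMApp\ measurement is
\[
\hat{\mu}^{++}_{gi} = \frac{1}{K} \sum_{k=1}^{K} \hat{\mu}^{+(k)}_{gi}.
\]
In this example $K = 2$, and the two \RMAp\ matrices are \texttt{Ex2} and \texttt{Ex3}, so the average reduces to the element-wise mean of the two matrices.
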
<<>>=
##RMA++ as the element-wise average of the two RMA+ matrices.
Ex4 <- (Ex2+Ex3)/2
@

\begin{figure}[htbp]
\begin{center}
<<fig=TRUE>>=
par(mfrow=c(2,2))
plot(Ex0[,1],Ex4[,1],pch=".",main=sampleNames(Dilution)[1])
plot(Ex0[,2],Ex4[,2],pch=".",main=sampleNames(Dilution)[2])
plot(Ex0[,3],Ex4[,3],pch=".",main=sampleNames(Dilution)[3])
plot(Ex0[,4],Ex4[,4],pch=".",main=sampleNames(Dilution)[4])
@
\caption{\RMA\ (Ex0) vs. \RMApp\ (Ex4).}
\end{center}
\end{figure}

\newpage
The root mean square differences (RMSD) between the \RMA\ measurements and the 2 sets of \RMAp\ measurements are
<<>>=
sqrt(mean((Ex0-Ex2)^2))
sqrt(mean((Ex0-Ex3)^2))
@
and the RMSD between the \RMA\ measurements and the \RMApp\ measurements is
<<>>=
sqrt(mean((Ex0-Ex4)^2))
@

We can see that the \RMApp\ measurements can provide a better approximation to the \RMA\ measurements, which is consistent with the comparison between Figure 1 and Figure 2.

\begin{thebibliography}{}
\item[Chang, K.M.,] Harbron, C., South, M.C. (2006) ``An Exploration of Extensions to the RMA Algorithm,'' \textit{Available with the RefPlus package}.
\item[Goldstein, D.R.] (2006) ``Partition Resampling and Extrapolation Averaging: Approximation Methods for Quantifying Gene Expression in Large Numbers of Short Oligonucleotide Arrays,'' \textit{Bioinformatics,} 22, 2364--2372.
\item[Harbron, C.,] Chang, K.M., South, M.C. (2007) ``RefPlus: an R package extending the RMA Algorithm,'' \textit{Bioinformatics,} 23, 2493--2494.
\item[Irizarry, R.A.,] Hobbs, B., Collin, F., Beazer-Barclay, Y.D., Antonellis, K.J., Scherf, U. and Speed, T.P. (2003) ``Exploration, Normalization, and Summaries of High Density Oligonucleotide Array Probe Level Data,'' \textit{Biostatistics,} 4, 249--264.
\item[Katz, S.,] Irizarry, R.A., Lin, X., Tripputi, M., Porter, M. (2006) ``A Summarization Approach for Affymetrix GeneChip Data Using a Reference Training Set from a Large, Biologically Diverse Database,'' \textit{BMC Bioinformatics,} 7, 464.
\end{thebibliography}

\end{document}