% -*- mode: noweb; noweb-default-code-mode: R-mode; -*- %\VignetteIndexEntry{metahdep Primer} %\VignetteDepends{metahdep} %\VignettePackage{metahdep} \documentclass[12pt, a4paper]{article} \title{Vignette for metahdep: an R package for hierarchical dependence in meta-analysis} \author{John R. Stevens$^1$ and Gabriel Nicholas$^2$} \SweaveOpts{echo=TRUE} \usepackage{a4wide} \newcommand{\Rfunction}[1]{{\texttt{#1}}} \newcommand{\Rpackage}[1]{{\textit{#1}}} \parskip .5em \begin{document} \maketitle \begin{center} 1. Associate Professor of Statistics, Department of Mathematics and Statistics, and Center for Integrated Biosystems, Utah State University (\verb7http://www.stat.usu.edu/~jrstevens7)\\ 2. MS Biostatistician, Department of Biostatistics and Medical Informatics, University of Wisconsin at Madison\\ \end{center} \tableofcontents \section{Introduction} The \Rpackage{metahdep} package provides tools to conduct a meta-analysis of effect size estimates, particularly when hierarchical dependence is present (Stevens and Taylor, 2009). The package includes convenient commands for the meta-analysis of gene expression studies. We assume the reader is familiar with meta-analysis of effect size estimates, and here adopt the framework used in Stevens and Taylor (2009). This package vignette provides a guide for the use of these tools, using two examples (Sections \ref{Ex1} and \ref{Ex2}). The first example, in Section 2, shows the main utility of the \Rpackage{metahdep} package for the meta-analysis of gene expression studies, which motivated the development of this package (to save computation time). Some familiarity with gene expression technology will help with this first example, but is not necessary for the use of the tools in this package, as demonstrated by the second example, in Section 3, which involves foreign language learning.\\ {\bf Note:} If you use this package for gene expression studies, please cite Stevens and Nicholas (2009). If you use this package for general applications, please cite Stevens and Taylor (2009). \section{Meta-Analysis of Gene Expression Studies} \label{Ex1} The tools in the \Rpackage{metahdep} package can be used to systematically combine the results of multiple gene expression studies, accounting for both sampling and hierarchical dependence among studies, and accounting for fundamental differences between studies (Stevens and Nicholas, 2009). Sampling dependence occurs when two (or more) effect size estimates are based on the same sample of data. Hierarchical dependence occurs when two (or more) studies are conducted by the same lab or research team. The \Rpackage{metahdep} package is specifically designed for cases where each study reports for each gene an effect size estimate of differential expression between the same two conditions. In the example that follows, we describe a hypothetical group of similar gene expression studies, and refer to a ``Tissue'' difference among studies. In reality this could be any covariate describing differences among studies, and multiple covariates are possible. For purposes of illustration in this vignette, we suppose that four separate studies have been conducted, each using microarrays to look for genes exhibiting differential expression from the same control condition to the same treatment condition. Study 1 conducted experiments using two tissues (Tissue 0 and Tissue 1), testing for differential expression (control vs. treatment) in each tissue, using the hgu133a Affymetrix GeneChip. Study 2 was conducted by the same lab as Study 1 (call it Lab 1), and tested for differential expression (control vs. treatment) in Tissue 0, using the hgu95a Affymetrix GeneChip. Study 3 was conducted by another lab (call it Lab 2), testing for differential expression (control vs. treatment) in Tissue 0, using the hgu95av2 Affymetrix GeneChip. Finally, Study 4 was conducted by Lab 3, testing for differential expression (control vs. treatment) in Tissue 1 using the hgu133b Affymetrix GeneChip. Table \ref{Table:sample} summarizes the five comparisons or tests of differential expression conducted by these three labs. \begin{table}[t] \centering \begin{tabular}{c c c c c c c} Study & Comparison & Lab & Tissue & Chip & \# Control Arrays & \# Treatment Arrays \\ \hline 1 & 1 & 1 & 0 & hgu133a & 5 & 5\\ 1 & 2 & 1 & 1 & hgu133a & 5 & 5\\ 2 & 3 & 1 & 0 & hgu95a & 3 & 3\\ 3 & 4 & 2 & 0 & hgu95av2 & 2 & 2\\ 4 & 5 & 3 & 1 & hgu133b & 9 & 11 \end{tabular} \caption{\label{Table:sample} Summary of sample gene expression studies that are candidates for a meta-analyis, as each experiment tested for differential expression between the same control and treatment conditions.} \end{table} Before the results of these studies can be passed to the meta-analysis functions of the \Rpackage{metahdep} package, the data must first be formatted. As this formatting is nontrivial, Subsection \ref{Ex2.format} below illustrates the necessary steps. Subsection \ref{Ex2.run} below shows how to pass these formatted results to the main meta-analysis functions of the \Rpackage{metahdep} package. \subsection{Formatting the Gene Expression Data for Meta-Analysis} \label{Ex2.format} The first object we create to aid in formatting the data in preparation for meta-analysis is a data frame linking the probeset identifiers for the same gene on the different chip versions. This data frame is meant to help ``resolve the many-to-many relationships between probes and genes'' (Ramasamy et al. 2008). The data frame must have columns named ``chip'', ``old.name'', and ``new.name.'' Probesets sharing a common ``new.name'' will be combined in the meta-analysis, even if they are on the same array. Probesets not appearing in the data frame will not be used in the meta-analysis. In this vignette example, Affymetrix GeneChip versions hgu133a, hgu133b, hgu95a, and hgu95av2 are used. Various approaches are possible to match probesets across array versions based on common gene content; see Ramasamy et al. (2008) for a discussion. The optimal resolution of this common gene content issue is beyond the scope of this \Rpackage{metahdep} package. The code chunk below uses Entrez Gene IDs as probesets' ``new.name'' and creates a data frame in the necessary format, discarding probesets that do not map to an Entrez Gene ID. As with other possible approaches, this Entrez Gene ID mapping can allow multiple probesets to share the same ``new.name'' on the same array version. Package users may choose to keep only one probeset for each ``new.name'' per array version. This choice could be made based on sequence or biological information, or it could be based on experimental data, as suggested in Ramasamy et al. (2008); for example, the \Rfunction{findLargest} function in the \Rpackage{genefilter} package could be used to identify the probesets with the largest effect size for each Entrez Gene ID. Similarly, Rhodes et al. (2002) chose the probesets with the smallest P-values for each gene ID. Even if multiple probesets are allowed to map to the same ``new.name'' on an array version, however, their effect size estimates can still be treated as hierarchically dependent by the \Rpackage{metahdep} package. Whichever strategy is employed to match gene content across array versions, the final data frame must have the same column names (and meanings) as the sample object created here. <>= library(AnnotationDbi) library(hgu133a.db); library(hgu133b.db); library(hgu95a.db); library(hgu95av2.db) EID.hgu133a <- unlist(mget(mappedkeys(hgu133aENTREZID),hgu133aENTREZID)) EID.hgu133b <- unlist(mget(mappedkeys(hgu133bENTREZID),hgu133bENTREZID)) EID.hgu95a <- unlist(mget(mappedkeys(hgu95aENTREZID),hgu95aENTREZID)) EID.hgu95av2 <- unlist(mget(mappedkeys(hgu95av2ENTREZID),hgu95av2ENTREZID)) chip <- c(rep("hgu133a", length(EID.hgu133a)), rep("hgu133b", length(EID.hgu133b)), rep("hgu95a", length(EID.hgu95a)), rep("hgu95av2", length(EID.hgu95av2))) EID <- c(EID.hgu133a, EID.hgu133b, EID.hgu95a, EID.hgu95av2) newnames <- data.frame(chip=chip, old.name=names(EID), new.name=EID) @ For convenience (specifically to reduce the size of the sample objects created by this vignette), we use only a subset of the genes. In the following code, we randomly select a subset of 1000 Entrez Gene IDs. Note that we also identify the subsets of corresponding probeset ID's on the different array versions used among our studies of interest. <>= set.seed(1234) use.new.name <- sample(unique(newnames$new.name),1000) use.hgu133a.gn <- names(EID.hgu133a[is.element(EID.hgu133a,use.new.name)]) use.hgu133b.gn <- names(EID.hgu133b[is.element(EID.hgu133b,use.new.name)]) use.hgu95a.gn <- names(EID.hgu95a[is.element(EID.hgu95a,use.new.name)]) use.hgu95av2.gn <- names(EID.hgu95av2[is.element(EID.hgu95av2,use.new.name)]) HGU.newnames <- newnames[is.element(EID,use.new.name),] @ The resulting \Rfunction{HGU.newnames} data frame is equivalent to the \Rfunction{HGU.newnames} data frame that is provided as sample data with the \Rpackage{metahdep} package. Next, an effect size estimate (and associated variance estimate) for each probeset in each study is needed. There are a number of possible ways to calculate a meaningful effect size estimate for gene expression data; see Ramasamy et al. (2008) for a brief discussion. We adopt the estimate proposed by Hu et al. (2006), and based on the probe-level model test statistic of Bolstad (2004). This model is fit using the \Rpackage{affyPLM} package for Bioconductor; see the Appendix of this vignette for a summary, including the estimation of the variance and covariance of effect size estimates. It is necessary to estimate the covariance between effect size estimates in cases such as Study 1, where two treatment vs. control comparisons are made, one for each tissue, using data from the same study. The Appendix of this vignette summarizes this covariance estimation in the probe-level model context. This type of sampling dependence can be accounted for in the meta-analysis, as discussed in Stevens and Taylor (2009). The effect size estimates from this probe-level model, as well as the appropriate variances and covariances, are calculated by the \Rfunction{getPLM.es} function. Other effect size estimates are possible, but in order to use the meta-analysis functions of the \Rpackage{metahdep} package, the results must be in the same format as that returned by the \Rfunction{getPLM.es} function. The result of a call to \Rfunction{getPLM.es} for each study is an object of class \Rfunction{ES.obj}; see the object class help page (\Rfunction{?"ES.obj-class"}) for more on this class. In the sample code that follows, we use gene expression data freely available through the \Rpackage{ALLMLL} and \Rpackage{ALL} packages. The experimental design we describe does not match the actual design of the experiments used to generate these data. However, our goal here is to demonstrate the use of the meta-analysis tools in the \Rpackage{metahdep} package, and these publicly available data are convenient for this demonstration. For Study 1, we have 20 arrays (from the \Rfunction{MLL.A} data in the \Rpackage{ALLMLL} package). We assume that the 20 arrays in this \Rpackage{AffyBatch} object are in the following order: 5 Tissue 0 Controls, 5 Tissue 1 Controls, 5 Tissue 0 Treatments, and then 5 Tissue 1 Treatments. Based on this assumed structure, we create the \Rfunction{ES.obj} for Study 1. Note that for this study, as well as for the remaining studies, we restrict our attention to the convenient subset of probe set ID's identified previously for each chip version. The two comparisons of interest (treatment vs. control for Tissue 0, and treatment vs. control for Tissue 1) are defined by the pairs of array index vectors passed to the \Rfunction{trt1} and \Rfunction{trt2} arguments. The covariate differences of these two comparisons are represented in theh \Rfunction{covariates} argument. Because this study was conducted by Lab 1 (see Table \ref{Table:sample}), we define the hierarchical dependence group with the \Rfunction{dep.grp=1} argument: <>= library(ALLMLL); data(MLL.A) ES.exp1 <- getPLM.es(abatch=MLL.A, trt1=list(c(1:5),c(6:10)), trt2=list(c(11:15),c(16:20)), sub.gn=use.hgu133a.gn, covariates=data.frame(Tissue=c(0,1)), dep.grp=1 ) @ For Study 2, we have 6 arrays (from the \Rfunction{spikein95} data in the \Rpackage{SpikeInSubset} package). We assume that the first 3 arrays are Tissue 0 Controls, and the last 3 are Tissue 0 Treatments. Based on this assumed structure, we create the \Rfunction{ES.obj} for Study 2. Unlike Study 1, here there is only one comparison of interest, so the \Rfunction{trt1} and \Rfunction{trt2} arguments have only one element each, and the \Rfunction{covariates} data.frame argument has only one row. This study was conducted by the same Lab 1 as Study 1, and this hierarchical dependence is indicated by the \Rfunction{dep.grp} argument: <>= library(SpikeInSubset) data(spikein95) ES.exp2 <- getPLM.es(abatch=spikein95, trt1=1:3, trt2=4:6, sub.gn=use.hgu95a.gn, covariates=data.frame(Tissue=0), dep.grp=1 ) @ For Study 3, we have 4 arrays (from the \Rfunction{Dilution} data in the \Rpackage{affy} package). We assume that the first 2 arrays are Tissue 0 Controls, and the last 2 are Tissue 0 Treatments. Based on this assumed structure, we create the \Rfunction{ES.obj} for Study 3: <>= data(Dilution) ES.exp3 <- getPLM.es(abatch=Dilution, trt1=1:2, trt2=3:4, sub.gn=use.hgu95av2.gn, covariates=data.frame(Tissue=0), dep.grp=2 ) @ Finally, for Study 4, we have 20 arrays (from the \Rfunction{MLL.B} data in the \Rpackage{ALLMLL} package). We assume that the first 9 arrays are Tissue 1 Controls, followed by 11 Tissue 1 Treatments. Based on this assumed structure, we create the \Rfunction{ES.obj} for Study 4: <>= data(MLL.B) ES.exp4 <- getPLM.es(abatch=MLL.B, trt1=1:9, trt2=10:20, sub.gn=use.hgu133b.gn, covariates=data.frame(Tissue=1), dep.grp=3 ) @ With the \Rfunction{ES.obj}'s thus created for each study, we now put them together as elements in a list, in preparation for passing the list to the \Rfunction{metahdep.format} function of the \Rpackage{metahdep} package. The list created by the following code is equivalent to the \Rfunction{HGU.DifExp.list} list that is provided as sample data with the \Rpackage{metahdep} package: <>= HGU.DifExp.list <- list(ES.exp1, ES.exp2, ES.exp3, ES.exp4) @ With the list of \Rfunction{ES.obj}'s and the ``new names'' \Rfunction{data.frame} thus created, the \Rfunction{metahdep.format} function can be used to format the data in final preparation for meta-analysis. It is important to note that the same covariate(s) must be defined in each of the \Rfunction{ES.obj}'s to be used, even if they are constants in some \Rfunction{ES.obj}'s (as Tissue is in each of Studies 2, 3, and 4 here). Also, if any of the \Rfunction{ES.obj}'s do not have a value of \Rfunction{dep.grp} defined, then the \Rfunction{metahdep.format} function will assume that the studies (or \Rfunction{ES.obj} objects) are hierarchically independent, and a cautionary note will be displayed. The result of the following code is equivalent to the \Rfunction{HGU.prep.list} object that is provided as sample data with the \Rpackage{metahdep} package: <>= HGU.prep.list <- metahdep.format(HGU.DifExp.list, HGU.newnames) @ This resulting \Rfunction{HGU.prep.list} object is a list. Each element of the list is a \Rfunction{metaprep} object corresponding to a specific ``new name'' and containing the corresponding effect size estimates, covariance matrix, covariate matrix, and hierarchical dependence structure. See the object class help page (\Rfunction{?"metaprep-class"}) for more on this class. Section \ref{Ex2.run} below demonstrates how this list of \Rfunction{metaprep} objects is passed to the meta-analysis functions of the \Rpackage{metahdep} package. \subsection{Meta-Analysis of Formatted Gene Expression Studies} \label{Ex2.run} With a list of \Rfunction{metaprep} objects as returned by the \Rfunction{metahdep.format} function, the fixed effects, random effects, and hierarchical Bayes models can be applied to each ``new name'' gene individually using the \Rfunction{metahdep} function. In the following code, sampling dependence is accounted for in each of the models, and hierarchical dependence is accounted for in the two ``\Rfunction{.ds}'' objects returned when \Rfunction{delta.split=TRUE}. The non-intercept columns of the covariate matrix are centered about their means (\Rfunction{center.X=TRUE}) so that the Intercept term is of primary interest as the population mean effect size. First load the package and necessary data objects (all created above). We can also select a subset of genes on which to perform the meta-analysis. In practice, biological interest would identify the subset, but we use a random subset of 50 just for convenience in our demonstration here: <>= library(metahdep) data(HGU.newnames) data(HGU.prep.list) set.seed(123) gene.subset <- sample(HGU.newnames$new.name, 50) @ Run fixed effects meta-analysis on each gene: <>= FEMA <- metahdep(HGU.prep.list, method="FEMA", genelist=gene.subset, center.X=TRUE) @ Run random effects meta-analysis on each gene, first ignoring hierarchical dependence (\Rfunction{delta.split=FALSE}) and then accounting for hierarchical dependence (\Rfunction{delta.split=TRUE}): <>= REMA <- metahdep(HGU.prep.list, method="REMA", genelist=gene.subset, delta.split=FALSE, center.X=TRUE) REMA.ds <- metahdep(HGU.prep.list, method="REMA", genelist=gene.subset, delta.split=TRUE, center.X=TRUE) @ Run hierarchical Bayes meta-analysis on each gene, first ignoring hierarchical dependence (\Rfunction{delta.split=FALSE}) and then accounting for hierarchical dependence (\Rfunction{delta.split=TRUE}): <>= HBLM <- metahdep(HGU.prep.list, method="HBLM", genelist=gene.subset, delta.split=FALSE, center.X=TRUE, two.sided=TRUE) HBLM.ds <- metahdep(HGU.prep.list, method="HBLM", genelist=gene.subset, delta.split=TRUE, center.X=TRUE, two.sided=TRUE) @ In these hierarchical Bayes meta-analysis models, we use \Rfunction{two.sided=TRUE} so that the posterior probabilities are converted to two-sided p-values, for more traditional interpretation, as in Stevens and Taylor (2009). Each call to the \Rfunction{metahdep} function returns a \Rfunction{data.frame} object with summaries of the meta-analysis model called, with a row for each ``new name'' gene. The final chunk of code below compares the results from these five models, focusing on the test of significance of the population mean effect size (i.e., the Intercept p-value). The resulting scatterplot matrix appears in Figure \ref{Fig:Plots}. \begin{figure}[htbp] \begin{center} <>= pvals <- data.frame(FEMA=FEMA$Intercept.pval, REMA=REMA$Intercept.pval, REMA.ds=REMA.ds$Intercept.pval, HBLM=HBLM$Intercept.pval, HBLM.ds=HBLM.ds$Intercept.pval) plot(pvals, main='Comparison of Intercept P-values from Meta-Analysis',pch=16,cex=.5) @ \end{center} \caption{\label{Fig:Plots} A comparison of the Intercept p-values (testing the significance of the population mean effect size) from several meta-analysis models applied to the gene expression data.} \end{figure} \newpage \section{Meta-Analysis of Foreign Language Learning Studies} \label{Ex2} The statistical methods used by the \Rpackage{metahdep} package are described in Stevens and Taylor (2009), which focused on a meta-analytic review of 13 experiments with 18 study reports all involving the effect of native-language (L1) vocabulary aids (e.g., glossing) on second language (L2) reading comprehension. The data from that paper are included in the \Rpackage{metahdep} package for purposes of demonstration. The sample code in this section of the package vignette reproduces the main results of Stevens and Taylor (2009). A summary of the glossing data is provided in the \Rfunction{gloss.Table1} data frame within the \Rfunction{gloss} data set (see output in Table \ref{Table:gloss1}). <>= library(metahdep) data(gloss) gloss.Table1 @ \SweaveOpts{echo=FALSE} \begin{table}[t] \centering <<>>= gloss.Table1 @ \caption{\label{Table:gloss1} Summary of the glossing data.} \end{table} \SweaveOpts{echo=TRUE} Based on this summary, the vector of unbiased effect size estimates can be constructed. The following code produces this vector (\Rfunction{theta}), which is equivalent to the vector \Rfunction{gloss.theta} within the \Rfunction{gloss} data set: <>= attach(gloss.Table1) sp <- sqrt( ((n1-1)*s1^2+(n2-1)*s2^2) / (n1+n2-2) ) c.correct <- (1-3/(4*dfE-1)) theta <- c.correct * (y2-y1)/sp @ The covariance matrix corresponding to the vector of unbiased effect size estimates is constructed next, with some off-diagonal elements nonzero because of the sampling dependence (as described in Stevens and Taylor, 2009). First get the diagonal elements of the matrix: <<>>= pE <- c.correct^2*dfE/(dfE-2) var.theta <- pE*(1/n2+1/n1) + (pE-1)*theta^2 V <- diag(var.theta) @ As described in Stevens and Taylor (2009), sampling dependence exists among effect size estimates in both the Baumann and Joyce studies. As currently ordered, sampling dependence exists between studies 2 and 3 (Baumann), between studies 4 and 5 (Baumann), and among studies 10-12 (Joyce). The off-diagonal elements of the covariance matrix can be constructed accordingly. This resulting matrix \Rfunction{V} is equivalent to the matrix \Rfunction{gloss.V} within the \Rfunction{gloss} data set. <<>>= V[2,3] <- V[3,2] <- (pE[2]-1)*theta[2]*theta[3] V[4,5] <- V[5,4] <- (pE[4]-1)*theta[4]*theta[5] V[10,11] <- V[11,10] <- (pE[10]-1)*theta[10]*theta[11] V[10,12] <- V[12,10] <- (pE[10]-1)*theta[10]*theta[12] V[11,12] <- V[12,11] <- (pE[11]-1)*theta[11]*theta[12] @ To account for fundamental differences between studies in the meta-analysis, the design matrix can be constructed. This is equivalent to the matrix \Rfunction{gloss.X} within the \Rfunction{gloss} data set. <<>>= X <- matrix(0,nrow=18,ncol=6) X[,1] <- 1 X[Rec==2,2] <- 1 X[Time==2,3] <- 1 X[Part==2,4] <- 1 X[Part==3,5] <- 1 X[Pct==2,6] <- 1 colnames(X) <- c('Intercept','Rec2','Time2','Part2','Part3','Pct2') @ With the vector of unbiased effect size estimates (\Rfunction{theta}), the corresponding covariance matrix (\Rfunction{V}), and the design matrix (\Rfunction{X}) now defined, three classes of meta-analysis models may be considered: fixed effects, random effects, and hierarchical Bayes. The \Rpackage{metahdep} package has defined the \Rfunction{metahdep.FEMA}, \Rfunction{metahdep.REMA}, and \Rfunction{metahdep.HBLM} functions for these three model classes, respectively. In each model, three dependence structures are considered: independence (V diagonal), dependence (V not necessarily diagonal), and delta-split (also accounting for hierarchical dependence). The delta-split structure is not possible in the fixed effects model. For the delta-splitting models, dependence groups must be defined. In these data, the Baumann studies (numbered 2-5 in the current format) are one group, and the Joyce studies (numbered 10-12 in the current format) are another. The following commands define these dependence groups and then proceed to fit these possible models. <<>>= # define dependence groups for delta-splitting models dep.groups <- list( c(2,3,4,5), c(10,11,12) ) # fixed effects meta-analysis FEMA.indep <- metahdep.FEMA(theta, diag(diag(V)), X, center.X=TRUE) FEMA.dep <- metahdep.FEMA(theta, V, X, center.X=TRUE) # random effects meta-analysis REMA.indep <- metahdep.REMA(theta, diag(diag(V)), X, center.X=TRUE) REMA.dep <- metahdep.REMA(theta, V, X, center.X=TRUE) REMA.ds <- metahdep.REMA(theta, V, X, center.X=TRUE, delta.split=TRUE, dep.groups=dep.groups) # hierarchical Bayes meta-analysis HBLM.indep <- metahdep.HBLM(theta, diag(diag(V)), X, center.X=TRUE, two.sided=TRUE) HBLM.dep <- metahdep.HBLM(theta, V, X, center.X=TRUE, two.sided=TRUE) HBLM.ds <- metahdep.HBLM(theta, V, X, center.X=TRUE, two.sided=TRUE, delta.split=TRUE, dep.groups=dep.groups, n=20, m=20) @ The \Rfunction{metahdep.HBLM} function may return slightly different results for different values of the \Rfunction{n} and \Rfunction{m} arguments, as these control the step sizes in the necessary numerical integrations of the hierarchical Bayes model. The results from these various meta-analysis models can be summarized. The following code produces the output shown in Table \ref{Table:gloss2}; this is a reproduction of Table 4 of Stevens and Taylor (2009). <>= r1 <- round(c(FEMA.indep$beta.hats[1], FEMA.indep$beta.hat.p.values[1],NA,NA),4) r2 <- round(c(FEMA.dep$beta.hats[1], FEMA.dep$beta.hat.p.values[1],NA,NA),4) r3 <- round(c(REMA.indep$beta.hats[1], REMA.indep$beta.hat.p.values[1], REMA.indep$tau2.hat, NA),4) r4 <- round(c(REMA.dep$beta.hats[1], REMA.dep$beta.hat.p.values[1], REMA.dep$tau2.hat, NA),4) r5 <- round(c(REMA.ds$beta.hats[1], REMA.ds$beta.hat.p.values[1], REMA.ds$tau2.hat, REMA.ds$varsigma.hat),4) #r5 <- c(0.5261, 0.0026, 0.2936, -0.0816) r6 <- round(c(HBLM.indep$beta.hats[1], HBLM.indep$beta.hat.p.values[1], HBLM.indep$tau.var + HBLM.indep$tau.hat^2, NA),4) r7 <- round(c(HBLM.dep$beta.hats[1], HBLM.dep$beta.hat.p.values[1], HBLM.dep$tau.var + HBLM.dep$tau.hat^2, NA),4) r8 <- round(c(HBLM.ds$beta.hats[1], HBLM.ds$beta.hat.p.values[1], HBLM.ds$tau.var + HBLM.ds$tau.hat^2, HBLM.ds$varsigma.hat),4) model <- c(rep('Fixed',2),rep('Random',3),rep('Bayes',3)) dep <- c('Independent','Dependent',rep(c('Independent','Dependent','Delta-split'),2)) Table2 <- as.data.frame(cbind(model,dep,rbind(r1,r2,r3,r4,r5,r6,r7,r8))) colnames(Table2) <- c('Model','Structure','Intercept','P','tau-square','varsigma') rownames(Table2) <- NULL Table2 @ \SweaveOpts{echo=FALSE} \begin{table}[t] <<>>= r1 <- round(c(FEMA.indep$beta.hats[1], FEMA.indep$beta.hat.p.values[1],NA,NA),4) r2 <- round(c(FEMA.dep$beta.hats[1], FEMA.dep$beta.hat.p.values[1],NA,NA),4) r3 <- round(c(REMA.indep$beta.hats[1], REMA.indep$beta.hat.p.values[1], REMA.indep$tau2.hat, NA),4) r4 <- round(c(REMA.dep$beta.hats[1], REMA.dep$beta.hat.p.values[1], REMA.dep$tau2.hat, NA),4) #r5 <- round(c(REMA.ds$beta.hats[1], REMA.ds$beta.hat.p.values[1], REMA.ds$tau2.hat, REMA.ds$varsigma.hat),4) r5 <- c(0.5261, 0.0026, 0.2936, -0.0816) r6 <- round(c(HBLM.indep$beta.hats[1], HBLM.indep$beta.hat.p.values[1], HBLM.indep$tau.var + HBLM.indep$tau.hat^2, NA),4) r7 <- round(c(HBLM.dep$beta.hats[1], HBLM.dep$beta.hat.p.values[1], HBLM.dep$tau.var + HBLM.dep$tau.hat^2, NA),4) r8 <- round(c(HBLM.ds$beta.hats[1], HBLM.ds$beta.hat.p.values[1], HBLM.ds$tau.var + HBLM.ds$tau.hat^2, HBLM.ds$varsigma.hat),4) model <- c(rep('Fixed',2),rep('Random',3),rep('Bayes',3)) dep <- c('Independent','Dependent',rep(c('Independent','Dependent','Delta-split'),2)) Table2 <- as.data.frame(cbind(model,dep,rbind(r1,r2,r3,r4,r5,r6,r7,r8))) colnames(Table2) <- c('Model','Structure','Intercept','P','tau-square','varsigma') rownames(Table2) <- NULL Table2 @ \caption{\label{Table:gloss2} Summary of results from various meta-analysis models of the glossing data.} \end{table} \SweaveOpts{echo=TRUE} \begin{thebibliography}{80} \bibitem{Bolstad2004} Bolstad, B.M. (2004), ``Low-level Analysis of High-density Oligonucleotide Array Data: Background, Normalization and Summarization,'' PhD dissertation, U.C. Berkeley. \bibitem{Hu2006} Hu, P., Greenwood, C.M.T., and Beyene, J. (2006), ``Integrative Analysis of Gene Expression Data Including an Assessment of Pathway Enrichment for Predicting Prostate Cancer,'' {\it Cancer Informatics} 2006:2 289-300. \bibitem{Ramasamy2008} Ramasamy, A., Mondry, A., Holmes, C.C., and Altman, D.G. (2008), ``Key Issues in Conducting a Meta-Analysis of Gene Expression Microarray Datasets,'' {\it PLoS Medicine} 5(9):e184. \bibitem{Rey1983} Rey, W.J. (1983), {\it Introduction to Robust and Quasi-Robust Statistical Methods}, Springer-Verlag, Berlin. \bibitem{Rhodes2002} Rhodes, D.R., Barrette, T.R., Rubin, M.A., Ghosh, D. and Chinnaiyan, A.M. (2002), ``Meta-Analysis of Microarrays: Interstudy Validation of Gene Expression Profiles Reveals Pathway Dysregulation in Prostate Cancer,'' {\it Cancer Research} 62:4427-4433. \bibitem{Bioinf2009} Stevens, J.R. and Nicholas, G. (2009), ``metahdep: Meta-analysis of hierarchically dependent gene expression studies'', {\it Bioinformatics} 25(19):2619-2620. \bibitem{JEBS2009} Stevens, J.R. and Taylor, A.M. (2009), ``Hierarchical Dependence in Meta-Analysis,'' {\it Journal of Educational and Behavioral Statistics} 34(1):46-73. \end{thebibliography} \newpage \appendix \section*{Appendix: Summary of Effect Size Estimate Calculation, with Variance and Covariance} For meta-analysis of gene expression data, Hu et al. (2006) demonstrated the strong performance of an effect size estimate based on Bolstad's probe-level model (Bolstad 2004). This model is fit with data from several arrays using the \Rpackage{affyPLM} package for Bioconductor. For each probeset, the Bolstad model returns a vector $\hat{\beta}$ of M-estimates for expression across the arrays, as well as a matrix $\hat{\Sigma}$, the estimated covariance matrix of $\hat{\beta}$. By the central limit theorem, and as in Rey (1983), the asymptotic distribution is \begin{eqnarray} \hat{\beta} & \sim & N(\beta,\Sigma), \nonumber \end{eqnarray} where $\beta_i$ is the expression of the gene on array $i$. Suppose that for a specific comparison $j$, a test of differential expression between groups of arrays is to be considered. We assume there are $n_{j,t}$ ``treatment'' arrays and $n_{j,c}$ ``control'' arrays. We define $c_j$ to be a contrast vector with elements $1/n_{j,t}$ for ``treatment'' arrays, $-1/n_{j,c}$ for ``control'' arrays, and 0 for all other arrays in the study. Then the gene's effect size estimate for this comparison of interest $j$ is \begin{eqnarray} \hat{\theta}_j & = & \sqrt{\frac{1}{n_{j,t}}+\frac{1}{n_{j,c}}} \times \frac{c_j^T \hat{\beta}}{ \sqrt{c_j^T \hat{\Sigma} c_j} }. \nonumber \end{eqnarray} Note that \begin{eqnarray} \frac{c_j^T \hat{\beta}}{ \sqrt{c_j^T \Sigma c_j} } & \sim & N\left( \frac{c_j^T \beta}{ \sqrt{c_j^T \Sigma c_j} }, 1 \right), \nonumber \end{eqnarray} and so \begin{eqnarray} Var \left[ \frac{c_j^T \hat{\beta}}{ \sqrt{c_j^T \Sigma c_j} } \right] & = & 1. \nonumber \end{eqnarray} Another comparison or test of differential expression could be considered between different groups of arrays, such as treatment vs. control for another tissue. Define $c_h$ to be the appropriate contrast vector, constructed similar to $c_j$. The gene's effect size estimate for this comparison of interest $h$, $\hat{\theta}_h$, can be calculated similar to $\hat{\theta}_j$. Because $\hat{\theta}_j$ and $\hat{\theta}_h$ are both based on the same estimate $\hat{\beta}$, they are sampling dependent. We can calculate \begin{eqnarray} Cov \left[ \frac{c_j^T \hat{\beta}}{ \sqrt{c_j^T \Sigma c_j} }, \frac{c_h^T \hat{\beta}}{ \sqrt{c_h^T \Sigma c_h} } \right] & = & \frac{c_j^T \Sigma c_h } { \sqrt{c_j^T \Sigma c_j} \sqrt{c_h^T \Sigma c_h} }. \nonumber \end{eqnarray} With all this, we have estimates for the variance and covariance of the gene's effect size estimates as \begin{eqnarray} Var \left[ \hat{\theta}_j \right] & \approx & \frac{1}{n_{j,t}} + \frac{1}{n_{j,c}}, \nonumber \\ Cov \left[ \hat{\theta}_j , \hat{\theta}_h \right] & \approx & \sqrt{\frac{\left( \frac{1}{n_{j,t}} + \frac{1}{n_{j,c}} \right) \left( \frac{1}{n_{h,t}} + \frac{1}{n_{h,c}} \right)}{ \left( c_j^T \hat{\Sigma} c_j \right) \left( c_h^T \hat{\Sigma} c_h \right) }} \times c_j^T \hat{\Sigma} c_h . \nonumber \end{eqnarray} \end{document}