%\VignetteIndexEntry{short review of dressCheck contents} %\VignetteDepends{dressCheck, Biobase} %\VignetteKeywords{array analysis reliability} %\VignettePackage{dressCheck} % % NOTE -- ONLY EDIT THE .Rnw FILE!!! The .tex file is % likely to be overwritten. % \documentclass[12pt]{article} \usepackage{amsmath,pstricks} % With MikTeX 2.9, using pstricks also requires using auto-pst-pdf or running % pdflatex will fail. Note that using auto-pst-pdf requires to set environment % variable MIKTEX_ENABLEWRITE18=t on Windows, and to set shell_escape = t in % texmf.cnf for Tex Live on Unix/Linux/Mac. \usepackage{auto-pst-pdf} \usepackage[authoryear,round]{natbib} \usepackage{hyperref} \textwidth=6.2in \textheight=8.5in %\parskip=.3cm \oddsidemargin=.1in \evensidemargin=.1in \headheight=-.3in \newcommand{\scscst}{\scriptscriptstyle} \newcommand{\scst}{\scriptstyle} \newcommand{\Rfunction}[1]{{\texttt{#1}}} \newcommand{\Robject}[1]{{\texttt{#1}}} \newcommand{\Rpackage}[1]{{\textit{#1}}} \newcommand{\Rmethod}[1]{{\texttt{#1}}} \newcommand{\Rfunarg}[1]{{\texttt{#1}}} \newcommand{\Rclass}[1]{{\textit{#1}}} \textwidth=6.2in \bibliographystyle{plainnat} \begin{document} %\setkeys{Gin}{width=0.55\textwidth} \title{Reproducibility of Dressman JCO 2007} \author{VJ Carey} \maketitle In the light of recent high-level challenges to reproducibility of microarray studies (Ioannidis 2009 and others) the dispute between Baggerly, Coombes and Neeley (BCN) and Dressman, Potti and Nevins (DPN) in J Clin Oncology, 2008; 26(7):1186-1187, is of broad interest. But it seems that neither the editors of JCO nor the rebuttalists read the arguments of BCN with much care. In preparing an invited chapter on reproducible research in a forthcoming monograph on cancer bioinformatics, I decided to look closely at the archive generated by BCN at \url{http://bioinformatics.mdanderson.org/Supplements/ReproRsch-Ovary/} to see if a simple characterization of the dispute, perhaps with resolution, might be possible. Briefly, the following points need to be understood by those interested in pathway activation and platinum responsiveness in ovarian cancer. \begin{enumerate} \item As of April 4 2009, the `corrected RMA' archive at \url{http://data.cgt.duke.edu/platinum.php} has incorrect ID labeling of most of 119 samples. BCN show that a presumably correct relabeling can be established for 116 samples by finding maximum correlation between the corrected RMA samples and uncorrected RMA samples derivable from the CEL files also posted at the platinum.php site. Note that Dressman's `corrected RMA' terminology appears to refer to the fact that RMA-based quantifications were corrected for batch effects using sparse factor regression, and not to correction of the published mislabeling problem. \item BCN show how array run dates can be extracted from CEL files; standard Bioconductor tools facilitate this. Figures 1(a) and 1(b) below, computed independently of the source code of BCN, and based solely on the relabeled `corrected RMA' quantifications, provide evidence that batch effect-related confounding is present. RPS11, a gene in the Src pathway signature, and survival among platinum non-responders, have distributions that are systematically related to array batch date. The pattern seen in RPS11 indicates that the sparse factor regression corrections do not completely remove the batch effect. \item Figure 1(c) is a close approximation to Dressman et al.'s 2007 Figure 2B, and is computed using only the quantifications and survival data published at the platinum.php web site. The only aspect of Figure 1(c) that does not use, and thereby `repeat', Dressman's original analysis, is the scoring of pathway activation state of tumors. Neither Bild (2006) nor Dressman trouble to publish their scoring coefficients, a version of which may be easily derived by applying singular value decomposition to Bild's cell-line data. Figure 1(c) is a strong suggestion that the data and methods used by me and by BCN in their reevaluation of the 2007 article \textit{can} reproduce an important aspect of the results. Indeed, when I computed Figure 1(c) I felt that a vindication of Dressman's 2007 article might be at hand. \item Figure 1(d) shows the result of performing the analysis pattern that yielded Figure 1(c) to the E2F3 pathway. Figure 1(d) should yield the association seen in Dressman's Figure 2C, but it does not. Among platinum non-responsive patients, there is no association between E2F3 activation and survival. However, among platinum-responsive patients, a significant association is seen, as in Figure 1(e). These observations were also made by BCN in their supplementary ovca7.pdf, but with different data sources. \item By extracting run dates from the original CEL files, strata can be formed to adjust for date-related confounding. Using a parsimonious quadratic model for the effect of run date, the test for a Src pathway effect on survival among platinum non-responders, corrected for confounding, has $p=0.47$. On the other hand, the same correction in the E2F3 setting, among platinum \textit{responders}, does not substantially alter the association between pathway activation and survival: $p < 10^{-4}$ after adjustment. Similar findings were reported by BCN in ovca7.pdf. \end{enumerate} In summary, the published data archives cannot be used to reconstruct key findings in Dressman's 2007 paper, even when methods are confined strictly to those employed by Dressman et al. The standard for reproduction articulated by DPN in their scathing rebuttal to BCN is readily met (with the exception of pathway activation scoring) for any reanalysis based on `corrected RMA' quantifications, because these quantifications enjoy the sparse factor regression adjustments unique to the Dressman methodology. Thus either the published data archive or the 2007 paper need substantial revision. Three final remarks. 1) It is important to distinguish between reconstrucibility of quantitative analyses and reproducibility of research findings. Figure 1(c) shows that Dressman's Figure 2B is \textit{reconstructible}, which is in itself a good thing. The associated inference is probably not \textit{reproducible}, however, because of the confounding: any experiment with similar observational resources possessing different biologically irrelevant relationships among batch, expression, and survival would yield results that are almost surely qualitatively different. It is customary for epidemiologists to check carefully for patterns indicative of confounding in their observational datasets; it must become similarly routine for analysts in genomics. 2) All the data, computations, and graphics on which this letter depends are available in the Bioconductor experimental data archive \textit{dressCheck}, and the Sweave code for this letter is present there as `short.Rnw'. Any individual with a copy of R can regenerate, criticize, or reuse any programming underlying this letter. If the problem of reconstruction are due primarily to flaws in the published data archive, analyses underlying this letter can be regenerated with one command, once the data images are revised. 3) Dressman and colleagues are to be commended for making publicly available so much of the data underlying their report. Their analyses are extremely interesting, but it appears technical errors have led -- at least -- to confusions of subgroup labels. It is clear that the level of scrutiny to which their analysis has been subjected by BCN and by me has not been applied to the vast majority of publications based on genome-scale data analysis, and thus there is a kind of unfairness visited upon those who a) have very interesting findings worthy of further exploration and b) make their data archives available for reanalysis. How to make genome-scale data analysis more reliable and verifiable for the primary investigators is an open question. Increased reliability and verifiability will become necessary as complexity of assays and annotation schemes grows. Baggerly and colleagues show how the criticism of a complex high-level publication can be made transparent and thorough; unfortunately the work involves seven supplementary documents and hundreds of associated primary and derived data files. The \textit{dressCheck} package on which this letter is based establishes less but does so in a more concise manner -- Figure 1(c) for example is computed from the primary data with 11 lines of R code. \begin{figure} <>= psurv = function (x, digits = max(options()$digits - 4, 3), ...) { saveopt <- options(digits = digits) on.exit(options(saveopt)) if (!inherits(x, "survdiff")) stop("Object is not the result of survdiff") if (!is.null(cl <- x$call)) { } omit <- x$na.action if (length(omit)) { } if (length(x$n) == 1) { z <- sign(x$exp - x$obs) * sqrt(x$chisq) temp <- c(x$obs, x$exp, z, signif(1 - pchisq(x$chisq, 1), digits)) names(temp) <- c("Observed", "Expected", "Z", "p") print(temp) } else { if (is.matrix(x$obs)) { otmp <- apply(x$obs, 1, sum) etmp <- apply(x$exp, 1, sum) } else { otmp <- x$obs etmp <- x$exp } df <- (sum(1 * (etmp > 0))) - 1 temp <- cbind(x$n, otmp, etmp, ((otmp - etmp)^2)/etmp, ((otmp - etmp)^2)/diag(x$var)) dimnames(temp) <- list(names(x$n), c("N", "Observed", "Expected", "(O-E)^2/E", "(O-E)^2/V")) uu <- 1 - pchisq(x$chisq, df) } uu } library(dressCheck) library(chron) library(survival) data(DrAsGiven) data(corrp116) an = as.numeric pdf(file="twox3.pdf", width=8, height=5) par(mfrow=c(2,3)) plot(an(exprs(corrp116["213350_at",]))~chron(corrp116$rundate), main="(a)", xlab="array run date", ylab="RMA+SFR expression of RPS11") CC = cut(chron(corrp116$rundate),2) with(pData(corrp116), d0 <<- survdiff(Surv(Survival, dead)~CC, subset=CR==0)) with(pData(corrp116), plot(survfit(Surv(Survival, dead)~CC, subset=CR==0),col=c("red", "green"), lwd=3, xlab="Months", ylab="survival (%)", main="(b)")) text(37,.03, paste("logrank p=", round(psurv(d0),3))) #giv = DrAsGiven[intersect(featureNames(DrAsGiven), names(srcWts)),] #srcWtsL = srcWts[featureNames(giv)] #sco = t(exprs(giv))%*%srcWtsL #sdys = 1*(sco>median(sco)) #with(pData(DrAsGiven), plot(survfit(Surv(Survival, X0...alive...1...dead)~sdys, # subset=response.0.NR..1.CR==0),col=c("blue", "yellow"), lwd=3, # xlab="Months", ylab="survival (%)", main="(b)")) #with(pData(DrAsGiven), d1 <<- survdiff(Surv(Survival, X0...alive...1...dead)~sdys, # subset=response.0.NR..1.CR==0)) #text(37,.05, paste("logrank p=", round(psurv(d1),3))) data(srcWts) # get scoring coefficients, then restrict expression data to the # genes in the pathway signature corr = corrp116[intersect(featureNames(corrp116), names(srcWts)),] srcWtsL = srcWts[featureNames(corr)] # score the tumors sco = t(exprs(corr))%*%srcWtsL sdys = 1*(sco>median(sco)) # dichotomize with(pData(corrp116), plot(survfit(Surv(Survival, dead)~sdys, subset=CR==0),col=c("blue", "yellow"), lwd=3, xlab="Months", ylab="survival (%)", main="(c)")) with(pData(corrp116), d2 <<- survdiff(Surv(Survival, dead)~sdys, subset=CR==0)) text(37,.05, paste("logrank p=", round(psurv(d2),3))) data(e2f3Wts) corr = corrp116[intersect(featureNames(corrp116), names(e2f3Wts)),] eWtsL = e2f3Wts[featureNames(corr)] sco = t(exprs(corr))%*%eWtsL edys = 1*(sco