%\VignetteIndexEntry{Supplement: multi-channel assays} %\VignetteKeywords{Cell based assays} %\VignettePackage{cellHTS} \documentclass[11pt]{article} \usepackage{amsmath} \usepackage{color} \definecolor{darkblue}{rgb}{0.0,0.0,0.75} \usepackage[% baseurl={http://www.bioconductor.org},% pdftitle={Analysis of multi-channel cell-based screens},% pdfauthor={Wolfgang Huber},% pdfsubject={cellHTS},% pdfkeywords={Bioconductor},% pagebackref,bookmarks,colorlinks,linkcolor=darkblue,citecolor=darkblue,% pagecolor=darkblue,raiselinks,plainpages,pdftex]{hyperref} \newcommand{\Robject}[1]{\texttt{#1}} \newcommand{\Rpackage}[1]{\textit{#1}} \newcommand{\Rfunction}[1]{\textit{#1}} \newcommand{\Rclass}[1]{\textit{#1}} \newcommand{\myincfig}[3]{% \begin{figure}[tp] \begin{center} \includegraphics[width=#2]{#1} \caption{\label{#1}#3} \end{center} \end{figure} } \begin{document} %------------------------------------------------------------ \title{Analysis of multi-channel cell-based screens} %------------------------------------------------------------ \author{L\'igia Br\'as, Michael Boutros and Wolfgang Huber} \maketitle \tableofcontents \section{Introduction} This techical report is a supplement of the main vignette \textit{End-to-end analysis of cell-based screens: from raw intensity readings to the annotated hit list} that is given as part of the \Rpackage{cellHTS} package. It accompanies the paper \textit{Analysis of cell-based RNAi screens} by Michael Boutros, L\'igia Br\'as and Wolfgang Huber~\cite{Boutros2006}. The report demonstrates how the \Rpackage{cellHTS} package can be applied to the documentation and analysis of multi-channel cell-based high-throughput screens (HTS), more specifically, dual-channel experiments. Such experiments are used, for example, to measure the phenotype of a pathway-specific reporter gene against a constitutive signal that can be used for normalization purposes. Typical examples for dual-channel experimental setups are dual-luciferase assays, whereby both a firefly and renilla luciferase are measured in the same well. In principle, multiplex assays can consist of many more than two channels, such as in the case of flow-cytometry readout or other microscopy-based high-content approaches. We note that in this report we present a simple approach to analyse data from dual-channel experiments, which can be expanded to experiments with more than two reporters, taking the in-built normalization functions of \textit{cellHTS} as a template, and employing the extensive statistical modeling capabilities of the R programming language. Moreover, such analyses should be adapted to the biological system and to the question of interest. This text has been produced as a reproducible document~\cite{Gentleman2004RepRes}, containing the actual computer instructions, given in the language R, to produce all results, including the figures and tables that are shown here. To reproduce the computations shown here, you will need an installation of R (version 2.3 or greater) together with a recent version of the package \Rpackage{cellHTS} and of some other add-on packages. Then, you can simply take the file \textit{twoChannels.Rnw} in the \textit{doc} directory of the package, open it in a text editor, run it using the R command \Rfunction{Sweave}, and modify it according to your needs. We start by loading the package. % <>= library("cellHTS") @ % <>= ## for debugging: options(error=recover) @ % %------------------------------------------------------------ \section{Assembling the data} \label{sec:assemble} %------------------------------------------------------------ Here, we consider a sample data of a dual-channel experiment performed with \textit{D. melanogaster} cells. The screen was conducted in microtiter plate format using a library of double-stranded RNAs (dsRNAs), in duplicates. The example data set corresponds to three 384-well plates. The purpose of the screen is to find signaling components of a given pathway. In the screen, one reporter (assigned to channel 1, and denoted here by $R_1$) monitors cell growth and viability, while the other reporter (assigned to channel 2 and denoted here by $R_2$) is indicative of pathway activity. %------------------------------------------------------------ \subsection{Reading the raw intensity files} %------------------------------------------------------------ The set of available result files and the information about them (which plate, which replicate, which channel) is given in the \emph{plate list file}. The first few lines of the plate list file for this data set are shown in Table~\ref{tab:platelist}. \input{twoChannels-platelist} Using the function \Rfunction{readPlateData}, we can read the plate list file and all the intensity files, thereby assembling the data into a single R object that can be used for subsequent analyses. First, we define the path for those files: % <>= experimentName <- "DualChannelScreen" dataPath=system.file(experimentName, package="cellHTS") @ % The input files are in the \Robject{\Sexpr{experimentName}} directory of the \Rpackage{cellHTS} package. % <>= x <- readPlateData("Platelist.txt", name=experimentName, path=dataPath) @ % <>= x @ % %% Create the tables with the first lines of the plate list file: %% it would have been nice to use the "xtable" package but it insists on %% adding row numbers, which we don't like. <>= cellHTS:::tableOutput(file.path(dataPath, "Platelist.txt"), "plate list", preName="twoChannels") @ %------------------------------------------------------------ \subsection{Annotating the plate results} %------------------------------------------------------------ \input{twoChannels-plateconfiguration} \input{twoChannels-screenlog} Next, we annotate the measured data with information on the controls, and flag invalid measurements using the information given in the \emph{plate configuration file} and in the \emph{screen log file}, respectively. Selected lines of these files are shown in Table~\ref{tab:plateconfiguration} and Table~\ref{tab:screenlog}. Morevoer, we also add the information contained in the \emph{screen description file}, which gives a general description of the screen. % <>= x <- configure(x, "Plateconf.txt", "Screenlog.txt", "Description.txt", path=dataPath) @ % %% %% Create the table for plateConf and screenLog <>= cellHTS:::tableOutput(file.path(dataPath, "Plateconf.txt"), "plate configuration", selRows=1:4, preName="twoChannels") cellHTS:::tableOutput(file.path(dataPath, "Screenlog.txt"), "screen log", selRows=1:2, preName="twoChannels") @ In this data set, instead of using the default names \emph{pos} and \emph{neg} for positive and negative controls, respectively, we use the name of the gene targeted by the probes in the control wells: \emph{geneA}, \emph{geneB}, \emph{geneC} and \emph{geneD}. This is a more straighforward approach, since not all of these four controls behave as controls for both reporters $R_1$ and $R_2$. Moreover, the two positive controls have different strengths: \emph{geneC} is expected to generate a weaker effect than \emph{geneD}. Thus, it is useful to define these controls separately at the configuration step, in order to calculate the quality measures (dynamic range and $Z'$-factors) specific for each of them in the HTML quality reports. %Note that this allows to have the same configuration file for both reporters. Below, we look at the frequency of each well annotation in the example data: % <<>>= table(x$plateConf$Content) @ %------------------------------------------------------------ \section{Data preprocessing and summarization of replicates} %------------------------------------------------------------ We can take a first look at the data by constructing the HTML quality reports using the \Rfunction{writeReport} function. As mentioned above, the controls used in the screen are reporter-specific. When calling \Rfunction{writeReport}, we need to specify to the function's arguments \Robject{posControls} and \Robject{negControls} which are the positive and negative controls for each channel: % <>= ## Define the controls for the different channels: negControls <- vector("character", length=dim(x$xraw)[4]) # channel 1 - gene A negControls[1] <- "(?i)^geneA$" # case-insensitive and match the empty string at the beginning and end of a line (to distinguish between "geneA" and "geneAB", for example. Although it is not a problem for the present well annotation) # channel 2 - gene A and geneB negControls[2] <- "(?i)^geneA$|^geneB$" posControls <- vector("character", length=dim(x$xraw)[4]) # channel 1 - no controls # channel 2 - geneC and geneD posControls[2] <- "(?i)^geneC$|^geneD$" @ % In the constitutive channel $R_1$, there is one negative control, named \emph{geneA}, and no positive controls. In the pathway-specific reporter $R_2$ there are two different negative controls (\emph{geneA} and \emph{geneB}), and two diffferent positive controls (\emph{geneC} and \emph{geneD}). Each of the arguments \Robject{posControls} and \Robject{negControls} should be defined as a vector of regular expressions with the same length as the number of channels in \Robject{x\$xraw}. These arguments will be passed to the \Rfunction{regexpr} function for pattern matching within the well annotation given in \Robject{x\$wellAnno}. Finally, we construct the quality report pages for the raw data in a directory called \Robject{raw}, in the working directory: % <>= out <- writeReport(x, outdir="raw",posControls=posControls, negControls=negControls) @ <>= out <- writeReport(x, force=TRUE, outdir="raw", posControls=posControls, negControls=negControls) @ % After this function has finished, we can view the index page of the report: % <>= browseURL(out) @ % In this experiment, reporter 1 ($R_1$) monitors cell viability. Thus, wells with low intensities in $R_1$ should be masked: these cells are not responding to a specific perturbation of the studied signaling pathway, but show a more unspecific cell viability phenotype. There is no obvious choice for a threshold for the minimum intensity $R_1$ that we consider still viable; here, we have chosen to set this cut-off as a low quantile ($5\%$) of the overall distribution of corrected intensity values in the $R_1$ channel of each replicate. However, to be able to use the overall distribution of intensities in the three plates, first we need to remove the plate-to-plate variations. This will be performed by applying plate median scaling to each replicate and channel using the function \Rfunction{normalizePlates}: % <>= x <- normalizePlates(x, normalizationMethod="median") @ % <>= ctoff <- apply(x$xnorm[,,,1], 3, quantile, probs=0.05, na.rm=TRUE) @ % % <>= R <- getMatrix(x$xnorm, channel=1, na.rm=FALSE) F <- getMatrix(x$xnorm, channel=2, na.rm=FALSE) # Use the controls of R2 channel: posC <- which(regexpr(posControls[2], as.character(x$wellAnno), perl=TRUE)>0) negC <- which(regexpr(negControls[2], as.character(x$wellAnno), perl=TRUE)>0) ylim <- range(F, na.rm=TRUE) xlim <- range(R, na.rm=TRUE) par(mfrow=c(1,ncol(F)), mai=c(1.15,1.15, 0.3,0.3)) for (r in 1:ncol(F)) { ind <- apply(cbind(R[,r],F[,r]), 1, function(z) any(is.na(z))) plot(R[!ind,r],F[!ind,r], col= densCols(cbind(R[!ind,r], F[!ind,r])), pch=16,cex=0.7, xlab="R1 (log scale)", ylab="R2 (log scale)", log="xy", ylim=ylim, xlim=xlim) abline(v=ctoff[r], col="grey", lty=2, lwd=2) points(R[posC,r],F[posC,r], col="red", pch=20, cex=0.9) points(R[negC,r],F[negC,r], col="green", pch=20, cex=0.9) ind <- which(x$xnorm[,,r,1] <= ctoff[r]) points(R[ind,r],F[ind,r], col="grey", pch=20, cex=0.9) #legend("topleft", col=c("grey", "red", "green"), legend=c("masked", "positive controls","negative controls"), bty="n", pch=20, cex=0.9) } @ % \myincfig{twoChannels-FvsRcorrected}{\textwidth}{Scatterplot of the plate median corrected intensity values in the signal-dependent channel ($R_2$) against the plate median corrected intensity values in the constitutive channel ($R_1$) for replicate 1 (left) and replicate 2 (right). Masked values are shown in grey, while positive and negative controls are shown in red and green, respectively.} % Figure~\ref{twoChannels-FvsRcorrected} shows the plate median corrected intensities in $R_2$ versus $R_1$ channels, together with the calculated threshold and the positive and negative controls of the pathway-inducible reporter $R_2$. The wells with intensity values below the calculated threshold are shown in grey and will be set to "NA" in the slot \Robject{xraw}. Since we further want to apply an alternative normalization procedure to the data (see below), we will create a new \Rpackage{cellHTS} object called \Robject{y} by copying the contents of \Robject{x}, and then flag the low intensity $R_1$ values in \Robject{y\$xraw}. %The subsequent preprocessing will be applied on this array. % <>= y <- x for(r in 1:dim(y$xnorm)[3]) { ind <- y$xnorm[,,r,1]<=ctoff[r] y$xraw[,,r,][ind] <- NA } @ % In order to distinguish between changes in the readout caused by depletion of specific pathway components versus changes in the overall cell number, the next step consists in normalizing the pathway-inducible readout ($R_2$) against the constitutive reporter ($R_1$). This can be done using the \Rfunction{normalizeChannels} function provided in the \textit{cellHTS} package. This function can also adjust for plate effects on the $R_2/R_1$ ratios (or log ratios) using the method specified in the parameter \Robject{adjustPlates}. For example, by setting it to \Robject{"median"}, the \Rfunction{normalizeChannels} function applies plate median normalization by dividing or subtracting (if data has been log transformed) each value by the median of values in that plate. Below, we apply \Rfunction{normalizeChannels} to take the $\log_2$ ratio $R_2/R_1$ using the data values in \Robject{y\$xraw}, and then use plate median normalization to remove plate-to-plate variations: % <>= y <- normalizeChannels(y, adjustPlates="median") @ % The normalized intensities are stored in the slot \Robject{y\$xnorm}. This is an array of the same size as \Robject{y\$xraw}, except in the last dimension (number of channels). As noted above, the \Rfunction{normalizeChannels} is deprecated, and currently we advice the use of the function \Rfunction{summarizeChannels}, which was designed to take the raw measurements in \Robject{xraw} and correct them for plate effects \emph{before} summarizing the two channels. Note that the step of plate correction is optional, because the argument \Robject{adjustPlates} can also be omitted when calling this function. In our case, we first adjust each channel separately by performing plate median scaling (\Robject{adjustPlates = median}). Then, we apply the function given in the \Robject{fun} argument of \Rfunction{summarizeChannels}. In this function, we define a low intensity threshold for the measurements in the constitutive channel $R_1$, which will be set to the $5\%$ quantile of the overall plate median corrected intensities in this channel. For the positions where the plate median corrected intentities in $R_1$ channel are above this threshold, we relate the two channels plate median corrected measurements by taking the $\log_2$ ratio of $R_2/R_1$, while the other positions are set to ``NA''. We call this function below on \Robject{x}, which contains the original raw data. % <>= x = summarizeChannels(x, fun = function(r1, r2, thresh=quantile(r1, probs=0.05, na.rm=TRUE)) ifelse(r1>thresh, log2(r2/r1), as.numeric(NA)), adjustPlates="median") @ % Considering this last preprocessing procedure, we proceed with the analysis using the data stored in \Robject{x}. Below, we call the \Rfunction{summarizeReplicates} function to determine the $z$-score values for each replicate, and then summarize the replicated $z$-score values by taking the average. % <>= x <- summarizeReplicates(x, zscore="-", summary="mean") @ % The resulting single $z$-score value per probe is stored in the slot \Robject{x\$score}. The left side of Figure~\ref{twoChannels-scores} shows the boxplots of the $z$-scores for the different types of probes, while the right side of the figure shows the $z$-scores for the whole screen as an image plot. % <>= par(mfrow=c(1,2)) ylim <- quantile(x$score, c(0.001, 0.999), na.rm=TRUE) boxplot(x$score ~ x$wellAnno, col="lightblue", outline=FALSE, ylim=ylim) imageScreen(x, zrange=c(-2,4)) @ % \myincfig{twoChannels-scores}{0.7\textwidth}{$z$-scores for the screen. Left Panel: Boxplots of $z$-scores for the different types of probes. Right Panel: Screen-wide image plot.} % Now that the data have been preprocessed and scored, we call again \Rfunction{writeReport} and use a web browser to view the resulting report. But first, we have to redefine the positive and negative controls for the normalized data \Robject{x\$xnorm}, because it now corresponds to a single channel. The controls for the normalized data values are the same as those of the raw data channel $R_2$. % <>= ## Define the controls for the normalized intensities (only one channel): # For the single channel, the negative controls are geneA and geneB negControls <- "(?i)^geneA$|^geneB$" posControls <- "(?i)^geneC$|^geneD$" @ <>= out <- writeReport(x, outdir="logRatio", imageScreenArgs=list(zrange=c(-4,4)), plotPlateArgs=list(xrange=c(-3,3)), posControls=posControls, negControls=negControls) @ <>= out <- writeReport(x, force=TRUE, outdir="logRatio", imageScreenArgs=list(zrange=c(-4,4)), plotPlateArgs=list(xrange=c(-3,3)), posControls=posControls, negControls=negControls) @ % The quality reports have been created in the folder \Robject{logRatio} in the working directory. <>= browseURL(out) @ % The quality reports have been created in the folder \Robject{logRatio} in the working directory. % Finally, we will save the data set to a file. % <>= save(x, file=paste(experimentName, ".rda", sep="")) @ % %--------------------------------------------------------- \section{Session info} %--------------------------------------------------------- This document was produced using: <>= toLatex(sessionInfo()) @ %------------------------------------------------------------ %Bibliography %------------------------------------------------------------ \bibliography{cellhts} \bibliographystyle{plain} \end{document}