\documentclass{article} % \VignetteIndexEntry{lapmix example} % \VignetteDepends{lapmix} \begin{document} \title{Lapmix: A package for fitting a Laplace mixture model to microarray data} \author{Yann Ruffieux} \maketitle This package fits a Laplace mixture model (Bhowmick et al. 2006) to differential gene expression data. The model is based on the one proposed by L\"{o}nnstedt and Speed (2002), which assumes normality of the mean expression in the genes. Here the Gaussian distribution is replaced by a heavier-tailed Laplace distribution. Also implemented is a maximum likelihood estimation method for the hyperparameters of the model. We have five such parameters: $\gamma$ and $\alpha$ are respectively the shape and scale parameters of the inverse gamma prior distribution for the between-gene error variance, $V$ represents a signal to noise ratio, $\omega$ is the prior probability of differential expression, and $\beta$ is an asymmetry parameter in the Laplace distribution. If we fix $\beta=0$ we assume a symmetric Laplace distribution in the model. See Bhowmick et al. (2006) for more details on the parameter estimation. The data are assumed to take the form of normalized base 2 logarithms of the expression ratios, and are stored in a $G \times n$ matrix, $G$ being the number of genes and $n$ the number of replicates per gene. If there are different numbers of replicates between genes, one can insert NaN's where appropriate to indicate `missing' replicates. The data may alternatively be stored in a list of arrays, or in an object of class \texttt{eSet} or \texttt{ExpressionSet}. Below we go through a simple example, using simulated data. First we load the library, of course: <>= library(lapmix) @ Next we simulate some symmetric Laplace microarray data: <>= set.seed(1011) G <- 3000 Y <- NULL sigma_sq <- 1/rgamma(G, shape=2.8, scale=0.04) mu <- rexp(G, rate=1/(sigma_sq*1.2))-rexp(G, rate=1/(sigma_sq*1.2)) is.diff <- sample(c(0,1), replace=TRUE, prob=c(0.9,0.1), size=G) mu <- mu*is.diff for(g in 1:G) Y <- rbind(Y, rnorm(4,mu[g], sd=sqrt(sigma_sq[g]))) @ We have generated 3000 genes with 4 replicates each, giving us the matrix \texttt{Y}. Now we fit this data to the Laplace model: <>= res <- lapmix.Fit(Y) @ The resulting list \texttt{res} contains several things of interest. For instance the empirical Bayes estimates of the hyperparameters can be accessed using: <>= res$estimates @ By default the parameters are estimated using a two-step likelihood-based approach. They can also be obtained from maximization of the full marginal likelihood, by adding the argument \texttt{two.step=FALSE} in the call to \texttt{lapmix.Fit}. We will be mostly interested in the posterior odds of differential expression (the L- or AL-stat) for each gene. The \texttt{laptopTable} ranks the top $m$ genes based on this criterion. This is very similar to the \texttt{topTable} from the \texttt{limma} package. <>= m <- 12 laptopTable(res, m) @ We can also produce a `volcano plot' based on the L-stat. \begin{figure}[h!] \begin{center} <>= lap.volcanoplot(res) @ \end{center} \end{figure} Above we have assumed a symmetric Laplace distribution for mean of differential expression. We can also fit an asymmetric Laplace model: <>= res2 <- lapmix.Fit(Y, asym=TRUE) res2$estimates @ and the result can be treated as in the symmetric case. By default the \texttt{lapmix.Fit} routine uses a `fast' hyperparameter estimation method: a very small proportion of the integrals in the marginal likelihoods cannot be computed via the $t$-distribution and are thus neglected (see Bhomwick et al. 2006, p. 632). A `slow' method can be invoked by inserting the argument \texttt{fast=FALSE} in the call to \texttt{lapmix.Fit}. This will compute the problem integrals numerically using the \texttt{integrate} routine. This is not advised, however: experiments suggest there is very little difference between the fast and slow methods, and the latter may cause convergence problems when used in conjunction with the one-step estimation approach. \section*{References} \begin{list}{}{\setlength{\itemindent}{-\leftmargin}} \item Bhowmick, D., Davison, A. C., Goldstein, D. R., and Ruffieux, Y. (2006) A Laplace mixture model for identification of differential expression in microarray experiments. \emph{Biostatistics} \textbf{7}, 4: 630-641. \item L\"{o}nnstedt, I. and Speed, T. P. (2002). Replicated microarray data. \emph{Statistica Sinica} \textbf{12}: 31-46. \end{list} \end{document}