% NOTE -- ONLY EDIT THE .Rnw FILE!!!  The .tex file is
% likely to be overwritten.
%
% \VignetteIndexEntry{LMGene user's guide}
% \VignetteDepends{Biobase, tools, multtest, survival, splines}

\documentclass[11pt]{article}
\usepackage{amsmath,fullpage}
\usepackage{hyperref}


\parindent 0in  % Left justify


\begin{document}

\title{\bf LMGene User's Guide}
\author{Geun-Cheol Lee, John Tillinghast and David M. Rocke}

\maketitle

\tableofcontents

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}

This article introduces the use of the \texttt{LMGene} package. 
\texttt{LMGene} has been developed mainly for the analysis of microarray data using 
a linear model and the glog data transformation in the R statistical environment.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Data preparation}
\texttt{LMGene} takes objects of class \texttt{ExpressionSet}, the standard
data structure of the \texttt{Biobase} package. Hence, if the data are already of class
\texttt{ExpressionSet}, the user can skip ahead to later steps, such as diagnostic
plotting or g-log transformation. Otherwise, the user needs to generate a new object
of class \texttt{ExpressionSet}. For more detail, please see the vignette
`Textual Description of Biobase' in the \texttt{Biobase} package.

{\bf Note: ExpressionSet.} In this package, an object of class \texttt{ExpressionSet}
must produce proper data using the commands \texttt{exprs(object)} and \texttt{phenoData(object)}.

{\tt Example.} \texttt{LMGene} includes a sample array data set of class
{\tt ExpressionSet}. Let's take a look at this sample data.

\begin{enumerate}

\item First, load the necessary packages in your R session.
<<eval=TRUE, echo=TRUE>>=
library(LMGene)
library(Biobase)
library(tools)
@

\item Load the sample {\tt ExpressionSet} class data in the package {\tt LMGene}.
<<eval=TRUE, echo=TRUE>>=
data(sample.eS)
@

\item View the data structure of the sample data and 
the details of {\tt exprs} and {\tt phenoData} slots in the data.
<<eval=TRUE, echo=TRUE>>=
slotNames(sample.eS)
dim(exprs(sample.eS))
exprs(sample.eS)[1:3,]
phenoData(sample.eS)
slotNames(phenoData(sample.eS))
@

\end{enumerate}
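
{\tt Checking an ExpressionSet.} The requirement stated in the note above can be checked before any further analysis. The following is a minimal sketch and not part of \texttt{LMGene}: the helper name {\tt check.eS} is hypothetical, and the example uses the {\tt sample.ExpressionSet} data set shipped with \texttt{Biobase} rather than {\tt sample.eS}. It only verifies that {\tt exprs} returns a numeric matrix and that {\tt pData} has one row per array, in the same order as the matrix columns.

<<eval=TRUE, echo=TRUE>>=
library(Biobase)

# Hypothetical helper: stops with an error if the object cannot supply
# a numeric expression matrix and matching per-array phenotype data.
check.eS <- function(object) {
  stopifnot(is(object, "ExpressionSet"))
  x  <- exprs(object)   # genes in rows, arrays in columns
  pd <- pData(object)   # one row of covariates per array
  stopifnot(is.matrix(x), is.numeric(x),
            nrow(pd) == ncol(x),
            identical(rownames(pd), colnames(x)))
  invisible(TRUE)
}

data(sample.ExpressionSet)   # example data from Biobase
check.eS(sample.ExpressionSet)
@

If any condition fails, {\tt stopifnot} reports which one, which is usually enough to see whether the expression matrix or the phenotype data needs repair.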

{\tt Data generation.} If you don't have data of class {\tt ExpressionSet}, you need to create it.
\texttt{LMGene} provides a function that generates an object of class {\tt ExpressionSet}
from array data of class {\tt matrix} and experimental data of class {\tt list}.

\begin{enumerate}

\item The package has sample array and experimental data, {\tt sample.mat} and {\tt vlist}.
<<eval=TRUE, echo=TRUE>>=
data(sample.mat)
dim(sample.mat)
data(vlist)  
vlist
@

\item Generate {\tt ExpressionSet} class data using {\tt neweS} function.
<<eval=TRUE, echo=TRUE>>=
test.eS<-neweS(sample.mat, vlist)
class(test.eS)
identical(sample.eS, test.eS)
@


\end{enumerate}
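
{\tt Alternative construction.} For comparison with the steps above, an {\tt ExpressionSet} can also be assembled directly with the \texttt{Biobase} constructors {\tt ExpressionSet} and {\tt AnnotatedDataFrame}. The sketch below is only an illustration and is not how {\tt neweS} is implemented; the object names {\tt toy.mat}, {\tt toy.pd}, and {\tt toy.eS} and all values in them are made up. The one requirement to keep in mind is that the row names of the phenotype data frame must match the column names of the expression matrix.

<<eval=TRUE, echo=TRUE>>=
library(Biobase)

# Made-up expression matrix: 10 genes (rows) by 4 arrays (columns)
toy.mat <- matrix(seq_len(40) * 10, nrow = 10, ncol = 4,
                  dimnames = list(paste("gene", 1:10, sep = ""),
                                  paste("array", 1:4, sep = "")))

# Made-up experimental variables, one row per array
toy.pd <- data.frame(patient = factor(c(1, 1, 2, 2)),
                     dose = c(0, 1, 0, 1),
                     row.names = colnames(toy.mat))

toy.eS <- ExpressionSet(assayData = toy.mat,
                        phenoData = AnnotatedDataFrame(toy.pd))
dim(exprs(toy.eS))
varLabels(toy.eS)
@
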


{\tt c.f.} If you have a different type of array data, such as {\tt RGList} or {\tt marrayRaw}, 
you can convert it to class {\tt ExpressionSet} with the {\tt as} method after installing the {\tt convert} package.


\section{G-log transformation}
\begin{enumerate}
\item {\tt Estimating parameters for g-log transformation.} 
The linear model is not applied to the raw data, but to transformed and normalized data. Many analyses use a log transform; \texttt{LMGene} uses a log-like transform involving two parameters. We estimate the parameters $\lambda$ and $\alpha$ of the generalized log transform
\[
\log\left(y - \alpha + \sqrt{(y - \alpha)^2 + \lambda}\right) = \sinh^{-1}\left(\frac{y-\alpha}{\sqrt{\lambda}}\right) + \frac{1}{2}\log(\lambda)
\]
using the function \texttt{tranest} as follows:

<<eval=TRUE, echo=TRUE>>=
tranpar <- tranest(sample.eS)
tranpar
@

The optional parameter {\tt ngenes} controls how many genes are used in
the estimation. The default is all of them (up to 100,000), but this option allows the use of fewer. A typical call using this parameter
would be

<<eval=TRUE, echo=TRUE>>=
tranpar <- tranest(sample.eS, 100)
tranpar
@

In this case, 100 genes are chosen at random and used to estimate
the transformation parameters. The routine returns a list
containing values for {\tt lambda} and {\tt alpha}.

   
\item {\tt G-log transformation.} Using the two estimated parameters, the g-log transformed
expression set can be calculated as follows.
<<eval=TRUE, echo=TRUE>>=

trsample.eS <- transeS(sample.eS, tranpar$lambda, tranpar$alpha)
exprs(sample.eS)[1:3,1:8]
exprs(trsample.eS)[1:3,1:8]
@


\item {\tt Tranest options: multiple alpha, lowessnorm, model}

Rather than using a single alpha for all samples, we can estimate a separate alpha for each sample. This allows for differences in chips, in sample concentration, or exposure conditions. 

<<eval=TRUE, echo=TRUE>>=
tranparmult <- tranest(sample.eS, mult=TRUE) 
tranparmult
@


For vector alphas, transeS uses exactly the same syntax:
<<eval=TRUE, echo=TRUE>>=
trsample.eS <- transeS (sample.eS, tranparmult$lambda, tranparmult$alpha)
exprs(trsample.eS)[1:3,1:8]
@

It's also possible to estimate the parameters using the more accurate lowess normalization (as opposed to uniform normalization):

<<eval=TRUE, echo=TRUE>>=
tranparmult <- tranest(sample.eS, ngenes=100, mult=TRUE, lowessnorm=TRUE) 
tranparmult
@


It is even possible now to estimate parameters using a specified model. For example, if we think that the interaction of variables in vlist is important, we can add interaction to the model:

<<eval=TRUE, echo=TRUE>>=
tranpar <- tranest(sample.eS, model='patient + dose + patient:dose')
tranpar
@

The model is always specified in the same way as the right-hand side of an {\tt lm} formula. In the example above, we set the parameters to minimize the mean squared error for a regression of transformed gene expression against patient, log dose, and their interaction.

Be very careful when using interactions between factor variables. If you do not have enough replicates, you can easily overfit the data and be left with no residual degrees of freedom from which to estimate the error.

Naturally, it's possible to use mult, lowessnorm, and model all together.


\end{enumerate}   
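
{\tt Worked check of the g-log identity.} The generalized log transform of this section can be rewritten in terms of the inverse hyperbolic sine. Writing $z = y - \alpha$ and using $\sinh^{-1}(x) = \log\left(x + \sqrt{x^2 + 1}\right)$, for $\lambda > 0$ we have
\[
\log\left(z + \sqrt{z^2 + \lambda}\right)
= \log\left(\sqrt{\lambda}\left(\frac{z}{\sqrt{\lambda}} + \sqrt{\frac{z^2}{\lambda} + 1}\right)\right)
= \sinh^{-1}\left(\frac{z}{\sqrt{\lambda}}\right) + \frac{1}{2}\log(\lambda).
\]
The short sketch below checks this numerically in base R. The function names {\tt glog} and {\tt glog.asinh} and the parameter values are chosen for illustration only; they are not functions or estimates from \texttt{LMGene}.

<<eval=TRUE, echo=TRUE>>=
# Generalized log written directly from its definition
# (illustrative helper, not an LMGene function)
glog <- function(y, lambda, alpha = 0) {
  z <- y - alpha
  log(z + sqrt(z^2 + lambda))
}

# The same quantity expressed through the inverse hyperbolic sine
glog.asinh <- function(y, lambda, alpha = 0) {
  asinh((y - alpha) / sqrt(lambda)) + 0.5 * log(lambda)
}

y <- c(-50, 0, 10, 100, 1000, 1e5)   # made-up intensities
lambda <- 700                        # made-up parameter values
alpha <- 30
all.equal(glog(y, lambda, alpha), glog.asinh(y, lambda, alpha))

# For large intensities the transform behaves like log(2 * (y - alpha))
glog(1e5, lambda, alpha) - log(2 * (1e5 - alpha))
@

Unlike the ordinary logarithm, the transform is defined for values at or below $\alpha$, which is why it can be applied to background-corrected intensities that are zero or negative.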
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Finding differentially expressed genes}

\begin{enumerate}
\item {\tt Transformation and Normalization.} Before finding differentially expressed genes,
the array data needs to be transformed and normalized. 

<<eval=TRUE, echo=TRUE>>=
trsample.eS <- transeS (sample.eS, tranparmult$lambda, tranparmult$alpha)
ntrsample.eS <- lnormeS (trsample.eS) 
@
   
\item {\tt Finding differentially expressed genes} 
The \texttt{LMGene} routine computes significant probes using the method of
Rocke (2004). A typical call would be

<<eval=TRUE, echo=TRUE>>=
sigprobes <- LMGene(ntrsample.eS)
@

There is an optional argument, level, which is the test level, .05 by
default. A call using this optional parameter would look like

<<eval=TRUE, echo=TRUE>>=
sigprobes <- LMGene(ntrsample.eS,level=.01)
@

The result is a list whose components have the names of the effects in the
model. The values are the significant genes for the test of that effect or else the
message ``No significant genes''.

As with tranest, it's possible to specify a more complex model to LMGene:

<<eval=TRUE, echo=TRUE>>=
sigprobes <- LMGene(ntrsample.eS, model='patient+dose+patient:dose')
sigprobes
@


\end{enumerate}
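
{\tt Illustration of the general idea.} The steps above fit a linear model to each gene and then account for the large number of tests. The sketch below shows that general pattern in base R on simulated data. It is not the \texttt{LMGene} algorithm: it uses ordinary per-gene F tests from {\tt anova} and the Benjamini-Hochberg adjustment from {\tt p.adjust} in place of the package's own procedure, and the data, effect size, and object names are all invented.

<<eval=TRUE, echo=TRUE>>=
set.seed(42)
n.genes <- 50
patient <- factor(rep(1:4, each = 4))      # 4 patients, 4 arrays each
dose <- rep(c(0, 1, 2, 3), times = 4)      # each patient sees every dose

# Simulated, already-transformed expression values: pure noise ...
toy <- matrix(rnorm(n.genes * 16), nrow = n.genes)
# ... except gene 1, which is given a strong dose effect by construction
toy[1, ] <- toy[1, ] + 3 * dose

# One linear model per gene; keep the p-value of each effect
pvals <- t(apply(toy, 1, function(y) {
  tab <- anova(lm(y ~ patient + dose))
  tab[c("patient", "dose"), "Pr(>F)"]
}))
colnames(pvals) <- c("patient", "dose")

# Adjust each effect's p-values across genes (Benjamini-Hochberg)
adj <- apply(pvals, 2, p.adjust, method = "BH")
which(adj[, "dose"] < 0.05)
@

Gene 1 should be flagged for the dose effect because it was simulated that way; any other gene that appears is a false positive at the chosen level.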

The routine LMGene requires the multtest package. 

\bibliographystyle{plain}
\begin{thebibliography}{99}

\bibitem{1}  
Durbin, B.P., Hardin, J.S., Hawkins, D.M., and Rocke, D.M. (2002) ``A
variance-stabilizing transformation for gene-expression microarray
data,'' \emph{Bioinformatics}, {\bf18}, S105--S110.

\bibitem{2} 
Durbin, B. and Rocke, D. M. (2003a) ``Estimation of transformation
parameters for microarray data,'' \emph{Bioinformatics}, {\bf19}, 1360--1367.

\bibitem{3} 
Durbin, B. and Rocke, D. M. (2003b) ``Exact and approximate
variance-stabilizing transformations for two-color microarrays,''
submitted for publication.

\bibitem{4} 
Geller, S.C., Gregg, J.P., Hagerman, P.J., and Rocke, D.M. (2003)
``Transformation and normalization of oligonucleotide microarray
data,'' \emph{Bioinformatics}, {\bf19}, 1817--1823.

\bibitem{5} 
Rocke, D.M. (2004) ``Design and analysis of experiments with high throughput biological assay data,'' 
\emph{Seminars in Cell and Developmental Biology},
{\bf15}, 708--713.

\bibitem{6} 
Rocke, D., and Durbin, B. (2001) ``A model for measurement error for
gene expression arrays,'' \emph{Journal of Computational Biology},
{\bf8}, 557--569.

\bibitem{7} 
Rocke, D. and Durbin, B. (2003) ``Approximate
variance-stabilizing transformations for gene-expression microarray
data,'' \emph{Bioinformatics}, {\bf19}, 966--972.

\end{thebibliography}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\end{document}