%\VignetteIndexEntry{Extracting affy and limma objects from affylmGUI files}
%\VignetteDepends{}
%\VignetteKeywords{microarray affy linear model GUI}
%\VignettePackage{affylmGUI}
\documentclass[12pt]{article}
\usepackage{hyperref}

\textwidth=6.2in
\textheight=8.5in
\oddsidemargin=0.2in
\evensidemargin=0.2in
\headheight=0in
\headsep=0in

\begin{document}
\title{Extracting affy and limma objects from affylmGUI files}
\author{James Wettenhall}
\date{April 22, 2004}
\maketitle

\noindent This vignette gives a short example showing how to extract affy and limma
data objects from files saved by affylmGUI.  This could be used for advanced limma analysis
by an expert user after some preliminary analysis with affylmGUI by someone
unfamiliar with the command-line interface.

\noindent We will use a file \texttt{EstrogenContrastsComputed.lma} which unfortunately is
not available for downloading because it is too large.  Efforts will be made in the future
to provide a smaller data set as an example.


\noindent The \texttt{.lma} extension used by affylmGUI is simply a three-letter abbreviation of limma (Linear
Models for Microarrays).  This file is in fact a standard \texttt{.RData} file and can be loaded into
any R session as described below.

\begin{Schunk}
\begin{Sinput}
> load("EstrogenContrastsComputed.lma")
\end{Sinput}
\end{Schunk}

\noindent Firstly, let's load both the affy and limma packages, so that R knows how to
display objects defined by Biobase, affy and limma classes (e.g. \texttt{AffyBatch} and
\texttt{MArrayLM}).

\begin{Schunk}
\begin{Sinput}
> library(affy)
> library(limma)
\end{Sinput}
\end{Schunk}

\noindent Now let's have a look at the R objects available to us:
\begin{Schunk}
\begin{Sinput}
> ls()
\end{Sinput}
\end{Schunk}

\noindent Now let's look at the RNA targets for this dataset:

\begin{Schunk}
\begin{Sinput}
> Targets
      Name     FileName       Target
1  Abs10.1  low10-1.cel  EstAbsent10
2  Abs10.2  low10-2.cel  EstAbsent10
3 Pres10.1 high10-1.cel EstPresent10
4 Pres10.2 high10-2.cel EstPresent10
5  Abs48.1  low48-1.cel  EstAbsent48
6  Abs48.2  low48-2.cel  EstAbsent48
7 Pres48.1 high48-1.cel EstPresent48
8 Pres48.2 high48-2.cel EstPresent48
\end{Sinput}
\end{Schunk}

\noindent Now let's look at the first 10 gene names for this dataset (assuming that
the affylmGUI analysis which saved this file had previously found an appropriate
annotation package for this chip) :

\begin{Schunk}
\begin{Sinput}
> geneNames[1:10]
 [1] " Rab geranylgeranyltransferase, alpha subunit"
 [2] " mitogen-activated protein kinase 3"
 [3] " tyrosine kinase with immunoglobulin and epidermal growth factor homology domains"
 [4] " cytochrome P450, family 2, subfamily C, polypeptide 19"
 [5] " Burkitt lymphoma receptor 1, GTP binding protein (chemokine (C-X-C motif) receptor 5)"
 [6] " Burkitt lymphoma receptor 1, GTP binding protein (chemokine (C-X-C motif) receptor 5)"
 [7] " dual specificity phosphatase 1"
 [8] " matrix metalloproteinase 10 (stromelysin 2)"
 [9] " discoidin domain receptor family, member 1"
[10] " protein kinase, interferon-inducible double stranded RNA dependent"
\end{Sinput}
\end{Schunk}

\noindent The raw probe-level data is stored an an \texttt{AffyBatch} object called \texttt{RawAffyData}.
By typing the name of this object at the prompt, we get a brief summary of this data,
including the name of the chip name, the number of samples (8 in this case) and the
number of genes (12625 in this case).

\begin{Schunk}
\begin{Sinput}
> RawAffyData
AffyBatch object
size of arrays=640x640 features (25604 kb)
cdf=HG_U95Av2 (12625 affyids)
number of samples=8
number of genes=12625
annotation=hgu95av2
\end{Sinput}
\end{Schunk}

\noindent We can now obtain the probe set IDs as follows:

\begin{Schunk}
\begin{Sinput}
> cdfenv <- getCdfInfo(RawAffyData)
> probeSetIDs <- ls(cdfenv)
> as.matrix(probeSetIDs[1:10])
      [,1]
 [1,] "100_g_at"
 [2,] "1000_at"
 [3,] "1001_at"
 [4,] "1002_f_at"
 [5,] "1003_s_at"
 [6,] "1004_at"
 [7,] "1005_at"
 [8,] "1006_at"
 [9,] "1007_s_at"
[10,] "1008_f_at"
\end{Sinput}
\end{Schunk}

\noindent Now let's check whether normalized affy data is available by checking the
Boolean variable \texttt{NormalizedAffyData.Available}.
\begin{Schunk}
\begin{Sinput}
> NormalizedAffyData.Available
TRUE
\end{Sinput}
\end{Schunk}

\noindent Now let's check what method was used to normalize the data.  Currently there
are two choices: RMA (Robust Multiarray Averaging) and PLM (Probe-level Linear Models).

\begin{Schunk}
\begin{Sinput}
> NormMethod
[1] "RMA"
\end{Sinput}
\end{Schunk}

\noindent Since the normalized data is available, set's see a summary of it, by
typing the name of the expression set object storing the normalized data,
\texttt{NormalizedAffyData}.

\begin{Schunk}
\begin{Sinput}
> NormalizedAffyData
Expression Set (exprSet) with
        12625 genes
        8 samples
                 phenoData object with 1 variables and 8 cases
         varLabels
                sample: arbitrary numbering
\end{Sinput}
\end{Schunk}

\noindent Now let's see the design matrix used for linear modelling (which is
automatically determined from the Targets file).

\begin{Schunk}
\begin{Sinput}
> design
             EstAbsent10 EstAbsent48 EstPresent10 EstPresent48
low10-1.cel            1           0            0            0
low10-2.cel            1           0            0            0
high10-1.cel           0           0            1            0
high10-2.cel           0           0            1            0
low48-1.cel            0           1            0            0
low48-2.cel            0           1            0            0
high48-1.cel           0           0            0            1
high48-2.cel           0           0            0            1
\end{Sinput}
\end{Schunk}

\noindent Now let's see how many contrast parameterizations have been defined (i.e. how many contrast matrices).

\begin{Schunk}
\begin{Sinput}
> NumContrastParameterizations
[1] 1
\end{Sinput}
\end{Schunk}

\noindent In this case, there is only one contrast parameterization.  Now let's have a look at the
objects stored within this contrast parameterization.  The '1' in double square-brackets represents
the first (and only) contrast parameterization.

\begin{Schunk}
\begin{Sinput}
> names(ContrastParameterizationList[[1]])
[1] "NumContrastParameterizations"     "fit"
[3] "eb"                               "contrastsMatrixInList"
[5] "ContrastParameterizationNameText"
\end{Sinput}
\end{Schunk}

\noindent There is an object called contrastsMatrixInList, which is a list object containing the contrast matrix,
and some information about how the user created that contrast matrix, in this case by requesting
several pariwise comparisons between the four cases in the 2X2 factorial design
(estrogen abs/pres X time 10/40) rather than manually entering the contrast matrix numerically.

\begin{Schunk}
\begin{Sinput}
> ContrastParameterizationList[[1]]$contrastsMatrixInList
$contrasts
             (EstPresent10)-(EstAbsent10) (EstPresent48)-(EstAbsent48) (EstPresent48)-(EstPresent10)
EstAbsent10                            -1                            0                             0
EstAbsent48                             0                           -1                             0
EstPresent10                            1                            0                            -1
EstPresent48                            0                            1                             1

$contrastsCreatedFromDropDowns
[1] TRUE

$Param1
[1] 3 4 4

$Param2
[1] 1 2 3


\end{Sinput}
\end{Schunk}

\noindent Now let's look at the linear model fit object.  Currently, \texttt{affylmGUI} still
uses the old \texttt{lm.series} from \texttt{limma} rather than \texttt{lmFit}, so the \texttt{fit}
object is just a standard R list object, so typing
\texttt{ContrastParameterizationList[[1]]\$fit} and pressing enter would display a lot more data than
you would generally want to see scrolling past your screen.  Soon this will probably change, so that
\texttt{lmFit} will be used to create an fit object of class \texttt{MArrayLM}.  In either case,
the components of the fit object can be viewed as follows:
\begin{Schunk}
\begin{Sinput}
> names(ContrastParameterizationList[[1]]$fit)
[1] "coefficients"   "stdev.unscaled" "sigma"          "df.residual"    "contrasts"
[6] "Amean"
\end{Sinput}
\end{Schunk}

\noindent Empirical bayes statistics can be obtained from the \texttt{"eb"} component of
\texttt{ParameterizationList[[1]]}.  Note that recent versions of limma encourage users to
calculate empirical bayes statistics using \texttt{eBayes}, rather than \texttt{ebayes},
whereas at the time of writing limmaGUI still uses the old \texttt{ebayes} method, which
produces a standard list object, meaning that typing \texttt{ParameterizationList[[1]]\$eb}
and pressing enter will display all the data in the list, rather than a summary.  The
components of the empirical bayes list object can be viewed as follows:

\begin{Schunk}
\begin{Sinput}
> names(ContrastParameterizationList[[1]]$eb)
 [1] "coefficients"   "stdev.unscaled" "sigma"          "df.residual"    "contrasts"
 [6] "df.prior"       "s2.prior"       "var.prior"      "proportion"     "s2.post"
[11] "t"              "p.value"        "lods"
\end{Sinput}
\end{Schunk}

\noindent For example, the moderated t statistics can be obtained as follows:

\begin{Schunk}
\begin{Sinput}
> ContrastParameterizationList[[1]]$eb$t
\end{Sinput}
\end{Schunk}

\end{document}