\name{geneData}
\Rdversion{1.1}
\alias{geneData}

\title{
  View the expression data for selected genes
}
\description{
This function outputs and visualizes the expression data for selected
genes. Potential output files include a tab-delimited text file, a
heatmap in PDF format, and a scatter plot in PDF format.
}
\usage{
geneData(genes, exprs, ref = NULL, samp = NULL, outname = "array",
txt = TRUE, heatmap = FALSE, scatterplot = FALSE, samp.mean = FALSE,
pdf.size = c(7, 7), cols = NULL, scale = "row", limit = NULL,
label.groups = TRUE, ...)
}

\arguments{
  \item{genes}{
character, either a vector of gene IDs of interest or a 2-column
matrix, where the first column specifies the gene IDs used in \code{exprs}
and the second column gives another type of ID to use in the output
data files.
}
  \item{exprs}{
    an expression matrix or matrix-like data structure, with genes as
    rows and samples as columns.
  }
  \item{ref}{
    a numeric vector of column numbers for the reference condition or
    phenotype (i.e. the control group) in the exprs data matrix. Defaults
    to ref = NULL, i.e. all columns are considered as target experiments.
  }
  \item{samp}{
    a numeric vector of column numbers for the target condition or
    phenotype (i.e. the experiment group) in the exprs data
    matrix. Defaults to samp = NULL, i.e. all columns other than ref are
    considered as target experiments.
  }
  \item{outname}{
    a character string to be used as the prefix of the output data
    files. Defaults to "array".
  }
  \item{txt}{
    boolean, whether to output the selected gene data as a tab-delimited
    text file. Defaults to TRUE.
  }
  \item{heatmap}{
    boolean, whether to plot a heatmap of the selected gene data as a PDF
    file. Defaults to FALSE.
  }
  \item{scatterplot}{
    boolean, whether to make a scatter plot of the selected gene data as a
    PDF file. Defaults to FALSE.
  }
  \item{samp.mean}{
    boolean, whether to take the mean of gene data over the ref and samp
    groups when making the scatter plot. Defaults to FALSE, i.e. make
    scatter plots for the first two ref-samp pairs and label them
    differently on the same graph panel.
  }
  \item{pdf.size}{
    a numeric vector specifying the width and height of the PDF
    graphics region in inches. Defaults to c(7, 7).
  }
  \item{cols}{
    a character vector specifying the colors used for the heatmap image
    blocks. Defaults to NULL, i.e. a green-red spectrum is generated
    automatically based on the gene data.
  }
  \item{scale}{
    character, indicating whether the values should be centered and
    scaled in the row direction, in the column direction, or not at
    all for the heatmap. The default is "row"; the other options are
    "column" and "none".
  }
  \item{limit}{
    numeric value specifying the maximal absolute value of gene
    data to visualize in the heatmap. Gene data beyond this value will be
    reset to equal it. Defaults to NULL, i.e. all gene data values are
    plotted. This argument allows optimal differentiation between most gene
    data values when extreme positive/negative values exist and would
    otherwise squeeze the normal-value region. limit = 3 is recommended
    when the gene data is scaled by row.
  }
  \item{label.groups}{
    boolean, whether to label the two sample groups, i.e. ref and samp,
    differently using side color bars along the heatmap area. Defaults to
    TRUE.
  }
  \item{\dots}{
    other arguments to be passed to the internal \code{heatmap2} function.
  }
}
\details{
  This function integrates the three most common presentation methods for
  gene expression data: tab-delimited text file, heatmap and scatter
  plot. A heatmap is ideal for visualizing relative changes with gene-wise
  standardized (or row-scaled) data. The heatmap is generated by calling
  an improved version of the heatmap.2 function from the gplots package.
  A scatter plot is ideal for visualizing modest or small but consistent
  changes over a gene set between the two states under comparison.

  Although \code{geneData} is designed to be a standalone function,
  it is frequently used in tandem with the \code{essGene} function to
  present the changes of the essential genes in significant gene sets.
}
\value{
  The function invisibly returns 1 when executed successfully.
}
\references{
  Luo, W., Friedman, M., Shedden, K., Hankenson, K. and Woolf, P. GAGE:
  Generally Applicable Gene Set Enrichment for Pathways Analysis. BMC
  Bioinformatics 2009, 10:161.
}
\author{
  Weijun Luo <luo_weijun@yahoo.com>
}

\seealso{
  \code{\link{essGene}} extract the essential member genes in a gene
  set;
  \code{\link{gage}} the main function for GAGE analysis;
}

\examples{
data(gse16873)
cn=colnames(gse16873)
hn=grep('HN',cn, ignore.case =TRUE)
dcis=grep('DCIS',cn, ignore.case =TRUE)

#kegg test for 1-directional changes
data(kegg.gs)
gse16873.kegg.p <- gage(gse16873, gsets = kegg.gs, 
    ref = hn, samp = dcis)
rownames(gse16873.kegg.p$greater)[1:3]
gs=unique(unlist(kegg.gs[rownames(gse16873.kegg.p$greater)[1:3]]))
essData=essGene(gs, gse16873, ref =hn, samp =dcis)
head(essData)
ref1=1:6
samp1=7:12
#generate a text file for the data table, and PDF files for the heatmap
#and scatter plot
for (gs in rownames(gse16873.kegg.p$greater)[1:3]) {
    outname = gsub(" |:|/", "_", substr(gs, 10, 100))
    geneData(genes = kegg.gs[[gs]], exprs = essData, ref = ref1,
        samp = samp1, outname = outname, txt = TRUE, heatmap = TRUE,
        Colv = FALSE, Rowv = FALSE, dendrogram = "none", limit = 3, scatterplot = TRUE)
}
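
#Illustrative sketch only, written in base R. This is NOT the internal
#code of geneData; it just shows what scale = "row" followed by
#limit = 3 means, using a small made-up matrix (all 'toy' names and
#values below are hypothetical).
toy <- rbind(geneA = c(rep(5, 11), 50), geneB = seq(1, 12))
#center and scale each gene (row) to mean 0 and standard deviation 1
toy.scaled <- t(scale(t(toy)))
#reset values beyond +/- limit to the limit itself, so one extreme
#value no longer dominates the color range of a heatmap
toy.limit <- 3
toy.clipped <- pmax(pmin(toy.scaled, toy.limit), -toy.limit)
range(toy.scaled)
range(toy.clipped)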
}

\keyword{multivariate}
\keyword{manip}