\name{globaltest}

\alias{globaltest}

\title{Global Test}

\description{Tests one or more groups of genes in microarray data
for significant association with a given clinical variable.}

\usage{globaltest(X, Y, genesets, model,
    levels, d, event = 1, adjust,
    method = c("auto", "asymptotic", "permutations", "gamma"),
    nperm = 10^4, scaleX = TRUE, accuracy = 50, ...) }

\arguments{
    \item{X}{Either a matrix of gene expression data, where columns correspond to
samples and rows to genes, or a Bioconductor \code{\link[Biobase:ExpressionSet-class]{ExpressionSet}}.
The data should be properly normalized beforehand (and log- or otherwise transformed),
but missing values are allowed (coded as \code{NA}). Gene and sample names can be included
as the row and column names of \code{X}.}
    \item{Y}{A vector with the clinical outcome of interest, having one value
for each sample. If \code{X} is an
\code{\link[Biobase:ExpressionSet-class]{ExpressionSet}} it can also be the
name of a covariate in the
\code{\link[Biobase:phenoData-class]{phenoData}} from the
\code{\link[Biobase:ExpressionSet-class]{ExpressionSet}},
or a \code{\link[stats:formula]{formula}} object using
these names. If the clinical outcome is survival, \code{Y} should
contain the survival times.}
    \item{genesets}{Either a vector or a list of vectors. Indicates
the group(s) of genes to be tested. Each vector in
\code{genesets} can be given in one of three formats: a
vector with 1 (\code{TRUE}) or 0 (\code{FALSE}) for each gene
in \code{X}, with 1 indicating that the gene belongs to the
group; a vector containing the row numbers (in
\code{X}) of the genes belonging to the group; or a
subset of the rownames or
\code{\link[Biobase:ExpressionSet-class]{featureNames}} for \code{X}.}
    \item{model}{Globaltest will try to determine the correct model from
the input of \code{Y} and \code{d}. To override the automatic choice, use
\code{model = "logistic"} for a two-valued outcome \code{Y},
\code{model = "linear"} for a continuous outcome, and
\code{model = "survival"} for a survival outcome.}
    \item{levels}{If \code{Y} is a factor (or a category in the phenoData slot of \code{X})
with more than 2 levels, \code{levels} is a vector of levels of \code{Y} to test. If
\code{levels} has length 2, these 2 groups are tested against each other.
If \code{levels} has length 1, that level is tested against the others.}
    \item{d}{A vector or the name of a covariate in the
\code{\link[Biobase:phenoData-class]{phenoData}} from the
\code{\link[Biobase:ExpressionSet-class]{ExpressionSet}}
\code{X}, to indicate which samples experienced an event.
Providing a value for \code{d} automatically sets \code{model = "survival"}.}
    \item{event}{The value or values of \code{d} that indicate that
there was an event.}
    \item{adjust}{Confounders or risk factors for which the test must
be adjusted. Must be either a data frame or (if \code{X} is an
\code{\link[Biobase:ExpressionSet-class]{ExpressionSet}})
the names of covariates in the
\code{\link[Biobase:phenoData-class]{phenoData}} from \code{X}
or a \code{\link[stats:formula]{formula}} object using these
names. Default: no adjustment.}
    \item{method}{The method for calculating the p-value. Use \code{method =
"asymptotic"} for the full asymptotic distribution of the test
statistic; \code{method = "gamma"} for the gamma (= scaled
chi-squared) approximation to that distribution; and \code{method =
"permutations"} for a permutation p-value. The default, \code{method
= "auto"}, chooses the permutations method if the number of possible
permutations does not exceed 10,000 and the asymptotic method otherwise.
Note that \code{method = "gamma"} was the default option prior to
version 4.0.0.}
    \item{nperm}{The (maximum) number of permutations to be used if
\code{method = "permutations"} or
\code{method = "auto"}.}
    \item{scaleX}{If \code{TRUE}, rescales the expression matrix so that all
test statistics take convenient values. The expression matrix \code{X} is multiplied by a constant
in such a way that the expected value EQ of the test statistic for the global test
becomes exactly 10. This rescaling has no effect on the p-values.}
    \item{accuracy}{Numerical tuning parameter used only with
the asymptotic method and a non-survival response. Determines how
much small eigenvalues of the \code{R} matrix are smoothed away to
increase computation speed. Choose smaller values for quicker
computations but more conservative p-values; choose larger values for
slower calculations but more accuracy.}
    \item{...}{Captures deprecated input for compatibility with older versions of
globaltest.} }

\details{The Global Test tests whether a group of genes (of any
size from one single gene to all genes on the array) is
significantly associated with a clinical variable. The group could
be, for example, a known pathway, an area on the genome or the set
of all genes. The test investigates whether samples with similar
clinical outcomes tend to have similar gene expression patterns.
For a significant result it is not necessary that the genes in
the group have similar expression patterns, only that many of
them are correlated with the outcome.}

\note{
The globaltest options \code{sampling} and \code{permutation}
have been replaced by separate functions as of version 3.0. See
\code{\link{sampling}} and \code{\link{permutations}}.}

\value{The function returns an object of class
\code{\link[=gt.result-class]{gt.result}}.}

\references{For references, type: \code{citation("globaltest")}. See
also the vignette GlobalTest.pdf included with this package.}

\author{Jelle Goeman: \email{j.j.goeman@lumc.nl}; Jan Oosting}

\seealso{The vignette contains many more examples. \code{\link{geneplot}},
\code{\link{sampleplot}}, \code{\link{sampling}},
\code{\link{gt.multtest}}, \code{\link{permutations}},
\code{\link{checkerboard}}, \code{\link{regressionplot}}.}

\examples{
    # Breast cancer data (ExpressionSet) from the Netherlands Cancer
    # Institute with annotation:
    data(vandeVijver)
    data(annotation.vandeVijver)

    # Many possible calls. See the vignette for more examples and explanation.
    globaltest(vandeVijver, "StGallen")
    globaltest(vandeVijver, "StGallen", annotation.vandeVijver)
    globaltest(vandeVijver, "Surv(TIMEsurvival, EVENTdeath)", annotation.vandeVijver)
    globaltest(vandeVijver, StGallen ~ Posnodes + StGallen, annotation.vandeVijver)
    globaltest(vandeVijver, "StGallen", method = "p")

    # Store the test result
    # See help(gt.result) for more options
    gt <- globaltest(vandeVijver, "StGallen", annotation.vandeVijver)
    gt[1:2]
    sort(gt)
    p.value(gt)

    # Also with simple vector/matrix input
    X <- matrix(rnorm(3000), 100, 30)  # random expression data
    Y <- 1:30                          # a response variable
    pathway <- 1:40                    # a pathway

    globaltest(X, Y)
    globaltest(X, Y, pathway)
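
    # The three genesets formats described under Arguments, as a sketch
    # with made-up gene names (gene1, ..., gene100 are illustrative only).
    # Each call below is meant to select the same first 40 genes.
    rownames(X) <- paste("gene", 1:100, sep = "")
    indicator <- rep(c(1, 0), c(40, 60))   # 1 = gene belongs to the group
    globaltest(X, Y, indicator)            # 0/1 indicator per gene
    globaltest(X, Y, 1:40)                 # gene indices
    globaltest(X, Y, rownames(X)[1:40])    # gene names

    # A list of vectors tests several groups in one call
    globaltest(X, Y, list(1:40, 41:100))

    # Choosing the p-value method explicitly, here with fewer permutations
    globaltest(X, Y, pathway, method = "permutations", nperm = 1000)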
}

\keyword{htest}