\name{OrderedList}
\alias{OrderedList}
\title{ Detecting Similarities of Two Microarray Studies }
\description{
  Function \code{OrderedList} aims for the \emph{comparison of
  comparisons}: given two expression studies with one ranked (ordered)
  list of genes each, we might observe considerable overlap among the
  top-scoring genes. \code{OrderedList} quantifies this overlap by
  computing a weighted similarity score, where the top-ranking genes
  contribute more to the score than the genes further down the list. The
  final list of overlapping genes consists of those probes that
  contribute a certain percentage to the overall similarity score.
}
\usage{
OrderedList(eset, B = 1000, test = "z", beta = 1, percent = 0.95, 
            verbose = TRUE, alpha=NULL, min.weight=1e-5, empirical=FALSE)
}
\arguments{
  \item{eset}{ Expression set containing the two studies of interest. Use \code{\link{prepareData}} to generate \code{eset}. }
  \item{B}{ Number of internal sub-samples needed to optimize alpha. }
  \item{test}{ String, one of 'fc' (log ratio = log fold change), 't' (t-test with equal variances) or 'z' (t-test with regularized variances). The z-statistic is implemented as described in Efron et al. (2001). }
  \item{beta}{ Either 1 or 0.5. In a comparison where the class labels of the studies match, we set \code{beta=1}. For example, in each single study the first class relates to bad prognosis while the second class relates to good prognosis. If a matching is not possible, we set \code{beta=0.5}. For example, we compare a study with good/bad prognosis classes to a study, in which the classes are two types of cancer tissues. }
  \item{percent}{ The final list of overlapping genes consists of those probes that contribute a certain percentage to the overall similarity score. Default is \code{percent=0.95}. To get the full list of genes, set \code{percent=1}. }
  \item{verbose}{ Logical value for message printing. }
  \item{alpha}{A vector of weighting parameters. If set to NULL (the default),
    parameters are computed such that top 100 to the top 2500 ranks receive
    weights above \code{min.weight}.}
  \item{min.weight}{The minimal weight to be taken into account while computing scores.}
  \item{empirical}{If \code{TRUE}, empirical confidence intervals will be computed by randomly permuting the class labels of each study. Otherwise, a hypergeometric distribution is used. Confidence intervals appear when using \code{\link{plot.OrderedList}}. }
}
\details{
In short, the similarity measure is computed as follows: Based on two-sample test statistics like the t-test, genes within each study are ranked from most up-regulated down to most down-regulated. Thus we have one ordered list per study. Now for each rank going both from top (up-regulated end) and from bottom (down-regulated end) we count the number of overlapping genes. The total overlap \eqn{A_n} for rank \eqn{n} is defined as:
\deqn{A_n = O_n (G_1,G_2) + O_n(f(G_1),f(G_2))}
where \eqn{G_1} and \eqn{G_2} are the two ordered list, \eqn{f(G_1)} and \eqn{f(G_2)} are the two flipped lists with the down-regulated genes on top and \eqn{O_n} is the size of the overlap of its two arguments. A preliminary version of the weighted overlap over all ranks \eqn{n} is then given as:
\deqn{T_\alpha(G_1,G_2) = \sum_n \exp{-\alpha n} A_n.}
The final similarity score includes the case that we cannot match the classes in each study exactly and thus do not know whether up-regulation in one list corresponds to up- or down-regulation in the other list. Here parameter \eqn{\beta} comes into play:
\deqn{
S_\alpha(G_1,G_2) = \max{ \beta T_\alpha(G_1,G_2), (1-\beta) T_\alpha (G_1,f(G_2)) }.
}
Parameter \eqn{\beta} is set by the user but parameter \eqn{\alpha} has to be tuned in a simulation using sub-samples and permutations of the original class labels.
}
\value{
Returns an object of class \code{OrderedList}, which consists of a list with entries:
  \item{n}{Total number of genes.}
  \item{label }{The concatenated study labels as provided by \code{eset}.}
  \item{p }{The p-value specifying the significance of the similarity.}
  \item{intersect }{Vector with sorted probe IDs of the overlapping genes, which contribute \code{percent} to the overall similarity score.}
  \item{alpha }{The optimal regularization parameter alpha.}
  \item{direction }{Numerical value. Returns '1' if the similarity score is higher for the originally ordered lists and '-1' if the score is higher for the comparison of one original to one flipped list. Of special interest if \code{beta=0.5}.}
  \item{scores }{Matrix of observed test scores with genes in rows and studies in columns.}
  \item{sim.scores }{List with four elements with output of the resampling with optimal \code{alpha}. \code{SIM.observed}: The observed similarity sore. \code{SIM.alternative}: Vector of observed similarity scores simulated using sub-sampling within the distinct classes of each study. \code{SIM.random}: Vector of random similarity scores simulated by randomly permuting the class labels of each study. \code{subSample}: \code{TRUE} to indicate that sub-sampling was used.}
  \item{pauc }{Vector with pAUC-scores for each candidate of the regularization parameter \eqn{\alpha}. The maximal pAUC-score defines the optimal \eqn{\alpha}. See also \code{\link{plot.OrderedList}}.}
  \item{call }{List with some of the input parameters.}
  \item{empirical }{List with confidence interval values. Is \code{NULL} if \code{empirical=FALSE}.}
}
\references{
Yang X, Bentink S, Scheid S, and Spang R (2006): Similarities of ordered gene lists, to appear in \emph{Journal of Bioinformatics and Computational Biology}.

Efron B, Tibshirani R, Storey JD, and Tusher V (2001): Empirical Bayes analysis of a microarray experiment, \emph{Journal of the American Statistical Society} \bold{96}, 1151--1160.
}
\author{ Xinan Yang, Claudio Lottaz, Stefanie Scheid }
\seealso{ \code{\link{prepareData}}, \code{\link{OL.data}}, \code{\link{OL.result}}, \code{\link{plot.OrderedList}}, \code{\link{print.OrderedList}}, \code{\link{compareLists}}}
\examples{
### Let's compare the two example studies.
### The first entries of 'out' both relate to bad prognosis.
### Hence the class labels match between the two studies
### and we can use 'OrderedList' with default 'beta=1'.
data(OL.data)
a <- prepareData(
                 list(data=OL.data$breast,name="breast",var="Risk",out=c("high","low"),paired=FALSE),
                 list(data=OL.data$prostate,name="prostate",var="outcome",out=c("Rec","NRec"),paired=FALSE),
		 mapping=OL.data$map
                 )
\dontrun{
OL.result <- OrderedList(a)
}

### The same comparison was done beforehand.
data(OL.result)
OL.result
plot(OL.result)
}
\keyword{ htest }