\name{si}
\alias{si}
\alias{splicing.index}

\title{ Calculate the splicing index }
\description{
 Calculates the splicing index for the probesets in one or more genes, as
 defined in the Affymetrix white paper "Alternative Transcript Analysis Methods for Exon Arrays".
}
\usage{
 si(x, v, group, gps, median.gene=FALSE, median.probeset=FALSE, unlogged=TRUE)
}

\arguments{
  \item{x}{eSet containing expression data }
  \item{v}{ Character vector of Ensembl gene names }
  \item{group}{ If defined, the column name in the ExpressionSet's pData object in which to look
    for \code{gps} }
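  \item{gps}{ The two sets of arrays to compare: either a list of two
    vectors of numeric column indices, or, if \code{group} is given, a
    character vector of two labels found in that column }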
  \item{median.gene}{ Use the median instead of the mean when
    calculating averages across genes}
  \item{median.probeset}{ Use the median instead of the mean when
    calculating averages across probesets in each replicate group  }
  \item{unlogged}{ Unlog the expression data before calculating the
    splicing index (and then re-log afterwards) }
}

\details{

The splicing index gives a measure of the difference in expression level
for each probeset in a gene between two sets of arrays, relative to the
gene-level average in each set. This is calculated only for those
probesets that are defined as exon-targeting and non-multitargeted (see
\code{\link{select.probewise}} and \code{\link{exclude.probewise}} for
more details of how this filtering is performed).
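
As a sketch of the white paper's definition (the symbols are
illustrative and are not taken from this package's code): writing
\eqn{p_{i,g}} for the averaged unlogged signal of probeset \eqn{i} in
array set \eqn{g}, and \eqn{G_g} for the corresponding gene-level
average, the splicing index compares the gene-normalised probeset
intensities of the two sets:
\deqn{SI_i = \log_2\left(\frac{p_{i,1}/G_1}{p_{i,2}/G_2}\right)}{SI_i = log2( (p_i1 / G_1) / (p_i2 / G_2) )}
Which set forms the numerator, and the order in which averaging and
normalisation are applied, are assumptions in this sketch.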

The two sets of arrays can be specified in two ways. First, numeric
indices defining the appropriate columns in the expression data can be
supplied as a list to \code{gps} (e.g. \code{gps=list(1:3,4:6)} will
calculate the splicing index between arrays 1,2,3 and 4,5,6).
Alternatively, the annotation in the \code{\link[Biobase]{pData}}
object from \code{x} can be used
(e.g. \code{group="treatment",gps=c("a","b")} will compare the arrays
labelled "a" with those labelled "b" in the "treatment" column of
\code{pData(x)}).

The implementation also calculates a \code{p.value} and
\code{t.statistic} for each probeset; these are returned alongside the
splicing index.

By default, the splicing index is calculated using the mean for both the
gene-level and the probeset-level averages. Specifying
\code{median.gene=TRUE} or \code{median.probeset=TRUE} will use the
median instead (for the gene or probeset level averages, respectively).
The splicing index is calculated using the unlogged data, unless
\code{unlogged=FALSE}. This only affects the internal calculations;
values in \code{x} are always assumed to be logged, and the splicing
index is always returned on the log2 scale.  }

\value{
A \code{list}, one element for each gene. Each element contains a
\code{data.frame}, with the results for a given gene. Each row
corresponds to a probeset, and there are four columns in the
\code{data.frame}: \code{"si","p.value","t.statistic"} and \code{"gene.av"}.
}
\references{\url{http://bioinformatics.picr.man.ac.uk/}}
\author{ Crispin J Miller with contributions from Carla Moller Levet and Michal J Okoniewski}

\seealso{ \code{\link[exonmap]{splanova}} }
\keyword{ misc }
\examples{ 
  if(interactive()) {
    xmapConnect()
    data(exonmap)
    gg <- probeset.to.gene(c("2326780","2326822" ))
    spl.idx <-  si(x, gg, "group", c("a","b"))
    spl.idx <-  si(x, gg, gps=list(1:3,4:6))
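    # Illustrative sketch only: inspect the results for the first gene.
    # Column names are those listed in the Value section; the 0.05
    # cut-off is an arbitrary assumption, not a recommendation.
    first.gene <- spl.idx[[1]]
    first.gene[order(first.gene$p.value), c("si", "p.value")]
    subset(first.gene, p.value < 0.05)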
  }
}