\name{CorrSample}
\alias{CorrSample}
\alias{RandPairs}
\title{Sample correlations for random pairs of genes}
\description{
\code{CorrSample} calculates the correlations, standard deviations and some auxiliary variables for random pairs of genes. A plot of the resulting object that shows that these correlations dependend systematically on the genes' variability, suggests a lack of normalization.

\code{RandPairs} is a helper function for generating random pairs from a list of genes.
}
\usage{
CorrSample(x, np, seed, rp, ndx)

RandPairs(probes, number)
}
%- maybe also 'usage' for other objects documented here.
\arguments{
  \item{x}{a gene expression matrix, with samples as columns and genes as rows; missing values are accepted.}
  \item{np, number}{the number of random pairs}
  \item{seed}{an optional seed for the random sampling}
  \item{rp}{an optional matrix with two columns specifying the random pairs, see Details.}
  \item{ndx}{an optional logical matrix of the same dimension as \code{x} that allows to eliminate a subset of the expression values from the calculation of the correlations, standard deviations and auxiliary variables.}
  \item{probes}{a vector of genes from which to draw random pairs; can be integer, as a vector of row indices, or character, as a vector of row names.}
}
\details{
The sample of random pairs can be specified in a replicable manner either via \code{np} and \code{seed}, or by using the output from \code{RandPairs} for the parameter \code{rp}. In case we want to use the same set of random pairs (e.g. when comparing different expression measures on the same data set), the second option will be faster.
}
\value{
An object of class \code{corr.sample}; this is just a data frame with an extra class tag to allow for a plotting method. 

The data frame has \code{np} rows and nine columns: 
\item{\code{Correlation}}{the correlation between the two genes across samples}
\item{\code{StdDev}}{the geometric mean of the standard deviations of the two genes}
\item{\code{sd1},\code{sd2}}{the standard deviations of the genes}
\item{\code{m1},\code{m2}}{the means of the genes}
\item{\code{ndx1},\code{ndx2}}{the indices of the two genes; by default, these will be the corresponding row indices of \code{x}, but if \code{rp} is specified, they might be gene names.}
}
\references{
Ploner A, Miller LD, Hall P, Bergh J, Pawitan Y. Correlation test to assess low-level processing of high-density oligonucleotide microarray data. BMC Bioinformatics, 2005, 6(1):80 
\url{http://www.pubmedcentral.gov/articlerender.fcgi?tool=pubmed&pubmedid=15799785}
}
\author{Alexander Ploner \email{Alexander.Ploner@ki.se}}
\seealso{\code{\link{plot.corr.sample}}}
\examples{
# Get small example data
data(oligodata)
dim(datA.rma)

# Compute the correlations for 500 random pairs, 
# that is ca. 1/1000 of all possible pairs 
# Larger numbers are reasonable for larger data sets
cs1 = CorrSample(datA.rma, 500, seed=210)
cs1[1:5,]

# Clear correlation for pairs of genes with low average variability 
plot(cs1)

# A different way of specifying the same 
set.seed(210)
rp = RandPairs(rownames(datA.rma), 500)
cs2 = CorrSample(datA.rma, rp=rp)
cs2[1:5,]
plot(cs2)
}
\keyword{datagen}% at least one, from doc/KEYWORDS