\name{getLikelihoods}
\alias{getLikelihoods}
\alias{getLikelihoods.Dirichlet}
\alias{getLikelihoods.NB}
\alias{getLikelihoods.NBboot}
\alias{getLikelihoods.Pois}
%- Also NEED an '\alias' for EACH other topic documented here.
\title{Finds posterior likelihoods for each count as belonging to some hypothesis.}
\description{
  These functions calculate posterior probabilities for each of the
  'counts' in the countDP object belonging to each of the groups
  specified. The choice of function depends on the prior belief about
  the underlying distribution of the data. It is essential that the
  method used for calculating priors matches the method used for
  calculating the posterior probabilities. For a comparison of the
  methods, see Hardcastle & Kelly, 2010.
}
\usage{
getLikelihoods(cD, prs, pET = "BIC", marginalise = FALSE, subset = NULL,
priorSubset = NULL, verbose = TRUE, ..., cl)
getLikelihoods.Dirichlet(cD, prs, pET = "BIC", marginalise = FALSE, subset = NULL,
priorSubset = NULL, verbose = TRUE, cl)
getLikelihoods.Pois(cD, prs, pET = "BIC", marginalise = FALSE, subset = NULL,
priorSubset = NULL, distpriors = FALSE, verbose = TRUE, cl)
getLikelihoods.NB(cD, prs, pET = "BIC", marginalise = FALSE, subset = NULL,
priorSubset = NULL, bootStraps = 1, conv = 1e-4, nullData = FALSE,
returnAll = FALSE, returnPD = FALSE, verbose = TRUE, cl)
}
%- maybe also 'usage' for other objects documented here.
\arguments{
  \item{cD}{An object of type \code{\link{countData}}, or descending from
    this class.}
  \item{prs}{(Initial) prior probabilities for each of the groups in the
    'countDP' object. Should sum to 1, unless \code{nullData} is TRUE, in
    which case it should sum to less than 1.}
  \item{pET}{What type of prior re-estimation should be attempted?
    Defaults to "BIC"; "none" and "iteratively" are also available.}
  \item{marginalise}{Should an attempt be made to numerically
    marginalise over a prior distribution iteratively estimated from the
    posterior distribution?
    Defaults to FALSE, as in general this offers little performance gain
    and increases computational cost considerably.}
  \item{subset}{Numeric vector giving the subset of counts for which
    posterior likelihoods should be estimated.}
  \item{priorSubset}{Numeric vector giving the subset of counts which may be
    used to estimate prior probabilities on each of the groups. See Details.}
  \item{distpriors}{Should the Poisson method use an empirically derived
    distribution on the prior parameters of the Poisson distribution, or
    use the mean of the maximum likelihood estimates (default)?}
  \item{bootStraps}{How many iterations of bootstrapping should be used
    in the (re)estimation of priors in the negative binomial method.}
  \item{conv}{If not null, bootstrapping iterations will cease if the
    mean squared difference between posterior likelihoods of consecutive
    bootstraps drops below this value.}
  \item{nullData}{If TRUE, looks for segments or counts with no true
    expression. See Details.}
  \item{returnAll}{If TRUE, and bootStraps > 1, then instead of returning a
    single countData object, the function returns a list of countData
    objects; one for each bootstrap. Largely used for debugging purposes.}
  \item{returnPD}{If TRUE, then the function returns the (log)
    likelihoods of the data given the models, rather than the posterior
    (log) likelihoods of the models given the data. Not recommended for
    general use.}
  \item{verbose}{Should status messages be displayed? Defaults to TRUE.}
  \item{cl}{A SNOW cluster object.}
  \item{...}{Any additional information to be passed by the
    \code{'getLikelihoods'} wrapper function to the individual functions
    which calculate the likelihoods.}
}
\details{
  These functions estimate, under the assumption of various
  distributions, the (log) posterior likelihoods that each count belongs
  to a group defined by the \code{@groups} slot of the \code{countData}
  object.
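
  As a sketch of the quantity being estimated (the symbols below are
  notation assumed here for illustration, not names used by the
  package): write \eqn{D_c} for the data observed for count \eqn{c},
  \eqn{M_k} for the model implied by group \eqn{k}, and \eqn{p_k} for
  the prior probability of that group. Bayes' rule then gives

  \deqn{P(M_k | D_c) = \frac{p_k P(D_c | M_k)}{\sum_j p_j P(D_c | M_j)}}{P(M_k | D_c) = p_k P(D_c | M_k) / sum_j p_j P(D_c | M_j)}

  where \eqn{P(D_c | M_k)} is the likelihood of the data under model
  \eqn{k} for the assumed distribution. The values \eqn{p_k} correspond
  to the \code{prs} argument, possibly re-estimated according to
  \code{pET}.
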
  The posterior likelihoods are stored on the natural log scale in the
  \code{@posteriors} slot of the \code{\link{countData}} object generated
  by this function. This is because the posterior likelihoods are
  calculated in this form, and ordering of the counts is better done on
  these log-likelihoods than on the likelihoods.

  If \code{'pET = "none"'} then no attempt is made to re-estimate the
  prior likelihoods given in the \code{'prs'} variable. However, if
  \code{'pET = "BIC"'}, then the function will attempt to estimate the
  prior likelihoods by using the Bayesian Information Criterion to
  identify the proportion of the data best explained by each model and
  taking these proportions as prior. Alternatively, an iterative
  re-estimation of priors is possible (\code{'pET = "iteratively"'}), in
  which an initial estimate for the prior likelihoods of the models is
  used to calculate the posteriors and then the priors are updated by
  taking the mean of the posterior likelihoods for each model across all
  data. This often works well, particularly if the 'BIC' method is used
  (see Hardcastle & Kelly 2010 for details). However, if the data are
  sufficiently non-independent, this approach may substantially
  mis-estimate the true priors. If it is possible to select a
  representative, sufficiently independent subset of the data by setting
  the variable \code{'priorSubset'}, then better estimates may be
  acquired.

  The Dirichlet and Poisson methods produce almost identical results in
  simulation. The Negative Binomial method produces results with much
  lower false discovery rates, but takes considerably longer to run.

  Filtering the data may be extremely advantageous in reducing run
  time. This can be done by passing a numeric vector to 'subset'
  defining a subset of the data for which posterior likelihoods are
  required.

  If 'nullData = TRUE', the algorithm attempts to find those counts or
  segments that have no true expression in all samples.
  This means that there is another, implied group where all samples are
  equal. The prior likelihoods given in the 'prs' object must thus sum
  to less than 1, with the residual going to this group.

  See Hardcastle & Kelly (2010) for a full comparison of the methods.

  A 'cluster' object is strongly recommended in order to parallelise
  the estimation of posterior likelihoods, particularly for the
  negative binomial method. However, passing NULL to the \code{cl}
  variable will allow the functions to run in non-parallel mode.

  The \code{'getLikelihoods'} function will infer the correct
  distribution to use from the information stored in the
  \code{'@priors'} slot of the \code{\link{countData}} object
  \code{'cD'} and call the appropriate function.
}
\value{
  A \code{\link{countData}} object.
}
\references{Hardcastle, T.J. and Kelly, K. baySeq: Empirical Bayesian
  Methods For Identifying Differential Expression In Sequence Count
  Data. BMC Bioinformatics (2010).}
\author{Thomas J. Hardcastle}
\seealso{\code{\link{countData}}, \code{\link{getPriors}},
  \code{\link{topCounts}}, \code{\link{getTPs}}}
\examples{

library(baySeq)

# See vignette for more examples.

# Create a {countData} object and estimate priors for the
# Poisson methods.

data(simCount)
data(libsizes)

replicates <- c(1,1,1,1,1,2,2,2,2,2)
groups <- list(c(1,1,1,1,1,1,1,1,1,1), c(1,1,1,1,1,2,2,2,2,2))

CD <- new("countData", data = simCount, replicates = replicates,
          libsizes = libsizes, groups = groups)

CDP.Poi <- getPriors.Pois(CD, samplesize = 20, takemean = TRUE, cl = NULL)

# Get likelihoods for data with Poisson method

CDPost.Poi <- getLikelihoods.Pois(CDP.Poi, prs = c(0.5, 0.5),
                                  pET = "BIC", marginalise = FALSE, cl = NULL)

# Alternatively, get priors for negative binomial method

CDP.NB <- getPriors.NB(CD, samplesize = 10^5, estimation = "QL", cl = NULL)

# Get likelihoods for data with negative binomial method with bootstrapping

CDPost.NB <- getLikelihoods.NBboot(CDP.NB, prs = c(0.5, 0.5), pET = "BIC",
                                   marginalise = FALSE, bootStraps = 1,
                                   cl = NULL)

# Alternatively, if we have the 'snow' package installed we
# can parallelise the functions. This will usually (not always) offer
# significant performance gain.

cl <- NULL
try(library(snow))
try(cl <- makeCluster(4, "SOCK"))

CDP.NB <- getPriors.NB(CD, samplesize = 10^5, estimation = "QL", cl = cl)

CDPost.NB <- getLikelihoods.NB(CDP.NB, prs = c(0.5, 0.5), pET = "BIC",
                               marginalise = FALSE, cl = cl)
}

\keyword{distribution}
\keyword{models}