\name{mglm} \alias{mglm} \alias{mglmSimple} \alias{mglmLS} \alias{mglmOneGroup} \alias{mglmOneWay} \alias{mglmLevenberg} \alias{deviances.function} \alias{designAsFactor} \title{Fit Negative Binomial Generalized Linear Model to Multiple Response Vectors} \description{ Fit the same log-link negative binomial or Poisson generalized linear model (GLM) to each row of a matrix of counts. } \usage{ mglmLS(y, design, dispersion=0, offset=0, start=NULL, tol=1e-5, maxit=50, trace=FALSE) mglmOneGroup(y, dispersion=0, offset=0, maxit=50, trace=FALSE) mglmOneWay(y, design=NULL, dispersion=0, offset=0, maxit=50, trace=FALSE) mglmSimple(y, design, dispersion=0, offset=0, weights=NULL) mglmLevenberg(y, design, dispersion=0, offset=0, start=NULL) deviances.function(dispersion) designAsFactor(design) } \arguments{ \item{y}{numeric matrix containing the negative binomial counts. Rows for tags and columns for libraries.} \item{design}{numeric matrix giving the design matrix of the GLM. Assumed to be full column rank.} \item{dispersion}{numeric scalar or vector giving the dispersion parameter for each GLM. Can be a scalar giving one value for all tags, or a vector of length equal to the number of tags giving tag-wise dispersions.} \item{offset}{numeric vector or matrix giving the offset that is to be included in the log-linear model predictor. Can be a scalar, a vector of length equal to the number of libraries, or a matrix of the same size as \code{y}.} \item{weights}{numeric vector or matrix of non-negative quantitative weights. Can be a vector of length equal to the number of libraries, or a matrix of the same size as \code{y}.} \item{start}{numeric matrix of starting values for the GLM coefficients. Number of rows should agree with \code{y} and number of columns should agree with \code{design}.} \item{tol}{numeric scalar giving the convergence tolerance.} \item{maxit}{scalar giving the maximum number of iterations for the Fisher scoring algorithm.} \item{trace}{logical, whether or not to information should be output at each iteration.} } \details{ The functions \code{mglmLS}, \code{mglmOneGroup} and \code{mglmSimple} all fit negative binomial generalized linear models, with the same design matrix but possibly different dispersions, offsets and weights, to a series of response vectors. \code{mglmLS} and \code{mglmOneGroup} are vectorized in R for fast execution, while \code{mglmSimple} simply makes tagwise calls to \code{glm.fit} in the stats package. The functions are all low-level functions in that they operate on atomic objects such as matrices. They are used as work-horses by higher-level functions in the edgeR package. \code{mglmOneGroup} fits the null model, with intercept term only, to each response vector. In other words, it treats the libraries as belonging to one group. It implements Fisher scoring with a score-statistic stopping criterion for each tag. Excellent starting values are available for the null model, so this function seldom has any problems with convergence. It is used by other edgeR functions to compute the overall abundance for each tag. \code{mglmLS} fits an arbitrary log-linear model to each response vector. It implements a vectorized approximate scoring algorithm with a likelihood derivative stopping criterion for each tag. A simple line search strategy is used to ensure that the residual deviance is reduced at each iteration. This function is the work-horse of other edgeR functions such as \code{glmFit} and \code{glmLRT}. \code{mglmSimple} is not vectorized, and simply makes tag-wise calls to \code{glm.fit}. This has the advantage that it accesses all the usual information generated by \code{glm.fit}. Unfortunately, \code{glm.fit} does not always converge, and the tag-wise fitting is relatively slow. \code{mglmLevenberg} is similar to \code{mglmSimple}, but makes tagwise calls to \code{glmnb.fit} in the statmod package instead of \code{glm.fit}. \code{glmnb.fit} implements a Levenberg-Marquardt modification of the scoring algorithm to prevent divergence. All these functions treat the dispersion parameter of the negative binomial distribution as a known input. \code{deviances.function} simply chooses the appropriate deviance function to use given a scalar or vector of dispersion parameters. If the dispersion values are zero, then the Poisson deviance function is returned; if the dispersion values are positive, then the negative binomial deviance function is returned. } \value{ \code{mglmOneGroup} produces a vector of length equal to the number of tags/genes (number of rows of \code{y}) providing the single coefficent from the GLM fit for each tag/gene. This can be interpreted as a measure of the 'average expression' level of the tag/gene. \code{mglmLS} produces a list with the following components: \item{coefficients}{matrix of estimated coefficients for the linear models} \item{fitted.values}{matrix of fitted values} \item{fail}{vector of indices of tags that fail the line search, in that the maximum number of step-halvings in exceeded} \item{not.converged}{vector of indices of tags that exceed the iteration limit before satisying the convergence criterion} \code{mglmSimple} produces a list with the following components: \item{coefficients}{matrix of estimated coefficients for the linear models} \item{df.residual}{vector of residual degrees of freedom for the linear models} \item{deviance}{vector of deviances for the linear models} \item{design}{matrix giving the experimental design that was used for each of the linear models} \item{offset}{scalar, vector or matrix of offset values used for the linear models} \item{dispersion}{scalar or vector of the dispersion values used for the linear model fits} \item{weights}{matrix of final weights for the observations from the linear model fits} \item{fitted.values}{matrix of fitted values} \code{deviances.function} returns a function to calculate the deviance as appropriate for the given values of the dispersion. \code{designAsFactor} returns a factor of length equal to \code{nrow(design)}. } \author{Davis McCarthy, Yunshun Chen, Gordon Smyth} \examples{ y<-matrix(rnbinom(1000,mu=10,size=2),ncol=4) dispersion <- 0.1 ## Fit the NB GLM to the counts ave.expression <- mglmOneGroup(y, dispersion=dispersion) head(ave.expression) ## Fit the NB GLM to the counts with a given design matrix f1<-factor(c(1,1,2,2)) f2<-factor(c(1,2,1,2)) x<-model.matrix(~f1+f2) ave.expression <- mglmLS(y, x, dispersion=dispersion) head(ave.expression$coef) } \seealso{ \code{\link{glmFit}}, for more object-orientated GLM modelling for DGE data. } \keyword{algebra}