\name{plgem.fit}
\alias{plgem.fit}
\title{PLGEM Fitting and Evaluation}
\description{
  Function for fitting a \bold{PLGEM} to a \sQuote{data} \code{ExpressionSet} and evaluating its goodness of fit, using the set of replicated samples identified by the \sQuote{fit.condition} condition of the \sQuote{covariateNumb} covariate. The range of gene expression values (or protein abundance levels) is partitioned into \sQuote{p} intervals, and the model is fitted to the \sQuote{q}-th quantile of standard deviations in each partition.
}
\usage{
  plgem.fit(data, covariateNumb=1, fit.condition=1, p=10, q=0.5,
    fittingEval=FALSE, plot.file=FALSE, verbose=FALSE)
}
\arguments{
  \item{data}{an object of class \code{ExpressionSet}; see Details for important information on how the \code{phenoData} slot of this object is interpreted by the function.}
  \item{covariateNumb}{\code{integer} (or coercible to \code{integer}); the covariate used to determine the samples on which to fit the \bold{PLGEM}.}
  \item{fit.condition}{\code{integer} (or coercible to \code{integer}); the condition used for \bold{PLGEM} fitting, according to the order of unique values of the \sQuote{covariateNumb} covariate.}
  \item{p}{\code{integer} (or coercible to \code{integer}); number of intervals used to partition the expression value range.}
  \item{q}{\code{numeric} in [0,1]; the quantile of standard deviation used for \bold{PLGEM} fitting.}
  \item{fittingEval}{\code{logical}; if \code{TRUE}, the fit is evaluated by generating a diagnostic plot.}
  \item{plot.file}{\code{logical}; if \code{TRUE}, a PNG file is written to the current working directory.}
  \item{verbose}{\code{logical}; if \code{TRUE}, comments are printed while running.}
}
\details{
  \code{plgem.fit} fits a Power Law Global Error Model (\bold{PLGEM}) to an expression set and optionally evaluates the quality of the fit.
  This \bold{PLGEM} aims to find the mathematical relationship between standard deviation and mean gene expression values (or protein abundance levels) in a set of replicated microarray (or proteomics) samples, according to the following power law:

  \deqn{\ln(\mathrm{modeledSpread}) = \mathrm{PLGEMslope} \cdot \ln(\mathrm{mean}) + \mathrm{PLGEMintercept}}{ln(modeledSpread) = PLGEMslope * ln(mean) + PLGEMintercept}

  It has been demonstrated that this model fits Affymetrix GeneChip datasets, as well as datasets of normalized spectral counts obtained by mass spectrometry-based proteomics (see References for details).

  The \sQuote{covariateNumb} covariate (the first one by default) of the \code{phenoData} of the \code{ExpressionSet} \sQuote{data} is expected to contain the necessary information about the experimental design. The values of this covariate must be sample labels, which must be identical for samples to be treated as replicates.

  \code{plgem.fit} returns the \sQuote{SLOPE} and \sQuote{INTERCEPT} of the power law described above. It also returns the Pearson correlation coefficient (\sQuote{DATA.PEARSON}) of ln(mean) vs. ln(sd) in the original data, as well as the adjusted R squared (\sQuote{ADJ.R2.MP}) of the \bold{PLGEM} fitted to the modelling points.

  If argument \sQuote{fittingEval} is \code{TRUE}, a four-panel diagnostic plot is generated as a graphical check of the goodness of the \bold{PLGEM} fit. The top-left panel shows the power law, characterized by \sQuote{SLOPE} and \sQuote{INTERCEPT}. The top-right panel shows the distribution of model residuals. The bottom-left panel shows the contour plot of ranked residuals. The bottom-right panel shows the relationship between the distribution of observed residuals and the normal distribution. A good fit normally gives a horizontal, symmetric rank-plot and a near-normal distribution of residuals.
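
  As a rough illustration of the fitting procedure described above, the following sketch reproduces the main steps in base R on simulated data. It is not the implementation used by \code{plgem.fit}: the simulated matrix \code{expr}, the equal-count binning of ranked means via \code{cut}, and the use of the average ln(mean) as the representative value of each partition are all assumptions made for this example.

  \preformatted{
set.seed(1)

# Hypothetical data: 1000 genes x 4 replicates, with the standard
# deviation following a power law of the mean (slope 0.8 on the log scale).
mu   <- exp(runif(1000, 2, 8))
expr <- matrix(rnorm(4000, mean = mu, sd = exp(-1) * mu^0.8), ncol = 4)

# Per-gene mean and standard deviation across replicates.
m <- rowMeans(expr)
s <- apply(expr, 1, sd)

# Logarithms require strictly positive values.
keep <- m > 0 & s > 0
m <- m[keep]
s <- s[keep]

p <- 10   # number of partitions
q <- 0.5  # quantile of standard deviations used in each partition

# Assumption: partitions contain (roughly) equal numbers of genes.
bin <- cut(rank(m, ties.method = "first"), breaks = p, labels = FALSE)

# One modelling point per partition.
x <- as.numeric(tapply(log(m), bin, mean))
y <- as.numeric(tapply(log(s), bin, quantile, probs = q))

# Linear fit on the log-log scale, i.e. the power law above.
fit       <- lm(y ~ x)
slope     <- unname(coef(fit)[2])
intercept <- unname(coef(fit)[1])
adj.r2    <- summary(fit)$adj.r.squared

# Correlation of ln(mean) vs. ln(sd) in the unbinned data.
pearson <- cor(log(m), log(s))
  }

  In this sketch, \code{slope}, \code{intercept}, \code{pearson} and \code{adj.r2} play the roles of \sQuote{SLOPE}, \sQuote{INTERCEPT}, \sQuote{DATA.PEARSON} and \sQuote{ADJ.R2.MP}, respectively; the values returned by \code{plgem.fit} on real data may be computed differently.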
}
\value{
  \code{plgem.fit} returns a list of five numbers (see Details):
  \item{SLOPE}{the slope of the fitted PLGEM.}
  \item{INTERCEPT}{the intercept of the fitted PLGEM.}
  \item{DATA.PEARSON}{the Pearson correlation coefficient of ln(mean) vs. ln(sd) in the original data.}
  \item{ADJ.R2.MP}{the adjusted R squared of the PLGEM fitted to the modelling points.}
  \item{FIT.CONDITION}{the condition used for fitting the PLGEM.}
}
\references{
  Pavelka N, Pelizzola M, Vizzardelli C, Capozzoli M, Splendiani A, Granucci F, Ricciardi-Castagnoli P. A power law global error model for the identification of differentially expressed genes in microarray data. BMC Bioinformatics. 2004 Dec 17;5:203. \url{http://www.biomedcentral.com/1471-2105/5/203}

  Pavelka N, Fournier ML, Swanson SK, Pelizzola M, Ricciardi-Castagnoli P, Florens L, Washburn MP. Statistical similarities between transcriptomics and quantitative shotgun proteomics data. Mol Cell Proteomics. 2007 Nov 19. \url{http://www.mcponline.org/cgi/content/abstract/M700240-MCP200v1}
}
\author{
  Mattia Pelizzola \email{mattia.pelizzola@gmail.com}

  Norman Pavelka \email{nxp@stowers-institute.org}
}
\seealso{
  \code{\link{plgem.obsStn}}, \code{\link{plgem.resampledStn}}, \code{\link{plgem.pValue}}, \code{\link{plgem.deg}}, \code{\link{run.plgem}}
}
\examples{
  data(LPSeset)
  LPSfit <- plgem.fit(data=LPSeset, fittingEval=FALSE)
  sapply(LPSfit, function(x) return(as.vector(x)))
}
\keyword{models}