Errors and warnings in variancePartition are mostly designed to let the user know that there is an issue with the model. Note that some of these warnings and errors can be overridden by specifying hideErrorsInBackend=TRUE for dream() and showWarnings=FALSE for fitExtractVarPartModel() and fitVarPartModel().
Errors in dream()
The linear mixed model used by dream() can be a little fragile for small sample sizes and correlated covariates.
Initial model failed: the fixed-effects model matrix is column rank deficient (rank(X) = 3 < 4 = p); the fixed effects will be jointly unidentifiable
The design matrix has redundant variables, so the model is singular and coefficients can’t be estimated. Fix by dropping one or more variables. Use canCorPairs() to examine the correlation between variables.
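One way to confirm this kind of redundancy before fitting is to compare the rank of the fixed-effects design matrix with its number of columns. The sketch below uses base R only. The data frame `meta` and its columns are made up for illustration and are not part of the package's example data.

```r
# Hypothetical sample metadata (not from varPartData):
# Batch is fully determined by Tissue, so the two variables are redundant
meta <- data.frame(
  Tissue = factor(rep(c("A", "B", "C"), each = 4)),
  Batch  = factor(rep(c("b1", "b2", "b2"), each = 4))
)

# Build the fixed-effects design matrix for the formula of interest
X <- model.matrix(~ Tissue + Batch, data = meta)

# Compare the numerical rank with the number of columns
rank_X <- qr(X)$rank
p <- ncol(X)

if (rank_X < p) {
  message("Design matrix is rank deficient: rank(X) = ", rank_X, " < ", p, " = p")
}
```

Here the Batch column equals the sum of the two Tissue columns, so the rank is smaller than the number of columns and at least one variable needs to be dropped.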
Gene-level errors
The most common issue is that the dream() analysis succeeds for most genes but fails for a handful. These genes can fail if the iterative process of fitting the linear mixed model does not converge, or if the estimated covariance matrix, which is supposed to be positive definite, has an eigenvalue that is negative or too close to zero due to rounding errors in floating point arithmetic.
In these cases, dream() gives a warning that the model has failed for a subset of genes, and also provides the gene-level errors. All successful model fits are returned to be used for downstream analysis.
Here we demonstrate how dream() handles model fits:
library(variancePartition)
data(varPartData)
# Redundant formula
# This example is an extreme example of redundancy
# but more subtle cases often show up in real data
form <- ~ Tissue + (1 | Tissue)
fit <- dream(geneExpr[1:30, ], form, info)
## Warning in dream(geneExpr[1:30, ], form, info): Model failed for 29 responses.
## See errors with attr(., 'errors')
# Extract gene-level errors
attr(fit, "errors")[1:2]
## gene1
## "Error in lmerTest:::as_lmerModLT(model, devfun, tol = tol): (converted from warning)
## Model may not have converged with 1 eigenvalue close to zero: -2.0e-09\n"
## gene2
## "Error: (converted from warning) Model failed to converge
## with 1 negative eigenvalue: -1.5e-08\n"
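The general pattern of keeping successful fits while recording per-gene failures can be sketched in base R. This is only an illustration of the idea, not how dream() is implemented: the matrix `expr`, the covariate `x`, and the helper `fit_one()` are hypothetical, and ordinary lm() stands in for the linear mixed model.

```r
set.seed(1)

# Hypothetical expression matrix: 5 genes x 8 samples
expr <- matrix(rnorm(40), nrow = 5,
               dimnames = list(paste0("gene", 1:5), NULL))
expr["gene3", ] <- NA   # force one gene to fail
x <- rnorm(8)           # hypothetical covariate

# Fit one gene; return the error message instead of stopping on failure
fit_one <- function(y) {
  tryCatch(lm(y ~ x), error = function(e) conditionMessage(e))
}

res <- lapply(rownames(expr), function(g) fit_one(expr[g, ]))
names(res) <- rownames(expr)

# Separate successful fits from failures
failed <- vapply(res, is.character, logical(1))
fits <- res[!failed]
attr(fits, "errors") <- unlist(res[failed])

# Names of the genes that failed
names(attr(fits, "errors"))
```

The successful fits remain available for downstream use, and the failed genes can be inspected separately through the "errors" attribute.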
Shared by multiple functions
These are shared by dream(), fitVarPartModel() and fitExtractVarPartModel(). Note that some of these can also be found in “1) Tutorial on using variancePartition”.
Warnings
No Intercept term was specified in the formula: The results will not behave as expected and may be very wrong!!
An intercept (i.e. mean term) must be specified in order for the results to be statistically valid. Otherwise, the variance percentages will be greatly overestimated.
Categorical variables modeled as fixed effect: The results will not behave as expected and may be very wrong!!
If a linear mixed model is used, all categorical variables must be modeled as a random effect. Alternatively, a fixed effect model can be used by modeling all variables as fixed.
Cannot have more than one varying coefficient term: The results will not behave as expected and may be very wrong!!
Only one varying coefficient term can be specified. For example, the formula ~(Tissue+0|Individual) + (Batch+0|Individual) contains two varying coefficient terms and the results from this analysis are not easily interpretable. Only a formula with one term like (Tissue+0|Individual) is allowed.
Errors
Colinear score > .99: Covariates in the formula are so strongly correlated that the parameter estimates from this model are not meaningful. Dropping one or more of the covariates will fix this problem
Error in asMethod(object) : not a positive definite matrix
In vcov.merMod(fit) : Computed variance-covariance matrix problem: not a positive definite matrix; returning NA matrix
fixed-effect model matrix is rank deficient so dropping 26 columns / coefficients
Including variables that are highly correlated can produce misleading results (see Section “Detecting problems caused by collinearity of variables”). In this case, parameter estimates from this model are not meaningful. Dropping one or more of the covariates will fix this problem.
Error in checkNlevels(reTrms$flist, n = n, control): number of levels of each grouping factor must be < number of observations
This arises when using a varying coefficient model that examines the effect of one variable inside subsets of the data defined by another: ~(A+0|B). See Section “Variation within multiple subsets of the data”. There must be enough observations of each level of variable B with each level of variable A. Consider an example with samples from multiple tissues from a set of individuals, where we are interested in the variation across individuals within each tissue using the formula: ~(Tissue+0|Individual). This analysis will only work if there are multiple samples from the same individual in at least one tissue. If all tissues only have one sample per individual, the analysis will fail and variancePartition will give this error.
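Whether the data meet the condition described above can be checked by cross-tabulating the two variables before fitting. The sketch below uses base R and a made-up data frame `meta`; the column names follow the Tissue and Individual example in the text but the values are hypothetical.

```r
# Hypothetical sample metadata: one row per sample
meta <- data.frame(
  Individual = factor(c("p1", "p1", "p1", "p2", "p2", "p3", "p3")),
  Tissue     = factor(c("A",  "A",  "B",  "A",  "B",  "A",  "B"))
)

# Number of samples for each Individual x Tissue combination
counts <- table(meta$Individual, meta$Tissue)

# TRUE if some individual has more than one sample in at least one tissue
has_replication <- any(counts > 1)

# Tissues in which at least one individual is sampled more than once
replicated_tissues <- colnames(counts)[colSums(counts > 1) > 0]

has_replication
replicated_tissues
```

If `has_replication` is FALSE, every tissue has at most one sample per individual, which is the situation the text describes as causing this error.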
Problem with varying coefficient model in formula: should have form (A+0|B)
When analyzing the variation of one variable inside another (see Section “Variation within multiple subsets of the data”), the formula must be specified as (Tissue+0|Individual). This error occurs when the formula contains (Tissue|Individual) instead.
fatal error in wrapper code
Error in mcfork() : unable to fork, possible reason: Cannot allocate memory
Error: cannot allocate buffer
This error occurs when fitVarPartModel() uses too many threads and takes up too much memory. The easiest solution is to use fitExtractVarPartModel() instead. Occasionally there is an issue in the parallel backend that is out of my control. Using fewer threads or restarting R will solve the problem.
Errors: Problems removing samples with NA/NaN/Inf values
variancePartition fits a regression model for each gene and drops samples that have NA/NaN/Inf values in each model fit. This is generally seamless but can cause an issue when a variable specified in the formula no longer varies within the subset of samples that are retained. Consider an example with variables for sex and age where age is NA for all male samples. Dropping samples with invalid values for variables included in the formula will retain only female samples. This will cause variancePartition to throw an error because there is now no variation in sex in the retained subset of the data. This can be resolved by removing either age or sex from the formula.
This situation is indicated by the following errors:
Error: grouping factors must have > 1 sampled level
Error: Invalid grouping factor specification, Individual
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]): contrasts can be applied only to factors with 2 or more levels
Error in checkNlevels(reTrms$flist, n = n, control): grouping factors must have > 1 sampled level
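To find which variable stops varying after samples with invalid values are dropped, the filtering can be reproduced by hand. The sketch below is base R only and mirrors the sex and age example in the text. The data frame `meta`, the vector `vars`, and the helper `is_valid()` are hypothetical names introduced for illustration.

```r
# Hypothetical metadata: Age is missing for all male samples
meta <- data.frame(
  Sex = factor(c("F", "F", "F", "M", "M", "M")),
  Age = c(34, 51, 47, NA, NA, NA)
)

# Variables that appear in the model formula
vars <- c("Sex", "Age")

# A value is valid if it is finite (numeric) or non-missing (otherwise)
is_valid <- function(x) if (is.numeric(x)) is.finite(x) else !is.na(x)

# Keep only samples that are valid for every variable in the formula
keep <- Reduce(`&`, lapply(meta[vars], is_valid))
retained <- droplevels(meta[keep, vars, drop = FALSE])

# Number of distinct values per variable among the retained samples
n_distinct <- sapply(retained, function(x) length(unique(x)))

# Variables that no longer vary and would trigger the errors above
constant_vars <- names(n_distinct)[n_distinct < 2]
constant_vars
```

Any variable listed in `constant_vars` is a candidate for removal from the formula, or the variable with the missing values can be removed instead so that all samples are retained.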
Errors with BiocParallel multithreading backend
Error: 'bpiterate' receive data failed: error reading from connection
Error in serialize(data, node$con, xdr = FALSE) : ignoring SIGPIPE signal
variancePartition uses the BiocParallel package to run analysis in parallel across multiple cores. If there is an issue with the parallel backend you might see these errors. This often occurs in long interactive sessions, or if you manually kill a function running in parallel. There are two ways to address this issue.
Global: set the number of threads to be a smaller number. I have found that reducing the number of threads reduces the chance of random failures like this.
library(BiocParallel)
# globally specify that all multithreading using bpiterate from BiocParallel
# should use 8 cores
register(SnowParam(8))
Local: set the number of threads at each function call. This re-initializes the parallel backend and should address the error.
fitExtractVarPartModel(..., BPPARAM = SnowParam(8))
fitVarPartModel(..., BPPARAM = SnowParam(8))
dream(..., BPPARAM = SnowParam(8))
voomWithDreamWeights(..., BPPARAM = SnowParam(8))
Session Info
## R version 4.3.2 Patched (2023-11-13 r85521)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_GB
## [4] LC_COLLATE=C LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] edgeR_4.0.15 pander_0.6.5 variancePartition_1.32.5
## [4] BiocParallel_1.36.0 limma_3.58.1 ggplot2_3.4.4
## [7] knitr_1.45
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.0 farver_2.1.1 dplyr_1.1.4 bitops_1.0-7
## [5] fastmap_1.1.1 digest_0.6.34 lifecycle_1.0.4 statmod_1.5.0
## [9] magrittr_2.0.3 compiler_4.3.2 rlang_1.1.3 sass_0.4.8
## [13] tools_4.3.2 utf8_1.2.4 yaml_2.3.8 labeling_0.4.3
## [17] plyr_1.8.9 KernSmooth_2.23-22 withr_3.0.0 purrr_1.0.2
## [21] numDeriv_2016.8-1.1 BiocGenerics_0.48.1 grid_4.3.2 aod_1.3.3
## [25] fansi_1.0.6 caTools_1.18.2 colorspace_2.1-0 scales_1.3.0
## [29] gtools_3.9.5 iterators_1.0.14 MASS_7.3-60.0.1 cli_3.6.2
## [33] mvtnorm_1.2-4 rmarkdown_2.25 generics_0.1.3 reshape2_1.4.4
## [37] minqa_1.2.6 cachem_1.0.8 stringr_1.5.1 splines_4.3.2
## [41] parallel_4.3.2 matrixStats_1.2.0 vctrs_0.6.5 boot_1.3-28.1
## [45] Matrix_1.6-5 jsonlite_1.8.8 pbkrtest_0.5.2 locfit_1.5-9.8
## [49] jquerylib_0.1.4 tidyr_1.3.1 snow_0.4-4 glue_1.7.0
## [53] nloptr_2.0.3 codetools_0.2-19 stringi_1.8.3 gtable_0.3.4
## [57] EnvStats_2.8.1 lme4_1.1-35.1 lmerTest_3.1-3 munsell_0.5.0
## [61] tibble_3.2.1 remaCor_0.0.18 pillar_1.9.0 htmltools_0.5.7
## [65] gplots_3.1.3.1 R6_2.5.1 Rdpack_2.6 evaluate_0.23
## [69] lattice_0.22-5 Biobase_2.62.0 highr_0.10 rbibutils_2.2.16
## [73] backports_1.4.1 RhpcBLASctl_0.23-42 broom_1.0.5 fANCOVA_0.6-1
## [77] corpcor_1.6.10 bslib_0.6.1 Rcpp_1.0.12 nlme_3.1-164
## [81] xfun_0.42 pkgconfig_2.0.3