This package implements alternative inference methods for BASiCS. The original package uses adaptive Metropolis within Gibbs sampling, while BASiCStan uses Stan to enable the use of maximum a posteriori estimation, variational inference, and Hamiltonian Monte Carlo. These each have advantages for different use cases.
To use BASiCStan
, we can first use BASICS_MockSCE()
to generate an toy example dataset. We will also set a seed for reproducibility.
suppressPackageStartupMessages(library("BASiCStan"))
set.seed(42)
BASiCS_MockSCE() sce <-
The interface for running MCMC using BASiCS and using alternative inference methods using Stan is very similar. It is worth noting that the joint prior between mean and over-dispersion parameters, corresponding to Regression = TRUE
, is the only supported mode in BASiCStan()
.
BASiCS_MCMC(
amwg_fit <-
sce,N = 200,
Thin = 10,
Burn = 100,
Regression = TRUE
)#> altExp 'spike-ins' is assumed to contain spike-in genes.
#> see help(altExp) for details.
#> Running with spikes BASiCS sampler (regression case) ...
#> -------------------------------------------------------------
#> MCMC running time
#> -------------------------------------------------------------
#> user: 0.347
#> system: 0
#> elapsed: 0.347
#> -------------------------------------------------------------
#> Output
#> -------------------------------------------------------------
#> -------------------------------------------------------------
#> BASiCS version 2.18.0 :
#> vertical integration (spikes case)
#> -------------------------------------------------------------
BASiCStan(sce, Method = "sampling", iter = 10)
stan_fit <-#> Warning: There were 5 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them.
#> Warning: Examine the pairs() plot to diagnose sampling problems
The output of BASiCStan()
is an object of class BASiCS_Chain
, similar to the output of BASiCS_MCMC()
. Therefore, you could use these as you would an object generated using Metropolis within Gibbs sampling. For example, we can plot the relationship between mean and over-dispersion estimated using the joint regression prior:
BASiCS_ShowFit(amwg_fit)
BASiCS_ShowFit(stan_fit)
Using Stan has advantages outside of the variety of inference engines available. By returning a Stan object that we can later convert to a BASiCS_Chain
object, we can leverage an even broader set of functionality. For example, Stan has the ability to easily generate MCMC diagnostics using simple functions. For example, summary()
outputs a number of useful per-chain and across-chain diagnostics:
BASiCStan(sce, ReturnBASiCS = FALSE, Method = "sampling", iter = 10)
stan_obj <-#>
#> SAMPLING FOR MODEL 'basics_regression' NOW (CHAIN 1).
#> Chain 1:
#> Chain 1: Gradient evaluation took 0.003062 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 30.62 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1:
#> Chain 1:
#> Chain 1: WARNING: No variance estimation is
#> Chain 1: performed for num_warmup < 20
#> Chain 1:
#> Chain 1: Iteration: 1 / 10 [ 10%] (Warmup)
#> Chain 1: Iteration: 2 / 10 [ 20%] (Warmup)
#> Chain 1: Iteration: 3 / 10 [ 30%] (Warmup)
#> Chain 1: Iteration: 4 / 10 [ 40%] (Warmup)
#> Chain 1: Iteration: 5 / 10 [ 50%] (Warmup)
#> Chain 1: Iteration: 6 / 10 [ 60%] (Sampling)
#> Chain 1: Iteration: 7 / 10 [ 70%] (Sampling)
#> Chain 1: Iteration: 8 / 10 [ 80%] (Sampling)
#> Chain 1: Iteration: 9 / 10 [ 90%] (Sampling)
#> Chain 1: Iteration: 10 / 10 [100%] (Sampling)
#> Chain 1:
#> Chain 1: Elapsed Time: 0.046 seconds (Warm-up)
#> Chain 1: 0.028 seconds (Sampling)
#> Chain 1: 0.074 seconds (Total)
#> Chain 1:
#> Warning: There were 5 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them.
#> Warning: Examine the pairs() plot to diagnose sampling problems
head(summary(stan_obj)$summary)
#> mean se_mean sd 2.5% 25% 50% 75% 97.5%
#> log_mu[1] 3.098420 NaN 0 3.098420 3.098420 3.098420 3.098420 3.098420
#> log_mu[2] 1.058488 NaN 0 1.058488 1.058488 1.058488 1.058488 1.058488
#> log_mu[3] 2.107672 NaN 0 2.107672 2.107672 2.107672 2.107672 2.107672
#> log_mu[4] 2.245089 NaN 0 2.245089 2.245089 2.245089 2.245089 2.245089
#> log_mu[5] 2.002693 NaN 0 2.002693 2.002693 2.002693 2.002693 2.002693
#> log_mu[6] 1.457520 NaN 0 1.457520 1.457520 1.457520 1.457520 1.457520
#> n_eff Rhat
#> log_mu[1] NaN NaN
#> log_mu[2] NaN NaN
#> log_mu[3] NaN NaN
#> log_mu[4] NaN NaN
#> log_mu[5] NaN NaN
#> log_mu[6] NaN NaN
We can then convert this object to a BASiCS_Chain
and carry on a workflow as before:
Stan2BASiCS(stan_obj)
#> An object of class BASiCS_Chain
#> 5 MCMC samples.
#> Dataset contains 100 biological genes and 100 cells (2 batches).
#> Object stored using BASiCS version: 2.18.0
#> Parameters: mu delta s nu theta epsilon phi beta
sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] BASiCStan_1.8.0 rstan_2.32.6
#> [3] StanHeaders_2.32.10 BASiCS_2.18.0
#> [5] SingleCellExperiment_1.28.0 SummarizedExperiment_1.36.0
#> [7] Biobase_2.66.0 GenomicRanges_1.58.0
#> [9] GenomeInfoDb_1.42.0 IRanges_2.40.0
#> [11] S4Vectors_0.44.0 BiocGenerics_0.52.0
#> [13] MatrixGenerics_1.18.0 matrixStats_1.4.1
#>
#> loaded via a namespace (and not attached):
#> [1] gridExtra_2.3 inline_0.3.19
#> [3] rlang_1.1.4 magrittr_2.0.3
#> [5] compiler_4.4.1 DelayedMatrixStats_1.28.0
#> [7] loo_2.8.0 vctrs_0.6.5
#> [9] pkgconfig_2.0.3 crayon_1.5.3
#> [11] fastmap_1.2.0 backports_1.5.0
#> [13] XVector_0.46.0 labeling_0.4.3
#> [15] scuttle_1.16.0 utf8_1.2.4
#> [17] promises_1.3.0 rmarkdown_2.28
#> [19] UCSC.utils_1.2.0 xfun_0.48
#> [21] bluster_1.16.0 zlibbioc_1.52.0
#> [23] cachem_1.1.0 beachmat_2.22.0
#> [25] jsonlite_1.8.9 highr_0.11
#> [27] later_1.3.2 DelayedArray_0.32.0
#> [29] BiocParallel_1.40.0 irlba_2.3.5.1
#> [31] parallel_4.4.1 cluster_2.1.6
#> [33] R6_2.5.1 bslib_0.8.0
#> [35] limma_3.62.0 jquerylib_0.1.4
#> [37] Rcpp_1.0.13 assertthat_0.2.1
#> [39] knitr_1.48 splines_4.4.1
#> [41] httpuv_1.6.15 Matrix_1.7-1
#> [43] igraph_2.1.1 tidyselect_1.2.1
#> [45] abind_1.4-8 yaml_2.3.10
#> [47] viridis_0.6.5 codetools_0.2-20
#> [49] miniUI_0.1.1.1 curl_5.2.3
#> [51] pkgbuild_1.4.5 lattice_0.22-6
#> [53] tibble_3.2.1 withr_3.0.2
#> [55] shiny_1.9.1 posterior_1.6.0
#> [57] coda_0.19-4.1 evaluate_1.0.1
#> [59] RcppParallel_5.1.9 pillar_1.9.0
#> [61] tensorA_0.36.2.1 checkmate_2.3.2
#> [63] distributional_0.5.0 generics_0.1.3
#> [65] ggplot2_3.5.1 sparseMatrixStats_1.18.0
#> [67] rstantools_2.4.0 munsell_0.5.1
#> [69] scales_1.3.0 xtable_1.8-4
#> [71] glue_1.8.0 metapod_1.14.0
#> [73] tools_4.4.1 hexbin_1.28.4
#> [75] BiocNeighbors_2.0.0 ScaledMatrix_1.14.0
#> [77] locfit_1.5-9.10 scran_1.34.0
#> [79] cowplot_1.1.3 grid_4.4.1
#> [81] QuickJSR_1.4.0 edgeR_4.4.0
#> [83] colorspace_2.1-1 GenomeInfoDbData_1.2.13
#> [85] BiocSingular_1.22.0 cli_3.6.3
#> [87] rsvd_1.0.5 fansi_1.0.6
#> [89] S4Arrays_1.6.0 viridisLite_0.4.2
#> [91] dplyr_1.1.4 V8_6.0.0
#> [93] glmGamPoi_1.18.0 gtable_0.3.6
#> [95] sass_0.4.9 digest_0.6.37
#> [97] SparseArray_1.6.0 dqrng_0.4.1
#> [99] farver_2.1.2 htmltools_0.5.8.1
#> [101] lifecycle_1.0.4 httr_1.4.7
#> [103] statmod_1.5.0 mime_0.12
#> [105] ggExtra_0.10.1 MASS_7.3-61