This package implements alternative inference methods for BASiCS. The original package uses adaptive Metropolis within Gibbs sampling, while BASiCStan uses Stan to enable the use of maximum a posteriori estimation, variational inference, and Hamiltonian Monte Carlo. These each have advantages for different use cases.
To use BASiCStan
, we can first use BASICS_MockSCE()
to generate an toy example dataset. We will also set a seed for reproducibility.
The interface for running MCMC using BASiCS and using alternative inference methods using Stan is very similar. It is worth noting that the joint prior between mean and over-dispersion parameters, corresponding to Regression = TRUE
, is the only supported mode in BASiCStan()
.
amwg_fit <- BASiCS_MCMC(
sce,
N = 200,
Thin = 10,
Burn = 100,
Regression = TRUE
)
#> altExp 'spike-ins' is assumed to contain spike-in genes.
#> see help(altExp) for details.
#> Running with spikes BASiCS sampler (regression case) ...
#> -------------------------------------------------------------
#> MCMC running time
#> -------------------------------------------------------------
#> user: 0.342
#> system: 0
#> elapsed: 0.343
#> -------------------------------------------------------------
#> Output
#> -------------------------------------------------------------
#> -------------------------------------------------------------
#> BASiCS version 2.12.0 :
#> vertical integration (spikes case)
#> -------------------------------------------------------------
stan_fit <- BASiCStan(sce, Method = "sampling", iter = 10)
#> Warning: There were 5 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them.
#> Warning: Examine the pairs() plot to diagnose sampling problems
The output of BASiCStan()
is an object of class BASiCS_Chain
, similar to the output of BASiCS_MCMC()
. Therefore, you could use these as you would an object generated using Metropolis within Gibbs sampling. For example, we can plot the relationship between mean and over-dispersion estimated using the joint regression prior:
Using Stan has advantages outside of the variety of inference engines available. By returning a Stan object that we can later convert to a BASiCS_Chain
object, we can leverage an even broader set of functionality. For example, Stan has the ability to easily generate MCMC diagnostics using simple functions. For example, summary()
outputs a number of useful per-chain and across-chain diagnostics:
stan_obj <- BASiCStan(sce, ReturnBASiCS = FALSE, Method = "sampling", iter = 10)
#>
#> SAMPLING FOR MODEL 'basics_regression' NOW (CHAIN 1).
#> Chain 1:
#> Chain 1: Gradient evaluation took 0.004508 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 45.08 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1:
#> Chain 1:
#> Chain 1: WARNING: No variance estimation is
#> Chain 1: performed for num_warmup < 20
#> Chain 1:
#> Chain 1: Iteration: 1 / 10 [ 10%] (Warmup)
#> Chain 1: Iteration: 2 / 10 [ 20%] (Warmup)
#> Chain 1: Iteration: 3 / 10 [ 30%] (Warmup)
#> Chain 1: Iteration: 4 / 10 [ 40%] (Warmup)
#> Chain 1: Iteration: 5 / 10 [ 50%] (Warmup)
#> Chain 1: Iteration: 6 / 10 [ 60%] (Sampling)
#> Chain 1: Iteration: 7 / 10 [ 70%] (Sampling)
#> Chain 1: Iteration: 8 / 10 [ 80%] (Sampling)
#> Chain 1: Iteration: 9 / 10 [ 90%] (Sampling)
#> Chain 1: Iteration: 10 / 10 [100%] (Sampling)
#> Chain 1:
#> Chain 1: Elapsed Time: 0.063144 seconds (Warm-up)
#> Chain 1: 0.039343 seconds (Sampling)
#> Chain 1: 0.102487 seconds (Total)
#> Chain 1:
#> Warning: There were 5 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them.
#> Warning: Examine the pairs() plot to diagnose sampling problems
head(summary(stan_obj)$summary)
#> mean se_mean sd 2.5% 25% 50% 75% 97.5%
#> log_mu[1] 3.098420 NaN 0 3.098420 3.098420 3.098420 3.098420 3.098420
#> log_mu[2] 1.058488 NaN 0 1.058488 1.058488 1.058488 1.058488 1.058488
#> log_mu[3] 2.107672 NaN 0 2.107672 2.107672 2.107672 2.107672 2.107672
#> log_mu[4] 2.245089 NaN 0 2.245089 2.245089 2.245089 2.245089 2.245089
#> log_mu[5] 2.002693 NaN 0 2.002693 2.002693 2.002693 2.002693 2.002693
#> log_mu[6] 1.457520 NaN 0 1.457520 1.457520 1.457520 1.457520 1.457520
#> n_eff Rhat
#> log_mu[1] NaN NaN
#> log_mu[2] NaN NaN
#> log_mu[3] NaN NaN
#> log_mu[4] NaN NaN
#> log_mu[5] NaN NaN
#> log_mu[6] NaN NaN
We can then convert this object to a BASiCS_Chain
and carry on a workflow as before:
sessionInfo()
#> R version 4.3.0 RC (2023-04-13 r84269)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.2 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.17-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] BASiCStan_1.2.0 rstan_2.21.8
#> [3] ggplot2_3.4.2 StanHeaders_2.21.0-7
#> [5] BASiCS_2.12.0 SingleCellExperiment_1.22.0
#> [7] SummarizedExperiment_1.30.0 Biobase_2.60.0
#> [9] GenomicRanges_1.52.0 GenomeInfoDb_1.36.0
#> [11] IRanges_2.34.0 S4Vectors_0.38.0
#> [13] BiocGenerics_0.46.0 MatrixGenerics_1.12.0
#> [15] matrixStats_0.63.0
#>
#> loaded via a namespace (and not attached):
#> [1] bitops_1.0-7 gridExtra_2.3
#> [3] inline_0.3.19 rlang_1.1.0
#> [5] magrittr_2.0.3 compiler_4.3.0
#> [7] loo_2.6.0 DelayedMatrixStats_1.22.0
#> [9] callr_3.7.3 vctrs_0.6.2
#> [11] crayon_1.5.2 pkgconfig_2.0.3
#> [13] fastmap_1.1.1 backports_1.4.1
#> [15] XVector_0.40.0 ellipsis_0.3.2
#> [17] labeling_0.4.2 scuttle_1.10.0
#> [19] utf8_1.2.3 promises_1.2.0.1
#> [21] rmarkdown_2.21 ps_1.7.5
#> [23] xfun_0.39 bluster_1.10.0
#> [25] zlibbioc_1.46.0 cachem_1.0.7
#> [27] beachmat_2.16.0 jsonlite_1.8.4
#> [29] highr_0.10 later_1.3.0
#> [31] DelayedArray_0.26.0 BiocParallel_1.34.0
#> [33] prettyunits_1.1.1 irlba_2.3.5.1
#> [35] parallel_4.3.0 cluster_2.1.4
#> [37] R6_2.5.1 bslib_0.4.2
#> [39] limma_3.56.0 jquerylib_0.1.4
#> [41] Rcpp_1.0.10 assertthat_0.2.1
#> [43] knitr_1.42 splines_4.3.0
#> [45] httpuv_1.6.9 Matrix_1.5-4
#> [47] igraph_1.4.2 tidyselect_1.2.0
#> [49] abind_1.4-5 yaml_2.3.7
#> [51] viridis_0.6.2 codetools_0.2-19
#> [53] miniUI_0.1.1.1 processx_3.8.1
#> [55] pkgbuild_1.4.0 lattice_0.21-8
#> [57] tibble_3.2.1 withr_2.5.0
#> [59] shiny_1.7.4 posterior_1.4.1
#> [61] coda_0.19-4 evaluate_0.20
#> [63] RcppParallel_5.1.7 pillar_1.9.0
#> [65] tensorA_0.36.2 checkmate_2.1.0
#> [67] distributional_0.3.2 generics_0.1.3
#> [69] RCurl_1.98-1.12 rstantools_2.3.1
#> [71] sparseMatrixStats_1.12.0 munsell_0.5.0
#> [73] scales_1.2.1 xtable_1.8-4
#> [75] glue_1.6.2 metapod_1.8.0
#> [77] tools_4.3.0 hexbin_1.28.3
#> [79] BiocNeighbors_1.18.0 ScaledMatrix_1.8.0
#> [81] locfit_1.5-9.7 scran_1.28.0
#> [83] cowplot_1.1.1 grid_4.3.0
#> [85] edgeR_3.42.0 colorspace_2.1-0
#> [87] GenomeInfoDbData_1.2.10 BiocSingular_1.16.0
#> [89] cli_3.6.1 rsvd_1.0.5
#> [91] fansi_1.0.4 viridisLite_0.4.1
#> [93] dplyr_1.1.2 glmGamPoi_1.12.0
#> [95] gtable_0.3.3 sass_0.4.5
#> [97] digest_0.6.31 dqrng_0.3.0
#> [99] farver_2.1.1 htmltools_0.5.5
#> [101] lifecycle_1.0.3 statmod_1.5.0
#> [103] mime_0.12 ggExtra_0.10.0
#> [105] MASS_7.3-59