---
title: "Advanced Workflows for High-Level Users"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Advanced Workflows for High-Level Users}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
library(betaregscale)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 10,
  fig.height = 5,
  fig.align = "center",
  warning = FALSE,
  message = FALSE,
  digits = 4
)
set.seed(2026)
kbl10 <- function(x, n = 10, digits = 4, ...) {
  knitr::kable(utils::head(as.data.frame(x), n), digits = digits, align = "c", ...)
}
```

## Audience and scope

This vignette is aimed at analysts who are already familiar with beta regression and need a production-oriented workflow featuring:

1. reproducible model selection;
2. inferential and predictive diagnostics;
3. mixed-effects escalation (`brs` -> `brsmm`) with likelihood ratio (LR) comparisons;
4. high-signal reporting tables for technical decision-making.

```{r library}
library(betaregscale)
```

## 1) Reproducible simulation and data checks

```{r data-sim}
n <- 260
d <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  z1 = rnorm(n)
)

sim <- brs_sim(
  formula = ~ x1 + x2 | z1,
  data = d,
  beta = c(0.15, 0.55, -0.30),
  zeta = c(-0.10, 0.35),
  link = "logit",
  link_phi = "logit",
  ncuts = 100,
  repar = 2
)

kbl10(head(sim, 10))

# Counts by observation type, coded in `delta`:
# 0 = exact, 1 = left, 2 = right, 3 = interval.
kbl10(
  data.frame(
    n = nrow(sim),
    exact = sum(sim$delta == 0),
    left = sum(sim$delta == 1),
    right = sum(sim$delta == 2),
    interval = sum(sim$delta == 3)
  ),
  digits = 4
)
```

## 2) Fixed-effects candidate set and model ranking

```{r fit-candidates}
fit_logit <- brs(y ~ x1 + x2 | z1, data = sim, link = "logit", repar = 2)
fit_probit <- brs(y ~ x1 + x2 | z1, data = sim, link = "probit", repar = 2)
fit_cauchit <- brs(y ~ x1 + x2 | z1, data = sim, link = "cauchit", repar = 2)

tab_rank <- brs_table(
  logit = fit_logit,
  probit = fit_probit,
  cauchit = fit_cauchit
)
kbl10(tab_rank)
```

## 3) Inference stack: Wald + bootstrap + AME

```{r inference-stack}
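# Quick reminder of the Wald construction used throughout this section:
# a two-sided interval is estimate +/- qnorm(1 - alpha / 2) * SE, so a
# 95% interval uses the critical value below (about 1.96).
qnorm(0.975)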
kbl10(brs_est(fit_logit))
kbl10(confint(fit_logit))

boot_tab <- brs_bootstrap(fit_logit, R = 80, level = 0.95)
kbl10(head(boot_tab, 10))

set.seed(2026) # for reproducible marginal-effects simulation
ame_mu <- brs_marginaleffects(
  fit_logit,
  model = "mean",
  type = "response",
  interval = TRUE,
  n_sim = 120
)
kbl10(ame_mu)
```

```{r inference-visual}
if (requireNamespace("ggplot2", quietly = TRUE)) {
  boot_tab_bca <- brs_bootstrap(
    fit_logit,
    R = 120,
    level = 0.95,
    ci_type = "bca",
    keep_draws = TRUE
  )
  # print() is needed for plots to render inside an `if` block in knitr
  print(autoplot.brs_bootstrap(
    boot_tab_bca,
    type = "ci_forest",
    title = "Bootstrap (BCa) vs Wald intervals"
  ))

  set.seed(2026) # for reproducible marginal-effects simulation
  ame_mu_draws <- brs_marginaleffects(
    fit_logit,
    model = "mean",
    type = "response",
    interval = TRUE,
    n_sim = 160,
    keep_draws = TRUE
  )
  print(autoplot.brs_marginaleffects(ame_mu_draws, type = "forest"))
}
```

## 4) Prediction layer on the analyst scale

```{r prediction-layer}
score_prob <- brs_predict_scoreprob(fit_logit, scores = 0:10)
kbl10(score_prob[1:8, 1:7])
```

## 5) Out-of-sample validation

```{r cv-layer}
set.seed(2026) # for reproducible cross-validation folds
cv_tab <- brs_cv(
  y ~ x1 + x2 | z1,
  data = sim,
  k = 5,
  repeats = 5,
  repar = 2
)
kbl10(head(cv_tab, 10))
```

## 6) Escalation to mixed models

### 6.1 Simulate clustered data with random intercept + slope

```{r mixed-data}
g <- 10   # clusters
ni <- 120 # observations per cluster
id <- factor(rep(seq_len(g), each = ni))
n_mm <- length(id)

x1 <- rnorm(n_mm)
x2 <- rbinom(n_mm, size = 1, prob = 1 / 2)

b0 <- rnorm(g, sd = 0.40) # cluster-specific intercepts
b1 <- rnorm(g, sd = 0.22) # cluster-specific slopes for x1

eta_mu <- 0.20 + 0.65 * x1 - 0.30 * x2 + b0[id] + b1[id] * x1
eta_phi <- rep(-0.20, n_mm)

mu <- plogis(eta_mu)
phi <- plogis(eta_phi)

shp <- brs_repar(mu = mu, phi = phi, repar = 2)
y <- round(stats::rbeta(n_mm, shp$shape1, shp$shape2) * 100)

dmm <- data.frame(y = y, id = id, x1 = x1, x2 = x2)
kbl10(head(dmm, 10))
```

### 6.2 Fit the evolutionary sequence

```{r mixed-fit}
fit_brs <- brs(y ~ x1 + x2, data = dmm, repar = 2)
fit_ri <- brsmm(y ~ x1 + x2, random = ~ 1 | id, data = dmm,
                repar = 2)
# Random intercept + slope: the x1 effect is allowed to vary across clusters.
fit_rs <- brsmm(y ~ x1 + x2, random = ~ 1 + x1 | id, data = dmm, repar = 2)

tab_lr <- anova(fit_brs, fit_ri, fit_rs, test = "Chisq")
kbl10(data.frame(model = rownames(tab_lr), tab_lr, row.names = NULL))
```

### 6.3 Model choice by LRT (ANOVA)

The `anova()` methods provide a practical likelihood ratio test (LRT) workflow:

* `M0 = brs` (no random effects);
* `M1 = brsmm` with random intercept (`~ 1 | id`);
* `M2 = brsmm` with random intercept + slope (`~ 1 + x1 | id`).

In nested comparisons, the test statistic is

$$LR = 2\{\ell(\hat\theta_{\text{complex}}) - \ell(\hat\theta_{\text{simple}})\}.$$

For the first step (`M0 -> M1`), the null hypothesis places a variance component on the boundary of the parameter space ($\sigma_b^2 = 0$), so the nominal p-values are conservative and should be interpreted with caution. For the `M1 -> M2` step, the chi-square approximation behaves better and is often used as a practical decision aid.

```{r llr-decision}
tab_lr_df <- data.frame(model = rownames(tab_lr), tab_lr, row.names = NULL)
tab_lr_df$decision <- c(
  "baseline",
  ifelse(is.na(tab_lr_df$`Pr(>Chisq)`[2]), "inspect AIC/BIC + diagnostics",
    ifelse(tab_lr_df$`Pr(>Chisq)`[2] < 0.05,
      "prefer M1 over M0", "prefer M0 (parsimony)"
    )
  ),
  ifelse(is.na(tab_lr_df$`Pr(>Chisq)`[3]), "inspect AIC/BIC + diagnostics",
    ifelse(tab_lr_df$`Pr(>Chisq)`[3] < 0.05,
      "prefer M2 over M1", "prefer M1 (parsimony)"
    )
  )
)
kbl10(tab_lr_df)
```

## 7) Random-effects study (numeric + visual)

```{r ranef-study}
rs <- brsmm_re_study(fit_rs)
print(rs)
kbl10(rs$summary)
kbl10(rs$D)
kbl10(rs$Corr)
```

```{r ranef-plots}
if (requireNamespace("ggplot2", quietly = TRUE)) {
  # print() is needed for plots to render inside an `if` block in knitr
  print(autoplot.brsmm(fit_rs, type = "ranef_caterpillar"))
  print(autoplot.brsmm(fit_rs, type = "ranef_density"))
  print(autoplot.brsmm(fit_rs, type = "ranef_pairs"))
  print(autoplot.brsmm(fit_rs, type = "ranef_qq"))
}
```

## 8) Practical decision checklist

* Start with `brs` candidates (`link`, `repar`) and rank them using `brs_table()`.
* Add bootstrap analysis and average marginal effects (AME) before escalating complexity.
* Use `brs_cv()` for out-of-sample stability checks.
* Escalate to `brsmm` only when LR/AIC/BIC metrics and diagnostics clearly support it.
* For random slopes, always inspect `brsmm_re_study()` and the associated random-effects plots.

## References

Ferrari, S. L. P. and Cribari-Neto, F. (2004). Beta regression for modelling rates and proportions. *Journal of Applied Statistics*, 31(7), 799-815. DOI: 10.1080/0266476042000214501.

Smithson, M. and Verkuilen, J. (2006). A better lemon squeezer? Maximum-likelihood regression with beta-distributed dependent variables. *Psychological Methods*, 11(1), 54-71. DOI: 10.1037/1082-989X.11.1.54.

Rue, H., Martino, S. and Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. *Journal of the Royal Statistical Society: Series B*, 71(2), 319-392. DOI: 10.1111/j.1467-9868.2008.00700.x.
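## Session information

Since the vignette emphasizes reproducibility, it is useful to close by recording the computational environment; `sessionInfo()` is base R, so this adds no dependencies:

```{r session-info}
sessionInfo()
```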