--- title: "Model-based continuous summary tables in R" description: > Build model-based summary tables for continuous outcomes in R with table_continuous_lm(), including estimated means, robust standard errors, case weights, and APA-style output formats. output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Model-based continuous summary tables in R} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) build_rich_tables <- identical(Sys.getenv("IN_PKGDOWN"), "true") pkgdown_dark_gt <- function(tab) { tab |> gt::opt_css( css = paste( ".gt_table, .gt_heading, .gt_col_headings, .gt_col_heading,", ".gt_column_spanner_outer, .gt_column_spanner, .gt_title,", ".gt_subtitle, .gt_sourcenotes, .gt_sourcenote {", " background-color: transparent !important;", " color: currentColor !important;", "}", sep = "\n" ) ) } ``` ```{r setup} library(spicy) ``` `table_continuous_lm()` is the model-based companion to `table_continuous()`. It fits one linear model per selected continuous outcome using `lm(outcome ~ by, ...)`, then returns a compact reporting table. This makes it the better choice when you want to stay in a linear-model workflow, add heteroskedasticity-consistent standard errors, or apply case weights. ## Basic usage Use `select` for one or more continuous outcomes and `by` for the single predictor: ```{r basic} table_continuous_lm( sochealth, select = c(wellbeing_score, bmi, life_sat_health), by = sex ) ``` For categorical predictors, the table reports estimated means by level. When the predictor is dichotomous, it can also show a single mean difference and confidence interval. ## Robust standard errors Use `vcov = "HC*"` when you want heteroskedasticity-consistent standard errors and tests: ```{r robust} table_continuous_lm( sochealth, select = c(wellbeing_score, bmi), by = sex, vcov = "HC3", statistic = TRUE ) ``` The `HC*` family is computed via [sandwich::vcovHC()] and includes `"HC0"`, `"HC1"`, `"HC2"`, `"HC3"` (default for small / moderate samples), `"HC4"`, `"HC4m"`, and `"HC5"`. ## Cluster-robust standard errors When observations are not independent (repeated measurements per individual, students nested in classes, panel data), classical and `HC*` standard errors are biased downward. Use the `CR*` family together with `cluster = id_var` to get cluster-robust inference, dispatched to `clubSandwich`: ```{r cluster, eval = requireNamespace("clubSandwich", quietly = TRUE)} table_continuous_lm( sleep, select = extra, by = group, vcov = "CR2", cluster = ID, statistic = TRUE ) ``` `"CR2"` is the recommended default (Bell & McCaffrey 2002; Pustejovsky & Tipton 2018). It produces fractional Satterthwaite degrees of freedom, rendered in the displayed test header as e.g. `t(8.7)` or `F(2, 12.4)`. `"CR1"` matches Stata's `vce(cluster id)` default. ## Bootstrap and jackknife For situations where the residual distribution is non-standard or the sample is small, `vcov = "bootstrap"` and `vcov = "jackknife"` provide resampling-based variance estimators in pure base R (no dependency added): ```{r bootstrap, eval = FALSE} table_continuous_lm( sochealth, select = wellbeing_score, by = sex, vcov = "bootstrap", boot_n = 1000 # default ) ``` When `cluster` is supplied, bootstrap switches to a cluster bootstrap (Cameron, Gelbach & Miller 2008) and jackknife to leave-one-cluster-out (Quenouille 1956). Both estimators use asymptotic inference: `z` for single-coefficient contrasts and `chi^2(q)` for the global Wald test on `k > 2` categorical predictors, rendered in the displayed test header. ## Case weights Use `weights` when you want weighted estimated means or slopes in the same model-based table: ```{r weights} table_continuous_lm( sochealth, select = c(wellbeing_score, bmi), by = education, weights = weight, show_weighted_n = TRUE ) ``` This is often the most natural summary-table function when your reporting workflow already relies on weighted linear models. ## Numeric predictors If `by` is numeric, `table_continuous_lm()` reports the slope rather than group means: ```{r numeric-by} table_continuous_lm( sochealth, select = c(wellbeing_score, bmi), by = age, vcov = "HC3" ) ``` When you need the underlying returned data for further processing, use `output = "data.frame"` for the wide raw table or `output = "long"` for the analytic long table. ## Effect sizes `table_continuous_lm()` supports four effect sizes via the `effect_size` argument. All are derived from the **same fitted model** as the displayed coefficients and `R²`, so the table stays internally consistent — and all are **invariant to `vcov`** (the choice of classical or `HC*` standard errors changes the contrast SE, CI, and test statistic but not the standardized effect size itself). Cohen's *d* and Hedges' *g* are the conventions for two-group comparisons (Cohen 1988; Hedges and Olkin 1985; APA 2020) and require `by` to have exactly two non-empty levels: ```{r es-d} table_continuous_lm( sochealth, select = c(wellbeing_score, bmi), by = smoking, effect_size = "d" ) ``` Hedges' *g* applies the small-sample correction `J = 1 - 3/(4·df_resid - 1)`. It is generally preferred over raw *d* in published reports (Goulet-Pelletier and Cousineau 2018): ```{r es-g} table_continuous_lm( sochealth, select = c(wellbeing_score, bmi), by = smoking, effect_size = "g" ) ``` For categorical predictors with three or more levels (or numeric predictors), use Hays' `omega²` for a less biased estimate of the population variance explained (Hays 1963; Olejnik and Algina 2003; Lakens 2013): ```{r es-omega2} table_continuous_lm( sochealth, select = c(wellbeing_score, bmi), by = education, effect_size = "omega2" ) ``` Cohen's `f²` (= `R² / (1 - R²)`) is the effect size familiar from power-analysis frameworks (e.g. G*Power) and is defined for any predictor type: ```{r es-f2} table_continuous_lm( sochealth, select = c(wellbeing_score, bmi), by = age, effect_size = "f2" ) ``` ## Confidence intervals for effect sizes Setting `effect_size_ci = TRUE` adds a confidence interval for the effect size. The implementation uses the modern noncentral-distribution inversion approach — the consensus standard in commercial statistical software (Stata `esize` / `estat esize`, SAS `PROC TTEST` and `PROC GLM EFFECTSIZE`) and in mainstream R packages (`effectsize`, `MOTE`, `TOSTER`, `effsize`): - Noncentral *t* inversion for `"d"` and `"g"` (Steiger and Fouladi 1997; Goulet-Pelletier and Cousineau 2018; Cousineau and Goulet-Pelletier 2021), which has nominal coverage across sample sizes — unlike the older Hedges-Olkin normal approximation that is biased for small samples. - Noncentral *F* inversion for `"omega2"` and `"f2"` (Steiger 2004; Smithson 2003). In the printed and rendered outputs, the effect size is displayed with the bracketed CI (article-ready): ```{r es-ci} table_continuous_lm( sochealth, select = c(wellbeing_score, bmi), by = smoking, effect_size = "g", effect_size_ci = TRUE ) ``` The CI level follows `ci_level` (default `0.95`). For programmatic access, the wide raw output exposes separate numeric columns, and the long output always includes them: ```{r es-ci-raw} table_continuous_lm( sochealth, select = wellbeing_score, by = smoking, effect_size = "g", effect_size_ci = TRUE, output = "data.frame" ) ``` ```{r es-ci-long} out <- table_continuous_lm( sochealth, select = wellbeing_score, by = smoking, effect_size = "g", effect_size_ci = TRUE, output = "long" ) out[, c("variable", "es_type", "es_value", "es_ci_lower", "es_ci_upper")] ``` When `weights` is supplied, all four effect sizes (and their CIs) are computed from the weighted least-squares fit, keeping them consistent with the weighted contrast and its CI (DuMouchel and Duncan 1983). This is the right convention for case-weighted reporting; for propensity-score balance assessment (Austin and Stuart 2015) or complex-survey designs, dedicated packages (`cobalt::bal.tab()` and `survey`) are more appropriate. ## Article-style polish Pretty outcome labels and a comma decimal separator (useful for European reporting): ```{r polish} table_continuous_lm( sochealth, select = c(wellbeing_score, bmi), by = sex, labels = c( wellbeing_score = "WHO-5 wellbeing (0-100)", bmi = "Body-mass index (kg/m²)" ), effect_size = "g", effect_size_ci = TRUE, decimal_mark = "," ) ``` ## Publication-ready output The function supports the same output formats as the other summary-table helpers, including `tinytable`, `gt`, `flextable`, `excel`, `word`, and `clipboard`. ```{r gt-output, eval = build_rich_tables} pkgdown_dark_gt( table_continuous_lm( sochealth, select = c(wellbeing_score, bmi, life_sat_health), by = sex, vcov = "HC3", statistic = TRUE, output = "gt" ) ) ``` ## Tidying for downstream pipelines `table_continuous_lm()` returns an object that can be coerced to a plain `data.frame` / `tbl_df` (stripping the spicy formatting attributes) or piped into `broom::tidy()` / `broom::glance()` for use with `gtsummary`, `modelsummary`, `parameters`, or any other tidyverse-stats workflow: ```{r tidy-glance} out <- table_continuous_lm( sochealth, select = c(wellbeing_score, bmi), by = sex, effect_size = "g", effect_size_ci = TRUE ) # One row per estimated parameter: emmean per level, contrast for # binary predictors, slope for numeric predictors. broom::tidy(out) # One row per outcome with model-level statistics: r.squared, # adj.r.squared, F / t, df, p.value, nobs, weighted_n, plus the # effect-size summary. broom::glance(out) ``` ## See also - See `vignette("table-continuous", package = "spicy")` for descriptive continuous summary tables with classical group-comparison tests. - See `vignette("summary-tables-reporting", package = "spicy")` for a cross-function reporting workflow using the summary-table helpers.