---
title: "Categorical summary tables in R"
description: >
  Build categorical summary tables in R with table_categorical(), including
  grouped cross-tabulations, effect sizes, confidence intervals, and export
  to gt, tinytable, flextable, Excel, or Word.
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Categorical summary tables in R}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

build_rich_tables <- identical(Sys.getenv("IN_PKGDOWN"), "true")

pkgdown_dark_gt <- function(tab) {
  tab |>
    gt::opt_css(
      css = paste(
        ".gt_table, .gt_heading, .gt_col_headings, .gt_col_heading,",
        ".gt_column_spanner_outer, .gt_column_spanner, .gt_title,",
        ".gt_subtitle, .gt_sourcenotes, .gt_sourcenote {",
        "  background-color: transparent !important;",
        "  color: currentColor !important;",
        "}",
        sep = "\n"
      )
    )
}
```

```{r setup}
library(spicy)
```

`table_categorical()` builds publication-ready categorical tables suitable for
APA-style reporting in social science and data science research. With `by`, it
produces grouped cross-tabulation tables with chi-squared \(p\)-values, effect
sizes, confidence intervals, and multi-level headers. Without `by`, it produces
one-way frequency-style tables for the selected variables. Tables can be
exported to gt, tinytable, flextable, Excel, or Word.

This vignette walks through the main features.

## Basic usage

For grouped tables, provide a data frame, one or more selected variables, and a
grouping variable:

```{r basic}
table_categorical(
  sochealth,
  select = c(smoking, physical_activity, dentist_12m),
  by = education
)
```

The default output is `"default"`, which prints a styled ASCII table to the
console. Use `output = "data.frame"` to get a plain numeric data frame suitable
for further processing.
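For intuition about the `p` and effect-size columns, the base-R sketch below computes the same kind of statistics for a single contingency table. It is not spicy's implementation: it uses the built-in `HairEyeColor` and `UCBAdmissions` datasets instead of `sochealth`, `cramers_v()` is a hypothetical helper, and it assumes a Pearson chi-squared test without Yates' continuity correction.

```r
# Base-R sketch, not spicy's code. Assumes a Pearson chi-squared test
# without continuity correction; cramers_v() is a hypothetical helper.
cramers_v <- function(x) {
  chi2 <- chisq.test(x, correct = FALSE)$statistic
  k <- min(dim(x)) - 1  # min(rows, columns) - 1
  unname(sqrt(chi2 / (sum(x) * k)))
}

# Hair colour x eye colour counts (4 x 4 table, n = 592)
hair_eye <- margin.table(HairEyeColor, c(1, 2))
chi <- chisq.test(hair_eye, correct = FALSE)
chi$p.value          # omnibus p-value
cramers_v(hair_eye)  # effect size between 0 and 1

# Admission status x gender (2 x 2 table)
admit_gender <- margin.table(UCBAdmissions, c(1, 2))
cramers_v(admit_gender)
```

For a 2x2 table, `min(r, c) - 1` equals 1, so Cramer's V reduces to the absolute value of phi.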
## One-way tables

Omit `by` to build a frequency-style table for the selected variables:

```{r oneway}
table_categorical(
  sochealth,
  select = c(smoking, physical_activity),
  output = "default"
)
```

## Output formats

`table_categorical()` supports several output formats, summarized below:

| Format | Description |
|---|---|
| `"default"` | Styled ASCII table in the console (default) |
| `"data.frame"` | Wide data frame, one row per modality |
| `"long"` | Long data frame, one row per modality x group |
| `"gt"` | Formatted gt table |
| `"tinytable"` | Formatted tinytable |
| `"flextable"` | Formatted flextable |
| `"excel"` | Excel file (requires `excel_path`) |
| `"clipboard"` | Copy to clipboard |
| `"word"` | Word document (requires `word_path`) |

### gt output

The `"gt"` format produces a table with APA-style borders, column spanners, and
aligned numeric columns:

```{r gt, eval = build_rich_tables}
pkgdown_dark_gt(
  table_categorical(
    sochealth,
    select = c(smoking, physical_activity, dentist_12m),
    by = education,
    output = "gt"
  )
)
```

### tinytable output

The `"tinytable"` format builds the same kind of table with the tinytable
package:

```{r tinytable, eval = build_rich_tables}
table_categorical(
  sochealth,
  select = c(smoking, physical_activity),
  by = sex,
  output = "tinytable"
)
```

### Data frame output

Use `output = "data.frame"` for a wide numeric data frame (one row per
modality), or `output = "long"` for a long format (one row per modality x
group):

```{r data-frame}
table_categorical(
  sochealth,
  select = smoking,
  by = education,
  output = "data.frame"
)
```

## Custom labels

By default, `table_categorical()` uses variable names as row headers. Use the
`labels` argument to provide human-readable labels. Two forms are accepted
(matching `table_continuous()` and `table_continuous_lm()`):

- A **named character vector** keyed by column name in `data` (the recommended
  form). Only listed columns are relabelled; others fall back to the column
  name.
- A **positional character vector** of the same length as `select` (the legacy
  form from spicy < 0.11.0, kept for backward compatibility).

```{r labels, eval = build_rich_tables}
pkgdown_dark_gt(
  table_categorical(
    sochealth,
    select = c(smoking, physical_activity),
    by = education,
    labels = c(
      smoking = "Smoking status",
      physical_activity = "Regular physical activity"
    ),
    output = "gt"
  )
)
```

## Association measures and confidence intervals

`table_categorical()` picks the association measure for each row variable
based on the variable types (`assoc_measure = "auto"`, the default):

* **2x2** (binary row variable vs. binary `by`) -> `phi`,
* both ordered factors -> Kendall's `tau_b`,
* otherwise -> Cramer's `V`.

When the chosen measures differ across rows, the column header collapses to
`"Effect size"` and an APA-style `Note.` line documents which measure was used
for each variable. Override the default with a single string to apply one
measure to every row, or with a named vector to mix measures per row:

```{r assoc-measure, eval = build_rich_tables}
# Uniform: same measure for every row variable
table_categorical(
  sochealth,
  select = smoking,
  by = education,
  assoc_measure = "lambda",
  output = "tinytable"
)
```

```{r assoc-measure-named, eval = build_rich_tables}
# Per-row: pick the right measure for each variable.
# `smoking` x `education` is 2x3 (binary x ordered) -> Cramer's V;
# `self_rated_health` x `education` is ordered x ordered -> Tau-b.
# The mixed result collapses the header to "Effect size" and adds an
# APA `Note.` line documenting the per-row measure.
table_categorical(
  sochealth,
  select = c(smoking, self_rated_health),
  by = education,
  assoc_measure = c(
    smoking = "cramer_v",
    self_rated_health = "tau_b"
  ),
  output = "tinytable"
)
```

Add confidence intervals with `assoc_ci = TRUE`.
In rendered formats (gt, tinytable, flextable), the CI is shown inline:

```{r ci-rendered, eval = build_rich_tables}
pkgdown_dark_gt(
  table_categorical(
    sochealth,
    select = c(smoking, physical_activity),
    by = education,
    assoc_ci = TRUE,
    output = "gt"
  )
)
```

In data formats (`"data.frame"`, `"long"`, `"excel"`, `"clipboard"`), separate
`CI lower` and `CI upper` columns are added:

```{r ci-data}
table_categorical(
  sochealth,
  select = smoking,
  by = education,
  assoc_ci = TRUE,
  output = "data.frame"
)
```

## Weighted tables

Pass survey weights with the `weights` argument. Use `rescale = TRUE` so the
total weighted N matches the unweighted N:

```{r weighted, eval = build_rich_tables}
pkgdown_dark_gt(
  table_categorical(
    sochealth,
    select = c(smoking, physical_activity),
    by = education,
    weights = "weight",
    rescale = TRUE,
    output = "gt"
  )
)
```

## Handling missing values

By default, rows with missing values are dropped (`drop_na = TRUE`). Set
`drop_na = FALSE` to display them as a "(Missing)" category:

```{r missing, eval = build_rich_tables}
pkgdown_dark_gt(
  table_categorical(
    sochealth,
    select = income_group,
    by = education,
    drop_na = FALSE,
    output = "gt"
  )
)
```

## Filtering and reordering levels

Use `levels_keep` to display only specific modalities.
The order you specify controls the display order, which is useful for placing
"(Missing)" first to highlight missingness:

```{r levels-keep, eval = build_rich_tables}
pkgdown_dark_gt(
  table_categorical(
    sochealth,
    select = income_group,
    by = education,
    drop_na = FALSE,
    levels_keep = c("(Missing)", "Low", "High"),
    output = "gt"
  )
)
```

## Formatting options

Control the number of digits for percentages (`percent_digits`), p-values
(`p_digits`), and the association measure (`v_digits`):

```{r formatting, eval = build_rich_tables}
pkgdown_dark_gt(
  table_categorical(
    sochealth,
    select = smoking,
    by = education,
    percent_digits = 2,
    p_digits = 4,
    v_digits = 3,
    output = "gt"
  )
)
```

`p_digits` drives both the displayed precision of the `p` column and the
small-*p* threshold (`p_digits = 3` -> `<.001`, `p_digits = 4` -> `<.0001`),
matching `table_continuous()` and `table_continuous_lm()`.

## Decimal alignment

By default (`align = "decimal"`), numeric columns are aligned on the decimal
mark, the standard scientific-publication convention also used by SPSS, SAS,
and LaTeX's `siunitx`. gt and tinytable outputs rely on those packages' native
primitives, `gt::cols_align_decimal()` and `tinytable::style_tt(align = "d")`.
Engines without a native primitive (`flextable`, `word`, `clipboard`, and the
ASCII print) get the alignment via leading and trailing space padding;
`flextable` and `word` also switch the body font to the monospaced `Consolas`
so that character widths match. Pass `align = "auto"` to revert to the uniform
right-alignment used in spicy < 0.11.0:

```{r align}
table_categorical(
  sochealth,
  select = c(smoking, physical_activity),
  by = sex,
  align = "auto"
)
```

`align = "center"` and `align = "right"` apply that alignment literally,
without decimal padding.
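The space-padding strategy described above for engines without a native decimal-alignment primitive can be sketched in base R. This is an illustration under assumptions, not spicy's code: `pad_decimal()` is a hypothetical helper that assumes `.` as the decimal mark and a monospaced font, so that equal string widths line up on screen.

```r
# Hypothetical helper, not spicy's code: align numeric strings on the
# decimal mark by padding with spaces (assumes a monospaced font).
pad_decimal <- function(x, mark = ".") {
  parts <- strsplit(x, mark, fixed = TRUE)
  int <- vapply(parts, `[`, character(1), 1)
  frac <- vapply(parts, function(p) if (length(p) > 1) p[2] else "",
                 character(1))
  has_mark <- grepl(mark, x, fixed = TRUE)
  # Leading spaces right-justify the integer parts
  int <- formatC(int, width = max(nchar(int)))
  # Trailing spaces left-justify the mark plus fractional parts
  frac <- ifelse(has_mark, paste0(mark, frac), "")
  frac <- formatC(frac, width = max(nchar(frac)), flag = "-")
  paste0(int, frac)
}

cat(pad_decimal(c("12.5", "3.25", "100", "<.001")), sep = "\n")
```

Because every padded string has the same width and the integer parts are right-justified, the decimal marks fall in the same character column, including in a `<.001` entry.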
## Tidying for downstream pipelines

`table_categorical()` returns an object that can be coerced to a plain
`data.frame` / `tbl_df` (stripping the spicy formatting attributes) or piped
into `broom::tidy()` / `broom::glance()` for use with `gtsummary`,
`modelsummary`, `parameters`, or any other tidyverse-stats workflow:

```{r tidy-glance}
out <- table_categorical(
  sochealth,
  select = c(smoking, physical_activity),
  by = sex
)

# One row per (variable x level x group) with broom-style columns
# (outcome, level, group, n, proportion). The synthetic Total
# margin is excluded so each observation is counted once.
broom::tidy(out)

# One row per outcome with the omnibus chi-squared test and the
# chosen association measure (test_type, statistic, df, p.value,
# assoc_type, assoc_value, assoc_ci_lower / assoc_ci_upper, n_total).
broom::glance(out)
```

## Exporting to Excel, Word, or clipboard

For Excel export, use `output = "excel"` and supply the file path with
`excel_path`:

```r
table_categorical(
  sochealth,
  select = c(smoking, physical_activity, dentist_12m),
  by = education,
  output = "excel",
  excel_path = "my_table.xlsx"
)
```

For Word, use `output = "word"` together with `word_path`:

```r
table_categorical(
  sochealth,
  select = c(smoking, physical_activity, dentist_12m),
  by = education,
  output = "word",
  word_path = "my_table.docx"
)
```

You can also copy the table directly to the clipboard for pasting into a
spreadsheet or a text editor:

```r
table_categorical(
  sochealth,
  select = c(smoking, physical_activity),
  by = education,
  output = "clipboard"
)
```