Single cell RNA sequencing (scRNA-seq) analysis can be a challenging endeavor whose predominant approach involves clustering cells for the purposes of annotation and characterization. Signatures represent a gene set encompassing a biological process or cell type that can be used to analyze scRNA-seq data directly without clustering. Seqtometry makes use of signature scores to progressively partition data according to biaxial plots, revealing populations in a manner akin to flow cytometry. This vignette contains a simple example of a Seqtometry workflow involving peripheral blood mononuclear cell (PBMC) data. Herein, 4 immune cell populations will be identified using corresponding signatures. For more advanced workflows, please see the STAR Protocols publication.
To install this package, start R and enter:
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("Seqtometry")
Load a small PBMC dataset and associated signatures.
# Cleanly add functions from various packages into the current namespace
box::use(
TENxPBMCData[TENxPBMCData],
SingleCellExperiment[counts, rowData],
data.table[fread],
tibble[deframe],
stringr[str_split_1])
# Gene expression data (with gene symbols)
dat <- TENxPBMCData(dataset = "pbmc3k")
gex <- counts(dat) |> as("dgCMatrix")
dimnames(gex) <- list(rowData(dat)[["Symbol_TENx"]], dat[["Barcode"]])
# Signatures (consisting of gene symbols)
sig <- fread("signatures.tsv") |>
deframe() |>
lapply(\(x) stringr::str_split_1(x, ","))
Perform quality control, normalization, and MAGIC imputation (van Dijk et al. 2018).
box::use(
scuttle[perCellQCMetrics, perCellQCFilters, normalizeCounts],
Seqtometry[impute])
qc <- function(mat) {
# Indices of mitochondrial genes
mito_idx <- grep("^MT-", rownames(mat))
# Calculate QC metrics
res <- mat |>
perCellQCMetrics(subsets = list("Mito" = mito_idx)) |>
perCellQCFilters(sub.fields = "subsets_Mito_percent")
# Remove low quality cells
mat[, !res$discard]
}
imputed_gex <- gex |>
qc() |>
normalizeCounts() |>
impute()
Score the data for the T cell and B cell signatures.
box::use(
future[plan],
Seqtometry[score])
# May set up parallel backend if it is supported
# Will default to non-parallel (sequential) execution
plan("sequential")
# Allot 1 GiB of memory for global variables
options(future.globals.maxSize = 1024 ^ 3)
# Scoring with lymphocyte signatures
scores <- sig[c("T cell", "B cell")] |>
score(imputed_gex, signatures = _)
Create a plot from the signature scores.
box::use(ggplot2[...])
# A flow cytometry style theme for plots
theme_custom <- theme_bw() + theme(
aspect.ratio = 1,
panel.grid = element_blank(),
panel.background = element_blank(),
panel.border = element_rect(color = "black", fill = NA, linewidth = 2),
axis.title = element_text(size = 32, face = "italic"),
axis.text = element_text(size = 16))
# Utility function for generating flow cytometry style plots
biaxial_plot <- function(d, x, y) {
lim <- c(-0.15, 1.15)
brk <- seq(0, 1, 0.25)
lbl <- c("0", "0.25", "0.5", "0.75", "1")
ggplot(d, aes(x = .data[[x]], y = .data[[y]])) +
geom_density_2d(color = "black", h = c(0.2, 0.2),
contour_var = "ndensity", binwidth = 0.125) +
scale_x_continuous(limits = lim, breaks = brk, labels = lbl) +
scale_y_continuous(limits = lim, breaks = brk, labels = lbl) +
theme_custom
}
# Utility function to draw a rectangular region
rect_region <- function(lims) {
do.call(annotate, c(lims, geom = "rect",
color = "blue", fill = NA, linewidth = 2))
}
# Limits defining a rectangular region for double negative cells
dn_lims <- list(xmin = -0.075, xmax = 0.5, ymin = 0, ymax = 0.5)
# Draw the biaxial plot with the rectangular region
biaxial_plot(scores, "T cell", "B cell") + rect_region(dn_lims)
Extract (gate) the cells negative for both signatures. Then, rescore the gated cells with the Monocyte and NK cell signatures and visualize the results.
# Subset the imputed data, keeping only the cells in the blue region
gated_imputed_gex <- with(scores,
imputed_gex[, `T cell` < dn_lims$xmax & `B cell` < dn_lims$ymax])
# Score the subsetted data with the remaining signatures
gated_scores <- sig[c("NK cell", "Monocyte")] |>
Seqtometry::score(gated_imputed_gex, signatures = _)
# Plot the new scores
biaxial_plot(gated_scores, "Monocyte", "NK cell")
The above steps demonstrate how plots constructed from signature scores (using the Seqtometry scoring algorithm) can be used to identify major PBMC populations in a hierarchical fashion (in a manner akin to flow cytometry).
Additional characterization of cells can be accomplished by employing signatures corresponding to biological processes of interest. Please see the Seqtometry publication (Cell Systems) for examples.
sessionInfo()
#> R Under development (unstable) (2026-03-05 r89546)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] future_1.70.0 TENxPBMCData_1.29.0
#> [3] HDF5Array_1.39.0 h5mread_1.3.2
#> [5] rhdf5_2.55.16 DelayedArray_0.37.0
#> [7] SparseArray_1.11.11 S4Arrays_1.11.1
#> [9] abind_1.4-8 Matrix_1.7-5
#> [11] SingleCellExperiment_1.33.2 SummarizedExperiment_1.41.1
#> [13] Biobase_2.71.0 GenomicRanges_1.63.1
#> [15] Seqinfo_1.1.0 IRanges_2.45.0
#> [17] S4Vectors_0.49.0 BiocGenerics_0.57.0
#> [19] generics_0.1.4 MatrixGenerics_1.23.0
#> [21] matrixStats_1.5.0 BiocStyle_2.39.0
#>
#> loaded via a namespace (and not attached):
#> [1] DBI_1.3.0 httr2_1.2.2 rlang_1.1.7
#> [4] magrittr_2.0.4 otel_0.2.0 compiler_4.6.0
#> [7] RSQLite_2.4.6 png_0.1-9 vctrs_0.7.1
#> [10] box_1.2.1 stringr_1.6.0 pkgconfig_2.0.3
#> [13] crayon_1.5.3 fastmap_1.2.0 magick_2.9.1
#> [16] backports_1.5.0 dbplyr_2.5.2 XVector_0.51.0
#> [19] scuttle_1.21.0 Seqtometry_0.99.3 rmarkdown_2.31
#> [22] tinytex_0.58 purrr_1.2.1 bit_4.6.0
#> [25] xfun_0.57 cachem_1.1.0 beachmat_2.27.3
#> [28] jsonlite_2.0.0 blob_1.3.0 rhdf5filters_1.23.3
#> [31] Rhdf5lib_1.33.6 BiocParallel_1.45.0 irlba_2.3.7
#> [34] parallel_4.6.0 R6_2.6.1 RColorBrewer_1.1-3
#> [37] bslib_0.10.0 stringi_1.8.7 parallelly_1.46.1
#> [40] jquerylib_0.1.4 Rcpp_1.1.1 bookdown_0.46
#> [43] knitr_1.51 future.apply_1.20.2 tidyselect_1.2.1
#> [46] dichromat_2.0-0.1 yaml_2.3.12 codetools_0.2-20
#> [49] curl_7.0.0 listenv_0.10.1 lattice_0.22-9
#> [52] tibble_3.3.1 S7_0.2.1 withr_3.0.2
#> [55] KEGGREST_1.51.1 evaluate_1.0.5 isoband_0.3.0
#> [58] BiocFileCache_3.1.0 ExperimentHub_3.1.0 Biostrings_2.79.5
#> [61] pillar_1.11.1 BiocManager_1.30.27 filelock_1.0.3
#> [64] checkmate_2.3.4 RcppHNSW_0.6.0 ggplot2_4.0.2
#> [67] BiocVersion_3.23.1 scales_1.4.0 sparseMatrixStats_1.23.0
#> [70] globals_0.19.1 glue_1.8.0 tools_4.6.0
#> [73] AnnotationHub_4.1.0 data.table_1.18.2.1 ScaledMatrix_1.19.0
#> [76] RSpectra_0.16-2 grid_4.6.0 AnnotationDbi_1.73.0
#> [79] BiocSingular_1.27.1 cli_3.6.5 rsvd_1.0.5
#> [82] rappdirs_0.3.4 dplyr_1.2.0 gtable_0.3.6
#> [85] zeallot_0.2.0 sass_0.4.10 digest_0.6.39
#> [88] farver_2.1.2 memoise_2.0.1 htmltools_0.5.9
#> [91] lifecycle_1.0.5 httr_1.4.8 MASS_7.3-65
#> [94] bit64_4.6.0-1