You can install the latest version of spacexr from Bioconductor:
Robust Cell Type Decomposition (RCTD) is a statistical method for decomposing cell type mixtures in spatial transcriptomics data. In this vignette, we will use a simulated dataset to demonstrate how you can run RCTD on spatial transcriptomics data and visualize your results.
We will use the term pixel, synonymous with spot or bead, to refer to any observation that measures gene expression at a fixed location. We do not use the term cell because, in spatial transcriptomics data, single pixels may contain mixtures of multiple cells. The purpose of RCTD is to estimate the proportions of different cell types in each pixel of a dataset.
Before running RCTD, we must first load in:
The reference data should be stored in a SummarizedExperiment object with:
Required components:
assay: A matrix (or dgCMatrix) of gene
expression counts, with genes as rows and cells as columns. Row names
should be unique gene names, and column names should be unique cell
barcodes. Counts should be untransformed integers.
colData: A factor column containing cell type
annotations for each cell in the reference. The column name is assumed
to be “cell_type”, but another name may be specified using the
cell_type_col parameter in
createRctd.
Optional components:
colData: A numeric column named nUMI,
containing total UMI counts for each cell. If not provided,
nUMI will be calculated as the column sums of the counts
matrix.In practice, your reference could be an annotated dataset from:
Here, we load counts and cell type annotations from the
rctdSim object, but in a real-world setting, the reference
information may be distributed across several files or R objects.
# Load simulated data.
data("rctdSim")
# Create SummarizedExperiment.
reference_se <- SummarizedExperiment(
assays = list(counts = rctdSim$reference_counts),
colData = rctdSim$reference_cell_types
)
# Examine reference. (optional)
print(dim(assay(reference_se))) # Gene expression matrix dimensions
#> [1] 750 75
table(colData(reference_se)$cell_type) # Number of occurrences of each cell type
#>
#> ct1 ct2 ct3
#> 25 25 25In this vignette, we have a reference dataset of 750 genes for 75
cells. We find that there are three cell types (ct1,
ct2, and ct3), each appearing 25 times in the
reference.
Next, we will load the spatial transcriptomics data into a SpatialExperiment object, though we could use any SummarizedExperiment object with:
Required components:
assay: A matrix (or dgCMatrix) of gene
expression counts, with genes as rows and pixels as columns. Row names
should be unique gene names, and column names should be unique pixel
barcodes. Counts should be untransformed integers.Optional components:
spatialCoords: A matrix containing x and y
coordinates for each pixel.
colData: A numeric column named nUMI,
containing total UMI counts for each pixel. If not provided,
nUMI will be calculated as the column sums of the counts
matrix.
Here, we load coordinates and counts from the rctdSim
object. Again, in practice, this step will depend on how you store your
spatial transcriptomics data.
# Create SpatialExperiment.
spatial_spe <- SpatialExperiment(
assay = rctdSim$spatial_rna_counts,
spatialCoords = rctdSim$spatial_rna_coords
)
# Examine pixels. (optional)
print(dim(assay(spatial_spe))) # Gene expression matrix dimensions
#> [1] 750 12
ggplot(spatialCoords(spatial_spe), aes(x, y)) +
geom_point(size = 5) +
theme(axis.text = element_blank(), axis.ticks = element_blank()
)
We find that we have a dataset of 750 genes for 12 pixels, plotted above.
The ground truth for our cell type mixtures is given in the SpatialExperiment
object rctdSim$proportions_spe. (The ground truth has been
stored in the same format as the RCTD output, which will be discussed
later.) Note that this ground truth will not be known in practice—it is
what we will need to estimate with RCTD.
We find that there are two distinct cell type mixtures in our dataset:
ct1 and 10% ct2.ct1 and 40% ct2 and ct3
each.We can visualize these mixtures with the plotAllWeights
function, which plots each pixel as a pie chart of cell types. We will
return to this function later to visualize our RCTD results.
plotAllWeights(
rctdSim$proportions_spe,
r = 0.05, lwd = 0, title = "Ground Truth"
)
#> Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
#> ℹ Please use tidy evaluation idioms with `aes()`.
#> ℹ See also `vignette("ggplot2-in-packages")` for more information.
#> ℹ The deprecated feature was likely used in the spacexr package.
#> Please report the issue at <https://github.com/ggrajeda/spacexr/issues>.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
Running RCTD is a two-step process:
createRctd function,
which:
runRctd.Before running RCTD, we perform some preprocessing with
createRctd. Some important configuration options
include:
gene_cutoff, fc_cutoff: Used for filtering
genes in platform effect normalization. gene_cutoff filters
for average expression and fc_cutoff filters for log fold
change across cell types.gene_cutoff_reg, fc_cutoff_reg: Similar to
above, but for the RCTD step.UMI_min, UMI_max: Minimum and maximum UMI
count per pixel.Now we can run RCTD on the preprocessed data with the
runRctd function.
RCTD has three modes, each suited for different spatial technologies:
"doublet": Assigns 1-2 cell types per pixel. Best for
high-resolution technologies like Slide-seq and MERFISH."multi": Like doublet mode but can assign more cell
types (up to max_multi_types). Best for lower-resolution
technologies like 100-micron Visium."full": No restrictions on number of cell types.The rctd_mode argument is used to specify the mode in
which the RCTD algorithm runs. Other important configuration options
include:
max_cores: Number of cores used for parallel
processing. We recommend setting this to at least 4 to improve
efficiency. If set to 1, parallel processing is not used.max_multi_types: Maximum number of cell types per pixel
(multi mode only).For the purposes of demonstration, we’ll decompose the data into doublets in this vignette.
RCTD returns a SpatialExperiment object containing:
Three assays (one in full mode):
weights: Cell type proportions restricted according to
the specified modeweights_unconfident: Cell type proportions restricted
according to the specified mode, including unconfident predictions (not
available in full mode)weights_full: Unrestricted cell type proportions (not
available in full mode, use weights instead)Assays have cell types as rows and pixels as columns, with values representing the proportion (0 to 1) of each cell type in each pixel. Assay columns sum to 1 (except for rejected pixels, which sum to 0).
NOTE: Weights are transposed compared to the output of the original dmcable/spacexr package on GitHub.
Spatial coordinates for each pixel stored in
spatialCoords.
Additional colData including:
spot_class: Classification as singlet,
doublet_certain, doublet_uncertain, or
rejectfirst_type, second_type: Predicted cell
typesmin_score,
singlet_score for advanced userscell_type_list: List of cell types per pixelconf_list: List of whether cell type predictions are
confidentmin_score for advanced
usersWe can view the doublet weights for our data:
# Cell type mixture 1
assay(results_spe, "weights")[, 1:6]
#> 3 x 6 sparse Matrix of class "dgCMatrix"
#> mx11 mx12 mx13 mx14 mx15 mx16
#> ct1 1 1 1 1 1 1
#> ct2 . . . . . .
#> ct3 . . . . . .
# Cell type mixture 2
assay(results_spe, "weights")[, 7:12]
#> 3 x 6 sparse Matrix of class "dgCMatrix"
#> mx21 mx22 mx23 mx24 mx25 mx26
#> ct1 . . . . . .
#> ct2 0.4935949 0.475943 0.4685428 0.4577326 0.5103718 0.4795314
#> ct3 0.5064051 0.524057 0.5314572 0.5422674 0.4896282 0.5204686You can access the full, unrestricted weights using
assay(results_spe, "weights_full"). We will visualize these
weights later. You may use
assay(results_spe, "weights_unconfident") to access the
less confident predictions (e.g., the second_type weight
for singlets and uncertain doublets).
Below, we can view the results of RCTD deconvolution:
classification_df <- data.frame(
pixel = colnames(assay(results_spe)),
spot_class = colData(results_spe)$spot_class,
first_type = colData(results_spe)$first_type,
second_type = colData(results_spe)$second_type
)
classification_df
#> pixel spot_class first_type second_type
#> 1 mx11 singlet ct1 ct2
#> 2 mx12 singlet ct1 ct2
#> 3 mx13 singlet ct1 ct2
#> 4 mx14 singlet ct1 ct2
#> 5 mx15 singlet ct1 ct2
#> 6 mx16 singlet ct1 ct2
#> 7 mx21 doublet_certain ct2 ct3
#> 8 mx22 doublet_certain ct3 ct2
#> 9 mx23 doublet_certain ct3 ct2
#> 10 mx24 doublet_certain ct3 ct2
#> 11 mx25 doublet_certain ct2 ct3
#> 12 mx26 doublet_certain ct3 ct2In particular, this data frame provides us with the following information:
spot_class: A factor variable with the following levels
of classification:
singlet (1 cell type in the pixel)doublet_certain (2 cell types in the pixel)doublet_uncertain (2 cell types in the pixel, but only
confident of 1)reject (no prediction given for pixel)reject pixels should not be used, and
doublet_uncertain pixels should only be used for
applications that do not require knowledge of all cell types in a
pixel.first_type: the first cell type predicted on the bead
(for all spot_class conditions except
reject).second_type: the second cell type predicted on the bead
for doublet spot_class conditions (not a confident
prediction for doublet_uncertain).We can see that RCTD generally succeeded in identifying the
components of the cell mixtures, although pixels with 90%
ct1 and 10% ct2 were classified as singlets.
(As an exercise, try changing the doublet_threshold
parameter in runRctd so that these mixtures will be
identified as doublets.)
We can visualize the results of our RCTD run with the
plotAllWeights function. Simply pass in the RCTD results
(and some aesthetic arguments), and the cell type proportions will be
plotted as pie charts for each pixel.
Note that all mixtures with 90% ct1 (on the left) were
classified as singlets, so only ct1 is plotted. And for the
other mixtures (on the right), only cell types ct2 and
ct3 are plotted (without any ct1) due to the
restrictions of doublet mode. We can visualize the full, unrestricted
proportions by passing in the name of the relevant assay
("weights_full") to the plotAllWeights
function:
This is very close to the ground truth!
Finally, we can visualize how the proportion of a specific cell type
varies across space with the plotCellTypeWeight function.
We show an example with the dummy cell type ct1.
plotCellTypeWeight(
results_spe, "ct1", "weights_full",
size = 5, stroke = 0.5, title = "\nDensity of Cell Type 1\n"
)
You are now ready to apply RCTD to your own spatial transcriptomics datasets!
sessionInfo()
#> R version 4.5.1 (2025-06-13)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] spacexr_1.3.0 SpatialExperiment_1.19.1
#> [3] SingleCellExperiment_1.31.1 SummarizedExperiment_1.39.2
#> [5] Biobase_2.69.1 GenomicRanges_1.63.0
#> [7] Seqinfo_0.99.4 IRanges_2.45.0
#> [9] S4Vectors_0.49.0 BiocGenerics_0.55.4
#> [11] generics_0.1.4 MatrixGenerics_1.21.0
#> [13] matrixStats_1.5.0 ggplot2_4.0.0
#> [15] BiocStyle_2.37.1
#>
#> loaded via a namespace (and not attached):
#> [1] tidyselect_1.2.1 dplyr_1.1.4 farver_2.1.2
#> [4] blob_1.2.4 filelock_1.0.3 S7_0.2.0
#> [7] fastmap_1.2.0 BiocFileCache_2.99.6 tweenr_2.0.3
#> [10] digest_0.6.37 lifecycle_1.0.4 RSQLite_2.4.3
#> [13] magrittr_2.0.4 compiler_4.5.1 rlang_1.1.6
#> [16] sass_0.4.10 tools_4.5.1 yaml_2.3.10
#> [19] knitr_1.50 S4Arrays_1.11.0 labeling_0.4.3
#> [22] bit_4.6.0 curl_7.0.0 DelayedArray_0.37.0
#> [25] RColorBrewer_1.1-3 abind_1.4-8 withr_3.0.2
#> [28] purrr_1.1.0 sys_3.4.3 grid_4.5.1
#> [31] polyclip_1.10-7 scales_1.4.0 MASS_7.3-65
#> [34] cli_3.6.5 rmarkdown_2.30 crayon_1.5.3
#> [37] httr_1.4.7 rjson_0.2.23 DBI_1.2.3
#> [40] cachem_1.1.0 ggforce_0.5.0 BiocManager_1.30.26
#> [43] XVector_0.49.3 yulab.utils_0.2.1 vctrs_0.6.5
#> [46] Matrix_1.7-4 jsonlite_2.0.0 bit64_4.6.0-1
#> [49] maketools_1.3.2 magick_2.9.0 tidyr_1.3.1
#> [52] jquerylib_0.1.4 glue_1.8.0 gtable_0.3.6
#> [55] quadprog_1.5-8 tibble_3.3.0 pillar_1.11.1
#> [58] rappdirs_0.3.3 htmltools_0.5.8.1 R6_2.6.1
#> [61] dbplyr_2.5.1 httr2_1.2.1 evaluate_1.0.5
#> [64] lattice_0.22-7 memoise_2.0.1 ggfun_0.2.0
#> [67] bslib_0.9.0 Rcpp_1.1.0 SparseArray_1.9.2
#> [70] scatterpie_0.2.6 xfun_0.53 fs_1.6.6
#> [73] buildtools_1.0.0 pkgconfig_2.0.3