1 Introduction

DESpace is a framework for identifying spatially variable genes (SVGs), a common task in spatial transcriptomics analyses, and differential spatial variable pattern (DSP) genes, which identify differences in spatial gene expression patterns across experimental conditions.

By leveraging pre-annotated spatial clusters as summarized spatial information, DESpace models gene expression with a negative binomial (NB), via edgeR (Robinson, McCarthy, and Smyth 2010), with spatial clusters as covariates. SV genes are then identified by testing the significance of spatial clusters. For detailed guidance on detecting SVGs with DESpace, refer to SVGs vignettes.

For multi-sample, multi-condition datasets, again we fit a NB model via edgeR (Robinson, McCarthy, and Smyth 2010), but this time we use spatial clusters, conditions and their interactions as covariates. DSP genes are then identified by testing the interaction between spatial clusters and conditions. Notably, this framework can identify differences also between more than 2 groups. This vignette will demonstrate how to perform DSP analyses.

2 Load packages

suppressMessages({
    library(DESpace)
    library(ggplot2)
    library(SpatialExperiment)
    library(ExperimentHub)
    library(reshape2)
    library(tidyverse)
    library(patchwork)
    library(splines)
    library(edgeR)
    library(muscat)
})
set.seed(123)

3 Data

As an example dataset, we consider a multi-sample, multi-group spatially resolved transcriptomics data - the Stereo-seq dataset of taxolotl telencephalon brain regeneration stages (Wei et al. 2022). The dataset includes axolotl brain tissues collected at various days post-injury (DPI): 2 (3 sections), 5 (3 sections), 10 (3 sections), 15 (4 sections), 20 (3 sections), 30 (1 section) and 60 (1 section), after the removal of a reproducible portion of dorsal pallium in left telencephalic hemisphere of axolotl. The original dataset is available for download via [ARTISTA] (https://db.cngb.org/stomics/artista/download/), and the processed dataset (including spatial clusters) can be accessed via muSpaData ExperimentHub package.

3.1 Input data

Here, we use a subset of the original data, consisting of three distinct regeneration stages: 2, 10 and 20 DPI, with two sections for each stage.

# Load the small example data
eh <- ExperimentHub()
spe <- eh[["EH9613"]]; rm(eh)
# The following columns from colData(spe) are specified:
coordinates <- c("sdimx", "sdimy") # coordinates of cells
spatial_cluster <- 'Banksy_smooth' # Banksy spatial clusters
condition_col <- 'condition'       # regeneration time phases
sample_col <- 'sample_id'          # tissue section id
colData(spe) |> head()

## DataFrame with 6 rows and 5 columns
##                    sample_id condition Banksy_smooth     sdimx     sdimy
##                     <factor>  <factor>      <factor> <numeric> <numeric>
## CELL.17879.10DPI_1   10DPI_1     10DPI             4         1      2406
## CELL.17922.10DPI_1   10DPI_1     10DPI             4        10      2372
## CELL.17966.10DPI_1   10DPI_1     10DPI             4        29      3090
## CELL.17976.10DPI_1   10DPI_1     10DPI             4        33      3139
## CELL.17987.10DPI_1   10DPI_1     10DPI             4        33      2267
## CELL.17988.10DPI_1   10DPI_1     10DPI             4        37      2791

The spatial tissues of each sample were annotated via Banksy (Singhal et al. 2024), classifying cells into five clusters. These cluster annotations are stored in the Banksy_smooth column of colData. Additionally, the columns sdimx and sdimy contain the spatial coordinates of the cells, while the condition column specifies the group (i.e., stage) each cell belongs to.

3.2 Quality control/filtering

Quality control (QC) procedures at the cell and gene level aim to remove both low-quality cells, and lowly abundant genes. For QC, we adhere to the instructions from “Orchestrating Spatially Resolved Transcriptomics Analysis with Bioconductor” (OSTA). Library size and UMI counts are used to identify low-quality cells. Then, we discard lowly abundant genes that are detected in fewer than 20 cells. R scripts for performing quality control on this example dataset can be found in muSpaData R scripts.

3.3 Clustering

This framework relies on spatial clusters being accessible and successfully summarizing the primary spatial characteristics of the data. In most datasets, these spatial features are either accessible or can be generated with spatial clustering algorithms.

3.3.1 Manual annotation

If the manual annotation (e.g., annotated by a pathologist) for each sample is available, we can combine all samples and use manual annotations directly. Note that cluster labels must be consistent across samples (i.e., cluster 1 in sample 1 should represent the same tissue as cluster 1 in sample 2).

3.3.2 Spatially resolved (multi-sample) clustering

If manual annotations are not available, we can use spatially resolved clustering tools. These methods, by jointly employing spatial coordinates and gene expression data, enable obtaining spatial clusters.

Among others, BayesSpace (Zhao et al. 2021) and Banksy (Singhal et al. 2024) allow jointly clustering multiple samples. In particular each tool has a specific vignettes for multi-sample clustering: BayesSpace vignettes, and Banksy vignettes.

Details on applying Banksy joint clustering to this example dataset can also be found in muSpaData R scripts.

# View Banksy clusters 
# The spatial cluster assignments are available in the `colData(spe)`
CD <- colData(spe) |> as.data.frame()
ggplot(CD, aes(x = sdimx, y = sdimy, color = factor(Banksy_smooth))) +
    geom_point(size = 0.25) +
    facet_wrap(~sample_id, scales = 'free') +
    theme_void() +
    theme(legend.position = "bottom") +
    guides(color = guide_legend(override.aes = list(size = 3))) +
    labs(color = NULL, title = "Banksy Spatial Clusters")

3.3.2.1 Single sample clustering

In our benchmarks, we have noticed that, with both BayesSpace and Banksy, joint spatial clustering of multiple samples does not always yield more accurate results than spatial clustering of individual samples. Therefore, if multi-sample clustering fails, we suggest clustering individual samples (as described in Section 3 Individual sample in the SVG Vignette) and manually matching cluster ids across samples, to ensure that “cluster 1” always refers to the same spatial region in all samples.

4 DSP testing

Once we have spatial clusters, we can search for DSP between conditions. Importantly, only clusters identified in all samples will be analyzed.

4.1 Gene-level test

Fit the model via dsp_test function. Parameter spe specifies the input SpatialExperiment or SingleCellExperiment object, while cluster_col, sample_col and condition_col define the column names in colData(spe) for spatial clusters, sample ids, and condition ids, respectively. Set verbose to TRUE (default value) to view detailed statistics.

results <- dsp_test(spe = spe,
                    cluster_col = spatial_cluster,
                    sample_col = sample_col,
                    condition_col = condition_col,
                    verbose = TRUE)

## Using 'dsp_test' for spatial variable pattern genes detection.

## Filter low quality clusters:

## Cluster levels to keep: 0, 1, 2, 3, 4

## Design model: row names represent sample names, followed by underscores and cluster names.

##          (Intercept) condition20DPI condition2DPI cluster_id1 cluster_id2
## 2DPI_1_0           1              0             1           0           0
## 2DPI_2_0           1              0             1           0           0
##          cluster_id3 cluster_id4 condition20DPI:cluster_id1
## 2DPI_1_0           0           0                          0
## 2DPI_2_0           0           0                          0
##          condition2DPI:cluster_id1 condition20DPI:cluster_id2
## 2DPI_1_0                         0                          0
## 2DPI_2_0                         0                          0
##          condition2DPI:cluster_id2 condition20DPI:cluster_id3
## 2DPI_1_0                         0                          0
## 2DPI_2_0                         0                          0
##          condition2DPI:cluster_id3 condition20DPI:cluster_id4
## 2DPI_1_0                         0                          0
## 2DPI_2_0                         0                          0
##          condition2DPI:cluster_id4
## 2DPI_1_0                         0
## 2DPI_2_0                         0

A list of results is returned, with the main results of interest stored in the gene_results data frame. This frame contains several columns, including gene names (gene_id), log2-fold changes between groups (e.g, logFC.condition2DPI.cluster_id1), average (across cells) log-2 counts per million (logCPM), likelihood ratio test statistics (LR), raw p-values (PValue) and Benjamini-Hochberg adjusted p-values (FDR).

Specifically, the column logFC.condition2DPI.cluster_id1 represents the difference in the log2-fold change of gene expression under 2 DPI in cluster 1 relative to the baseline condition (10 DPI) and baseline cluster (cluster 0).

In other words, we are testing whether the spatial structure of gene expression (summarized by the clusters) differs between 2 and 10 DPI.

head(results$gene_results, 2)

##                       gene_id logFC.condition20DPI.cluster_id1
## AMEX60DD014721 AMEX60DD014721                      -0.07170723
## AMEX60DD045083 AMEX60DD045083                       0.08759059
##                logFC.condition2DPI.cluster_id1 logFC.condition20DPI.cluster_id2
## AMEX60DD014721                      -0.3898784                       -0.8510223
## AMEX60DD045083                       0.5418637                       -1.3377605
##                logFC.condition2DPI.cluster_id2 logFC.condition20DPI.cluster_id3
## AMEX60DD014721                       0.8526301                       -0.8328237
## AMEX60DD045083                       1.1226997                        0.2478193
##                logFC.condition2DPI.cluster_id3 logFC.condition20DPI.cluster_id4
## AMEX60DD014721                       -0.945083                        0.2228780
## AMEX60DD045083                        1.266634                       -0.9964801
##                logFC.condition2DPI.cluster_id4   logCPM        LR       PValue
## AMEX60DD014721                       0.2085777 9.344907 100.69288 3.081052e-18
## AMEX60DD045083                      -0.7969334 7.505402  97.72697 1.243325e-17
##                         FDR
## AMEX60DD014721 1.540526e-14
## AMEX60DD045083 3.108313e-14

The second element of the results (a DGEList object estimated_y) contains the estimated common dispersion.

The third and fourth element of the results (DGEGLM and DGELRT objects) contain full statistics from edgeR::glmFit and edgeR::glmLRT.

class(results$estimated_y); class(results$glmLrt); class(results$glmFit)

## [1] "DGEList"
## attr(,"package")
## [1] "edgeR"

## [1] "NULL"

## [1] "DGEGLM"
## attr(,"package")
## [1] "edgeR"

Visualize the gene expression of the most significant genes with FeaturePlot(). Note that the gene names in vector feature, should also appear in the count matrix’s row names. Specifying the column names of spatial coordinates of spots is only necessary when they are not named row and col.

sample_ids <- levels(CD$sample_id)

# Identify the top DSP
(feature <- results$gene_results$gene_id[1])

## [1] "AMEX60DD014721"

# Extract the gene_name by matching the gene_id
(feature_name <- rowData(spe)$gene_id[
  rowData(spe)$gene_name %in% feature
])

## [1] "ECM1"

# generate a list of plots
plots <- lapply(sample_ids, function(sample_id) {
  
  # Subset spe for each sample
  spe_j <- spe[, colData(spe)$sample_id == sample_id]
  
  # Create FeaturePlot for the sample
  plot <- FeaturePlot(spe_j, feature,
                      coordinates = coordinates,
                      platform = "Stereo-seq", ncol = 1,
                      diverging = TRUE,
                      point_size = 0.1, legend_exprs = TRUE) + 
    theme(legend.position = "right",
          legend.key.size = unit(0.5, 'cm')) +
    labs(color = "") + ggtitle(sample_id) 
  
  return(plot)
})

The spatial structure of gene expression changes across conditions, transitioning from more localized patterns at earlier stages (2 and 10 DPI) to a broader distribution at a later stage (20 DPI).

combined_plot <- wrap_plots(plots, ncol = 3) + 
    # common legend
    plot_layout(guides = 'collect')  
combined_plot

4.2 Individual cluster test

DESpace can also be used to reveal the specific areas of the tissue affected by spatial variability; i.e., spatial clusters that are particularly over/under abundant compared to the average across conditions. Function individual_dsp() can be used to identify DSP genes for each individual cluster. Parameters cluster_col, sample_col and condition_col indicate the column names in colData(spe) for spatial clusters, sample ids, and condition ids, respectively.

cluster_results <- individual_dsp(spe,
                                  cluster_col = spatial_cluster,
                                  sample_col = sample_col,
                                  condition_col = condition_col)

individual_dsp() returns a list containing the results of the individual cluster tests. Similarly to above, the results for each cluster are presented as a data.fame, where columns contain gene names (gene_id), likelihood ratio test statistics (LR), log2-fold changes (logFC), raw p-values (PValue) and Benjamini-Hochberg adjusted p-values (FDR).

Here, we present the top results for cluster 2. logFC.condition20DPI.cluster_id2 represents the interaction between the 20 DPI condition and cluster 2. It compares the effect of 20 DPI in cluster 2 with its effect in all other clusters (i.e., all tissue regions excluding cluster 2, which serves as the baseline). A positive log-fold change value suggests that, the increase in gene expression in cluster 2 from 10 DPI (the baseline) to 20 DPI is greater than the increase in gene expression in all other clusters from 10 DPI to 20 DPI.

class(cluster_results)

## [1] "list"

names(cluster_results)

## [1] "0" "1" "2" "3" "4"

cluster_results$`2` |> head(n = 4)

##                       gene_id logFC.condition20DPI.cluster_id2
## AMEX60DD014721 AMEX60DD014721                       -0.5801351
## AMEX60DD014991 AMEX60DD014991                        2.6944195
## AMEX60DD055246 AMEX60DD055246                       -0.2660938
## AMEX60DD045083 AMEX60DD045083                       -1.0631436
##                logFC.condition2DPI.cluster_id2   logCPM       LR       PValue
## AMEX60DD014721                        1.270051 9.582333 74.62550 6.241352e-17
## AMEX60DD014991                        3.015258 7.330327 73.65886 1.012001e-16
## AMEX60DD055246                       -2.298605 5.661946 61.49402 4.433456e-14
## AMEX60DD045083                        0.906544 8.266790 59.17136 1.416127e-13
##                         FDR
## AMEX60DD014721 2.530002e-13
## AMEX60DD014991 2.530002e-13
## AMEX60DD055246 7.389093e-11
## AMEX60DD045083 1.770159e-10

Visualize the gene expression of the top gene for cluster 2.

# one of top DSPs for cluster 2
(feature <- rownames(cluster_results[["2"]])[4])

## [1] "AMEX60DD045083"

# Extract the gene_name by matching the gene_id
(feature_name <- rowData(spe)$gene_id[
  rowData(spe)$gene_name == feature
])

## [1] "SFRP2"

One way is to plot the overall abundance of SFRP2 for each cluster-sample combination. Under the null hypothesis, gene expression changes across conditions are consistent across clusters.

The boxplots below show the average log-CPM for cluster 2 and for all other clusters (excluding cluster 2) across different stages. In Cluster 2, the average abundance is highest at 2 DPI, then decreases at 10 DPI and continues to drop at 20 DPI. In contrast, although there is a slight decrease in abundance across other clusters, it remains relatively constant overall.

Code

# calculate log cpm
cps <- cpm(results$estimated_y, log = TRUE)
cps_name <- colnames(cps)
mdata <- data.frame(
    log_cpm = cps[feature, ] ,
    Banksy_smooth = factor(sub(".*_", "", cps_name)),
    day = as.numeric(sub("([0-9]+)DPI.*", "\\1", cps_name)),
    sample_id = sub("(_[0-9]+)$", "", cps_name)
)
plt <- ggplot(mdata, aes(x = factor(day), y = log_cpm)) +
    geom_jitter(aes(color = Banksy_smooth), size = 2, width = 0.1) + 
    geom_boxplot(aes(fill = ifelse(Banksy_smooth == "2", 
                                   "cluster 2", "non-cluster 2")), 
                 position = position_dodge(width = 0.8), alpha = 0.5) +
    scale_x_discrete(breaks = c(2, 10, 20)) +  
    scale_fill_manual(values = c("#4DAF4A", "grey")) + 
    labs(title = feature_name, x = "Days post injury", 
         y = "log-2 counts per million (logCPM)", fill = "",
         color = "Banksy cluster") +
     theme(legend.position = "right")

# figure
plt

Alternatively, gene expression can be visualized in physical space with FeaturePlot(). A cluster outline drawn by specifying the column names of clusters stored in colData(spe) and the vector of cluster names via cluster_col and cluster.

Code

# generate a list of FeaturePlots
plots <- lapply(sample_ids, function(sample_id) {
    # Subset spe for each sample
    spe_j <- spe[, colData(spe)$sample_id == sample_id]
    # Create FeaturePlot for the sample
    plot <- FeaturePlot(spe_j, feature, 
                        cluster_col = spatial_cluster,
                        coordinates = coordinates, cluster = '2',
                        platform = "Stereo-seq",
                        diverging = TRUE,
                        point_size = 0.1,
                        linewidth = 0.6) +
        theme(legend.position = "right",
              legend.key.size = unit(0.5, 'cm')) +
        labs(color = "") + ggtitle(sample_id) 
  
    return(plot)
})
combined_plot <- wrap_plots(plots, ncol = 3) + 
    # common legend
    plot_layout(guides = 'collect')

Again, the spatial structure of gene expression varies across groups; in particular, at 2 and 10 DPI, abundance is higher in cluster 2 (outlined in the plot), compared to the rest of the tissue, while at 20 DPI abundance is more homogeneous.

# figure
combined_plot

4.3 Smooth splines to model time

DESpace offers a flexible framework that allows users to create a custom design matrix. The default design matrix is model.matrix(~ condition * cluster). Below, we provide an example of how to create a design matrix using piecewise-cubic splines to account for the effect of time.

First, we create metadata associated with the samples and clusters. For each cluster level, there are 3 time phases (i.e., day) and 2 replicates (i.e., `rep``) for each time point.

# all combinations of sample and cluster
metadata <- expand.grid(sample_id = levels(spe$sample_id),
                        cluster = levels(spe$Banksy_smooth)
                        ) |>
    # extract time point as 'day' from sample_id 
    mutate(
      day = as.numeric(sub("DPI.*", "", sample_id)),
      rep = as.numeric(sub(".*_", "", sample_id)) 
      )
metadata |> head(n = 3)

##   sample_id cluster day rep
## 1    2DPI_1       0   2   1
## 2    2DPI_2       0   2   2
## 3   10DPI_1       0  10   1

Instead of treating time phases (e.g., 2 DPI, 10 DPI, 20 DPI) as a categorical variable, we can model the time trend using a smooth spline function. This can be achieved with the ns(x, df) function from the splines package. Here, x represents the predictor variable—time phases (day in the metadata) in our case-and df specifies the degrees of freedom, which determine the total number of parameters in the ns() time model, including the intercept.

design_model <- model.matrix(~ cluster * ns(day, df = 2), 
                             data = metadata)
rownames(design_model) <- paste0(metadata$sample_id, "_",
                                 metadata$cluster)
dim(design_model)

## [1] 30 15

design_model |> head(n = 3)

##           (Intercept) cluster1 cluster2 cluster3 cluster4 ns(day, df = 2)1
## 2DPI_1_0            1        0        0        0        0        0.0000000
## 2DPI_2_0            1        0        0        0        0        0.0000000
## 10DPI_1_0           1        0        0        0        0        0.5513298
##           ns(day, df = 2)2 cluster1:ns(day, df = 2)1 cluster2:ns(day, df = 2)1
## 2DPI_1_0         0.0000000                         0                         0
## 2DPI_2_0         0.0000000                         0                         0
## 10DPI_1_0       -0.2274421                         0                         0
##           cluster3:ns(day, df = 2)1 cluster4:ns(day, df = 2)1
## 2DPI_1_0                          0                         0
## 2DPI_2_0                          0                         0
## 10DPI_1_0                         0                         0
##           cluster1:ns(day, df = 2)2 cluster2:ns(day, df = 2)2
## 2DPI_1_0                          0                         0
## 2DPI_2_0                          0                         0
## 10DPI_1_0                         0                         0
##           cluster3:ns(day, df = 2)2 cluster4:ns(day, df = 2)2
## 2DPI_1_0                          0                         0
## 2DPI_2_0                          0                         0
## 10DPI_1_0                         0                         0

Fit the model via dsp_test function.

results <- dsp_test(spe,
                    design = design_model,
                    cluster_col = spatial_cluster,
                    sample_col = sample_col,
                    condition_col = condition_col,
                    verbose = TRUE)

# count significant DSP genes (at 5% FDR significance level)
res_global <- results$gene_results
table(res_global$FDR <= 0.05)

## 
## FALSE  TRUE 
##  4809   191

To identify key spatial clusters where expression changes across conditions, we apply the smooth spline with a single-cluster design. Specifically, we convert the original Banksy clusters into two groups: the target cluster and all other clusters. We then apply the same test as in the global test above.

# example: testing for cluster 2
# convert 5 Banksy clusters into 2 groups: cluster 2 vs. all other clusters
new_cluster <- factor(ifelse(spe$Banksy_smooth %in% '2', '2', 'Other'))
metadata2 <- expand.grid(sample_id = levels(spe$sample_id),
                         cluster = levels(new_cluster)) |>
    # extract time point as 'day' from sample_id 
    mutate(
        day = as.numeric(sub("DPI.*", "", sample_id)),
        rep = as.numeric(sub(".*_", "", sample_id)) 
      )

Create a single-cluster design.

# design model for testing the cluster 2
design_model2 <- model.matrix(~ cluster * ns(day, df = 2),
                              data = metadata2)
rownames(design_model2) <- paste0(metadata2$sample_id, "_",
                                  metadata2$cluster)
design_model2 |> head(n = 3)

##           (Intercept) clusterOther ns(day, df = 2)1 ns(day, df = 2)2
## 2DPI_1_2            1            0        0.0000000        0.0000000
## 2DPI_2_2            1            0        0.0000000        0.0000000
## 10DPI_1_2           1            0        0.5513298       -0.2274421
##           clusterOther:ns(day, df = 2)1 clusterOther:ns(day, df = 2)2
## 2DPI_1_2                              0                             0
## 2DPI_2_2                              0                             0
## 10DPI_1_2                             0                             0

Fit the single-cluster model via dsp_test.

spe$cluster2 <- new_cluster
results2 <- dsp_test(spe,
                    design = design_model2,
                    cluster_col = "cluster2",
                    sample_col = sample_col,
                    condition_col = condition_col,
                    verbose = TRUE)

# count significant DSP genes (at 5% FDR significance level)
res_global2 <- results2$gene_results
table(res_global2$FDR <= 0.05)

## 
## FALSE  TRUE 
##  4960    40

# identify the top DSP for cluster 2
(feature <- results2$gene_results$gene_id[5])

## [1] "AMEX60DD002984"

# extract the gene_name by matching the gene_id
(feature_name <- rowData(spe)$gene_id[
  rowData(spe)$gene_name %in% feature
])

## [1] "NEFM"

4.3.1 Visualization

4.3.1.1 Predicted trend
4.3.1.2 Spatial expression

To explore predicted counts based on estimated coefficients, we calculate and visualize the fitted values for NEFM. The expression of NEFM in cluster 2 first increase and then decrease, while in the remaining regions, the expression slightly increase over time.

Code

fitted_values <- results2[["glmFit"]][["fitted.values"]]
m <- melt(fitted_values[feature,]) |>
    rownames_to_column("row_name_column") |>
    setNames(c("sample_id", "fitted")) |>
    mutate(
        day = as.numeric(sub("DPI.*", "", sample_id)),
        cluster = as.factor(sub(".*_", "", sample_id)) 
      )
m |> head(n = 3)

##   sample_id    fitted day cluster
## 1  2DPI_1_2  151.7507   2       2
## 2  2DPI_2_2  130.8884   2       2
## 3 10DPI_1_2 1024.0551  10       2

plt <- ggplot(m, aes(x=day, y=fitted, group=cluster, colour = cluster)) +
    geom_jitter(size = 3, width = 0.2, height = 0) +
    scale_y_sqrt() + 
    labs(title = feature_name) +
    scale_x_continuous(breaks = c(2, 10, 20)) + 
    xlab("Days post injury")

# figure
plt

Visualize the expression of the top gene, NEFM, across samples. By using annotation_cluster = TRUE, cluster annotations are displayed on the expression plots.

Code

plots <- lapply(sample_ids, function(sample_id) {
    # Subset spe for each sample
    spe_j <- spe[, colData(spe)$sample_id == sample_id]
    # Create FeaturePlot for the sample
    plot <- FeaturePlot(spe_j, feature = feature, 
                        cluster_col = spatial_cluster,
                        coordinates = coordinates, 
                        platform = "Stereo-seq",
                        point_size = 0.001,
                        diverging = TRUE,
                        annotation_cluster = TRUE,
                        annotation_title = sample_id)
  
    return(plot)
})
combined_plot <- wrap_plots(plots, ncol = 2) + 
    # common legend
    plot_layout(guides = 'collect')

The trend aligns with the model’s prediction: gene abundance in cluster 2 peaks at 10 DPI compared to other clusters.

combined_plot

5 Session info

sessionInfo()

## R Under development (unstable) (2025-03-13 r87965)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] splines   stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] muscat_1.21.0               edgeR_4.5.9                
##  [3] limma_3.63.10               patchwork_1.3.0            
##  [5] lubridate_1.9.4             forcats_1.0.0              
##  [7] stringr_1.5.1               dplyr_1.1.4                
##  [9] purrr_1.0.4                 readr_2.1.5                
## [11] tidyr_1.3.1                 tibble_3.2.1               
## [13] tidyverse_2.0.0             reshape2_1.4.4             
## [15] ExperimentHub_2.15.0        AnnotationHub_3.15.0       
## [17] BiocFileCache_2.15.1        dbplyr_2.5.0               
## [19] SpatialExperiment_1.17.0    SingleCellExperiment_1.29.2
## [21] SummarizedExperiment_1.37.0 Biobase_2.67.0             
## [23] GenomicRanges_1.59.1        GenomeInfoDb_1.43.4        
## [25] IRanges_2.41.3              S4Vectors_0.45.4           
## [27] BiocGenerics_0.53.6         generics_0.1.3             
## [29] MatrixGenerics_1.19.1       matrixStats_1.5.0          
## [31] ggplot2_3.5.1               DESpace_1.99.1             
## [33] BiocStyle_2.35.0           
## 
## loaded via a namespace (and not attached):
##   [1] spatstat.sparse_3.1-0    bitops_1.0-9             sf_1.0-19               
##   [4] httr_1.4.7               RColorBrewer_1.1-3       doParallel_1.0.17       
##   [7] numDeriv_2016.8-1.1      tools_4.6.0              sctransform_0.4.1       
##  [10] backports_1.5.0          R6_2.6.1                 mgcv_1.9-1              
##  [13] GetoptLong_1.0.5         withr_3.0.2              prettyunits_1.2.0       
##  [16] gridExtra_2.3            sosta_0.99.4             cli_3.6.4               
##  [19] spatstat.explore_3.3-4   sandwich_3.1-1           labeling_0.4.3          
##  [22] sass_0.4.9               mvtnorm_1.3-3            spatstat.data_3.1-6     
##  [25] proxy_0.4-27             blme_1.0-6               scater_1.35.4           
##  [28] parallelly_1.42.0        RSQLite_2.3.9            shape_1.4.6.1           
##  [31] gtools_3.9.5             spatstat.random_3.3-3    car_3.1-3               
##  [34] Matrix_1.7-3             ggbeeswarm_0.7.2         abind_1.4-8             
##  [37] terra_1.8-29             lifecycle_1.0.4          multcomp_1.4-28         
##  [40] yaml_2.3.10              carData_3.0-5            gplots_3.2.0            
##  [43] SparseArray_1.7.7        grid_4.6.0               blob_1.2.4              
##  [46] crayon_1.5.3             lattice_0.22-6           beachmat_2.23.7         
##  [49] cowplot_1.1.3            KEGGREST_1.47.0          magick_2.8.5            
##  [52] pillar_1.10.1            knitr_1.50               ComplexHeatmap_2.23.0   
##  [55] rjson_0.2.23             boot_1.3-31              estimability_1.5.1      
##  [58] corpcor_1.6.10           future.apply_1.11.3      codetools_0.2-20        
##  [61] glue_1.8.0               spatstat.univar_3.1-2    data.table_1.17.0       
##  [64] vctrs_0.6.5              png_0.1-8                Rdpack_2.6.3            
##  [67] gtable_0.3.6             assertthat_0.2.1         cachem_1.1.0            
##  [70] xfun_0.51                mime_0.13                rbibutils_2.3           
##  [73] S4Arrays_1.7.3           coda_0.19-4.1            reformulas_0.4.0        
##  [76] survival_3.8-3           iterators_1.0.14         tinytex_0.56            
##  [79] units_0.8-7              statmod_1.5.0            TH.data_1.1-3           
##  [82] nlme_3.1-167             smoothr_1.0.1            pbkrtest_0.5.3          
##  [85] bit64_4.6.0-1            filelock_1.0.3           progress_1.2.3          
##  [88] EnvStats_3.0.0           bslib_0.9.0              TMB_1.9.17              
##  [91] irlba_2.3.5.1            vipor_0.4.7              KernSmooth_2.23-26      
##  [94] colorspace_2.1-1         DBI_1.2.3                DESeq2_1.47.5           
##  [97] tidyselect_1.2.1         emmeans_1.10.7           curl_6.2.1              
## [100] bit_4.6.0                compiler_4.6.0           BiocNeighbors_2.1.3     
## [103] DelayedArray_0.33.6      bookdown_0.42            scales_1.3.0            
## [106] caTools_1.18.3           classInt_0.4-11          remaCor_0.0.18          
## [109] rappdirs_0.3.3           digest_0.6.37            goftest_1.2-3           
## [112] fftwtools_0.9-11         spatstat.utils_3.1-3     minqa_1.2.8             
## [115] variancePartition_1.37.2 rmarkdown_2.29           aod_1.3.3               
## [118] XVector_0.47.2           RhpcBLASctl_0.23-42      htmltools_0.5.8.1       
## [121] pkgconfig_2.0.3          lme4_1.1-36              fastmap_1.2.0           
## [124] rlang_1.1.5              GlobalOptions_0.1.2      UCSC.utils_1.3.1        
## [127] farver_2.1.2             jquerylib_0.1.4          zoo_1.8-13              
## [130] jsonlite_1.9.1           BiocParallel_1.41.2      BiocSingular_1.23.0     
## [133] magrittr_2.0.3           Formula_1.2-5            scuttle_1.17.0          
## [136] GenomeInfoDbData_1.2.14  munsell_0.5.1            Rcpp_1.0.14             
## [139] ggnewscale_0.5.1         viridis_0.6.5            stringi_1.8.4           
## [142] MASS_7.3-65              plyr_1.8.9               parallel_4.6.0          
## [145] listenv_0.9.1            ggrepel_0.9.6            deldir_2.0-4            
## [148] Biostrings_2.75.4        tensor_1.5               hms_1.1.3               
## [151] circlize_0.4.16          locfit_1.5-9.12          ggpubr_0.6.0            
## [154] spatstat.geom_3.3-6      ggsignif_0.6.4           ScaledMatrix_1.15.0     
## [157] BiocVersion_3.21.1       evaluate_1.0.3           BiocManager_1.30.25     
## [160] tzdb_0.5.0               nloptr_2.2.1             foreach_1.5.2           
## [163] tweenr_2.0.3             polyclip_1.10-7          future_1.34.0           
## [166] clue_0.3-66              ggforce_0.4.2            rsvd_1.0.5              
## [169] broom_1.0.7              xtable_1.8-4             fANCOVA_0.6-1           
## [172] e1071_1.7-16             rstatix_0.7.2            viridisLite_0.4.2       
## [175] class_7.3-23             lmerTest_3.1-3           glmmTMB_1.1.10          
## [178] AnnotationDbi_1.69.0     memoise_2.0.1            beeswarm_0.4.0          
## [181] cluster_2.1.8.1          timechange_0.3.0         globals_0.16.3

References

Robinson, Mark D, Davis J McCarthy, and Gordon K Smyth. 2010. “EdgeR: A Bioconductor Package for Differential Expression Analysis of Digital Gene Expression Data.” Bioinformatics 26 (1): 139–40. https://doi.org/10.1093/bioinformatics/btp616.

Singhal, Vipul, Nigel Chou, Joseph Lee, Yifei Yue, Jinyue Liu, Wan Kee Chock, Li Lin, et al. 2024. “BANKSY Unifies Cell Typing and Tissue Domain Segmentation for Scalable Spatial Omics Data Analysis.” Nature Genetics 56 (3): 431–41. https://doi.org/10.1038/s41588-024-01664-3.

Wei, Xiaoyu, Sulei Fu, Hanbo Li, Yang Liu, Shuai Wang, Weimin Feng, Yunzhi Yang, et al. 2022. “Single-Cell Stereo-Seq Reveals Induced Progenitor Cells Involved in Axolotl Brain Regeneration.” Science 377 (6610): eabp9444. https://doi.org/10.1126/science.abp9444.

Zhao, Edward, Matthew R Stone, Xing Ren, Jamie Guenthoer, Kimberly S Smythe, Thomas Pulliam, Stephen R Williams, et al. 2021. “Spatial Transcriptomics at Subspot Resolution with BayesSpace.” Nature Biotechnology 39 (11): 1375–84. https://doi.org/10.1038/s41587-021-00935-2.

Differential Spatial Pattern between conditions

03/19/2025

Contents

1 Introduction

2 Load packages

3 Data

3.1 Input data

3.2 Quality control/filtering

3.3 Clustering

3.3.1 Manual annotation

3.3.2 Spatially resolved (multi-sample) clustering

3.3.2.1 Single sample clustering

4 DSP testing

4.1 Gene-level test

4.2 Individual cluster test

4.2.1 Visualization

4.3 Smooth splines to model time

4.3.1 Visualization

5 Session info

References