
This package lets the user load data from single-cell-level spatial transcriptomics technologies, such as Xenium, CosMx, MERSCOPE, or STARmap PLUS, as either a SpatialExperiment (SPE) or a SingleCellExperiment (SCE) object.

The only difference between the two object types is where the spatial coordinates are stored. In the current version of SpatialExperiment, spatialCoords(spe) are stored in a dedicated slot, separate from colData(spe). A SingleCellExperiment, on the other hand, stores the spatial coordinates as ordinary columns of colData(sce).

After reading in the data, consider which downstream analysis tools you plan to use. For example, library(BayesSpace) is a clustering tool developed for spatial transcriptomics data, but it only accepts an SCE object and looks for the spatial coordinates in colData(sce). Other spatial visualization packages, such as library(ggspavis) in its newest version, are compatible with both SPE and SCE objects.

Therefore, to spare the user from converting between object types, we let the user choose which object type to return.
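To illustrate the conversion that return_type saves you, here is a minimal sketch that turns an SPE into an SCE by hand, copying the columns of spatialCoords() into colData. It uses the bundled Xenium example and standard SpatialExperiment/SingleCellExperiment accessors; the names cd, xy, and xe_sce_manual are illustrative, and this is not necessarily how readXeniumSXE() builds its SCE internally.

```r
library(SpatialExperimentIO)
library(SpatialExperiment)  # also attaches SingleCellExperiment and S4Vectors

xepath <- system.file(file.path("extdata", "Xenium_small"),
                      package = "SpatialExperimentIO")
xe_spe <- readXeniumSXE(dirname = xepath)

# Copy each coordinate column from the dedicated slot into colData
cd <- colData(xe_spe)
xy <- spatialCoords(xe_spe)
for (nm in colnames(xy)) cd[[nm]] <- xy[, nm]

# Build a plain SCE with the same counts and gene annotation
xe_sce_manual <- SingleCellExperiment(
  assays  = list(counts = counts(xe_spe)),
  rowData = rowData(xe_spe),
  colData = cd
)
xe_sce_manual
```

The counts and gene annotation are unchanged; only the coordinates move from the dedicated slot into colData, which is where SCE-only tools look for them.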

1 Setup

library(SpatialExperimentIO)
library(SpatialExperiment)
# library(ggplot2)

2 Quick Start

# ### DO NOT RUN. Example code.
# xepath <- "/path/to/folder"
# # a. Xenium as SPE
# xe_spe <- readXeniumSXE(dirname = xepath)
# #    Keep only non-control genes
# xe_spe <- xe_spe[rowData(xe_spe)$Type == "Gene Expression"]
# 
# # b. Xenium as SCE
# xe_sce <- readXeniumSXE(dirname = xepath, return_type = "SCE")
# #    Keep only non-control genes
# xe_sce <- xe_sce[rowData(xe_sce)$Type == "Gene Expression"]
# ### DO NOT RUN. Example code.
# cospath <- "/path/to/folder"
# # a. CosMx as SPE
# cos_spe <- readCosmxSXE(dirname = cospath)
# 
# # b. CosMx as SCE
# cos_sce <- readCosmxSXE(dirname = cospath, return_type = "SCE")
# ### DO NOT RUN. Example code.
# merpath <- "/path/to/folder"
# # a. MERSCOPE as SPE
# mer_spe <- readMerscopeSXE(dirname = merpath)
# 
# # b. MERSCOPE as SCE
# mer_sce <- readMerscopeSXE(dirname = merpath, return_type = "SCE")

That is all you need to get started. For more details, see the section for each technology below.
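If you want to try every reader without your own data, the package ships a small example folder for each technology (the same folders used in the sections that follow). This sketch reads each one as an SPE with default arguments and reports its dimensions; the readers list and spe_list are illustrative names, not part of the package.

```r
library(SpatialExperimentIO)
library(SpatialExperiment)

# Reader function and bundled example folder for each technology
readers <- list(
  Xenium      = list(fun = readXeniumSXE,      dir = "Xenium_small"),
  CosMx       = list(fun = readCosmxSXE,       dir = "CosMx_small"),
  MERSCOPE    = list(fun = readMerscopeSXE,    dir = "MERSCOPE_small"),
  STARmapPLUS = list(fun = readStarmapplusSXE, dir = "STARmapPLUS_small")
)

# Read each example folder as an SPE using the default arguments
spe_list <- lapply(readers, function(r) {
  path <- system.file(file.path("extdata", r$dir),
                      package = "SpatialExperimentIO")
  r$fun(dirname = path)
})

# Genes (first row) by cells (second row) for each object
sapply(spe_list, dim)
```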

3 Xenium

Xenium is an imaging-based in-situ sequencing technology developed by 10x Genomics. Compared to Visium, a sequencing-based technology with full-transcriptome coverage, Xenium detects counts at transcript-level resolution but for fewer genes. Transcripts are assigned to segmented single cells, and SpatialExperimentIO returns a cell-level SPE or SCE object. To read more about the Xenium technology workflow, please refer to the Xenium technology overview. For more publicly available Xenium data, please refer to Xenium data download.

The object constructor assumes that the downloaded, unzipped Xenium Output Bundle contains the mandatory file cells.csv.gz and either a cell_feature_matrix folder or the HDF5 file cell_feature_matrix.h5.

xepath <- system.file(
  file.path("extdata", "Xenium_small"),
  package = "SpatialExperimentIO")

list.files(xepath)
## [1] "cell_feature_matrix.h5" "cells.csv.gz"
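Before calling the reader on your own download, it can help to confirm that the expected files are in place. The helper has_xenium_inputs() is hypothetical (not part of SpatialExperimentIO) and only encodes the file requirements stated above; the reader's own checks may differ.

```r
# Hypothetical helper, not part of SpatialExperimentIO: TRUE if `dirname`
# holds cells.csv.gz plus either cell_feature_matrix.h5 or a
# cell_feature_matrix/ folder.
has_xenium_inputs <- function(dirname) {
  files <- list.files(dirname)
  has_meta   <- "cells.csv.gz" %in% files
  has_h5     <- "cell_feature_matrix.h5" %in% files
  has_folder <- dir.exists(file.path(dirname, "cell_feature_matrix"))
  has_meta && (has_h5 || has_folder)
}

# Demonstrate on a fresh temporary directory
demo <- tempfile("xenium_demo_")
dir.create(demo)
invisible(file.create(file.path(demo, "cells.csv.gz")))
has_xenium_inputs(demo)  # FALSE: no count matrix yet
dir.create(file.path(demo, "cell_feature_matrix"))
has_xenium_inputs(demo)  # TRUE
```

On the bundled example, has_xenium_inputs(xepath) should return TRUE, since the folder contains cells.csv.gz and cell_feature_matrix.h5.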

3.1 Read Xenium as a SpatialExperiment object

The commented-out call below shows the default value of each argument of readXeniumSXE(). To read Xenium data as a SpatialExperiment object, you only need to provide a valid directory name.

# # read the count matrix .h5 file - automatically DropletUtils::read10xCounts(type = "HDF5")
# xe_spe <- readXeniumSXE(dirname = xepath, 
#                         return_type = "SPE",
#                         countmatfpattern = "cell_feature_matrix.h5", 
#                         metadatafpattern = "cells.csv.gz", 
#                         coord_names = c("x_centroid", "y_centroid"))

xe_spe <- readXeniumSXE(dirname = xepath)
xe_spe
## class: SpatialExperiment 
## dim: 4 6 
## metadata(0):
## assays(1): counts
## rownames(4): ENSG00000121270 ENSG00000107796 ENSG00000163017
##   ENSG00000168615
## rowData names(3): ID Symbol Type
## colnames(6): 1 2 ... 5 6
## colData names(9): X cell_id ... nucleus_area sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : x_centroid y_centroid
## imgData names(0):

Additionally, the Xenium gene panel includes four feature types (run table(rowData(xe_spe)$Type) to check). For downstream analysis, subset to features of type "Gene Expression" to exclude control features. This small example contains 4 genes, all of type "Gene Expression", and 6 cells, so the subset below leaves the object unchanged.

# Keep only non-control genes
xe_spe <- xe_spe[rowData(xe_spe)$Type == "Gene Expression"]
xe_spe
## class: SpatialExperiment 
## dim: 4 6 
## metadata(0):
## assays(1): counts
## rownames(4): ENSG00000121270 ENSG00000107796 ENSG00000163017
##   ENSG00000168615
## rowData names(3): ID Symbol Type
## colnames(6): 1 2 ... 5 6
## colData names(9): X cell_id ... nucleus_area sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : x_centroid y_centroid
## imgData names(0):
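To see what the subset removes, tabulate the feature types first. A minimal sketch using the bundled example (xe_all, keep, and xe_genes are illustrative names); here every feature is "Gene Expression", so nothing is dropped, whereas on a full Xenium run the table also lists the control feature types.

```r
library(SpatialExperimentIO)
library(SpatialExperiment)

xepath <- system.file(file.path("extdata", "Xenium_small"),
                      package = "SpatialExperimentIO")
xe_all <- readXeniumSXE(dirname = xepath)

# Number of features per type, before any subsetting
table(rowData(xe_all)$Type)

# Logical index of the features to keep, and how many are dropped
keep <- rowData(xe_all)$Type == "Gene Expression"
sum(!keep)

xe_genes <- xe_all[keep, ]
dim(xe_genes)
```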

If you have a cell_feature_matrix folder instead of cell_feature_matrix.h5, the folder should contain the following files:

# list.files(file.path(xepath, "cell_feature_matrix"))
# "barcodes.tsv.gz" "features.tsv.gz" "matrix.mtx.gz"

The example data only include cell_feature_matrix.h5. To read the folder instead, set countmatfpattern = "cell_feature_matrix". As before, subset to the "Gene Expression" feature type.

# # or read the count matrix folder - automatically DropletUtils::read10xCounts(type = "sparse") 
# xe_spe <- readXeniumSXE(dirname = xepath, 
#                         countmatfpattern = "cell_feature_matrix")
# # Keep only non-control genes
# xe_spe <- xe_spe[rowData(xe_spe)$Type == "Gene Expression"]
# xe_spe
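The comments in the Xenium calls indicate that the reader delegates count-matrix loading to DropletUtils::read10xCounts(), with type = "HDF5" for the .h5 file and type = "sparse" for the folder. If you only need the raw counts, for example to inspect a file before building the full object, you can call read10xCounts() directly. A minimal sketch, assuming DropletUtils is installed; raw is an illustrative name, and the result carries no cell metadata or spatial coordinates.

```r
library(SingleCellExperiment)
library(DropletUtils)

xepath <- system.file(file.path("extdata", "Xenium_small"),
                      package = "SpatialExperimentIO")

# Counts only: no cells.csv.gz metadata and no spatial coordinates
raw <- read10xCounts(file.path(xepath, "cell_feature_matrix.h5"),
                     type = "HDF5")
raw

# For a cell_feature_matrix/ folder, the equivalent call would be:
# read10xCounts(file.path(xepath, "cell_feature_matrix"), type = "sparse")
```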

3.2 Read Xenium as a SingleCellExperiment object

If you would rather store the spatial coordinates in colData and read Xenium data as a SingleCellExperiment, set return_type = "SCE" in readXeniumSXE(). As before, subset to the "Gene Expression" feature type. In this example, we end up with an SCE object with 4 genes and 6 cells.

xe_sce <- readXeniumSXE(dirname = xepath, return_type = "SCE")
xe_sce <- xe_sce[rowData(xe_sce)$Type == "Gene Expression"]
xe_sce
## class: SingleCellExperiment 
## dim: 4 6 
## metadata(0):
## assays(1): counts
## rownames(4): ENSG00000121270 ENSG00000107796 ENSG00000163017
##   ENSG00000168615
## rowData names(3): ID Symbol Type
## colnames(6): 1 2 ... 5 6
## colData names(10): X cell_id ... cell_area nucleus_area
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
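The two return types hold the same coordinates in different places. A minimal sketch that pulls them out of each object and checks that they agree, using the bundled Xenium example; coord_names, xy_spe, and xy_sce are illustrative names.

```r
library(SpatialExperimentIO)
library(SpatialExperiment)

xepath <- system.file(file.path("extdata", "Xenium_small"),
                      package = "SpatialExperimentIO")
xe_spe <- readXeniumSXE(dirname = xepath)
xe_sce <- readXeniumSXE(dirname = xepath, return_type = "SCE")

coord_names <- c("x_centroid", "y_centroid")

# SPE: coordinates live in their own slot
xy_spe <- spatialCoords(xe_spe)

# SCE: the same values are ordinary colData columns
xy_sce <- as.matrix(colData(xe_sce)[, coord_names])

# Compare values only, ignoring row and column names
all.equal(unname(xy_spe[, coord_names]), unname(xy_sce))
```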

This is a mock Xenium dataset with 4 genes and 6 cells. Real Xenium datasets are much larger; the Xenium human breast cancer data, for example, has 313 genes and around 110,000 cells.

For visualization tools for downstream spatial transcriptomics analysis, including helpers for QC, marker gene expression, and clustering results shown on reduced dimensions or in space, see the ggspavis package (BiocManager::install("ggspavis")).

4 CosMx

CosMx is an imaging-based in-situ sequencing technology by NanoString. To read more about the CosMx technology workflow, please refer to the technology overview. For more publicly available datasets, please refer to the CosMx data download website.

The object constructor assumes that the download folder contains two mandatory files whose names include exprMat_file.csv and metadata_file.csv.

cospath <- system.file(
  file.path("extdata", "CosMx_small"),
  package = "SpatialExperimentIO")

list.files(cospath)
## [1] "lung_p9s1_exprMat_file.csv"  "lung_p9s1_metadata_file.csv"
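The reader locates the two files by name pattern, so the sample prefix (here lung_p9s1_) can vary. The exact matching rule readCosmxSXE() uses is not shown in this vignette, so find_input_file() is a hypothetical sketch that assumes a plain substring match and requires exactly one hit. The same idea applies to the MERSCOPE and STARmap PLUS file patterns.

```r
# Hypothetical helper, not part of SpatialExperimentIO: return the path of
# the single file in `dirname` whose name contains `pattern`, else stop.
find_input_file <- function(dirname, pattern) {
  files <- list.files(dirname)
  hits <- files[grepl(pattern, files, fixed = TRUE)]
  if (length(hits) != 1L) {
    stop(sprintf("expected exactly one file matching '%s' in %s, found %d",
                 pattern, dirname, length(hits)))
  }
  file.path(dirname, hits)
}

# Demonstrate on a temporary folder that mimics the CosMx example
demo <- tempfile("cosmx_demo_")
dir.create(demo)
invisible(file.create(file.path(demo, c("lung_p9s1_exprMat_file.csv",
                                        "lung_p9s1_metadata_file.csv"))))
basename(find_input_file(demo, "exprMat_file.csv"))
# [1] "lung_p9s1_exprMat_file.csv"
```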

4.1 Read CosMx as a SpatialExperiment object

The commented-out call below shows the default value of each argument of readCosmxSXE(). To read CosMx data as a SpatialExperiment object, you only need to provide a valid directory name. With this example dataset, we obtain a CosMx SPE object with 8 genes and 9 cells.

# cos_spe <- readCosmxSXE(dirname = cospath,
#                         return_type = "SPE",
#                         countmatfpattern = "exprMat_file.csv",
#                         metadatafpattern = "metadata_file.csv",
#                         coord_names = c("CenterX_global_px",
#                                         "CenterY_global_px"))

cos_spe <- readCosmxSXE(dirname = cospath)
cos_spe
## class: SpatialExperiment 
## dim: 8 9 
## metadata(0):
## assays(1): counts
## rownames(8): AATK ABL1 ... ACKR3 ACKR4
## rowData names(0):
## colnames(9): 1 2 ... 8 9
## colData names(6): fov cell_ID ... CenterY_local_px sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : CenterX_global_px CenterY_global_px
## imgData names(0):

4.2 Read CosMx as a SingleCellExperiment object

Alternatively, you can read CosMx data in as an SCE.

cos_sce <- readCosmxSXE(dirname = cospath, return_type = "SCE")
cos_sce
## class: SingleCellExperiment 
## dim: 8 9 
## metadata(0):
## assays(1): counts
## rownames(8): AATK ABL1 ... ACKR3 ACKR4
## rowData names(0):
## colnames(9): 1 2 ... 8 9
## colData names(7): fov cell_ID ... CenterX_global_px CenterY_global_px
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):

Real CosMx datasets are much larger; the human lung cancer data, for example, have 980 genes and around 100,000 cells.

5 MERSCOPE

MERSCOPE, a product by Vizgen, integrates MERFISH spatial transcriptomics technology with high-resolution spatial imaging, fluidics, and image processing. To learn more about the MERFISH technology behind MERSCOPE, please refer to the MERFISH Technology Overview. For more publicly available MERSCOPE data, please see the MERSCOPE data download page.

The object constructor assumes that the download folder contains two mandatory files whose names include cell_by_gene.csv and cell_metadata.csv.

merpath <- system.file(
  file.path("extdata", "MERSCOPE_small"),
  package = "SpatialExperimentIO")

list.files(merpath)
## [1] "ovarian_p2s2_cell_by_gene.csv"  "ovarian_p2s2_cell_metadata.csv"

5.1 Read MERSCOPE as a SpatialExperiment object

The commented-out call below shows the default value of each argument of readMerscopeSXE(). To read MERSCOPE data as a SpatialExperiment object, you only need to provide a valid directory name. With this example dataset, we obtain a MERSCOPE SPE object with 9 genes and 8 cells.

# mer_spe <- readMerscopeSXE(dirname = merpath, 
#                            return_type = "SPE",
#                            countmatfpattern = "cell_by_gene.csv", 
#                            metadatafpattern = "cell_metadata.csv", 
#                            coord_names = c("center_x", "center_y"))

mer_spe <- readMerscopeSXE(dirname = merpath)
mer_spe
## class: SpatialExperiment 
## dim: 9 8 
## metadata(0):
## assays(1): counts
## rownames(9): PDK4 CCL26 ... ICAM3 TBX21
## rowData names(0):
## colnames(8): 15590 15591 ... 15596 88870
## colData names(3): fov volume sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : center_x center_y
## imgData names(0):

5.2 Read MERSCOPE as a SingleCellExperiment object

Alternatively, you can read MERSCOPE data in as an SCE.

mer_sce <- readMerscopeSXE(dirname = merpath, return_type = "SCE")
mer_sce
## class: SingleCellExperiment 
## dim: 9 8 
## metadata(0):
## assays(1): counts
## rownames(9): PDK4 CCL26 ... ICAM3 TBX21
## rowData names(0):
## colnames(8): 15590 15591 ... 15596 88870
## colData names(4): fov volume center_x center_y
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
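Going the other way is also possible: if you start from the SCE and later need an SPE, the spatialCoordsNames argument of the SpatialExperiment() constructor moves the named colData columns into spatialCoords(). A minimal sketch with the bundled MERSCOPE example; mer_spe2 is an illustrative name, and this is an alternative to re-reading the data with return_type = "SPE".

```r
library(SpatialExperimentIO)
library(SpatialExperiment)

merpath <- system.file(file.path("extdata", "MERSCOPE_small"),
                       package = "SpatialExperimentIO")
mer_sce <- readMerscopeSXE(dirname = merpath, return_type = "SCE")

# Rebuild as an SPE, taking the coordinates from the named colData columns
mer_spe2 <- SpatialExperiment(
  assays  = list(counts = counts(mer_sce)),
  rowData = rowData(mer_sce),
  colData = colData(mer_sce),
  spatialCoordsNames = c("center_x", "center_y")
)
spatialCoordsNames(mer_spe2)
# [1] "center_x" "center_y"
```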

Real MERSCOPE datasets are much larger; the human ovarian cancer data, for example, have 550 genes and around 250,000 cells.

6 STARmap PLUS

STARmap PLUS is an imaging-based in-situ sequencing technology introduced by Zeng et al. The object constructor assumes that the download folder contains two mandatory files whose names include raw_expression_pd.csv and spatial.csv.

starpath <- system.file(
  file.path("extdata", "STARmapPLUS_small"),
  package = "SpatialExperimentIO")

list.files(starpath)
## [1] "mock_spatial.csv"          "mockraw_expression_pd.csv"

6.1 Read STARmap PLUS as a SpatialExperiment object

The commented-out code below shows the default arguments of readStarmapplusSXE() for reference. The example dataset contains 8 genes and 9 cells for illustration.

# star_spe <- readStarmapplusSXE(dirname = starpath,
#                                return_type = "SPE",
#                                countmatfpattern = "raw_expression_pd.csv",
#                                metadatafpattern = "spatial.csv",
#                                coord_names = c("X", "Y", "Z"))

star_spe <- readStarmapplusSXE(dirname = starpath)
star_spe
## class: SpatialExperiment 
## dim: 8 9 
## metadata(0):
## assays(1): counts
## rownames(8): A2M ABCC9 ... ADAMTS15 ADARB2
## rowData names(0):
## colnames(9): well05_0 well05_1 ... well05_7 well05_8
## colData names(7): NAME Main_molecular_cell_type ...
##   Molecular_spatial_cell_type sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(3) : X Y Z
## imgData names(0):
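Unlike the other readers in this vignette, readStarmapplusSXE() returns three spatial coordinates (X, Y, Z), and spatialCoords() gives them as a plain numeric matrix that base R functions can use directly. As a small illustration, this sketch finds each cell's nearest neighbour in 3D with dist(); xyz, d, and nn are illustrative names, and a full pairwise distance matrix is only practical for small datasets like this one.

```r
library(SpatialExperimentIO)
library(SpatialExperiment)

starpath <- system.file(file.path("extdata", "STARmapPLUS_small"),
                        package = "SpatialExperimentIO")
star_spe <- readStarmapplusSXE(dirname = starpath)

# One row per cell, columns X, Y, Z
xyz <- spatialCoords(star_spe)

# Pairwise Euclidean distances in 3D; memory grows with the square of the cell count
d <- as.matrix(dist(xyz))
diag(d) <- Inf  # exclude each cell as its own neighbour

# Index of each cell's nearest neighbour
nn <- apply(d, 1, which.min)
data.frame(cell = colnames(star_spe), nearest = colnames(star_spe)[nn])
```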

6.2 Read STARmap PLUS as a SingleCellExperiment object

Alternatively, you can return a SingleCellExperiment object.

star_sce <- readStarmapplusSXE(dirname = starpath, return_type = "SCE")
star_sce
## class: SingleCellExperiment 
## dim: 8 9 
## metadata(0):
## assays(1): counts
## rownames(8): A2M ABCC9 ... ADAMTS15 ADARB2
## rowData names(0):
## colnames(9): well05_0 well05_1 ... well05_7 well05_8
## colData names(9): NAME X ... Sub_molecular_tissue_region
##   Molecular_spatial_cell_type
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):

STARmap PLUS has a gene panel of around 1,000 genes and can profile up to millions of cells, depending on the size of the tissue. Shi et al. published 20 mouse brain samples with annotated tissue regions; their data are available for download from Zenodo.

7 Session Info

sessionInfo()
## R Under development (unstable) (2024-03-18 r86148)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] SpatialExperiment_1.13.2    SingleCellExperiment_1.25.1
##  [3] SummarizedExperiment_1.33.3 Biobase_2.63.1             
##  [5] GenomicRanges_1.55.4        GenomeInfoDb_1.39.13       
##  [7] IRanges_2.37.1              S4Vectors_0.41.6           
##  [9] BiocGenerics_0.49.1         MatrixGenerics_1.15.0      
## [11] matrixStats_1.2.0           SpatialExperimentIO_0.99.1 
## [13] BiocStyle_2.31.0           
## 
## loaded via a namespace (and not attached):
##  [1] rjson_0.2.21              xfun_0.43                
##  [3] bslib_0.7.0               rhdf5_2.47.6             
##  [5] lattice_0.22-6            rhdf5filters_1.15.4      
##  [7] tools_4.4.0               parallel_4.4.0           
##  [9] R.oo_1.26.0               Matrix_1.7-0             
## [11] sparseMatrixStats_1.15.0  dqrng_0.3.2              
## [13] lifecycle_1.0.4           GenomeInfoDbData_1.2.12  
## [15] compiler_4.4.0            statmod_1.5.0            
## [17] codetools_0.2-20          htmltools_0.5.8.1        
## [19] sass_0.4.9                yaml_2.3.8               
## [21] crayon_1.5.2              jquerylib_0.1.4          
## [23] R.utils_2.12.3            BiocParallel_1.37.1      
## [25] DelayedArray_0.29.9       cachem_1.0.8             
## [27] limma_3.59.6              magick_2.8.3             
## [29] abind_1.4-5               locfit_1.5-9.9           
## [31] digest_0.6.35             bookdown_0.38            
## [33] fastmap_1.1.1             grid_4.4.0               
## [35] cli_3.6.2                 SparseArray_1.3.4        
## [37] magrittr_2.0.3            S4Arrays_1.3.6           
## [39] edgeR_4.1.21              DelayedMatrixStats_1.25.1
## [41] UCSC.utils_0.99.5         rmarkdown_2.26           
## [43] XVector_0.43.1            httr_1.4.7               
## [45] DropletUtils_1.23.1       R.methodsS3_1.8.2        
## [47] beachmat_2.19.4           HDF5Array_1.31.6         
## [49] evaluate_0.23             knitr_1.46               
## [51] rlang_1.1.3               Rcpp_1.0.12              
## [53] scuttle_1.13.1            formatR_1.14             
## [55] BiocManager_1.30.22       jsonlite_1.8.8           
## [57] R6_2.5.1                  Rhdf5lib_1.25.3          
## [59] zlibbioc_1.49.3