1 Introduction

The STexampleData package contains a collection of spatially resolved transcriptomics (SRT) datasets, formatted as objects of the SpatialExperiment Bioconductor class, for use in examples, demonstrations, and tutorials. The datasets come from several SRT platforms and were obtained from publicly available sources. Some of the datasets include images and/or ground truth annotation labels.

2 Installation

To install the STexampleData package from Bioconductor:

install.packages("BiocManager")
BiocManager::install("STexampleData")

Alternatively, the latest version can be installed from GitHub:

install.packages("remotes")
remotes::install_github("lmweber/STexampleData", build_vignettes = TRUE)

3 Datasets

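The package contains the following datasets, each named by platform and tissue:

Visium_humanDLPFC
Visium_mouseCoronal
seqFISH_mouseEmbryo
ST_mouseOB
SlideSeqV2_mouseHPC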

4 Load data

The following examples show how to load the example datasets as SpatialExperiment objects in an R session.

There are two options for loading the datasets: calling the named accessor functions or querying the ExperimentHub database.

4.1 Load using named accessors

library(SpatialExperiment)
library(STexampleData)

4.1.1 Visium_humanDLPFC

# load object
spe <- Visium_humanDLPFC()

# check object
spe
## class: SpatialExperiment 
## dim: 33538 4992 
## metadata(0):
## assays(1): counts
## rownames(33538): ENSG00000243485 ENSG00000237613 ... ENSG00000277475
##   ENSG00000268674
## rowData names(3): gene_id gene_name feature_type
## colnames(4992): AAACAACGAATAGTTC-1 AAACAAGTATCTCCCA-1 ...
##   TTGTTTGTATTACACG-1 TTGTTTGTGTAAATTC-1
## colData names(7): barcode_id sample_id ... ground_truth cell_count
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
## imgData names(4): sample_id image_id data scaleFactor
dim(spe)
## [1] 33538  4992
assayNames(spe)
## [1] "counts"
rowData(spe)
## DataFrame with 33538 rows and 3 columns
##                         gene_id   gene_name    feature_type
##                     <character> <character>     <character>
## ENSG00000243485 ENSG00000243485 MIR1302-2HG Gene Expression
## ENSG00000237613 ENSG00000237613     FAM138A Gene Expression
## ENSG00000186092 ENSG00000186092       OR4F5 Gene Expression
## ENSG00000238009 ENSG00000238009  AL627309.1 Gene Expression
## ENSG00000239945 ENSG00000239945  AL627309.3 Gene Expression
## ...                         ...         ...             ...
## ENSG00000277856 ENSG00000277856  AC233755.2 Gene Expression
## ENSG00000275063 ENSG00000275063  AC233755.1 Gene Expression
## ENSG00000271254 ENSG00000271254  AC240274.1 Gene Expression
## ENSG00000277475 ENSG00000277475  AC213203.1 Gene Expression
## ENSG00000268674 ENSG00000268674     FAM231C Gene Expression
colData(spe)
## DataFrame with 4992 rows and 7 columns
##                            barcode_id     sample_id in_tissue array_row
##                           <character>   <character> <integer> <integer>
## AAACAACGAATAGTTC-1 AAACAACGAATAGTTC-1 sample_151673         0         0
## AAACAAGTATCTCCCA-1 AAACAAGTATCTCCCA-1 sample_151673         1        50
## AAACAATCTACTAGCA-1 AAACAATCTACTAGCA-1 sample_151673         1         3
## AAACACCAATAACTGC-1 AAACACCAATAACTGC-1 sample_151673         1        59
## AAACAGAGCGACTCCT-1 AAACAGAGCGACTCCT-1 sample_151673         1        14
## ...                               ...           ...       ...       ...
## TTGTTTCACATCCAGG-1 TTGTTTCACATCCAGG-1 sample_151673         1        58
## TTGTTTCATTAGTCTA-1 TTGTTTCATTAGTCTA-1 sample_151673         1        60
## TTGTTTCCATACAACT-1 TTGTTTCCATACAACT-1 sample_151673         1        45
## TTGTTTGTATTACACG-1 TTGTTTGTATTACACG-1 sample_151673         1        73
## TTGTTTGTGTAAATTC-1 TTGTTTGTGTAAATTC-1 sample_151673         1         7
##                    array_col ground_truth cell_count
##                    <integer>  <character>  <integer>
## AAACAACGAATAGTTC-1        16           NA         NA
## AAACAAGTATCTCCCA-1       102       Layer3          6
## AAACAATCTACTAGCA-1        43       Layer1         16
## AAACACCAATAACTGC-1        19           WM          5
## AAACAGAGCGACTCCT-1        94       Layer3          2
## ...                      ...          ...        ...
## TTGTTTCACATCCAGG-1        42           WM          3
## TTGTTTCATTAGTCTA-1        30           WM          4
## TTGTTTCCATACAACT-1        27       Layer6          3
## TTGTTTGTATTACACG-1        41           WM         16
## TTGTTTGTGTAAATTC-1        51       Layer2          5
head(spatialCoords(spe))
##                    pxl_col_in_fullres pxl_row_in_fullres
## AAACAACGAATAGTTC-1               3913               2435
## AAACAAGTATCTCCCA-1               9791               8468
## AAACAATCTACTAGCA-1               5769               2807
## AAACACCAATAACTGC-1               4068               9505
## AAACAGAGCGACTCCT-1               9271               4151
## AAACAGCTTTCAGAAG-1               3393               7583
imgData(spe)
## DataFrame with 2 rows and 4 columns
##       sample_id    image_id   data scaleFactor
##     <character> <character> <list>   <numeric>
## 1 sample_151673      lowres   ####   0.0450045
## 2 sample_151673       hires   ####   0.1500150

4.1.2 Visium_mouseCoronal

# load object
spe <- Visium_mouseCoronal()
# check object
spe
## class: SpatialExperiment 
## dim: 32285 4992 
## metadata(0):
## assays(1): counts
## rownames(32285): ENSMUSG00000051951 ENSMUSG00000089699 ...
##   ENSMUSG00000095019 ENSMUSG00000095041
## rowData names(3): gene_id gene_name feature_type
## colnames(4992): AAACAACGAATAGTTC-1 AAACAAGTATCTCCCA-1 ...
##   TTGTTTGTATTACACG-1 TTGTTTGTGTAAATTC-1
## colData names(5): barcode_id sample_id in_tissue array_row array_col
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
## imgData names(4): sample_id image_id data scaleFactor

4.1.3 seqFISH_mouseEmbryo

# load object
spe <- seqFISH_mouseEmbryo()
# check object
spe
## class: SpatialExperiment 
## dim: 351 11026 
## metadata(0):
## assays(2): counts molecules
## rownames(351): Abcc4 Acp5 ... Zfp57 Zic3
## rowData names(1): gene_name
## colnames(11026): embryo1_Pos0_cell10_z2 embryo1_Pos0_cell100_z2 ...
##   embryo1_Pos28_cell97_z2 embryo1_Pos28_cell98_z2
## colData names(14): cell_id embryo ... segmentation_vertices sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : x y
## imgData names(0):

4.1.4 ST_mouseOB

# load object
spe <- ST_mouseOB()
# check object
spe
## class: SpatialExperiment 
## dim: 15928 262 
## metadata(0):
## assays(1): counts
## rownames(15928): 0610007N19Rik 0610007P14Rik ... Zzef1 Zzz3
## rowData names(1): gene_name
## colnames(262): ACAACTATGGGTTGGCGG ACACAGATCCTGTTCTGA ...
##   TTTCAACCCGAGGAAGTC TTTCTAACTCATAAGGAT
## colData names(3): barcode_id sample_id layer
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : x y
## imgData names(0):

4.1.5 SlideSeqV2_mouseHPC

# load object
spe <- SlideSeqV2_mouseHPC()
# check object
spe
## class: SpatialExperiment 
## dim: 23264 53208 
## metadata(0):
## assays(1): counts
## rownames(23264): 0610005C13Rik 0610007P14Rik ... n-R5s40 n-R5s95
## rowData names(1): gene_name
## colnames(53208): AACGTCATAATCGT TACTTTAGCGCAGT ... GACTTTTCTTAAAG
##   GTCAATAAAGGGCG
## colData names(3): barcode_id sample_id celltype
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : xcoord ycoord
## imgData names(0):
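To compare the datasets side by side, the accessors can also be called in a loop. The following is a rough sketch, assuming that each accessor can be called without arguments as in the examples above. The helper summarize_dataset and the column names of the resulting table are hypothetical. Running it downloads all five datasets, which may take some time on first use.

```r
library(SpatialExperiment)
library(STexampleData)

# accessor names as used in the sections above
accessors <- c(
    "Visium_humanDLPFC", "Visium_mouseCoronal", "seqFISH_mouseEmbryo",
    "ST_mouseOB", "SlideSeqV2_mouseHPC"
)

# hypothetical helper: load one dataset and record its basic dimensions
summarize_dataset <- function(accessor_name) {
    accessor <- getExportedValue("STexampleData", accessor_name)
    spe <- accessor()
    data.frame(
        dataset = accessor_name,
        n_features = nrow(spe),
        n_observations = ncol(spe),
        # isTRUE() guards against an empty or missing imgData
        has_images = isTRUE(nrow(imgData(spe)) > 0)
    )
}

dataset_summary <- do.call(rbind, lapply(accessors, summarize_dataset))
dataset_summary
```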

4.2 Load by querying ExperimentHub database

library(ExperimentHub)
# create ExperimentHub instance
eh <- ExperimentHub()

# query STexampleData datasets
myfiles <- query(eh, "STexampleData")
myfiles
## ExperimentHub with 5 records
## # snapshotDate(): 2022-04-26
## # $dataprovider: NA
## # $species: Mus musculus, Homo sapiens
## # $rdataclass: SpatialExperiment
## # additional mcols(): taxonomyid, genome, description,
## #   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## #   rdatapath, sourceurl, sourcetype 
## # retrieve records with, e.g., 'object[["EH7538"]]' 
## 
##            title              
##   EH7538 | Visium_humanDLPFC  
##   EH7539 | Visium_mouseCoronal
##   EH7540 | seqFISH_mouseEmbryo
##   EH7541 | ST_mouseOB         
##   EH7542 | SlideSeqV2_mouseHPC
# metadata
md <- as.data.frame(mcols(myfiles))
# load Visium_humanDLPFC dataset using ExperimentHub query
spe <- myfiles[[1]]
spe
## class: SpatialExperiment 
## dim: 33538 4992 
## metadata(0):
## assays(1): counts
## rownames(33538): ENSG00000243485 ENSG00000237613 ... ENSG00000277475
##   ENSG00000268674
## rowData names(3): gene_id gene_name feature_type
## colnames(4992): AAACAACGAATAGTTC-1 AAACAAGTATCTCCCA-1 ...
##   TTGTTTGTATTACACG-1 TTGTTTGTGTAAATTC-1
## colData names(7): barcode_id sample_id ... ground_truth cell_count
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
## imgData names(4): sample_id image_id data scaleFactor
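Selecting a record by position depends on the order in which the query returns its results. As a hedged alternative, the sketch below looks up the record ID by its title in the metadata table and loads the dataset by ID. It assumes the title shown in the listing above. If several records share that title in a given ExperimentHub snapshot, this sketch simply takes the first match.

```r
library(SpatialExperiment)
library(ExperimentHub)

# create ExperimentHub instance and query STexampleData datasets
eh <- ExperimentHub()
myfiles <- query(eh, "STexampleData")

# compact view of the record metadata (row names are the record IDs)
md <- as.data.frame(mcols(myfiles))
md[, c("title", "species", "rdataclass")]

# look up the record ID by title, then load the dataset by ID
id <- rownames(md)[md$title == "Visium_humanDLPFC"]
stopifnot(length(id) >= 1)
spe <- myfiles[[id[1]]]
spe
```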

5 Generating objects from raw data files

For reference, we include the scripts used to generate the SpatialExperiment objects from the raw data files.

These scripts are saved in /inst/scripts/ in the source code of the STexampleData package. The scripts include references and links to the data files from the original sources for each dataset.
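In an installed copy of the package, the contents of inst/scripts/ are placed in a scripts/ directory, which can be located with system.file(). The sketch below lists those files and then assembles a small toy SpatialExperiment from simulated values, to illustrate the general shape of such an object. The toy data, the object spe_toy, and the column names are made up for illustration and are not taken from the package scripts.

```r
library(SpatialExperiment)

# locate the installed scripts directory ("" if the package is not installed)
script_dir <- system.file("scripts", package = "STexampleData")
list.files(script_dir)

# toy inputs: simulated counts for 5 genes measured at 4 spots
set.seed(123)
gene_names <- paste0("gene", 1:5)
spot_names <- paste0("spot", 1:4)
counts <- matrix(
    rpois(20, lambda = 5), nrow = 5,
    dimnames = list(gene_names, spot_names)
)

# toy spatial coordinates: one row per spot, columns x and y
coords <- cbind(x = c(1, 2, 1, 2), y = c(1, 1, 2, 2))
rownames(coords) <- spot_names

# assemble the object from counts, row/column metadata, and coordinates
spe_toy <- SpatialExperiment(
    assays = list(counts = counts),
    rowData = DataFrame(gene_name = gene_names, row.names = gene_names),
    colData = DataFrame(barcode_id = spot_names, row.names = spot_names),
    spatialCoords = coords
)
spe_toy
```

The scripts in the package follow their own dataset-specific steps, so they remain the reference for how each object was actually built.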

6 Session information

sessionInfo()
## R version 4.2.0 (2022-04-22)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.15-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.15-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] BumpyMatrix_1.4.0           STexampleData_1.4.5        
##  [3] ExperimentHub_2.4.0         AnnotationHub_3.4.0        
##  [5] BiocFileCache_2.4.0         dbplyr_2.1.1               
##  [7] SpatialExperiment_1.6.0     SingleCellExperiment_1.18.0
##  [9] SummarizedExperiment_1.26.1 Biobase_2.56.0             
## [11] GenomicRanges_1.48.0        GenomeInfoDb_1.32.2        
## [13] IRanges_2.30.0              S4Vectors_0.34.0           
## [15] BiocGenerics_0.42.0         MatrixGenerics_1.8.0       
## [17] matrixStats_0.62.0          BiocStyle_2.24.0           
## 
## loaded via a namespace (and not attached):
##  [1] bitops_1.0-7                  bit64_4.0.5                  
##  [3] filelock_1.0.2                httr_1.4.3                   
##  [5] tools_4.2.0                   bslib_0.3.1                  
##  [7] utf8_1.2.2                    R6_2.5.1                     
##  [9] HDF5Array_1.24.0              DBI_1.1.2                    
## [11] rhdf5filters_1.8.0            withr_2.5.0                  
## [13] tidyselect_1.1.2              bit_4.0.4                    
## [15] curl_4.3.2                    compiler_4.2.0               
## [17] cli_3.3.0                     DelayedArray_0.22.0          
## [19] bookdown_0.26                 sass_0.4.1                   
## [21] rappdirs_0.3.3                stringr_1.4.0                
## [23] digest_0.6.29                 rmarkdown_2.14               
## [25] R.utils_2.11.0                XVector_0.36.0               
## [27] pkgconfig_2.0.3               htmltools_0.5.2              
## [29] sparseMatrixStats_1.8.0       fastmap_1.1.0                
## [31] limma_3.52.1                  rlang_1.0.2                  
## [33] RSQLite_2.2.14                shiny_1.7.1                  
## [35] DelayedMatrixStats_1.18.0     jquerylib_0.1.4              
## [37] generics_0.1.2                jsonlite_1.8.0               
## [39] BiocParallel_1.30.2           dplyr_1.0.9                  
## [41] R.oo_1.24.0                   RCurl_1.98-1.6               
## [43] magrittr_2.0.3                GenomeInfoDbData_1.2.8       
## [45] scuttle_1.6.2                 Matrix_1.4-1                 
## [47] Rcpp_1.0.8.3                  Rhdf5lib_1.18.2              
## [49] fansi_1.0.3                   lifecycle_1.0.1              
## [51] R.methodsS3_1.8.1             stringi_1.7.6                
## [53] yaml_2.3.5                    edgeR_3.38.1                 
## [55] zlibbioc_1.42.0               rhdf5_2.40.0                 
## [57] grid_4.2.0                    blob_1.2.3                   
## [59] promises_1.2.0.1              parallel_4.2.0               
## [61] dqrng_0.3.0                   crayon_1.5.1                 
## [63] lattice_0.20-45               Biostrings_2.64.0            
## [65] beachmat_2.12.0               KEGGREST_1.36.0              
## [67] locfit_1.5-9.5                magick_2.7.3                 
## [69] knitr_1.39                    pillar_1.7.0                 
## [71] rjson_0.2.21                  glue_1.6.2                   
## [73] BiocVersion_3.15.2            evaluate_0.15                
## [75] BiocManager_1.30.17           png_0.1-7                    
## [77] httpuv_1.6.5                  vctrs_0.4.1                  
## [79] purrr_0.3.4                   assertthat_0.2.1             
## [81] cachem_1.0.6                  xfun_0.31                    
## [83] mime_0.12                     DropletUtils_1.16.0          
## [85] xtable_1.8-4                  later_1.3.0                  
## [87] tibble_3.1.7                  AnnotationDbi_1.58.0         
## [89] memoise_2.0.1                 ellipsis_0.3.2               
## [91] interactiveDisplayBase_1.34.0