--- title: "Building SpatialExperiment objects" author: "Dario Righelli" date: "`r format(Sys.Date(), '%b %m, %Y')`" output: BiocStyle::html_document: toc: true vignette: > %\VignetteIndexEntry{Building SpatialExperiment object} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} editor_options: chunk_output_type: console --- ```{r setup, include = FALSE} knitr::opts_chunk$set(cache = TRUE, autodep = TRUE, cache.lazy = FALSE) ``` # Installation The `SpatialExperiment` package is available via Bioconductor. ```{r, eval=FALSE} if(!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("SpatialExperiment") ``` Load the package as follows: ```{r message = FALSE, warning = FALSE} library(SpatialExperiment) ``` # The SpatialExperiment class The `SpatialExperiment` class is designed to represent spatially resolved transcriptomics (ST) data. It inherits from the `SingleCellExperiment` class and is used in the same manner. In addition, the class supports storage of spatial information via `spatialData` and `spatialCoords`, and storage of images via `imgData`. # spatialData and spatialCoords The `SpatialExperiment` class constructor is defined with several arguments to provide maximum flexibility to the user. In particular, we distinguish between `spatialData` and `spatialCoords` as follows: - `spatialData` expects a `DataFrame` with all the data associated with the spatial information (optionally including spatial coordinates from `spatialCoords`). - `spatialCoords` is a numeric `matrix` containing only the defined (via `spatialCoordsNames`) spatial coordinates. When building a `SpatialExperiment` object, the columns of spatial coordinates from the `spatialData` `DFrame` are identified via the `spatialCoordsNames` argument and stored separately as a numeric `matrix` within `spatialCoords`. Following is an example of `spatialCoords` built via `spatialData` and `spatialCoordsNames`. ```{r} cd <- DataFrame(x=1:26, y=1:26, z=letters) mat <- matrix(nrow=26, ncol=26) spe <- SpatialExperiment(assay=mat, spatialData=cd, spatialCoordsNames=c("x", "y")) head(spatialCoords(spe)) spatialCoordsNames(spe) head(spatialData(spe)) spatialDataNames(spe) head(colData(spe)) ``` It is also possible to display the combined spatial information with `spatialData()` using the `spatialCoords=TRUE` argument: ```{r} spatialData(spe, spatialCoords=TRUE) ``` Alternatively, it is possible to define the `spatialDataNames` to define the `spatialData` `DFrame` from the columns of `colData`. ```{r} cd <- DataFrame(x=1:26, y=1:26, z=letters) mat <- matrix(nrow=26, ncol=26) spe <- SpatialExperiment( assay=mat, colData=cd, spatialCoordsNames=c("x", "y"), spatialDataNames="z") head(spatialData(spe)) head(spatialCoords(spe)) head(colData(spe)) ``` Also, it is possible to load a numeric `matrix` of coordinates with the `spatialCoords` argument. ```{r} y <- diag(n <- 10) mat <- matrix(0, n, m <- 2) spe <- SpatialExperiment(assays = y, spatialCoords = mat) ``` Finally, it is possible to set `spatialData`, `spatialCoords`, and `colData` separately. ```{r} mat <- as.matrix(cd[,1:2]) colnames(mat) <- c("ecs","uai") spad <- DataFrame(a=1:26, b=1:26, z=letters) asy <- matrix(nrow=26, ncol=26) spe <- SpatialExperiment(assays = asy, spatialCoords = mat, spatialData=spad, colData=cd) head(spatialData(spe)) head(spatialCoords(spe)) head(colData(spe)) ``` # Working with multiple samples To work with multiple samples, the `SpatialExperiment` class provides the `cbind` method, which assumes unique `sample_id`(s) are provided for each sample. In case the `sample_id`(s) are duplicated across multiple samples, the `cbind` method takes care of this by appending indices to create unique sample identifiers. ```{r} spe1 <- spe2 <- spe spe3 <- cbind(spe1, spe2) unique(spe3$sample_id) ``` Otherwise, it is possible to create unique `sample_id`(s) as follows. ```{r} # make sample identifiers unique spe1 <- spe2 <- spe spe1$sample_id <- paste(spe1$sample_id, "sample1", sep = ".") spe2$sample_id <- paste(spe2$sample_id, "sample2", sep = ".") # combine into single object spe3 <- cbind(spe1, spe2) spe3 ``` # Subsetting a SpatialExperiment object Subsetting objects is automatically defined to synchronize across all attributes of the objects, as for any other Bioconductor *Experiment* class. For example, it is possible to `subset` by `sample_id` as follows: ```{r} spe3[, colData(spe)$sample_id=="sample1.sample1"] ``` # sample_id requires one-to-one mapping replacement In particular, when trying to replace the `sample_id`(s) of a `SpatialExperiment` object, these must map uniquely with the already existing ones, otherwise an error is returned. ```{r, error=TRUE} new <- spe3$sample_id; new[1] <- "sample1.sample2" spe3$sample_id <- new new[1] <- "third.one.of.two" spe3$sample_id <- new ``` # Spot-based ST data (e.g. 10x Genomics Visium) When working with spot-based ST data, such as *10x Genomics Visium* or other platforms providing images, it is possible to store the image information in the dedicated `imgData` structure. Also, the `SpatialExperiment` class stores a `sample_id` value in the `spatialData` structure, which is possible to set with the `sample_id` argument (default is "sample_01"). Here we show how to load the default *Space Ranger* data files from a 10x Genomics Visium experiment, and build a `SpatialExperiment` object. In particular, the `readImgData()` function is used to build an `imgData` `DataFrame` to be passed to the `SpatialExperiment` constructor. The `sample_id` used to build the `imgData` object must be the same one used to build the `SpatialExperiment` object, otherwise an error is returned. ```{r} dir <- system.file( file.path("extdata", "10xVisium", "section1"), package = "SpatialExperiment") # read in counts fnm <- file.path(dir, "raw_feature_bc_matrix") sce <- DropletUtils::read10xCounts(fnm) # read in image data img <- readImgData( path = file.path(dir, "spatial"), sample_id="foo") # read in spatial coordinates fnm <- file.path(dir, "spatial", "tissue_positions_list.csv") xyz <- read.csv(fnm, header = FALSE, col.names = c( "barcode", "in_tissue", "array_row", "array_col", "pxl_row_in_fullres", "pxl_col_in_fullres")) # construct observation & feature metadata rd <- S4Vectors::DataFrame( symbol = rowData(sce)$Symbol) # construct 'SpatialExperiment' (spe <- SpatialExperiment( assays = list(counts = assay(sce)), colData = colData(sce), rowData = rd, imgData = img, spatialData=DataFrame(xyz), spatialCoordsNames=c("pxl_col_in_fullres", "pxl_row_in_fullres"), sample_id="foo")) ``` Alternatively, the `read10xVisium()` function facilitates the import of *10x Genomics Visium* data to handle one or more samples organized in folders reflecting the default *Space Ranger* folder tree organization: sample
|—outs
· · |—raw/filtered_feature_bc_matrix.h5
· · |—raw/filtered_feature_bc_matrix
· · · · |—barcodes.tsv
· · · · |—features.tsv
· · · · |—matrix.mtx
· · |—spatial
· · · · |—scalefactors_json.json
· · · · |—tissue_lowres_image.png
· · · · |—tissue_positions_list.csv
```{r} dir <- system.file( file.path("extdata", "10xVisium"), package = "SpatialExperiment") sample_ids <- c("section1", "section2") samples <- file.path(dir, sample_ids) (spe <- read10xVisium(samples, sample_ids, type = "sparse", data = "raw", images = "lowres", load = FALSE)) ``` # Molecule-based ST data To demonstrate how to accommodate molecule-based ST data (e.g. *seqFISH* platform) inside a `SpatialExperiment` object, we generate some mock data of `r n = 1000` molecule coordinates across `r ng = 50` and `r nc = 20` cells. These should be formatted into a `data.frame` where each row corresponds to a molecule, and columns specify the xy-position as well as which gene/cell the molecule has been assigned to: ```{r message = FALSE, warning = FALSE} # sample xy-coordinates in [0,1] x <- runif(n) y <- runif(n) # assign each molecule to some gene-cell pair gs <- paste0("gene", seq(ng)) cs <- paste0("cell", seq(nc)) gene <- sample(gs, n, TRUE) cell <- sample(cs, n, TRUE) # construct data.frame of molecule coodinates df <- data.frame(gene, cell, x, y) head(df) ``` Next, it is possible to re-shape the above table into a `r BiocStyle::Biocpkg("BumpyMatrix")` using `splitAsBumpyMatrix()`, which takes as input the xy-coordinates, as well as arguments specifying the row and column index of each observation: ```{r message = FALSE, warning = FALSE} # (assure gene & cell are factor so that # missing observations aren't dropped) df$gene <- factor(df$gene, gs) df$cell <- factor(df$cell, cs) # construct BumpyMatrix library(BumpyMatrix) mol <- splitAsBumpyMatrix( df[, c("x", "y")], row = gs, col = cs) ``` Finally, it is possible to construct a `SpatialExperiment` object with two data slots: - The `counts` assay stores the number of molecules per gene and cell (equivalent to transcript counts in spot-based data). - The `molecules` assay holds the spatial molecule positions (xy-coordinates). Here, each entry is a `DFrame` that contains the positions of all molecules from a given gene that have been assigned to a given cell. ```{r message = FALSE, warning = FALSE} # get count matrix y <- with(df, table(gene, cell)) y <- as.matrix(unclass(y)) y[1:5, 1:5] # construct SpatialExperiment spe <- SpatialExperiment( assays = list( counts = y, molecules = mol)) spe ``` The `BumpyMatrix` of molecule locations can be accessed using the dedicated `molecules()` accessor: ```{r message = FALSE, warning = FALSE} molecules(spe) ``` # Session Info ```{r tidy=TRUE} sessionInfo() ```