--- title: "Introduction to the SpatialFeatureExperiment class" author: "Lambda Moses, Lior Pachter" date: "`r format(Sys.Date(), '%b %d, %Y')`" output: BiocStyle::html_document: toc: true number_sections: true toc_depth: 3 toc_float: collapsed: true vignette: > %\VignetteIndexEntry{Introduction to the SpatialFeatureExperiment class} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` # Installation This package can be installed from Bioconductor: ```{r, eval=FALSE} if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("SpatialFeatureExperiment") ``` # Class structure ## Introduction `SpatialFeatureExperiment` (SFE) is a new [S4](http://adv-r.had.co.nz/S4.html) class built on top of [`SpatialExperiment`](https://bioconductor.org/packages/release/bioc/html/SpatialExperiment.html) (SPE). `SpatialFeatureExperiment` incorporates geometries and geometry operations with the [`sf`](https://cran.r-project.org/web/packages/sf/index.html) package. Examples of supported geometries are Visium spots represented with polygons corresponding to their size, cell or nuclei segmentation polygons, tissue boundary polygons, pathologist annotation of histological regions, and transcript spots of genes. Using `sf`, `SpatialFeatureExperiment` leverages the GEOS C++ libraries underlying `sf` for geometry operations, including algorithms for for determining whether geometries intersect, finding intersection geometries, buffering geometries with margins, etc. A schematic of the SFE object is shown below: ```{r, echo=FALSE, out.width = "100%", fig.cap="Schematics of the SFE object", fig.alt="SpatialFeatureExperiment expands on SpatialExperiment by adding column, row, and annotation geometries and spatial graphs. This is explained in detail in the following paragraphs."} knitr::include_graphics("sfe_schematics.png") ``` Below is a list of SFE features that extend the SPE object: * `colGeometries` are `sf` data frames associated with the entities that correspond to columns of the gene count matrix, such as Visium spots or cells. The geometries in the `sf` data frames can be Visium spot centroids, Visium spot polygons, or for datasets with single cell resolution, cell or nuclei segmentations. Multiple `colGeometries` can be stored in the same SFE object, such as one for cell segmentation and another for nuclei segmentation. There can be non-spatial, attribute columns in a `colGeometry` rather than `colData`, because the `sf` class allows users to specify how attributes relate to geometries, such as "constant", "aggregate", and "identity". See the `agr` argument of the [`st_sf` documentation](https://r-spatial.github.io/sf/reference/sf.html). * `colGraphs` are spatial neighborhood graphs of cells or spots. The graphs have class `listw` (`spdep` package), and the `colPairs` of `SingleCellExperiment` was not used so no conversion is necessary to use the numerous spatial dependency functions from `spdep`, such as those for Moran's I, Geary's C, Getis-Ord Gi*, LOSH, etc. Conversion is also not needed for other classical spatial statistics packages such as `spatialreg` and `adespatial`. * `rowGeometries` are similar to `colGeometries`, but support entities that correspond to rows of the gene count matrix, such as genes. A potential use case is to store transcript spots for each gene in smFISH or in situ sequencing based datasets. * `rowGraphs` are similar to `colGraphs`. A potential use case may be spatial colocalization of transcripts of different genes. * `annotGeometries` are `sf` data frames associated with the dataset but not directly with the gene count matrix, such as tissue boundaries, histological regions, cell or nuclei segmentation in Visium datasets, and etc. These geometries are stored in this object to facilitate plotting and using `sf` for operations such as to find the number of nuclei in each Visium spot and which histological regions each Visium spot intersects. Unlike `colGeometries` and `rowGeometries`, the number of rows in the `sf` data frames in `annotGeometries` is not constrained by the dimension of the gene count matrix and can be arbitrary. * `annotGraphs` are similar to `colGraphs` and `rowGraphs`, but are for entities not directly associated with the gene count matrix, such as spatial neighborhood graphs for nuclei in Visium datasets, or other objects like myofibers. These graphs are relevant to `spdep` analyses of attributes of these geometries such as spatial autocorrelation in morphological metrics of myofibers and nuclei. With geometry operations with `sf`, these attributes and results of analyses of these attributes (e.g. spatial regions defined by the attributes) may be related back to gene expression. * `localResults` are similar to [`reducedDims` in `SingleCellExperiment`](https://bioconductor.org/packages/release/bioc/vignettes/SingleCellExperiment/inst/doc/intro.html#3_Adding_low-dimensional_representations), but stores results from univariate and bivariate local spatial analysis results, such as from [`localmoran`](https://r-spatial.github.io/spdep/reference/localmoran.html), [Getis-Ord Gi\*](https://r-spatial.github.io/spdep/reference/localG.html), and [local spatial heteroscedasticity (LOSH)](https://r-spatial.github.io/spdep/reference/LOSH.html). Unlike in `reducedDims`, for each type of results (type is the type of analysis such as Getis-Ord Gi\*), each feature (e.g. gene) or pair of features for which the analysis is performed has its own results. The local spatial analyses can also be performed for attributes of `colGeometries` and `annotGeometries` in addition to gene expression and `colData`. Results of multivariate spatial analysis such as [MULTISPATI PCA](https://cran.r-project.org/web/packages/adespatial/vignettes/tutorial.html#multispati-analysis) can be stored in `reducedDims`. ```{r setup} library(SpatialFeatureExperiment) library(SpatialExperiment) library(SFEData) library(sf) library(Matrix) library(RBioFormats) ``` ```{r} # Example dataset (sfe <- McKellarMuscleData(dataset = "small")) ``` ## Geometries User interfaces to get or set the geometries and spatial graphs emulate those of `reducedDims` and `row/colPairs` in `SingleCellExperiment`. Column and row geometries also emulate `reducedDims` in internal implementation, while annotation geometries and spatial graphs differ. ### Column and row Column and row geometries can be get or set with the `dimGeometries()` or `dimGeometry()` function. The `MARGIN` argument is as in the `apply()` function: `MARGIN = 1` means row, and `MARGIN = 2` means column. `dimGeometry()` gets or sets one particular geometry by name of index. ```{r} # Get Visium spot polygons (spots <- dimGeometry(sfe, "spotPoly", MARGIN = 2)) ``` ```{r} plot(st_geometry(spots)) ``` ```{r} # Setter dimGeometry(sfe, "foobar", MARGIN = 2) <- spots ``` `dimGeometries()` gets or sets all geometry of the given margin. ```{r} # Getter, all geometries of one margin (cgs <- dimGeometries(sfe, MARGIN = 2)) ``` ```{r} # Setter, all geometries dimGeometries(sfe, MARGIN = 2) <- cgs ``` `dimGeometryNames()` gets or sets the names of the geometries ```{r} (cg_names <- dimGeometryNames(sfe, MARGIN = 2)) ``` ```{r} # Setter dimGeometryNames(sfe, MARGIN = 2) <- cg_names ``` `colGeometry(sfe, "spotPoly")`, `colGeometries(sfe)`, and `colGeometryNames(sfe)` are shorthands for `dimGeometry(sfe, "spotPoly", MARGIN = 2)`, `dimGeometries(sfe, MARGIN = 2)`, and `dimGeometryNames(sfe, MARGIN = 2)` respectively. Similarly, `rowGeometr*(sfe, ...)` is a shorthand of `dimGeometr*(sfe, ..., MARGIN = 1)`. There are shorthands for some specific column or row geometries. For example, `spotPoly(sfe)` is equivalent to `colGeometry(sfe, "spotPoly")` for Visium spot polygons, and `txSpots(sfe)` is equivalent to `rowGeometry(sfe, "txSpots")` for transcript spots in single molecule technologies. ```{r} # Getter (spots <- spotPoly(sfe)) ``` ```{r} # Setter spotPoly(sfe) <- spots ``` ### Annotation Annotation geometries can be get or set with `annotGeometries()` or `annotGeometry()`. In column or row geometries, the number of rows of the `sf` data frame (i.e. the number of geometries in the data frame) is constrained by the number of rows or columns of the gene count matrix respectively, because just like `rowData` and `colData`, each row of a `rowGeometry` or `colGeometry` `sf` data frame must correspond to a row or column of the gene count matrix respectively. In contrast, an `annotGeometry` `sf` data frame can have any dimension, not constrained by the dimension of the gene count matrix. Similar to column and row geometries, annotation geometries have `annotGeometry()`, `annotGeometries()`, and `annotGeometryNames()` getters and setters. ```{r} # Getter, by name or index (tb <- annotGeometry(sfe, "tissueBoundary")) ``` ```{r} plot(st_geometry(tb)) ``` ```{r} # Setter, by name or index annotGeometry(sfe, "tissueBoundary") <- tb ``` ```{r} # Get all annoation geometries as named list ags <- annotGeometries(sfe) ``` ```{r} # Set all annotation geometries with a named list annotGeometries(sfe) <- ags ``` ```{r} # Get names of annotation geometries (ag_names <- annotGeometryNames(sfe)) ``` ```{r} # Set names annotGeometryNames(sfe) <- ag_names ``` There are shorthands for specific annotation geometries. For example, `tissueBoundary(sfe)` is equivalent to `annotGeometry(sfe, "tissueBoundary")`. `cellSeg()` (cell segmentation) and `nucSeg()` (nuclei segmentation) would first query `colGeometries` (for single cell, single molecule technologies, equivalent to `colGeometry(sfe, "cellSeg")` or `colGeometry(sfe, "nucSeg")`), and if not found, they will query `annotGeometries` (for array capture and microdissection technologies, equivalent to `annotGeometry(sfe, "cellSeg")` or `annotGeometry(sfe, "nucSeg")`). ```{r} # Getter (tb <- tissueBoundary(sfe)) ``` ```{r} # Setter tissueBoundary(sfe) <- tb ``` ## Spatial graphs Column, row, and annotation spatial graphs can be get or set with `spatialGraphs()` and `spatialGraph()` functions. Similar to `dimGeometr*` functions, `spatialGraph*` functions have a `MARGIN` argument. However, since internally, row and column geometries are implemented very differently from annotation geometries, while row, column, and annotation graphs are implemented the same way, for the `spatialGraph*` functions, `MARGIN = 1` means rows, `MARGIN = 2` means columns, and `MARGIN = 3` means annotation. Similar to `dimGeometry*` functions, there are `rowGraph*`, `colGraph*`, and `annotGraph*` getter and setter functions for each margin. This package wraps functions in the `spdep` package to find spatial neighborhood graphs. In this example, triangulation is used to find the spatial graph; many other methods are also supported, such as k nearest neighbors, distance based neighbors, and polygon contiguity. ```{r} (g <- findSpatialNeighbors(sfe, MARGIN = 2, method = "tri2nb")) ``` ```{r} plot(g, coords = spatialCoords(sfe)) ``` ```{r} # Set graph by name spatialGraph(sfe, "graph1", MARGIN = 2) <- g # Or equivalently colGraph(sfe, "graph1") <- g ``` ```{r} # Get graph by name g <- spatialGraph(sfe, "graph1", MARGIN = 2L) # Or equivalently g <- colGraph(sfe, "graph1") g ``` For Visium, spatial neighborhood graph of the hexagonal grid can be found with the known locations of the barcodes. ```{r} colGraph(sfe, "visium") <- findVisiumGraph(sfe) ``` ```{r} plot(colGraph(sfe, "visium"), coords = spatialCoords(sfe)) ``` All graphs of the SFE object, or if specified, of the margin of interest, can be get or set with `spatialGraphs()` and the margin specific wrappers. ```{r} colGraphs(sfe) ``` Similar to `dimGeometries()`, the graphs have `spatialGraphNames()` getter and setter and the margin specific wrappers. ```{r} colGraphNames(sfe) ``` ## Multiple samples Thus far, the example dataset used only has one sample. The `SpatialExperiment` (SPE) object has a special column in `colData` called `sample_id`, so data from multiple tissue sections can coexist in the same SPE object for joint dimension reduction and clustering while keeping the spatial coordinates separate. It's important to keep spatial coordinates of different tissue sections separate because first, the coordinates would only make sense within the same section, and second, the coordinates from different sections can have overlapping numeric values. SFE inherits from SPE, and with geometries and spatial graphs, `sample_id` is even more important. The geometry and graph getter and setter functions have a `sample_id` argument, which is optional when only one sample is present in the SFE object. This argument is mandatory if multiple samples are present, and can be a character vector for multiple samples or "all" for all samples. Below are examples of using the getters and setters for multiple samples. ```{r} # Construct toy dataset with 2 samples sfe1 <- McKellarMuscleData(dataset = "small") sfe2 <- McKellarMuscleData(dataset = "small2") spotPoly(sfe2)$sample_id <- "sample02" (sfe_combined <- cbind(sfe1, sfe2)) ``` Use the `sampleIDs` function to see the names of all samples ```{r} sampleIDs(sfe_combined) ``` ```{r} # Only get the geometries for the second sample (spots2 <- colGeometry(sfe_combined, "spotPoly", sample_id = "sample02")) ``` ```{r} # Only set the geometries for the second sample # Leaving geometries of the first sample intact colGeometry(sfe_combined, "spotPoly", sample_id = "sample02") <- spots2 ``` ```{r} # Set graph only for the second sample colGraph(sfe_combined, "foo", sample_id = "sample02") <- findSpatialNeighbors(sfe_combined, sample_id = "sample02") ``` ```{r} # Get graph only for the second sample colGraph(sfe_combined, "foo", sample_id = "sample02") ``` ```{r} # Set graph of the same name for both samples # The graphs are computed separately for each sample colGraphs(sfe_combined, sample_id = "all", name = "visium") <- findVisiumGraph(sfe_combined, sample_id = "all") ``` ```{r} # Get multiple graphs of the same name colGraphs(sfe_combined, sample_id = "all", name = "visium") ``` ```{r} # Or just all graphs of the margin colGraphs(sfe_combined, sample_id = "all") ``` Sample IDs can also be changed, with the `changeSampleIDs()` function, with a named vector whose names are the old names and values are the new names. ```{r} sfe_combined <- changeSampleIDs(sfe, replacement = c(Vis5A = "foo", sample02 = "bar")) sfe_combined ``` # Object construction ## From scratch An SFE object can be constructed from scratch with the assay matrices and metadata. In this toy example, `dgCMatrix` is used, but since SFE inherits from SingleCellExperiment (SCE), other types of arrays supported by SCE such as delayed arrays should also work. ```{r} # Visium barcode location from Space Ranger data("visium_row_col") coords1 <- visium_row_col[visium_row_col$col < 6 & visium_row_col$row < 6,] coords1$row <- coords1$row * sqrt(3) # Random toy sparse matrix set.seed(29) col_inds <- sample(1:13, 13) row_inds <- sample(1:5, 13, replace = TRUE) values <- sample(1:5, 13, replace = TRUE) mat <- sparseMatrix(i = row_inds, j = col_inds, x = values) colnames(mat) <- coords1$barcode rownames(mat) <- sample(LETTERS, 5) ``` That should be sufficient to create an SPE object, and an SFE object, even though no `sf` data frame was constructed for the geometries. The constructor behaves similarly to the SPE constructor. The centroid coordinates of the Visium spots in the toy example can be converted into spot polygons with the `spotDiameter` argument. Spot diameter in pixels in full resolution image can be found in the `scalefactors_json.json` file in Space Ranger output. ```{r} sfe3 <- SpatialFeatureExperiment(list(counts = mat), colData = coords1, spatialCoordsNames = c("col", "row"), spotDiameter = 0.7) ``` More geometries and spatial graphs can be added after calling the constructor. Geometries can also be supplied in the constructor. ```{r} # Convert regular data frame with coordinates to sf data frame cg <- df2sf(coords1[,c("col", "row")], c("col", "row"), spotDiameter = 0.7) rownames(cg) <- colnames(mat) sfe3 <- SpatialFeatureExperiment(list(counts = mat), colGeometries = list(foo = cg)) ``` ## Space Ranger output Space Ranger output can be read in a similar manner as in `SpatialExperiment`; the returned SFE object has the `spotPoly` column geometry for the spot polygons. If the filtered matrix is read in, then a column graph called `visium` will also be present, for the spatial neighborhood graph of the Visium spots on tissue. The graph is not computed if all spots are read in regardless of whether they are on tissue. ```{r} dir <- system.file("extdata", package = "SpatialFeatureExperiment") sample_ids <- c("sample01", "sample02") samples <- file.path(dir, sample_ids) ``` Inside the `outs` directory: ```{r} list.files(file.path(samples[1], "outs")) ``` There should also be `raw_feature_bc_matrix` though this toy example only has the filtered matrix. Inside the matrix directory: ```{r} list.files(file.path(samples[1], "outs", "filtered_feature_bc_matrix")) ``` Inside the `spatial` directory: ```{r} list.files(file.path(samples[1], "outs", "spatial")) ``` Not all Visium datasets have all the files here. The `barcode_fluorescence_intensity.csv` file is only present for datasets with fluorescent imaging rather than bright field H&E. ```{r} (sfe3 <- read10xVisiumSFE(samples, sample_id = sample_ids, type = "sparse", data = "filtered", images = "hires")) ``` The `barcode_fluorescence_intensity.csv` file is read into `colData`. The `spatial_enrichment.csv` file contains Moran's I and its p-values for each gene; it is read into `rowData`. Instead of pixels in the full resolution image, the Visium data can be read so the units are microns. Full resolution pixels is related to microns by the spacing between spots, which is known to be 100 microns. The unit can be set in the `unit` argument; for now only "micron" and "full_res_image_pixel" are supported for Visium: ```{r} (sfe3 <- read10xVisiumSFE(samples, sample_id = sample_ids, type = "sparse", data = "filtered", images = "hires", unit = "micron")) ``` The unit of the SFE object can be checked: ```{r} unit(sfe3) ``` At present, this is merely a string and SFE doesn't perform unit conversion. Unlike in `SpatialExperiment`, SFE reads the images as `terra::SpatRaster` objects, so the images are not loaded into memory unless necessary. Also, with `terra`, if a larger image is associated with the SFE object, it will not be fully loaded into memory when plotted; rather, it's downsampled. ```{r} class(imgRaster(getImg(sfe3))) ``` ## Vizgen MERFISH output The commercialized MERFISH from Vizgen has a standard output format, that can be read into SFE with `readVizgen()`. Because the cell segmentation from each field of view (FOV) has a separate HDF5 file and a MERFISH dataset can have hundreds of FOVs, we strongly recommend reading the MERFISH output on a server with a large number of CPU cores. Alternatively, some but not all MERFISH datasets store cell segmentation in a `parquet` file, which can be more easily read into R. This requires the installation of `arrow`. Here we read a toy dataset which is the first FOV from a real dataset: ```{r} fp <- tempdir() dir_use <- VizgenOutput(file_path = file.path(fp, "vizgen")) list.files(dir_use) ``` The optional `add_molecules` argument can be set to `TRUE` to read in the transcript spots ```{r} (sfe_mer <- readVizgen(dir_use, z = 3L, image = "PolyT", add_molecules = TRUE)) ``` The unit is always in microns. To make it easier and faster to read the data next time, the processed cell segmentation geometries and transcript spots are written to the same directory where the data resides: ```{r} list.files(dir_use) ``` ## 10X Xenium output SFE supports reading the output from Xenium Onboarding Analysis (XOA) v1 and v2 with the function `readXenium()`. Especially for XOA v2, `arrow` is strongly recommended. The cell and nuclei polygon vertices and transcript spot coordinates are in `parquet` files Similar to `readVizgen()`, `readXenium()` makes `sf` data frames from the vertices and transcript spots and saves them as GeoParquet files. ```{r} dir_use <- XeniumOutput("v2", file_path = file.path(fp, "xenium")) list.files(dir_use) ``` ```{r} # RBioFormats issue: https://github.com/aoles/RBioFormats/issues/42 try(sfe_xen <- readXenium(dir_use, add_molecules = TRUE)) (sfe_xen <- readXenium(dir_use, add_molecules = TRUE)) ``` ```{r} list.files(dir_use) ``` ## Nanostring CosMX output This is similar to `readVizgen()` and `readXenium()`, except that the output doesn't come with images. ```{r} dir_use <- CosMXOutput(file_path = file.path(fp, "cosmx")) list.files(dir_use) ``` ```{r} (sfe_cosmx <- readCosMX(dir_use, add_molecules = TRUE)) ``` ```{r} list.files(dir_use) ``` ## Other technologies A read function for Visium HD is in progress. Contribution for Akoya, Molecular Cartography, and Curio Seeker are welcome. See the [issues](https://github.com/pachterlab/SpatialFeatureExperiment/issues). ## Coercion from `SpatialExperiment` SPE objects can be coerced into SFE objects. If column geometries or spot diameter are not specified, then a column geometry called "centroids" will be created. ```{r} spe <- read10xVisium(samples, sample_ids, type = "sparse", data = "filtered", images = "hires", load = FALSE) ``` For the coercion, column names must not be duplicate. ```{r} colnames(spe) <- make.unique(colnames(spe), sep = "-") rownames(spatialCoords(spe)) <- colnames(spe) ``` ```{r} (sfe3 <- toSpatialFeatureExperiment(spe)) ``` If images are present in the SPE object, they will be converted into `SpatRaster` when the SPE object is converted into SFE. Plotting functions in the `Voyager` package relies on `SpatRaster` to plot the image behind the geometries. ## Coercion from `Seurat` Seurat objects canbe coerced into SFE objects though coercion from SFE to Seurat is not yet implemented. ```{r} dir_extdata <- system.file("extdata", package = "SpatialFeatureExperiment") obj_vis <- readRDS(file.path(dir_extdata, "seu_vis_toy.rds")) ``` ```{r} sfe_conv_vis <- toSpatialFeatureExperiment(x = obj_vis, image_scalefactors = "lowres", unit = "micron", BPPARAM = BPPARAM) sfe_conv_vis ``` # Operations ## Non-geometric SFE objects can be concatenated with `cbind`, as was done just now to create a toy example with 2 samples. ```{r} sfe_combined <- cbind(sfe1, sfe2) ``` The SFE object can also be subsetted like a matrix, like an SCE object. More complexity arises when it comes to the spatial graphs. The `drop` argument of the SFE method `[` determines what to do with the spatial graphs. If `drop = TRUE`, then all spatial graphs will be removed, since the graphs with nodes and edges that have been removed are no longer valid. If `drop = FALSE`, which is the default, then the spatial graphs will be reconstructed with the remaining nodes after subsetting. Reconstruction would only work when the original graphs were constructed with `findSpatialNeighbors` or `findVisiumGraph` in this package, which records the method and parameters used to construct the graphs. If reconstruction fails, then a waning will be issued and the graphs removed. ```{r} (sfe_subset <- sfe[1:10, 1:10, drop = TRUE]) ``` ```{r, eval=FALSE} # Will give warning because graph reconstruction fails sfe_subset <- sfe[1:10, 1:10] ``` If images are present, then they will be cropped to the bounding box of the remaining geometries after subsetting. ## Geometric Just like `sf` data frames, SFE objects can be subsetted by a geometry and a predicate relating geometries. For example, if all Visium spots were read into an SFE object regardless of whether they are in tissue, and the `tissueBoundary` annotation geometry is provided, then the tissue boundary geometry can be used to subset the SFE object to obtain a new SFE object with only spots on tissue. Loupe does not give the tissue boundary polygon; such polygon can be obtained by thresholding the H&E image and converting the mask into polygons with OpenCV or the `terra` R package, or by manual annotation in QuPath or LabKit (the latter needs to be converted into polygon). ### Crop Use the `crop` function to directly get the subsetted SFE object. When images are present, they are cropped by the bounding box of the cropped geometries. ```{r} # Before plot(st_geometry(tissueBoundary(sfe))) plot(spotPoly(sfe), col = "gray", add = TRUE) ``` ```{r} sfe_in_tissue <- crop(sfe, y = tissueBoundary(sfe), colGeometryName = "spotPoly") ``` Note that for large datasets with many geometries, cropping can take a while to run. ```{r} # After plot(st_geometry(tissueBoundary(sfe))) plot(spotPoly(sfe_in_tissue), col = "gray", add = TRUE) ``` `crop` can also be used in the conventional sense of cropping, i.e. specifying a bounding box. ```{r} sfe_cropped <- crop(sfe, colGeometryName = "spotPoly", sample_id = "Vis5A", xmin = 5500, xmax = 6500, ymin = 13500, ymax = 14500) ``` The `colGeometryName` is used to determine which columns in the gene count matrix to keep. All geometries in the SFE object will be subsetted so only portions intersecting `y` or the bounding box are kept. Since the intersection operation can produce a mixture of geometry types, such as intersection of two polygons producing polygons, points, and lines, the geometry types of the `sf` data frames after subsetting may be different from those of the originals. The cropping is done independently for each `sample_id`, and only on `sample_id`s specified. Again, `sample_id` is optional when there is only one sample in the SFE object. Geometry predicates and operations can also be performed to return the results without subsetting an SFE object. For example, one may want a logical vector indicating whether each Visium spot intersects the tissue, or a numeric vector of how many nuclei there are in each Visium spot. Or get the intersections between each Visium spot and nuclei. Again, the geometry predicates and operations are performed independently for each sample, and the `sample_id` argument is optional when there is only one sample. ```{r} # Get logical vector colData(sfe)$in_tissue <- annotPred(sfe, colGeometryName = "spotPoly", annotGeometryName = "tissueBoundary", sample_id = "Vis5A") # Get the number of nuclei per Visium spot colData(sfe)$n_nuclei <- annotNPred(sfe, "spotPoly", annotGeometryName = "nuclei") # Get geometries of intersections of Visium spots and myofibers spot_intersections <- annotOp(sfe, colGeometryName = "spotPoly", annotGeometryName = "myofiber_simplified") ``` Sometimes the spatial coordinates of different samples can take very different values. The values can be made more comparable by moving all tissues so the bottom left corner of the bounding box would be at the origin, which would facilitate plotting and comparison across samples with `geom_sf` and `facet_*`. To find the bounding box of all geometries in each sample of an SFE object: ```{r} bbox(sfe, sample_id = "Vis5A") ``` To move the coordinates: ```{r} sfe_moved <- removeEmptySpace(sfe, sample_id = "Vis5A") ``` The original bounding box before moving is stored within the SFE object, which can be read by `dimGeometry` setters so newly added geometries can have coordinates moved as well; this behavior can be turned off with the optional argument `translate = FALSE` in `dimGeometry` setters. ### Transform When images are present, they might need to be flipped to align with the spots. `SpatialExperiment` implements methods to rotate and mirror images, and SFE implements methods for SFE objects to transpose and mirror images (`terra::rotate()` does NOT rotate the image in the conventional sense -- rather it changes the longitudes and where the globe is cut to project to 2D just like cutting a world map at the Atlantic vs. the Pacific). `SpatialExperiment` represents images with S4 classes inheriting from the `VirtualSpatialImage` virtual class. To be compatible with SPE, SFE uses `SpatRasterImage`, which is a thin wrapper of `SpatRaster` inheriting from the virtual class. Transformations can be applied to `SpatRasterImage`, as well as SFE objects with sample and image IDs specified. When an image is transposed, it is flipped about the line going from top left to bottom right: ```{r, fig.width=1, fig.height=1} img <- getImg(sfe3, image_id = "hires") plot(imgRaster(img)) plot(transposeImg(img) |> imgRaster()) ``` Arguments for the SFE method of `mirrorImg()` differ from those of the SPE method, to match `terra::flip()`: ```{r, fig.width=1, fig.height=1} plot(mirrorImg(img, direction = "vertical") |> imgRaster()) plot(mirrorImg(img, direction = "horizontal") |> imgRaster()) ``` Here we apply the transformation to an SFE object, where the image specified by the sample and image IDs are transformed: ```{r} sfe3 <- mirrorImg(sfe3, sample_id = "sample01", image_id = "hires") ``` So far, `transposeImg()` and `mirrorImg()` only transform the image. But the entire SFE object, including all the geometries and images, can be transformed at once. ```{r} sfe_mirrored <- mirror(sfe_in_tissue) sfe_transposed <- transpose(sfe_in_tissue) ``` ```{r, fig.width=6, fig.height=2} par(mfrow = c(1, 3), mar = rep(1.5, 4)) plot(st_geometry(tissueBoundary(sfe_in_tissue))) plot(spotPoly(sfe_in_tissue), col = "gray", add = TRUE) plot(st_geometry(tissueBoundary(sfe_mirrored))) plot(spotPoly(sfe_mirrored), col = "gray", add = TRUE) plot(st_geometry(tissueBoundary(sfe_transposed))) plot(spotPoly(sfe_transposed), col = "gray", add = TRUE) ``` Transforming the entire SFE object can be useful when the tissue has a orientation and a conventional direction of the orientation, such as rostral is conventionally at the top while caudal is at the bottom in coronal brain sections, while anterior is at the left and posterior is at the right in saggital brain sections, to make data conform to the convention. # Limitations and future directions These are the limitations of the current version of SFE: 1. By integrating with `sf`, which is designed for vector spatial data (specifying coordinates of points, lines, and polygons vertices), SFE only supports vector data for the geometries, and raster (like an image, with a value at each pixel) is not supported. Vector is chosen, as it is a more memory efficient way to represent cell and nuclei segmentation than a raster map. 2. The spatial graphs are `listw` objects so no conversion is necessary to use the well-established spatial statistical methods in the `spdep`, `spatialreg`, and `adespatial` packages. However, `igraph` implements many graph analysis methods, and conversion is required to use them. Whether future versions of SFE will stick to `listw` depends on importance of methods that use spatial graphs in `igraph` class. 3. While Simple Features support 3D and spatiotemporal coordinates, most geospatial resources SFE leverages `sf` for is for 2D data. 4. Spatial point process analysis with the `spatstat` package may be relevant, such as in analyzing spatial distribution of nuclei or transcript spots. As `spatstat` predates `sf` by over a decade, `spatstat` does not play very nicely with `sf`. However, since analyses of nuclei and transcript spot localization don't center on the gene count matrix, whether `spatstat` analyses should be integrated into SFE (which is centered on the gene count matrix) is questionable. 5. Geometries for very large datasets can get very large. On disk operations of the geometries should be considered. The geospatial field already has on disk tools for both vector and raster data. So far, SFE has only been tested on data that fit into memory. 6. Setting units of length in the SFE object and converting units. This can make geometries of different samples and datasets comparable, and helpful to plotting scale bars when plotting geometries. ```{r} # Clean up unlink(file.path(fp, "vizgen"), recursive = TRUE) unlink(file.path(fp, "xenium"), recursive = TRUE) unlink(file.path(fp, "cosmx"), recursive = TRUE) ``` # Session info ```{r} sessionInfo() ```