--- title: "Introduction to Clustering of Local Indicators of Spatial Assocation (LISA) curves" date: "`r BiocStyle::doc_date()`" author: - name: Nicolas Canete affiliation: - &WIMR Westmead Institute for Medical Research, University of Sydney, Australia email: nicolas.canete@sydney.edu.au - name: Ellis Patrick affiliation: - &WIMR Westmead Institute for Medical Research, University of Sydney, Australia - School of Mathematics and Statistics, University of Sydney, Australia email: ellis.patrick@sydney.edu.au - name: Alex Run Qin affiliation: - &WIMR Westmead Institute for Medical Research, University of Sydney, Australia - School of Mathematics and Statistics, University of Sydney, Australia email: alex.qin@sydney.edu.au package: "`r BiocStyle::pkg_ver('lisaClust')`" abstract: > Identify and visualise regions of cell type colocalization in multiplexed imaging data that has been segmented at a single-cell resolution. vignette: > %\VignetteIndexEntry{"Inroduction to lisaClust"} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} output: BiocStyle::html_document bibliography: REFERENCES.bib editor_options: markdown: wrap: 72 --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, message = FALSE, warning = FALSE ) library(BiocStyle) ``` # Installation ```{r, eval = FALSE} if (!require("BiocManager")) { install.packages("BiocManager") } BiocManager::install("lisaClust") ``` ```{r message=FALSE, warning=FALSE} # load required packages library(lisaClust) library(spicyR) library(ggplot2) library(SingleCellExperiment) library(SpatialDatasets) ``` # Overview Clustering local indicators of spatial association (LISA) functions is a methodology for identifying consistent spatial organisation of multiple cell-types in an unsupervised way. This can be used to enable the characterization of interactions between multiple cell-types simultaneously and can complement traditional pairwise analysis. In our implementation our LISA curves are a localised summary of an L-function from a Poisson point process model. Our framework `lisaClust` can be used to provide a high-level summary of cell-type colocalization in high-parameter spatial cytometry data, facilitating the identification of distinct tissue compartments or identification of complex cellular microenvironments. # Quick start ## Generate toy data To illustrate our `lisaClust` framework, we consider a very simple toy example where two cell-types are completely separated spatially. We simulate data for two different images. ```{r eval=T} set.seed(51773) x <- round(c( runif(200), runif(200) + 1, runif(200) + 2, runif(200) + 3, runif(200) + 3, runif(200) + 2, runif(200) + 1, runif(200) ), 4) * 100 y <- round(c( runif(200), runif(200) + 1, runif(200) + 2, runif(200) + 3, runif(200), runif(200) + 1, runif(200) + 2, runif(200) + 3 ), 4) * 100 cellType <- factor(paste("c", rep(rep(c(1:2), rep(200, 2)), 4), sep = "")) imageID <- rep(c("s1", "s2"), c(800, 800)) cells <- data.frame(x, y, cellType, imageID) ggplot(cells, aes(x, y, colour = cellType)) + geom_point() + facet_wrap(~imageID) + theme_minimal() ``` ## Create Single Cell Experiment object First we store our data in a `SingleCellExperiment` object. ```{r} SCE <- SingleCellExperiment(colData = cells) SCE ``` ## Running lisaCLust We can then use the convenience function `lisaClust` to simultaneously calculate local indicators of spatial association (LISA) functions and perform k-means clustering. The number of clusters can be specified with the `k =` parameter. In the example below, we've chosen `k = 2`, resulting in a total of 2 clusters. The cell type column can be specified using the `cellType = ` argument. By default, `lisaClust` uses the column named `cellType`. The clusters identified by `lisaClust` are stored in `colData` of the `SingleCellExperiment` object as a new column called `regions`. ```{r} SCE <- lisaClust(SCE, k = 2) colData(SCE) |> head() ``` ## Plot identified regions `lisaClust` also provides the convenient `hatchingPlot` function to visualise the different regions that have been demarcated by the clustering. `hatchingPlot` outputs a `ggplot` object where the regions are marked by different hatching patterns. In a real biological dataset, this allows us to plot both regions and cell-types on the same visualization. In the example below, we can visualise our stimulated data where our 2 cell types have been separated neatly into 2 distinct regions based on which cell type each region is dominated by. `region_2` is dominated by the red cell type `c1`, and `region_1` is dominated by the blue cell type `c2`. ```{r} hatchingPlot(SCE, useImages = c("s1", "s2")) ``` ## Using other clustering methods. While the `lisaClust` function is convenient, we have not implemented an exhaustive suite of clustering methods as it is very easy to do this yourself. There are just two simple steps. ### Generate LISA curves We can calculate local indicators of spatial association (LISA) functions using the `lisa` function. Here the LISA curves are a localised summary of an L-function from a Poisson point process model. The radii that will be calculated over can be set with `Rs`. ```{r} lisaCurves <- lisa(SCE, Rs = c(20, 50, 100)) head(lisaCurves) ``` ### Perform some clustering The LISA curves can then be used to cluster the cells. Here we use k-means clustering. However, other clustering methods like SOM could also be used. We can store these cell clusters or cell "regions" in our `SingleCellExperiment` object. ```{r} # Custom clustering algorithm kM <- kmeans(lisaCurves, 2) # Storing clusters into colData colData(SCE)$custom_region <- paste("region", kM$cluster, sep = "_") colData(SCE) |> head() ``` # Keren et al. breast cancer data. Next, we apply our `lisaClust` framework to two images of breast cancer obtained by @keren2018. ## Read in data We will start by reading in the data from the `SpatialDatasets` package as a `SingleCellExperiment` object. Here the data is in a format consistent with that outputted by CellProfiler. ```{r} kerenSPE <- SpatialDatasets::spe_Keren_2018() ``` ## Generate LISA curves This data includes annotation of the cell-types of each cell. Hence, we can move directly to performing k-means clustering on the local indicators of spatial association (LISA) functions using the `lisaClust` function, remembering to specify the `imageID`, `cellType`, and `spatialCoords` columns in `colData`. For the purpose of demonstration, we will be using only images 5 and 6 of the `kerenSPE` dataset. ```{r} kerenSPE <- kerenSPE[,kerenSPE$imageID %in% c("5", "6")] kerenSPE <- lisaClust(kerenSPE, k = 5 ) ``` These regions are stored in `colData` and can be extracted. ```{r} colData(kerenSPE)[, c("imageID", "region")] |> head(20) ``` ## Examine cell type enrichment `lisaClust` also provides a convenient function, `regionMap`, for examining which cell types are located in which regions. In this example, we use this to check which cell types appear more frequently in each region than expected by chance. Here, we clearly see that healthy epithelial and mesenchymal tissue are highly concentrated in region 1, immune cells are concentrated in regions 2 and 4, whilst tumour cells are concentrated in region 3. We can further segregate these cells by increasing the number of clusters, i.e., increasing the parameter `k = ` in the `lisaClust()` function. For the purposes of demonstration, let's take a look at the `hatchingPlot` of these regions. ```{r} regionMap(kerenSPE, type = "bubble" ) ``` ## Plot identified regions Finally, we can use `hatchingPlot` to construct a `ggplot` object where the regions are marked by different hatching patterns. This allows us to visualize the 5 regions and 17 cell-types simultaneously. ```{r fig.height=7, fig.width=9} hatchingPlot(kerenSPE, nbp = 300) ``` # References # sessionInfo() ```{r} sessionInfo() ```