Contents

1 Installation

if (!require("BiocManager"))
    install.packages("BiocManager")
BiocManager::install("lisaClust")
# load required packages
library(lisaClust)
library(spicyR)
library(ggplot2)
library(SingleCellExperiment)

2 Overview

Clustering local indicators of spatial association (LISA) functions is a methodology for identifying consistent spatial organisation of multiple cell-types in an unsupervised way. This can be used to enable the characterization of interactions between multiple cell-types simultaneously and can complement traditional pairwise analysis. In our implementation our LISA curves are a localised summary of an L-function from a Poisson point process model. Our framework lisaClust can be used to provide a high-level summary of cell-type colocalization in high-parameter spatial cytometry data, facilitating the identification of distinct tissue compartments or identification of complex cellular microenvironments.

3 Quick start

3.1 Generate toy data

TO illustrate our lisaClust framework, here we consider a very simple toy example where two cell-types are completely separated spatially. We simulate data for two different images.

set.seed(51773)
x <- round(c(runif(200),runif(200)+1,runif(200)+2,runif(200)+3,
           runif(200)+3,runif(200)+2,runif(200)+1,runif(200)),4)*100
y <- round(c(runif(200),runif(200)+1,runif(200)+2,runif(200)+3,
             runif(200),runif(200)+1,runif(200)+2,runif(200)+3),4)*100
cellType <- factor(paste('c',rep(rep(c(1:2),rep(200,2)),4),sep = ''))
imageID <- rep(c('s1', 's2'),c(800,800))

cells <- data.frame(x, y, cellType, imageID)

ggplot(cells, aes(x,y, colour = cellType)) + geom_point() + facet_wrap(~imageID) + theme_minimal()

3.2 Create Single Cell Experiment object

First we store our data in a SingleCellExperiment object.


SCE <- SingleCellExperiment(colData = cells)
SCE
## class: SingleCellExperiment 
## dim: 0 1600 
## metadata(0):
## assays(0):
## rownames: NULL
## rowData names(0):
## colnames: NULL
## colData names(4): x y cellType imageID
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):

3.3 Running lisaCLust

We can then use the convenience function lisaClust to simultaneously calculate local indicators of spatial association (LISA) functions using the lisa function and perform k-means clustering. The number of clusters can be specified with the k = parameter. In the example below, we’ve chosen k = 2, resulting in a total of 2 clusters.

These clusters are stored in colData of the SingleCellExperiment object, as a new column with the column name regions.

SCE <- lisaClust(SCE, k = 2)
colData(SCE) |> head()
## DataFrame with 6 rows and 5 columns
##           x         y cellType     imageID      region
##   <numeric> <numeric> <factor> <character> <character>
## 1     36.72     38.58       c1          s1    region_2
## 2     61.38     41.29       c1          s1    region_2
## 3     33.59     80.98       c1          s1    region_2
## 4     50.17     64.91       c1          s1    region_2
## 5     82.93     35.60       c1          s1    region_2
## 6     83.13      2.69       c1          s1    region_2

3.4 Plot identified regions

lisaClust also provides the convenient hatchingPlot function to visualise the different regions that have been demarcated by the clustering. hatchingPlot outputs a ggplot object where the regions are marked by different hatching patterns. In a real biological dataset, this allows us to plot both regions and cell-types on the same visualization.

In the example below, we can visualise our stimulated data where our 2 cell types have been separated neatly into 2 distinct regions based on which cell type each region is dominated by. region_2 is dominated by the red cell type c1, and region_1 is dominated by the blue cell type c2.

hatchingPlot(SCE, useImages = c('s1','s2'))

## Using other clustering methods.

While the lisaClust function is convenient, we have not implemented an exhaustive suite of clustering methods as it is very easy to do this yourself. There are just two simple steps.

3.4.1 Generate LISA curves

We can calculate local indicators of spatial association (LISA) functions using the lisa function. Here the LISA curves are a localised summary of an L-function from a Poisson point process model. The radii that will be calculated over can be set with Rs.

lisaCurves <- lisa(SCE, Rs = c(20, 50, 100))

3.4.2 Perform some clustering

The LISA curves can then be used to cluster the cells. Here we use k-means clustering, other clustering methods like SOM could be used. We can store these cell clusters or cell “regions” in our SingleCellExperiment object using the cellAnnotation() <- function.

# Custom clustering algorithm
kM <- kmeans(lisaCurves,2)

# Storing clusters into colData
colData(SCE)$custom_region <- paste('region',kM$cluster,sep = '_')
colData(SCE) |> head()
## DataFrame with 6 rows and 6 columns
##           x         y cellType     imageID      region custom_region
##   <numeric> <numeric> <factor> <character> <character>   <character>
## 1     36.72     38.58       c1          s1    region_2      region_2
## 2     61.38     41.29       c1          s1    region_2      region_2
## 3     33.59     80.98       c1          s1    region_2      region_2
## 4     50.17     64.91       c1          s1    region_2      region_2
## 5     82.93     35.60       c1          s1    region_2      region_2
## 6     83.13      2.69       c1          s1    region_2      region_2

4 Damond et al. islet data.

Next, we apply our lisaClust framework to three images of pancreatic islets from A Map of Human Type 1 Diabetes Progression by Imaging Mass Cytometry by Damond et al. (2019).

4.1 Read in data

We will start by reading in the data and storing it as a SingleCellExperiment object. Here the data is in a format consistent with that outputted by CellProfiler.

isletFile <- system.file("extdata","isletCells.txt.gz", package = "spicyR")
cells <- read.table(isletFile, header = TRUE)
damonSCE <- SingleCellExperiment(assay = list(intensities = t(cells[,grepl(names(cells), pattern = "Intensity_")])),
                                 colData = cells[,!grepl(names(cells), pattern = "Intensity_")]
                                 )

4.2 Cluster cell-types

This data does not include annotation of the cell-types of each cell. Here we extract the marker intensities from the SingleCellExperiment object using assay(). We then perform k-means clustering with 10 clusters and store these cell-type clusters in our SingleCellExperiment object using colData() <-.

markers <- t(assay(damonSCE, "intensities"))
kM <- kmeans(markers,10)
colData(damonSCE)$cluster <- paste('cluster', kM$cluster, sep = '')
colData(damonSCE)[, c("ImageNumber", "cluster")] |> head()
## DataFrame with 6 rows and 2 columns
##   ImageNumber     cluster
##     <integer> <character>
## 1           1    cluster9
## 2           1    cluster9
## 3           1    cluster9
## 4           1    cluster5
## 5           1    cluster5
## 6           1    cluster2

4.3 Generate LISA curves

As before, we can perform k-means clustering on the local indicators of spatial association (LISA) functions using the lisaClust function, remembering to specify the imageID, cellType, and spatialCoords columns in colData.


damonSCE <- lisaClust(damonSCE, 
                      k = 2, 
                      Rs = c(10,20,50), 
                      imageID = "ImageNumber", 
                      cellType = "cluster",
                      spatialCoords = c("Location_Center_X", "Location_Center_Y"))

These regions are stored in colData and can be extracted.

colData(damonSCE)[, c("ImageNumber", "region")] |>
  head(20)
## DataFrame with 20 rows and 2 columns
##     ImageNumber      region
##       <integer> <character>
## 1             1    region_1
## 2             1    region_1
## 3             1    region_1
## 4             1    region_1
## 5             1    region_1
## ...         ...         ...
## 16            1    region_1
## 17            1    region_1
## 18            1    region_1
## 19            1    region_1
## 20            1    region_1

4.4 Examine cell type enrichment

lisaClust also provides a convenient function, regionMap, for examining which cell types are located in which regions. In this example, we use this to check which cell types appear more frequently in each region than expected by chance.

Here, we clearly see that clusters 2, 5, 1, and 8 are highly concentrated in region 1, whilst all other clusters are thinly spread out across region 2.

We can further segregate these cells by increasing the number of clusters, ie. increasing the parameter k = in the lisaClust() function, but for the purposes of demonstration, let’s take a look at the hatchingPlot of these regions.

regionMap(damonSCE, 
          imageID = "ImageNumber", 
          cellType = "cluster",
          spatialCoords = c("Location_Center_X", "Location_Center_Y"),
          type = "bubble")

4.5 Plot identified regions

Finally, we can use hatchingPlot to construct a ggplot object where the regions are marked by different hatching patterns. This allows us to visualize the two regions and ten cell-types simultaneously.

hatchingPlot(damonSCE, 
             cellType = "cluster",
             spatialCoords = c("Location_Center_X", "Location_Center_Y"))

5 sessionInfo()

sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] SingleCellExperiment_1.24.0 SummarizedExperiment_1.32.0
##  [3] Biobase_2.62.0              GenomicRanges_1.54.0       
##  [5] GenomeInfoDb_1.38.0         IRanges_2.36.0             
##  [7] S4Vectors_0.40.1            BiocGenerics_0.48.0        
##  [9] MatrixGenerics_1.14.0       matrixStats_1.0.0          
## [11] ggplot2_3.4.4               spicyR_1.14.1              
## [13] lisaClust_1.10.1            BiocStyle_2.30.0           
## 
## loaded via a namespace (and not attached):
##   [1] bitops_1.0-7                deldir_1.0-9               
##   [3] rlang_1.1.1                 magrittr_2.0.3             
##   [5] compiler_4.3.1              spatstat.geom_3.2-7        
##   [7] mgcv_1.9-0                  fftwtools_0.9-11           
##   [9] vctrs_0.6.4                 reshape2_1.4.4             
##  [11] stringr_1.5.0               pkgconfig_2.0.3            
##  [13] SpatialExperiment_1.12.0    crayon_1.5.2               
##  [15] fastmap_1.1.1               backports_1.4.1            
##  [17] magick_2.8.1                XVector_0.42.0             
##  [19] labeling_0.4.3              utf8_1.2.4                 
##  [21] rmarkdown_2.25              nloptr_2.0.3               
##  [23] purrr_1.0.2                 xfun_0.40                  
##  [25] MultiAssayExperiment_1.28.0 zlibbioc_1.48.0            
##  [27] cachem_1.0.8                jsonlite_1.8.7             
##  [29] goftest_1.2-3               DelayedArray_0.28.0        
##  [31] tweenr_2.0.2                spatstat.utils_3.0-4       
##  [33] BiocParallel_1.36.0         broom_1.0.5                
##  [35] parallel_4.3.1              R6_2.5.1                   
##  [37] bslib_0.5.1                 stringi_1.7.12             
##  [39] RColorBrewer_1.1-3          spatstat.data_3.0-3        
##  [41] boot_1.3-28.1               car_3.1-2                  
##  [43] numDeriv_2016.8-1.1         jquerylib_0.1.4            
##  [45] Rcpp_1.0.11                 bookdown_0.36              
##  [47] knitr_1.44                  tensor_1.5                 
##  [49] splines_4.3.1               Matrix_1.6-1.1             
##  [51] tidyselect_1.2.0            abind_1.4-5                
##  [53] yaml_2.3.7                  codetools_0.2-19           
##  [55] spatstat.random_3.2-1       spatstat.explore_3.2-5     
##  [57] curl_5.1.0                  lmerTest_3.1-3             
##  [59] lattice_0.22-5              tibble_3.2.1               
##  [61] plyr_1.8.9                  withr_2.5.1                
##  [63] evaluate_0.22               survival_3.5-7             
##  [65] polyclip_1.10-6             pillar_1.9.0               
##  [67] BiocManager_1.30.22         ggpubr_0.6.0               
##  [69] carData_3.0-5               ClassifyR_3.6.0            
##  [71] generics_0.1.3              RCurl_1.98-1.12            
##  [73] munsell_0.5.0               scales_1.2.1               
##  [75] minqa_1.2.6                 class_7.3-22               
##  [77] glue_1.6.2                  pheatmap_1.0.12            
##  [79] tools_4.3.1                 data.table_1.14.8          
##  [81] lme4_1.1-34                 ggsignif_0.6.4             
##  [83] grid_4.3.1                  tidyr_1.3.0                
##  [85] colorspace_2.1-0            nlme_3.1-163               
##  [87] GenomeInfoDbData_1.2.11     ggforce_0.4.1              
##  [89] cli_3.6.1                   spatstat.sparse_3.0-3      
##  [91] fansi_1.0.5                 S4Arrays_1.2.0             
##  [93] scam_1.2-14                 dplyr_1.1.3                
##  [95] concaveman_1.1.0            V8_4.4.0                   
##  [97] gtable_0.3.4                rstatix_0.7.2              
##  [99] sass_0.4.7                  digest_0.6.33              
## [101] SparseArray_1.2.0           farver_2.1.1               
## [103] rjson_0.2.21                htmltools_0.5.6.1          
## [105] lifecycle_1.0.3             MASS_7.3-60