---
title: "CatsCradle Spatial Vignette"
author: "Anna Laddach and Michael Shapiro"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
    %\VignetteIndexEntry{CatsCradle Spatial Vignette}
    %\VignetteEngine{knitr::rmarkdown}
    %\VignetteEncoding{UTF-8}
---

```{r  [Sp1] setup, include = FALSE, warning = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    fig.dim = c(6,6),
    comment = "#>"
)
```
`r BiocStyle::Biocpkg("BiocStyle")`

![](CatsCradleLogo.png){width=2in}


## Introduction

Here we describe the tools that CatsCradle offers for exploiting the
spatial relationships in spatial transcriptomics data.

 We will
be using a subset  of a Xenium data set that
profiles the mouse hippocampus available from 10x genomics
(https://www.10xgenomics.com/datasets/fresh-frozen-mouse-brain-for-xenium-explorer-demo-1-standard)

Here we visualise this subset coloured by cell type.  Here cell
clusters (Louvain cluster) have been assigned cell type identities
using RCTD along with a reference dataset
(https://www.dropbox.com/s/cuowvm4vrf65pvq/allen_cortex.rds?dl=1)
followed by minimal manual curation. Please note these assignments are
not definitive and are for illustratory purporses only.

```{r [Sp2], message=FALSE}
library(Seurat,quietly=TRUE)
library(CatsCradle,quietly=TRUE)
getExample = make.getExample()
smallXenium = getExample('smallXenium')
ImageDimPlot(smallXenium, cols = "polychrome", size = 1)
```

We will want to answer questions like the following:

Is the expression of a particular gene localised?  Who are the
immediate neighbours of a given cell?  Do certain types of cells tend
to co-localise?  Is the spatial association of two cell types, (say,
glia and macrophages) statistically significant?  For any pair of
neighbouring cells, are they engaged in ligand-repceptor interactions?
Can we classify different types of tissue neighbourhoods?  Are
different neighbourhood types characterised by the cell types found in
them?  The genes expressed in them?  The ligand-receptor interactions
taking place in them?


### Neighbourhoods


A key concept here is that of a _neighbourhood_.  A neighbourhood is a
spatially contiguous set of cells.  In every case we will consider, a
neighbourhood is a set of cells _centred on a particular cell_.

The simplest sort of neighbourhood consists of a cell together with
its immediate neighbours.  We compute these as a Delaunay
triangulation using the centroids of the cells:


```{r [Sp3]}
centroids = GetTissueCoordinates(smallXenium)
rownames(centroids) = centroids$cell
clusters = smallXenium@active.ident
delaunayNeighbours = computeNeighboursDelaunay(centroids)
head(delaunayNeighbours)
```

Two cells are neighbours if they appear on the same row.  The
neighbourhood of the cell 16307 consists of the following 7 cells:

```{r [Sp4]}
idx = (delaunayNeighbours$nodeA == 16307 |
       delaunayNeighbours$nodeB == 16307)
nbhd = unique(c(delaunayNeighbours$nodeA[idx],
                delaunayNeighbours$nodeB[idx]))
nbhd		
```

We can compute extended neighbourhoods by considering (say) each cell's
neighbours, their neighbours, their neighbours and their neighbours.
Here we have such an extended neighbourhood as a list and again as a
data frame with each row giving a pair of cells in a common extended
neighbourhood.  In this case we are building an extended neighbourhood
of combinatorial radius 4.

```{r [Sp5]}
extendedNeighboursList = getExtendedNBHDs(delaunayNeighbours, 4)
extendedNeighbours = collapseExtendedNBHDs(extendedNeighboursList, 4)
```

Now the extended neighbourhood of  cell 16307 consists of 92 cells:

```{r [Sp6]}
idx = (extendedNeighbours$nodeA == 16307 |
       extendedNeighbours$nodeB == 16307)
nbhd = unique(c(extendedNeighbours$nodeA[idx],
                extendedNeighbours$nodeB[idx]))
length(nbhd)		
```

We can also create neighbours based on Euclidean distance in the tissue.


```{r [Sp7]}
euclideanNeighbours = computeNeighboursEuclidean(centroids,
threshold=20)
```

We see two reasons for looking at these extended neighbourhoods.  One
is that while the Delaunay triangulation might be more appropriate for
looking at cell to cell interactions involving direct contact,
extended neighbourhoods might be more appropriate for studying
interactions based on diffusible ligands.  The second is that when we
go on to characterise _types_ of neighbourhoods, these extended
neighbourhoods exhibit less noise.


We offer two viewpoints as to what is "going on" in a neighbourhood.
One is that a neighbourhood is characterised by the cell types that
are found in it.  In this view, just as a cell expresses genes, a
neighbourhood "expresses" cell types.  A second viewpoint is that a
neighbourhood also expresses genes.  That is, for each neighbourhood,
we can compute the total gene expression across all the cells in that
neighbourhood.

In this vignette we will focus on the first viewpoint.  This is not
because we think the first viewpoint is more important or because we
necessarily expect it to be more fruitful.  It's because the second
viewpoint so directly parallels standard Seurat single cell
transcriptomics analyses that it requires less explanation.  So before
we move on to the neighbourhoods / cell types viewpoint, we give an
overview of the neighbourhoods / aggregate gene expression viewpoint.

### Neighbourhoods as characterised by gene expression

Here we create a Seurat object based on the aggregate gene expression
in each of our extended neighbourhoods and display the clustering of
the neighbourhoods (called aggregation_clusters) both on the tissue
plot and on the resulting UMAP.

```{r [Sp8]}
agg = aggregateGeneExpression(smallXenium,extendedNeighbours,
                                    verbose=FALSE)
smallXenium$aggregateNBHDClusters = agg@active.ident
ImageDimPlot(smallXenium,group.by='aggregateNBHDClusters',cols='polychrome')
```

Since neighbourhoods here are indexed by individual cells, the
neighbourhood gene aggregation Seurat object is formally identical to
a standard Seurat object.  In particular, this allows the standard
sorts of analyses including clustering of neighbourhoods into types;
dimension reduction using PCA, UMAP, tSNE; discovery of marker genes
for each neighbourhood type; plotting of aggregate gene expression on
neighbourhood UMAP; use of CatsCradle to discover novel clustering of
genes based on their expression across the different neighbourhoods.


## Neighbourhoods as characterised by cell types


### Calculation of neighbourhood celltype composition 

Given any of the spatial graphs describing neighbourhoods calculated above,
we can now calculate the cell type composition of neighbourhoods.

```{r [Sp9]}
NBHDByCTMatrix = computeNBHDByCTMatrix(delaunayNeighbours, clusters)
```

In the resulting matrix neighbourhoods are rows and cell types are columns. 
The values in the matrix indicate the number of cells of a given type within a 
neighbourhood. 

Let's do the same for our extended neighbourhoods.
```{r [Sp10]}
NBHDByCTMatrixExtended = 
  computeNBHDByCTMatrix(extendedNeighbours, clusters)
```

### Analysis of contact based interactions between cell types

We can go on to calculate a matrix which gives the fraction of contacts cells of
a given type make with cells of another cell type.
```{r [Sp11]}
cellTypesPerCellTypeMatrix = 
  computeCellTypesPerCellTypeMatrix(NBHDByCTMatrix,clusters)
```

Rows and columns both correspond to cell types, but they are playing
different roles.  For a given row (say, cell type A) the entry in
column B represents the fraction of cells in neighbourhoods of cells
of type A that are of type B.

We can then display this as a force directed graph. Here we choose only to 
display contact based interactions that constitute at least 5% of a cell type's
interactions. Of note, this graph is directed as, for example, 50% of cell type
A's interactions might be with cell type B, but only 5% of cell type B's 
interactions might be with cell type A.   
```{r [Sp12]}
colours = DiscretePalette(length(levels(clusters)), palette = "polychrome")
names(colours) = levels(clusters)

cellTypesPerCellTypeGraphFromCellMatrix(cellTypesPerCellTypeMatrix, 
                                    minWeight = 0.05, colours = colours)
```

Here arrows are directed from rows to columns. Thus, we see an arrow
from 12_Pvalb to 18_pyramidal because neighbourhoods of cells of type
12_Pvalb are composed (across the dataset) of ~15% cells of type
18_pyramidal.  We do not see an arrow from 18_pyramidal to 12_Pvalb
because neighbourhoods of cells of type 18_pyramidal have only 3%
cells of type 12_Pvalb, which falls below the chosen cutoff.

It's worth pointing out the following. The number of edges from cells of
type A to cells of type B is the same as the number of edges from
cells of type B to cells of type A.  Thus the matrix of counts of
these edges is symmetric.  However, numbers of cells of types A and B
are not necessarily equal, and this accounts for the assymmetry in the
fractions. 

```{r [Sp13], message=FALSE}
library(pheatmap,quietly=TRUE)
pheatmap(cellTypesPerCellTypeMatrix)
```

Let's do the same for our extended nighbourhoods.

```{r [Sp14]}
cellTypesPerCellTypeMatrixExtended = computeCellTypesPerCellTypeMatrix(NBHDByCTMatrixExtended,clusters)

cellTypesPerCellTypeGraphFromCellMatrix(cellTypesPerCellTypeMatrixExtended,
minWeight = 0.05, colours = colours)
```

We can also calculate p values (upper tail, one-sided) for whether
cell types are more commonly neighbours than expected by chance. To do
this we compare the actual spatial neighbour graph to randomised
spatial neighbour graphs where edges are randomised but the degree of
each vertice is preserved. As this is carried out on the level of
counts of undirected edges between celltypes the p value matrix is
symmetric.

```{r [Sp15]}
cellTypesPerCellTypePValues = computeNeighbourEnrichment(delaunayNeighbours, 
                                          clusters, nSim=100,verbose = FALSE)
```


Let's plot -log10(pvalue)
```{r [Sp16]}
cellTypesPerCellTypePValuesNegLog = -log10(cellTypesPerCellTypePValues)
pheatmap(cellTypesPerCellTypePValuesNegLog)
```

## Analysis of neighbourhoods based on cell type composition

As mentioned above we can create Seurat objects of neighbourhoods
where the underlying counts matrix gives the the number of cells of
each type in a given neighbourhood.  We can now perform dimensionality
reduction and clustering based on neighbourhood composition. As the
dimensionality of the feature space is relatively low (number of cell
types) we calculate the UMAP using features rather than PCs.

```{r [Sp17]}
NBHDByCTSeurat = computeNBHDVsCTObject(NBHDByCTMatrix,verbose=FALSE)
```

Add cell type information to the neighbourhoodSeurat object.
```{r [Sp18]}
NBHDByCTSeurat$cellType = clusters
```

Visualise neighbourhood clusters.
```{r [Sp19]}
DimPlot(NBHDByCTSeurat, group.by = c("cellType"), cols = "polychrome", reduction = "umap")
DimPlot(NBHDByCTSeurat, group.by = c("neighbourhood_clusters"), cols = "polychrome", reduction = "umap")
```


We can now add information on neighbourhood clusters to our original
Xenium object and visualise these on the tissue.


```{r [Sp20]}
smallXenium$NBHDCluster = NBHDByCTSeurat@active.ident
ImageDimPlot(smallXenium, group.by = "NBHDCluster", size = 1, cols = "polychrome")
```

Let's try the same thing with our extended neighbourhoods up to degree 4.
```{r [Sp21]}
NBHDByCTSeuratExtended = computeNBHDVsCTObject(NBHDByCTMatrixExtended,
                                               verbose=FALSE)
```
Add cell type information to the NBHDByCTSeuratExtended object.
```{r [Sp22]}
NBHDByCTSeuratExtended$cellType = clusters
```

Visualise extended neighbourhood clusters.
```{r [Sp23]}
DimPlot(NBHDByCTSeuratExtended, group.by = c("cellType"), cols = "polychrome", reduction = "umap")
DimPlot(NBHDByCTSeuratExtended, group.by = c("neighbourhood_clusters"), cols = "polychrome", reduction = "umap")
```

We can now add information on extended neighbourhood clusters to our
original Xenium object and visualise these on the tissue.

```{r [Sp24]}
smallXenium$NBHDClusterExtended= 
  NBHDByCTSeuratExtended@active.ident
ImageDimPlot(smallXenium, group.by = c("NBHDClusterExtended"), 
             size = 1, cols = "polychrome")
```


Here we retrieve fewer clusters, and these describe tissue architecture 
rather than small variations in cellular niche.  

We leave it to the user to decide which approach is most applicable to the 
biological question at hand.


## Relating cell type clusters to neighbourhood clusters

We can now ask how clusters defined transcriptomically (cell type)
relate to those defined based on neighbourhoods (cell niche).  Put
differently, each cell has its (transcriptomic) cell type and sits at
the center of a neighbourhood with a given neighbourhood type.  For
each cell type A and each neighbourhood type B, we can ask what
percentage of the time a cell of type A sits at the center of a
neighbourhood of type B.

```{r [Sp25]}
CTByNBHDCluster = table(NBHDByCTSeurat$cellType,NBHDByCTSeurat@active.ident)
CTByNBHDCluster = CTByNBHDCluster/rowSums(CTByNBHDCluster)

rownames(CTByNBHDCluster) = paste0("CellType",rownames(CTByNBHDCluster))
colnames(CTByNBHDCluster) = paste0("NBHDCluster",colnames(CTByNBHDCluster))

pheatmap(CTByNBHDCluster,
      fontsize_row=8,
      fontsize_col=8,
      cellheight=10,
      cellwidth=10)

sankeyFromMatrix(CTByNBHDCluster)
```

This allows us to see which cell types share the same niche (neighbourhood 
clusters). Let's also perform this analysis using the extended neighbourhoods. 


```{r [Sp26]}
CTByNBHDClusterExtended = table(NBHDByCTSeuratExtended$cellType,NBHDByCTSeuratExtended@active.ident)
CTByNBHDClusterExtended = CTByNBHDClusterExtended/rowSums(CTByNBHDClusterExtended)

rownames(CTByNBHDClusterExtended) = paste0("CellType",rownames(CTByNBHDClusterExtended))
colnames(CTByNBHDClusterExtended) = paste0("NBHDCluster",colnames(CTByNBHDClusterExtended))

pheatmap(CTByNBHDClusterExtended,
      fontsize_row=8,
      fontsize_col=8,
      cellheight=10,
      cellwidth=10)

sankeyFromMatrix(CTByNBHDClusterExtended)
```

## Analysing cell types based on their neighbourhoods

Now we perform dimensionality reduction and clustering of cell types, based on the 
neighbourhoods they are found in. Note that this is based on the transpose of 
the matrix used in the previous section.

First we create a Seurat object for a cell type by neighbourhood matrix (the 
transposed NBHDByCTMatrix). Here we have a 
large feature space (number of cells) however a low number of observations 
(number of cell types). Therefore we compute the UMAP using PCs, however due to 
the low number of observations we need to set a lower value for n.neighbours.


```{r [Sp27]}
CTByNBHDSeurat = 
  computeNBHDVsCTObject(t(NBHDByCTMatrix), npcs = 10, 
                        transpose = TRUE, resolution = 1, n.neighbors = 5,
			verbose=FALSE)

CTByNBHDSeurat$cellType = colnames(CTByNBHDSeurat)

DimPlot(CTByNBHDSeurat, group.by = "cellType", cols = "polychrome", 
        reduction = "umap", label = TRUE)
```

Note how Pyramidal, Pvalb and Lamp5 neurons are located close together
in UMAP space indicating they are often found in the same
neighbourhoods.

We can also compute a force-directed graph embedding.

```{r [Sp28]}
CTByNBHDSeurat= computeGraphEmbedding(CTByNBHDSeurat)

DimPlot(CTByNBHDSeurat,group.by = "cellType", cols = "alphabet", reduction = "graph", label = TRUE)
```

From this  11_oligo and 5_Astro appear to be in neighbourhoods distinct from the 
other cell types. Looking at cell type localisation in the tissue shows that 
they colocalise.
```{r [Sp29]}
ImageDimPlot(smallXenium, cols = "polychrome", size = 1)
```


Clustering cell types using the Louvain algorithm acheives only one cluster. 
Let's try hierachical clustering.
```{r [Sp30]}
pca = Embeddings(CTByNBHDSeurat, reduction = "pca")
res = pheatmap(pca)
```

Here we decide to cut the tree at the correct level to form 11 clusters.
```{r [Sp31]}
CTClust = cutree(res$tree_row, k = 11)
CTByNBHDSeurat$neighbourhood_clusters = factor(CTClust)
```

Let's look at the celltype composition of these clusters.
```{r [Sp32]}
CTComposition = table(CTByNBHDSeurat$cellType, CTByNBHDSeurat$neighbourhood_clusters)
pheatmap(CTComposition)
```

Although most clusters here are formed of a single cell type, cluster 2
contains several cell types. We anticipate this analysis may be more informative 
when performed on a larger tissue area, as some cell types are poorly 
represented in this subset. To this end we demonstrate how to explore the 
relationship between neighbourhood clusters and cell type clusters.
```{r [Sp33]}
averageExpMatrix = getAverageExpressionMatrix(NBHDByCTSeurat,
                           CTByNBHDSeurat,
			    clusteringName='neighbourhood_clusters')
averageExpMatrix = tagRowAndColNames(averageExpMatrix,
                                     ccTag='neighbourhoodClusters_',
                                     gcTag='cellTypeClusters_')
pheatmap(averageExpMatrix,
      fontsize_row=8,
      fontsize_col=8,
      cellheight=10,
      cellwidth=10)


sankeyFromMatrix(averageExpMatrix)
```

## Detection of genes with spatially variable expression.

Given a neighbour graph describing the spatial relationships between
cells, we can compute spatial autocorrelation (here we use Moran's
I). This describes how clustered gene expression values are in
space. We derive an upper tail p value based on permutation testing
where expression values for genes are permuted. N.B in this
implementation equal weights are given to all neighbours of a cell.
Minimum values are limited by nSim, the number of permutations used.

```{r [Sp34]}
moransI = runMoransI(smallXenium, delaunayNeighbours, assay = "SCT", 
                     layer = "data", nSim = 20, verbose = FALSE)
```
Look at most spatially autocorrelated genes.
```{r [Sp35]}
head(moransI)
```

Look at least spatially autocorrelated genes.
```{r [Sp36]}
tail(moransI)
```

Here we visualise the most spatially autocorrelated gene.
```{r [Sp37]}
ImageFeaturePlot(smallXenium, "Nwd2")
```

Here we visualise the least spatially autocorrelated gene.
```{r [Sp38]}
ImageFeaturePlot(smallXenium, "Trbc2")
```

## Ligand receptor analysis

We can perform an analysis of ligand receptor interactions in order to
infer communication between cells.  Here we calculate whether
interactions between ligand receptor pairs occur on edges of the
spatial graph, i.e. between neighbouring cells, where cell A expresses
a ligand and cell B expresses a receptor. Note that the distance
between cell A and cell B will vary depending on how the graph has
been constructed, larger distances may be desired for analyses
involving diffusable ligands. We also analyse the number
of ligand receptor interactions that occur between cells from the same
cluster and between cells from different clusters. Note that clusters
may represent cell type, or another property of the cells such as
their neighbourhood type. We leave this for the user to decide. We
then calculate how frequently the number of ligand-receptor
interactions observed between/within clusters is higher than in
simulated data - this allows us to assign a pvalue for the enrichment
of these interactions.


```{r [Sp39]}

## Running this example takes a while:
## ligandReceptorResults = performLigandReceptorAnalysis(smallXenium, delaunayNeighbours, 
##                                                "mouse", clusters,verbose=FALSE)
## Accordingly, we retrieve precomputed results:
ligandReceptorResults = getExample('ligandReceptorResults')
```


We look at interactions on edges.  For concision, we display the
first six of our 28 ligand-receptor pairs.  Since ligand-receptor
interactions are directed, each undirected edge will show up both as
cell A - cell B and as cell B - cell A.
```{r [Sp40]}
head(ligandReceptorResults$interactionsOnEdges[,1:10],10)
```

Look at total interactions between/within clusters:
```{r [Sp41]}
head(ligandReceptorResults$totalInteractionsByCluster[,1:10],10)
```

Look at mean interactions per edge between/within clusters (sum of ligand 
receptor interactions on A-B edges)/(total A-B edges):
```{r [Sp42]}
head(ligandReceptorResults$meanInteractionsByCluster[,1:10],10)
```

Look at number of times observed interactions are more frequent than expected:
```{r [Sp43]}
head(ligandReceptorResults$simResults[,1:10],10)
```

Look at pvalues (upper tail) for the enrichment of interactions.
```{r [Sp44]}
head(ligandReceptorResults$pValues,10)
```


Plot a heatmap showing -log10(pvalues) for the enrichment of ligand-receptor 
interactions between pairs of clusters. 

```{r [Sp45]}
ligRecMatrix = makeLRInteractionHeatmap(ligandReceptorResults, clusters, colours = colours, labelClusterPairs = FALSE)
```

Make a heatmap of total ligand-receptor interactions between clusters.
```{r [Sp46]}
cellTypePerCellTypeLigRecMatrix = makeSummedLRInteractionHeatmap(ligandReceptorResults, clusters, "total")
```

Make a heatmap of summed mean ligand/receptor interactions between
clusters i.e., (sum of all ligand receptor interactions on A-B edges)/(total
A-B edges)

```{r [Sp47]}
cellTypePerCellTypeLigRecMatrix = makeSummedLRInteractionHeatmap(ligandReceptorResults, clusters, "mean", logScale = TRUE)
```

Look at a histogram of values for summed mean ligand/receptor interactions between clusters.
```{r [Sp48]}
hist(cellTypePerCellTypeLigRecMatrix)
```

Visualise in graph format.
```{r [Sp49]}
cellTypesPerCellTypeGraphFromCellMatrix(cellTypePerCellTypeLigRecMatrix, 
                                    minWeight = 0.4, colours = colours)
```
Here the arrows are very thick - we can scale by a constant factor to improve the
presentation.

```{r [Sp50]}
scaleFactor = 3
cellTypesPerCellTypeGraphFromCellMatrix(cellTypePerCellTypeLigRecMatrix/scaleFactor, 
                                    minWeight = 0.4/scaleFactor, colours = colours)
```

### Visualising ligand-receptor interactions: Seurat objects of edges

We wish to create a Seurat object where the points are edges between
cells as a way of visualising ligand-receptor interactions.  Since
these interactions are asymmetric we need to consider directed edges.
We will place the spatial coordinates of the edges A-B and B-A
slightly separate but both between the centroids of cells A and B. The
"expression matrix" is the binarised presence/absence of an
interaction (ligand receptor pair) on an edge. This is useful for
visualising where ligand receptor interactions occur spatially.

```{r [Sp51]}
edgeSeurat = computeEdgeObject(ligandReceptorResults, centroids)
```


For example we can visualise the Pdyn-Npy2r interaction.
```{r [Sp52]}
ImageFeaturePlot(edgeSeurat, features = "Pdyn-Npy2r")
```

In principle, the edge Seurat object allows for the computation of an
edge UMAP and Louvain clustering of the edges based on ligand-receptor
interaction.  However some of these analyses are computationally
expensive due to large number of edges.


### Spatial autocorrelation of ligand-receptor interactions

We can compute a spatial graph where edges in the original delaunayNeighbours 
become nodes and A-B edges (in the original graph) become connected to all 
A- edges and all B- edges. This allows us to perfom graph-based
analyses associated with 
the spatial localisation of ligand receptor pairs on edges.  
```{r [Sp53]}
edgeNeighbours = computeEdgeGraph(delaunayNeighbours)
```

Compute Moran's I for the spatial autocorrelation of ligand-receptor interactions.
```{r [Sp54] , eval=FALSE}
moransILigandReceptor = runMoransI(edgeSeurat, edgeNeighbours, assay = "RNA", 
                     layer = "counts", nSim = 100)
```

View most spatially autocorrelated ligand-receptor interactions.
```{r [Sp55]}
moransILigandReceptor = getExample('moransILigandReceptor')
head(moransILigandReceptor)
```

View least spatially autocorrelated ligand-receptor interactions.
```{r [Sp56]}
tail(moransILigandReceptor)
```


View ligand-receptor interaction with the highest spatial autocorrelation (Moran's I).
```{r [Sp57]}
ImageFeaturePlot(edgeSeurat, "Penk-Htr1f")
```

View ligand-receptor interaction with the lowest spatial autocorrelation (Moran's I).
```{r [Sp58]}
ImageFeaturePlot(edgeSeurat, "Sst-Gpr17")
```

## Quality control of Delaunay neighbours

We use Delaunay triangulation to estimate neighbour to neighbour
contact between cells.  However, a cell which happens to sit next to a
tissue void is likely to have one or more Delaunay neighbours which
are cells that sit on the other side of this void.  We hope to detect
these spurious edges based on their lengths.  However, the distances
between neighbouring cells' centroids tend to depend on the types of
the neighbouring cells.  Accordingly, we have supplied functions to
annotate edges with their distances and cell types; compute estimates
for edge-length cutoffs based on cell type pair; plot the
distributions of lengths together with proposed cutoffs; and subset
the edges based on cutoffs.

### edgeLengthsAndCellTypePairs

This functions annotates edges with their lengths and the cell types
of the two cells in question.

```{r [Sp59]}
annEdges =
edgeLengthsAndCellTypePairs(delaunayNeighbours,clusters,centroids)
head(annEdges)
```

### Visualising the edge lengths

Here we visualise the distribution of edge lengths together with
proposed cutoffs.  We will say more in a moment about how these
proposed cutoffs are calculated.

```{r [Sp60]}
cutoffDF = edgeCutoffsByPercentile(annEdges,percentileCutof=95)
g = edgeLengthPlot(annEdges,cutoffDF,whichPairs=60)
print(g)
```

In this case, there are 111 types of edges based on the cell types of
the cells they connect.  By setting whichPairs=60, we are restricting
to those types for which there are at least 60 such edges.  Here we
are also using a default xlim of 100.  Passing the value NULL for
cutoffDF will produce a plot without these cutoffs.

### Estimating edge length cutoffs

We offer several methods for estimating edge length cutoffs.  We
recommend visual inspection before settling on any specific set of
results. Each of these returns a data frame with two columns,
cellTypePair and cutoff.  As we have just seen this can be used for
the plotting function and it can also be used for culling the edges.
Since these results are returned as a data frame, they can be tuned
"by hand" by modifying the cutoff value for any individual cell type
pair.

#### edgeCutoffsByClustering

```{r [Sp61]}
cutoffDF = edgeCutoffsByClustering(annEdges)
```


This method works well in a small percentage of cases.  For each cell
type pair it performs k-means clustering of the lengths with k=2 and
produces a cutoff halfway between the two clusters.  This works well
when there really are two clusters of edges, those between truly
neighbouring cells and those across tissue voids.  However, in the
case where all the edges are between neighbouring cells, this will
produce a spurious cutoff.

#### edgeCutoffsByPercentile


```{r [Sp62]}
cutoffDF = edgeCutoffsByPercentile(annEdges,percentileCutoff=95)
```


This method produces cutoffs based on a chosen percentile applied
across all cell type pairs.  It has the advantages that its results
are easily comprehensible in terms of the chosen parameter.  The user
my want to "mix and match" using different percentiles for different
cell type pairs.

#### edgeCutoffsByZScore


```{r [Sp63]}
cutoffDF = edgeCutoffsByZScore(annEdges,zCutoff=1.5)
```


Very similar to the previous method, using z-score instead of
percentile.

#### edgeCutoffsByWatershed


```{r [Sp64]}
cutoffDF = edgeCutoffsByWatershed(annEdges,nbins=15,tolerance=10)
```

For each cell type pair, this function first computes a histogram of
the distances.  It then uses watershed segmentation to discover the
"humps" of this histogram.  The most important of these will be the
one containing the median of the distances.  This reports as a cutoff
the midpoint of the first histogram bin which is past this main hump.
The parameter nbins specifies the number of bins for the histogram.
The parameter tolerance controls the sensitivity of the watershed
algorithm.  

#### cullEdges

```{r [Sp65]}
cutoffDF = edgeCutoffsByWatershed(annEdges,nbins=15,tolerance=10)
culledEdges = cullEdges(annEdges,cutoffDF)
nrow(annEdges)
nrow(culledEdges)
```

This subsets the edges according to the parameter cutoffSpec.  This
can be a cutoff data frame as produced by any of the edgeCutoff
functions or can be a single number in which case all the cell type
pairs will be trimmed to the same cutoff.

```{r [Sp66]}
sessionInfo()
```