--- title: "CatsCradle Spatial Vignette" author: "Anna Laddach and Michael Shapiro" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{CatsCradle Spatial Vignette} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r [Sp1] setup, include = FALSE, warning = FALSE} knitr::opts_chunk$set( collapse = TRUE, fig.dim = c(6,6), comment = "#>" ) ``` `r BiocStyle::Biocpkg("BiocStyle")` ![](CatsCradleLogo.png){width=2in} ## Introduction Here we describe the tools that CatsCradle offers for exploiting the spatial relationships in spatial transcriptomics data. We will be using a subset of a Xenium data set that profiles the mouse hippocampus available from 10x genomics (https://www.10xgenomics.com/datasets/fresh-frozen-mouse-brain-for-xenium-explorer-demo-1-standard) Here we visualise this subset coloured by cell type. Here cell clusters (Louvain cluster) have been assigned cell type identities using RCTD along with a reference dataset (https://www.dropbox.com/s/cuowvm4vrf65pvq/allen_cortex.rds?dl=1) followed by minimal manual curation. Please note these assignments are not definitive and are for illustratory purporses only. ```{r [Sp2], message=FALSE} library(Seurat,quietly=TRUE) library(CatsCradle,quietly=TRUE) getExample = make.getExample() smallXenium = getExample('smallXenium') ImageDimPlot(smallXenium, cols = "polychrome", size = 1) ``` We will want to answer questions like the following: Is the expression of a particular gene localised? Who are the immediate neighbours of a given cell? Do certain types of cells tend to co-localise? Is the spatial association of two cell types, (say, glia and macrophages) statistically significant? For any pair of neighbouring cells, are they engaged in ligand-repceptor interactions? Can we classify different types of tissue neighbourhoods? Are different neighbourhood types characterised by the cell types found in them? The genes expressed in them? The ligand-receptor interactions taking place in them? ### Neighbourhoods A key concept here is that of a _neighbourhood_. A neighbourhood is a spatially contiguous set of cells. In every case we will consider, a neighbourhood is a set of cells _centred on a particular cell_. The simplest sort of neighbourhood consists of a cell together with its immediate neighbours. We compute these as a Delaunay triangulation using the centroids of the cells: ```{r [Sp3]} centroids = GetTissueCoordinates(smallXenium) rownames(centroids) = centroids$cell clusters = smallXenium@active.ident delaunayNeighbours = computeNeighboursDelaunay(centroids) head(delaunayNeighbours) ``` Two cells are neighbours if they appear on the same row. The neighbourhood of the cell 16307 consists of the following 7 cells: ```{r [Sp4]} idx = (delaunayNeighbours$nodeA == 16307 | delaunayNeighbours$nodeB == 16307) nbhd = unique(c(delaunayNeighbours$nodeA[idx], delaunayNeighbours$nodeB[idx])) nbhd ``` We can compute extended neighbourhoods by considering (say) each cell's neighbours, their neighbours, their neighbours and their neighbours. Here we have such an extended neighbourhood as a list and again as a data frame with each row giving a pair of cells in a common extended neighbourhood. In this case we are building an extended neighbourhood of combinatorial radius 4. ```{r [Sp5]} extendedNeighboursList = getExtendedNBHDs(delaunayNeighbours, 4) extendedNeighbours = collapseExtendedNBHDs(extendedNeighboursList, 4) ``` Now the extended neighbourhood of cell 16307 consists of 92 cells: ```{r [Sp6]} idx = (extendedNeighbours$nodeA == 16307 | extendedNeighbours$nodeB == 16307) nbhd = unique(c(extendedNeighbours$nodeA[idx], extendedNeighbours$nodeB[idx])) length(nbhd) ``` We can also create neighbours based on Euclidean distance in the tissue. ```{r [Sp7]} euclideanNeighbours = computeNeighboursEuclidean(centroids, threshold=20) ``` We see two reasons for looking at these extended neighbourhoods. One is that while the Delaunay triangulation might be more appropriate for looking at cell to cell interactions involving direct contact, extended neighbourhoods might be more appropriate for studying interactions based on diffusible ligands. The second is that when we go on to characterise _types_ of neighbourhoods, these extended neighbourhoods exhibit less noise. We offer two viewpoints as to what is "going on" in a neighbourhood. One is that a neighbourhood is characterised by the cell types that are found in it. In this view, just as a cell expresses genes, a neighbourhood "expresses" cell types. A second viewpoint is that a neighbourhood also expresses genes. That is, for each neighbourhood, we can compute the total gene expression across all the cells in that neighbourhood. In this vignette we will focus on the first viewpoint. This is not because we think the first viewpoint is more important or because we necessarily expect it to be more fruitful. It's because the second viewpoint so directly parallels standard Seurat single cell transcriptomics analyses that it requires less explanation. So before we move on to the neighbourhoods / cell types viewpoint, we give an overview of the neighbourhoods / aggregate gene expression viewpoint. ### Neighbourhoods as characterised by gene expression Here we create a Seurat object based on the aggregate gene expression in each of our extended neighbourhoods and display the clustering of the neighbourhoods (called aggregation_clusters) both on the tissue plot and on the resulting UMAP. ```{r [Sp8]} agg = aggregateGeneExpression(smallXenium,extendedNeighbours, verbose=FALSE) smallXenium$aggregateNBHDClusters = agg@active.ident ImageDimPlot(smallXenium,group.by='aggregateNBHDClusters',cols='polychrome') ``` Since neighbourhoods here are indexed by individual cells, the neighbourhood gene aggregation Seurat object is formally identical to a standard Seurat object. In particular, this allows the standard sorts of analyses including clustering of neighbourhoods into types; dimension reduction using PCA, UMAP, tSNE; discovery of marker genes for each neighbourhood type; plotting of aggregate gene expression on neighbourhood UMAP; use of CatsCradle to discover novel clustering of genes based on their expression across the different neighbourhoods. ## Neighbourhoods as characterised by cell types ### Calculation of neighbourhood celltype composition Given any of the spatial graphs describing neighbourhoods calculated above, we can now calculate the cell type composition of neighbourhoods. ```{r [Sp9]} NBHDByCTMatrix = computeNBHDByCTMatrix(delaunayNeighbours, clusters) ``` In the resulting matrix neighbourhoods are rows and cell types are columns. The values in the matrix indicate the number of cells of a given type within a neighbourhood. Let's do the same for our extended neighbourhoods. ```{r [Sp10]} NBHDByCTMatrixExtended = computeNBHDByCTMatrix(extendedNeighbours, clusters) ``` ### Analysis of contact based interactions between cell types We can go on to calculate a matrix which gives the fraction of contacts cells of a given type make with cells of another cell type. ```{r [Sp11]} cellTypesPerCellTypeMatrix = computeCellTypesPerCellTypeMatrix(NBHDByCTMatrix,clusters) ``` Rows and columns both correspond to cell types, but they are playing different roles. For a given row (say, cell type A) the entry in column B represents the fraction of cells in neighbourhoods of cells of type A that are of type B. We can then display this as a force directed graph. Here we choose only to display contact based interactions that constitute at least 5% of a cell type's interactions. Of note, this graph is directed as, for example, 50% of cell type A's interactions might be with cell type B, but only 5% of cell type B's interactions might be with cell type A. ```{r [Sp12]} colours = DiscretePalette(length(levels(clusters)), palette = "polychrome") names(colours) = levels(clusters) cellTypesPerCellTypeGraphFromCellMatrix(cellTypesPerCellTypeMatrix, minWeight = 0.05, colours = colours) ``` Here arrows are directed from rows to columns. Thus, we see an arrow from 12_Pvalb to 18_pyramidal because neighbourhoods of cells of type 12_Pvalb are composed (across the dataset) of ~15% cells of type 18_pyramidal. We do not see an arrow from 18_pyramidal to 12_Pvalb because neighbourhoods of cells of type 18_pyramidal have only 3% cells of type 12_Pvalb, which falls below the chosen cutoff. It's worth pointing out the following. The number of edges from cells of type A to cells of type B is the same as the number of edges from cells of type B to cells of type A. Thus the matrix of counts of these edges is symmetric. However, numbers of cells of types A and B are not necessarily equal, and this accounts for the assymmetry in the fractions. ```{r [Sp13], message=FALSE} library(pheatmap,quietly=TRUE) pheatmap(cellTypesPerCellTypeMatrix) ``` Let's do the same for our extended nighbourhoods. ```{r [Sp14]} cellTypesPerCellTypeMatrixExtended = computeCellTypesPerCellTypeMatrix(NBHDByCTMatrixExtended,clusters) cellTypesPerCellTypeGraphFromCellMatrix(cellTypesPerCellTypeMatrixExtended, minWeight = 0.05, colours = colours) ``` We can also calculate p values (upper tail, one-sided) for whether cell types are more commonly neighbours than expected by chance. To do this we compare the actual spatial neighbour graph to randomised spatial neighbour graphs where edges are randomised but the degree of each vertice is preserved. As this is carried out on the level of counts of undirected edges between celltypes the p value matrix is symmetric. ```{r [Sp15]} cellTypesPerCellTypePValues = computeNeighbourEnrichment(delaunayNeighbours, clusters, nSim=100,verbose = FALSE) ``` Let's plot -log10(pvalue) ```{r [Sp16]} cellTypesPerCellTypePValuesNegLog = -log10(cellTypesPerCellTypePValues) pheatmap(cellTypesPerCellTypePValuesNegLog) ``` ## Analysis of neighbourhoods based on cell type composition As mentioned above we can create Seurat objects of neighbourhoods where the underlying counts matrix gives the the number of cells of each type in a given neighbourhood. We can now perform dimensionality reduction and clustering based on neighbourhood composition. As the dimensionality of the feature space is relatively low (number of cell types) we calculate the UMAP using features rather than PCs. ```{r [Sp17]} NBHDByCTSeurat = computeNBHDVsCTObject(NBHDByCTMatrix,verbose=FALSE) ``` Add cell type information to the neighbourhoodSeurat object. ```{r [Sp18]} NBHDByCTSeurat$cellType = clusters ``` Visualise neighbourhood clusters. ```{r [Sp19]} DimPlot(NBHDByCTSeurat, group.by = c("cellType"), cols = "polychrome", reduction = "umap") DimPlot(NBHDByCTSeurat, group.by = c("neighbourhood_clusters"), cols = "polychrome", reduction = "umap") ``` We can now add information on neighbourhood clusters to our original Xenium object and visualise these on the tissue. ```{r [Sp20]} smallXenium$NBHDCluster = NBHDByCTSeurat@active.ident ImageDimPlot(smallXenium, group.by = "NBHDCluster", size = 1, cols = "polychrome") ``` Let's try the same thing with our extended neighbourhoods up to degree 4. ```{r [Sp21]} NBHDByCTSeuratExtended = computeNBHDVsCTObject(NBHDByCTMatrixExtended, verbose=FALSE) ``` Add cell type information to the NBHDByCTSeuratExtended object. ```{r [Sp22]} NBHDByCTSeuratExtended$cellType = clusters ``` Visualise extended neighbourhood clusters. ```{r [Sp23]} DimPlot(NBHDByCTSeuratExtended, group.by = c("cellType"), cols = "polychrome", reduction = "umap") DimPlot(NBHDByCTSeuratExtended, group.by = c("neighbourhood_clusters"), cols = "polychrome", reduction = "umap") ``` We can now add information on extended neighbourhood clusters to our original Xenium object and visualise these on the tissue. ```{r [Sp24]} smallXenium$NBHDClusterExtended= NBHDByCTSeuratExtended@active.ident ImageDimPlot(smallXenium, group.by = c("NBHDClusterExtended"), size = 1, cols = "polychrome") ``` Here we retrieve fewer clusters, and these describe tissue architecture rather than small variations in cellular niche. We leave it to the user to decide which approach is most applicable to the biological question at hand. ## Relating cell type clusters to neighbourhood clusters We can now ask how clusters defined transcriptomically (cell type) relate to those defined based on neighbourhoods (cell niche). Put differently, each cell has its (transcriptomic) cell type and sits at the center of a neighbourhood with a given neighbourhood type. For each cell type A and each neighbourhood type B, we can ask what percentage of the time a cell of type A sits at the center of a neighbourhood of type B. ```{r [Sp25]} CTByNBHDCluster = table(NBHDByCTSeurat$cellType,NBHDByCTSeurat@active.ident) CTByNBHDCluster = CTByNBHDCluster/rowSums(CTByNBHDCluster) rownames(CTByNBHDCluster) = paste0("CellType",rownames(CTByNBHDCluster)) colnames(CTByNBHDCluster) = paste0("NBHDCluster",colnames(CTByNBHDCluster)) pheatmap(CTByNBHDCluster, fontsize_row=8, fontsize_col=8, cellheight=10, cellwidth=10) sankeyFromMatrix(CTByNBHDCluster) ``` This allows us to see which cell types share the same niche (neighbourhood clusters). Let's also perform this analysis using the extended neighbourhoods. ```{r [Sp26]} CTByNBHDClusterExtended = table(NBHDByCTSeuratExtended$cellType,NBHDByCTSeuratExtended@active.ident) CTByNBHDClusterExtended = CTByNBHDClusterExtended/rowSums(CTByNBHDClusterExtended) rownames(CTByNBHDClusterExtended) = paste0("CellType",rownames(CTByNBHDClusterExtended)) colnames(CTByNBHDClusterExtended) = paste0("NBHDCluster",colnames(CTByNBHDClusterExtended)) pheatmap(CTByNBHDClusterExtended, fontsize_row=8, fontsize_col=8, cellheight=10, cellwidth=10) sankeyFromMatrix(CTByNBHDClusterExtended) ``` ## Analysing cell types based on their neighbourhoods Now we perform dimensionality reduction and clustering of cell types, based on the neighbourhoods they are found in. Note that this is based on the transpose of the matrix used in the previous section. First we create a Seurat object for a cell type by neighbourhood matrix (the transposed NBHDByCTMatrix). Here we have a large feature space (number of cells) however a low number of observations (number of cell types). Therefore we compute the UMAP using PCs, however due to the low number of observations we need to set a lower value for n.neighbours. ```{r [Sp27]} CTByNBHDSeurat = computeNBHDVsCTObject(t(NBHDByCTMatrix), npcs = 10, transpose = TRUE, resolution = 1, n.neighbors = 5, verbose=FALSE) CTByNBHDSeurat$cellType = colnames(CTByNBHDSeurat) DimPlot(CTByNBHDSeurat, group.by = "cellType", cols = "polychrome", reduction = "umap", label = TRUE) ``` Note how Pyramidal, Pvalb and Lamp5 neurons are located close together in UMAP space indicating they are often found in the same neighbourhoods. We can also compute a force-directed graph embedding. ```{r [Sp28]} CTByNBHDSeurat= computeGraphEmbedding(CTByNBHDSeurat) DimPlot(CTByNBHDSeurat,group.by = "cellType", cols = "alphabet", reduction = "graph", label = TRUE) ``` From this 11_oligo and 5_Astro appear to be in neighbourhoods distinct from the other cell types. Looking at cell type localisation in the tissue shows that they colocalise. ```{r [Sp29]} ImageDimPlot(smallXenium, cols = "polychrome", size = 1) ``` Clustering cell types using the Louvain algorithm acheives only one cluster. Let's try hierachical clustering. ```{r [Sp30]} pca = Embeddings(CTByNBHDSeurat, reduction = "pca") res = pheatmap(pca) ``` Here we decide to cut the tree at the correct level to form 11 clusters. ```{r [Sp31]} CTClust = cutree(res$tree_row, k = 11) CTByNBHDSeurat$neighbourhood_clusters = factor(CTClust) ``` Let's look at the celltype composition of these clusters. ```{r [Sp32]} CTComposition = table(CTByNBHDSeurat$cellType, CTByNBHDSeurat$neighbourhood_clusters) pheatmap(CTComposition) ``` Although most clusters here are formed of a single cell type, cluster 2 contains several cell types. We anticipate this analysis may be more informative when performed on a larger tissue area, as some cell types are poorly represented in this subset. To this end we demonstrate how to explore the relationship between neighbourhood clusters and cell type clusters. ```{r [Sp33]} averageExpMatrix = getAverageExpressionMatrix(NBHDByCTSeurat, CTByNBHDSeurat, clusteringName='neighbourhood_clusters') averageExpMatrix = tagRowAndColNames(averageExpMatrix, ccTag='neighbourhoodClusters_', gcTag='cellTypeClusters_') pheatmap(averageExpMatrix, fontsize_row=8, fontsize_col=8, cellheight=10, cellwidth=10) sankeyFromMatrix(averageExpMatrix) ``` ## Detection of genes with spatially variable expression. Given a neighbour graph describing the spatial relationships between cells, we can compute spatial autocorrelation (here we use Moran's I). This describes how clustered gene expression values are in space. We derive an upper tail p value based on permutation testing where expression values for genes are permuted. N.B in this implementation equal weights are given to all neighbours of a cell. Minimum values are limited by nSim, the number of permutations used. ```{r [Sp34]} moransI = runMoransI(smallXenium, delaunayNeighbours, assay = "SCT", layer = "data", nSim = 20, verbose = FALSE) ``` Look at most spatially autocorrelated genes. ```{r [Sp35]} head(moransI) ``` Look at least spatially autocorrelated genes. ```{r [Sp36]} tail(moransI) ``` Here we visualise the most spatially autocorrelated gene. ```{r [Sp37]} ImageFeaturePlot(smallXenium, "Nwd2") ``` Here we visualise the least spatially autocorrelated gene. ```{r [Sp38]} ImageFeaturePlot(smallXenium, "Trbc2") ``` ## Ligand receptor analysis We can perform an analysis of ligand receptor interactions in order to infer communication between cells. Here we calculate whether interactions between ligand receptor pairs occur on edges of the spatial graph, i.e. between neighbouring cells, where cell A expresses a ligand and cell B expresses a receptor. Note that the distance between cell A and cell B will vary depending on how the graph has been constructed, larger distances may be desired for analyses involving diffusable ligands. We also analyse the number of ligand receptor interactions that occur between cells from the same cluster and between cells from different clusters. Note that clusters may represent cell type, or another property of the cells such as their neighbourhood type. We leave this for the user to decide. We then calculate how frequently the number of ligand-receptor interactions observed between/within clusters is higher than in simulated data - this allows us to assign a pvalue for the enrichment of these interactions. ```{r [Sp39]} ## Running this example takes a while: ## ligandReceptorResults = performLigandReceptorAnalysis(smallXenium, delaunayNeighbours, ## "mouse", clusters,verbose=FALSE) ## Accordingly, we retrieve precomputed results: ligandReceptorResults = getExample('ligandReceptorResults') ``` We look at interactions on edges. For concision, we display the first six of our 28 ligand-receptor pairs. Since ligand-receptor interactions are directed, each undirected edge will show up both as cell A - cell B and as cell B - cell A. ```{r [Sp40]} head(ligandReceptorResults$interactionsOnEdges[,1:10],10) ``` Look at total interactions between/within clusters: ```{r [Sp41]} head(ligandReceptorResults$totalInteractionsByCluster[,1:10],10) ``` Look at mean interactions per edge between/within clusters (sum of ligand receptor interactions on A-B edges)/(total A-B edges): ```{r [Sp42]} head(ligandReceptorResults$meanInteractionsByCluster[,1:10],10) ``` Look at number of times observed interactions are more frequent than expected: ```{r [Sp43]} head(ligandReceptorResults$simResults[,1:10],10) ``` Look at pvalues (upper tail) for the enrichment of interactions. ```{r [Sp44]} head(ligandReceptorResults$pValues,10) ``` Plot a heatmap showing -log10(pvalues) for the enrichment of ligand-receptor interactions between pairs of clusters. ```{r [Sp45]} ligRecMatrix = makeLRInteractionHeatmap(ligandReceptorResults, clusters, colours = colours, labelClusterPairs = FALSE) ``` Make a heatmap of total ligand-receptor interactions between clusters. ```{r [Sp46]} cellTypePerCellTypeLigRecMatrix = makeSummedLRInteractionHeatmap(ligandReceptorResults, clusters, "total") ``` Make a heatmap of summed mean ligand/receptor interactions between clusters i.e., (sum of all ligand receptor interactions on A-B edges)/(total A-B edges) ```{r [Sp47]} cellTypePerCellTypeLigRecMatrix = makeSummedLRInteractionHeatmap(ligandReceptorResults, clusters, "mean", logScale = TRUE) ``` Look at a histogram of values for summed mean ligand/receptor interactions between clusters. ```{r [Sp48]} hist(cellTypePerCellTypeLigRecMatrix) ``` Visualise in graph format. ```{r [Sp49]} cellTypesPerCellTypeGraphFromCellMatrix(cellTypePerCellTypeLigRecMatrix, minWeight = 0.4, colours = colours) ``` Here the arrows are very thick - we can scale by a constant factor to improve the presentation. ```{r [Sp50]} scaleFactor = 3 cellTypesPerCellTypeGraphFromCellMatrix(cellTypePerCellTypeLigRecMatrix/scaleFactor, minWeight = 0.4/scaleFactor, colours = colours) ``` ### Visualising ligand-receptor interactions: Seurat objects of edges We wish to create a Seurat object where the points are edges between cells as a way of visualising ligand-receptor interactions. Since these interactions are asymmetric we need to consider directed edges. We will place the spatial coordinates of the edges A-B and B-A slightly separate but both between the centroids of cells A and B. The "expression matrix" is the binarised presence/absence of an interaction (ligand receptor pair) on an edge. This is useful for visualising where ligand receptor interactions occur spatially. ```{r [Sp51]} edgeSeurat = computeEdgeObject(ligandReceptorResults, centroids) ``` For example we can visualise the Pdyn-Npy2r interaction. ```{r [Sp52]} ImageFeaturePlot(edgeSeurat, features = "Pdyn-Npy2r") ``` In principle, the edge Seurat object allows for the computation of an edge UMAP and Louvain clustering of the edges based on ligand-receptor interaction. However some of these analyses are computationally expensive due to large number of edges. ### Spatial autocorrelation of ligand-receptor interactions We can compute a spatial graph where edges in the original delaunayNeighbours become nodes and A-B edges (in the original graph) become connected to all A- edges and all B- edges. This allows us to perfom graph-based analyses associated with the spatial localisation of ligand receptor pairs on edges. ```{r [Sp53]} edgeNeighbours = computeEdgeGraph(delaunayNeighbours) ``` Compute Moran's I for the spatial autocorrelation of ligand-receptor interactions. ```{r [Sp54] , eval=FALSE} moransILigandReceptor = runMoransI(edgeSeurat, edgeNeighbours, assay = "RNA", layer = "counts", nSim = 100) ``` View most spatially autocorrelated ligand-receptor interactions. ```{r [Sp55]} moransILigandReceptor = getExample('moransILigandReceptor') head(moransILigandReceptor) ``` View least spatially autocorrelated ligand-receptor interactions. ```{r [Sp56]} tail(moransILigandReceptor) ``` View ligand-receptor interaction with the highest spatial autocorrelation (Moran's I). ```{r [Sp57]} ImageFeaturePlot(edgeSeurat, "Penk-Htr1f") ``` View ligand-receptor interaction with the lowest spatial autocorrelation (Moran's I). ```{r [Sp58]} ImageFeaturePlot(edgeSeurat, "Sst-Gpr17") ``` ## Quality control of Delaunay neighbours We use Delaunay triangulation to estimate neighbour to neighbour contact between cells. However, a cell which happens to sit next to a tissue void is likely to have one or more Delaunay neighbours which are cells that sit on the other side of this void. We hope to detect these spurious edges based on their lengths. However, the distances between neighbouring cells' centroids tend to depend on the types of the neighbouring cells. Accordingly, we have supplied functions to annotate edges with their distances and cell types; compute estimates for edge-length cutoffs based on cell type pair; plot the distributions of lengths together with proposed cutoffs; and subset the edges based on cutoffs. ### edgeLengthsAndCellTypePairs This functions annotates edges with their lengths and the cell types of the two cells in question. ```{r [Sp59]} annEdges = edgeLengthsAndCellTypePairs(delaunayNeighbours,clusters,centroids) head(annEdges) ``` ### Visualising the edge lengths Here we visualise the distribution of edge lengths together with proposed cutoffs. We will say more in a moment about how these proposed cutoffs are calculated. ```{r [Sp60]} cutoffDF = edgeCutoffsByPercentile(annEdges,percentileCutof=95) g = edgeLengthPlot(annEdges,cutoffDF,whichPairs=60) print(g) ``` In this case, there are 111 types of edges based on the cell types of the cells they connect. By setting whichPairs=60, we are restricting to those types for which there are at least 60 such edges. Here we are also using a default xlim of 100. Passing the value NULL for cutoffDF will produce a plot without these cutoffs. ### Estimating edge length cutoffs We offer several methods for estimating edge length cutoffs. We recommend visual inspection before settling on any specific set of results. Each of these returns a data frame with two columns, cellTypePair and cutoff. As we have just seen this can be used for the plotting function and it can also be used for culling the edges. Since these results are returned as a data frame, they can be tuned "by hand" by modifying the cutoff value for any individual cell type pair. #### edgeCutoffsByClustering ```{r [Sp61]} cutoffDF = edgeCutoffsByClustering(annEdges) ``` This method works well in a small percentage of cases. For each cell type pair it performs k-means clustering of the lengths with k=2 and produces a cutoff halfway between the two clusters. This works well when there really are two clusters of edges, those between truly neighbouring cells and those across tissue voids. However, in the case where all the edges are between neighbouring cells, this will produce a spurious cutoff. #### edgeCutoffsByPercentile ```{r [Sp62]} cutoffDF = edgeCutoffsByPercentile(annEdges,percentileCutoff=95) ``` This method produces cutoffs based on a chosen percentile applied across all cell type pairs. It has the advantages that its results are easily comprehensible in terms of the chosen parameter. The user my want to "mix and match" using different percentiles for different cell type pairs. #### edgeCutoffsByZScore ```{r [Sp63]} cutoffDF = edgeCutoffsByZScore(annEdges,zCutoff=1.5) ``` Very similar to the previous method, using z-score instead of percentile. #### edgeCutoffsByWatershed ```{r [Sp64]} cutoffDF = edgeCutoffsByWatershed(annEdges,nbins=15,tolerance=10) ``` For each cell type pair, this function first computes a histogram of the distances. It then uses watershed segmentation to discover the "humps" of this histogram. The most important of these will be the one containing the median of the distances. This reports as a cutoff the midpoint of the first histogram bin which is past this main hump. The parameter nbins specifies the number of bins for the histogram. The parameter tolerance controls the sensitivity of the watershed algorithm. #### cullEdges ```{r [Sp65]} cutoffDF = edgeCutoffsByWatershed(annEdges,nbins=15,tolerance=10) culledEdges = cullEdges(annEdges,cutoffDF) nrow(annEdges) nrow(culledEdges) ``` This subsets the edges according to the parameter cutoffSpec. This can be a cutoff data frame as produced by any of the edgeCutoff functions or can be a single number in which case all the cell type pairs will be trimmed to the same cutoff. ```{r [Sp66]} sessionInfo() ```