Chapter 8 Cross-annotating human pancreas
8.1 Loading the data
We load the Muraro et al. (2016) dataset as our reference, removing unlabelled cells or cells without a clear label.
library(scRNAseq)
sceM <- MuraroPancreasData()
sceM <- sceM[,!is.na(sceM$label) & sceM$label!="unclear"]
We compute log-expression values for use in marker detection inside SingleR()
.
We examine the distribution of labels in this reference.
##
## acinar alpha beta delta duct endothelial
## 219 812 448 193 245 21
## epsilon mesenchymal pp
## 3 80 101
We load the Grun et al. (2016) dataset as our test, applying some basic quality control to remove low-quality cells in some of the batches (see here for details).
sceG <- GrunPancreasData()
sceG <- addPerCellQC(sceG)
qc <- quickPerCellQC(colData(sceG),
percent_subsets="altexps_ERCC_percent",
batch=sceG$donor,
subset=sceG$donor %in% c("D17", "D7", "D2"))
sceG <- sceG[,!qc$discard]
Technically speaking, the test dataset does not need log-expression values but we compute them anyway for convenience.
8.2 Applying the annotation
We apply SingleR()
with Wilcoxon rank sum test-based marker detection to annotate the Grun dataset with the Muraro labels.
We examine the distribution of predicted labels:
##
## acinar alpha beta delta duct endothelial
## 277 203 181 50 306 5
## epsilon mesenchymal pp
## 1 22 19
We can also examine the number of discarded cells for each label:
## Lost
## Label FALSE TRUE
## acinar 251 26
## alpha 198 5
## beta 180 1
## delta 49 1
## duct 301 5
## endothelial 4 1
## epsilon 1 0
## mesenchymal 22 0
## pp 17 2
8.3 Diagnostics
We visualize the assignment scores for each label in Figure 8.1.
The delta for each cell is visualized in Figure 8.2.
Finally, we visualize the heatmaps of the marker genes for each label in Figure 8.3.
library(scater)
collected <- list()
all.markers <- metadata(pred.grun)$de.genes
sceG$labels <- pred.grun$labels
for (lab in unique(pred.grun$labels)) {
collected[[lab]] <- plotHeatmap(sceG, silent=TRUE,
order_columns_by="labels", main=lab,
features=unique(unlist(all.markers[[lab]])))[[4]]
}
do.call(gridExtra::grid.arrange, collected)
8.4 Comparison to clusters
For comparison, we will perform a quick unsupervised analysis of the Grun dataset.
We model the variances using the spike-in data and we perform graph-based clustering
(increasing the resolution by dropping k=5
).
library(scran)
decG <- modelGeneVarWithSpikes(sceG, "ERCC")
set.seed(1000100)
sceG <- denoisePCA(sceG, decG)
library(bluster)
sceG$cluster <- clusterRows(reducedDim(sceG), NNGraphParam(k=5))
We see that the clusters map reasonably well to the labels in Figure 8.4.
We proceed to the most important part of the analysis. Yes, that’s right, the \(t\)-SNE plot (Figure 8.5).
set.seed(101010100)
sceG <- runTSNE(sceG, dimred="PCA")
plotTSNE(sceG, colour_by="cluster", text_colour="red",
text_by=I(pred.grun$labels))
Session information
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.1 LTS
Matrix products: default
BLAS: /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB LC_COLLATE=C
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: America/New_York
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] bluster_1.16.0 scran_1.34.0
[3] SingleR_2.8.0 scater_1.34.0
[5] ggplot2_3.5.1 scuttle_1.16.0
[7] scRNAseq_2.19.1 SingleCellExperiment_1.28.0
[9] SummarizedExperiment_1.36.0 Biobase_2.66.0
[11] GenomicRanges_1.58.0 GenomeInfoDb_1.42.0
[13] IRanges_2.40.0 S4Vectors_0.44.0
[15] BiocGenerics_0.52.0 MatrixGenerics_1.18.0
[17] matrixStats_1.4.1 BiocStyle_2.34.0
[19] rebook_1.16.0
loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 jsonlite_1.8.9
[3] CodeDepends_0.6.6 magrittr_2.0.3
[5] ggbeeswarm_0.7.2 GenomicFeatures_1.58.0
[7] gypsum_1.2.0 farver_2.1.2
[9] rmarkdown_2.28 BiocIO_1.16.0
[11] zlibbioc_1.52.0 vctrs_0.6.5
[13] DelayedMatrixStats_1.28.0 memoise_2.0.1
[15] Rsamtools_2.22.0 RCurl_1.98-1.16
[17] htmltools_0.5.8.1 S4Arrays_1.6.0
[19] AnnotationHub_3.14.0 curl_5.2.3
[21] BiocNeighbors_2.0.0 Rhdf5lib_1.28.0
[23] SparseArray_1.6.0 rhdf5_2.50.0
[25] sass_0.4.9 alabaster.base_1.6.0
[27] bslib_0.8.0 alabaster.sce_1.6.0
[29] httr2_1.0.5 cachem_1.1.0
[31] GenomicAlignments_1.42.0 igraph_2.1.1
[33] lifecycle_1.0.4 pkgconfig_2.0.3
[35] rsvd_1.0.5 Matrix_1.7-1
[37] R6_2.5.1 fastmap_1.2.0
[39] GenomeInfoDbData_1.2.13 digest_0.6.37
[41] colorspace_2.1-1 AnnotationDbi_1.68.0
[43] dqrng_0.4.1 irlba_2.3.5.1
[45] ExperimentHub_2.14.0 RSQLite_2.3.7
[47] beachmat_2.22.0 labeling_0.4.3
[49] filelock_1.0.3 fansi_1.0.6
[51] httr_1.4.7 abind_1.4-8
[53] compiler_4.4.1 bit64_4.5.2
[55] withr_3.0.2 BiocParallel_1.40.0
[57] viridis_0.6.5 DBI_1.2.3
[59] highr_0.11 HDF5Array_1.34.0
[61] alabaster.ranges_1.6.0 alabaster.schemas_1.6.0
[63] rappdirs_0.3.3 DelayedArray_0.32.0
[65] rjson_0.2.23 tools_4.4.1
[67] vipor_0.4.7 beeswarm_0.4.0
[69] glue_1.8.0 restfulr_0.0.15
[71] rhdf5filters_1.18.0 grid_4.4.1
[73] Rtsne_0.17 cluster_2.1.6
[75] generics_0.1.3 gtable_0.3.6
[77] ensembldb_2.30.0 metapod_1.14.0
[79] BiocSingular_1.22.0 ScaledMatrix_1.14.0
[81] utf8_1.2.4 XVector_0.46.0
[83] ggrepel_0.9.6 BiocVersion_3.20.0
[85] pillar_1.9.0 limma_3.62.0
[87] dplyr_1.1.4 BiocFileCache_2.14.0
[89] lattice_0.22-6 rtracklayer_1.66.0
[91] bit_4.5.0 tidyselect_1.2.1
[93] locfit_1.5-9.10 Biostrings_2.74.0
[95] knitr_1.48 gridExtra_2.3
[97] bookdown_0.41 ProtGenerics_1.38.0
[99] edgeR_4.4.0 xfun_0.48
[101] statmod_1.5.0 pheatmap_1.0.12
[103] UCSC.utils_1.2.0 lazyeval_0.2.2
[105] yaml_2.3.10 evaluate_1.0.1
[107] codetools_0.2-20 tibble_3.2.1
[109] alabaster.matrix_1.6.0 BiocManager_1.30.25
[111] graph_1.84.0 cli_3.6.3
[113] munsell_0.5.1 jquerylib_0.1.4
[115] Rcpp_1.0.13 dir.expiry_1.14.0
[117] dbplyr_2.5.0 png_0.1-8
[119] XML_3.99-0.17 parallel_4.4.1
[121] blob_1.2.4 AnnotationFilter_1.30.0
[123] sparseMatrixStats_1.18.0 bitops_1.0-9
[125] viridisLite_0.4.2 alabaster.se_1.6.0
[127] scales_1.3.0 crayon_1.5.3
[129] rlang_1.1.4 cowplot_1.1.3
[131] KEGGREST_1.46.0
Bibliography
Grun, D., M. J. Muraro, J. C. Boisset, K. Wiebrands, A. Lyubimova, G. Dharmadhikari, M. van den Born, et al. 2016. “De Novo Prediction of Stem Cell Identity using Single-Cell Transcriptome Data.” Cell Stem Cell 19 (2): 266–77.
Muraro, M. J., G. Dharmadhikari, D. Grun, N. Groen, T. Dielen, E. Jansen, L. van Gurp, et al. 2016. “A Single-Cell Transcriptome Atlas of the Human Pancreas.” Cell Syst 3 (4): 385–94.