Single-cell ’omics analysis enables high-resolution characterization of heterogeneous populations of cells by quantifying measurements in individual cells, and thus provides a fuller, more nuanced picture of the complexity and heterogeneity between cells. However, the data also present new and significant challenges compared to previous approaches, especially because single-cell data are much larger and sparser than data generated by bulk sequencing methods. Dimensionality reduction is a key step in single-cell analysis: it addresses the high dimensionality and sparsity of these data and enables the application of more complex, computationally expensive downstream pipelines.
Correspondence analysis (CA) is a matrix factorization method similar to principal components analysis (PCA). Whereas PCA is designed for continuous, approximately normally distributed data, CA is appropriate for non-negative, count-based data that are on the same additive scale. corral implements CA for dimensionality reduction of a single matrix of single-cell data.
See the vignette for corralm for the multi-table adaptation of CA for single-cell batch alignment/integration.
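To make the factorization concrete, here is a minimal base R sketch of textbook correspondence analysis on a toy count matrix. It is an illustration under stated assumptions, not corral's implementation: the function name `ca_sketch` and the toy values are hypothetical, and corral may differ in details such as the SVD routine and optional preprocessing.

```r
# Toy count matrix: 4 features (rows) x 5 cells (columns); values are made up.
counts <- matrix(c(10, 0, 3, 1, 0,
                    2, 8, 0, 0, 5,
                    0, 1, 7, 6, 0,
                    4, 0, 0, 2, 9),
                 nrow = 4, byrow = TRUE)

# Hypothetical helper: classical CA via SVD of the standardized residuals.
ca_sketch <- function(X) {
  P  <- X / sum(X)             # correspondence matrix (proportions)
  r  <- rowSums(P)             # row masses
  cw <- colSums(P)             # column masses
  E  <- outer(r, cw)           # expected proportions under independence
  S  <- (P - E) / sqrt(E)      # standardized (Pearson) residuals
  dec <- svd(S)
  k <- min(dim(X)) - 1         # the last singular value is numerically zero
  u <- dec$u[, seq_len(k), drop = FALSE]
  v <- dec$v[, seq_len(k), drop = FALSE]
  d <- dec$d[seq_len(k)]
  SCu <- u / sqrt(r)           # standard coordinates: scale each row by 1/sqrt(mass)
  SCv <- v / sqrt(cw)
  PCu <- sweep(SCu, 2, d, `*`) # principal coordinates: standard coordinates times d
  PCv <- sweep(SCv, 2, d, `*`)
  list(u = u, d = d, v = v, SCu = SCu, SCv = SCv, PCu = PCu, PCv = PCv)
}

ca <- ca_sketch(counts)
dim(ca$PCv)   # one row per cell, one column per component
```

The squared singular values sum to the total inertia of the table, which equals the chi-squared statistic divided by the grand total. This is one way to see why CA suits count data.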
corral can be used with various types of input. When called on a matrix (or other matrix-like object), it returns a list with the SVD output, principal coordinates, and standard coordinates. When called on a SingleCellExperiment, it returns the SingleCellExperiment with the corral embeddings in the reducedDim slot named corral. To retrieve the full list output from a SingleCellExperiment input, the fullout argument can be set to TRUE.
We will use the Zhengmix4eq dataset from the DuoClustering2018 package.
library(corral)
library(SingleCellExperiment)
library(ggplot2)
library(DuoClustering2018)
zm4eq.sce <- sce_full_Zhengmix4eq()
This dataset includes approximately 4,000 pre-sorted and annotated cells of 4 types, mixed by Duò et al. in approximately equal proportions (Duò, Robinson, and Soneson 2018). The cells were sampled from the data published in “Massively parallel digital transcriptional profiling of single cells” (Zheng et al. 2017).
zm4eq.sce
## class: SingleCellExperiment
## dim: 15568 3994
## metadata(1): log.exprs.offset
## assays(3): counts logcounts normcounts
## rownames(15568): ENSG00000237683 ENSG00000228327 ... ENSG00000215700
## ENSG00000215699
## rowData names(10): id symbol ... total_counts log10_total_counts
## colnames(3994): b.cells1147 b.cells6276 ... regulatory.t1084
## regulatory.t9696
## colData names(14): dataset barcode ... libsize.drop feature.drop
## reducedDimNames(2): PCA TSNE
## mainExpName: NULL
## altExpNames(0):
table(colData(zm4eq.sce)$phenoid)
##
## b.cells cd14.monocytes naive.cytotoxic regulatory.t
## 999 1000 998 997
corral on SingleCellExperiment
We will run corral directly on the raw count data:
zm4eq.sce <- corral(inp = zm4eq.sce,
whichmat = 'counts')
zm4eq.sce
## class: SingleCellExperiment
## dim: 15568 3994
## metadata(1): log.exprs.offset
## assays(3): counts logcounts normcounts
## rownames(15568): ENSG00000237683 ENSG00000228327 ... ENSG00000215700
## ENSG00000215699
## rowData names(10): id symbol ... total_counts log10_total_counts
## colnames(3994): b.cells1147 b.cells6276 ... regulatory.t1084
## regulatory.t9696
## colData names(14): dataset barcode ... libsize.drop feature.drop
## reducedDimNames(3): PCA TSNE corral
## mainExpName: NULL
## altExpNames(0):
We can use plot_embedding_sce to visualize the output:
plot_embedding_sce(sce = zm4eq.sce,
which_embedding = 'corral',
plot_title = 'corral on Zhengmix4eq',
color_attr = 'phenoid',
color_title = 'cell type',
saveplot = FALSE)
Using the scater package, we can also add and visualize UMAP and tSNE embeddings based on the corral output:
library(scater)
## Loading required package: scuttle
library(gridExtra) # so we can arrange the plots side by side
zm4eq.sce <- runUMAP(zm4eq.sce,
dimred = 'corral',
name = 'corral_UMAP')
zm4eq.sce <- runTSNE(zm4eq.sce,
dimred = 'corral',
name = 'corral_TSNE')
ggplot_umap <- plot_embedding_sce(sce = zm4eq.sce,
which_embedding = 'corral_UMAP',
plot_title = 'Zhengmix4eq corral with UMAP',
color_attr = 'phenoid',
color_title = 'cell type',
returngg = TRUE,
showplot = FALSE,
saveplot = FALSE)
ggplot_tsne <- plot_embedding_sce(sce = zm4eq.sce,
which_embedding = 'corral_TSNE',
plot_title = 'Zhengmix4eq corral with tSNE',
color_attr = 'phenoid',
color_title = 'cell type',
returngg = TRUE,
showplot = FALSE,
saveplot = FALSE)
gridExtra::grid.arrange(ggplot_umap, ggplot_tsne, ncol = 2)
The corral embeddings stored in the reducedDim slot can be used in downstream analysis, such as clustering or trajectory analysis.
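As a rough sketch of the clustering use mentioned above, the following base R example runs k-means on an embedding matrix. To keep it self-contained, the matrix `emb` is simulated as a stand-in. In a real analysis it would be the matrix returned by `reducedDim(zm4eq.sce, 'corral')`. The group structure, dimensions, and the choice of k-means are illustrative assumptions, not a recommendation from the package authors.

```r
set.seed(123)

# Stand-in for reducedDim(zm4eq.sce, 'corral'): 200 cells x 10 components,
# simulated as 4 groups around random centers. Replace with the real embedding.
n_per_group <- 50
group <- rep(1:4, each = n_per_group)
centers <- matrix(rnorm(4 * 10, sd = 5), nrow = 4)
emb <- centers[group, ] + matrix(rnorm(200 * 10), nrow = 200)

# k-means with several random starts; rows of `emb` are cells.
km <- kmeans(emb, centers = 4, nstart = 10)

# Cross-tabulate cluster assignments against the simulated groups.
table(cluster = km$cluster, truth = group)
```

With real data, the cross-tabulation would use an annotation such as `colData(zm4eq.sce)$phenoid` in place of the simulated `group` vector.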
corral can also be run on a SummarizedExperiment object.
corral on matrix
corral can also be performed on a matrix (or matrix-like) input.
zm4eq.countmat <- assay(zm4eq.sce,'counts')
zm4eq.countcorral <- corral(zm4eq.countmat)
The output is in a list format, including the SVD output (u, d, v), the standard coordinates (SCu, SCv), and the principal coordinates (PCu, PCv).
zm4eq.countcorral
## corral output summary===========================================
## Output "list" includes standard coordinates (SCu, SCv),
## principal coordinates (PCu, PCv), & SVD output (u, d, v)
## Variance explained----------------------------------------------
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
## percent.Var.explained 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## cumulative.Var.explained 0.01 0.02 0.02 0.02 0.03 0.03 0.03 0.03
##
## Dimensions of output elements-----------------------------------
## Singular values (d) :: 30
## Left singular vectors & coordinates (u, SCu, PCu) :: 15568 30
## Right singular vectors & coordinates (v, SCv, PCv) :: 3994 30
## See corral help for details on each output element.
## Use plot_embedding to visualize; see docs for details.
## ================================================================
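The variance-explained rows in the summary above can be understood with a short base R sketch. It assumes the common CA definition, where each component's share is its squared singular value divided by the total inertia (the sum of squared standardized residuals). Whether corral's summary uses exactly this definition is an assumption here, and the simulated matrix is a stand-in for real counts.

```r
set.seed(1)

# Simulated count matrix standing in for real data (20 features x 12 cells).
X <- matrix(rpois(20 * 12, lambda = 1), nrow = 20)
X <- X[rowSums(X) > 0, colSums(X) > 0, drop = FALSE]  # CA needs non-empty rows and columns

P <- X / sum(X)                      # proportions
E <- outer(rowSums(P), colSums(P))   # expected proportions under independence
S <- (P - E) / sqrt(E)               # standardized residuals

d <- svd(S, nu = 0, nv = 0)$d        # singular values only
total_inertia <- sum(S^2)            # equals the sum of all squared singular values

ncomp <- 5
pct_var <- d[seq_len(ncomp)]^2 / total_inertia
rbind(percent.Var.explained = round(pct_var, 2),
      cumulative.Var.explained = round(cumsum(pct_var), 2))
```

Because the total inertia can be computed directly from the residual matrix, the denominator does not require a full SVD, which matters when only the top components are approximated.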
We can use plot_embedding to visualize the output. (The cell embeddings are in the v matrix because these data have genes in the rows and cells in the columns; if this were reversed, with cells in the rows and genes/features in the columns, then the cell embeddings would instead be in the u matrix.)
celltype_vec <- colData(zm4eq.sce)$phenoid
plot_embedding(embedding = zm4eq.countcorral$v,
plot_title = 'corral on Zhengmix4eq',
color_vec = celltype_vec,
color_title = 'cell type',
saveplot = FALSE)
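The note above about u versus v can be checked with a plain SVD in base R. The sketch below uses a small simulated matrix, not corral itself, and shows that transposing the input swaps the roles of the left and right singular vectors, up to sign.

```r
set.seed(42)

# Simulated matrix: 6 features (rows) x 4 cells (columns).
X <- matrix(rpois(6 * 4, lambda = 3), nrow = 6)

dec   <- svd(X)      # features in rows: cell vectors are in dec$v
dec_t <- svd(t(X))   # cells in rows:    cell vectors are in dec_t$u

# Same vectors up to the sign of each column, so compare absolute values.
all.equal(abs(dec$v), abs(dec_t$u), tolerance = 1e-6)
dim(dec$v)           # one row per cell
```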
The output is the same as above with the SingleCellExperiment, and can be passed as the low-dimensional embedding for downstream analysis. Similarly, UMAP and tSNE can be computed for visualization. (Note that the sign of each SVD axis is arbitrary, so axes may be flipped between runs, as corral and corralm use irlba to perform fast approximation.)
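The sign ambiguity noted above can be demonstrated in base R. The sketch below flips one component in both u and v and confirms the reconstruction is unchanged. The helper `align_signs` is hypothetical (it is not part of corral) and shows one simple way to make two runs comparable by flipping columns to match a reference.

```r
set.seed(7)
X <- matrix(rnorm(8 * 5), nrow = 8)
dec <- svd(X)

# Flip the sign of the first component in both u and v.
u2 <- dec$u
v2 <- dec$v
u2[, 1] <- -u2[, 1]
v2[, 1] <- -v2[, 1]

# Both factorizations reconstruct the same matrix.
recon  <- dec$u %*% diag(dec$d) %*% t(dec$v)
recon2 <- u2 %*% diag(dec$d) %*% t(v2)
all.equal(recon, recon2)

# Hypothetical helper: flip each column of `emb` so that it has a
# non-negative inner product with the matching column of `ref`.
align_signs <- function(emb, ref) {
  s <- sign(colSums(emb * ref))
  s[s == 0] <- 1
  sweep(emb, 2, s, `*`)
}

all.equal(align_signs(v2, dec$v), dec$v)
```

Flipping an axis changes how a plot looks but not the distances between cells, so downstream clustering or neighbor graphs are unaffected.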
sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.14-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.14-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] scater_1.22.0 scuttle_1.4.0
## [3] DuoClustering2018_1.11.0 ggplot2_3.3.5
## [5] SingleCellExperiment_1.16.0 SummarizedExperiment_1.24.0
## [7] Biobase_2.54.0 GenomicRanges_1.46.0
## [9] GenomeInfoDb_1.30.0 IRanges_2.28.0
## [11] S4Vectors_0.32.0 BiocGenerics_0.40.0
## [13] MatrixGenerics_1.6.0 matrixStats_0.61.0
## [15] corral_1.4.0 gridExtra_2.3
## [17] BiocStyle_2.22.0
##
## loaded via a namespace (and not attached):
## [1] Rtsne_0.15 ggbeeswarm_0.6.0
## [3] colorspace_2.0-2 ellipsis_0.3.2
## [5] mclust_5.4.7 XVector_0.34.0
## [7] BiocNeighbors_1.12.0 dichromat_2.0-0
## [9] farver_2.1.0 ggrepel_0.9.1
## [11] MultiAssayExperiment_1.20.0 bit64_4.0.5
## [13] RSpectra_0.16-0 interactiveDisplayBase_1.32.0
## [15] AnnotationDbi_1.56.0 fansi_0.5.0
## [17] sparseMatrixStats_1.6.0 cachem_1.0.6
## [19] knitr_1.36 jsonlite_1.7.2
## [21] dbplyr_2.1.1 png_0.1-7
## [23] uwot_0.1.10 shiny_1.7.1
## [25] BiocManager_1.30.16 mapproj_1.2.7
## [27] compiler_4.1.1 httr_1.4.2
## [29] assertthat_0.2.1 Matrix_1.3-4
## [31] fastmap_1.1.0 BiocSingular_1.10.0
## [33] later_1.3.0 htmltools_0.5.2
## [35] tools_4.1.1 rsvd_1.0.5
## [37] gtable_0.3.0 glue_1.4.2
## [39] GenomeInfoDbData_1.2.7 reshape2_1.4.4
## [41] dplyr_1.0.7 ggthemes_4.2.4
## [43] maps_3.4.0 rappdirs_0.3.3
## [45] Rcpp_1.0.7 jquerylib_0.1.4
## [47] vctrs_0.3.8 Biostrings_2.62.0
## [49] ExperimentHub_2.2.0 DelayedMatrixStats_1.16.0
## [51] xfun_0.27 stringr_1.4.0
## [53] beachmat_2.10.0 mime_0.12
## [55] lifecycle_1.0.1 irlba_2.3.3
## [57] AnnotationHub_3.2.0 zlibbioc_1.40.0
## [59] scales_1.1.1 promises_1.2.0.1
## [61] parallel_4.1.1 yaml_2.2.1
## [63] curl_4.3.2 memoise_2.0.0
## [65] sass_0.4.0 stringi_1.7.5
## [67] RSQLite_2.2.8 BiocVersion_3.14.0
## [69] highr_0.9 ScaledMatrix_1.2.0
## [71] filelock_1.0.2 BiocParallel_1.28.0
## [73] pals_1.7 rlang_0.4.12
## [75] pkgconfig_2.0.3 bitops_1.0-7
## [77] evaluate_0.14 lattice_0.20-45
## [79] purrr_0.3.4 labeling_0.4.2
## [81] transport_0.12-2 bit_4.0.4
## [83] tidyselect_1.1.1 plyr_1.8.6
## [85] magrittr_2.0.1 bookdown_0.24
## [87] R6_2.5.1 magick_2.7.3
## [89] generics_0.1.1 DelayedArray_0.20.0
## [91] DBI_1.1.1 pillar_1.6.4
## [93] withr_2.4.2 KEGGREST_1.34.0
## [95] RCurl_1.98-1.5 tibble_3.1.5
## [97] crayon_1.4.1 utf8_1.2.2
## [99] BiocFileCache_2.2.0 rmarkdown_2.11
## [101] viridis_0.6.2 grid_4.1.1
## [103] data.table_1.14.2 FNN_1.1.3
## [105] blob_1.2.2 digest_0.6.28
## [107] xtable_1.8-4 tidyr_1.1.4
## [109] httpuv_1.6.3 munsell_0.5.0
## [111] beeswarm_0.4.0 viridisLite_0.4.0
## [113] vipor_0.4.5 bslib_0.3.1
Duò, A., M. D. Robinson, and C. Soneson. 2018. “A Systematic Performance Evaluation of Clustering Methods for Single-Cell RNA-Seq Data [Version 2; Peer Review: 2 Approved].” F1000Research 7: 1141. https://doi.org/10.12688/f1000research.15666.2.
Zheng, Grace X. Y., Jessica M. Terry, Phillip Belgrader, Paul Ryvkin, Zachary W. Bent, Ryan Wilson, Solongo B. Ziraldo, et al. 2017. “Massively Parallel Digital Transcriptional Profiling of Single Cells.” Nature Communications 8 (1): 14049. https://doi.org/10.1038/ncomms14049.