This tutorial demonstrates how to perform vertical integrative clustering (iCluster) analysis of single cell multi-omics data. The iCluster2b function in the iClusterPlus package will be used as an example. The iCluster2b implements an updated iCluster method originally reported by Shen, Wang and Mo. 2023. The CITE-seq dataset (Stuart, Butler et al., Cell 2019) consisting of scRNA-seq and ADT (antibody-derived tag) data of 30,672 cells from human bone marrow is used for the demonstration. The dataset can be obtained by installing the R package SeuratData.
We use the R scripts of the Seurat’s tutorial Weighted Nearest Neighbor Analysis to process the scRNA-seq and ADT data. The iCluster analysis of single cell multi-omics data can be carried out in 4 steps as described in the following.
library(iClusterPlus)
library(lattice)
library(gplots)
library(Seurat)
library(SeuratData)
library(cowplot)
library(dplyr)
library(Rphenograph)
library(umap)
library(mclust)
library(knitr)
The scRNA-seq and ADT data are processed separately, which includes normalization, feature selection, scaling and dimension reduction.
InstallData("bmcite")
bm <- LoadData(ds = "bmcite")
DefaultAssay(bm) <- 'RNA'
bm <- NormalizeData(bm) %>% FindVariableFeatures() %>% ScaleData() %>% RunPCA()
# process ADT data; all the 25 features are used for dimensional reduction
DefaultAssay(bm) <- 'ADT'
VariableFeatures(bm) <- rownames(bm[["ADT"]])
bm <- NormalizeData(bm, normalization.method = 'CLR', margin = 2) %>% ScaleData() %>% RunPCA(reduction.name = 'apca')
The iCluster modeling generates a latent factor matrix (meanZ), which capture the major sources of variation of the multi-omics data (scRNA-seq and ADT). As a result, the latent factor matrix can be used for sample clustering and visualization.
# input data for iCluster2b
BMlist <- list(ADT=t(bm[['ADT']]$scale.data),RNA=t(bm[['RNA']]$scale.data))
# PCA variance plot
pcaVarPlot(BMlist,K=50)
From the variance plot, we can see that the explained variance starts to level off around the 10th PC. For iCluster modeling of single cell multi-omics data, we suggest selecting the PC at which the explained variance levels off. As an example, we select K=20 as the number of latent factors for iCluster modeling. It can be seen that the clustering results are quite consistent for a range of numbers (e.g., K = 20 - 30).
# since features have been selected, we set lambda to a small value to minimize feature selection
date()
## [1] "Wed Mar 11 16:05:19 2026"
bm.ic = iCluster2b(xList=BMlist,K=20,lambda=list(0.1,0.1),method=c("lasso","lasso"))
date()
## [1] "Wed Mar 11 16:05:36 2026"
We use the Louvain method implemented in the Rphenograph package to perform cell clustering analysis. Then, we project the cells to a 2-dimensional space using UMAP for visualization.
## clustering of cells using Louvain method and the latent factor matrix generated by iCluster
ic.graph <- Rphenograph(bm.ic$meanZ, k=30)
## Finding nearest neighbors...DONE ~ 4.322 s
## Compute jaccard coefficient between nearest-neighbor sets...DONE ~ 4.947 s
## Build undirected graph from the weighted links...DONE ~ 0.911 s
## Run louvain clustering on the graph ...DONE ~ 1.36 s
## Return a community class
## -Modularity value: 0.9018461
## -Number of clusters: 25
tabclst <- table(membership(ic.graph[[2]]))
## order the clusters based on cluster size
clstlevels <- as.numeric(names(tabclst)[order(tabclst,decreasing = TRUE)])
bm.iClusters <- factor(membership(ic.graph[[2]]),levels=clstlevels)
names(bm.iClusters) <- rownames(BMlist$ADT)
# Projecting cells to 2-dimensional space using UMAP
ic.umap <- umap(bm.ic$meanZ)
rownames(ic.umap$layout) <- rownames(BMlist$ADT)
colnames(ic.umap$layout) <- c("umap1","umap2")
## put umap layout and cluster IDs into the Seurat object bm
bm[["ic.umap"]] <- CreateDimReducObject(embeddings = ic.umap$layout, key = "ic_umap")
bm$iCluster <- bm.iClusters
For comparison, we perform unsupervised clustering analysis using Seurat’s WNN method. For more details of WNN analysis, we refer users to the Seurat’s tutorial (Weighted Nearest Neighbor Analysis).
# Generate WNN graph
bm <- FindMultiModalNeighbors(bm, reduction.list = list("pca", "apca"),dims.list <- list(1:30, 1:18), k.nn = 20,
knn.graph.name = "wknn",snn.graph.name = "wsnn",modality.weight.name = "RNA.weight")
# multimodal neighbor matrix used for UMAP can be accessed using bm[['weighted.nn']]
# WNN graph can be accessed using bm[["wknn"]]
# SNN graph used for clustering can be accessed at bm[["wsnn"]]
bm <- RunUMAP(bm, nn.name = "weighted.nn", reduction.name = "wnn.umap", reduction.key = "wnnUMAP_")
bm <- FindClusters(bm, graph.name = "wsnn", algorithm = 3, resolution = 2, verbose = FALSE)
We use Seurat’s function DimPlot to visualize the cells on a 2-dimensional UMAP space.
p1 <- DimPlot(bm, reduction = "ic.umap", group.by = "iCluster", label = TRUE, label.size = 2.5, repel = TRUE) + NoLegend() + ggtitle("iCluster")
p2 <- DimPlot(bm, reduction = "ic.umap", group.by = "celltype.l2", label = TRUE, label.size = 2.5, repel = TRUE) + NoLegend() + ggtitle("celltype")
p1 + p2
col.scheme <- alist()
col.scheme[[1]] <- bluered(256)
col.scheme[[2]] <- bluered(256)
bm.ic$clusters <- bm.iClusters
ic.hm <- plotHeatmap(fit=bm.ic,xList=BMlist,type=c("gaussian","gaussian"),
threshold=rep(0.25, 2),feature.order=c(T,T),feature.scale=c(F,F),
dist.method = "euclidean", hclust.method="ward.D",col.scheme = col.scheme,width=10,
sparse=c(F,T),cap=c(0.99,0.99))
p3 <- DimPlot(bm, reduction = "wnn.umap", group.by = "seurat_clusters", label = TRUE, label.size = 2.5, repel = TRUE) + NoLegend() + ggtitle("Seurat")
p4 <- DimPlot(bm, reduction = "wnn.umap", group.by = "celltype.l2", label = TRUE, label.size = 2.5, repel = TRUE) + NoLegend() + ggtitle("celltype")
p3 + p4
bm.ic$clusters <- bm$seurat_clusters
wnn.hm <- plotHeatmap(fit=bm.ic,xList=BMlist,type=c("gaussian","gaussian"),
threshold=rep(0.25, 2),feature.order=c(T,T),feature.scale=c(F,F),
dist.method = "euclidean", hclust.method="ward.D",col.scheme = col.scheme,width=10,
sparse=c(F,T),cap=c(0.99,0.99))
Finally, we compare the integrative clustering results of the iCluster and Seurat WNN using adjusted rand index. In general, the higher of the index, the better the results.
# iCluster vs. cell type annotation
adjustedRandIndex(bm$iCluster, bm$celltype.l2)
## [1] 0.7788463
# seurat WNN clusters vs. cell type annotation
adjustedRandIndex(bm$seurat_clusters, bm$celltype.l2)
## [1] 0.7188821
# iClusters vs. seurat WNN clusters
adjustedRandIndex(bm$iCluster, bm$seurat_clusters)
## [1] 0.6997139
# table of annotated cell types and iClusters
icTable <- table(bm$celltype.l2,bm.iClusters)
knitr::kable(icTable)
| 3 | 4 | 6 | 2 | 9 | 5 | 8 | 17 | 10 | 16 | 11 | 24 | 7 | 14 | 12 | 1 | 18 | 15 | 20 | 13 | 21 | 19 | 23 | 22 | 25 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CD14 Mono | 0 | 4127 | 0 | 0 | 2232 | 0 | 1 | 0 | 0 | 1 | 12 | 0 | 0 | 4 | 6 | 2 | 5 | 0 | 0 | 3 | 0 | 28 | 0 | 40 | 25 |
| CD16 Mono | 0 | 0 | 0 | 0 | 12 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 420 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| CD4 Memory | 221 | 0 | 0 | 3085 | 0 | 0 | 2 | 1 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 22 | 19 | 0 | 0 |
| CD4 Naive | 4345 | 1 | 1 | 56 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 73 | 20 | 0 | 0 |
| CD56 bright NK | 3 | 1 | 0 | 3 | 0 | 0 | 9 | 0 | 103 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 2 | 20 | 0 | 0 |
| CD8 Effector_1 | 0 | 0 | 2 | 0 | 0 | 0 | 552 | 0 | 2 | 14 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 4 | 0 | 0 |
| CD8 Effector_2 | 0 | 1 | 3 | 1 | 1 | 0 | 294 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 |
| CD8 Memory_1 | 0 | 0 | 10 | 0 | 0 | 0 | 406 | 0 | 0 | 21 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7 | 0 | 0 |
| CD8 Memory_2 | 0 | 1 | 1 | 1 | 0 | 0 | 503 | 0 | 1 | 37 | 1 | 0 | 0 | 1 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 6 | 0 | 0 | 0 |
| CD8 Naive | 1 | 1 | 3889 | 0 | 1 | 0 | 62 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 0 | 0 |
| cDC2 | 0 | 13 | 0 | 0 | 120 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 347 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| gdT | 0 | 0 | 0 | 2 | 0 | 0 | 13 | 0 | 0 | 349 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 |
| GMP | 0 | 23 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 573 | 8 | 0 | 2 | 0 | 0 | 136 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 |
| HSC | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 247 | 0 | 0 | 0 | 12 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 |
| LMPP | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 15 | 264 | 0 | 1 | 0 | 8 | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| MAIT | 0 | 1 | 0 | 1 | 0 | 0 | 26 | 0 | 0 | 488 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 3 | 0 | 0 |
| Memory B | 0 | 0 | 1 | 0 | 0 | 229 | 0 | 1389 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8 | 1 | 0 | 0 |
| Naive B | 0 | 0 | 0 | 0 | 0 | 1756 | 0 | 117 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8 | 0 | 0 | 15 | 0 | 0 | 4 |
| NK | 0 | 1 | 1 | 0 | 0 | 0 | 6 | 0 | 1237 | 8 | 1 | 0 | 0 | 0 | 0 | 0 | 8 | 0 | 0 | 0 | 0 | 4 | 1 | 0 | 0 |
| pDC | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 6 | 0 | 1 | 0 | 318 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Plasmablast | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 211 | 1 | 1 | 0 | 0 |
| Prog_B 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 31 | 0 | 0 | 0 | 0 | 0 | 0 | 111 | 1 | 0 | 0 | 1 | 0 | 0 |
| Prog_B 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 125 | 0 | 0 | 0 | 0 | 0 | 0 |
| Prog_DC | 0 | 2 | 0 | 0 | 6 | 0 | 0 | 0 | 0 | 0 | 4 | 2 | 0 | 91 | 0 | 0 | 155 | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| Prog_Mk | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 119 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 |
| Prog_RBC | 1 | 0 | 0 | 0 | 2 | 0 | 2 | 1 | 0 | 0 | 0 | 1 | 468 | 0 | 0 | 223 | 1 | 0 | 0 | 213 | 0 | 1 | 2 | 0 | 0 |
| Treg | 163 | 0 | 1 | 126 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 |
sessionInfo()
## R version 4.5.2 (2025-10-31)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sequoia 15.7.2
##
## Matrix products: default
## BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] parallel stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] future_1.68.0 knitr_1.50 mclust_6.1.2
## [4] umap_0.2.10.0 Rphenograph_0.99.1 igraph_2.2.1
## [7] ggplot2_4.0.1 dplyr_1.1.4 cowplot_1.2.0
## [10] bmcite.SeuratData_0.3.0 SeuratData_0.2.2.9002 Seurat_5.3.1
## [13] SeuratObject_5.2.0 sp_2.2-0 gplots_3.2.0
## [16] lattice_0.22-7 iClusterPlus_1.47.2 irlba_2.3.5.1
## [19] Matrix_1.7-4
##
## loaded via a namespace (and not attached):
## [1] RColorBrewer_1.1-3 rstudioapi_0.17.1 jsonlite_2.0.0
## [4] magrittr_2.0.4 spatstat.utils_3.2-0 farver_2.1.2
## [7] rmarkdown_2.30 vctrs_0.6.5 ROCR_1.0-11
## [10] spatstat.explore_3.6-0 askpass_1.2.1 htmltools_0.5.8.1
## [13] sass_0.4.10 sctransform_0.4.2 parallelly_1.45.1
## [16] KernSmooth_2.23-26 bslib_0.9.0 htmlwidgets_1.6.4
## [19] ica_1.0-3 plyr_1.8.9 plotly_4.11.0
## [22] zoo_1.8-14 cachem_1.1.0 mime_0.13
## [25] lifecycle_1.0.4 pkgconfig_2.0.3 R6_2.6.1
## [28] fastmap_1.2.0 fitdistrplus_1.2-4 shiny_1.11.1
## [31] digest_0.6.39 patchwork_1.3.2 tensor_1.5.1
## [34] RSpectra_0.16-2 labeling_0.4.3 progressr_0.18.0
## [37] spatstat.sparse_3.1-0 httr_1.4.7 polyclip_1.10-7
## [40] abind_1.4-8 compiler_4.5.2 withr_3.0.2
## [43] S7_0.2.1 fastDummies_1.7.5 MASS_7.3-65
## [46] openssl_2.3.4 rappdirs_0.3.3 gtools_3.9.5
## [49] caTools_1.18.3 tools_4.5.2 lmtest_0.9-40
## [52] otel_0.2.0 httpuv_1.6.16 future.apply_1.20.0
## [55] goftest_1.2-3 glue_1.8.0 nlme_3.1-168
## [58] promises_1.5.0 grid_4.5.2 Rtsne_0.17
## [61] cluster_2.1.8.1 reshape2_1.4.5 generics_0.1.4
## [64] gtable_0.3.6 spatstat.data_3.1-9 tidyr_1.3.1
## [67] data.table_1.17.8 spatstat.geom_3.6-1 RcppAnnoy_0.0.22
## [70] ggrepel_0.9.6 RANN_2.6.2 pillar_1.11.1
## [73] stringr_1.6.0 spam_2.11-1 RcppHNSW_0.6.0
## [76] later_1.4.4 splines_4.5.2 survival_3.8-3
## [79] deldir_2.0-4 tidyselect_1.2.1 miniUI_0.1.2
## [82] pbapply_1.7-4 gridExtra_2.3 scattermore_1.2
## [85] xfun_0.54 matrixStats_1.5.0 stringi_1.8.7
## [88] lazyeval_0.2.2 yaml_2.3.10 evaluate_1.0.5
## [91] codetools_0.2-20 tibble_3.3.0 cli_3.6.5
## [94] uwot_0.2.4 xtable_1.8-4 reticulate_1.44.1
## [97] jquerylib_0.1.4 Rcpp_1.1.0 globals_0.18.0
## [100] spatstat.random_3.4-3 png_0.1-8 spatstat.univar_3.1-5
## [103] dotCall64_1.2 bitops_1.0-9 listenv_0.10.0
## [106] viridisLite_0.4.2 scales_1.4.0 ggridges_0.5.7
## [109] purrr_1.2.0 crayon_1.5.3 rlang_1.1.6