This tutorial demonstrates how to perform vertical integrative clustering (iCluster) analysis of single cell multi-omics data. The iCluster2b function in the iClusterPlus package will be used as an example. The iCluster2b implements an updated iCluster method originally reported by Shen, Wang and Mo. 2023. The CITE-seq dataset (Stuart, Butler et al., Cell 2019) consisting of scRNA-seq and ADT (antibody-derived tag) data of 30,672 cells from human bone marrow is used for the demonstration. The dataset can be obtained by installing the R package SeuratData.

We use the R scripts of the Seurat’s tutorial Weighted Nearest Neighbor Analysis to process the scRNA-seq and ADT data. The iCluster analysis of single cell multi-omics data can be carried out in 4 steps as described in the following.

library(iClusterPlus)
library(lattice)
library(gplots)
library(Seurat)
library(SeuratData)
library(cowplot)
library(dplyr)
library(Rphenograph)
library(umap)
library(mclust)
library(knitr)

Step 1: Processing data using the Seurat package

The scRNA-seq and ADT data are processed separately, which includes normalization, feature selection, scaling and dimension reduction.

InstallData("bmcite")
bm <- LoadData(ds = "bmcite")
DefaultAssay(bm) <- 'RNA'
bm <- NormalizeData(bm) %>% FindVariableFeatures() %>% ScaleData() %>% RunPCA()

# process ADT data; all the 25 features are used for dimensional reduction
DefaultAssay(bm) <- 'ADT'
VariableFeatures(bm) <- rownames(bm[["ADT"]])
bm <- NormalizeData(bm, normalization.method = 'CLR', margin = 2) %>% ScaleData() %>% RunPCA(reduction.name = 'apca')

Step 2: Perform iCluster modeling

The iCluster modeling generates a latent factor matrix (meanZ), which capture the major sources of variation of the multi-omics data (scRNA-seq and ADT). As a result, the latent factor matrix can be used for sample clustering and visualization.

# input data for iCluster2b 
BMlist <- list(ADT=t(bm[['ADT']]$scale.data),RNA=t(bm[['RNA']]$scale.data))

# PCA variance plot 
pcaVarPlot(BMlist,K=50)

From the variance plot, we can see that the explained variance starts to level off around the 10th PC. For iCluster modeling of single cell multi-omics data, we suggest selecting the PC at which the explained variance levels off. As an example, we select K=20 as the number of latent factors for iCluster modeling. It can be seen that the clustering results are quite consistent for a range of numbers (e.g., K = 20 - 30).

# since features have been selected, we set lambda to a small value to minimize feature selection
date()

## [1] "Fri Mar 20 15:17:15 2026"

bm.ic = iCluster2b(xList=BMlist,K=20,lambda=list(0.1,0.1),method=c("lasso","lasso"))
date()

## [1] "Fri Mar 20 15:17:32 2026"

Step 3: Clustering of cells using the latent factor matrix

We use the Louvain method implemented in the Rphenograph package to perform cell clustering analysis. Then, we project the cells to a 2-dimensional space using UMAP for visualization.

## clustering of cells using Louvain method and the latent factor matrix generated by iCluster
ic.graph <- Rphenograph(bm.ic$meanZ, k=30)

##   Finding nearest neighbors...DONE ~ 4.307 s
##   Compute jaccard coefficient between nearest-neighbor sets...DONE ~ 5.078 s
##   Build undirected graph from the weighted links...DONE ~ 0.906 s
##   Run louvain clustering on the graph ...DONE ~ 1.35 s

##   Return a community class
##   -Modularity value: 0.9018461 
##   -Number of clusters: 25

tabclst <- table(membership(ic.graph[[2]]))

## order the clusters based on cluster size
clstlevels <- as.numeric(names(tabclst)[order(tabclst,decreasing = TRUE)])
bm.iClusters <- factor(membership(ic.graph[[2]]),levels=clstlevels)
names(bm.iClusters) <- rownames(BMlist$ADT)

# Projecting cells to 2-dimensional space using UMAP
ic.umap <- umap(bm.ic$meanZ)
rownames(ic.umap$layout) <- rownames(BMlist$ADT)
colnames(ic.umap$layout) <- c("umap1","umap2")

## put umap layout and cluster IDs into the Seurat object bm 
bm[["ic.umap"]] <- CreateDimReducObject(embeddings = ic.umap$layout, key = "ic_umap")
bm$iCluster <- bm.iClusters

For comparison, we perform unsupervised clustering analysis using Seurat’s WNN method. For more details of WNN analysis, we refer users to the Seurat’s tutorial (Weighted Nearest Neighbor Analysis).

# Generate WNN graph 
bm <- FindMultiModalNeighbors(bm, reduction.list = list("pca", "apca"),dims.list <- list(1:30, 1:18), k.nn = 20,
   knn.graph.name = "wknn",snn.graph.name = "wsnn",modality.weight.name = "RNA.weight")
# multimodal neighbor matrix used for UMAP can be accessed using bm[['weighted.nn']]
# WNN graph can be accessed using bm[["wknn"]]
# SNN graph used for clustering can be accessed at bm[["wsnn"]]

bm <- RunUMAP(bm, nn.name = "weighted.nn", reduction.name = "wnn.umap", reduction.key = "wnnUMAP_")
bm <- FindClusters(bm, graph.name = "wsnn", algorithm = 3, resolution = 2, verbose = FALSE)

Step 4: Visualization of cell clusters

We use Seurat’s function DimPlot to visualize the cells on a 2-dimensional UMAP space.

Visualization of cell clusters generated by iCluster analysis, alongside cell annotations

p1 <- DimPlot(bm, reduction = "ic.umap", group.by = "iCluster", label = TRUE, label.size = 2.5, repel = TRUE) + NoLegend() + ggtitle("iCluster")
p2 <- DimPlot(bm, reduction = "ic.umap", group.by = "celltype.l2", label = TRUE, label.size = 2.5, repel = TRUE) + NoLegend() + ggtitle("celltype")
p1 + p2

Heatmap of cell clusters generated by iCluster analysis

col.scheme <- alist()
col.scheme[[1]] <- bluered(256)
col.scheme[[2]] <- bluered(256)

bm.ic$clusters <- bm.iClusters
ic.hm <- plotHeatmap(fit=bm.ic,xList=BMlist,type=c("gaussian","gaussian"), 
            threshold=rep(0.25, 2),feature.order=c(T,T),feature.scale=c(F,F),
            dist.method = "euclidean", hclust.method="ward.D",col.scheme = col.scheme,width=10,
            sparse=c(F,T),cap=c(0.99,0.99))

Visualization of cell clusters generated by Seurat WNN analysis, alongside cell annotations

p3 <- DimPlot(bm, reduction = "wnn.umap", group.by = "seurat_clusters", label = TRUE, label.size = 2.5, repel = TRUE) + NoLegend() + ggtitle("Seurat")
p4 <- DimPlot(bm, reduction = "wnn.umap", group.by = "celltype.l2", label = TRUE, label.size = 2.5, repel = TRUE) + NoLegend() + ggtitle("celltype")
p3 + p4

Heatmap of cell clusters generated by Seurat WNN analysis

bm.ic$clusters <- bm$seurat_clusters
wnn.hm <- plotHeatmap(fit=bm.ic,xList=BMlist,type=c("gaussian","gaussian"), 
            threshold=rep(0.25, 2),feature.order=c(T,T),feature.scale=c(F,F),
            dist.method = "euclidean", hclust.method="ward.D",col.scheme = col.scheme,width=10,
            sparse=c(F,T),cap=c(0.99,0.99))

Finally, we compare the integrative clustering results of the iCluster and Seurat WNN using adjusted rand index. In general, the higher of the index, the better the results.

# iCluster vs. cell type annotation
adjustedRandIndex(bm$iCluster, bm$celltype.l2)

## [1] 0.7788463

# seurat WNN clusters vs. cell type annotation
adjustedRandIndex(bm$seurat_clusters, bm$celltype.l2)

## [1] 0.7188821

# iClusters vs. seurat WNN clusters 
adjustedRandIndex(bm$iCluster, bm$seurat_clusters)

## [1] 0.6997139

# table of annotated cell types and iClusters 
icTable <- table(bm$celltype.l2,bm.iClusters)
knitr::kable(icTable)

	3	4	6	2	9	5	8	17	10	16	11	24	7	14	12	1	18	15	20	13	21	19	23	22	25
CD14 Mono	0	4127	0	0	2232	0	1	0	0	1	12	0	0	4	6	2	5	0	0	3	0	28	0	40	25
CD16 Mono	0	0	0	0	12	0	0	0	0	0	0	0	0	0	420	0	0	0	0	0	0	1	0	0	0
CD4 Memory	221	0	0	3085	0	0	2	1	0	5	0	0	0	0	0	0	5	0	0	0	0	22	19	0	0
CD4 Naive	4345	1	1	56	0	0	0	1	0	1	0	0	0	0	0	0	1	0	1	0	0	73	20	0	0
CD56 bright NK	3	1	0	3	0	0	9	0	103	1	0	0	0	0	0	0	1	0	0	0	0	2	20	0	0
CD8 Effector_1	0	0	2	0	0	0	552	0	2	14	0	0	0	0	0	0	0	0	0	0	0	3	4	0	0
CD8 Effector_2	0	1	3	1	1	0	294	0	0	5	0	0	0	0	0	0	0	0	0	0	0	2	0	0	0
CD8 Memory_1	0	0	10	0	0	0	406	0	0	21	0	0	0	0	0	0	0	0	0	0	0	0	7	0	0
CD8 Memory_2	0	1	1	1	0	0	503	0	1	37	1	0	0	1	0	0	3	0	0	0	0	6	0	0	0
CD8 Naive	1	1	3889	0	1	0	62	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	20	0	0
cDC2	0	13	0	0	120	0	0	0	0	0	0	0	1	347	0	0	0	0	0	0	0	1	0	0	0
gdT	0	0	0	2	0	0	13	0	0	349	0	0	0	0	0	0	0	0	0	0	0	3	0	0	0
GMP	0	23	0	0	4	0	0	0	0	0	573	8	0	2	0	0	136	0	0	2	0	0	0	0	0
HSC	1	0	0	0	0	0	0	0	0	1	0	247	0	0	0	12	0	0	0	0	0	2	0	0	0
LMPP	1	0	0	0	0	0	0	0	0	0	15	264	0	1	0	8	2	0	1	0	0	0	0	0	0
MAIT	0	1	0	1	0	0	26	0	0	488	0	0	0	0	0	0	0	0	0	0	0	1	3	0	0
Memory B	0	0	1	0	0	229	0	1389	1	1	0	0	0	0	0	0	0	0	0	0	0	8	1	0	0
Naive B	0	0	0	0	0	1756	0	117	0	0	0	0	0	0	0	0	0	0	8	0	0	15	0	0	4
NK	0	1	1	0	0	0	6	0	1237	8	1	0	0	0	0	0	8	0	0	0	0	4	1	0	0
pDC	0	0	0	0	0	0	0	0	0	0	0	3	0	6	0	1	0	318	0	0	0	0	0	0	0
Plasmablast	0	1	0	0	0	0	0	7	0	0	0	0	0	0	0	0	1	0	0	1	211	1	1	0	0
Prog_B 1	0	0	0	0	0	0	0	1	0	0	0	31	0	0	0	0	0	0	111	1	0	0	1	0	0
Prog_B 2	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	4	0	125	0	0	0	0	0	0
Prog_DC	0	2	0	0	6	0	0	0	0	0	4	2	0	91	0	0	155	2	0	1	0	0	0	0	0
Prog_Mk	0	0	0	0	0	0	0	0	0	0	0	2	0	0	0	119	0	0	0	0	0	3	0	0	0
Prog_RBC	1	0	0	0	2	0	2	1	0	0	0	1	468	0	0	223	1	0	0	213	0	1	2	0	0
Treg	163	0	1	126	0	0	1	0	0	1	0	0	0	0	0	0	2	0	0	0	0	3	0	0	0

Session Info

sessionInfo()

## R version 4.5.2 (2025-10-31)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sequoia 15.7.2
## 
## Matrix products: default
## BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] future_1.68.0           knitr_1.50              mclust_6.1.2           
##  [4] umap_0.2.10.0           Rphenograph_0.99.1      igraph_2.2.1           
##  [7] ggplot2_4.0.1           dplyr_1.1.4             cowplot_1.2.0          
## [10] bmcite.SeuratData_0.3.0 SeuratData_0.2.2.9002   Seurat_5.3.1           
## [13] SeuratObject_5.2.0      sp_2.2-0                gplots_3.2.0           
## [16] lattice_0.22-7          iClusterPlus_1.47.2     irlba_2.3.5.1          
## [19] Matrix_1.7-4           
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3     rstudioapi_0.17.1      jsonlite_2.0.0        
##   [4] magrittr_2.0.4         spatstat.utils_3.2-0   farver_2.1.2          
##   [7] rmarkdown_2.30         vctrs_0.6.5            ROCR_1.0-11           
##  [10] spatstat.explore_3.6-0 askpass_1.2.1          htmltools_0.5.8.1     
##  [13] sass_0.4.10            sctransform_0.4.2      parallelly_1.45.1     
##  [16] KernSmooth_2.23-26     bslib_0.9.0            htmlwidgets_1.6.4     
##  [19] ica_1.0-3              plyr_1.8.9             plotly_4.11.0         
##  [22] zoo_1.8-14             cachem_1.1.0           mime_0.13             
##  [25] lifecycle_1.0.4        pkgconfig_2.0.3        R6_2.6.1              
##  [28] fastmap_1.2.0          fitdistrplus_1.2-4     shiny_1.11.1          
##  [31] digest_0.6.39          patchwork_1.3.2        tensor_1.5.1          
##  [34] RSpectra_0.16-2        labeling_0.4.3         progressr_0.18.0      
##  [37] spatstat.sparse_3.1-0  httr_1.4.7             polyclip_1.10-7       
##  [40] abind_1.4-8            compiler_4.5.2         withr_3.0.2           
##  [43] S7_0.2.1               fastDummies_1.7.5      MASS_7.3-65           
##  [46] openssl_2.3.4          rappdirs_0.3.3         gtools_3.9.5          
##  [49] caTools_1.18.3         tools_4.5.2            lmtest_0.9-40         
##  [52] otel_0.2.0             httpuv_1.6.16          future.apply_1.20.0   
##  [55] goftest_1.2-3          glue_1.8.0             nlme_3.1-168          
##  [58] promises_1.5.0         grid_4.5.2             Rtsne_0.17            
##  [61] cluster_2.1.8.1        reshape2_1.4.5         generics_0.1.4        
##  [64] gtable_0.3.6           spatstat.data_3.1-9    tidyr_1.3.1           
##  [67] data.table_1.17.8      spatstat.geom_3.6-1    RcppAnnoy_0.0.22      
##  [70] ggrepel_0.9.6          RANN_2.6.2             pillar_1.11.1         
##  [73] stringr_1.6.0          spam_2.11-1            RcppHNSW_0.6.0        
##  [76] later_1.4.4            splines_4.5.2          survival_3.8-3        
##  [79] deldir_2.0-4           tidyselect_1.2.1       miniUI_0.1.2          
##  [82] pbapply_1.7-4          gridExtra_2.3          scattermore_1.2       
##  [85] xfun_0.54              matrixStats_1.5.0      stringi_1.8.7         
##  [88] lazyeval_0.2.2         yaml_2.3.10            evaluate_1.0.5        
##  [91] codetools_0.2-20       tibble_3.3.0           cli_3.6.5             
##  [94] uwot_0.2.4             xtable_1.8-4           reticulate_1.44.1     
##  [97] jquerylib_0.1.4        Rcpp_1.1.0             globals_0.18.0        
## [100] spatstat.random_3.4-3  png_0.1-8              spatstat.univar_3.1-5 
## [103] dotCall64_1.2          bitops_1.0-9           listenv_0.10.0        
## [106] viridisLite_0.4.2      scales_1.4.0           ggridges_0.5.7        
## [109] purrr_1.2.0            crayon_1.5.3           rlang_1.1.6

iCluster analysis of single cell multi-omics data

Qianxing Mo

Compiled: March 20, 2026

Step 1: Processing data using the Seurat package

Step 2: Perform iCluster modeling

Step 3: Clustering of cells using the latent factor matrix

Step 4: Visualization of cell clusters