This tutorial aims to demonstrate how to use the updated iCluster functions to identify cancer molecular subtypes using multi-omics data. We use the uveal melanoma (UM) multi-omics data generated by The Cancer Genome Atlas (TCGA) project (Robertson et al., Cancer Cell 2018) for the demonstration. The UM dataset consists of somatic mutation, DNA copy number, DNA methylation and RNA-seq gene expression data for 80 UM primary samples. The level-3 UM multi-omics data available at Firebrowse portal were processed to form 4 data matrices (rows representing samples, and columns representing omics features) for iCluster analysis. The data procession were detailed in Mo et al., 2021. In the following, we briefly describe the 4 data matrices.

library(iClusterPlus)
library(patchwork)
library(lattice)
library(umap)
library(ggplot2)
library(gplots)
library(survival)
library(survminer)
library(circlize)
library(ComplexHeatmap)
library(knitr)

TCGA UM multi-omics data matrices: rows and columns correspond to samples and features, respectively.

data(UM)

#Somatic mutation data. 1: mutation, 0: no-mutation.
dim(mut02) 
## [1]  80 116
#Copy number data: merged log2 ratios of chromosome segments using CNregions function
dim(cn) 
## [1]   80 2632
#Methylation data matrix made of the top 25% most variable genes.
dim(methy25)
## [1]   80 4052
#mRNA expression data matrix made of the top 25% most variable genes.
dim(mrna25)
## [1]   80 4776

The iCluster analysis of multi-omics data can be carried out in 4 steps as described in the following.

Step 1: Make PCA variance plot to determine the dimension of latent factor used for iCluster modeling.

The iCluster methods are based on latent factor model. We suggest using PCA variance plot to determine the number of latent factors used for iCluster modeling.

# PCA variance plot for continuous multi-omics data 
pcaVarPlot(xList=list(cn,methy25,mrna25),K=30)

From the PCA variance plot, we see that the rate of decrease of the explained variance starts to slow down significantly after the 5th PC. An optimal number of latent factors used for iCluster modeling can be chosen around the change point where the rate of decrease of the explained variance of the PCs starts to slow down significantly. As an example, here, we choose K=7 as the number of latent factors for iCluster modeling of the UM multi-omics data. It can be found the sample clustering results are quite consistent when K >= 7.

Step 2: Perform iCluster modeling

We first apply iClusterPlus2, an updated iClusterPlus function to the UM multi-omics data. The iCluster modeling generates a latent factor matrix (meanZ), which capture the major sources of variation of the multi-omics data. As a result, the latent factor matrix can be used for sample clustering and visualization.

set.seed(321)
date()
## [1] "Wed Mar 11 16:00:54 2026"
fit.um <- iClusterPlus2(xList=list(mut02,cn,methy25,mrna25),
                          type=c("binomial","gaussian","gaussian","gaussian"),
                          K=7,n.burnin=100,n.draw=200,maxiter=25,sdev=0.05)
## 1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
date()
## [1] "Wed Mar 11 16:01:50 2026"
# check the number of genes (variables) whose coefficients are all 0
betaID <- list()
for(i in 1:4){
  betaID[[i]] <- apply(fit.um$beta[[i]],1,function(x){all(x==0)})
  cat(sum(betaID[[i]]), " ")
}
## 111  62  382  297

Step 3: Visualize sample clusters on a 2-dimensional space

Before clustering analysis, we project the samples to a 2-dimensional space to estimate the number of clusters.

## clustering of cells using Louvain method and the latent factor matrix generated by iCluster
umap.um <- umap(fit.um$meanZ)
colnames(umap.um$layout) <- c("UMAP1","UMAP2")
plot(umap.um$layout)

Step 4: Perform sample clustering

From the UMAP plot, it seems that there are 3 or 4 major clusters. Therefore, we perform k-means clustering on the UM samples using the latent factor matrix meanZ and let the number of clusters be 3 or 4. The user can also try other clustering methods (e.g., k-medoids) on the latent factor matrix meanZ.

km3 <- kmeans(fit.um$meanZ, 3, nstart = 100)
km4 <- kmeans(fit.um$meanZ, 4, nstart = 100)

# compare the 3 and 4 cluster solution
table(km3$cluster, km4$cluster)
##    
##      1  2  3  4
##   1  0  0  0 29
##   2  0 30  0  0
##   3 13  0  8  0
um.clusters <- data.frame(umap.um$layout,ic3=factor(km3$cluster),ic4=factor(km4$cluster))
p1 <- ggplot(um.clusters, aes(x=UMAP1, y=UMAP2, col=ic3)) + geom_point(size = 3)+theme_bw()
p2 <- ggplot(um.clusters, aes(x=UMAP1, y=UMAP2, col=ic4)) + geom_point(size = 3)+theme_bw()
p1+p2

Step 5: Draw the heatmap of the sample clusters

It is necessary to make a heatmap of the multi-omics data to see if the clusters are biologically meaningful. We draw the heatmaps for the 3 and 4 clusters separately. On the following heatmaps, the columns are the samples ordered by the clusters and the rows are the features ordered by the clustering methods specified by the plotHeatmap function.

Heatmap for the 3 iClusters

col.scheme <- alist()
col.scheme[[1]] <- colorpanel(2,low="white",high="black")
col.scheme[[2]] <- bluered(256)
col.scheme[[3]] <- bluered(256)
col.scheme[[4]] <- bluered(256)

chr <- unlist(strsplit(colnames(cn),"\\."))
chr <- chr[seq(1,length(chr),by=2)]
chr <- gsub("chr","",chr)
chr <- as.numeric(chr)

### heatmap for the 3 iClusters
fit.um$clusters <- factor(um.clusters$ic3) 
hm3c <- plotHeatmap(fit=fit.um,xList=list(mut02,cn,methy25,mrna25),
                    type=c("binomial","gaussian","gaussian","gaussian"), 
            threshold=rep(0.25, 4),feature.order=c(F,F,T,T),feature.scale=c(F,F,T,T),
            dist.method = "euclidean", hclust.method="ward.D",col.scheme = col.scheme,chr=chr,
            plot.chr=c(F,T,F,F),sparse=c(T,F,T,T),cap=c(0,0.99,0.99,0.99))

Heatmap for the 4 iClusters.

fit.um$clusters <- factor(um.clusters$ic4)
hm4c <- plotHeatmap(fit=fit.um,xList=list(mut02,cn,methy25,mrna25),
                    type=c("binomial","gaussian","gaussian","gaussian"), 
            threshold=rep(0.25, 4),feature.order=c(F,F,T,T),feature.scale=c(F,F,T,T),
            dist.method = "euclidean", hclust.method="ward.D",col.scheme = col.scheme,chr=chr,
            plot.chr=c(F,T,F,F),sparse=c(T,F,T,T),cap=c(0,0.99,0.99,0.99))

Comparing the 3 and 4 iClusters, we can see that the cluster characterized by chromosome 6 loss in the 3-cluster solution is further divided into 2 smaller clusters in the 4-cluster solution.

We can also use the ComplexHeatmap to make the multi-omics heatmap. The following is the heatmap of the 4 iClusters on which the columns are the features ordered by the clustering methods specified by the Heatmap function and the rows are the samples ordered by the clusters.

sample.ord <- order(um.clusters$ic4)
meth <- scale(methy25)
expr <- scale(mrna25)

image.datasets <- list(mut02,cn,meth,expr)
top.features <- list()
for(i in 1:4){
  ### keep all the CN features to show the full image of copy number variation
  if(i == 2){
    top.features[[i]] <- (image.datasets[[i]])[sample.ord,]
    cat(ncol(top.features[[i]]), " ")
  }else{ # select the top features for mutation, methylation, and gene expression data 
    sum.abs.coef <- apply(abs(fit.um$beta[[i]]),1,sum)
    top.features[[i]] <- (image.datasets[[i]])[sample.ord,sum.abs.coef>quantile(sum.abs.coef,0.5)]
    cat(ncol(top.features[[i]]), " ")
  }
}
## 5  2632  2026  2388
clusters.ord <- um.clusters$ic4[sample.ord]
colfun0 <- c("0"="white","1"="black")
colfun1 <- colorRamp2(c(-2, 0, 2), c("blue", "white", "red"))
colfun2 <- colorRamp2(c(-3, 0, 3), c("blue", "white", "red"))
col_letters <- c("1" = "red", "2" = "purple", "3" = "gold", "4"="green")

ht0 <- Heatmap(clusters.ord, name="iCluster",col = col_letters,width = unit(5, "mm"),cluster_rows = FALSE,cluster_columns=FALSE,
              column_labels="",row_labels=rep("",nrow(top.features[[1]])))
ht1 <- Heatmap(top.features[[1]], name = "Mutation",col=colfun0,cluster_rows = FALSE,clustering_method_columns="ward.D", column_labels=rep("",ncol(top.features[[1]])),row_labels=rep("",nrow(top.features[[1]])),use_raster=TRUE,width = unit(2, "cm"))
ht2 <- Heatmap(top.features[[2]], name = "Copy number",col=colfun1,cluster_rows = FALSE,cluster_columns = FALSE,clustering_method_columns="ward.D", column_labels=rep("",ncol(top.features[[2]])),row_labels=rep("",nrow(top.features[[2]])),use_raster=TRUE,width = unit(5, "cm"))
ht3 <- Heatmap(top.features[[3]], name = "Methylation",col=colfun2,cluster_rows = FALSE,clustering_method_columns="ward.D", column_labels=rep("",ncol(top.features[[3]])),row_labels=rep("",nrow(top.features[[3]])),use_raster=TRUE,width = unit(5, "cm"))
ht4 <- Heatmap(top.features[[4]], name = "mRNA",col=colfun2,cluster_rows = FALSE,clustering_method_columns="ward.D", column_labels=rep("",ncol(top.features[[4]])),row_labels=rep("",nrow(top.features[[4]])),use_raster=TRUE,width = unit(5, "cm"))

ht0 + ht1 + ht2 + ht3 + ht4

Step 6: check the overall survival of the iClusters

clin4 <- cbind(clin4[,1:15],um.clusters)

# OS of 3 clusters 
survdiff(Surv(osmonth,vital_status)~ic3,data=clin4)
## Call:
## survdiff(formula = Surv(osmonth, vital_status) ~ ic3, data = clin4)
## 
##        N Observed Expected (O-E)^2/E (O-E)^2/V
## ic3=1 29       13     7.49      4.05      6.05
## ic3=2 30        2    10.04      6.44     11.80
## ic3=3 21        8     5.47      1.17      1.58
## 
##  Chisq= 12  on 2 degrees of freedom, p= 0.002
surv.ic3 <- survfit(Surv(osmonth,vital_status)~ic3,data=clin4)
p3 <- ggsurvplot(surv.ic3, pval=TRUE)$plot

# OS of 4 clusters
survdiff(Surv(osmonth,vital_status)~ic4,data=clin4)
## Call:
## survdiff(formula = Surv(osmonth, vital_status) ~ ic4, data = clin4)
## 
##        N Observed Expected (O-E)^2/E (O-E)^2/V
## ic4=1 13        8     2.46     12.52     14.61
## ic4=2 30        2    10.04      6.44     11.80
## ic4=3  8        0     3.01      3.01      3.51
## ic4=4 29       13     7.49      4.05      6.05
## 
##  Chisq= 26.9  on 3 degrees of freedom, p= 6e-06
surv.ic4 <- survfit(Surv(osmonth,vital_status)~ic4,data=clin4)
p4 <- ggsurvplot(surv.ic4,pval=TRUE)$plot

p3 + p4

iCluster analysis using iCluster2b

If multi-omics data are continuous or can be transformed to a continuous format, the user may use function iCluster2b to perform iCluster analysis. The iCluster2b is an updated integrative clustering method originally reported by Shen, Wang and Mo. 2023. The user can use function tune.iCluster2b to search optimal lasso shrinkage parameters for multi-omics data. In the following analysis, we exclude the somatic mutation data, which is binary.

A. Perform iCluster modeling using iCluster2b on continuous multi-omics data.

# tune.iCluster2b search optimal lasso parameter lambda from min.lambda to max.lambda for iCluster2b function
date()
## [1] "Wed Mar 11 16:01:59 2026"
fit2 <- tune.iCluster2b(xList=list(cn,methy25, mrna25), min.lambda=10,max.lambda=1000,K=7,
                       method=c("lasso","lasso","lasso"),min.shrinkage.rate=rep(0.05,3),
                       lambda.iter=25,EM.iter=25)
## 1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
date()
## [1] "Wed Mar 11 16:02:06 2026"
## check the number of coefficients are all 0
betaID <- list()
for(i in 1:3){
  betaID[[i]] <- apply(fit2$beta[[i]],1,function(x){all(x==0)})
  cat(sum(betaID[[i]]), " ")
}
## 233  687  505

B. Visualize sample clusters on a 2-dimensional UMAP space and perform k-means clustering

umap2 <- umap(fit2$meanZ)
colnames(umap2$layout) <- c("UMAP1","UMAP2")

km3.2 <- kmeans(fit2$meanZ, 3, nstart = 100)
km4.2 <- kmeans(fit2$meanZ, 4, nstart = 100)
um.clusters2 <- data.frame(umap2$layout,ic3=factor(km3.2$cluster),ic4=factor(km4.2$cluster))

# compare sample clusters generated by iClusterPlus2 and iCluster2b
table(um.clusters$ic3, um.clusters2$ic3)
##    
##      1  2  3
##   1 28  1  0
##   2  0 28  2
##   3  0  1 20
table(um.clusters$ic4, um.clusters2$ic4)
##    
##      1  2  3  4
##   1 12  1  0  0
##   2  0  2  0 28
##   3  0  8  0  0
##   4  0  0 28  1
p1.2 <- ggplot(um.clusters2, aes(x=UMAP1, y=UMAP2, col=ic3)) + geom_point(size = 3)+theme_bw()
p2.2 <- ggplot(um.clusters2, aes(x=UMAP1, y=UMAP2, col=ic4)) + geom_point(size = 3)+theme_bw()
p1.2 + p2.2

C. Make heatmap for the 4 iClusters produced by tune.iCluster2b

col.scheme2 = list()
col.scheme2[[1]] <- bluered(256)
col.scheme2[[2]] <- bluered(256)
col.scheme2[[3]] <- bluered(256)

fit2$clusters <- factor(um.clusters2$ic4,levels=c(1,2,3,4))
hm4c.2 <- plotHeatmap(fit=fit2,xList=list(cn,methy25,mrna25),
                      type=c("gaussian","gaussian","gaussian"), 
            threshold=rep(0.5, 3),feature.order=c(F,T,T),feature.scale=c(F,T,T),
            dist.method = "euclidean", hclust.method="ward.D",col.scheme=col.scheme2,chr=chr,
            plot.chr=c(T,F,F),sparse=c(F,T,T),cap=c(0.99,0.99,0.99))

Session Info
sessionInfo()
## R version 4.5.2 (2025-10-31)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sequoia 15.7.2
## 
## Matrix products: default
## BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] grid      parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] knitr_1.50            ComplexHeatmap_2.26.0 circlize_0.4.16      
##  [4] survminer_0.5.1       ggpubr_0.6.2          survival_3.8-3       
##  [7] gplots_3.2.0          ggplot2_4.0.1         umap_0.2.10.0        
## [10] lattice_0.22-7        patchwork_1.3.2       iClusterPlus_1.47.2  
## [13] irlba_2.3.5.1         Matrix_1.7-4         
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.2.1    dplyr_1.1.4         farver_2.1.2       
##  [4] S7_0.2.1            bitops_1.0-9        fastmap_1.2.0      
##  [7] digest_0.6.39       lifecycle_1.0.4     cluster_2.1.8.1    
## [10] magrittr_2.0.4      compiler_4.5.2      rlang_1.1.6        
## [13] sass_0.4.10         tools_4.5.2         yaml_2.3.10        
## [16] data.table_1.17.8   ggsignif_0.6.4      labeling_0.4.3     
## [19] askpass_1.2.1       reticulate_1.44.1   RColorBrewer_1.1-3 
## [22] abind_1.4-8         KernSmooth_2.23-26  withr_3.0.2        
## [25] purrr_1.2.0         BiocGenerics_0.56.0 stats4_4.5.2       
## [28] caTools_1.18.3      xtable_1.8-4        colorspace_2.1-2   
## [31] scales_1.4.0        gtools_3.9.5        iterators_1.0.14   
## [34] cli_3.6.5           crayon_1.5.3        rmarkdown_2.30     
## [37] generics_0.1.4      rstudioapi_0.17.1   RSpectra_0.16-2    
## [40] km.ci_0.5-6         rjson_0.2.23        cachem_1.1.0       
## [43] splines_4.5.2       matrixStats_1.5.0   survMisc_0.5.6     
## [46] vctrs_0.6.5         jsonlite_2.0.0      carData_3.0-5      
## [49] car_3.1-3           S4Vectors_0.48.0    IRanges_2.44.0     
## [52] GetoptLong_1.0.5    rstatix_0.7.3       clue_0.3-66        
## [55] Formula_1.2-5       magick_2.9.0        foreach_1.5.2      
## [58] tidyr_1.3.1         jquerylib_0.1.4     glue_1.8.0         
## [61] codetools_0.2-20    gtable_0.3.6        shape_1.4.6.1      
## [64] tibble_3.3.0        pillar_1.11.1       htmltools_0.5.8.1  
## [67] openssl_2.3.4       R6_2.6.1            KMsurv_0.1-6       
## [70] doParallel_1.0.17   evaluate_1.0.5      png_0.1-8          
## [73] backports_1.5.0     broom_1.0.10        bslib_0.9.0        
## [76] Rcpp_1.1.0          gridExtra_2.3       xfun_0.54          
## [79] zoo_1.8-14          pkgconfig_2.0.3     GlobalOptions_0.1.2