\name{ConsensusClusterPlus}
\alias{ConsensusClusterPlus}
\alias{calcICL}
\title{Run ConsensusClusterPlus}
\description{
ConsensusClusterPlus determines cluster number and class membership by stability evidence. calcICL calculates cluster-consensus and item-consensus.
}
\usage{
ConsensusClusterPlus(
d=NULL, maxK = 3, reps=10, pItem=0.8, pFeature=1, clusterAlg="hc",title="untitled_consensus_cluster",
innerLinkage="average", finalLinkage="average", distance="pearson", ml=NULL,
tmyPal=NULL,seed=NULL,plot=NULL,writeTable=FALSE,weightsItem=NULL,weightsFeature=NULL,verbose=F)
calcICL(res,title="untitled_consensus_cluster",plot=NULL,writeTable=FALSE)
}
\arguments{
\item{d}{numerical data matrix where columns are items (samples) and rows are features, for example a gene expression matrix with genes in rows and microarrays in columns. Alternatively, an ExpressionSet object.}
\item{maxK}{integer value. Maximum cluster number to evaluate.}
\item{reps}{integer value. Number of subsamples.}
\item{pItem}{numerical value. Proportion of items to sample.}
\item{pFeature}{numerical value. Proportion of features to sample.}
\item{clusterAlg}{character value. Clustering algorithm: "hc" for hierarchical (hclust) or "km" for k-means. See Details.}
\item{title}{character value. Name of the output directory, which can be an absolute or relative path. The directory is created only if plot is not NULL or writeTable is TRUE.}
\item{innerLinkage}{hierarchical linkage method used for each subsample.}
\item{finalLinkage}{hierarchical linkage method used for the consensus matrix.}
\item{distance}{character value. Distance measure between samples: "pearson", "spearman", or "euclidean".}
\item{ml}{optional. A prior result; if supplied, only graphics and tables are produced.}
\item{tmyPal}{optional character vector of colors for the consensus matrix.}
\item{seed}{optional numerical value. Sets the random seed for reproducible results.}
\item{plot}{character value. NULL prints to the screen; 'pdf' or 'png' writes the plots to files of that type.}
\item{writeTable}{logical value.
If TRUE, write output and log to CSV files.}
\item{weightsItem}{optional numerical vector. Weights used for sampling items.}
\item{weightsFeature}{optional numerical vector. Weights used for sampling features.}
\item{verbose}{logical. If TRUE, print progress messages to the screen; this is useful for large datasets.}
\item{res}{result of ConsensusClusterPlus.}
}
\details{
ConsensusClusterPlus implements the Consensus Clustering algorithm of Monti, et al (2003) and extends it with new functionality and visualizations. Its purpose is to provide quantitative stability evidence for determining a cluster count and cluster membership in an unsupervised analysis.

ConsensusClusterPlus takes a numerical data matrix with items as columns and features as rows. It subsamples this matrix reps times according to pItem, pFeature, weightsItem, and weightsFeature, and clusters each subsample into 2 to maxK clusters using the algorithm given by clusterAlg. For each k, the consensus matrix records, for every pair of items, the proportion of subsamples containing both items in which the two were assigned to the same cluster; this matrix is then clustered hierarchically with finalLinkage to give consensusTree and consensusClass.

Agglomerative hierarchical clustering (hclust) and k-means are built in (see clusterAlg). Any other clustering algorithm available in R can be supplied through a simple programming hook: write a function that takes a distance object and k and returns a vector of cluster assignments, then pass its name as clusterAlg. See the commented-out example in Examples, which uses divisive hierarchical clustering.

For a detailed description of usage, output and images, see the vignette, which can be opened with openVignette().
}
\value{
ConsensusClusterPlus returns a list of length maxK. Each element is a list containing consensusMatrix (numerical matrix), consensusTree (hclust), and consensusClass (consensus class assignments). ConsensusClusterPlus also produces plots, shown on screen or written to files according to plot.

calcICL returns a list of two elements, clusterConsensus and itemConsensus, holding the cluster-consensus and item-consensus values. See Monti, et al (2003) for formulas.
}
\author{
Matt Wilkerson mwilkers@med.unc.edu
}
\references{
Monti, S., Tamayo, P., Mesirov, J., Golub, T.
(2003) Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data. Machine Learning, 52, 91-118.
}
\examples{
## obtain gene expression data
library(Biobase)
data(geneData)
d = geneData

## median center genes
d = sweep(d, 1, apply(d, 1, median))

## run consensus cluster
rcc = ConsensusClusterPlus(d, maxK=4, reps=100, pItem=0.8, pFeature=1, title="example")

## calculate cluster-consensus and item-consensus
resICL = calcICL(rcc, title="example")

## example of a programming hook for clusterAlg (divisive hierarchical clustering):
#library(cluster)
#dianaHook = function(this_dist, k){
#  tmp = diana(this_dist, diss=TRUE)
#  assignment = cutree(tmp, k)
#  return(assignment)
#}
#ConsensusClusterPlus(d, maxK=6, reps=25, pItem=0.8, pFeature=1, title="example", plot="png", clusterAlg="dianaHook")
}
\keyword{ methods }
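\section{Sketch of the consensus matrix}{
The consensus matrix described in Details can be illustrated in a few lines of base R. The helper consensusMatrixSketch below is hypothetical and is not the package's internal code. It assumes items in columns, 1 - Pearson correlation as the distance, hierarchical clustering cut with cutree on each subsample, uniform item sampling, and no feature subsampling or weights. Entry (i, j) is the number of subsamples in which items i and j were clustered together divided by the number of subsamples that contained both. The cluster-consensus lines follow the Monti, et al (2003) definition (mean consensus over item pairs within a cluster) rather than calling calcICL.

```r
## Hypothetical helper (not part of ConsensusClusterPlus): a base-R sketch of
## the Monti et al. (2003) consensus matrix for a single cluster count k.
consensusMatrixSketch <- function(d, k, reps = 10, pItem = 0.8,
                                  linkage = "average", seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n <- ncol(d)
  together <- matrix(0, n, n)  # subsamples in which i and j shared a cluster
  sampled <- matrix(0, n, n)   # subsamples that contained both i and j
  for (r in seq_len(reps)) {
    idx <- sort(sample.int(n, size = floor(pItem * n)))
    sub <- d[, idx, drop = FALSE]
    ## assumed distance: 1 - Pearson correlation between items (columns)
    hc <- hclust(as.dist(1 - cor(sub)), method = linkage)
    cl <- cutree(hc, k = k)
    same <- outer(cl, cl, "==") * 1
    together[idx, idx] <- together[idx, idx] + same
    sampled[idx, idx] <- sampled[idx, idx] + 1
  }
  cm <- together / sampled
  cm[sampled == 0] <- 0        # pairs never sampled together get 0
  dimnames(cm) <- list(colnames(d), colnames(d))
  cm
}

## Toy data: 20 features x 12 items; items 1-6 are high on features 1-10,
## items 7-12 are high on features 11-20.
set.seed(1)
toy <- matrix(rnorm(240), nrow = 20,
              dimnames = list(NULL, paste0("s", 1:12)))
toy[1:10, 1:6] <- toy[1:10, 1:6] + 3
toy[11:20, 7:12] <- toy[11:20, 7:12] + 3

cm <- consensusMatrixSketch(toy, k = 2, reps = 50, seed = 1)

## Final step: cluster 1 - consensus hierarchically and cut into k groups
consensusClass <- cutree(hclust(as.dist(1 - cm), method = "average"), k = 2)

## Cluster-consensus (Monti et al. 2003): mean consensus over item pairs
## within each cluster
clusterConsensus <- sapply(1:2, function(g) {
  m <- cm[consensusClass == g, consensusClass == g]
  mean(m[upper.tri(m)])
})
clusterConsensus
```

With two well-separated groups of items, consensus values within each group should be close to 1, which is the kind of stability evidence ConsensusClusterPlus summarizes across k.
}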