--- title: "A Quick Start of cola Package" author: "Zuguang Gu ( z.gu@dkfz.de )" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{1. A Quick Start of cola Package} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, echo = FALSE, message = FALSE} library(markdown) library(knitr) knitr::opts_chunk$set( error = FALSE, tidy = FALSE, message = FALSE, fig.align = "center") options(width = 100) options(rmarkdown.html_vignette.check_title = FALSE) library(cola) ``` Assume your matrix is stored in an object called `mat`, to perform consensus partitioning with *cola*, you only need to run following code: ```{r, eval = FALSE} # code only for demonstration mat = adjust_matrix(mat) # optional rl = run_all_consensus_partition_methods(mat, mc.cores = ...) cola_report(rl, output_dir = ..., mc.cores = ...) ``` In above code, there are three steps: 1. Adjust the matrix. In this step, rows with too many `NA`s are removed. Rows with very low variance are removed. `NA` values are imputed if there are less than 50% in each row. Outliers are adjusted in each row. 2. Run consensus partitioning with several methods. Partitioning methods are `hclust` (hierarchical clustering with cutree), `kmeans` (k-means clustering), `skmeans::skmeans` (spherical k-means clustering), `cluster::pam` (partitioning around medoids) and `Mclust::mclust` (model-based clustering). The default methods to extract top n rows are `SD` (standard deviation), `CV` (coefficient of variation), `MAD` (median absolute deviation) and `ATC` (ability to correlate to other rows). 3. Generate a detailed HTML report for the complete analysis. `run_all_consensus_partition_methods()` runs multiple methods in sequence, which might take long time for big datasets. Users can also run consensus partitioining with a specific top-value methods (e.g. SD) and partitioning methods (e.g. skmeans) by `consensus_partition()` function: ```{r, eval = FALSE} res = consensus_partition(mat, top_value_method = ..., partition_method = ...) cola_report(res, output_dir = ..., mc.cores = ...) ``` For extremely large datasets, users can run `consensus_partition_by_down_sampling()` by randomly sampling a subset of samples for classification, later the classes of the remaining samples are predicted by the signatures of the _cola_ classification. More details can be found in the vignette ["Work with Big Datasets"](working_with_big_datasets.html). ```{r, eval = FALSE} res = consensus_partition_by_down_sampling(mat, subset = ..., top_value_method = ..., partition_method = ...) cola_report(res, output_dir = ..., mc.cores = ...) ``` There are examples on real datasets for _cola_ analysis that can be found at https://jokergoo.github.io/cola_collection/.