---
title: "Statistical analysis and visualization of functional profiles for gene and gene clusters"
author: "\\

	Guangchuang Yu (<guangchuangyu@gmail.com>)\\

        School of Public Health, The University of Hong Kong"
date: "`r Sys.Date()`"
bibliography: clusterProfiler.bib
csl: nature.csl
output: 
  BiocStyle::html_document:
    toc: true
  BiocStyle::pdf_document:
    toc: true
vignette: >
  %\VignetteIndexEntry{Statistical analysis and visualization of functional profiles for genes and gene clusters}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteDepends{AnnotationHub}
  %\usepackage[utf8]{inputenc}
---

```{r style, echo=FALSE, results="asis", message=FALSE}
BiocStyle::markdown()
knitr::opts_chunk$set(tidy = FALSE,
                      warning = FALSE,
                      message = FALSE)
```

```{r echo=FALSE, results='hide', message=FALSE}
library(DOSE)
library(GO.db)
library(AnnotationHub)
library(org.Hs.eg.db)
library(clusterProfiler)
```

# Abstract

`r Biocpkg("clusterProfiler")` supports enrichment analysis of Gene Ontology (GO) and Kyoto Encyclopedia of genes and Genomes (KEGG) with either hypergeometric test or Gene Set Enrichment Analysis (GSEA). `r Biocpkg("clusterProfiler")` adjust the estimated significance level to account for multiple hypothesis testing and also _q-values_ were calculated for FDR control. It supports several visualization methods, including `barplot`, `cnetplot`, `enrichMap` and `gseaplot`. `r Biocpkg("clusterProfiler")` also supports comparing functional profiles among gene clusters. It supports comparing biological themes of GO, KEGG, Disease Ontology (via `r Biocpkg("DOSE")`) and Reactome pathways (via `r Biocpkg("ReactomePA")`).

# Citation

If you use `r Biocpkg("clusterProfiler")` in published research, please cite G. Yu(2012). In addition, please cite G. Yu (2010) when using `r Biocpkg("GOSemSim")` for GO semantic similarity analysis, G. Yu (2015) when using `r Biocpkg("DOSE")` for Disease Ontology analysis, G. Yu (2016) when using `r Biocpkg("ReactomePA")` for Reactome pathway analysis and G. Yu (2015) when applying enrichment analysis to NGS data by using `r Biocpkg("ChIPseeker")`.

```
G Yu, LG Wang, Y Han, QY He.
clusterProfiler: an R package for comparing biological themes among gene clusters.
OMICS: A Journal of Integrative Biology 2012, 16(5):284-287.
```

URL: [http://dx.doi.org/10.1089/omi.2011.0118](http://dx.doi.org/10.1089/omi.2011.0118)

```
G Yu, F Li, Y Qin, X Bo, Y Wu, S Wang. 
GOSemSim: an R package for measuring semantic similarity among GO terms and gene products.
Bioinformatics 2010, 26(7):976-978.
```

URL: [http://dx.doi.org/10.1093/bioinformatics/btq064](http://dx.doi.org/10.1093/bioinformatics/btq064)


```
G Yu, LG Wang, GR Yan, QY He.
DOSE: an R/Bioconductor package for Disease Ontology Semantic and Enrichment analysis.
Bioinformatics 2015, 31(4):608-609.
```

URL: [http://dx.doi.org/10.1093/bioinformatics/btu684](http://dx.doi.org/10.1093/bioinformatics/btu684)

```
G Yu, QY He,
ReactomePA: an R/Bioconductor package for reactome pathway analysis and visualization.
Molecular BioSystems 2016, 12(2):477-479.
```

URL: [http://dx.doi.org/10.1039/C5MB00663E](http://dx.doi.org/10.1039/C5MB00663E)


```
G Yu, LG Wang, QY He.
ChIPseeker: an R/Bioconductor package for ChIP peak annotation, comparison and visualization.
Bioinformatics 2015, 31(14):2382-2383.
```

URL: [http://dx.doi.org/10.1093/bioinformatics/btv145](http://dx.doi.org/10.1093/bioinformatics/btv145)


# Introduction

In recently years, high-throughput experimental techniques such as
microarray, RNA-Seq and mass spectrometry can detect cellular
molecules at systems-level. These kinds of analyses generate huge
quantitaties of data, which need to be given a biological
interpretation. A commonly used approach is via clustering in the gene
dimension for grouping different genes based on their similarities[@yu2010].

To search for shared functions among genes, a common way is to
incorporate the biological knowledge, such as Gene Ontology (GO) and
Kyoto Encyclopedia of Genes and Genomes (KEGG), for identifying
predominant biological themes of a collection of genes.

After clustering analysis, researchers not only want to determine
whether there is a common theme of a particular gene cluster, but also
to compare the biological themes among gene clusters. The manual step
to choose interesting clusters followed by enrichment analysis on each
selected cluster is slow and tedious. To bridge this gap, we designed
`r Biocpkg("clusterProfiler")`[@yu2012], for comparing and visualizing functional
profiles among gene clusters.

# bitr: Biological Id TranslatoR

Many new R user may find traslating ID is a tedious task and I have received many feedbacks from `r Biocpkg("clusterProfiler")` users that they don't know how to convert gene symbol, uniprot ID or other ID types to Entrez gene ID that used in `r Biocpkg("clusterProfiler")` for most of the species.

To remove this obstacle, We provide `bitr` function for translating among different gene ID types.

```{r}
x <- c("GPX3",  "GLRX",   "LBP",   "CRYAB", "DEFB1", "HCLS1",   "SOD2",   "HSPA2", 
       "ORM1",  "IGFBP1", "PTHLH", "GPC3",  "IGFBP3","TOB1",    "MITF",   "NDRG1", 
       "NR1H4", "FGFR3",  "PVR",   "IL6",   "PTPRM", "ERBB2",   "NID2",   "LAMB1", 
       "COMP",  "PLS3",   "MCAM",  "SPP1",  "LAMC1", "COL4A2",  "COL4A1", "MYOC",  
       "ANXA4", "TFPI2",  "CST6",  "SLPI",  "TIMP2", "CPM",     "GGT1",   "NNMT",
       "MAL",   "EEF1A2", "HGD",   "TCN2",  "CDA",   "PCCA",    "CRYM",   "PDXK",  
       "STC1",  "WARS",  "HMOX1", "FXYD2", "RBP4",   "SLC6A12", "KDELR3", "ITM2B")
eg = bitr(x, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
head(eg)
```

User should provides an annotation package, both _fromType_ and _toType_ can accept any types that supported.

User can use _keytypes_ to list all supporting types.
```{r}
library(org.Hs.eg.db)
keytypes(org.Hs.eg.db)
```

We can translate from one type to other types.
```{r}
ids <- bitr(x, fromType="SYMBOL", toType=c("UNIPROT", "ENSEMBL"), OrgDb="org.Hs.eg.db")
head(ids)
```


# GO Analysis

## Supported organisms

GO analyses (`groupGO()`, `enrichGO()` and `gseGO()`) support organisms that have an `OrgDb` object available.

Bioconductor have already provide `OrgDb` for about 20 species, see <http://bioconductor.org/packages/release/BiocViews.html#___OrgDb>.

We can build our own `OrgDb` via `AnnotationHub`.

For example:

```{r}
library(AnnotationHub)
hub <- AnnotationHub()
query(hub, "Cricetulus")
Cgriseus <- hub[["AH48061"]]
Cgriseus
```

If organism is not supported by `AnnotationHub`, user can use [AnnotationForge](https://www.bioconductor.org/packages/release/bioc/vignettes/AnnotationForge/inst/doc/MakingNewOrganismPackages.html) to build `OrgDb`. 

If user have GO annotation data (in data.frame format with first column of gene ID and second column of GO ID), they can use `enricher()` and `gseGO()` functions to perform over-representation test and gene set enrichment analysis.

If genes are annotated by direction annotation, it should also annotated by its ancestor GO nodes (indirect annation). If user only has direct annotation, they can pass their annotation to `buildGOmap` function, which will infer indirection annotation and generate a `data.frame` that suitable for both `enricher()` and `gseGO()`.

## GO classification

In `r Biocpkg("clusterProfiler")`, `groupGO` is designed for gene classification based on GO distribution at a specific level. Here we use dataset __*geneList*__ provided by `r Biocpkg("DOSE")`. Please refer to vignette of `r Biocpkg("DOSE")` for more details.

```{r warning=FALSE}
data(geneList)
gene <- names(geneList)[abs(geneList) > 2]
gene.df <- bitr(gene, fromType = "ENTREZID", 
	   	toType = c("ENSEMBL", "SYMBOL"),
		OrgDb = org.Hs.eg.db)
head(gene.df)
ggo <- groupGO(gene     = gene,
               OrgDb    = org.Hs.eg.db,
               ont      = "BP",
               level    = 3,
               readable = TRUE)
head(summary(ggo))
```

The input parameters of _gene_ is a vector of gene IDs (can be any ID type that supported by corresponding `OrgDb`). 

If _readable_ is setting to _TRUE_, the input gene IDs will be converted to gene symbols.


## GO over-representation test


Over-representation test[@boyle2004] were implemented in `r Biocpkg("clusterProfiler")`. For calculation details and explanation of paramters, please refer to the vignette of `r Biocpkg("DOSE")`.

```{r}
ego <- enrichGO(gene          = gene,
                universe      = names(geneList),
                OrgDb         = org.Hs.eg.db,
                ont           = "CC",
                pAdjustMethod = "BH",
                pvalueCutoff  = 0.01,
                qvalueCutoff  = 0.05, 
		readable      = TRUE)
head(summary(ego))
```

As I mentioned before, any gene ID type that supported in `OrgDb` can be directly used in GO analyses. User need to specify the `keytype` parameter to specify the input gene ID type.

```{r}
ego2 <- enrichGO(gene         = gene.df$ENSEMBL,
                OrgDb         = org.Hs.eg.db,
		keytype       = 'ENSEMBL',
                ont           = "CC",
                pAdjustMethod = "BH",
                pvalueCutoff  = 0.01,
                qvalueCutoff  = 0.05)
head(summary(ego2))
```

Gene ID can be mapped to gene Symbol by using paramter `readable=TRUE` or `setReadable` function.
```{r}
ego2 <- setReadable(ego2, OrgDb = org.Hs.eg.db)
head(summary(ego2))
```

### drop specific GO terms or level

_enrichGO_ test the whole GO corpus and enriched result may contains very general terms. With _dropGO_ function, user can remove specific GO terms or GO level from results obtained from both _enrichGO_ and _compareCluster_.

### test GO at sepcific level

_enrichGO_ doesn't contain parameter to restrict the test at specific GO level. Instead, we provide a function _gofilter_ to restrict the result at specific GO level. It works with results obtained from both _enrichGO_ and _compareCluster_.


### reduce redundancy of enriched GO terms

According to [issue #28](https://github.com/GuangchuangYu/clusterProfiler/issues/28), I implement a _simplify_ method to redundant GO terms obtained from _enrichGO_. An example can be found in [the blog post](http://guangchuangyu.github.io/2015/10/use-simplify-to-remove-redundancy-of-enriched-go-terms/). It internally call `r Biocpkg("GOSemSim")` to calculate similarities among GO terms and remove those highly similar terms by keeping one representative term. The _simplify_ method works with both outputs from _enrichGO_ and _compareCluster_.


## GO Gene Set Enrichment Analysis

A common approach in analyzing gene expression profiles was identifying differential expressed genes that are deemed interesting. The enrichment analysis we demonstrated previous were based on these differential expressed genes. This approach will find genes where the difference is large, but it will not detect a situation where the difference is small, but evidenced in coordinated way in a set of related genes. Gene Set Enrichment Analysis (GSEA)[@subramanian_gene_2005] directly addresses this limitation. All genes can be used in GSEA; GSEA aggregates the per gene statistics across genes within a gene set, therefore making it possible to detect situations where all genes in a predefined set change in a small but coordinated way. Since it is likely that many relevant phenotypic differences are manifested by small but consistent changes in a set of genes.

For algorithm details, please refer to the vignette of `r Biocpkg("DOSE")`.

```{r eval=FALSE} 
ego3 <- gseGO(geneList     = geneList,
              OrgDb        = org.Hs.eg.db,
              ont          = "CC",
              nPerm        = 1000,
              minGSSize    = 120,
              pvalueCutoff = 0.01,
              verbose      = FALSE)
```

GSEA use permutation test, user can set _nPerm_ for number of permutations. Gene Set size below _minGSSize_ will be omitted.


## GO Semantic Similarity Analysis
GO semantic similarity can be calculated by `r Biocpkg("GOSemSim")`[@yu2010]. We can use it to cluster genes/proteins into different clusters based on their functional similarity and can also use it to measure the similarities among GO terms to reduce the redundancy of GO enrichment results.


# KEGG analysis
The annotation package, `r Biocannopkg("KEGG.db")`, is not updated since 2012. It's now pretty old and in `r Biocpkg("clusterProfiler")`, `enrichKEGG` supports downloading latest online version of KEGG data for enrichment analysis. Using `r Biocannopkg("KEGG.db")` is also supported by explicitly setting _use\_internal\_data_ parameter to _TRUE_, but it's not recommended. 

With this new feature, organism is not restricted to those supported in previous release, it can be any species that have KEGG annotation data available in KEGG database. User should pass abbreviation of academic name to the _organism_ parameter. The full list of KEGG supported organisms can be accessed via [http://www.genome.jp/kegg/catalog/org_list.html](http://www.genome.jp/kegg/catalog/org_list.html).

## KEGG over-representation test


```{r}
kk <- enrichKEGG(gene         = gene,
                 organism     = 'hsa',
                 pvalueCutoff = 0.05)
head(summary(kk))
```

## KEGG Gene Set Enrichment Analysis

```{r}
kk2 <- gseKEGG(geneList     = geneList,
               organism     = 'hsa',
               nPerm        = 1000,
               minGSSize    = 120,
               pvalueCutoff = 0.05,
               verbose      = FALSE)
head(summary(kk2))
```


## KEGG Module over-representation test

[KEGG Module](http://www.genome.jp/kegg/module.html) is a collection of manually defined function units. In some situation, KEGG Modules have a more straightforward interpretation.

```{r eval = FALSE}
mkk <- enrichMKEGG(gene = gene,
                   organism = 'hsa')
```
                   
## KEGG Module Gene Set Enrichment Analysis

```{r eval=FALSE}
mkk2 <- gseMKEGG(geneList = geneList,
                 species = 'hsa')
```
                 

# Disease Ontology analysis

`r Biocpkg("DOSE")`[@yu_dose_2015] supports Disease Ontology (DO) Semantic and Enrichment analysis, please refer to the package vignettes. The `enrichDO` function is very useful for identifying disease association of interesting genes, and function `gseAnalyzer` function is designed for gene set enrichment analysis of _DO_.

# Reactome pathway analysis

`r Biocpkg("ReactomePA")`[@yu_reactomepa_2016] uses Reactome as a source of pathway data. The function call of `enrichPathway` and `gsePathway` in `r Biocpkg("ReactomePA")` is consistent with `enrichKEGG` and `gseKEGG`.

# DAVID functional analysis

`r Biocpkg("clusterProfiler")` provides enrichment and GSEA analysis with GO, KEGG, DO and Reactome pathway supported internally, some user may prefer GO and KEGG analysis with DAVID[@huang_david_2007] and still attracted by the visualization methods provided by `r Biocpkg("clusterProfiler")`[@paranjpe_genome_wid_2013]. To bridge the gap between DAVID and clusterProfiler, we implemented `enrichDAVID`. This function query enrichment analysis result from DAVID webserver via RDAVIDWebService[@fresno_rdavidwebservice_2013] and stored the result as an `enrichResult` instance, so that we can use all the visualization functions in `r Biocpkg("clusterProfiler")` to visualize DAVID results. `enrichDAVID` is fully compatible with `compareCluster` function and comparing enrichment results from different gene clusters is now available with DAVID.

```{r eval=FALSE}
david <- enrichDAVID(gene = gene,
                     idType = "ENTREZ_GENE_ID",
                     listType = "Gene",
                     annotation = "KEGG_PATHWAY",
                     david.user = "clusterProfiler@hku.hk")
```

DAVID Web Service has the following limitations:

+ A job with more than 3000 genes to generate gene or term cluster report will not be handled by DAVID due to resource limit.
+ No more than 200 jobs in a day from one user or computer.
+ DAVID Team reserves right to suspend any improper uses of the web service without notice.

For more details, please refer to [http://david.abcc.ncifcrf.gov/content.jsp?file=WS.html](http://david.abcc.ncifcrf.gov/content.jsp?file=WS.html).

As user has limited usage, please [register](http://david.abcc.ncifcrf.gov/webservice/register.htm) and use your own user account to run __*enrichDAVID*__.

# Universal enrichment analysis

`r Biocpkg("clusterProfiler")` supports both hypergeometric test and gene set enrichment analysis of many ontology/pathway, but it's still not enough for users may want to analyze their data with unsupported organisms, slim version of GO, novel functional annotation (e.g. GO via BlastGO or KEGG via KAAS), unsupported ontologies/pathways or customized annotations.

`r Biocpkg("clusterProfiler")` provides `enricher` function for hypergeometric test and `GSEA` function for gene set enrichment analysis that are designed to accept user defined annotation. They accept two additional parameters _TERM2GENE_ and _TERM2NAME_. As indicated in the parameter names, _TERM2GENE_ is a data.frame with first column of term ID and second column of corresponding mapped gene and _TERM2NAME_ is a data.frame with first column of term ID and second column of corresponding term name. _TERM2NAME_ is optional.

An example of using `enricher` and `GSEA` to analyze [DisGeNet](http://www.disgenet.org/web/DisGeNET/menu/home) annotation is presented in the post, [use clusterProfiler as an universal enrichment analysis tool](http://guangchuangyu.github.io/2015/05/use-clusterprofiler-as-an-universal-enrichment-analysis-tool/).

## Using MSigDB gene set collections

Users can use `enricher` and `GSEA` function to analyze gene set collections downloaded from Molecular Signatures Database ([MSigDb](http://www.broadinstitute.org/gsea/msigdb/index.jsp)). `r Biocpkg("clusterProfiler")` provides a function, `read.gmt`, to parse the [gmt file](www.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#GMT:_Gene_Matrix_Transposed_file_format_.28.2A.gmt.29) into a _TERM2GENE_ `data.frame` that is ready for both `enricher` and `GSEA` functions.

```{r}
gmtfile <- system.file("extdata", "c5.cc.v5.0.entrez.gmt", package="clusterProfiler")
c5 <- read.gmt(gmtfile)
egmt <- enricher(gene, TERM2GENE=c5)
head(summary(egmt))
```

# Functional analysis of NGS data

Functional analysis using NGS data (eg, RNA-Seq and ChIP-Seq) can be performed by linking coding and non-coding regions to coding genes via `r Biocpkg("ChIPseeker")`[@yu_chipseeker_2015] package, which can annotates genomic regions to their nearest genes, host genes, and flanking genes respectivly. In addtion, it provides a function, __*seq2gene*__, that simultaneously considering host genes, promoter region and flanking gene from intergenic region that may under control via cis-regulation. This function maps genomic regions to genes in a many-to-many manner and facilitate functional analysis. For more details, please refer to `r Biocpkg("ChIPseeker")`.


# Visualization

The function calls of `groupGO`, `enrichGO`, `enrichKEGG`, `enrichDO` and `enrichPathway` are consistent and all the output can be visualized by bar plot, enrichment map and category-gene-network plot. It is very common to visualize the enrichment result in bar or pie chart. We believe the pie chart is misleading and only provide bar chart.

## barplot
```{r fig.height=5, fig.width=6}
barplot(ggo, drop=TRUE, showCategory=12)
```


```{r fig.height=5, fig.width=8}
barplot(ego, showCategory=8)
```

## dotplot


[__*dotplot*__](http://guangchuangyu.github.io/2015/06/dotplot-for-enrichment-result/) is a good alternative to __*barplot*__.

```{r}
dotplot(ego)
```

## enrichMap

Enrichment map can be viusalized by `enrichMap`, which also support results obtained from hypergeometric test and gene set enrichment analysis.
```{r fig.cap="enrichment map of enrichment result", fig.align="center", fig.height=16, fig.width=16}
enrichMap(ego)
```

## cnetplot
In order to consider the potentially biological complexities in which a gene may belong to multiple annotation categories and provide information of numeric changes if available, we developed `cnetplot` function to extract the complex association.
```{r fig.height=14, fig.width=14}
cnetplot(ego, categorySize="pvalue", foldChange=geneList)
```

```{r fig.height=14, fig.width=14}
cnetplot(kk, categorySize="geneNum", foldChange=geneList)
```
## plotGOgraph

`plotGOgraph`, which is based on `r Biocpkg("topGO")`, can accept output of `enrichGO` and visualized the enriched GO induced graph.
```{r fig.height=12, fig.width=8}
plotGOgraph(ego)
```

## gseaplot

Running score of gene set enrichment analysis and its association of phenotype can be visualized by `gseaplot`.
```{r fig.cap="plotting gsea result", fig.align="center", fig.height=6, fig.width=8}
gseaplot(kk2, geneSetID = "hsa04145")
```


## pathview from pathview package

`r Biocpkg("clusterProfiler")` users can also use `pathview` from the `r Biocpkg("pathview")`[@luo_pathview] to visualize KEGG pathway.

The following example illustrate how to visualize "hsa04110" pathway, which was enriched in our previous analysis.

```{r eval=FALSE}
library("pathview")
hsa04110 <- pathview(gene.data  = geneList,
                     pathway.id = "hsa04110",
                     species    = "hsa",
                     limit      = list(gene=max(abs(geneList)), cpd=1))
```
![](figures/hsa04110_pathview.png)

For further information, please refer to the vignette of `r Biocpkg("pathview")`[@luo_pathview].


# Biological theme comparison
`r Biocpkg("clusterProfiler")` was developed for biological theme comparison[@yu2012], and it provides a function, `compareCluster`, to automatically calculate enriched functional categories of each gene clusters.

```{r}
data(gcSample)
lapply(gcSample, head)
```

The input for _geneCluster_ parameter should be a named list of gene IDs. To speed up the compilation of this document, we set `use_internal_data = TRUE`.

```{r}
ck <- compareCluster(geneCluster = gcSample, fun = "enrichKEGG")
head(summary(ck))
```


## Formula interface of compareCluster

`compareCluster` also supports passing a formula (the code to support formula has been contributed by Giovanni Dall'Olio) of type $Entrez \sim group$ or $Entrez \sim group + othergroup$. 

```{r}
## formula interface
mydf <- data.frame(Entrez=c('1', '100', '1000', '100101467',
                       '100127206', '100128071'),
                   group = c('A', 'A', 'A', 'B', 'B', 'B'),
                   othergroup = c('good', 'good', 'bad', 'bad',
                       'good', 'bad'))
xx.formula <- compareCluster(Entrez~group, data=mydf, fun='groupGO', OrgDb='org.Hs.eg.db')
head(summary(xx.formula))

## formula interface with more than one grouping variable
xx.formula.twogroups <- compareCluster(Entrez~group+othergroup,
                                       data=mydf, fun='groupGO', OrgDb='org.Hs.eg.db')
head(summary(xx.formula.twogroups))
```

## Visualization of profile comparison

We can visualize the result using `plot` method.
```{r fig.height=8, fig.width=8}
plot(ck)
```

By default, only top 5 (most significant) categories of each cluster
was plotted. User can changes the parameter _showCategory_ to
specify how many categories of each cluster to be plotted, and if
_showCategory_ was set to _NULL_, the whole result will
be plotted.

The _plot_ function accepts a parameter _by_ for setting the scale of dot sizes. The default parameter _by_ is setting to "geneRatio", which corresponding to the "GeneRatio" column of the output. If it was setting to _count_, the comparison will be based on gene counts, while if setting to _rowPercentage_, the dot sizes will be normalized by _count/(sum of each row)_

To provide the full information, we also provide number of identified genes in each category (numbers in parentheses) when _by_ is setting to _rowPercentage_ and number of gene clusters in each cluster label (numbers in parentheses) when _by_ is setting to _geneRatio_, as shown in Figure 3. If the dot sizes were based on _count_, the row numbers will not shown.

The p-values indicate that which categories are more likely to have biological meanings. The dots in the plot are color-coded based on their corresponding p-values. Color gradient ranging from red to blue correspond to in order of increasing p-values. That is, red indicate low p-values (high enrichment), and blue indicate high p-values (low enrichment). P-values and adjusted p-values were filtered out by the threshold giving by
parameter _pvalueCutoff_, and FDR can be estimated by _qvalue_.

User can refer to the example in [@yu2012]; we analyzed the publicly available expression dataset of breast tumour tissues from 200 patients (GSE11121, Gene Expression Omnibus)[@schmidt2008]. We identified 8 gene clusters from differentially expressed genes, and using `compareCluster` to compare these gene clusters by their enriched biological process.


Another example was shown in [@yu2011], we calculated functional similarities among viral miRNAs using method described in [@yu_new_2011], and compared significant KEGG pathways regulated by different viruses using `compareCluster`.

The comparison function was designed as a framework for comparing gene clusters of any kind of ontology associations, not only `groupGO`, `enrichGO`, `enrichKEGG` and `enricher` provided in this package, but also other biological and biomedical ontologies, for instance, `enrichDO` from `r Biocpkg("DOSE")`[@yu_dose_2015] and `enrichPathway` from `r Biocpkg("ReactomePA")` work fine with `compareCluster` for comparing biological themes in disease and reactome pathway perspective. More details can be found in the vignettes of `r Biocpkg("DOSE")`[@yu_dose_2015] and `r Biocpkg("ReactomePA")`.


# External documents

More documents can be found in <http://www.bioconductor.org/packages/DOSE>, <http://www.bioconductor.org/packages/ReactomePA> and <http://guangchuangyu.github.io/tags/clusterprofiler>.


## Bugs/Feature requests ##

If you have any, [let me know](https://github.com/GuangchuangYu/clusterProfiler/issues).


# Session Information

Here is the output of `sessionInfo()` on the system on which this document was compiled:

```{r echo=FALSE}
sessionInfo()
```

# References