---
title: "Evaluate impact of Semantic Similarity choice"
author:
- name: Aurelien Brionne
  affiliation: "Institut national de recherche pour l'agriculture, l'alimentation et l'environnement (INRAE)"
- name: Amelie Juanchich
  affiliation: "Institut national de recherche pour l'agriculture, l'alimentation et l'environnement (INRAE)"
- name: Christelle Hennequet-Antier
  affiliation: "Institut national de recherche pour l'agriculture, l'alimentation et l'environnement (INRAE)"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
    BiocStyle::html_document:
        highlight: tango
vignette: >
    %\VignetteIndexEntry{4: SS_choice}
    %\VignetteEngine{knitr::rmarkdown}
    %\VignetteEncoding{UTF-8}
bibliography: "`r system.file('extdata','bibliography.bib',package='ViSEAGO')`"
csl: "`r system.file('extdata','bmc-genomics.csl',package='ViSEAGO')`"
---

```{r setup,include=FALSE}
# load
library(ViSEAGO)

# knitr document options
knitr::opts_chunk$set(
    eval=FALSE,fig.path='./data/output/',echo=TRUE,fig.pos = 'H',
    fig.width=8,message=FALSE,comment=NA,warning=FALSE
)
```

# Introduction{-}

In the overview (see `utils::vignette("overview", package ="ViSEAGO")`), we explained how to use the `r BiocStyle::Biocpkg("ViSEAGO")` package. In this vignette, we explain how to explore the effect of the GO semantic similarity algorithms on the tree structure and on the tree clustering, using the mouse_bioconductor vignette dataset (see `utils::vignette("2_mouse_bioconductor", package ="ViSEAGO")`).

# Data{-}

To keep the vignette build time and size low, the data used here were pre-calculated (they are provided with the package), and the illustrations are not interactive.

```{r vignette_data_used}
# load vignette data
data(
    myGOs,
    package="ViSEAGO"
)
```

# Clusters-heatmap of GO terms

The gene GO annotations and the enriched GO terms are combined using `ViSEAGO::build_GO_SS`. The Semantic Similarity (SS) between enriched GO terms is calculated using the `ViSEAGO::compute_SS_distances` method.
We compute the distances with all five algorithms (*Resnik*, *Lin*, *Rel*, *Jiang*, and *Wang*) implemented in the `r BiocStyle::Biocpkg("GOSemSim")` package @pmid20179076. The resulting `myGOs` object contains all the information about the enriched GO terms and the SS distances between them.

Then, hierarchical clustering is performed with `ViSEAGO::GOterms_heatmap` for each SS distance between the enriched GO terms, using the `ward.D2` aggregation criterion. Clusters of enriched GO terms are obtained by cutting branches off the dendrogram. Here, we choose a dynamic branch cutting method based on the shape of clusters using `r BiocStyle::CRANpkg("dynamicTreeCut")` [@pmid18024473; @dynamicTreeCut].

```{r SS_build,eval=FALSE}
# compute Semantic Similarity (SS)
myGOs<-ViSEAGO::compute_SS_distances(
    myGOs,
    distance=c("Resnik","Lin","Rel","Jiang","Wang")
)
```

1. Resnik distance

```{r SS_terms_Resnik-wardD2}
# GO terms heatmap
Resnik_clusters_wardD2<-ViSEAGO::GOterms_heatmap(
    myGOs,
    showIC=TRUE,
    showGOlabels=TRUE,
    GO.tree=list(
        tree=list(
            distance="Resnik",
            aggreg.method="ward.D2"
        ),
        cut=list(
            dynamic=list(
                deepSplit=2,
                minClusterSize =2
            )
        )
    ),
    samples.tree=NULL
)
```

2. Lin distance

```{r SS_Lin-wardD2}
# GO terms heatmap
Lin_clusters_wardD2<-ViSEAGO::GOterms_heatmap(
    myGOs,
    showIC=TRUE,
    showGOlabels=TRUE,
    GO.tree=list(
        tree=list(
            distance="Lin",
            aggreg.method="ward.D2"
        ),
        cut=list(
            dynamic=list(
                deepSplit=2,
                minClusterSize =2
            )
        )
    ),
    samples.tree=NULL
)
```

3. Rel distance

```{r SS_Rel-wardD2}
# GO terms heatmap
Rel_clusters_wardD2<-ViSEAGO::GOterms_heatmap(
    myGOs,
    showIC=TRUE,
    showGOlabels=TRUE,
    GO.tree=list(
        tree=list(
            distance="Rel",
            aggreg.method="ward.D2"
        ),
        cut=list(
            dynamic=list(
                deepSplit=2,
                minClusterSize =2
            )
        )
    ),
    samples.tree=NULL
)
```

4. Jiang distance

```{r SS_Jiang-wardD2}
# GO terms heatmap
Jiang_clusters_wardD2<-ViSEAGO::GOterms_heatmap(
    myGOs,
    showIC=TRUE,
    showGOlabels=TRUE,
    GO.tree=list(
        tree=list(
            distance="Jiang",
            aggreg.method="ward.D2"
        ),
        cut=list(
            dynamic=list(
                deepSplit=2,
                minClusterSize =2
            )
        )
    ),
    samples.tree=NULL
)
```

5. Wang distance

```{r SS_Wang-wardD2}
# GO terms heatmap
Wang_clusters_wardD2<-ViSEAGO::GOterms_heatmap(
    myGOs,
    showIC=TRUE,
    showGOlabels=TRUE,
    GO.tree=list(
        tree=list(
            distance="Wang",
            aggreg.method="ward.D2"
        ),
        cut=list(
            dynamic=list(
                deepSplit=2,
                minClusterSize =2
            )
        )
    ),
    samples.tree=NULL
)
```

# Trees comparison

## Global trees comparisons

The `r BiocStyle::CRANpkg("dendextend")` package @dendextend offers a set of functions for extending dendrogram objects in R, letting you visualize and compare trees of hierarchical clusterings (see `utils::vignette("introduction", package ="dendextend")`).

Here, we use the `dendextend::dendlist` and `dendextend::cor.dendlist` functions to calculate a correlation matrix between trees, based on measures such as Baker's Gamma and the cophenetic correlation, as described in `r BiocStyle::CRANpkg("dendextend")`.

The correlation matrix can be visualized with the `corrplot::corrplot` function from the `r BiocStyle::CRANpkg("corrplot")` package @corrplot.
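
To make the idea of a correlation between trees more concrete, here is a minimal sketch of a cophenetic correlation computed with base R only. The toy matrix, the two aggregation methods, and all variable names are assumptions made for illustration; this is not the `r BiocStyle::CRANpkg("dendextend")` implementation, which also takes care of matching labels between trees.

```r
# toy data: 10 hypothetical "terms" described by 4 random features
set.seed(1)
x <- matrix(
    rnorm(40),
    nrow = 10,
    dimnames = list(paste0("term", 1:10), NULL)
)
d <- stats::dist(x)

# two trees built from the same distances with different criteria
tree_ward <- stats::hclust(d, method = "ward.D2")
tree_avg <- stats::hclust(d, method = "average")

# cophenetic distance: height at which two items are first merged
coph_ward <- stats::cophenetic(tree_ward)
coph_avg <- stats::cophenetic(tree_avg)

# cophenetic correlation: Pearson correlation of the two
# cophenetic distance vectors (same item order in both trees here)
coph_cor <- stats::cor(as.vector(coph_ward), as.vector(coph_avg))
coph_cor
```

A value close to 1 means that the two trees merge the same items at similar relative heights; the same reading applies to the correlation matrix computed on the ViSEAGO dendrograms.
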
```{r parameters_dend_correlation}
# build the list of trees
dend<- dendextend::dendlist(
    "Resnik"=slot(Resnik_clusters_wardD2,"dendrograms")$GO,
    "Lin"=slot(Lin_clusters_wardD2,"dendrograms")$GO,
    "Rel"=slot(Rel_clusters_wardD2,"dendrograms")$GO,
    "Jiang"=slot(Jiang_clusters_wardD2,"dendrograms")$GO,
    "Wang"=slot(Wang_clusters_wardD2,"dendrograms")$GO
)

# build the correlation matrix between trees
dend_cor<-dendextend::cor.dendlist(dend)
```

```{r parameters_dend_correlation_print}
# corrplot
corrplot::corrplot(
    dend_cor,
    "pie",
    "lower",
    is.corr=FALSE,
    cl.lim=c(0,1)
)
```

As expected, the GO semantic similarity algorithms based on the Information Content (IC-based), namely the *Resnik*, *Lin*, *Rel*, and *Jiang* methods, produce trees that are more similar to each other than to the tree from the *Wang* method, which is based on the topology of the GO graph structure (Graph-based).

## Paired trees comparison

We can also compare the dendrograms built with, for example, the *Resnik* and the *Wang* algorithms using the `dendextend::dendlist`, `dendextend::untangle`, and `dendextend::tanglegram` functions.

The quality of the alignment of the two trees can be calculated with `dendextend::entanglement` (from 0: good, to 1: bad).

```{r parameters_dend_comparison,fig.cap="dendrograms comparison"}
# dendrogram list
dl<-dendextend::dendlist(
    slot(Resnik_clusters_wardD2,"dendrograms")$GO,
    slot(Wang_clusters_wardD2,"dendrograms")$GO
)

# untangle the trees (efficient but very time consuming)
tangle<-dendextend::untangle(
    dl,
    "step2side"
)

# display the entanglement
dendextend::entanglement(tangle) # 0.08362968

# display the tanglegram
dendextend::tanglegram(
    tangle,
    margin_inner=5,
    edge.lwd=1,
    lwd = 1,
    lab.cex=0.8,
    columns_width = c(5,2,5),
    common_subtrees_color_lines=FALSE
)
```

# Clusters comparison

Another possibility is to compare the clusters obtained from the dendrograms.
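
One common way to score the agreement between two partitions of the same items is the adjusted Rand index. The sketch below computes it in base R on two toy trees. The toy data, the fixed cut at `k = 3` (rather than a dynamic branch cut), and the helper name `adjusted_rand` are assumptions made for illustration; this is not the `r BiocStyle::Biocpkg("ViSEAGO")` implementation.

```r
# hypothetical helper: adjusted Rand index between two label vectors
adjusted_rand <- function(a, b) {
    # contingency table of the two partitions
    tab <- table(a, b)
    # number of item pairs that can be drawn from n items
    choose2 <- function(n) n * (n - 1) / 2
    # pairs grouped together in both partitions
    sum_cells <- sum(choose2(tab))
    # pairs grouped together in each partition taken separately
    sum_rows <- sum(choose2(rowSums(tab)))
    sum_cols <- sum(choose2(colSums(tab)))
    # agreement expected by chance, and maximum possible agreement
    expected <- sum_rows * sum_cols / choose2(sum(tab))
    max_index <- (sum_rows + sum_cols) / 2
    # undefined (NaN) when both partitions are trivial
    (sum_cells - expected) / (max_index - expected)
}

# toy data: 10 hypothetical "terms" described by 4 random features
set.seed(1)
x <- matrix(
    rnorm(40),
    nrow = 10,
    dimnames = list(paste0("term", 1:10), NULL)
)
d <- stats::dist(x)

# cut two trees into 3 clusters each
part_ward <- stats::cutree(stats::hclust(d, method = "ward.D2"), k = 3)
part_avg <- stats::cutree(stats::hclust(d, method = "average"), k = 3)

# agreement between the two partitions
ari <- adjusted_rand(part_ward, part_avg)
ari
```

The index equals 1 when the two partitions are identical up to a relabelling of the clusters, and is close to 0 when they agree no more than expected by chance.
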
## Multiple clusters comparison

We can also explore how the GO terms are assigned to clusters depending on the parameters used, with `ViSEAGO::clusters_cor`, and plot the results with `corrplot::corrplot` from the `r BiocStyle::CRANpkg("corrplot")` package.

```{r parameters_clusters_correlation}
# clusters to compare
clusters=list(
    Resnik="Resnik_clusters_wardD2",
    Lin="Lin_clusters_wardD2",
    Rel="Rel_clusters_wardD2",
    Jiang="Jiang_clusters_wardD2",
    Wang="Wang_clusters_wardD2"
)

# global dendrogram partition correlation
clust_cor<-ViSEAGO::clusters_cor(
    clusters,
    method="adjusted.rand"
)
```

```{r parameters_clusters_correlation_print}
# global dendrogram partition correlation
corrplot::corrplot(
    clust_cor,
    "pie",
    "lower",
    is.corr=FALSE,
    cl.lim=c(0,1)
)
```

As expected, and as in the global trees comparison, the GO semantic similarity algorithms based on the Information Content (IC-based), namely the *Resnik*, *Lin*, *Rel*, and *Jiang* methods, produce partitions that are more similar to each other than to the partition from the *Wang* method, which is based on the topology of the GO graph structure (Graph-based).

## Paired clusters comparison

We can also explore *in detail* how the GO terms are assigned to clusters depending on the parameters used, with `ViSEAGO::compare_clusters`.

```{r parameters_clusters_comparison,fig.height=8}
# clusters content comparisons
ViSEAGO::compare_clusters(clusters)
```

NB: For this vignette, this illustration is not interactive.

# Conclusion

The `r BiocStyle::Biocpkg("ViSEAGO")` package provides convenient methods to explore the effect of the GO semantic similarity algorithms on the tree structure and on the tree clustering, both of which play a key role in ensuring functional coherence.

# References{-}