---
title: "Tree Annotation"
author: "Guangchuang Yu and Tommy Tsan-Yuk Lam\\
School of Public Health, The University of Hong Kong"
date: "`r Sys.Date()`"
bibliography: ggtree.bib
biblio-style: apalike
output:
  prettydoc::html_pretty:
    toc: true
    theme: cayman
    highlight: github
  pdf_document:
    toc: true
vignette: >
  %\VignetteIndexEntry{04 Tree Annotation}
  %\VignetteEngine{knitr::rmarkdown}
  %\usepackage[utf8]{inputenc}
---

```{r style, echo=FALSE, results="asis", message=FALSE}
knitr::opts_chunk$set(tidy = FALSE,
                      message = FALSE)
```

```{r echo=FALSE, results="hide", message=FALSE}
library("ape")
library("ggplot2")
library("cowplot")
library("treeio")
library("ggtree")

## helpers that render package names as links
CRANpkg <- function (pkg) {
    cran <- "https://CRAN.R-project.org/package"
    fmt <- "[%s](%s=%s)"
    sprintf(fmt, pkg, cran, pkg)
}

Biocpkg <- function (pkg) {
    sprintf("[%s](http://bioconductor.org/packages/%s)", pkg, pkg)
}

inset <- ggtree::inset
```

# Annotate clades

`r Biocpkg("ggtree")` [@yu_ggtree:_2017] implements the _`geom_cladelabel`_ layer to annotate a selected clade with a bar indicating the clade and a corresponding label.

The _`geom_cladelabel`_ layer accepts a selected internal node number. To get the internal node number, please refer to the [Tree Manipulation](treeManipulation.html#internal-node-number) vignette.

```{r}
set.seed(2015-12-21)
tree <- rtree(30)
p <- ggtree(tree) + xlim(NA, 6)

p + geom_cladelabel(node=45, label="test label") +
    geom_cladelabel(node=34, label="another clade")
```

Users can set `align = TRUE` to align the clade labels, and use the parameter `offset` to adjust their position.

```{r}
p + geom_cladelabel(node=45, label="test label", align=TRUE, offset=.5) +
    geom_cladelabel(node=34, label="another clade", align=TRUE, offset=.5)
```

Users can change the color of the clade label via the parameter `color`.

```{r}
p + geom_cladelabel(node=45, label="test label", align=TRUE, color='red') +
    geom_cladelabel(node=34, label="another clade", align=TRUE, color='blue')
```

Users can change the `angle` of the clade label text, and adjust the position of the text relative to the bar via the parameter `offset.text`.

```{r}
p + geom_cladelabel(node=45, label="test label", align=TRUE, angle=270, hjust='center', offset.text=.5) +
    geom_cladelabel(node=34, label="another clade", align=TRUE, angle=45)
```

The sizes of the bar and the text can be changed via the parameters `barsize` and `fontsize`, respectively.

```{r}
p + geom_cladelabel(node=45, label="test label", align=TRUE, angle=270, hjust='center', offset.text=.5, barsize=1.5) +
    geom_cladelabel(node=34, label="another clade", align=TRUE, angle=45, fontsize=8)
```

Users can also draw the label as a text box (`geom_label`) by setting `geom='label'`.

```{r}
p + geom_cladelabel(node=34, label="another clade", align=TRUE, geom='label', fill='lightblue')
```

## Annotate clades for unrooted trees

`r Biocpkg("ggtree")` provides `geom_cladelabel2` for labeling clades of unrooted layout trees.

```{r fig.width=7, fig.height=7, fig.align='center', warning=FALSE, message=FALSE}
pg <- ggtree(tree, layout = "daylight")
pg + geom_cladelabel2(node=45, label="test label", angle=10) +
    geom_cladelabel2(node=34, label="another clade", angle=305)
```

# Labelling associated taxa (monophyletic, polyphyletic or paraphyletic)

`geom_cladelabel` is designed for labelling monophyletic groups (clades), but related taxa do not always form a clade. For such cases, `ggtree` provides `geom_strip` to add a strip/bar indicating the association, with an optional label (see [the issue](https://github.com/GuangchuangYu/ggtree/issues/52)).

```{r fig.width=5, fig.height=5, fig.align="center", warning=FALSE}
nwk <- system.file("extdata", "sample.nwk", package="treeio")
tree <- read.tree(nwk)
ggtree(tree) + geom_tiplab() +
    geom_strip(5, 7, barsize=2, color='red') +
    geom_strip(6, 12, barsize=2, color='blue')
```

# Highlight clades

`ggtree` implements the _`geom_hilight`_ layer, which accepts an internal node number and adds a rectangle layer to highlight the selected clade.

```{r fig.width=5, fig.height=5, fig.align="center", warning=FALSE}
ggtree(tree) + geom_hilight(node=21, fill="steelblue", alpha=.6) +
    geom_hilight(node=17, fill="darkgreen", alpha=.6)
```

```{r fig.width=5, fig.height=5, fig.align="center", warning=FALSE}
ggtree(tree, layout="circular") + geom_hilight(node=21, fill="steelblue", alpha=.6) +
    geom_hilight(node=23, fill="darkgreen", alpha=.6)
```

Another way to highlight selected clades is to draw them with different colors and/or line types, as demonstrated in the [Tree Manipulation](treeManipulation.html#groupclade) vignette.

## Highlight balances

In addition to _`geom_hilight`_, `ggtree` also implements _`geom_balance`_, which is designed to highlight the neighboring subclades of a given internal node.

```{r fig.width=4, fig.height=5, fig.align='center', warning=FALSE}
ggtree(tree) +
    geom_balance(node=16, fill='steelblue', color='white', alpha=0.6, extend=1) +
    geom_balance(node=19, fill='darkgreen', color='white', alpha=0.6, extend=1)
```

## Highlight clades for unrooted trees

`r Biocpkg("ggtree")` provides `geom_hilight_encircle` to highlight clades of unrooted layout trees.

```{r fig.width=5, fig.height=5, fig.align='center', warning=FALSE, message=FALSE}
pg + geom_hilight_encircle(node=45) + geom_hilight_encircle(node=34, fill='darkgreen')
```

# Taxa connection

Some evolutionary events (e.g. reassortment, horizontal gene transfer) cannot be modeled by a simple tree.
`ggtree` provides the `geom_taxalink` layer, which draws straight or curved lines between any two nodes in the tree, allowing such evolutionary events to be represented by connecting taxa.

```{r fig.width=5, fig.height=5, fig.align="center", warning=FALSE}
ggtree(tree) + geom_tiplab() + geom_taxalink('A', 'E') +
    geom_taxalink('F', 'K', color='red',
                  arrow=grid::arrow(length=grid::unit(0.02, "npc")))
```

# Tree annotation with output from evolution software

The `r Biocpkg("treeio")` package implements several parser functions for output from software commonly used in evolutionary biology.

Here, we use [BEAST](http://beast2.org/) [@bouckaert_beast_2014] output as an example. For details, please refer to the [Importer](https://bioconductor.org/packages/devel/bioc/vignettes/treeio/inst/doc/Importer.html) vignette.

```{r warning=FALSE, fig.width=5, fig.height=5, fig.align='center'}
file <- system.file("extdata/BEAST", "beast_mcc.tree", package="treeio")
beast <- read.beast(file)
ggtree(beast, aes(color=rate)) +
    geom_range(range='length_0.95_HPD', color='red', alpha=.6, size=2) +
    geom_nodelab(aes(x=branch, label=round(posterior, 2)), vjust=-.5, size=3) +
    scale_color_continuous(low="darkgreen", high="red") +
    theme(legend.position=c(.1, .8))
```

# Tree annotation with user-specified annotation

Integrating user data to annotate a phylogenetic tree can be done at different levels. The `r Biocpkg("treeio")` package implements `full_join` methods to [combine tree data with a phylogenetic tree object](https://bioconductor.org/packages/devel/bioc/vignettes/treeio/inst/doc/Importer.html). The `r CRANpkg("tidytree")` package supports [linking tree data to a phylogeny using tidyverse verbs](https://cran.r-project.org/web/packages/tidytree/vignettes/tidytree.html). `r Biocpkg("ggtree")` supports mapping external data to a phylogeny for visualization and annotation on the fly.
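
As a minimal sketch of the tidytree route (assuming the `tidytree` and `dplyr` packages are installed; the small tree, the `trait` column and its values are hypothetical, chosen only for illustration), a tree can be converted to a tidy table with one row per node, joined with user data on the `label` column, and converted back to a `treedata` object:

```{r}
library(ape)
library(dplyr)
library(tidytree)

## a small fixed tree with tips t1-t4 (hypothetical example)
tr_small <- read.tree(text = "((t1:1,t2:1):1,(t3:1,t4:1):1);")

## tidy representation: one row per node (parent, node, branch.length, label)
tr_tbl <- as_tibble(tr_small)

## hypothetical per-tip trait values, keyed by tip label
trait_df <- tibble(label = paste0("t", 1:4),
                   trait = c(1.2, 0.4, 3.1, 2.2))

## join on label; internal nodes have no matching label, so their trait is NA
joined <- full_join(tr_tbl, trait_df, by = "label")

## convert back to a treedata object
td <- as.treedata(joined)
td
```

Joining on `label` keeps the example close to how tip data are usually keyed; joining on `node` works the same way when the data are indexed by node number. The resulting `td` can then be passed to `ggtree()`, with `trait` available for aesthetic mapping.
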

## The `%<+%` operator

Suppose we have the following data associated with the tree and would like to attach them to the tree.

```{r}
nwk <- system.file("extdata", "sample.nwk", package="treeio")
tree <- read.tree(nwk)
p <- ggtree(tree)

dd <- data.frame(taxa  = LETTERS[1:13],
                 place = c(rep("GZ", 5), rep("HK", 3), rep("CZ", 4), NA),
                 value = round(abs(rnorm(13, mean=70, sd=10)), digits=1))
## you don't need to order the data
## data was reshuffled just for demonstration
dd <- dd[sample(1:13, 13), ]
row.names(dd) <- NULL
```

```{r eval=FALSE}
print(dd)
```

```{r echo=FALSE, results='asis'}
knitr::kable(dd)
```

We can imagine that the _place_ column stores the location where each species was isolated, and the _value_ column stores numerical values (*e.g.* bootstrap values).

We have demonstrated using the `%<%` operator to update a tree view with a new tree. Here, we introduce another operator, `%<+%`, which attaches annotation data to a tree view. The only requirement for the input data is that its first column matches the node/tip labels of the tree.

After attaching the annotation data to the tree with `%<+%`, all the columns in the data are visible to `r Biocpkg("ggtree")`. As an example, we attach the above annotation data to the tree view, _p_, and add a layer that shows the tip labels colored by the isolation site stored in the _place_ column.

```{r fig.width=6, fig.height=5, warning=FALSE, fig.align="center"}
p <- p %<+% dd + geom_tiplab(aes(color=place)) +
    geom_tippoint(aes(size=value, shape=place, color=place), alpha=0.25)
p + theme(legend.position="right")
```

Once the data are attached, they stay attached, so we can easily add other layers to display this information.
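
One way to see that the attached columns persist is to inspect the `data` element of a plot built with `%<+%`, which holds one row per node. The sketch below is self-contained and uses hypothetical object names (`sample_tree`, `anno`, `p_anno`) so it does not overwrite `p` or `dd`; it assumes the join is keyed on the tree's `label` column, as the first-column requirement above implies.

```{r}
library(treeio)
library(ggtree)

nwk <- system.file("extdata", "sample.nwk", package = "treeio")
sample_tree <- read.tree(nwk)

## hypothetical annotation table; its first column matches the tip labels A-M
anno <- data.frame(taxa  = LETTERS[1:13],
                   place = c(rep("GZ", 5), rep("HK", 3), rep("CZ", 4), NA),
                   stringsAsFactors = FALSE)

p_anno <- ggtree(sample_tree) %<+% anno

## the attached column now sits in the plot data, so any later layer
## can map it through aes()
head(p_anno$data[, c("node", "label", "isTip", "place")])
```

Internal nodes carry no label in this tree, so their `place` value is `NA`.
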

```{r fig.width=6, fig.height=5, warning=FALSE, fig.align="center"}
p + geom_text(aes(color=place, label=place), hjust=1, vjust=-0.4, size=3) +
    geom_text(aes(color=place, label=value), hjust=1, vjust=1.4, size=3)
```

# Visualize tree with associated matrix

The `gheatmap` function is designed to visualize a phylogenetic tree together with a heatmap of an associated matrix.

In the following example, we visualize a tree of H3 influenza viruses with their associated genotypes.

```{r fig.width=8, fig.height=6, fig.align="center", warning=FALSE, message=FALSE}
beast_file <- system.file("examples/MCC_FluA_H3.tree", package="ggtree")
beast_tree <- read.beast(beast_file)

genotype_file <- system.file("examples/Genotype.txt", package="ggtree")
genotype <- read.table(genotype_file, sep="\t", stringsAsFactors=FALSE)
## drop the trailing dot that read.table adds to some column names
colnames(genotype) <- sub("\\.$", "", colnames(genotype))
p <- ggtree(beast_tree, mrsd="2013-01-01") + geom_treescale(x=2008, y=1, offset=2)
p <- p + geom_tiplab(size=2)
gheatmap(p, genotype, offset=5, width=0.5, font.size=3, colnames_angle=-45, hjust=0) +
    scale_fill_manual(breaks=c("HuH3N2", "pdm", "trig"),
                      values=c("steelblue", "firebrick", "darkgreen"))
```

The _width_ parameter controls the width of the heatmap. Another parameter, _offset_, controls the distance between the tree and the heatmap, for instance to leave space for tip labels.

For a time-scaled tree, as in this example, it is common to display the x axis with `theme_tree2`. However, the heatmap is just another layer and will change the `x` axis. To overcome this issue, we implemented `scale_x_ggtree` to set a more sensible x axis.

```{r fig.width=8, fig.height=6, fig.align="center", warning=FALSE}
p <- ggtree(beast_tree, mrsd="2013-01-01") +
    geom_tiplab(size=2, align=TRUE, linesize=.5) +
    theme_tree2()
pp <- (p + scale_y_continuous(expand=c(0, 0.3))) %>%
    gheatmap(genotype, offset=8, width=0.6, colnames=FALSE) %>%
    scale_x_ggtree()
pp + theme(legend.position="right")
```

# Visualize tree with multiple sequence alignment

With the `msaplot` function, users can visualize a multiple sequence alignment alongside a phylogenetic tree, as demonstrated below:

```{r fig.width=8, fig.height=6, fig.align='center', warning=FALSE}
fasta <- system.file("examples/FluA_H3_AA.fas", package="ggtree")
msaplot(ggtree(beast_tree), fasta)
```

A specific slice of the alignment can also be displayed by specifying the _window_ parameter.

```{r fig.width=7, fig.height=7, fig.align='center', warning=FALSE}
msaplot(ggtree(beast_tree), fasta, window=c(150, 200)) + coord_polar(theta='y')
```

# Plot tree with associated data

To associate a phylogenetic tree with different types of plots produced from the user's data, `ggtree` provides the `facet_plot` function, which accepts an input `data.frame` and a `geom` function to draw the input data. The data will be displayed in an additional panel of the plot.

```{r warning=FALSE, fig.width=10, fig.height=6}
tr <- rtree(30)

## first column (id) matches the tip labels
d1 <- data.frame(id=tr$tip.label, val=rnorm(30, sd=3))
p <- ggtree(tr)

p2 <- facet_plot(p, panel="dot", data=d1, geom=geom_point, aes(x=val), color='firebrick')
d2 <- data.frame(id=tr$tip.label, value=abs(rnorm(30, mean=100, sd=50)))

facet_plot(p2, panel='bar', data=d2, geom=geom_segment,
           aes(x=0, xend=value, y=y, yend=y), size=3, color='steelblue') + theme_tree2()
```

# Plot tree with images and subplots

Please refer to the following vignettes:

+ [Annotating phylogenetic tree with images](https://guangchuangyu.github.io/software/ggtree/vignettes/ggtree-ggimage.html)
+ [Annotate a phylogenetic tree with insets](https://guangchuangyu.github.io/software/ggtree/vignettes/ggtree-inset.html)

# References