--- title: "Exporting trees with data" author: "Guangchuang Yu\\ School of Public Health, The University of Hong Kong" date: "`r Sys.Date()`" bibliography: treeio.bib biblio-style: apalike output: prettydoc::html_pretty: toc: true theme: cayman highlight: github pdf_document: toc: true vignette: > %\VignetteIndexEntry{02 Exporting trees with data} %\VignetteEngine{knitr::rmarkdown} %\usepackage[utf8]{inputenc} --- ```{r style, echo=FALSE, results="asis", message=FALSE} knitr::opts_chunk$set(tidy = FALSE, message = FALSE) ``` ```{r echo=FALSE, results="hide", message=FALSE} library('jsonlite') library("treeio") ``` # Introduction The [treeio](https://bioconductor.org/packages/treeio/) package supports parsing various phylogenetic tree file formats including software outputs that contain evolutionary evidences. Some of the formats are just log file (*e.g.* [PAML](http://abacus.gene.ucl.ac.uk/software/paml.html) and [r8s](http://ginger.ucdavis.edu/r8s) outputs), while some of the others are non-standard formats (*e.g.* [BEAST](http://beast2.org/) and [MrBayes](http://nbisweden.github.io/MrBayes/) outputs that introduce square bracket, which was reserved to store comment in standard Nexus format, to store inferences). With [treeio](https://bioconductor.org/packages/treeio/), we are now able to parse these files to extract phylogenetic tree and map associated data on the tree structure. Exporting tree structure is easy, users can use `as.phyo` method defined [treeio](https://bioconductor.org/packages/treeio/) to convert `treedata` object to `phylo` object then using `write.tree` or `write.nexus` implemented in [ape](https://cran.r-project.org/web/packages/ape/index.html) package [@paradis_ape_2004] to export the tree structure as Newick text or Nexus file. This is quite useful for converting non-standard formats to standard format and for extracting tree from software outputs, such as log file. However, exporting tree with associated data is still challenging. These associated data can be parsed from analysis programs or obtained from external sources (*e.g.* phenotypic data, experimental data and clinical data). The major obstacle here is that there is no standard format that designed for storing tree with data. [NeXML](http://www.nexml.org/) [@vos_nexml:_2012] maybe the most flexible format, however it is currently not widely supported. Most of the analysis programs in this field rely extensively on Newick string and Nexus format. In my opinion, although BEAST Nexus format^[http://beast.community/nexus_metacomments] may not be the best solution, it is currently a good approach for storing heterogeneous associated data. The beauty of the format is that all the annotate elements are stored within square bracket, which is reserved for comments. So that the file can be parsed as standard Nexus by ignoring annotate elements and existing programs should be able to read them. # Exporting tree data to BEAST Nexus format ## Exporting/converting software output The [treeio](https://bioconductor.org/packages/treeio/) package provides `write.beast` to export `treedata` object as BEAST Nexus file [@bouckaert_beast_2014]. With [treeio](https://bioconductor.org/packages/treeio/), it is easy to convert software output to BEAST format if the output can be parsed by [treeio](https://bioconductor.org/packages/treeio/). For example, we can convert NHX file to BEAST file and use NHX tags to color the tree using FigTree^[http://beast.community/figtree] or convert CODEML output and use *d~N~/d~S~*, *d~N~* or *d~S~* to color the tree in FigTree. ```{r comment=NA} nhxfile <- system.file("extdata/NHX", "phyldog.nhx", package="treeio") nhx <- read.nhx(nhxfile) # write.beast(nhx, file = "phyldog.tree") write.beast(nhx) ``` ![](figures/phyldog.png) ```{r comment=NA} mlcfile <- system.file("extdata/PAML_Codeml", "mlc", package="treeio") ml <- read.codeml_mlc(mlcfile) # write.beast(ml, file = "codeml.tree") write.beast(ml) ``` ![](figures/codeml.png) ## Combining tree with external data Using the utilities provided by [treeio](https://bioconductor.org/packages/treeio/), it is easy to link external data onto the corresponding phylogeny. The `write.beast` function enable users to combine the tree with external data to a single tree file. ```{r comment=NA} phylo <- as.phylo(nhx) ## print the newick text write.tree(phylo) N <- Nnode2(phylo) fake_data <- data_frame(node = 1:N, fake_trait = rnorm(N), another_trait = runif(N)) fake_tree <- treedata(phylo = phylo, data = fake_data) write.beast(fake_tree) ``` ## Merging tree data from different sources Not only Newick tree text can be combined with associated data, but also tree data obtained from software output can be combined with external data, as well as different tree objects can be merged together. For details, please refer to the [Importer](Importer.html) vignette. ```{r} ## combine tree object with data tree_with_data <- full_join(nhx, fake_data, by = "node") tree_with_data ## merge two tree object tree2 <- merge_tree(nhx, fake_tree) tree2 identical(tree_with_data, tree2) ``` After merging data from different sources, the tree with the associated data can be exported into a single file. ```{r comment=NA} write.beast(tree2) ``` The output BEAST Nexus file can be imported into R using the `read.beast` function and all the associated data can be used to annotate the tree using [ggtree](https://bioconductor.org/packages/ggtree/) [@yu_ggtree:_2017]. ```{r} outfile <- tempfile(fileext = ".tree") write.beast(tree2, file = outfile) read.beast(outfile) ``` # Exporting tree data to *jtree* format {#jtree} The [treeio](https://bioconductor.org/packages/treeio/) package provides `write.beast` to export `treedata` to BEAST Nexus file. This is quite useful to convert file format, combine tree with data and merging tree data from different sources as we demonstrated in [Exporting tree data to BEAST Nexus format](#exporting-tree-data-to-beast-nexus-format) session. The [treeio](https://bioconductor.org/packages/treeio/) package also supplies `read.beast` function to parse output file of `write.beast`. Although with [treeio](https://bioconductor.org/packages/treeio/), the R community has the ability to manipulate BEAST Nexus format and process tree data, there is still lacking library/package for parsing BEAST file in other programming language. JSON (JavaScript Object Notation)^[https://www.json.org/] is a lightweight data-interchange format and widely supported in almost all modern programming languages. To make it easy to import tree with data in other programming languages, [treeio](https://bioconductor.org/packages/treeio/) supports exporting tree with data in [jtree format](Importer.html#jtree), which is JSON-based and can be easy to parse using any languages that supports JSON. ```{r comment=NA} write.jtree(tree2) ``` The *jtree* format is based on JSON and can be parsed using JSON parser. ```{r comment=NA} jtree_file <- tempfile(fileext = '.jtree') write.jtree(tree2, file = jtree_file) jsonlite::fromJSON(jtree_file) ``` The *jtree* file can be directly imported as a `treedata` object using `read.jtree` provided also in [treeio](https://bioconductor.org/packages/treeio/) package. ```{r} read.jtree(jtree_file) ``` # References