--- title: "Import ontology files" author: "Zuguang Gu ( z.gu@dkfz.de )" date: '`r Sys.Date()`' output: html_vignette: css: main.css toc: true vignette: > %\VignetteIndexEntry{03. Import ontology files} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, echo = FALSE, message = FALSE} library(knitr) knitr::opts_chunk$set( error = FALSE, tidy = FALSE, message = FALSE, warning = FALSE, fig.align = "center") ``` ## The .obo format There are several formats for ontology data. The most compact and readable format is the `.obo` format, which was initially developed by the GO consortium. A lot of ontologies in `.obo` format can be found from the [OBO Foundry](http://obofoundry.org/) or [BioPortal](https://bioportal.bioontology.org/). A description of the `.obo` format can be found from https://owlcollab.github.io/oboformat/doc/GO.format.obo-1_4.html. In the **simona** package, the function `import_obo()` can be used to import an `.obo` file to an `ontology_DAG` object. The input is a path on local computer or an URL. In the following example, we use the [Plant Ontology](http://obofoundry.org/ontology/po.html) as an example. The link of `po.obo` can be found from that web package. You can download it or directly provide it as an URL. ```{r, warning = FALSE} library(simona) dag1 = import_obo("https://raw.githubusercontent.com/Planteome/plant-ontology/master/po.obo") dag1 ``` There are also several meta columns attached to the object, such as the name and the long definition of terms in the ontology. ```{r} head(mcols(dag1)) ``` Note rows in `mcols(dag1)` corresponds to terms in `dag_all_terms(dag)`. The `is_a` relation between classes is of course saved in the DAG object (specified in the `is_a` tag in the `.obo` file). Additional relation types can also be selected (specified in the `relationship` tag). By default only the relation type `part_of` is used. You can check other values associated with the `relationship` tag and the `[Typedef]` section in the `.obo` file to select proper additional relation types. Just make sure that the selected relation types are transitive and not inversed (e.g. you cannot select `has_part` which is a reversed relation of `part_of`). Relations can also have a DAG structure. In `import_obo()`, if a parent relation type is selected, all its offspring types are automatically selected. For example, in GO, besides relations of `is_a` and `part_of`, there are also `regulates`, `positively_regulates` and `negatively_regulates`, where the latter two are child relations of `regulates`. So if `regulates` is selected as an additional relation type, the other two are automatically selected. The DAG of relation types is automatically recognized and saved from the ontology files. ``` import_obo("file_for_go.obo", relation_type = c("part_of", "regulates")) ``` Finally, all the spaces specified in `relation_type` will be converted to underlines. So it is the same if you specify `"part of"` or `"part_of"`. ## Other ontology formats For ontologies in other formats, **simona** uses an external tool [**ROBOT**](http://robot.obolibrary.org/) to convert them to `.obo` format and later internally uses `import_obo()` to import them. **ROBOT** is already doing a great and professional job of converting between different ontology formats. The file `robot.jar` is needed and it can be downloaded from https://github.com/ontodev/robot/releases (Since this is a tool in Java, you should have Java already available on your machine). The file `po.owl` can also be found from the [Plant Ontology](http://obofoundry.org/ontology/po.html) web page. ```{r, eval = Sys.info()["user"] == "guz"} dag2 = import_ontology("https://raw.githubusercontent.com/Planteome/plant-ontology/master/po.owl", robot_jar = "~/Downloads/robot.jar") ``` ```{r, eval = FALSE} dag2 ``` ```{r, echo = FALSE} if(Sys.info()["user"] == "guz") { print(dag2) } else { cat( "An ontology_DAG object: Source: po, releases/2021-08-13 1654 terms / 2510 relations Root: _all_ Terms: PO:0000001, PO:0000002, PO:0000003, PO:0000004, ... Max depth: 13 Aspect ratio: 24.85:1 (based on the longest distance to root) 39.6:1 (based on the shortest distance to root) Relations: is_a, part_of With the following columns in the metadata data frame: id, short_id, name, namespace, definition ") } ``` More conveniently, the path of `robot.jar` can be set as a global option: ``` simona_opt$robot_jar = "~/Downloads/robot.jar" import_ontology("https://raw.githubusercontent.com/Planteome/plant-ontology/master/po.owl") ``` **ROBOT** supports the following ontology formats and they are automatically identified according to the file contents. - `json`: OBO Graphs JSON - `obo`: OBO Format - `ofn`: OWL Functional - `omn`: Manchester - `owl`: RDF/XML - `owx`: OWL/XML - `ttl`: Turtle ## The .owl format For some huge ontologies, **ROBOT** requires a huge amount of memory to convert to the `.obo` format. If the ontology is in the `.owl` format (in the RDF/XML seriation format), the function `import_owl()` can be optionally used. `import_owl()` directly parses the `.owl` file and returns an `ontology_DAG` object. The `import_owl()` is written from scratch and it is recommended to use only when `import_ontology()` does not work. ```{r} dag3 = import_owl("https://raw.githubusercontent.com/Planteome/plant-ontology/master/po.owl") dag3 ``` ## The .ttl format Similarly, some ontologies may only provide large `.ttl` format files ([the Turtle format](https://www.w3.org/TR/turtle/)). **simona** also provides a function `import_ttl()` which can recognize `.ttl` file with `owl:Class` as objects. The internal parsing script is written in Perl, so you need to make sure Perl is installed on your machine. ```{r, eval = FALSE} # https://bioportal.bioontology.org/ontologies/MSTDE dag4 = import_ttl("https://jokergoo.github.io/simona/MSTDE.ttl") dag4 ``` ## Session info ```{r} sessionInfo() ```