---
title: "Import ontology files"
author: "Zuguang Gu ( z.gu@dkfz.de )"
date: '`r Sys.Date()`'
css: main.css
toc: true
vignette: >
  %\VignetteIndexEntry{03. Import ontology files}
---
```{r, echo = FALSE, message = FALSE}
knitr::opts_chunk$set(
    error = FALSE,
    tidy = FALSE,
    message = FALSE,
    warning = FALSE,
    fig.align = "center")
```
## The .obo format
There are several formats for ontology data. The most compact and readable
format is the `.obo` format, which was initially developed by the GO Consortium. Many
ontologies in `.obo` format are available from the [OBO
Foundry](http://obofoundry.org/) or
[BioPortal](https://bioportal.bioontology.org/). A description of the `.obo`
format can be found from
In the **simona** package, the function `import_obo()` imports
an `.obo` file into an `ontology_DAG` object. The input is a path on the local
computer or a URL. The following example uses the [Plant
Ontology](http://obofoundry.org/ontology/po.html).
The link to `po.obo` can be found on that web page. You can download the file or
provide the URL directly.
```{r, warning = FALSE}
library(simona)
dag1 = import_obo("https://raw.githubusercontent.com/Planteome/plant-ontology/master/po.obo")
```
There are also several metadata columns attached to the object, such as
the name and the long definition of each term in the ontology.
Note that the rows in `mcols(dag1)` correspond to the terms in `dag_all_terms(dag1)`.
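
As a brief sketch, the metadata can be inspected with `mcols()` and compared against `dag_all_terms()`. It assumes that **simona** is loaded and that `dag1` was imported as above; the variable names `meta` and `terms` are only illustrative, and the available columns depend on the ontology file.

```r
# Metadata data frame attached to the DAG, one row per term
meta = mcols(dag1)
head(meta)

# All terms in the DAG; the metadata rows correspond to these terms
terms = dag_all_terms(dag1)
nrow(meta) == length(terms)
```
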
The `is_a` relation between classes is of course saved in the DAG object
(it is specified by the `is_a` tag in the `.obo` file). Additional relation types
can also be selected (they are specified by the `relationship` tag). By default only
the relation type `part_of` is used. You can check the other values associated
with the `relationship` tag and the `[Typedef]` sections in the `.obo` file to
select suitable additional relation types. Just make sure that the selected
relation types are transitive and are not inverse relations (e.g. you cannot select
`has_part`, which is the inverse of `part_of`).
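
To see which relation types a file offers before importing it, the `relationship` tags can be tabulated with base R. The following is a minimal sketch on a tiny made-up `.obo` fragment (the term IDs are hypothetical); for a real file, `obo_lines` could instead come from `readLines()` on its path or URL.

```r
# A tiny, made-up .obo fragment used only for illustration
obo_lines = c(
    "[Term]",
    "id: X:0000002",
    "is_a: X:0000001",
    "relationship: part_of X:0000003",
    "",
    "[Term]",
    "id: X:0000004",
    "relationship: part_of X:0000002",
    "relationship: has_part X:0000005",
    "",
    "[Typedef]",
    "id: part_of",
    "is_transitive: true"
)

# Keep the lines with a `relationship` tag and extract the relation type,
# i.e. the first word after the tag
rel_lines = grep("^relationship:", obo_lines, value = TRUE)
rel_types = sub("^relationship:\\s*(\\S+).*$", "\\1", rel_lines)

# Count how often each relation type is used
rel_counts = table(rel_types)
rel_counts
```

In this fragment, `part_of` would be a suitable value for `relation_type`, while `has_part` would not, because it points in the opposite direction.
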
Relations can also have a DAG structure. In `import_obo()`, if a parent
relation type is selected, all its offspring types are automatically selected.
For example, in GO, besides relations of `is_a` and `part_of`, there are also
`regulates`, `positively_regulates` and `negatively_regulates`, where the
latter two are child relations of `regulates`. So if `regulates` is selected
as an additional relation type, the other two are automatically selected.
The DAG of relation types is automatically recognized and saved from the ontology files.
```{r, eval = FALSE}
import_obo("file_for_go.obo", relation_type = c("part_of", "regulates"))
```
Finally, all spaces in the values of `relation_type` are converted to
underscores, so specifying `"part of"` is the same as specifying `"part_of"`.
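
The two conventions described here, expanding a parent relation type to its offspring types and treating spaces as underscores, can be sketched in base R. This is an illustration only, not the code **simona** uses internally: the map `relation_parent` and the helper `expand_relation_types()` are hypothetical names introduced for this example.

```r
# Hypothetical child -> parent map of relation types, mirroring the GO example
relation_parent = c(
    positively_regulates = "regulates",
    negatively_regulates = "regulates"
)

# Expand the selected relation types to include all of their offspring types
expand_relation_types = function(selected, parent_map) {
    # Spaces are treated the same as underscores
    selected = gsub(" ", "_", selected)
    repeat {
        # Relation types whose parent is already selected
        children = names(parent_map)[parent_map %in% selected]
        new = setdiff(children, selected)
        if (length(new) == 0) break
        selected = c(selected, new)
    }
    selected
}

expand_relation_types(c("part of", "regulates"), relation_parent)
```

The call returns `part_of` and `regulates` together with the two child types of `regulates`.
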
## Other ontology formats
For ontologies in other formats, the function `import_ontology()` uses the external tool
[**ROBOT**](http://robot.obolibrary.org/) to convert them to the `.obo` format and
then internally uses `import_obo()` to import them. **ROBOT** already
does a thorough job of converting between different ontology
formats. The file `robot.jar` is needed and it can be downloaded from
https://github.com/ontodev/robot/releases. Since **ROBOT** is a Java tool,
Java must already be available on your machine.
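
Before calling the importer, it can help to confirm that both prerequisites are in place. The sketch below uses only base R; the location of `robot.jar` is an example path and should be replaced by wherever you saved the file.

```r
# Locate the `java` executable on the PATH; an empty string means not found
java_path = Sys.which("java")

if (nzchar(java_path)) {
    message("Java found at: ", java_path)
} else {
    message("Java was not found on the PATH; install it before using ROBOT.")
}

# Check that robot.jar exists at the location you downloaded it to
# (the path below is only an example)
robot_jar = "~/Downloads/robot.jar"
file.exists(path.expand(robot_jar))
```
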
The file `po.owl` can also be found on the [Plant
Ontology](http://obofoundry.org/ontology/po.html) web page.
```{r, eval = Sys.info()["user"] == "guz"}
dag2 = import_ontology("https://raw.githubusercontent.com/Planteome/plant-ontology/master/po.owl",
    robot_jar = "~/Downloads/robot.jar")
```

```{r, eval = FALSE}
dag2
```

```{r, echo = FALSE}
if(Sys.info()["user"] == "guz") {
    dag2
} else {
cat(
"An ontology_DAG object:
  Source: po, releases/2021-08-13
  1654 terms / 2510 relations
  Root: _all_
  Terms: PO:0000001, PO:0000002, PO:0000003, PO:0000004, ...
  Max depth: 13
  Aspect ratio: 24.85:1 (based on the longest distance to root)
                39.6:1 (based on the shortest distance to root)
  Relations: is_a, part_of

With the following columns in the metadata data frame:
  id, short_id, name, namespace, definition
")
}
```

More conveniently, the path of `robot.jar` can be set as a global option:
```{r, eval = FALSE}
simona_opt$robot_jar = "~/Downloads/robot.jar"
```
**ROBOT** supports the following ontology formats and they are automatically
identified according to the file contents.
- `json`: OBO Graphs JSON
- `obo`: OBO Format
- `ofn`: OWL Functional
- `omn`: Manchester
- `owl`: RDF/XML
- `owx`: OWL/XML
- `ttl`: Turtle
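
For reference, a manual conversion with ROBOT's `convert` command looks roughly like the sketch below. It only illustrates the kind of conversion that is delegated to **ROBOT**; the exact command that **simona** runs internally may differ, and the jar path and input file name are placeholders.

```r
robot_jar = path.expand("~/Downloads/robot.jar")   # placeholder path
input = "po.owl"                                   # placeholder input file
output = tempfile(fileext = ".obo")

# Arguments of `java -jar robot.jar convert --input <in> --output <out>`;
# ROBOT infers the output format from the extension of the output file
args = c("-jar", robot_jar, "convert", "--input", input, "--output", output)

# Only run the conversion when both the jar and the input file are present
if (file.exists(robot_jar) && file.exists(input)) {
    system2("java", shQuote(args))
}
```
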
## The .owl format
For some huge ontologies, **ROBOT** requires a large amount of memory to
convert them to the `.obo` format. If the ontology is in the `.owl` format (in the
RDF/XML serialization format), the function `import_owl()` can be used instead.
`import_owl()` parses the `.owl` file directly and returns an `ontology_DAG`
object. `import_owl()` is written from scratch, and it is recommended to
use it only when `import_ontology()` does not work.
```{r, eval = FALSE}
dag3 = import_owl("https://raw.githubusercontent.com/Planteome/plant-ontology/master/po.owl")
```
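
The recommendation to try `import_ontology()` first and fall back to `import_owl()` can be written as a small wrapper. The helper `import_with_fallback()` below is a hypothetical name introduced for this sketch and is not part of **simona**; it takes the two importers as arguments so that the pattern itself is independent of the package.

```r
# Try `primary(file)`; if it throws an error, report it and use `fallback(file)`
import_with_fallback = function(file, primary, fallback) {
    tryCatch(
        primary(file),
        error = function(e) {
            message("Primary importer failed: ", conditionMessage(e))
            fallback(file)
        }
    )
}

# Intended use (not run here; needs simona, Java and a configured robot.jar):
# dag = import_with_fallback(owl_file, import_ontology, import_owl)
```
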
## The .ttl format
Similarly, some ontologies may only be provided as large `.ttl` files ([the
Turtle format](https://www.w3.org/TR/turtle/)). **simona** also provides the
function `import_ttl()`, which can parse `.ttl` files with `owl:Class` as
objects. The internal parsing script is written in Perl, so you need to make sure
that Perl is installed on your machine.
```{r, eval = FALSE}
# https://bioportal.bioontology.org/ontologies/MSTDE
dag4 = import_ttl("https://jokergoo.github.io/simona/MSTDE.ttl")
```
## Session info