--- title: "An R interface to the Ontology Lookup Service" author: - name: Laurent Gatto affiliation: Computational Proteomics Unit, Cambridge, UK package: rols abstract: > How to query the Ontology Lookup Service directly from R and how to create and parse controlled vocabulary. output: BiocStyle::html_document: toc_float: true vignette: > %\VignetteIndexEntry{An R interface to the Ontology Lookup Service} %\VignetteEngine{knitr::rmarkdown} %\VignetteKeywords{Infrastructure, Bioinformatics, ontology, data} %\VignetteEncoding{UTF-8} --- ```{r env, echo=FALSE} suppressPackageStartupMessages(library("GO.db")) suppressPackageStartupMessages(library("BiocStyle")) suppressPackageStartupMessages(library("rols")) suppressPackageStartupMessages(library("DT")) nonto <- length(ol <- Ontologies()) ``` # Introduction ## Installation `r Biocpkg("rols")` is a Bioconductor package and should hence be installed using the dedicated functionality ```{r install, eval=FALSE} ## try http:// if https:// URLs are not supported source("https://bioconductor.org/biocLite.R") biocLite("rols") ``` ## Getting help To get help, either post your question on the [Bioconductor support site](https://support.bioconductor.org/) or open an issue on the `r Biocpkg("rols")` [github page](https://github.com/lgatto/rols/issues). ## The resource The [Ontology Lookup Service](http://www.ebi.ac.uk/ontology-lookup/) (OLS) [[1](http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-7-97), [2](http://nar.oxfordjournals.org/content/36/suppl_2/W372)] is originally spin-off of the [PRoteomics IDEntifications](http://www.ebi.ac.uk/pride/archive/) database (PRIDE) service, located at the EBI, and is now developed and maintained by the [Samples, Phenotypes and Ontologies](http://www.ebi.ac.uk/about/spot-team) team at EMBL-EBI. ## The package The OLS provides a REST interface to hundreds of ontologies from a single location with a unified output format. The `r Biocpkg("rols")` package make this possible from within R. Do do so, it relies on the `r CRANpkg("httr")` package to query the REST interface, and access and retrieve data. There are `r nonto` ontologies available in the OLS, listed in the table below. Their name is to be use to defined which ontology to query. ```{r ontTable, echo = FALSE} datatable(as(ol, "data.frame")) ``` # A Brief rols overview The `r Biocpkg('rols')` package is build around a few classes that enable to query the OLS and retrieve, store and manipulate data. Each of these classes are described in more details in their respective manual pages. We start by loading the package. ```{r} library("rols") ``` ## Ontologies The `Ontology` and `Ontologies` classes can store information about single of multiple ontologies. The latter can be easily subset using `[` and `[[`, as one would for lists. ```{r ol} ol <- Ontologies() ol ``` ```{r} head(olsNamespace(ol)) ol[["go"]] ``` It is also possible to initialise a single ontology ```{r} go <- Ontology("go") go ``` ## Terms Single ontology terms are stored in `Term` objects. When more terms need to be manipulated, they are stored as `Terms` objects. It is easy to obtain all terms of an ontology of interest, and the resulting `Terms` object can be subset using `[` and `[[`, as one would for lists. ```{r} gotrms <- terms(go) ## or terms("go") gotrms gotrms[1:10] gotrms[["GO:0090575"]] ``` It is also possible to initialise a single term ```{r} trm <- term(go, "GO:0090575") termId(trm) termLabel(trm) strwrap(termDesc(trm)) ``` It is then possible to extract the `ancestors`, `descendants`, `parents` and `children` terms. Each of these functions return a `Terms` object ```{r} parents(trm) children(trm) ``` Similarly, the `partOf` and `derivesFrom` functions return, for an input term, the terms it is a part of and derived from. Finally, a single term or terms object can be coerced to a `data.frame` using `as(x, "data.frame")`. ## Properties Properties (relationships) of single or multiple terms or complete ontologies can be queries with the `properties` method, as briefly illustrated below. ```{r propex} trm <- term("uberon", "UBERON:0002107") trm p <- properties(trm) p p[[1]] termLabel(p[[1]]) ``` # Use case ```{r, echo=FALSE} alltgns <- OlsSearch(q = "trans-golgi network") ``` A researcher might be interested in the trans-Golgi network. Searching the OLS is assured by the `OlsSearch` and `olsSearch` classes/functions. The first step is to defined the search query with `OlsSearch`, as shown below. This creates an search object of class `OlsSearch` that stores the query and its parameters. In records the number of requested results (default is 20) and the total number of possible results (there are `r alltgns@numFound` results across all ontologies, in this case). At this stage, the results have not yet been downloaded, as shown by the `r nrow(alltgns@response)` responses. ```{r tgnquery, eval = TRUE} OlsSearch(q = "trans-golgi network") ``` `r alltgns@numFound` results are probably too many to be relevant. Below we show how to perform an exact search by setting `exact = TRUE`, and limiting the search the the GO ontology by specifying `ontology = "GO"`, or doing both. ```{r tgnquery1, eval = TRUE} OlsSearch(q = "trans-golgi network", exact = TRUE) OlsSearch(q = "trans-golgi network", ontology = "GO") OlsSearch(q = "trans-golgi network", ontology = "GO", exact = TRUE) ``` One case set the `rows` argument to set the number of desired results. ```{r tgnquery2} OlsSearch(q = "trans-golgi network", ontology = "GO", rows = 200) ``` Alternatively, one can call the `allRows` function to request all results. ```{r tgnquery3} (tgnq <- OlsSearch(q = "trans-golgi network", ontology = "GO")) (tgnq <- allRows(tgnq)) ``` ```{r tgnsear4, echo=FALSE} qry <- OlsSearch(q = "trans-golgi network", exact = TRUE) ``` Let's proceed with the exact search and retrieve the results. Even if we request the default 20 results, only the `r qry@numFound` relevant result will be retrieved. The `olsSearch` function updates the previously created object (called `qry` below) by adding the results to it. ```{r tgnquery5} qry <- OlsSearch(q = "trans-golgi network", exact = TRUE) (qry <- olsSearch(qry)) ``` We can now transform this search result object into a fully fledged `Terms` object or a `data.frame`. ```{r tgnres} (qtrms <- as(qry, "Terms")) str(qdrf <- as(qry, "data.frame")) ``` In this case, we can see that we actually retrieve the same term used across different ontologies. In such cases, it might be useful to keep only non-redundant term instances. Here, this would have been equivalent to searching the `r termOntology(unique(qtrms))` ontology ```{r uterms} qtrms <- unique(qtrms) termOntology(qtrms) termNamespace(qtrms) ``` Below, we execute the same query using the `r Biocpkg("GO.db")` package. ```{r go.db, message=FALSE} library("GO.db") GOTERM[["GO:0005802"]] ``` # On-line vs. off-line data It is possible to observe different results with `r Biocpkg("rols")` and `r Biocpkg("GO.db")`, as a result of the different ways they access the data. `r Biocpkg("rols")` or `r Biocpkg("biomaRt")` perform direct online queries, while `r Biocpkg("GO.db")` and other annotation packages use database snapshot that are updated every release. Both approaches have advantages. While online queries allow to obtain the latest up-to-date information, such approaches rely on network availability and quality. If reproducibility is a major issue, the version of the database to be queried can easily be controlled with off-line approaches. In the case of `r Biocpkg("rols")`, although the load date of a specific ontology can be queried with `olsVersion`, it is not possible to query a specific version of an ontology. # Changes in version 2.0 `r Biocpkg("rols")` 2.0 has substantially changed. While the table below shows some correspondence between the old and new interface, this is not always the case. The new interface relies on the `Ontology`/`Ontologies`, `Term`/`Terms` and `OlsSearch` classes, that need to be instantiated and can then be queried, as described above. | version < 1.99 | version >= 1.99 | |--------------------|-----------------| | `ontologyLoadDate` | `olsLoaded` and `olsUpdated` | | `ontologyNames` | `Ontologies` | | `olsVersion` | `olsVersion` | | `allIds` | `terms` | | `isIdObsolete` | `isObsolete` | | `rootId` | `olsRoot` | | `olsQuery` | `OlsSearch` and `olsSearch` | Not all functionality is currently available. If there is anything that you need but not available in the new version, please contact the maintained by opening an [issue](https://github.com/lgatto/rols/issues) on the package development site. # CVParams The `CVParam` class is used to handle controlled vocabulary. It can be used for user-defined parameters ```{r} CVParam(name = "A user param", value = "the value") ``` or official controlled vocabulary (which triggers a query to the OLS service) ```{r} CVParam(label = "MS", accession = "MS:1000073") CVParam(label = "MS", name ="electrospray ionization") CVParam(label = "MS", name ="ESI") ## using a synonym ``` See `?CVParam` for more details and examples. # Session information ```{r si, echo=FALSE} print(sessionInfo(), locale = FALSE) ```