--- title: "Get started with OmaDB" author: "Klara Kaleb" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Get started with OmaDB} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- This little vignette shows you how to get started with the `roma` package. roma is a wrapper for the REST API for the Orthologous MAtrix project (OMA) which is a database for the inference of orthologs among complete genomes. For more details on the OMA project, see https://omabrowser.org/oma/home/. ## Some useful functions The package contains a range of functions that are used to query the database in an R friendly way. This vignette describes some of them, whereas the rest are described in more detail in other vignettes: [Exploring Hierarchical orthologous groups with roma](exploring_hogs.html) [Exploring Taxonomic trees with roma](tree_visualisation.html) [Sequence Analysis with roma](sequence_mapping.html) ###getXref This function searches the OMA database for entries containing the pattern defined and returns the results in a dataframe. Hence, it is usually a good starting place. Sample response is below. ```{r, warning=FALSE, message=FALSE} library(OmaDB) xref = load('../data/xref.rda') head(xref) ``` ###getGenomeAlignment This function serves to obtain the orthologs for 2 whole genomes. The result is a dataframe containing information on each member in the pair and their relationship. ```{r, warning=FALSE, message=FALSE} load('../data/pairs.rda') head(pairs) ``` ###getData This master-function function serves to obtain the information for a single entry in a database - either a group, protein or a genome. The data type is specified by setting the "type" argument and a specific entry by its ID - below are the possible ID's for different object type. * Protein * oma id * canonical id * entry number - release specific * Genome * NCBI taxon id * UniProt species code * Group * group number * group fingerprint * id of one of its members ###getObjectAttributes The result of the getData function is an S3 Object with attributes corresponding to the information requested. This function allows the user to list all the object attributes and their corresponding data types. ###getAttribute The specific attributes of the created object can be accessed via $ or via the getAttribute() function. Below is an example of object containing information about an OMA group. ```{r, warning=FALSE, message=FALSE} load('../data/group.rda') object_attributes = getObjectAttributes(group) group$fingerprint getAttribute(group, 'fingerprint') ``` ###resolveURL In most cases there is great quantity of information available for a given entry and this impacts the data retrival time. Due to this, the information available for such entries is split into a number of endpoints and these are included appropriatelly as redirects. This function allows the user to obtain further information behind those urls. An example of use for the above function would be to obtain the list of orthologs for a given protein. ```{r, warning=FALSE, message=FALSE} load('../data/protein.rda') getAttribute(protein,'orthologs') load('../data/orthologs.rda') orthologs ``` The orthologs for a protein are returned as a data.frame. This structure is also found in other areas of the package (e.g. the data on the members of a particular OMA group or a HOG) and hence features a number of functions to simplify its processing. For example, the user can obtain a set of genomic ranges for the proteins in a dataframe as so: ```{r, warning=FALSE, message=FALSE} gRanges = getGRanges(orthologs) str(gRanges) ``` The user can also obtain the list of sequences for a given dataframe of proteins and well as the list of their corresponding ontologies (that can be plugged into the topGO for further analysis). This can be done using the functions getSequences() and getOntologies() respectively. For further information on the OMA REST API please visit [OMA REST API DOCUMENTATION](http://omadev.cs.ucl.ac.uk/api/docs).