Get started with OmaDB

Klara Kaleb

2019-05-02

This little vignette shows you how to get started with the OmaDB package. OmaDB is a wrapper for the REST API for the Orthologous MAtrix project (OMA) which is a database for the inference of orthologs among complete genomes.

For more details on the OMA project, see https://omabrowser.org/oma/home/.

Note that each function in the package has its own individual documentation, which can be accessed by putting a question mark (?) in front of the function name e.g. ?getProtein() .

Some useful functions

The package contains a range of functions that are used to query the database in an R friendly way. This vignette highlights some of them, whereas some others are described in more detail in other vignettes:

Exploring Hierarchical orthologous groups with roma

Exploring Taxonomic trees with roma

Sequence Analysis with roma

Note that all of the vignettes focus on exploring the example responses generated previously, allowing them to build with or without an internet connection. For each example response, the query that generated it is given.

searchProtein

This function searches the OMA database for entries containing the pattern defined and returns the results in a dataframe. Hence, it is usually a good starting place. Example response, generated via searchProtein(‘MAL’), is below.

## [1] "xref"

getGenomePairs

This function serves to obtain the orthologs for 2 whole genomes. The result is a dataframe containing information on each member in the pair and their relationship. Below is the representation of the example response, generated using getGenomePairs(‘YEAST’,‘ASHGO’).

##   entry_1.entry_nr                           entry_1.entry_url
## 1          6618226 https://omabrowser.org/api/protein/6618226/
## 2          6618227 https://omabrowser.org/api/protein/6618227/
## 3          6618228 https://omabrowser.org/api/protein/6618228/
## 4          6618229 https://omabrowser.org/api/protein/6618229/
## 5          6618230 https://omabrowser.org/api/protein/6618230/
## 6          6618231 https://omabrowser.org/api/protein/6618231/
##   entry_1.omaid entry_1.canonicalid             entry_1.sequence_md5
## 1    ASHGO00001              Q75FB7 a9b1a6dc9afb2b02afe8fdf8029b5f22
## 2    ASHGO00002              Q75FB6 5d186037c4dd0a89b34d70d596fac86d
## 3    ASHGO00003              Q75FB5 3a83b276f0c9034f7cf66277e7d6c983
## 4    ASHGO00004              Q75FB4 a8611c3f24ac6599e6a36a2710f6d24a
## 5    ASHGO00005              Q75FB3 07277f0ca66fcb49d175667272f1547f
## 6    ASHGO00006              Q75FB2 2b666569856af5f068e532ade89c1140
##   entry_1.oma_group entry_1.oma_hog_id entry_1.chromosome
## 1                 0     HOG:0393392.4c                  I
## 2            203915        HOG:0200818                  I
## 3            214367        HOG:0200433                  I
## 4            768456  HOG:0387657.2d.3a                  I
## 5            530479     HOG:0397172.3b                  I
## 6            563083     HOG:0201049.3a                  I
##   entry_1.locus.start entry_1.locus.end entry_1.locus.strand
## 1                8108              9067                    1
## 2                9537             12593                    1
## 3               12906             13244                    1
## 4               13713             14846                    1
## 5               16155             19850                    1
## 6               20056             23721                   -1
##   entry_1.is_main_isoform entry_2.entry_nr
## 1                    TRUE          6637770
## 2                    TRUE          6637359
## 3                    TRUE          6637360
## 4                    TRUE          6637767
## 5                    TRUE          6636211
## 6                    TRUE          6636209
##                             entry_2.entry_url entry_2.omaid
## 1 https://omabrowser.org/api/protein/6637770/    YEAST04806
## 2 https://omabrowser.org/api/protein/6637359/    YEAST04395
## 3 https://omabrowser.org/api/protein/6637360/    YEAST04396
## 4 https://omabrowser.org/api/protein/6637767/    YEAST04803
## 5 https://omabrowser.org/api/protein/6636211/    YEAST03247
## 6 https://omabrowser.org/api/protein/6636209/    YEAST03245
##   entry_2.canonicalid             entry_2.sequence_md5 entry_2.oma_group
## 1          RCE1_YEAST 605098a0697ad8fc7af2101e758033cb            494558
## 2          ZDS2_YEAST 8cc75f16fbfd321833abc48fd1173154            203915
## 3          YMK8_YEAST 783fea3b573632292d89a5a5218b6e90            214367
## 4          SCS7_YEAST 9307c7a6e80ed39d8b6529329fee1819            768456
## 5          SMC3_YEAST 4e8f1295434b44ae8f749cad976b966e            530479
## 6          NET1_YEAST f2ba71aea520ea66f015ba357eb6e8c6            563083
##   entry_2.oma_hog_id entry_2.chromosome entry_2.locus.start
## 1     HOG:0393392.4c               XIII              814364
## 2     HOG:0200818.1a               XIII               51640
## 3        HOG:0200433               XIII               54793
## 4  HOG:0387657.2d.3a               XIII              809623
## 5     HOG:0397172.3b                  X              299157
## 6     HOG:0201049.2a                  X              295245
##   entry_2.locus.end entry_2.locus.strand entry_2.is_main_isoform rel_type
## 1            815311                   -1                    TRUE      1:1
## 2             54468                    1                    TRUE      1:1
## 3             55110                    1                    TRUE      1:1
## 4            810777                   -1                    TRUE      1:1
## 5            302849                   -1                    TRUE      1:1
## 6            298814                    1                    TRUE      1:1
##   distance   score
## 1 122.0000  636.04
## 2  95.0000 1424.24
## 3  50.0000  557.51
## 4  37.7442 2682.46
## 5  58.0000 5548.40
## 6  90.0000 1844.35

getProtein

This function serves to obtain the information for either a single protein entry or multiple protein entries in a database. For more info, see ?getProtein(). There are similar functions to obtain information on genomes, OMA groups and HOGs i.e. getGenome(), getOMAGroup() and getHOG() respectively.

getObjectAttributes

Single entries in the database are represented as S3 objects, with their attributes corresponding to the information requested. These attributes vary greatly from object to object, and the helper function getObjectAttributes() allows the user to list all the object attributes and their corresponding data types.

getAttribute

The specific attributes of the created object can be accessed via $ or via the getAttribute() function. Below is an example of object containing information about an OMA group.

Below is the exploration of the example OMA group entry response, obtained via getOMAGroup(‘737636’).

## [1] "group_nr : integer"
## [1] "fingerprint : character"
## [1] "related_groups : URL"
## [1] "members : data.frame"
## [1] "FPNDKFP"
## [1] "FPNDKFP"

lazy loading

In most cases there is great quantity of information available for a given entry and this impacts the data retrival time. Due to this, the information available for such entries is split into a number of endpoints and these are included appropriatelly as redirects in URL form. These are automatically loaded upon $ or getAttribute() accession.

For further information on the OMA REST API please visit OMA REST API DOCUMENTATION.