We have developed the human disease ontology R package HDO.db, which provides the semantic relationship between human diseases. Relying on the DOSE and GOSemSim packages we developed, we can carry out disease enrichment and semantic similarity analyses. Many biological studies are achieved through mouse models,
and a large number of data indicate the association between genotypes and phenotypes or diseases. The study of model organisms can be transformed into useful knowledge about normal human biology and disease to facilitate treatment and early screening for diseases. Organism-specific genotype-phenotypic associations can be applied to cross-species phenotypic studies to clarify previously unknown phenotypic connections in other species. Using the same principle to diseases can identify genetic associations and even help to identify disease associations that are not obvious. Therefore, as a supplement to HDO.db and DOSE, we developed mouse phenotypic ontology R package MPO.db.
MPO.db mainly contains four kinds of annotation information, which come from:
The ontology data contains the id, name, def, and synonym of the ontology, as also as the parent-child relationship between the ontology. The data comes from: MPheno_OBO.ontology file downloaded from http://www.informatics.jax.org/downloads/reports/index.html#pheno.
These data demonstrate the effect of each genotype on the phenotype. The data come from: MGI database(http://www.informatics.jax.org/downloads/reports/index.html#pheno) and IMPC database (http://ftp.ebi.ac.uk/pub/databases/impc/all-data-releases/ release-18.0/results/).
These data demonstrate the effect of each genotype on the disease. The data come from: MGI database(http://www.informatics.jax.org/downloads/reports/index.html#pheno), IMPC database(http://ftp.ebi.ac.uk/pub/databases/impc/all-data-releases/ release-18.0/results/), and alliance of genome resources(https://www.alliancegenome.org/downloads).
These data demonstrate the effect of each phenotype on the disease. The data come from: https://github.com/DiseaseOntology/HumanDiseaseOntology, and https://github.com/mapping-commons/mh_mapping_initiative.
The released version from Bioconductor
library(AnnotationDbi)
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#>
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#>
#> Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#> as.data.frame, basename, cbind, colnames, dirname, do.call,
#> duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
#> lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
#> pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
#> tapply, union, unique, unsplit, which.max, which.min
#> Loading required package: Biobase
#> Welcome to Bioconductor
#>
#> Vignettes contain introductory material; view with
#> 'browseVignettes()'. To cite Bioconductor, see
#> 'citation("Biobase")', and for packages 'citation("pkgname")'.
#> Loading required package: IRanges
#> Loading required package: S4Vectors
#>
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:utils':
#>
#> findMatches
#> The following objects are masked from 'package:base':
#>
#> I, expand.grid, unname
MPO.db provide these AnnDbBimap object:
ls("package:MPO.db")
#> [1] "MPO" "MPO.db" "MPOALIAS" "MPOANATOMY" "MPOANCESTOR"
#> [6] "MPOCHILDREN" "MPOMAPCOUNTS" "MPOMGIDO" "MPOMPDO" "MPOMPMGI"
#> [11] "MPOOFFSPRING" "MPOPARENTS" "MPOSYNONYM" "MPOTERM" "MPO_dbInfo"
#> [16] "MPO_dbconn" "MPO_dbfile" "MPO_dbschema" "MPOmetadata" "columns"
#> [21] "keys" "keytypes" "select"
packageVersion("MPO.db")
#> [1] '0.99.7'
You can use help
function to get their documents: help(MPOFFSPRING)
toTable(MPOmetadata)
#> name
#> 1 DBSCHEMA
#> 2 DBSCHEMAVERSION
#> 3 MPOSOURCENAME
#> 4 MPOSOURCURL
#> 5 MPOSOURCEDATE
#> 6 Db type
#> value
#> 1 MPO_DB
#> 2 1.0
#> 3 MGI
#> 4 http://www.informatics.jax.org/downloads/reports/index.html#pheno
#> 5 20230302
#> 6 MPODb
MPOMAPCOUNTS
#> MPOANCESTOR MPOCHILDREN MPOOFFSPRING MPOPARENTS MPOTERM
#> "119177" "17358" "119177" "17358" "13708"
In MPO.db, MPOTERM
represet the whole MP terms and their names. The users can also get their aliases and synonyms from MPOALIAS
and MPOSYNONYM
, respectively.
convert MPOTERM to table
doterm <- toTable(MPOTERM)
head(doterm)
#> mpid term
#> 1 MP:0000001 mammalian phenotype
#> 2 MP:0000003 abnormal adipose tissue morphology
#> 3 MP:0000005 increased brown adipose tissue amount
#> 4 MP:0000008 increased white adipose tissue amount
#> 5 MP:0000010 abnormal abdominal fat pad morphology
#> 6 MP:0000013 abnormal adipose tissue distribution
convert MPOTERM to list
dotermlist <- as.list(MPOTERM)
head(dotermlist)
#> $`MP:0000001`
#> [1] "mammalian phenotype"
#>
#> $`MP:0000003`
#> [1] "abnormal adipose tissue morphology"
#>
#> $`MP:0000005`
#> [1] "increased brown adipose tissue amount"
#>
#> $`MP:0000008`
#> [1] "increased white adipose tissue amount"
#>
#> $`MP:0000010`
#> [1] "abnormal abdominal fat pad morphology"
#>
#> $`MP:0000013`
#> [1] "abnormal adipose tissue distribution"
get alias of MP:0000003
get synonym of MP:0000003
Similar to HDO.db
, we provide four Bimap objects to represent relationship between MP terms: MPOANCESTOR,MPOPARENTS,MPOOFFSPRING, and MPOCHILDREN.
MPOANCESTOR describes the association between MP terms and their ancestral terms based on a directed acyclic graph (DAG) defined by the Mouse Phenotype Ontology. We can use toTable
function in AnnotationDbi
package to get a two-column data.frame: the first column means the MP term ids, and the second column means their ancestor terms.
anc_table <- toTable(MPOANCESTOR)
head(anc_table)
#> mpid ancestor
#> 1 MP:0000003 MP:0005375
#> 2 MP:0000003 MP:0000001
#> 3 MP:0000005 MP:0001778
#> 4 MP:0000005 MP:0002971
#> 5 MP:0000005 MP:0005452
#> 6 MP:0000005 MP:0000003
get ancestor of “MP:0000013”
MPOPARENTS describes the association between MP terms and their direct parent terms based on DAG. We can use toTable
function in AnnotationDbi
package to get a two-column data.frame: the first column means the MP term ids, and the second column means their parent terms.
parent_table <- toTable(MPOPARENTS)
head(parent_table)
#> mpid parent
#> 1 MP:0000003 MP:0005375
#> 2 MP:0000005 MP:0001778
#> 3 MP:0000008 MP:0001781
#> 4 MP:0000010 MP:0005334
#> 5 MP:0000013 MP:0000003
#> 6 MP:0000015 MP:0002095
get parent term of “MP:0000013”
MPOPARENTS describes the association between MP terms and their offspring
terms based on DAG. it’s the exact opposite of MPOANCESTOR
, whose usage is similar to it.
get offspring of “MP:0000010”
off_list <- AnnotationDbi::as.list(MPO.db::MPOOFFSPRING)
off_list[["MP:0000010"]]
#> [1] "MP:0005335" "MP:0005336" "MP:0005337" "MP:0006319" "MP:0008900"
#> [6] "MP:0008901" "MP:0008902" "MP:0008903" "MP:0008906" "MP:0009283"
#> [11] "MP:0009285" "MP:0009286" "MP:0009287" "MP:0009288" "MP:0009289"
#> [16] "MP:0009292" "MP:0009293" "MP:0009298" "MP:0009299" "MP:0009300"
#> [21] "MP:0009301" "MP:0009302" "MP:0009303" "MP:0009304" "MP:0009305"
#> [26] "MP:0009306" "MP:0009307" "MP:0010044" "MP:0010045" "MP:0010046"
#> [31] "MP:0020998"
MPOCHILDREN describes the association between MP terms and their direct children terms based on DAG. it’s the exact opposite of MPOPARENTS
, whose usage is similar to it.
get children of “MP:0000003”
child_list <- AnnotationDbi::as.list(MPO.db::MPOCHILDREN)
child_list[["MP:0000003"]]
#> [1] "MP:0000013" "MP:0002970" "MP:0002971" "MP:0005334" "MP:0005452"
#> [6] "MP:0005457" "MP:0009115" "MP:0011167" "MP:0011174" "MP:0012320"
#> [11] "MP:0013249" "MP:0030007" "MP:0030993"
The MPO.db support the select()
, keys()
, keytypes()
, and columns
interface.
columns(MPO.db)
#> [1] "alias" "ancestor" "children" "doid" "mgi" "mpid"
#> [7] "offspring" "parent" "synonym" "term"
## use mpid keys
dokeys <- keys(MPO.db)[1:100]
res <- select(x = MPO.db, keys = dokeys, keytype = "mpid",
columns = c("offspring", "term", "doid", "mgi"))
head(na.omit(res))
#> mpid offspring term doid mgi
#> 16765 MP:0000015 MP:0011278 abnormal ear pigmentation DOID:0060470 14572
#> 16766 MP:0000015 MP:0011278 abnormal ear pigmentation DOID:0060470 14910
#> 16767 MP:0000015 MP:0011278 abnormal ear pigmentation DOID:0060470 16590
#> 16768 MP:0000015 MP:0011278 abnormal ear pigmentation DOID:0060470 17918
#> 16769 MP:0000015 MP:0011278 abnormal ear pigmentation DOID:0060470 192232
#> 16770 MP:0000015 MP:0011278 abnormal ear pigmentation DOID:0060470 192236
key <- keys(MPO.db, "mpid")[1:100]
res <- select(x = MPO.db, keys = key, keytype = "mpid",
columns = c("mpid", "term", "children"))
head(na.omit(res))
#> mpid term children
#> 1 MP:0000001 mammalian phenotype MP:0001186
#> 2 MP:0000001 mammalian phenotype MP:0002006
#> 3 MP:0000001 mammalian phenotype MP:0002873
#> 4 MP:0000001 mammalian phenotype MP:0003012
#> 5 MP:0000001 mammalian phenotype MP:0003631
#> 6 MP:0000001 mammalian phenotype MP:0005367
## use term keys
# dokeys <- head(keys(MPO.db, keytype = "term"))
# res <- select(x = MPO.db, keys = dokeys, keytype = "term",
# columns = c("offspring", "mpid", "parent"))
# head(res)
dokeys <- keys(MPO.db, keytype = "term")[1:100]
res <- select(x = MPO.db, keys = dokeys, keytype = "term",
columns = c("offspring", "mpid", "doid", "mgi"))
head(na.omit(res))
#> mpid term offspring doid mgi
#> 16765 MP:0000015 abnormal ear pigmentation MP:0011278 DOID:0060470 14572
#> 16766 MP:0000015 abnormal ear pigmentation MP:0011278 DOID:0060470 14910
#> 16767 MP:0000015 abnormal ear pigmentation MP:0011278 DOID:0060470 16590
#> 16768 MP:0000015 abnormal ear pigmentation MP:0011278 DOID:0060470 17918
#> 16769 MP:0000015 abnormal ear pigmentation MP:0011278 DOID:0060470 192232
#> 16770 MP:0000015 abnormal ear pigmentation MP:0011278 DOID:0060470 192236
## use mgi keys
key <- keys(MPO.db, "mgi")[1:100]
res <- select(x = MPO.db, keys = key, keytype = "mgi",
columns = c("mgi", "mpid", "children"))
head(na.omit(res))
#> mpid mgi children
#> 1 MP:0005367 69865 MP:0000516
#> 2 MP:0005367 69865 MP:0005502
#> 3 MP:0005369 69865 MP:0002106
#> 4 MP:0005369 69865 MP:0002108
#> 5 MP:0005370 69865 MP:0002138
#> 6 MP:0005370 69865 MP:0002139
res <- select(x = MPO.db, keys = key, keytype = "mgi",
columns = c("doid", "mgi"))
head(na.omit(res))
#> mgi doid
#> 1 11666 DOID:10588
#> 2 74591 DOID:0060713
#> 3 18670 DOID:1949
#> 4 18670 DOID:13580
#> 5 11304 DOID:0111013
#> 6 19299 DOID:0111066
Please go to https://yulab-smu.top/biomedical-knowledge-mining-book/ for the vignette.
Please go to https://yulab-smu.top/biomedical-knowledge-mining-book/dose-enrichment.html for the vignette.
sessionInfo()
#> R version 4.3.0 RC (2023-04-18 r84287)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.2 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] AnnotationDbi_1.63.1 IRanges_2.35.1 S4Vectors_0.39.1
#> [4] Biobase_2.61.0 BiocGenerics_0.47.0 MPO.db_0.99.7
#>
#> loaded via a namespace (and not attached):
#> [1] KEGGREST_1.41.0 xfun_0.39
#> [3] bslib_0.4.2 vctrs_0.6.2
#> [5] tools_4.3.0 bitops_1.0-7
#> [7] generics_0.1.3 curl_5.0.0
#> [9] tibble_3.2.1 fansi_1.0.4
#> [11] RSQLite_2.3.1 blob_1.2.4
#> [13] pkgconfig_2.0.3 dbplyr_2.3.2
#> [15] lifecycle_1.0.3 GenomeInfoDbData_1.2.10
#> [17] compiler_4.3.0 Biostrings_2.69.1
#> [19] httpuv_1.6.11 GenomeInfoDb_1.37.1
#> [21] htmltools_0.5.5 sass_0.4.6
#> [23] RCurl_1.98-1.12 yaml_2.3.7
#> [25] interactiveDisplayBase_1.39.0 pillar_1.9.0
#> [27] later_1.3.1 crayon_1.5.2
#> [29] jquerylib_0.1.4 ellipsis_0.3.2
#> [31] cachem_1.0.8 mime_0.12
#> [33] AnnotationHub_3.9.1 tidyselect_1.2.0
#> [35] digest_0.6.31 purrr_1.0.1
#> [37] dplyr_1.1.2 BiocVersion_3.18.0
#> [39] fastmap_1.1.1 cli_3.6.1
#> [41] magrittr_2.0.3 utf8_1.2.3
#> [43] withr_2.5.0 promises_1.2.0.1
#> [45] filelock_1.0.2 rappdirs_0.3.3
#> [47] bit64_4.0.5 rmarkdown_2.21
#> [49] XVector_0.41.1 httr_1.4.6
#> [51] bit_4.0.5 png_0.1-8
#> [53] memoise_2.0.1 shiny_1.7.4
#> [55] evaluate_0.21 knitr_1.43
#> [57] BiocFileCache_2.9.0 rlang_1.1.1
#> [59] Rcpp_1.0.10 xtable_1.8-4
#> [61] glue_1.6.2 DBI_1.1.3
#> [63] BiocManager_1.30.20 jsonlite_1.8.4
#> [65] R6_2.5.1 zlibbioc_1.47.0