--- title: "MeSH Enrichment and Semantic Analyses" author: "\\ Guangchuang Yu ()\\ School of Public Health, The University of Hong Kong" date: "`r Sys.Date()`" bibliography: meshes.bib biblio-style: apalike output: prettydoc::html_pretty: toc: true theme: cayman highlight: github pdf_document: toc: true vignette: > % \VignetteIndexEntry{An introduction to meshes} % \VignetteEngine{knitr::rmarkdown} %\VignetteDepends{MeSH.Hsa.eg.db} %\VignetteDepends{MeSH.db} % \usepackage[utf8]{inputenc} %\VignetteEncoding{UTF-8} --- ```{r style, echo=FALSE, results="asis", message=FALSE} knitr::opts_chunk$set(tidy = FALSE, warning = FALSE, message = FALSE) ``` ```{r echo=FALSE, results="hide", message=FALSE} CRANpkg <- function (pkg) { cran <- "https://CRAN.R-project.org/package" fmt <- "[%s](%s=%s)" sprintf(fmt, pkg, cran, pkg) } Biocpkg <- function (pkg) { sprintf("[%s](http://bioconductor.org/packages/%s)", pkg, pkg) } library(MeSH.Hsa.eg.db) library(MeSH.db) library(DOSE) library(meshes) ``` # Introduction MeSH (Medical Subject Headings) is the NLM (U.S. National Library of Medicine) controlled vocabulary used to manually index articles for MEDLINE/PubMed. MeSH is comprehensive life science vocabulary. MeSH has 19 categories and `MeSH.db` contains 16 of them. That is: |Abbreviation |Category | |:------------|:---------------------------------------------------------------| |A |Anatomy | |B |Organisms | |C |Diseases | |D |Chemicals and Drugs | |E |Analytical, Diagnostic and Therapeutic Techniques and Equipment | |F |Psychiatry and Psychology | |G |Phenomena and Processes | |H |Disciplines and Occupations | |I |Anthropology, Education, Sociology and Social Phenomena | |J |Technology and Food and Beverages | |K |Humanities | |L |Information Science | |M |Persons | |N |Health Care | |V |Publication Type | |Z |Geographical Locations | MeSH terms were associated with Entrez Gene ID by three methods, `gendoo`, `gene2pubmed` and `RBBH` (Reciprocal Blast Best Hit). |Method|Way of corresponding Entrez Gene IDs and MeSH IDs| |------|-------------------------------------------------| |Gendoo|Text-mining| |gene2pubmed|Manual curation by NCBI teams| |RBBH|sequence homology with BLASTP search (E-value<10-50)| # Enrichment Analysis `meshes` supports enrichment analysis (over-representation analysis and gene set enrichment analysis) of gene list or whole expression profile using MeSH annotation. Data source from `gendoo`, `gene2pubmed` and `RBBH` are all supported. User can selecte interesting category to test. All 16 categories are supported. The analysis supports >70 species listed in [MeSHDb BiocView](https://bioconductor.org/packages/release/BiocViews.html#___MeSHDb). For algorithm details, please refer to the vignettes of `r Biocpkg("DOSE")`[@yu_dose_2015] package. ```{r} library(meshes) data(geneList, package="DOSE") de <- names(geneList)[1:100] ## minGSSize = 200 for only speed up the compilation of the vignette x <- enrichMeSH(de, MeSHDb = "MeSH.Hsa.eg.db", database='gendoo', category = 'C', minGSSize=200) head(x) ``` In the over-representation analysis, we use data source from `gendoo` and `C` (Diseases) category. In the following example, we use data source from `gene2pubmed` and test category `G` (Phenomena and Processes) using GSEA. ```{r} ## minGSSize = 200 for only speed up the compilation of the vignette y <- gseMeSH(geneList, MeSHDb = "MeSH.Hsa.eg.db", database = 'gene2pubmed', category = "G", minGSSize=200) head(y) ``` User can use visualization methods implemented in `r Biocpkg("enrichplot")` (i.e.`barplot`, `dotplot`, `cnetplot`, `emapplot` and `gseaplot`) to visualize these enrichment results. With these visualization methods, it's much easier to interpret enriched results. ```{r} dotplot(x) gseaplot(y, y[1,1], title=y[1,2]) ``` # Semantic Similarity `meshes` implemented four IC-based methods (i.e. Resnik[@philip_semantic_1999], Jiang[@jiang_semantic_1997], Lin[@lin_information-theoretic_1998] and Schlicker[@schlicker_new_2006]) and one graph-structure based method (i.e. Wang[@wang_new_2007]). For algorithm details, please refer to the vignette of `r Biocpkg("GOSemSim")` package[@yu2010] `meshSim` function is designed to measure semantic similarity between two MeSH term vectors. ```{r} library(meshes) ## hsamd <- meshdata("MeSH.Hsa.eg.db", category='A', computeIC=T, database="gendoo") data(hsamd) meshSim("D000009", "D009130", semData=hsamd, measure="Resnik") meshSim("D000009", "D009130", semData=hsamd, measure="Rel") meshSim("D000009", "D009130", semData=hsamd, measure="Jiang") meshSim("D000009", "D009130", semData=hsamd, measure="Wang") meshSim(c("D001369", "D002462"), c("D017629", "D002890", "D008928"), semData=hsamd, measure="Wang") ``` `geneSim` function is designed to measure semantic similarity among two gene vectors. ```{r} geneSim("241", "251", semData=hsamd, measure="Wang", combine="BMA") geneSim(c("241", "251"), c("835", "5261","241", "994"), semData=hsamd, measure="Wang", combine="BMA") ``` # Related tools ## Enrichment analysis - `r Biocpkg("DOSE")`[@yu_dose_2015] - `r Biocpkg("clusterProfiler")`[@yu2012] - `r Biocpkg("ReactomePA")`[@yu_reactomepa_2016] ## Semantic similairty measurement - `r Biocpkg("GOSemSim")`[@yu2010] - `r Biocpkg("DOSE")`[@yu_dose_2015] # Need helps? If you have questions/issues, please visit [meshes homepage](https://guangchuangyu.github.io/software/meshes/) first. Your problems are mostly documented. If you think you found a bug, please follow [the guide](https://guangchuangyu.github.io/2016/07/how-to-bug-author/) and provide a reproducible example to be posted on [github issue tracker](https://github.com/GuangchuangYu/meshes/issues). For questions, please post to [Bioconductor support site](https://support.bioconductor.org/) and tag your post with *meshes*. For Chinese user, you can follow me on [WeChat (微信)](https://guangchuangyu.github.io/blog_images/biobabble.jpg). # Session Information Here is the output of `sessionInfo()` on the system on which this document was compiled: ```{r echo=FALSE} sessionInfo() ``` # References