--- title: "SNPediaR" author: "David Montaner" date: "[www.genometra.com](http://www.genometra.com)" output: BiocStyle::html_document: toc: yes fig_width: 5 fig_height: 5 vignette: > %\VignetteIndexEntry{SNPediaR} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r style, echo = FALSE, results = 'asis'} BiocStyle::markdown() ``` About ================================================================================ [SNPedia](http://www.snpedia.com) is a curated database containing information about thousands of [SNPs](https://en.wikipedia.org/wiki/Single-nucleotide_polymorphism). Related diseases, genotypes and references to relevant scientific publications are available trough their web. This site is powered by [MediaWiki](https://www.mediawiki.org) and information about each SNP is written in the corresponding wiki page. The `SNPediaR` library provides tools for automatically search and download such pages. It also implements few functions to scrap some relevant information from the downloaded wiki text, and allows users to extend such parsing functionality. Downloading pages ================================================================================ For a known set of pages, the function `getPages` downloads the corresponding wiki content using the MediaWiki web [API](https://www.mediawiki.org/wiki/API:Query). We can for instance download the page [Rs53576](http://www.snpedia.com/index.php/Rs53576), corresponding to the __rs53576__ SNP doing: ```{r} library (SNPediaR) pg <- getPages (titles = "Rs53576") pg ``` We can use the same function to download several pages at a time, for instance we can download the 3 _genotype_ pages corresponding with the same SNP: [Rs53576(A;A)](http://www.snpedia.com/index.php/Rs53576(A;A)), [Rs53576(A;G)](http://www.snpedia.com/index.php/Rs53576(A;G)) and [Rs53576(G;G)](http://www.snpedia.com/index.php/Rs53576(G;G)) as ```{r} pgs <- getPages (titles = c ("Rs53576(A;A)", "Rs53576(A;G)", "Rs53576(G;G)")) pgs ``` Extracting relevant information requires parsing the wiki text. Some utility functions are already implemented in our library for such purpose and any other can be implemented by users. The function `extractSnpTags` for instance, extracts the "tabular" information from _SNP pages_: ```{r} extractSnpTags (pg$Rs53576) ``` The function `extractGenotypeTags` can be used to get the "tabular" information from _genotype pages_: ```{r} sapply (pgs, extractGenotypeTags) ``` This same parsing can also be done while downloading the pages, including the _wiki processing_ function as an argument of the in the `getPages` query. If for instance we are just interested in the alleles and the _magnitude_ associated with each of the genotypes we can do: ```{r} getPages (titles = c ("Rs53576(A;A)", "Rs53576(A;G)", "Rs53576(G;G)"), wikiParseFunction = extractGenotypeTags, tags = c ("allele1", "allele2", "magnitude")) ``` Customized parsing functions ---------------------------------------- Any _wiki processing_ function can be included in the `getPages`. If a user wants for instance to extract all _PubMed_ IDs from pages [Rs53576](http://www.snpedia.com/index.php/Rs53576) and [Rs1815739](http://www.snpedia.com/index.php/Rs1815739), he or she can first define a parsing function like: ```{r} findPMID <- function (x) { x <- unlist (strsplit (x, split = "\n")) x <- grep ("PMID=", x, value = TRUE) x } ``` and then call `getPages` as: ```{r} getPages (titles = c ("Rs53576", "Rs1815739"), wikiParseFunction = findPMID) ``` Categories ================================================================================ Besides the _SNP_ and the _genotype_ pages, some other interesting SNPedia resources are the __category__ pages. They constitute _indexes_ of all other pages which may be queried. Most used categories are: - Is_a_medical_condition - Is_a_medicine - Is_a_topic - Is_a_snp - In_dbSNP - Is_a_genotype Full list of categories may be found [here](http://www.snpedia.com/index.php/Special:Categories). The function `getCategoryElements` is devised to query all elements under certain category. It can be used explore which is the available information in SNPedia. We can get for instance all _medical conditions_ ```{r} res <- getCategoryElements (category = "Is_a_medical_condition") head (res) ``` and find out those related to _cancer_ ```{r} grep ('cancer', res, value = TRUE) ``` Session info ================================================================================ ```{r} sessionInfo () ``` --------------------------------------------------------------------------------