--- title: "rWikiPathways and BridgeDbR" author: "by Alexander Pico" package: rWikiPathways date: "`r Sys.Date()`" output: BiocStyle::html_document: toc_float: true # pdf_document: # toc: true vignette: > %\VignetteIndexEntry{2. rWikiPathways and BridgeDbR} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, echo = FALSE} knitr::opts_chunk$set( eval=FALSE ) ``` *WikiPathways* is a well-known repository for biological pathways that provides unique tools to the research community for content creation, editing and utilization [@Pico2008]. **R** is a powerful programming language and environment for statistical and exploratory data analysis. *rWikiPathways* leverages the WikiPathways API to communicate between **R** and WikiPathways, allowing any pathway to be queried, interrogated and downloaded in both data and image formats. Queries are typically performed based on "Xrefs", standardized identifiers for genes, proteins and metabolites. Once you can identified a pathway, you can use the WPID (WikiPathways identifier) to make additional queries. *BridgeDbR* leverages the BridgeDb API to provide a number of functions related to ID mapping and identifiers in general for gene, proteins and metabolites. Together, *BridgeDbR* provides convience to the typical *rWikPathways* user by supplying formal names and codes defined by BridgeDb and used by WikiPathways. # Prerequisites In addition to this **rWikiPathways** package, you'll also need to install **BridgeDbR**: ```{r} if(!"rWikiPathways" %in% installed.packages()){ source("https://bioconductor.org/biocLite.R") biocLite("rWikiPathways") } library(rWikiPathways) if(!"BridgeDbR" %in% installed.packages()){ source("https://bioconductor.org/biocLite.R") biocLite("BridgeDbR") } library(BridgeDbR) ``` # Getting started Lets first check some of the most basic functions from each package. For example, here's how you check to see which species are currently supported by WikiPathways: ```{r} orgNames <- listOrganisms() orgNames ``` You should see 30 or more species listed. This list is useful for subsequent queries that take an *organism* argument, to avoid misspelling. However, some function want the organism code, rather than the full name. Using BridgeDbR's *getOrganismCode* function, we can get those: ```{r} unlist(lapply(orgNames,BridgeDbR::getOrganismCode)) ``` # Identifier System Names and Codes Even more obscure are the various datasources providing official identifiers and how they are named and coded. Fortunately, BridgeDb defines these clearly and simply. And WikiPathways relies on these BridgeDb definitions. For example, this is how we find the system code for Ensembl: ```{r} BridgeDbR::getSystemCode("Ensembl") ``` It's "En"! That's simple enough. But some are less obvious... ```{r} BridgeDbR::getSystemCode("Entrez Gene") ``` It's "L" because the resource used to be named "Locus Link". Sigh... Don't try to guess these codes. Use this function from BridgeDb (above) to get the correct code. By the way, all the systems supported by BridgeDb are here: https://github.com/bridgedb/BridgeDb/blob/master/org.bridgedb.bio/resources/org/bridgedb/bio/datasources.txt # How to use BridgeDbR with rWikiPathways Here are some specific combo functions that are useful. They let you skip worrying about system codes altogether! 1. Getting all the pathways containing the HGNC symbol "TNF": ```{r} tnf.pathways <- findPathwayIdsByXref('TNF', getSystemCode('HGNC')) tnf.pathways ``` 2. Getting all the genes from a pathway as Ensembl identifiers: ```{r} getXrefList('WP554', getSystemCode('Ensembl')) ``` 3. Getting all the metabolites from a pathway as ChEBI identifiers: ```{r} getXrefList('WP554', getSystemCode('ChEBI')) ``` # Other tips And if you ever find yourself with a system code, e.g., from a rWikiPathways return result and you're not sure what it is, then you can use this function: ```{r} BridgeDbR::getFullName('Ce') ```