\documentclass[12pt]{article}

\usepackage[utf8]{inputenc}
\usepackage{graphicx}
\usepackage{hyperref}
\usepackage{a4wide}

% \VignetteIndexEntry{tutorial}

\title{An R package with BridgeDb for identifier mapping}
\author{
  Egon Willighagen
}
\date{\today}

\begin{document}
\maketitle

<<echo=FALSE, results=hide>>=
library(BridgeDbR)
@

\section{Introduction}

BridgeDb \url{https://github.com/bridgedb/BridgeDb} is a combination of an application programming interface (API), library, and set of data files
for mapping identifiers for identical objects~\cite{VanIersel2010}. Because BridgeDb is use by projects
in bioinformatics, like WikiPathways and PathVisio~\cite{Pico2008,VanIersel2008},
identifier mapping databases are available for gene products and metabolites.

\section{Concepts}

BridgeDb has a few core concepts which are explained in this section. Much of the API requires one to
be familiar with these concepts, though some are not always applicable. The first concept is an example
of that: organisms, which do not apply to metabolites.

\subsection{Organisms}

However, for genes the organism is important: the same gene has different identifiers in different
organisms. BridgeDb identifies organisms by their latin name and with a two character code. Because
identifier mapping files provided by PathVisio have names with these short codes, it can be useful to
have a conversion method:

<<>>=
code = getOrganismCode("Rattus norvegicus")
code
@

\subsection{Data Sources}

Identifiers have a context and this context is often a database. For example, metabolite identfiers
can be provided by the Human Metabolome Database (HMDB), ChemSpider, PubChem, and ChEBI. Similarly,
gene product identifiers can be provided by databases like Ensemble. Such a database providing
identifiers is in BridgeDb called a data source.

Importantly, each such data source is identified by a human readable long name and by a short
system code. This package has methods to interconvert one into the other:

<<>>=
fullName <- getFullName("Ce")
fullName
code <- getSystemCode("ChEBI")
code
@

\subsection{Identifier Patterns}

Another useful aspect of BridgeDb is that it knows about the patterns of identifiers. If this
pattern is unique enough, it can be used used to automatically find the data sources that
match a particular identifier. For example:

<<>>=
getMatchingSources("HMDB00555")
getMatchingSources("ENSG00000100030")
@

\section{Identifier Mapping Databases}

The BridgeDb package primarily provides the software framework, and not identifier mapping
data. Identifier Mapping databases can be downloaded from various websites. The package
knows about the download location provided by PathVisio, and we can query for all gene
product identifier mapping databases:

<<>>=
getBridgeNames()
@

\subsection{Downloading}

The package provides
a convenience method to download such identifier mapping databases. For example, we can save the
identifier mapping database for rat to the current folder with:

<<eval=FALSE>>=
dbLocation <- getDatabase("Rattus norvegicus",location=getwd())
@

The dbLocation variable then contains the location of the identifier mapping file that was
downloaded.

\subsection{Loading Databases}

Once you have downloaded an identifier mapping database, either manually or via the getDatabase()
method, you need to load the database for the identifier mappings to become available.

<<eval=FALSE>>=
mapper <- loadDatabase(dbLocation)
@

\section{Mapping Identifiers}

With a loaded database, identifiers can be mapped. The mapping method uses system codes. So, to
map the human Entrez Gene identifier (system code: L) 196410 to Affy identifiers (system code: X) we
use:

<<eval=FALSE>>=
location <- getDatabase("Homo sapiens")
mapper <- loadDatabase(location)
map(mapper, "L", "196410", "X")
@

Mind you, this returns more than one identifier, as BridgeDb is generally a one to many mapping database.

\bibliographystyle{abbrv}
\bibliography{tutorial}

\end{document}