---
title: "An introduction to biodbNcbi"
author: "Pierrick Roger"
date: "`r BiocStyle::doc_date()`"
package: "`r BiocStyle::pkg_ver('biodbNcbi')`"
abstract: |
    How to use the NCBI Gene, CCDS, PubChem Comp and PubChem Subst connectors
    and their methods.
vignette: |
    %\VignetteIndexEntry{Introduction to the biodbNcbi package.}
    %\VignetteEngine{knitr::rmarkdown}
    %\VignetteEncoding{UTF-8}
output:
    BiocStyle::html_document:
        toc: yes
        toc_depth: 4
        toc_float:
            collapsed: false
    BiocStyle::pdf_document: default
bibliography: references.bib
---

# Introduction

biodbNcbi is a *biodb* extension package that implements connectors to the
following NCBI databases [@sayers2022_NCBI]: Gene, CCDS [@pruitt2009_CCDS;
@harte2012_CCDS; @farrell2014_CCDS], PubChem Comp and PubChem Subst
[@kim2015_PubChem].

# Installation

Install using Bioconductor:
```{r, eval=FALSE}
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install('biodbNcbi')
```

# Initialization

The first step in using *biodbNcbi* is to create an instance of the `Biodb`
class from the main *biodb* package. This is done by calling the constructor
of the class:
```{r, results='hide'}
mybiodb <- biodb::newInst()
```
During this step the configuration is set up, the cache system is initialized
and extension packages are loaded.

We will see at the end of this vignette that the *biodb* instance needs to be
terminated with a call to the `terminate()` method.

# Creating a connector to Gene

In *biodb* the connection to a database is handled by a connector instance that
you can get from the factory.
biodbNcbi implements a connector to a remote database.
Here is the code to instantiate a connector:
```{r}
gene <- mybiodb$getFactory()$createConn('ncbi.gene')
```

Creating the other connectors follows the same process:
```{r}
ccds <- mybiodb$getFactory()$createConn('ncbi.ccds')
pubchem.comp <- mybiodb$getFactory()$createConn('ncbi.pubchem.comp')
pubchem.subst <- mybiodb$getFactory()$createConn('ncbi.pubchem.subst')
```

# Accessing entries

To get the number of entries stored inside the database, run:
```{r}
gene$getNbEntries()
```

To get some of the first entry IDs (accession numbers) from the database, run:
```{r}
ids <- gene$getEntryIds(2)
ids
```

To retrieve entries, use:
```{r}
entries <- gene$getEntry(ids)
entries
```

To convert a list of entries into a data frame, run:
```{r}
x <- mybiodb$entriesToDataframe(entries)
x
```
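
When a list of entries is processed further, it can help to separate the IDs
that were found from those that were not. The sketch below assumes that
missing entries appear as `NULL` elements in the returned list; this is an
assumption to check against the *biodb* documentation. It uses a hand-written
stand-in list (`entries_demo`, hypothetical) rather than real entries, so it
runs without network access:
```{r}
# Stand-in for a list returned by a connector: two entries found, one missing.
# Plain strings are used here in place of real entry objects.
entries_demo <- list("entry A", NULL, "entry B")

# Flag the positions for which no entry was returned.
missing <- vapply(entries_demo, is.null, logical(1))

# Keep only the entries that were found.
found <- Filter(Negate(is.null), entries_demo)
length(found)
```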

# Accessing efetch web service

The **efetch** web service is accessible through the `wsEfetch()` method,
available on the Entrez connectors: `ncbi.gene`, `ncbi.pubchem.comp` and
`ncbi.pubchem.subst`.

Get a Gene entry as an XML object and print the `Entrezgene_prot` node:
```{r}
entryxml <- gene$wsEfetch('2833', retmode='xml', retfmt='parsed')
XML::getNodeSet(entryxml, "//Entrezgene_prot")
```
The object returned is an `XML::XMLInternalDocument`.
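
Individual text values can be pulled out of such a document with XPath. The
following sketch uses a small hand-written XML fragment as a stand-in for a
real response, so it runs offline. The node name `Gene-ref_locus` and the
value `ABC1` are illustrative assumptions, not output copied from NCBI:
```{r}
# Hand-written stand-in for a parsed efetch response (not real NCBI output).
demo_xml <- XML::xmlParse(
    "<Entrezgene><Entrezgene_gene><Gene-ref>
       <Gene-ref_locus>ABC1</Gene-ref_locus>
     </Gene-ref></Entrezgene_gene></Entrezgene>",
    asText = TRUE)

# Apply xmlValue() to every node matched by the XPath expression.
locus <- XML::xpathSApply(demo_xml, "//Gene-ref_locus", XML::xmlValue)
locus
```
The same `XML::xpathSApply()` call should apply to the `entryxml` object
above, provided the XPath expression matches nodes present in the response.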

# Accessing esearch web service

The **esearch** web service is accessible through the `wsEsearch()` method,
available on the Entrez connectors: `ncbi.gene`, `ncbi.pubchem.comp` and
`ncbi.pubchem.subst`.

Search for Gene entries by name and get the IDs of the matching entries:
```{r}
gene$wsEsearch(term='"chemokine"[Gene Name]', retmax=10, retfmt='ids')
```

The same result can be obtained with a call to `searchForEntries()`:
```{r}
gene$searchForEntries(fields=list(name='chemokine'), max.results=10)
```
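
The `term` argument follows the Entrez query syntax, in which a quoted value is
followed by a field name in square brackets. As a minimal sketch, the
hypothetical helper `entrez_term()` below (not part of *biodbNcbi*) builds such
strings and joins two of them with `AND`. The `Organism` field is used as an
assumed example of a second search field:
```{r}
# Hypothetical helper: format one '"value"[Field]' search clause.
entrez_term <- function(value, field) {
    sprintf('"%s"[%s]', value, field)
}

# Combine two clauses with the Entrez boolean operator AND.
term <- paste(entrez_term("chemokine", "Gene Name"),
              entrez_term("human", "Organism"),
              sep = " AND ")
term
```
The resulting string could then be passed as the `term` argument of
`wsEsearch()`.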


# Accessing einfo web service

The **einfo** web service is accessible through the `wsEinfo()` method,
available on the Entrez connectors: `ncbi.gene`, `ncbi.pubchem.comp` and
`ncbi.pubchem.subst`.

Get PubChem Comp database information as an XML object and print information on
the first field:
```{r}
infoxml <- pubchem.comp$wsEinfo(retfmt='parsed')
XML::getNodeSet(infoxml, "//Field[1]")
```
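
To get an overview of all fields rather than a single node, the `Field`
elements can be tabulated. The sketch below works on a hand-written stand-in
document so that it runs offline. The element names `Name` and `FullName` are
assumptions about the einfo response layout and should be checked against the
real `infoxml` object:
```{r}
# Hand-written stand-in for an einfo response (not real NCBI output).
demo_info <- XML::xmlParse(
    "<eInfoResult><DbInfo><FieldList>
       <Field><Name>ALL</Name><FullName>All Fields</FullName></Field>
       <Field><Name>UID</Name><FullName>UID</FullName></Field>
     </FieldList></DbInfo></eInfoResult>",
    asText = TRUE)

# One row per <Field> element, one column per child element of interest.
fields <- data.frame(
    name = XML::xpathSApply(demo_info, "//Field/Name", XML::xmlValue),
    full_name = XML::xpathSApply(demo_info, "//Field/FullName", XML::xmlValue),
    stringsAsFactors = FALSE)
fields
```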

# Closing biodb instance

When you are done with your *biodb* instance, you have to terminate it in order
to ensure the release of resources (file handles, database connections, etc.):
```{r}
mybiodb$terminate()
```
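
In scripts, an error raised midway can cause the call to `terminate()` to be
skipped. One possible way to guard against this is to wrap the work in a
function that registers the termination with `on.exit()`. The helper
`with_biodb()` below is a hypothetical sketch, not part of *biodb*. Its
`new_inst` parameter is included only so that the pattern can be tried with a
stand-in object:
```{r}
# Hypothetical helper: run `work` with a biodb instance and always terminate it.
# `new_inst` defaults to biodb::newInst and is only evaluated when called.
with_biodb <- function(work, new_inst = biodb::newInst) {
    inst <- new_inst()
    # on.exit() runs when the function exits, even if `work` raises an error.
    on.exit(inst$terminate(), add = TRUE)
    work(inst)
}

# Example use (requires network access, so it is left commented out):
# with_biodb(function(bdb) {
#     bdb$getFactory()$createConn('ncbi.gene')$getNbEntries()
# })
```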

# Session information

```{r}
sessionInfo()
```

# References