--- title: "UniProt.ws: A package for retrieving data from the UniProt web service" author: "Marc Carlson" date: "`r format(Sys.time(), '%B %d, %Y')`" output: BiocStyle::html_document vignette: > %\VignetteIndexEntry{ UniProt.ws: A package for retrieving data from the UniProt web service } %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r,include=FALSE,results="hide",message=FALSE,warning=FALSE} library(BiocStyle) ``` # `UniProt.ws` ```{r,eval=FALSE} if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("UniProt.ws") ``` ## Configuring `UniProt.ws` The `r Biocpkg("UniProt.ws")` package provides a `select` interface to the UniProt web service. ```{r loadPkg} suppressPackageStartupMessages({ library(UniProt.ws) }) up <- UniProt.ws(taxId=9606) ``` If you already know about the select interface, you can immediately learn about the various methods for this object by just looking it's the help page. ```{r help,eval=FALSE} help("UniProt.ws") ``` When you load the `r Biocpkg("UniProt.ws")` package, it creates a `UniProt.ws` object. If you look at the object you will see some helpful information about it. ```{r show} up ``` By default, you can see that the `UniProt.ws` object is set to retrieve records from Homo sapiens. But you can change that of course. In order to change it, you first need to look up the appropriate taxonomy ID for the species that you are interested in. Uniprot provides support for over 20 thousand species, so there are a few to choose from! In order to make this easier, we have provided the helper function `availableUniprotSpecies` which will list all the supported species along with their taxonomy ids. When you call the `availableUniprotSpecies` function, it's recommended that you make use of the pattern argument to limit your queries like this: ```{r availSpecies} availableUniprotSpecies(pattern="musculus") ``` Once you have learned the taxonomy ID for the species of interest, you can then change the taxonomy id for the `UniProt.ws` object using `taxId` setter or by calling the constructor for `UniProt.ws` ```{r setTaxID} mouseUp <- UniProt.ws(10090) mouseUp ``` As you can see the species is different for the `mouseUp` new object. ## Using `UniProt.ws` Once you are safisfied that you have an `uniport.ws` that is using the appropriate organsims, you can make use of the standard set of methods in a `select` interface. Specifically: `columns`, `keytypes`, `keys` and `select`. You will probably notice that there are a large number of columns that can be retrieved. ```{r columns} head(keytypes(up)) ``` And most (but not all) of these fields can also be used as keytypes. ```{r keytypes} head(columns(up)) ``` If necessary you can also look up the keys of a given type. But please be warned that the web service is slow at this particular kind of lookup. So if you really want to do this kind of operation you are probably going to want to save the result to your R session. ```{r keys,eval=FALSE} egs <- keys(up, "GeneID") ``` Finally, you can loop up whatever combinations of columns, keytypes and keys that you need when using `select`. *Note*. 'ENTREZ_GENE' is now 'GeneID' ```{r select} keys <- c("1","2") columns <- c("xref_pdb", "xref_hgnc", "sequence") kt <- "GeneID" res <- select(up, keys, columns, kt) res ``` ## `sessionInfo()` ```{r