%\VignetteEngine{knitr::knitr} %\VignetteIndexEntry{Query DGIdb using R} <>= knitr::opts_chunk$set(cache=FALSE, echo=FALSE, eval=TRUE) @ \documentclass{article} <>= BiocStyle::latex() @ \bioctitle[Query DGIdb using R]{A wrapper to query DGIdb using R} \author{Thomas Thurnherr\footnote{thomas.thurnherr@gmail.com}, Franziska Singer, Daniel J. Stekhoven, Niko Beerenwinkel} \begin{document} %\SweaveOpts{concordance=TRUE} \maketitle \begin{abstract} Annotation and interpretation of DNA aberrations identified through next-generation sequencing is becoming an increasingly important task, especially in the context of data analysis pipelines for medical applications, where aberrations are associated with phenotypic and clinical features. A possible approach for annotation is to identify drugs as potential targets for aberrated genes or pathways. DGIdb accumulates data from 15 different gene-target interaction resources and allows querying these through their web interface as well as public API. rDGIdb is a wrapper to query DGIdb using R/Bioconductor. The package provides its output in a similar format as the web interface, and thereby allows integration of DGIdb queries into bioinformatic pipelines. \end{abstract} \tableofcontents \section{Standard workflow} To query DGIdb \cite{Wagner2015}, we first load the package. <>= library(rDGIdb) @ Next, we prepare a list of genes for which we want to query drug targets. If you already have a list of genes with variants, these can either be loaded from a file or typed manually. Genes have to be provided as a character vector. Here, we purposely use a non-existant gene name \verb+XYZA+ for illustration. <>= genes <- c("TNF", "AP1", "AP2", "XYZA") @ With a vector of genes, we can query DGIdb using the \verb+queryDGIdb()+ function. The argument \verb+genes+ is a required argument, all other arguments are optional. These optional arguments are used as filters. If they are not provided, the query returns all results for a specific gene. See further below for more details on optional arguments. <>= result <- queryDGIdb(genes) @ \subsection{Accessing query results} After querying, we access the \verb+rDGIdbResult+ object that was returned by \verb+queryDGIdb+. The S4 class \verb+rDGIdbResult+ contains several tables (\verb+data.frame+), which roughly reflect each result tab on the DGIdb web interface at \url{https://dgidb.org}. The results are available in the following four formats: \vspace{\baselineskip} \begin{description} \item[Result summary] Drug-gene interactions summarized by the source(s) that reported them.\\ \item[Detailed results] Search terms matching exactly one gene that has one or more drug interactions.\\ \item[By gene] Drug interaction count and druggable categories associated with each gene.\\ \item[Search term summary] Summary of the attempt to map gene names supplied by the user to gene records in DGIdb. \end{description} \vspace{\baselineskip} The results can be accessed through helper functions. <>= ## Result summary resultSummary(result) ## Detailed results detailedResults(result) ## By gene byGene(result) ## Search term summary searchTermSummary(result) @ \subsection{Using optional query arguments} There are three optional arguments to \verb+queryDGIdb+. <>= queryDGIdb(genes = genes, sourceDatabases = NULL, geneCategories = NULL, interactionTypes = NULL) @ The package provides helper functions to list possible values for these optional arguments. <>= ## Available source databases sourceDatabases() ## Available gene categories geneCategories() ## Available interaction types interactionTypes() @ %' If you prefer to provide all arguments, we can also use these functions to %' query without filtering source databases, gene categories, or interaction %' types. %' %' <>= %' queryDGIdb(genes, %' sourceDatabases = sourceDatabases(), %' geneCategories = geneCategories(), %' interactionTypes = interactionTypes()) %' @ Perhaps we are only intersted in a subset of available source databases, gene categories, or interaction types. Using the list above, we can make the query more specific by adding filters. For example, with a given set of genes, we only want interactions from \verb+DrugBank+ and \verb+MyCancerGenome+. In addition, genes have to have the attribute: \verb+clinically actionable+; and interactions have to show at least one of these labels: \verb+suppressor+, \verb+activator+, or \verb+blocker+. We can query DGIdb using the function call below. <>= resultFilter <- queryDGIdb(genes, sourceDatabases = c("DrugBank","MyCancerGenome"), geneCategories = "CLINICALLY ACTIONABLE", interactionTypes = c("suppressor","activator","blocker")) @ In case no gene-drug interaction satisfies these conditions, the result is returned empty. \subsection{Basic visualization of results} The package also provides basic visualization functionality for query results. \verb+plotInteractionsBySource+ generates a bar plot that shows how many interactions were reported for each source. As input the function requires the query result object of class \verb+rDGIdbResult+. Additional arguments are passed to the \verb+barplot+. <>= h <- '0.5\\linewidth' w <- '0.7\\linewidth' c <- 'center' @ <>= plotInteractionsBySource(result, main = "Number of interactions by source") @ \subsection{Version numbers of DGIdb resources} DGIdb may no use the latest version of each resources it integrates. The current version numbers of all resources can be printed using \verb+resourceVersions+. <>= resourceVersions() @ \subsection{Input in VCF file format} From a variant call format (VCF) file, variants can be annotated within R using the variant annotation workflow provided by the \verb+VariantAnnotation+ package from Bioconductor \cite{Obenchain15072014}. For more information on how to filter variants, please see the package documentation/vignette. <>= library("VariantAnnotation") library("TxDb.Hsapiens.UCSC.hg19.knownGene") library("org.Hs.eg.db") vcf <- readVcf("file.vcf.gz", "hg19") seqlevels(vcf) <- paste("chr", seqlevels(vcf), sep = "") rd <- rowRanges(vcf) loc <- locateVariants(rd, TxDb.Hsapiens.UCSC.hg19.knownGene, CodingVariants()) symbols <- select(x = org.Hs.eg.db, keys = mcols(loc)$GENEID, columns = "SYMBOL", keytype = "ENTREZID") genes <- unique(symbols$SYMBOL) @ \section{How to get help} Please consult the package documentation first. <>= ?queryDGIdb ?rDGIdbFilters ?rDGIdbResult ?plotInteractionsBySource @ If this does not solve your problem, we are happy to help you. Questions regarding the rDGIdb package should be posted to the Bioconductor support site: \url{https://support.bioconductor.org}, which serves as a repository of questions and answers. This way, other people can benefit from questions and corresponding answers, which minimizes efforts by the developers. \section{Session info} <>= toLatex(sessionInfo()) @ \section{Citing this package} If you use this package for published research, please cite the package as well as DGIdb. <>= citation('rDGIdb') @ \bibliography{references} \end{document}