%\VignetteIndexEntry{AnnotationHub: A client package for retrieving data from the AnnotationHub web service} %\VignetteDepends{AnnotationHub} \documentclass[11pt]{article} \usepackage{Sweave} \usepackage[usenames,dvipsnames]{color} \usepackage{graphics} \usepackage{latexsym, amsmath, amssymb} \usepackage{authblk} \usepackage[colorlinks=true, linkcolor=Blue, urlcolor=Black, citecolor=Blue]{hyperref} %% Simple macros \newcommand{\code}[1]{{\texttt{#1}}} \newcommand{\file}[1]{{\texttt{#1}}} \newcommand{\software}[1]{\textsl{#1}} \newcommand\R{\textsl{R}} \newcommand\Bioconductor{\textsl{Bioconductor}} \newcommand\Rpackage[1]{{\textsl{#1}\index{#1 (package)}}} \newcommand\Biocpkg[1]{% {\href{http://bioconductor.org/packages/devel/bioc/html/#1.html}% {\textsl{#1}}}% \index{#1 (package)}} \newcommand\Rpkg[1]{% {\href{http://cran.fhcrc.org/web/devel/#1/index.html}% {\textsl{#1}}}% \index{#1 (package)}} \newcommand\Biocdatapkg[1]{% {\href{http://bioconductor.org/packages/devel/data/experiment/html/#1.html}% {\textsl{#1}}}% \index{#1 (package)}} \newcommand\Robject[1]{{\small\texttt{#1}}} \newcommand\Rclass[1]{{\textit{#1}\index{#1 (class)}}} \newcommand\Rfunction[1]{{{\small\texttt{#1}}\index{#1 (function)}}} \newcommand\Rmethod[1]{{\texttt{#1}}} \newcommand\Rfunarg[1]{{\small\texttt{#1}}} \newcommand\Rcode[1]{{\small\texttt{#1}}} %% Question, Exercise, Solution \usepackage{theorem} \theoremstyle{break} \newtheorem{Ext}{Exercise} \newtheorem{Question}{Question} \newenvironment{Exercise}{ \renewcommand{\labelenumi}{\alph{enumi}.}\begin{Ext}% }{\end{Ext}} \newenvironment{Solution}{% \noindent\textbf{Solution:}\renewcommand{\labelenumi}{\alph{enumi}.}% }{\bigskip} \title{AnnotationHub: A client package for retrieving data from the AnnotationHub web service} \author{Marc Carlson} \SweaveOpts{keep.source=TRUE} \begin{document} \SweaveOpts{concordance=TRUE} \maketitle \section{\Robject{AnnotationHub} Objects} The \Rpackage{AnnotationHub} package provides a client interface to resources stored at the AnnotationHub web service. <>= library(AnnotationHub) @ The \Rpackage{AnnotationHub} package is straightforward to use. The 1st thing you need to do to make use of it is to create an \Robject{AnnotationHub} object like this: <>= ah = AnnotationHub() @ Now at this point you have already done everything you need in order to get annotations. If you know exactly what the resource you want is called (and where it can be found), you could get it right now by just tab completing to it using the \Rmethod{\$} operator. <>= path <- "ah$goldenpath.hg19.encodeDCC.wgEncodeUwTfbs.wgEncodeUwTfbsMcf7CtcfStdPkRep1.narrowPeak_0.0.1.RData" @ Lets suppose that you knowd the following is the path to your data: \url{\Sexpr{path}} Simply tab completing to the above path (followed by hitting enter), as demonstrated below will actually retrieve an object and then assign its contents to a local variable called res. <>= res <- ah$goldenpath.hg19.encodeDCC.wgEncodeUwTfbs.wgEncodeUwTfbsMcf7CtcfStdPkRep1.narrowPeak_0.0.1.RData @ As you can see it's pretty easy to get data out using \Robject{AnnotationHub} objects. The rest of this vignette is mostly about helping you to make sure you are accessing the version of \Robject{AnnotationHub} objects that you intend to use and also about making sure that you can filter down the huge number of objects to the few that you are really interested in. \section{Configuring \Robject{AnnotationHub} objects} When you create the \Robject{AnnotationHub} object, it will set up the object for you with some default settings. If you look at the object you will see some helpful information about it. <>= ah @ By default, you can see that the \Robject{AnnotationHub} object is set to the latest snapshotData and a snapshot version that matches the version of Bioconductor that you are using. You can also learn about these data with the appropriate methods. <>= snapshotVersion(ah) snapshotDate(ah) @ If you are interested in using an older version of a snapshot, you can list previous versions with the \Rmethod{possibleDates} like this: <>= pd <- possibleDates(ah) pd @ And then you can set the dates like this: <>= snapshotDate(ah) <- pd[1] @ \section{Exploring and setting filters for \Robject{AnnotationHub}} If you are interested in how many annotation resources are currently available for your \Robject{AnnotationHub} object, you can just take the length like this: <>= length(ah) @ Similarly, there are also methods to show the resource names, or even the full set of resource URLs for available resources. <>= names <- head(names(ah),n=1) @ <>= names @ \url{\Sexpr{names}} <>= urls <- head(snapshotUrls(ah),n=1) @ <>= urls @ \url{\Sexpr{urls}} For humans, the number of resources available is going to be overwhelming. How should we cut this data set down to size? For this task, we introduce filters. Every \Robject{AnnotationHub} object contains a list of filters that can be configured to control which resources it can return. By default this list is empty, which means you get everything. <>= filters(ah) @ How can we learn which things are available for filtering? For this we have defined \Rmethod{columns} and \Rmethod{keytypes} methods, which will list all the kinds of data that can be filtered on. <>= columns(ah) keytypes(ah) @ Once we know which things can be used to filter on, we can extract values that these things can be required to match. For this task, we have defined a \Rmethod{key} method. <>= head(keys(ah, keytype="Species")) @ Now we are able to construct and assign a filter to our \Robject{AnnotationHub} object. Lets set it up to only find resources from humans. <>= filters(ah) <- list(Species="Homo sapiens") @ And now if we look we will see that our \Robject{AnnotationHub} object is only exposing resources from Homo sapiens. <>= length(ah) names <- head(names(ah),n=1) @ <>= names @ \url{\Sexpr{names}} <>= urls <- head(snapshotUrls(ah),n=1) @ <>= urls @ \url{\Sexpr{urls}} We can also look at the \Robject{AnnotationHub} object in a broswer using the display function. We can then filter the \Robject{AnnotationHub} object for `Homo sapiens' by either using the Global search field on the top right corner of the page or the in-column search field for `Species'. <>= d <- display(ah) @ \begin{figure}[ht] \centering \includegraphics[width=\textwidth]{display.pdf} \caption{Displaying and filtering the Annnotation Hub object in a browser} \label{fig:dbtypes} \end{figure} By default 1000 entries are displayed per page, we can change this using the filter on the top of the page or navigate through different pages using the page scrolling feature at the bottom of the page. We can also select the rows of interest to us and send them back to the R session using 'Send Rows' button ; this sets a filter internally which filters the \Robject{AnnotationHub} object. \section{Using \Robject{AnnotationHub} to retrieve data} So now that we have our \Robject{AnnotationHub} object configured to expose only the data for humans how would we go about getting that data downloaded? As mentioned above, we can use the \Rmethod{\$} operator and tab completion to pull down a data source of interest like this. <>= path <- "ah$goldenpath.hg19.encodeDCC.wgEncodeUwTfbs.wgEncodeUwTfbsMcf7CtcfStdPkRep1.narrowPeak_0.0.1.RData" @ \url{\Sexpr{path}} Just by using tab completion like this: <>= res <- ah$goldenpath.hg19.encodeDCC.wgEncodeUwTfbs.wgEncodeUwTfbsMcf7CtcfStdPkRep1.narrowPeak_0.0.1.RData @ And once you have done this, you can look at the object stored in res and use it etc.. Any dependencies that you need to use this kind of object should automatically try to load at this time. <>= res @ Also, since you have previously downloaded this object at the start of this vignette, the 2nd time it should pull this object from a local cache that \Rpackage{AnnotationHub} will have created for you. This is a feature of \Rpackage{AnnotationHub} that is meant to provide better performance by removing the need to pull a large amount of data from a distant server every time. However, this does not mean that once you have used \Robject{AnnotationHub} to retrieve data that you no longer need to have internet access. This is because whenever you create a \Robject{AnnotationHub} object, it needs to talk to the metadata server to learn about things like the latest available version etc. So if you intend to access your objects on the plane you will need to either save them to a convenient location or else take note of where your local cache is located so that you can load them up manually later. \section{Session Information} <>= sessionInfo() @ \end{document}