%\VignetteIndexEntry{How to write recipes for new resources for the AnnotationHub}
%\VignetteDepends{AnnotationHub}

\documentclass[11pt]{article}

\usepackage{Sweave}
\usepackage[usenames,dvipsnames]{color}
\usepackage{graphics}
\usepackage{latexsym, amsmath, amssymb}
\usepackage{authblk}
\usepackage[colorlinks=true, linkcolor=Blue, urlcolor=Black, citecolor=Blue]{hyperref}

%% Simple macros

\newcommand{\code}[1]{{\texttt{#1}}}
\newcommand{\file}[1]{{\texttt{#1}}}

\newcommand{\software}[1]{\textsl{#1}}
\newcommand\R{\textsl{R}}
\newcommand\Bioconductor{\textsl{Bioconductor}}
\newcommand\Rpackage[1]{{\textsl{#1}\index{#1 (package)}}}
\newcommand\Biocpkg[1]{%
  {\href{http://bioconductor.org/packages/devel/bioc/html/#1.html}%
    {\textsl{#1}}}%
  \index{#1 (package)}}
\newcommand\Rpkg[1]{%
  {\href{http://cran.fhcrc.org/web/devel/#1/index.html}%
    {\textsl{#1}}}%
  \index{#1 (package)}}
\newcommand\Biocdatapkg[1]{%
  {\href{http://bioconductor.org/packages/devel/data/experiment/html/#1.html}%
    {\textsl{#1}}}%
  \index{#1 (package)}}
\newcommand\Robject[1]{{\small\texttt{#1}}}
\newcommand\Rclass[1]{{\textit{#1}\index{#1 (class)}}}
\newcommand\Rfunction[1]{{{\small\texttt{#1}}\index{#1 (function)}}}
\newcommand\Rmethod[1]{{\texttt{#1}}}
\newcommand\Rfunarg[1]{{\small\texttt{#1}}}
\newcommand\Rcode[1]{{\small\texttt{#1}}}

%% Question, Exercise, Solution
\usepackage{theorem}
\theoremstyle{break}
\newtheorem{Ext}{Exercise}
\newtheorem{Question}{Question}

\newenvironment{Exercise}{
  \renewcommand{\labelenumi}{\alph{enumi}.}\begin{Ext}%
}{\end{Ext}}
\newenvironment{Solution}{%
  \noindent\textbf{Solution:}\renewcommand{\labelenumi}{\alph{enumi}.}%
}{\bigskip}

\title{Adding new resources to AnnotationHub.}
\author{Marc Carlson}

\SweaveOpts{keep.source=TRUE}
\begin{document}
\SweaveOpts{concordance=TRUE}

\maketitle

\section{Overview of the process}

If you are reading this it is (hopefully) because you intend to write
some code that will allow the processing of online resources into R
objects that
are made available through the \Rpackage{AnnotationHub} package. To do
this you will need to complete three basic steps (outlined below):
you will write two functions and then call a third function that does
some automatic setup for you. The 1st function contains instructions
for processing data that is stored online into metadata that describes
your new R resources for the AnnotationHub. The 2nd function describes
how to take these online resources and transform them into an R object
that is useful to end users.

\section{Introducing \Robject{AnnotationHubMetadata} Objects}

The \Rpackage{AnnotationHubData} package complements the
\Rpackage{AnnotationHub} package by providing a place to store code
that processes online resources into R objects suitable for access
through the \Rpackage{AnnotationHub} package. Before you can
understand the requirements for this package, you 1st need to learn
about the objects that are used behind the scenes as intermediaries
between the hub and its web-based repository. That means that you need
to know about \Robject{AnnotationHubMetadata} objects. These objects
store the metadata that describes an online resource. If you want to
see a set of online resources added to the repository and maintained,
you will need to become familiar with the
\Rfunction{AnnotationHubMetadata} constructor. For each online
resource that you want to process into the AnnotationHub, you must be
able to construct an \Robject{AnnotationHubMetadata} object that
describes it in detail and that specifies where the recipe function
lives.

\section{Step 1: Writing your \Robject{AnnotationHubMetadata} generating function}

The 1st function you need to provide is one that processes some online
resources into \Robject{AnnotationHubMetadata} objects. This function
MUST return a list of \Robject{AnnotationHubMetadata} objects.
It can rely on other helper functions that you define, but ultimately
it (and its helpers) must contain all of the instructions needed to
find resources and process those resources into
\Robject{AnnotationHubMetadata} objects.

The following example function takes files from the latest release of
inparanoid and processes them into \Robject{AnnotationHubMetadata}
objects using \Rfunction{Map}. The call to \Rfunction{Map} is the
important part of this function, because it is where the series of
\Robject{AnnotationHubMetadata} objects is created. Everything before
that call only uses a helper function to gather the metadata so that
it can be passed to the \Rfunction{AnnotationHubMetadata} constructor
by \Rfunction{Map}. Notice how one of the fields specified by this
function is the Recipe, which indicates both the name and the location
of the recipe function. We expect most people will want to submit
their recipe to the same package as their metadata processing
function.

<<>>=
makeinparanoid8ToAHMs <- function(currentMetadata){
    baseUrl <- 'http://inparanoid.sbc.su.se/download/current/Orthologs'
    ## Make list of metadata in a helper function
    meta <- .inparanoidMetadataFromUrl(baseUrl)
    ## then make AnnotationHubMetadata objects.
    Map(AnnotationHubMetadata,
        Description=meta$description,
        Genome=meta$genome,
        SourceFile=meta$sourceFile,
        SourceUrl=meta$sourceUrl,
        SourceVersion=meta$sourceVersion,
        Species=meta$species,
        TaxonomyId=meta$taxonomyId,
        Title=meta$title,
        RDataPath=meta$rDataPath,
        MoreArgs=list(
          Coordinate_1_based = TRUE,
          DataProvider = baseUrl,
          Maintainer = "Marc Carlson ",
          RDataClass = "SQLiteFile",
          RDataDateAdded = Sys.time(),
          RDataVersion = "0.0.1",
          Recipe = c("inparanoid8ToDbsRecipe", package="AnnotationHubData"),
          Tags = c("Inparanoid", "Gene", "Homology", "Annotation")))
}
@

\section{Step 2: Writing your recipe}

The 2nd kind of function you need to write is called a recipe function.
It must always take a single argument, which must be an
\Robject{AnnotationHubMetadata} object. The job of a recipe function
is to use the metadata from an \Robject{AnnotationHubMetadata} object
to produce an R object or data file that will be retrievable from the
AnnotationHub service later on. Below is a recipe function that calls
some helper functions to generate an inparanoid database object from
the metadata stored in its \Robject{AnnotationHubMetadata} object.

<<>>=
inparanoid8ToDbsRecipe <- function(ahm){
    require(AnnotationForge)
    ## Location of the source files, taken from the metadata
    inputFiles <- metadata(ahm)$SourceFile
    ## Build the database from the source files, then load it
    dbname <- makeInpDb(dir=file.path(inputFiles,""),
                        dataDir=tempdir())
    db <- loadDb(file=dbname)
    ## Save the database to the path given by the metadata
    outputPath <- file.path(metadata(ahm)$AnnotationHubRoot,
                            metadata(ahm)$RDataPath)
    saveDb(db, file=outputPath)
    outputFile(ahm)
}
@

\section{Step 3: Calling the \Rfunction{makeAnnotationHubResource} helper}

Finally, you will need to call the
\Rfunction{makeAnnotationHubResource} function to do some setup. This
function has only two required arguments. The 1st is the name of a
class that describes the kind of resource you are writing code to
import. It just needs to be a unique name, and it will be used
internally to create a class for dispatch. The 2nd argument is your
metadata processing function from step 1. Once you have finished
this, the only steps left are to export the class name in the
NAMESPACE (remember that this is the string you provide as your 1st
argument) and then to add this code to the
\Rpackage{AnnotationHubData} repository. We are planning to set up a
bridge to GitHub so that you can give us a pull request.

<<>>=
makeAnnotationHubResource("Inparanoid8ImportPreparer",
                          makeinparanoid8ToAHMs)
@

\section{Session Information}

<<>>=
sessionInfo()
@

\end{document}