--- title: "Easy API: A user-friendly cascading API" output: BiocStyle::html_document: toc: true number_sections: true highlight: haddock css: style.css includes: in_header: logo.md --- ```{r style, echo = FALSE, results = 'asis'} BiocStyle::markdown(css.files = "custom.css") ``` ```{r code, echo = FALSE} code <- function(...) { cat(paste(..., sep = "\n")) } code2 <- function(...) { cat(paste("```markdown", ..., "\n", "```", sep = "\n")) } ``` ```{r global_options, include=FALSE} knitr::opts_chunk$set(eval = FALSE) ``` # Introduction This "easy API" is built on lower level R API function calls provided by this package, (please check the other tutorial and manual) but with a more user friendly interface that provides - Simpler cascading style from a _Auth_ object or its children, for example ```{r, eval = FALSE} ## download a file a$project("wgs")$file("17.vcf")$download() ## upload a file a$project("wgs")$upload("sample.tgz") ``` - Exact matching or fuzz matching by names and id. - Actions for each different objects - Nice print format than a long list - Return an R5 object in stead of a list - Extensible with more actions We defined several different objects easy to manipulate, including _Auth_, _Project_, _Task_, _File_, _Upload_, _Member_, etc, each comes with their own methods to access the server, send request and get a response. To understand the cascading style, you has to under stand the structure of the SBG platforms. - Each account could generate one auth token, it's our top level object. - Each account has several different projects represented by _Project_ object - Each account had different billing group and each project is assigned to particular billing group in order to run on AWS cloud. - Each project is composed with files (_File_ object), tasks (_Task_ object), pipelines (_Pipeline_ object) and members or users (_Member_ object) with different permission level. - Each member has its own permission to particular project. - A task is a scheduled pipeline job with files and parameters. So our cascading visually represent this relationship and structure, if you are familiar with SBG platform, you will find this interface very easy to understand. To understand the API better, you can also check the original SBG developer hub [documents](https://docs.sbgenomics.com/display/developerhub/API) ## Cheatsheet This cheat sheet provide main API designed for end users, please note "single" is not real project name, just means the cascading method expecting a single returned object not a list, so you name better has a single hit or you pick the one you want from the list. Remember the logic structure and relationship of those concept will help you remember the API easily. For example, a project has members so of course you can run member() method on project object. For a searching function that need name or id, when both are empty, they will return every object (file, project, pipeline, task, member) in a list. These function usually has optional parameter called `ignore.case` and `exact` used for matching, when `exact = FALSE` you can use pattern to grep objects. The cheat sheet only show some most used arguments, please check their help function to read more details. ```{r, eval = FALSE} a <- Auth() a$billing() a$project_new(name = , description = , billing_group_id = ) a$project(repos = c("public", "my", "project"), project_name = , project_id = ) ## to save some typing p <- a$project("single") p$delete() p$member(name = , id = ) p$member_add() p$member_update() p$member_delete() p$file(name = , id = ) p$file("single")$download() p$upload(file = , metadata = ) p$pipeline(name = , id = ) p$pipeline_add() p$task(name = , id = ) p$task_run() p$task("single")$abort() p$task("single")$monitor(time = ) ``` So you can do a cascading like this, it reads like "I want download the file sample1.tgz in project API under account a" ```{r} a$project("API")$file("sample1.tgz")$download("~/Desktop/") ``` Please play the main function yourself, we are not covering all the function in this tutorial. Also some users may notice the object actually has lower level API binding as well for example, you can call `a$pipeline_list_project()` but those are used internally not supposed to be used by end users. user can just call `a$prject("public")` # Quick start: running the FastQC pipeline I will go through the same tutorial again, but this time use a more simpler cascading API. ## Create an Auth object First thing you do is to get your authentication token following this [tutorial](https://docs.sbgenomics.com/display/sbg/Managing+Account+Settings#ManagingAccountSettings-GenerateanAuthenticationToken), then create a _Auth_ object which will be our master object for request and actions. By default we are using SBG US platform API URL "https://api.sbgenomics.com/1.1/", you can also pass another URL, for example, NCI cancer cloud API URL. Now let's use some fake auth token and load the library. ```{r, eval = TRUE} library(sbgr) a <- Auth("aef7e9e3f6c54fb1b338ac4ecddf1a56") a ``` ## Billing group Billing group specify how your computation charges, when you first registered, you should have some free credit, with billing group called something like "Free Account". Or maybe your company/organization purchased a license, you should have other billing group attached as well To show how many billing group you belong too, just call _billing_ function ```{r} b <- a$billing() b ``` ## Create a project Let's first list all projects you have under your account ```{r} a$project() ``` You can search it by `name` or by `id`, __note__: `id` is unique, but `name` may have multiple hits, by default, we are using exact matching, only return it when it has single hits. There are two additional parameters to control it `exact` and `ignore.case`. ```{r} a$project(name = "my first") # return one matching ``` Maybe you want to create another project called "API" just for testing, but make sure you pass the required field `name`, `description` and `billing_group_id`. ```{r} a$project_new(name = "API", description = "API tutorial", billing_group_id = b[[1]]$id) ``` ## Upload file and set metadata To save your typing/lines for cascading style, I want to save the project I am gonna use for this tutorial to a Project object first. ```{r} ## get the project we just created p <- a$project("API") ``` To list all files in that project and search for a file, it's also easy ```{r} a$project("my first")$file() a$project("my first")$file("illumina", exact = FALSE) ``` Get the file we are going to upload and get the metadata ready as well. ```{r} fl <- system.file("extdata", "sample1.fastq", package = "sbgr") ## create meta data fl.meta <- list(file_type = "fastq", seq_tech = "Illumina", sample = "sample1", author = "tengfei") ``` To upload the file to the project, just call `upload` method from project object, it will initialize the multiparts automatically, checking each part and complete the call when finished. You will see the progress bar. ```{r} p$upload(fl, metadata = fl.meta) ``` ``` Initialized |=============================================================================| 100% == File == id : 55c90c73e4b01cacdc4fbf64 name : sample1.fastq size : 16 -- metadata -- file_type : fastq seq_tech : Illumina sample : sample1 author: tengfei ``` To check if it's uploaded successfully, just check the file on the server to see if it exists or not. ```{r} p$file(basename(fl)) ``` ### Keep playing with file metadata Metadata is designed with fixed fields in the GUI with fixed enum types for some fields, like file_type, to check what's fixed, please do ```{r, eval = TRUE} Metadata() Metadata()$file_type levels(Metadata()$file_type) ``` To add more metadata items, you can pass more named list entries, and to keep original turn on parameter `append`. ```{r} p$file(basename(fl))$set_metadata(list(version = "2.0"), append = TRUE) ``` Read more about metadata please visit SBG [developer hub page](https://docs.sbgenomics.com/display/sbg/Metadata) ### Delete a file To delete a file, just call `delete` method on a _File_ object. for example. ```{r} p$file(basename(fl))$delete() ``` ## Get pipelines information ```{r} p$pipeline() ``` It's a new project, there is no pipeline yet, there are several other ways to check out available pipelines for public, my pipeline and pipelines in particular project. ```{r} ## project pipelines a$project("my first")$pipeline() ## all public pipeline a$pipeline() ## my pipeline a$pipeline("my") ## particular project a$pipeline(project_name = "my first") ``` Let's look for a pipeline about "FastQC", from public repos ```{r} f.pipe <- a$pipeline(pipeline_name = "FastQC") f.pipe ``` Cool, we got a unique hit, let's copy the FastQC pipeline to the new project ```{r} p$pipeline_add(pipeline_name = f.pipe$name) ``` To confirm ```{r} f.pipe <- p$pipeline(name = "FastQC") f.pipe ``` ## Run a task with the pipeline Please call `detail` function to check the required inputs, here I quote the SBG API [introduction](https://docs.sbgenomics.com/display/developerhub/API%3A+Pipelines#API:Pipelines-Getpipelinedetails:GET/project/:project_id/pipeline/:pipeline_id) - **id**: The ID number of the pipeline, used by the API to uniquely refer to it - **revision**: The revision of the pipeline - **name**: The human-readable name of the pipeline - **description**: A short description of the pipeline's function - **inputs**: An array listing the pipeline's input nodes, i.e. those pipeline apps which require files as input - **nodes**: An array of intermediary nodes, representing the pipeline apps which transform the data. These may have editable parameters that should be set before running the pipeline, although in this particular pipeline example no parameters are editable. - **outputs**: An array of output nodes, producing files, reports, visualizations, etc. ```{r} f.pipe$details() f.pipe$details()$inputs f.pipe$details()$nodes ``` To run the pipeline, we need to provide task details, we know we want to input our upload sample file as input node here. So we need to link our file id with the input node id. ```{r} f.task <- p$task_run(name = "my task", description = "A text description", pipeline_id = f.pipe$id, inputs = list( "177252" = list(f.file$id) )) f.task ``` ## Monitor your task Pass the interval time (in seconds) this will check status of a task, print finish when it's finished. ```{r} f.task$monitor(30) ``` Or simple call `details` method ```{r} f.task$details() ``` ## Abort a task ```{r} f.task$abort() ``` ## Download the results ```{r} f.task <- p$task(id = "fed9279b-8677-4eef-9c16-9135a457f53f") f.task$download("~/Desktop/") ``` ## Wrap the example up Ok the full script is something like ```{r, eval = FALSE} library(sbgr) token <- "aef7e9e3f6c54fb1b338ac4ecddf1a56" a <- Auth(token) ## get billing info b <- a$billing() ## create project a$project_new(name = "API", description = "API tutorial", billing_group_id = b[[1]]$id) p <- a$project("API") ## get data fl <- system.file("extdata", "sample1.fastq", package = "sbgr") ## create meta data fl.meta <- list(file_type = "fastq", seq_tech = "Illumina", sample = "sample1", author = "tengfei") ## upload data with metadata p$upload(fl, metadata = fl.meta) ## check uploading success f.file <- p$file(basename(fl)) ## get the pipeline from public repos f.pipe <- a$pipeline(pipeline_name = "FastQC") ## copy the pipeline to your poject p$pipeline_add(pipeline_name = f.pipe$name) ## get the pipeline from your project not public one f.pipe <- p$pipeline(name = "FastQC") ## check the inputs needed for running tasks f.pipe$details() ## Ready to run a task? go f.task <- p$task_run(name = "my task", description = "A text description", pipeline_id = f.pipe$id, inputs = list( "177252" = list(f.file$id) )) f.task$run() ## or you can just run with Task constructor f.task <- Task(auth = Auth(token), name = "my task", description = "A text description", pipeline_id = f.pipe$id, project_id = p$id, inputs = list( "177252" = list(f.file$id) )) ## Monitor you task f.task$monitor(30) ## download a task output files f.task <- p$task("my task") f.task$download("~/Desktop/") ## Abort the task f.task$abort() ```