--- title: "Generating HCAMatrix queries with the API" date: "`r BiocStyle::doc_date()`" vignette: | %\VignetteIndexEntry{HCAMatrix API Queries} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} output: BiocStyle::html_document: toc_float: true Package: HCAMatrixBrowser --- # Installation Pull the development branch from `Bioconductor/HCAMatrixBrowser` ```{r,eval=FALSE} if (!requireNamespace("BiocManager", quietly = TRUE) install.packages("BiocManager") BiocManager::install("HCAMatrixBrowser") ``` # Load packages ```{r,include=TRUE,results="hide",message=FALSE,warning=FALSE} library(HCAMatrixBrowser) library(rapiclient) library(AnVIL) ``` Create an HCA api object using the `rapiclient` package: ```{r} (hca <- HCAMatrix()) ``` # Schemas Check that the schemas are available as found in the api configuration file: __Note__. Due to downgrade from Open API Specification (OAS) version 3 to OAS version 2 (formerly known as Swagger), the full list of schemas may not be shown. ```{r} schemas(hca) ``` # Exploring the endpoints Here we make available a number of helper functions that allow the user to explore the possible queries supported by the HCA Matrix Service data model. ## Filters ```{r} available_filters(hca) ``` We can get details for a particular filter: ```{r} filter_detail(hca, "genes_detected") ``` ## Formats We can obtain the available formats for any request by doing: ```{r} available_formats(hca) ``` We can get additional details (via HTML pop-up page): ```{r,eval=FALSE} format_detail(hca, "mtx") ``` ## Features The matrix service provides two types of features (i.e., genes and transcripts): ```{r} available_features(hca) ``` and a short description of a selected feature: ```{r} feature_detail(hca, "gene") ``` # Request generation from schema Using the `rapiclient` package, we can obtain schema or models for queries: ```{r} bundle_fqids <- c("980b814a-b493-4611-8157-c0193590e7d9.2018-11-12T131442.994059Z", "7243c745-606d-4827-9fea-65a925d5ab98.2018-11-07T002256.653887Z") req <- schemas(hca)$v0_MatrixRequest( bundle_fqids = bundle_fqids, format = "loom" ) req ``` For requests such as these, it is better to use the API layer provided by `HCAMatrixBrowser` (see examples below). # Python example The HCA group has provided example JSON requests written in python: The published notebook is available here: https://github.com/HumanCellAtlas/matrix-service/blob/master/docs/HCA%20Matrix%20Service%20to%20Scanpy.ipynb The following JSON request applies a couple of filters on the project short name and the number of genes detected. The provided example request looks like: ```{r} jsonlite::fromJSON( txt = '{"filter": {"op": "and", "value": [ {"op": "=", "value": "Single cell transcriptome analysis of human pancreas", "field": "project.project_core.project_short_name"}, {"op": ">=", "value": 300, "field": "genes_detected"} ] }}' ) ``` # HCAMatrixBrowser example *Advanced usage note*. We are interested in using the following endpoint: ```{r} hca$matrix.lambdas.api.v1.core.post_matrix ``` The `HCAMatrixBrowser` allows filtering using a non-standard evaluation method typical of the 'tidyverse'. Here we recreate the filters provided in the example above. ```{r} hcafilt <- filter(hca, project.project_core.project_short_name == "Single cell transcriptome analysis of human pancreas" & genes_detected >= 300) ``` We can subsequently view the generated filter from the operation: ```{r} filters(hcafilt) ``` # Query send-off In order to send the query with the appropriate filters, we simply use the provided `loadHCAMatrix` function along with the API object that contains the filter structure. See the following section for more information. ## Data representations The matrix service allows you to request three different file formats: 1. loom (default) 2. mtx 3. csv These can be requested using the `format` argument in the `loadHCAMatrix` function: ### loom The `loom` format is supported by the `LoomExperiment` package in Bioconductor. ```{r} (loomex <- loadHCAMatrix(hcafilt, format = "loom")) ``` ### mtx For the `mtx` format, we represent the data as a `SingleCellExperiment` class: ```{r} (mtmat <- loadHCAMatrix(hcafilt, format = "mtx")) ``` ### csv The `csv` format option gives the user the option to obtain the data as a list of `tibble` data.frames from the output of the `readr` package in the 'tidyverse'. **Note**. Loading multiple CSV files may take considerable time depending on system configuration. ```{r} (tib <- loadHCAMatrix(hcafilt, format = "csv")) ``` # Advanced Usage To use a lower level API endpoint, we can use the API object to select a particular endpoint. This will usually show you a description of the endpoint and the parameters for the query. ```{r} hca$matrix.lambdas.api.v0.core.post_matrix ``` In the above example, there are three parameters required: * `bundle_fqids` * `bundle_fqids_url` * `format`