---
title: "Working With Human Cell Atlas Manifests"
author: "Maya Reed McDaniel"
date: "September 2nd, 2021"
output: BiocStyle::html_document
vignette: >
  %\VignetteIndexEntry{Working With Human Cell Atlas Manifests}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE
)
```

# Motivation & Introduction

The purpose of this vignette is to explore the file manifests available
from the [Human Cell Atlas][] project.

A manifest provides a metadata summary for a collection of files in a
tabular format. It includes, but is not limited to, information about the
process and workflow used to generate each file, information about the
specimens from which the file data were derived, and identifiers that
connect specific projects, files, and specimens.

The [WARP][] (WDL Analysis Research Pipelines) repository contains
information on a variety of pipelines, and can be used alongside a
manifest to better understand the metadata.

[Human Cell Atlas]: https://data.humancellatlas.org/
[WARP]: https://broadinstitute.github.io/warp/docs/get-started

## Installation and getting started

Evaluate the following code chunk to install packages required for
this vignette.

```{r install, eval = FALSE}
## install BiocManager from CRAN if you haven't already
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
## install from Bioconductor if you haven't already
pkgs <- c("LoomExperiment", "hca")
pkgs_needed <- pkgs[!pkgs %in% rownames(installed.packages())]
BiocManager::install(pkgs_needed)
```

Load the packages into your _R_ session.

```{r setup, message = FALSE}
library(dplyr)
library(SummarizedExperiment)
library(LoomExperiment)
library(hca)
```

# Example: manifests

The manifest for all available files can be obtained with

```{r, eval = FALSE}
default_manifest_tbl <- hca::manifest()
default_manifest_tbl
```

The full manifest is seldom useful; instead, create a filter identifying
the files of interest.

```{r}
manifest_filter <- hca::filters(
    projectId = list(is = "4a95101c-9ffc-4f30-a809-f04518a23803"),
    fileFormat = list(is = "loom"),
    workflow = list(is = c("optimus_v4.2.2", "optimus_v4.2.3"))
)
```

Retrieve the manifest for the files matching the filter.

```{r}
manifest_tibble <- hca::manifest(filters = manifest_filter)
manifest_tibble
```

Then summarize or filter the manifest further, e.g., by counting the
specimen organs represented in the files.

```{r}
manifest_tibble |>
    dplyr::count(specimen_from_organism.organ)
```

# Example: Using manifest data to select files

- view the files described in `manifest_tibble` and select one for download

```{r}
manifest_tibble
```

- select a file to which more than one specimen contributes

```{r}
file_uuid <- "24a8a323-7ecd-504e-a253-b0e0892dd730"
```

- obtain the `file_hca_tbl` for the file based on its UUID

```{r}
file_filter <- hca::filters(
    fileId = list(is = file_uuid)
)

file_tbl <- hca::files(filters = file_filter)

file_tbl
```

- download the file and obtain its file path

```{r}
file_location <-
    file_tbl |>
    hca::files_download()

file_location
```

- import the file as a `LoomExperiment` object

```{r}
loom <- LoomExperiment::import(file_location)

metadata(loom) |>
    dplyr::glimpse()

colData(loom) |>
    dplyr::as_tibble() |>
    dplyr::glimpse()
```

# Example: Using manifest data to annotate a `.loom` file

The function `optimus_loom_annotation()` takes the file path of a
`.loom` file generated by the [Optimus pipeline][] and returns a
`LoomExperiment` object whose `colData` has been annotated with additional
specimen data extracted from a manifest.

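To make the idea of annotation concrete, the toy sketch below joins
specimen-level columns onto a per-cell table. It is an illustration under
stated assumptions, not the implementation used by
`optimus_loom_annotation()`: the tables `cells` and `toy_manifest`, the
key column `specimen_key`, and all values are hypothetical and exist only
for this example.

```{r}
## Toy sketch only: hypothetical data, not the hca implementation.

## hypothetical per-cell table standing in for colData(loom);
## `specimen_key` is an assumed column linking each cell to a specimen
cells <- dplyr::tibble(
    cell_name = c("cell_1", "cell_2", "cell_3"),
    specimen_key = c("specimen_A", "specimen_B", "specimen_A")
)

## hypothetical manifest rows, one per specimen
toy_manifest <- dplyr::tibble(
    specimen_key = c("specimen_A", "specimen_B"),
    specimen_from_organism.organ = c("lung", "blood")
)

## a left join keeps every cell and adds the specimen-level columns
annotated_cells <- dplyr::left_join(cells, toy_manifest, by = "specimen_key")
annotated_cells
```

A left join is used here so that no cell is dropped, even if a specimen
were missing from the manifest; such cells would simply carry `NA` in the
added columns.
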
[Optimus pipeline]: https://broadinstitute.github.io/warp/docs/Pipelines/Optimus_Pipeline/README

```{r}
annotated_loom <- optimus_loom_annotation(file_location)
annotated_loom

## new metadata
setdiff(
    names(metadata(annotated_loom)),
    names(metadata(loom))
)
metadata(annotated_loom)$manifest

## new colData columns
setdiff(
    names(colData(annotated_loom)),
    names(colData(loom))
)
```

# Session info

```{r sessionInfo}
sessionInfo()
```