--- title: "Using DelayedMatrix with MultiAssayExperiment" author: "MultiAssay Special Interest Group" date: "`r format(Sys.time(), '%B %d, %Y')`" vignette: > %\VignetteEngine{knitr::rmarkdown} %\VignetteIndexEntry{HDF5Array and MultiAssayExperiment} %\VignetteEncoding{UTF-8} output: BiocStyle::html_document: number_sections: yes toc: yes --- ```{r, echo=FALSE, warning=FALSE} suppressPackageStartupMessages({ library(MultiAssayExperiment) library(HDF5Array) library(SummarizedExperiment) }) ``` # Integrating an HDF5 backend for MultiAssayExperiment Dependencies ```{r, eval = FALSE} library(MultiAssayExperiment) library(HDF5Array) library(SummarizedExperiment) ``` ## HDF5Array and DelayedArray Constructor The `HDF5Array` package provides an on-disk representation of large datasets without the need to load them into memory. Convenient lazy evaluation operations allow the user to manipulate such large data files based on metadata. The `DelayedMatrix` class in the `DelayedArray` package provides a way to connect to a large matrix that is stored on disk. First, we create a small matrix for constructing the `DelayedMatrix` class. ```{r} smallMatrix <- matrix(rnorm(10e5), ncol = 20) ``` We add rownames and column names to the matrix object for compatibility with the `MultiAssayExperiment` representation. ```{r} rownames(smallMatrix) <- paste0("GENE", seq_len(nrow(smallMatrix))) colnames(smallMatrix) <- paste0("SampleID", seq_len(ncol(smallMatrix))) ``` Here we use the `DelayedArray` constructor function to create a `DelayedMatrix` object. ```{r} smallMatrix <- DelayedArray(smallMatrix) class(smallMatrix) head(smallMatrix) dim(smallMatrix) ``` ## Importing HDF5 files Note that a large matrix from an HDF5 file can also be loaded using the `HDF5Array` function. For example: ```{r} dataLocation <- system.file("extdata", "exMatrix.h5", package = "MultiAssayExperiment", mustWork = TRUE) h5ls(dataLocation) hdf5Data <- HDF5ArraySeed(file = dataLocation, name = "exMatrix") newDelayedMatrix <- DelayedArray(hdf5Data) class(newDelayedMatrix) head(newDelayedMatrix) ``` ### Dimnames from HDF5 file Currently, the `rhdf5` package does not store `dimnames` in the `h5` file by default. A request for this feature has been sent to the maintainer of the `rhdf5` package and any further development to the `HDF5Array` package is contingent on such lower level dimension name storage. ## Using a `DelayedMatrix` with `MultiAssayExperiment` A `DelayedMatrix` alone conforms to the `MultiAssayExperiment` API requirements. Shown below, the `DelayedMatrix` can be put into a named `list` and passed into the `MultiAssayExperiment` constructor function. ```{r} HDF5MAE <- MultiAssayExperiment(experiments = list(smallMatrix = smallMatrix)) sampleMap(HDF5MAE) colData(HDF5MAE) ``` ### `SummarizedExperiment` with `DelayedMatrix` backend A more information rich `DelayedMatrix` can be created when used in conjunction with the `SummarizedExperiment` class and it can even include `rowRanges`. The flexibility of the `MultiAssayExperiment` API supports classes with minimal requirements. Additionally, this `SummarizedExperiment` with the `DelayedMatrix` backend can be part of a bigger `MultiAssayExperiment` object. Below is a minimal example of how this would work: ```{r} HDF5SE <- SummarizedExperiment(assays = smallMatrix) assay(HDF5SE) MultiAssayExperiment(list(HDF5SE = HDF5SE)) ``` Additional scenarios are currently in development where an `HDF5Matrix` is hosted remotely. Many opportunities exist when considering on-disk and off-disk representations of data with `MultiAssayExperiment`.