---
title: "Applying a function over a SingleCellExperiment's contents"
author:
- name: Aaron Lun
  email: infinite.monkeys.with.keyboards@gmail.com
package: SingleCellExperiment
output:
  BiocStyle::html_document:
    toc_float: true
vignette: >
  %\VignetteIndexEntry{2. Applying over a SingleCellExperiment object}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r options, include=FALSE, echo=FALSE}
library(BiocStyle)
knitr::opts_chunk$set(warning=FALSE, error=FALSE, message=FALSE)
```

# Motivation

The `SingleCellExperiment` is quite a complex class that can hold multiple aspects of the same dataset.
It is possible to have multiple assays, multiple dimensionality reduction results, and multiple alternative Experiments - each of which can further have multiple assays and `reducedDims`!
In some scenarios, it may be desirable to loop over these pieces and apply the same function to each of them.
This is made conveniently possible via the `applySCE()` framework.

# Quick start

Let's say we have a moderately complicated `SingleCellExperiment` object, containing multiple alternative Experiments for different data modalities.

```{r}
library(SingleCellExperiment)
counts <- matrix(rpois(100, lambda = 10), ncol=10, nrow=10)
sce <- SingleCellExperiment(counts)
altExp(sce, "Spike") <- SingleCellExperiment(matrix(rpois(20, lambda = 5), ncol=10, nrow=2))
altExp(sce, "Protein") <- SingleCellExperiment(matrix(rpois(50, lambda = 100), ncol=10, nrow=5))
altExp(sce, "CRISPR") <- SingleCellExperiment(matrix(rbinom(80, size=1, prob=0.1), ncol=10, nrow=8))
sce
```

Assume that we want to compute the total count for each modality, using the first assay.
We might define a function that looks like the below.
(We will come back to the purpose of `multiplier=` and `subset.row=` later.)

```{r}
totalCount <- function(x, i=1, multiplier=1, subset.row=NULL) {
    mat <- assay(x, i)
    if (!is.null(subset.row)) {
        mat <- mat[subset.row,,drop=FALSE]
    }
    colSums(mat) * multiplier
}
```

We can then easily apply this function across the main and alternative Experiments with:

```{r}
totals <- applySCE(sce, FUN=totalCount)
totals
```

# Design explanation

The `applySCE()` call above is functionally equivalent to:

```{r}
totals.manual <- list(
    totalCount(sce),
    Spike=totalCount(altExp(sce, "Spike")),
    Protein=totalCount(altExp(sce, "Protein")),
    CRISPR=totalCount(altExp(sce, "CRISPR"))
)
stopifnot(identical(totals, totals.manual))
```

Besides being more verbose than `applySCE()`, this approach does not deal well with common arguments.
Say we wanted to set `multiplier=10` for all calls.
With the manual approach above, this would involve specifying the argument multiple times:

```{r}
totals10.manual <- list(
    totalCount(sce, multiplier=10),
    Spike=totalCount(altExp(sce, "Spike"), multiplier=10),
    Protein=totalCount(altExp(sce, "Protein"), multiplier=10),
    CRISPR=totalCount(altExp(sce, "CRISPR"), multiplier=10)
)
```

Whereas with the `applySCE()` approach, we can just set it once.
This makes it easier to change and reduces the possibility of errors when copy-pasting parameter lists across calls.

```{r}
totals10.apply <- applySCE(sce, FUN=totalCount, multiplier=10)
stopifnot(identical(totals10.apply, totals10.manual))
```

Now, one might consider just using `lapply()` in this case, which also avoids the need for repeated specification:

```{r}
totals10.lapply <- lapply(c(List(sce), altExps(sce)),
    FUN=totalCount, multiplier=10)
stopifnot(identical(totals10.apply, totals10.lapply))
```

However, this runs into the opposite problem - it is no longer possible to specify _custom_ arguments for each call.
For example, say we wanted to subset to a different set of features for each main and alternative Experiment.
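
To make that request concrete, a hand-written version might look like the sketch below.
It rebuilds a small object under hypothetical names (`demo`, `demoTotal`, `demo.custom`) so that the chunk runs on its own, and the `subset.row=` indices are arbitrary illustrations rather than anything prescribed by the package.

```{r}
library(SingleCellExperiment)

# Hypothetical stand-in for the object built in the quick start.
demo <- SingleCellExperiment(matrix(rpois(100, lambda=10), ncol=10, nrow=10))
altExp(demo, "Spike") <- SingleCellExperiment(matrix(rpois(20, lambda=5), ncol=10, nrow=2))
altExp(demo, "Protein") <- SingleCellExperiment(matrix(rpois(50, lambda=100), ncol=10, nrow=5))

# Same logic as totalCount(), repeated so that this chunk is self-contained.
demoTotal <- function(x, i=1, multiplier=1, subset.row=NULL) {
    mat <- assay(x, i)
    if (!is.null(subset.row)) {
        mat <- mat[subset.row,,drop=FALSE]
    }
    colSums(mat) * multiplier
}

# The common argument (multiplier=) has to be restated in every call,
# while subset.row= differs for each alternative Experiment.
demo.custom <- list(
    demoTotal(demo, multiplier=10),
    Spike=demoTotal(altExp(demo, "Spike"), multiplier=10, subset.row=2),
    Protein=demoTotal(altExp(demo, "Protein"), multiplier=10, subset.row=3:5)
)
demo.custom
```
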
With `applySCE()`, this is still possible:

```{r}
totals.custom <- applySCE(sce, FUN=totalCount, multiplier=10,
    ALT.ARGS=list(Spike=list(subset.row=2), Protein=list(subset.row=3:5)))
totals.custom
```

In cases where we have a mix between custom and common arguments, `applySCE()` provides a more convenient and flexible interface than manual calls or `lapply()`ing.

# Simplifying to a `SingleCellExperiment`

The other convenient aspect of `applySCE()` is that, if the specified `FUN=` returns a `SingleCellExperiment`, `applySCE()` will try to format the output as a `SingleCellExperiment`.
To demonstrate, let's use the `head()` function to take the first few features for each main and alternative Experiment:

```{r}
head.sce <- applySCE(sce, FUN=head, n=5)
head.sce
```

Rather than returning a list of `SingleCellExperiment`s, we can see that the output is neatly organized as a `SingleCellExperiment` with the specified `n=5` features.
Moreover, each of the alternative Experiments is also truncated to its first 5 features (or fewer, if there weren't that many to begin with).
This output mirrors, as much as possible, the format of the input `sce`, and is much more convenient to work with than a list of objects.

```{r}
altExp(head.sce)
altExp(head.sce, "Protein")
altExp(head.sce, "CRISPR")
```

To look under the hood, we can turn off simplification and see what happens.
We see that the function indeed returns a list of `SingleCellExperiment` objects corresponding to the `head()` of each Experiment.
When `SIMPLIFY=TRUE`, this list is passed through `simplifyToSCE()` to attempt the reorganization into a single object.

```{r}
head.sce.list <- applySCE(sce, FUN=head, n=5, SIMPLIFY=FALSE)
head.sce.list
```

For comparison, if we had to do this manually, it would be rather tedious and error-prone,
e.g., if we forgot to set `n=` or if we re-assigned the output of `head()` to the wrong alternative Experiment.

```{r}
manual.head <- head(sce, n=5)
altExp(manual.head, "Spike") <- head(altExp(sce, "Spike"), n=5)
altExp(manual.head, "Protein") <- head(altExp(sce, "Protein"), n=5)
altExp(manual.head, "CRISPR") <- head(altExp(sce, "CRISPR"), n=5)
manual.head
```

Of course, this simplification is only possible when circumstances permit.
It requires that `FUN=` returns a `SingleCellExperiment` at each call, and that no more than one result is generated for each alternative Experiment.
Failure to meet these conditions will result in a warning and a non-simplified output.

Developers may prefer to set `SIMPLIFY=FALSE` and manually call `simplifyToSCE()`, possibly with `warn.level=3` to trigger an explicit error when simplification fails.

# Session information {-}

```{r}
sessionInfo()
```