---
title: "spillR Vignette"
author: 
- name: Marco Guazzini
  affiliation: 
  - Department of Advanced Computing Sciences, Maastricht University, The Netherlands
- name: Alexander G. Reisach
  affiliation: 
  - Université Paris Cité, CNRS, MAP5, F-75006 Paris, France
- name: Sebastian Weichwald
  affiliation: 
  - Department of Mathematical Sciences, University of Copenhagen, Denmark
- name: Christof Seiler
  affiliation: 
  - Department of Advanced Computing Sciences, Maastricht University, The Netherlands
  - Mathematics Centre Maastricht, Maastricht University, The Netherlands
  - Center of Experimental Rheumatology, Department of Rheumatology, University Hospital Zurich, University of Zurich, Switzerland
date: "`r Sys.Date()`"
output: 
  BiocStyle::html_document
vignette: >
  %\VignetteIndexEntry{spillR Vignette}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: spillr_paper.bib
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
    echo = TRUE, warning = FALSE, message = FALSE,
    fig.retina = 2, dpi = 96,
    fig.width = 7.2916667, fig.asp = 0.6178571
)
```

# Introduction

Mass cytometry makes it possible to count a large number of proteins simultaneously on individual cells [@bandura2009mass; @bendall2011single]. Mass cytometry has less spillover (measurements from one channel overlapping with those of another) than flow cytometry [@sp-c; @novo2013generalized], but spillover is still a problem and affects downstream analyses such as differential testing [@diffcyt; @seiler2021cytoglmm] or dimensionality reduction [@scater]. Reducing spillover by careful design of experiment is possible [@takahashi2017mass], but a purely experimental approach may be neither sufficient nor efficient [@lun2017influence]. @catalyst propose a method for addressing spillover by conducting an experiment on beads. This experiment measures spillover by staining each bead with a single antibody. Their solution relies on an estimate of the spillover matrix using non-negative matrix factorization. The spillover matrix encodes the pairwise spillover proportion between channels. We avoid this step and directly describe the spillover channels and the channel with the true signal using a mixture of nonparametric distributions. Our main new assumption is that the spillover distribution, not just the spillover proportion, from the beads experiment carries over to the biological experiment. Here, we illustrate the `spillR` _R_ package for spillover compensation in mass cytometry.
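
To make the notion of a spillover matrix concrete, the following toy example uses made-up numbers for three hypothetical channels (`ch1`, `ch2`, `ch3`). It assumes the convention that rows are emitting channels and columns are receiving channels, and it compensates by plain matrix inversion. This is a simplified illustration of the classical linear model only; it is not the `CATALYST` procedure, and `spillR` does not use a spillover matrix at all.

```{r toy-spillover-matrix}
# hypothetical spillover matrix: row = emitting channel, column = receiving channel
channels <- c("ch1", "ch2", "ch3")
S <- matrix(
    c(
        1.00, 0.03, 0.00,
        0.00, 1.00, 0.02,
        0.01, 0.00, 1.00
    ),
    nrow = 3, byrow = TRUE, dimnames = list(channels, channels)
)

# true signal of one cell and the signal observed after spillover
x_true <- c(ch1 = 100, ch2 = 0, ch3 = 50)
y_obs <- drop(x_true %*% S)

# classical compensation inverts the linear model y = x S
x_comp <- drop(y_obs %*% solve(S))
rbind(true = x_true, observed = y_obs, compensated = x_comp)
```

In this toy setting the cell carries no true signal on `ch2`, yet a nonzero value is observed there because `ch1` spills into it.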

Our motivation to submit to Bioconductor is to make our package available to a large user base and to ensure its compatibility with other packages that address preprocessing and analysis of mass cytometry data.

# Installation

Install `spillR` from Bioconductor using `BiocManager`.

```{r install-package, eval=FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("spillR")
```

# Data

We test our method on one of the example datasets in the `CATALYST` package. The dataset consists of an experiment with real cells and a corresponding bead experiment. The experiment on real cells has 5,000 peripheral blood mononuclear cells from healthy donors measured on 39 channels. The bead experiment has 10,000 beads measured on 36 channels, with each bead stained with a single antibody. The number of beads per metal label ranges from 112 to 241.

We compare the two methods on the same marker as in Figure 3B of the original `CATALYST` paper [@catalyst]. In the original experiment, they conjugated three proteins (CD3, CD8, and HLA-DR) with two different metal labels. Here is the `CATALYST` code to load the data into a `SingleCellExperiment` (SCE) object.

```{r load-data, warning=FALSE}
library(spillR)
library(CATALYST)
library(dplyr)
library(ggplot2)
library(cowplot)

# --------- experiment with beads ---------
data(ss_exp, package = "CATALYST")
bc_key <- c(139, 141:156, 158:176)
sce_bead <- prepData(ss_exp)
sce_bead <- assignPrelim(sce_bead, bc_key, verbose = FALSE)
sce_bead <- applyCutoffs(estCutoffs(sce_bead))
sce_bead <- computeSpillmat(sce_bead)

# --------- experiment with real cells ---------
data(mp_cells, package = "CATALYST")
sce <- prepData(mp_cells)
```
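
As a quick sanity check on the sizes quoted above, the loaded objects can be inspected directly. This is a minimal sketch that assumes `assignPrelim` stored the barcode assignment of each bead in the `bc_id` column of `colData(sce_bead)`; events that could not be assigned, if any, appear under their own label.

```{r inspect-data}
# dimensions are channels x events
dim(sce)
dim(sce_bead)

# number of beads assigned to each metal label
table(sce_bead$bc_id)
```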

# Compensation Workflow

The function `compCytof` takes as inputs two `SingleCellExperiment` (SCE) objects: one contains the real cells experiment and the other the beads experiment. It also requires a table `marker_to_barc` that maps each channel to the barcode used in the beads experiment. The output is the SCE of the real cells experiment with two added assays: the compensated counts (`compcounts`) and the `asinh`-transformed compensated counts (`compexprs`).

```{r compensation-workflow, warning=FALSE}
# --------- table for mapping markers and barcode ---------
marker_to_barc <-
    rowData(sce_bead)[, c("channel_name", "is_bc")] |>
    as_tibble() |>
    dplyr::filter(is_bc == TRUE) |>
    mutate(barcode = bc_key) |>
    select(marker = channel_name, barcode)

# --------- compensate function from spillR package ---------
sce_spillr <-
    spillR::compCytof(sce, sce_bead, marker_to_barc, impute_value = NA, 
                      overwrite = FALSE)

# --------- 2d histogram from CATALYST package -------
as <- c("counts", "exprs", "compcounts", "compexprs")
chs <- c("Yb171Di", "Yb173Di")
ps <- lapply(as, function(a) plotScatter(sce_spillr, chs, assay = a))
plot_grid(plotlist = ps, nrow = 2)
```
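
Beyond the scatter plots, the raw and compensated counts of a single channel can be compared numerically. The sketch below looks up `Yb173Di` through `rowData`, since the rows of the SCE are named by marker and not by channel. It assumes that, with `impute_value = NA`, counts attributed to spillover are set to `NA` in the `compcounts` assay; the object names `i`, `raw`, and `comp` are only used for this illustration.

```{r compare-counts}
# find the row that corresponds to the channel of interest
i <- which(rowData(sce_spillr)$channel_name == "Yb173Di")
raw <- assay(sce_spillr, "counts")[i, ]
comp <- assay(sce_spillr, "compcounts")[i, ]

# number of cells and number of counts flagged as missing after compensation
c(n_cells = length(raw), n_flagged = sum(is.na(comp)))

# distribution of counts before and after compensation
summary(raw)
summary(comp)
```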

# Diagnostic Plot

`spillR` can visualize the compensation results and the internal spillover estimates. The function `plotDiagnostics` returns two plots: the frequency polygons before and after spillover compensation, and the density plot of the spillover markers with our estimate of the spillover probability function as a black dashed curve. These plots allow us to check the compensation performed by our method. If the black curve captures all the spillover markers, this indicates a reliable spillover estimation. If the distribution of the target marker in the beads experiment overlaps with that in the real cells, this indicates a high-quality bead experiment.

```{r plotting, warning=FALSE}
ps <- spillR::plotDiagnostics(sce_spillr, "Yb173Di")
x_lim <- c(0, 7)
plot_grid(ps[[1]] + xlim(x_lim),
    ps[[2]] + xlim(x_lim),
    ncol = 1, align = "v"
)
```
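
To build intuition for what a spillover probability curve expresses, here is a toy simulation in base R. All counts are simulated, and the mixing weight `p_spill` is assumed to be known, which is not the case in practice. The sketch applies Bayes' rule to two empirical probability mass functions; it is an illustration of the mixture idea under these assumptions and not the estimation procedure implemented in `spillR`.

```{r toy-mixture}
set.seed(1)

# simulated counts (all values are made up for illustration)
bead_spill <- rpois(2000, lambda = 3) # spillover seen in the bead experiment
true_signal <- rpois(3000, lambda = 40) # genuine signal on the target channel
cells <- c(true_signal, rpois(1000, lambda = 3)) # real cells: signal plus spillover
p_spill <- 0.25 # assumed known share of spillover counts

# empirical probability mass functions on a common grid of counts
grid <- 0:max(cells, bead_spill)
pmf <- function(x) tabulate(x + 1, nbins = length(grid)) / length(x)
f_spill <- pmf(bead_spill)
f_cells <- pmf(cells)

# Bayes' rule: P(spillover | count) = p_spill * f_spill / f_cells
post <- ifelse(f_cells > 0, pmin(1, p_spill * f_spill / f_cells), NA)
round(post[grid %in% c(0, 3, 10, 40)], 2)
```

Low counts, where the bead distribution has most of its mass, receive a high spillover probability, while counts far above the bead distribution receive a probability of zero.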

# Session Info {-}

```{r session_info}
sessionInfo()
```

# References {-}