--- title: 'Interoptability between MAST and SingleCellExperiment-derived packages.' author: Andrew McDavid date: "`r Sys.Date()`" package: MAST output: BiocStyle::html_document: toc_float: true vignette: > %\VignetteIndexEntry{Interoptability between MAST and SingleCellExperiment-derived packages} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- # Introduction As a SingleCellExperiment-derived package, `MAST` can easily be inserted into workflows with packages such as `scran`, `scater`, `zinbwave`, `SCnorm` and others. The main gotcha is packages that assume integer counts vs log-transformed, or log-transformed, approximately scale-normalized data. We find that MAST performs best with log-transformed, scale-normalized data that has been thresholded, such as $\log_2(\text{transcripts per million} + 1)$. We address this by: - testing for log-like data for objects constructed in MAST - explicitly naming the slot of the `assay` containing such putatively log-like data - by default operating on the slot with such log-like data In objects that were constructed in other packages, we ... In what follows, we show an example of using `scater` to plot some QC metrics, `SCnorm` to normalize data, and, and conversion to a `Seurat` object. # From MAST to Scater Scater (citation) is a package that ... ```{r, results = 'hide'} library(MAST) knitr::opts_chunk$set(message = FALSE,error = FALSE,warning = FALSE) data(maits, package='MAST') unlog <- function(x) ceiling(2^x - 1) sca_raw = FromMatrix(t(maits$expressionmat), maits$cdat, maits$fdat) assays(sca_raw)$counts = unlog(assay(sca_raw)) assayNames(sca_raw) ``` Here we make an object with assays `counts` and `et`. By default, `MAST` will operate on the `et` assay, but scran wants count-like data for some of its QC. The `et` data are log2 + 1 transcripts per million (TPM), as output by RSEM. We could specify the assay name at creation with `sca_raw = FromMatrix(list(logTPM = t(maits$expressionmat)), maits$cdat, maits$fdat)` or rename an object that contains appropriately transformed data with `assayNames(sca_raw) = c('logTPM', 'counts')`. Before calling `scater` functionality, you might pause to consider if some features should belong in special `control` sets, such as mitochrondial genes, or spike-ins. ```{r scaterQC,results='hide'} library(scater) sca_raw = calculateQCMetrics(sca_raw) plotRowData(sca_raw, x = 'log10_mean_counts', 'pct_dropout_by_counts') plotColData(sca_raw, y="total_features_by_counts", x="total_counts") ``` Evidently some features were filtered, so not all cells contain 1 million counts. We can tell these were rare features based on the inverse relationship between `total_counts` and `total_features_by_counts`: the most complex libraries (with the greatest numer of features) are missing the most counts. ```{r} sca_raw <- runPCA(sca_raw, ncomponents=5, exprs_values = 'et') plotReducedDim(sca_raw, use_dimred = 'PCA', colour_by = 'condition') ``` We can also run a PCA. ## From scater to MAST ```{r} data(sc_example_counts) ``` # WIP: From MAST to ZINB-wave