---
title: "The `waddR` package"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{waddR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Introduction

The `waddR` package offers statistical tests based on the 2-Wasserstein distance for detecting and characterizing differences between two distributions given in the form of samples. Functions for calculating the 2-Wasserstein distance and testing for differential distributions are provided, as well as a specifically tailored test for differential expression in single-cell RNA sequencing data.

`waddR` provides tools to address the following tasks, each described in a separate vignette:

* [Calculation of the 2-Wasserstein distance](wasserstein_metric.html),
* [Two-sample tests](wasserstein_test.html) to check for differences between two distributions,
* Detection of [differential gene expression distributions](wasserstein_singlecell.html) in single-cell RNA sequencing (scRNAseq) data.

These are bundled into one package because they build on one another: the procedure for detecting differential distributions in scRNAseq data is an adaptation of the general two-sample test, which itself uses the 2-Wasserstein distance to compare two distributions.

### 2-Wasserstein distance functions

The 2-Wasserstein distance is a metric that describes the distance between two distributions, representing, e.g., two different conditions $A$ and $B$. The `waddR` package specifically considers the squared 2-Wasserstein distance, which can be decomposed into location, size, and shape terms, thus providing a characterization of potential differences.

The `waddR` package offers three functions to calculate the (squared) 2-Wasserstein distance, which are implemented in C++ and exported to R with Rcpp for faster computation.
The function `wasserstein_metric` is a C++ reimplementation of the `wasserstein1d` function from the R package `transport`. The functions `squared_wass_approx` and `squared_wass_decomp` compute approximations of the squared 2-Wasserstein distance, with `squared_wass_decomp` also returning the decomposition terms for location, size, and shape.

See `?wasserstein_metric`, `?squared_wass_approx`, and `?squared_wass_decomp` for more details.

### Testing for differences between two distributions

The `waddR` package provides two testing procedures using the 2-Wasserstein distance to test whether two distributions $F_A$ and $F_B$ given in the form of samples are different, by testing the null hypothesis $H_0: F_A = F_B$ against the alternative hypothesis $H_1: F_A \neq F_B$.

The first, semi-parametric (SP), procedure uses a permutation-based test combined with a generalized Pareto distribution approximation to estimate small p-values accurately.

The second procedure uses a test based on asymptotic theory (ASY), which is valid only if the samples can be assumed to come from continuous distributions.

See `?wasserstein.test` for more details.

### Testing for differences between two distributions in the context of scRNAseq data

The `waddR` package provides an adaptation of the semi-parametric testing procedure based on the 2-Wasserstein distance which is specifically tailored to identify differential distributions in scRNAseq data. In particular, a two-stage (TS) approach is implemented that takes account of the specific nature of scRNAseq data by separately testing for differential proportions of zero gene expression (using a logistic regression model) and differences in non-zero gene expression (using the semi-parametric 2-Wasserstein distance-based test) between two conditions.

See `?wasserstein.sc` and `?testZeroes` for more details.

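
The two-stage idea described above can be illustrated with a minimal base-R sketch. This is not the `waddR` implementation: the toy data, the helper `sq_wass`, the plain `glm()` likelihood-ratio comparison, and the simple permutation p-value (without the generalized Pareto approximation) are all assumptions made for illustration only.

```{r two-stage-sketch}
set.seed(1)

# Hypothetical zero-inflated expression values of one gene in two
# conditions (simulated toy data, not real measurements).
expr_a <- c(rep(0, 30), rexp(70, rate = 1))
expr_b <- c(rep(0, 50), rexp(50, rate = 0.5))

# Illustrative helper (not part of waddR): quantile-based approximation
# of the squared 2-Wasserstein distance between two samples.
sq_wass <- function(x, y, n_q = 1000) {
  u <- (seq_len(n_q) - 0.5) / n_q
  mean((quantile(x, u, names = FALSE) - quantile(y, u, names = FALSE))^2)
}

# Stage 1: compare the proportions of zero expression between conditions
# with a logistic regression and a likelihood-ratio test.
zero <- as.integer(c(expr_a, expr_b) == 0)
condition <- factor(rep(c("A", "B"), c(length(expr_a), length(expr_b))))
fit_null <- glm(zero ~ 1, family = binomial())
fit_full <- glm(zero ~ condition, family = binomial())
p_zero <- anova(fit_null, fit_full, test = "Chisq")[2, "Pr(>Chi)"]

# Stage 2: permutation test on the non-zero expression values, using the
# squared 2-Wasserstein distance as the test statistic.
nz_a <- expr_a[expr_a > 0]
nz_b <- expr_b[expr_b > 0]
observed <- sq_wass(nz_a, nz_b)
pooled <- c(nz_a, nz_b)
perm_stats <- replicate(1000, {
  idx <- sample(length(pooled), length(nz_a))
  sq_wass(pooled[idx], pooled[-idx])
})
p_nonzero <- (1 + sum(perm_stats >= observed)) / (1 + length(perm_stats))

c(p_zero = p_zero, p_nonzero = p_nonzero)
```

For actual analyses, `wasserstein.sc` and `testZeroes` should be used instead; the sketch only shows how the zero and non-zero parts of the data are tested separately.
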
## Installation

To install `waddR` from Bioconductor, use `BiocManager` with the following commands:

```{r install, eval=FALSE, echo=TRUE}
# Install BiocManager first if it is not yet available
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("waddR")
```

Using `BiocManager`, the package can also be installed directly from GitHub:

```{r install-github, eval=FALSE, echo=TRUE}
BiocManager::install("goncalves-lab/waddR")
```

The package `waddR` can then be loaded in R:

```{r load-package}
library("waddR")
```

## Session info

```{r session-info}
sessionInfo()
```