---
title: Meta-analyses on p-values of various differential tests
author:
- name: Aaron Lun
  email: infinite.monkeys.with.keyboards@gmail.com
date: "Revised: 1 November 2020"
output:
  BiocStyle::html_document:
    toc_float: true
package: metapod
vignette: >
  %\VignetteIndexEntry{Meta-analysis strategies}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, echo=FALSE, results="hide", message=FALSE}
require(knitr)
opts_chunk$set(error=FALSE, message=FALSE, warning=FALSE)
library(BiocStyle)
```

# Overview

The `r Biocpkg("metapod")` package provides utilities for meta-analyses on precomputed $p$-values, typically generated from differential analyses of some sort.
This enables users to consolidate multiple related tests into a single metric for easier interpretation and prioritization of interesting events.
The functionality here was originally developed as part of the `r Biocpkg("csaw")` package, to combine inferences from adjacent windows inside a genomic region, and as part of the `r Biocpkg("scran")` package, to merge statistics from multiple batches or from multiple pairwise comparisons during marker gene detection.
Most functions in this package involve combining multiple (independent or not) $p$-values into a single combined $p$-value.
Additional functions are also provided to summarize the direction of any effect.

# Meta-analyses on $p$-values

`r Biocpkg("metapod")` provides two families of functions to combine a group of $p$-values into a single statistic.
The first family considers parallel vectors of $p$-values where each group of $p$-values is defined from corresponding elements across vectors.
This is useful for combining different tests on the same set of features, e.g., DE analyses with the same gene annotation.
We illustrate this below by using Simes' method to combine two parallel vectors:

```{r}
library(metapod)
p1 <- rbeta(10000, 0.5, 1)
p2 <- rbeta(10000, 0.5, 1)
par.combined <- parallelSimes(list(p1, p2))
str(par.combined)
```

The other approach is to consider groups of $p$-values within a single numeric vector.
This is useful when the $p$-values to be combined are generated as part of a single analysis.
The motivating example is that of adjacent windows in the same genomic region for ChIP-seq, but one can also imagine the same situation for other features, e.g., exons in RNA-seq.
In this situation, we need a grouping factor to define the groups within our vector:

```{r}
p <- rbeta(10000, 0.5, 1)
g <- sample(100, length(p), replace=TRUE)
gr.combined <- groupedSimes(p, g)
str(gr.combined)
```

Both families of functions return the combined $p$-value along with some other helpful statistics.
The `representative` vector specifies the index of the representative test for each group; this is a test that is particularly important to the calculation of the combined $p$-value and can be used to, e.g., obtain a representative effect size.
Similarly, the `influential` vector indicates which of the individual tests in each group are "influential" in computing the combined $p$-value.
This is defined as the subset of tests that, if their $p$-values were set to unity, would alter the combined $p$-value.
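
As a quick sketch of how these outputs might be used, the chunk below extracts a representative effect size for each parallel test, under the assumption that `representative` reports the index of the originating vector for each element.
The `eff1` and `eff2` vectors are simulated stand-ins for real effect sizes such as log-fold changes; check `?parallelSimes` for the exact format of the returned statistics.

```{r}
# Hypothetical effect sizes, standing in for log-fold changes from each analysis.
eff1 <- rnorm(10000)
eff2 <- rnorm(10000)

# Picking the effect size of the representative test for each element,
# assuming 'representative' holds the index (1 or 2) of the chosen vector.
all.effects <- cbind(eff1, eff2)
rep.effects <- all.effects[cbind(seq_along(p1), par.combined$representative)]
str(rep.effects)
```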
# $p$-value combining strategies

The following methods yield a significant result for groups where **any** of the individual tests are significant:

- Simes' method (`parallelSimes()` and `groupedSimes()`) is robust to dependencies between tests and supports weighting of individual tests.
- Fisher's method (`parallelFisher()` and `groupedFisher()`) assumes that tests are independent and does not support weighting.
- Pearson's method (`parallelPearson()` and `groupedPearson()`) assumes that tests are independent and does not support weighting.
- Stouffer's Z-score method (`parallelStouffer()` and `groupedStouffer()`) assumes that tests are independent and supports weighting.

Fisher's method is most sensitive to the smallest $p$-value while Pearson's method is most sensitive to the largest $p$-value.
Stouffer's method serves as a compromise between these two extremes.
Simes' method is more conservative than all three but remains valid in the presence of dependencies.

The following methods yield a significant result for groups where **some** of the individual tests are significant (i.e., a minimum number or proportion of individual tests reject the null hypothesis):

- Wilkinson's method (`parallelWilkinson()` and `groupedWilkinson()`) assumes that tests are independent and does not support weighting.
- The minimum Holm approach (`parallelHolmMin()` and `groupedHolmMin()`) does not require independence and supports weighting.

Technically, Wilkinson's method can yield a significant result when any of the individual tests are significant.
However, it exhibits a dramatic increase in detection power when the specified minimum is attained, hence its inclusion in this category.

Finally, Berger's intersection-union test will yield a significant result for groups where **all** of the individual tests are significant.
This does not require independence but does not support weighting.

# Summarizing the direction

The `summarizeParallelDirection()` and `summarizeGroupedDirection()` functions will summarize the direction of effect of all influential tests for each group.
For example, if all influential tests have positive log-fold changes, the group would be reported as having a direction of `"up"`.
Alternatively, if influential tests have both positive and negative log-fold changes, the reported direction would be `"mixed"`.
By only considering the influential tests, we avoid noise from "less significant" tests that do not contribute to the final $p$-value.

```{r}
logfc1 <- rnorm(10000)
logfc2 <- rnorm(10000)
par.dir <- summarizeParallelDirection(list(logfc1, logfc2),
    influential = par.combined$influential)
table(par.dir)
```

`countParallelDirection()` and `countGroupedDirection()` will simply count the number of significant tests changing in each direction.
This is done after correcting for the number of tests in each group with the Benjamini-Hochberg or Holm methods.
These counts are somewhat simpler to interpret than the summarized direction but have their own caveats; see `?countParallelDirection` for details.

```{r}
par.dir2 <- countParallelDirection(list(p1, p2), list(logfc1, logfc2))
str(par.dir2)
```

Another approach is to simply use the direction of effect of the representative test.
This is usually the best approach if a single direction and effect size is required, e.g., for visualization purposes.

# Further comments

All functions support `log.p=TRUE` to accept log-transformed $p$-values as input and to return log-transformed output.
This is helpful in situations where underflow would otherwise produce $p$-values of zero.

Several functions support weighting of the individual tests within each group.
This is helpful if some of the tests are _a priori_ known to be more important than others.
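
To sketch out how these two options might look in practice, the chunk below reruns the parallel Simes combination with unequal weights and with log-transformed inputs.
This assumes that the `weights=` argument accepts one weight per vector of $p$-values and that the interface matches the description above; consult `?parallelSimes` for the authoritative details.

```{r}
# Upweighting the first analysis relative to the second, assuming 'weights='
# takes one weight per vector in the input list.
weighted <- parallelSimes(list(p1, p2), weights=c(2, 1))
str(weighted$p.value)

# Supplying log-transformed p-values to avoid underflow; the combined
# p-values are then also returned on the log scale.
log.combined <- parallelSimes(list(log(p1), log(p2)), log.p=TRUE)
str(log.combined$p.value)
```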
The `combineParallelPValues()` and `combineGroupedPValues()` wrapper functions will dispatch to the requested `method`, making it easier to switch between methods inside other packages:

```{r}
gr.combined2 <- combineGroupedPValues(p, g, method="holm-min")
str(gr.combined2)
```

# Session information {-}

```{r}
sessionInfo()
```