---
title: Managing multiple testing challenges with sequential inference using the onlineFDR
package
author: "David S. Robertson and Natasha A. Karp"
date: "2018-12-10"
output:
html_document:
toc: yes
pdf_document:
toc: yes
vignette: >
%\VignetteIndexEntry{Using the onlineFDR package}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
---
## Introduction
Multiple hypothesis testing is a fundamental problem in statistical inference.
The failure to manage the multiple testing problem has been highlighted as one
of the elements contributing to the replicability crisis in science (Ioannidis
2015). Methodologies have been developed for a family of hypotheses to adjust
the significance levels to manage the multiple testing situation by controlling
error metrics such as the familywise error rate (FWER) or the false discovery
rate (FDR).
Frequently, modern data analysis problems have a further complexity that the
hypothesis arrive sequentially in a stream. This introduces the challenge that
at each step the investigator must decide whether to reject the current null
hypothesis without having access to the future p-values or the total number of
hypothesis to be tested, but does have knowledge of the historic decisions to
date. The International Mouse Phenotyping Consortium (Koscielny *et al*, 2013),
provides a concrete example of such a scenario. Here the dataset is constantly
growing as new knockout mice lines are generated and phenotyping data uploaded
to a database.
Javanmard and Montanari proposed two procedures, LOND and LORD, to control
FDR in an online manner (Javanmard and Montanari (2015, 2018)), with the latter
extended by Ramdas et al. (2017). The LOND procedure sets the significance level
$\alpha_i$ based on the number of discoveries made so far, while LORD sets
them according to the time of the most recent discovery. This package,
onlineFDR, implements these procedures and provides wrapper functions to apply
them to a historic dataset or a growing dataset. As a comparison, we have also
provided a wrapper function for implementation of a method based on the
Bonferroni procedure adapted to the online scenario. This vignette explains the
use of the package and demonstrates a typical workflow.
---
## Overview of the process
1. A dataset with three columns (an identifier ('id'), date ('date') and
p-value ('pval')) is passed to one of the onlineFDR wrapper functions.
2. The function orders the information by date. If there are multiple p-values
with the same date (i.e. the same batch), the order of the p-values within each
batch is randomised by default. In order for the randomisation of the p-values
to be reproducible, it is necessary to set a seed (via the `set.seed`
function) before calling the wrapper function (see also step 6).
3. For each hypothesis test, an adjusted significance threshold (`alphai`) is
calculated, which gives the threshold at which the corresponding p-value would
be declared significant.
4. Using the p-values provided and the alphai, an indicator of discoveries (`R`)
is calculated, where `R[i] = 1` corresponds to hypothesis i being rejected
(and `R[i] = 0` otherwise).
5. A dataframe is returned, reordered by batch, with the original data and the
newly calculated `alphai` and `R`.
6. For simplicity, as the dataset grows the new larger dataset should be passed
to the wrapper function and the values recalculated repeating steps 1 to 5. In
order for the randomisation of the data within the previous batches to remain
the same (and hence to allow for reproducibility of the results),
*the same seed should be used for all analyses*.
### Outline of the four functions available
1: LOND
Implements the LOND procedure for online FDR control, where LOND stands for
(significance) Levels based On Number of Discoveries, as presented by Javanmard
and Montanari (2015). The procedure controls the FDR for independent p-values,
with an option (dep = TRUE) which guarantees control for dependent p-values.
2: LORD
Implements the LORD procedure for online FDR control where LORD stands for
(significance) Levels based On Recent Discovery, as presented by Javanmard
and Montanari (2018) and Ramdas et al. (2017). The function provides different
versions of the procedure for independent p-values, see
'Theory behind onlineFDR'.
3: LORDdep
Implements the LORD procedure for online FDR control for dependent p-values,
where LORD stands for (significance) Levels based On Recent Discovery,
as presented by Javanmard and Montanari (2018).
4: BonfInfinite
Implements online FDR control using a Bonferroni-like test.
### Quick start
Here we show the steps to achieve online FDR control of a growing dataset. First
you load a dataframe with the three columns: an identifier ('id'), date ('date')
and p-value ('pval'), and then call the wrapper function of interest. In order
for the results to be reproducible, we also set a seed using the `set.seed`
function.
```{r}
library(onlineFDR)
sample.df <- data.frame(
id = c('A15432', 'B90969', 'C18705', 'B49731', 'E99902',
'C38292', 'A30619', 'D46627', 'E29198', 'A41418',
'D51456', 'C88669', 'E03673', 'A63155', 'B66033'),
date = as.Date(c(rep("2014-12-01",3),
rep("2015-09-21",5),
rep("2016-05-19",2),
"2016-11-12",
rep("2017-03-27",4))),
pval = c(2.90e-14, 0.06743, 0.01514, 0.08174, 0.00171,
3.61e-05, 0.79149, 0.27201, 0.28295, 7.59e-08,
0.69274, 0.30443, 0.000487, 0.72342, 0.54757))
set.seed(1)
results <- LORD(sample.df)
results
```
### Input data
A dataset with three columns (an identifier ('id'), date ('date') and p-value
('pval')). All p values generated should be passed to the function (and not
just the significant p-values). An exception to this would be if you have
implemented an orthogonal filter to reduce the dataset size, such as discussed
in (Burgon *et al.*, 2010).
### Understanding the output
For each hypothesis test, the functions calculate the adjusted significance
thresholds (`alphai`) at which the corresponding p-value would be declared
statistically significant.
Also calculated is an indicator function of discoveries (`R`), where `R[i] = 1`
corresponds to hypothesis i being rejected, otherwise `R[i] = 0`.
A dataframe is returned, reordered by batch, with the original data and the
newly calculated `alphai` and `R`.
---
## How to get help for onlineFDR
All questions regarding onlineFDR should be posted to the
**Bioconductor support site**, which serves as a searchable knowledge base of
questions and answers:
https://support.bioconductor.org
Posting a question and tagging with "onlineFDR" will automatically send an alert
to the package authors to respond on the support site.
---
## Theory behind onlineFDR
### Online hypothesis testing
Consider a sequence of hypotheses $H_1, H_2, H_3, \ldots$ that arrive
sequentially in a stream, with corresponding $p$-values
$(p_1, p_2, p_3, \ldots)$. A testing procedure provides a sequence of adjusted
significance thresholds $\alpha_i$, with corresponding decision rule:
\[ R_i = \begin{cases}
1 & \text{if } p_i \leq \alpha_i & (\text{reject } H_i)\\
0 & \text{otherwise} & (\text{accept } H_i)
\end{cases} \]
In *online* testing, the significance thresholds can only be functions of
the prior decisions, i.e. $\alpha_i = \alpha_i(R_1, R_2, \ldots, R_{i-1})$.
Javanmard and Montanari (2015, 2018) proposed two procedures for online control.
The first is LOND, which stands for (significance) Levels based On Number of
Discoveries. The second is LORD, which stands for (significance) Levels based
On Recent Discovery. LORD was subsequently extended by Ramdas et al. (2017).
### LOND procedure {#LOND}
The LOND procedure controls the FDR for independent $p$-values. Given an overall
significance level $\alpha$, we choose any infinite sequence of non-negative
numbers $\beta = (\beta_i)_{i \in \mathbb{N}}$ such that they sum to $\alpha$.
The values of the adjusted significance thresholds $\alpha_i$ are chosen as
follows:
\[ \alpha_i = \beta_i (D(i-1) + 1) \]
where $D(n) = \sum_{i=1}^n R_i$ denotes the number of discoveries
(i.e. rejections) in the first $n$ hypotheses tested.
LOND can be adjusted to also control FDR under *dependent* $p$-values. To do so,
it is modified with $\tilde{\beta}_i = \beta_i/H(i)$ in place of $\beta_i$,
where $H(i) = \sum_{j=1}^i \frac{1}{j}$ is the $i$-th harmonic number.
Note that this leads to a substantial loss in power compared to the
unadjusted LOND procedure. The correction factor is similat to the
classical one used by Benjamini and Yekutieli (2001), except that in this case
the $i$-th hypothesis among $N$ is penalised by a factor of $H(i)$ to give
consistent results across time (as compared to a factor $H(N)$ for the Benjamini
and Yekutieli method).
The default sequence of $\beta$ is given by
\[\beta_j = C \alpha \frac{\log(\max(j, 2))}{j e^{\sqrt{\log j}}}\] where
$C \approx 0.07720838$, as proposed by Javanmard and Montanari (2018)
equation 31.
### LORD procedures {#LORD}
The LORD procedure controls the FDR for independent $p$-values. We first fix a
sequence of non-negative numbers $\gamma = (\gamma_i)_{i \in \mathbb{N}}$ such
that $\gamma_i \geq \gamma_j$ for
$i \leq j$ and $\sum_{i=1}^{\infty} \gamma_i = 1$. At each time $i$, let
$\tau_i$ be the last time a discovery was made before $i$: \[
\tau_i = \max \left\{ l \in \{1, \ldots, i-1\} : R_l = 1\right\}
\] Also let $T(i)$ be the set of discovery times up to time $i$:
\[ T(i) = \left\{ l \in \{1, \ldots, i-1 \} : R_l = 1 \right\}\]
LORD depends on constants $w_0$ and $b_0$, where $w_0 \geq 0$ represents the
initial 'wealth' of the procedure and $b_0 > 0$ represents the 'payout' for
rejecting a hypothesis. We require $w_0+b_0 \leq \alpha$ for FDR control to
hold.
Javanmard and Montanari (2018) presented three different versions of LORD, which
have different definitions of the adjusted signifcance thresholds $\alpha_i$.
* **LORD 1**: The significance thresholds are based on the time of the last
discovery, with
\[
\alpha_i = \begin{cases}
\gamma_i w_0 & \text{ if } i \leq t_1 \\
\gamma_{i - \tau_i} b_0 & \text{ if } i > t_1
\end{cases}
\] where $t_1$ denotes the time of the first discovery.
* **LORD 2**: The significance thresholds are based on all previous discovery
times, with
\[ \alpha_i = \gamma_i w_0 + \left(\sum_{l \in T(i)} \gamma_{i-l} \right) b_0
\]
* **LORD 3**: The significance thresholds depend on the time of the last
discovery time and the wealth accumulated at that time, with
\[
\alpha_i = \gamma_{i - \tau_i} W(\tau_i)
\]
where $\tau_1 = 0$. Here $\{W(j)\}_{j \geq 0}$ represents the 'wealth' available
at time $j$, and is defined recursively: \[
\begin{align}
W(0) & = w_0 \nonumber \\
W(j) & = W(j-1) - \alpha_{j-1} + b_0 R_j
\end{align}
\]
where $w_0$ represents the initial 'wealth', and $b_0$ represents the amount
earned for rejecting a hypothesis.
* **LORD++**: Ramdas et al. (2017) derived a modified version of LORD 2, called
LORD++, which has at least as high power. The significance thresholds for LORD++
are chosen as follows: \[
\alpha_i = \gamma_i w_0 + (\alpha - w_0) \gamma_{i-\tau_1} +
\alpha \sum_{j : \tau_j < i, \tau_j \neq \tau_1} \gamma_{i - \tau_j}
\] Hence the significance thresholds for LORD++ will always be equal or greater
than those for LORD 2.
LORD 1 and LORD 2 are instances of monotone generalised alpha investing rules,
and hence provably control the FDR for independent p-values provided
$w_0 + b_0 \leq \alpha$. LORD++ also provable controls the FDR for independent
p-values provided $w_0 \leq \alpha$. In Javanmard and
Montanari (2018), the authors demonstrate empirically that LORD 3
controls the FDR.
Note that LORD 1 will always have a lower power than LORD 2 (and hence LORD++)
as well as LORD 3, and is included for completeness. In many scenarios with
large $N$, LORD 3 will have a slightly higher power than LORD++
(see Robertson et al., 2018). Hence we would recommend using LORD 3
(the default) unless there is a need for a method with a provable guarantee of
FDR control in which case we would recommend using LORD++.
In all versions, the default sequence of $\gamma$ is
given by \[\gamma_j = C \frac{\log(\max(j, 2))}{j e^{\sqrt{\log j}}}\]
where $C \approx 0.07720838$, as proposed by Javanmard and Montanari (2018)
equation 31.
#### LORDdep {#LORDdep}
Javanmard and Montanari (2018) also presented an adjusted version of LORD that
is valid for *dependent* p-values. Similarly to LORD 3, the adjusted
significance thresholds are set equal to \[ \alpha_i = \xi_i W(\tau_i)\] where
(assuming $w_0 \leq b_0$),
$\sum_{j=1}^{\infty} \xi_i (1 + \log(j)) \leq \alpha / b_0$
The default sequence of $\xi$ is given by
\[ \xi_j = \frac{C \alpha }{b_0 j \log(\max(j, 2))^3}\]
where $C \approx 0.139307$.
Note that allowing for dependent p-values can lead to a substantial loss in
power compared with the LORD procedures described above.
### Bonferroni-like procedure (BonfInfinite) {#BonfInfinite}
The procedure controls the FDR (as well as the FWER) for a potentially infinite
stream of p-values using a Bonferroni-like test. Given an overall significance
level $\alpha$, the significance thresholds are chosen as
\[\alpha_i = \alpha \gamma_i\]
where $\sum_{i=1}^{\infty} \gamma_i = 1$ and $\gamma_i \geq 0$. Note that the
procedure is also valid for dependent p-values.
The default sequence of $\gamma$ is the same as that for $\xi$ for LORD given
[here](#LORD_gamma).
### Accounting for dependent p-values
As noted above, the LOND procedure and LORD procedures (1, 2 and 3) assumes
*independent* p-values, with adjusted versions of LOND and LORD available for
*dependent* p-values (where the dependencies may be arbitrary). The
Bonferroni-like procedure also controls the FDR for dependent p-values.
By way of comparison, the usual Benjamini and Hochberg method for controlling
the FDR assumes that the p-values are positively dependent (for example, for
multivariate normal test statistics with a positive correlation matrix).
See Benjamini and Yekutieli (2001) for further technical details.
---
## Variations to the default options
In the following section, we consider the arguments that a typical user might
consider amending for their analysis.
### Common arguments
As a default, the `alpha` argument is set to 0.05, where `alpha` sets the
overall significance level of the FDR procedure. As a community the standard
significance level utilised is the 5%. However, there are applications where
an alternate threshold could be considered. For example, a more stringent
threshold might be appropriate when there are limited resources to follow up
significant findings. A less stringent threshold might be appropriate when the
downstream analysis is a global analysis which can tolerate a higher proportion
of false positives.
To ensure correct interpretation of the dates provided there is a date.format
argument. As a default, the date format is set to receive dates as
year-month(00-12)-day(number). The following website provides clear guidance
on symbols used to interpret the date information:
https://www.statmethods.net/input/dates.html
As a default, the `random` argument is set to `TRUE`. In this situation, the
order of p-values in each batch (i.e. with the same date) are randomised. This
is to avoid the risk of p-values being ordered post-hoc, which can lead to an
inflation of the FDR. As the dataset grows the data is reprocessed. To ensure
the consistency of the output (with the randomisation within the previous
batches remaining the same), it is necessary to set the same `seed`
for all analyses.
The user also has the option to turn off the randomisation step, by setting the
`random` argument to `FALSE`. This approach would be appropriate if the user
has both a date *and* a time stamp for the p-values, in which case the data
should be ordered by date and time beforehand and then passed to a wrapper
function. Another scenario would be when p-values within the batches are
ordered using *independent* side information, so that hypotheses most likely to
be rejected come first, which would potentially increase the power of the
procedure (see Javanmard and Montanari (2018) and Li and Barber (2017)).
### LOND
As a default, the `dep` argument is set to `FALSE`. Alternatively, this could
be set to `TRUE` and will implement the LOND procedure to guarantee FDR control
for dependent p-values. This method will in general be more conservative.
```{r}
set.seed(1); results.indep <- LOND(sample.df) # for independent p-values
set.seed(1); results.dep <- LOND(sample.df, dep=TRUE) # for dependent p-values
# compare adjusted significance thresholds
cbind(independent = results.indep$alphai, dependent = results.dep$alphai)
```
The vector `beta` is supplied by default, but can optionally be specified by the
user (as described above, see the formula for $\beta_j$ [here](#LOND_beta)).
### LORD
The default version of LORD used is version 3, but the user can optionally
specify versions 1, 2 and ++ using the `version` argument (see [here](#LORD) for
further details about the different versions).
```{r}
set.seed(1); results.LORD1 <- LORD(sample.df, version=1)
set.seed(1); results.LORD2 <- LORD(sample.df, version=2)
set.seed(1); results.LORD.plus <- LORD(sample.df, version='++')
set.seed(1); results.LORD3 <- LORD(sample.df) # default version = 3
# compare adjusted significance thresholds
cbind(LORD1 = results.LORD1$alphai,
LORD2 = results.LORD2$alphai,
LORD.plus = results.LORD.plus$alphai,
LORD3 = results.LORD3$alphai)
```
By default `w0 = alpha/10` and (for LORD 1/2/3) `b0 = alpha - w0`, but these
parameters can optionally be specified by the user subject to the requirements
that $0 \leq w_0 \leq \alpha$, $b_0 > 0$ and $w_0+b_0 \leq \alpha$.
The value of `gammai` is also supplied by default, but can optionally be
specified by the user (as described above, see the formula for $\gamma_j$
[here](#LORD_gamma)).
### LORDdep
By default `w0 = alpha/10` and `b0 = alpha - w0`, but can optionally be
specified by the user subject to the requirements that $w_0 \geq 0$,
$b_0 > 0$ and $w_0+b_0 \leq \alpha$.
The value of `xi` is also supplied by default, but can optionally be
specified by the user (as described above, see the formula for
$\xi_j$ [here](#LORDdep_xi)).
### BonfInfinite
The significance thresholds `alphai` are supplied by default, but can optionally
be specified by the user (as described above, see [here](#BonfInfinite)).
---
## Acknowledgements
We would like to thank the IMPC team (via Jeremy Mason and Hamed Haseli) for
useful discussions during the development of the package.
---
## References
Benjamini, Y., and Yekutieli, D. (2001). The control of the false discovery rate
in multiple testing under dependency. *The Annals of Statistics*,
29(4):1165-1188.
Bourgon, R., Gentleman, R., and Huber, W. (2010). Independent filtering
increases detection power for high-throughput experiments.
*Proceedings of the National Academy of Sciences*, 107(21), 9546-9551.
Ioannidis, J.P.A. (2005). Why most published research findings are false.
*PLoS Medicine*, 2.8:e124.
Javanmard, A., and Montanari, A. (2015). On Online Control of False
Discovery Rate. *arXiv preprint*, https://arxiv.org/abs/1502.06197.
Javanmard, A., and Montanari, A. (2018). Online Rules for Control of False
Discovery Rate and False Discovery Exceedance. *Annals of Statistics*,
46(2):526-554.
Koscielny, G., *et al*. (2013). The International Mouse Phenotyping Consortium
Web Portal, a unified point of access for knockout mice and related phenotyping
data. *Nucleic Acids Research*, 42.D1:D802-D809.
Li, A., and Barber, F.G. (2017). Accumulation Tests for FDR Control in Ordered
Hypothesis Testing. *Journal of the American Statistical Association*,
112(518):837-849.
Ramdas, A. et al. (2017). Online control of the false discovery rate with
decaying memory. *Advances in Neural Information Processing Systems 30*,
5650-5659.
Robertson, D.S. and Wason, J.M.S. (2018). Online control of the false discovery
rate in biomedical research. *arXiv preprint*, https://arxiv.org/abs/1809.07292.