---
title: "An overview of topconfects"
author: "Paul Harrison"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
    %\VignetteIndexEntry{An overview of topconfects}
    %\VignetteEngine{knitr::rmarkdown}
    %\VignetteEncoding{UTF-8}
params:
    cache: !r FALSE
---

TOP results by CONfident efFECT Size. Topconfects is an R package intended for
RNA-seq or microarray Differntial Expression analysis and similar, where we are
interested in placing confidence bounds on many effect sizes---one per
gene---from few samples, and ranking genes by these confident effect sizes.

Topconfects builds on
[TREAT](http://bioinformatics.oxfordjournals.org/content/25/6/765.long) p-values
offered by the limma and edgeR packages, or the "greaterAbs" test p-values
offered by DESeq2. It tries a range of fold changes, and uses this to rank genes
by effect size while maintaining a given FDR. This also produces confidence
bounds on the fold changes, with adjustment for multiple testing.

* **A principled way to avoid using p-values as a proxy for effect size.** The
difference between a p-value of 1e-6 and 1e-9 has no practical meaning in terms
of significance, however tiny p-values are often used as a proxy for effect
size. This is a misuse, as they might simply reflect greater quality of evidence
(for example RNA-seq average read count or microarray average spot intensity).
It is better to reject a broader set of hypotheses, while maintaining a sensible
significance level.

* **No need to guess the best fold change cutoff.** TREAT requires a fold change
cutoff to be specified. Topconfects instead asks you specify a False Discovery
Rate appropriate to your purpose. You can then read down the resulting ranked
list of genes as far as you wish. The "confect" value given in the last row that
you use is the fold change cutoff required for TREAT to produce that set of
genes at the given FDR.

The method is described in:

[Harrison PF, Pattison AD, Powell DR, Beilharz TH. 2018. Topconfects: a package
for confident effect sizes in differential expression analysis provides improved
usability ranking genes of interest. bioRxiv.
doi:10.1101/343145](https://www.biorxiv.org/content/early/2018/06/11/343145)

## If you want to find top confident differentially expressed genes

Use `limma_confects`, `edger_confects`, or `deseq2_confects` as an alternative
final step in your limma, edgeR, or DESeq2 analysis. The limma method is
currently much faster than other methods.

For examples, see the vignette "Confident fold change".

## If you have a collection of effect sizes with standard errors

If you have a collection of effect sizes of some sort, with associated standard
errors, and possibly associated degrees of freedom, use `normal_confects`.
Errors are assumed to be normally distributed, or t-distributed if degrees of
freedom are given.

This is a re-implementation of limma's TREAT method, which is then supplied to
`nest_confects` (described next). (Alternatively, if the effect sizes are all
positive, there is an option to use a one-sided t-test as the underlying
hypothesis test.)

## If you can calculate p-values for a collection of interval hypotheses

The core algorithm of `topconfects` is implemented in the function
`nest_confects`. You may supply any function that can calculate p-values for the
null hypothesis that an effect size is no more than a specified amount. Testing
is performed for n items, and the function should be able to perform this
calculation for a subset of these n items and a given amount.

## Visualizing results

Use `confects_plot` to plot confident effect sizes of top genes. The estimated
effect size (eg log fold change) is shown as a dot, and the confidence bound is
shown as a line.

Use `confects_plot_me2` to gain a global overview. Similar to an MD or MA plot,
the x axis is average expression and the y axis is estimated effect size. The 
confident effect size is shown using colors. Effect sizes confidently "> 0"
correspond to traditional differential expression testing, and effect sizes
confidently greater than larger values correspond to the TREAT test at that
threshold.

There is also an older `confects_plot_me`, which I no longer recommend as it is
hard to explain and easily misleading. The y axis is effect size. Estimated effect
sizes are shown in grey and confident effect sizes in red or blue (ie a gene with a
non-NA confident effect size is shown with both a grey and a colored dot).

Use `rank_rank_plot` to compare two rankings.

For examples, see the vignette "Confident fold change".