%\VignetteIndexEntry{RRHO}
\documentclass[11pt]{article}
\usepackage[utf8]{inputenc}
\usepackage{natbib}
\usepackage{amsthm}

\newtheorem{remark}{Remark}[section]

\title{A Practitioner's Guide to Multiple Testing Error Rates}
\author{Jonathan Rosenblatt \\
        Department of Statistics and Operations Research,\\
        The Sackler Faculty of Exact Sciences, \\
        Tel Aviv University \\
        Israel}


\begin{document}
\SweaveOpts{concordance=TRUE}
\maketitle


\section{Introduction}

Consider the problem of comparing two lists of same length. 
In the following, we will assume these lists consist of genes.
A possible way of comparison would be to test the hypothesis that 
the lists' ordering is arbitrary, versus some type of agreement.
This is the purpose of this package, based on the 
work of~\citet{plaisier_rankrank_2010}.

The proposed approach is to count the number of common genes 
in the first $i$ and $j$ elements of the first and second list respectively. 
As the count of common elements could be driven by chance, 
the significance of the observed count is computed assuming 
the hypothesis of completely random list orderings. 
As this is performed for all $i$ and $j$, multiplicity kicks in.

The package offers both FWER control
\footnote{ For a general introduction to multiple testing error rates, 
see~\citep{rosenblatt_practitioners_2013}.} using permutation testing 
and FDR control using the B-Y procedure~\citep{benjamini_control_2001} 
as proposed in the original work by~\citet{plaisier_rankrank_2010}.


\begin{remark} FDR or FWER?

~\citet{plaisier_rankrank_2010} recommend the control of the FDR over the different $i$s and $j$s. 
Each $i,j$ combination tests the null hypothesis of 
``arbitrary rankings of the two lists'', 
versus an alternative of 
``non-arbitrary ranking in the first $i$ and $j$ elements 
of the first and second list respectively''.
FDR control is thus appropriate if concerned with the number of false $i,j$ statements made. 

If only concerned with the existence of any non-arbitrariness, 
without claiming at which part of the lists it resides, 
than FWER control is more appropriate. 
\end{remark}

\section{Workflow}
We start with a sketch of the workflow. The details follow.

\begin{itemize}
\item
Compute the significance of the gene overlap for all
$i$ and $j$ first elements of the two lists.
\item
Correct the nominal significance levels for the multiple $i$s and $j$s.
\item
Report finding using the exported significance matrices and 
accompanying Venn diagrams. 
\end{itemize}


<<fig=FALSE>>=
library(RRHO)
# Create "gene" lists:
  list.length <- 100
	list.names <- paste('Gene',1:list.length, sep='')
	gene.list1<- data.frame(list.names, sample(100))
	gene.list2<- data.frame(list.names, sample(100))
# Compute overlap and significance
	RRHO.example <-  RRHO(gene.list1, gene.list2, BY=TRUE)
@

<<fig=TRUE>>=
# Examine Nominal (-log) pvalues
  image(RRHO.example$hypermat)
# Note: If lattice is available try: levelplot(RRHO.example$hypermat)
@

<<>>=
pval.testing <- pvalRRHO(RRHO.example,50)
# FWER corrected pvalue:
pval.testing$pval
# The sampling distribution of the minimum of the (-log) nominal p-values:
xs<- seq(0,10,length=100)
plot(Vectorize(pval.testing$FUN.ecdf)(xs)~xs, xlab='-log(pvalue)', ylab='ECDF', type='S')
@


<<fig=TRUE>>=
# Examine B-Y corrected pvalues 
# Note: probably nothing will be rejected in this example as the data is generated from the null.
  image(RRHO.example$hypermat.by)
@


\bibliography{Project-RRHO}
\bibliographystyle{plainnat}

\end{document}