---
title: "Identify reproducible genomic peaks from replicate ChIP-seq experiments"
author: "Konstantin Krismer"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
bibliography: bibliography.bib
vignette: >
  %\VignetteIndexEntry{Identify reproducible genomic peaks from replicate ChIP-seq experiments}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

IDR2D is an extension of the original IDR method [@li2011], which was intended
for ChIP-seq peaks (i.e., one-dimensional genomic data). The package applies the
method to two-dimensional genomic data, such as interactions between two
genomic loci (also called anchors). Genomic interaction data are generated by
genome-wide methods such as Hi-C [@pmid20461051], ChIA-PET [@pmid19247990],
and HiChIP [@pmid25128017]. This vignette covers the one-dimensional case and
uses `estimate_idr1d` to identify reproducible peaks in two ChIP-seq replicates.

```{r setup, include=FALSE}
knitr::opts_chunk$set(fig.width = 7, fig.height = 7, echo = FALSE,
                      warning = FALSE, message = FALSE, dev = 'png',
                      out.extra = 'style="border-width: 0;"')
```

# Input data

Load example data:
```{r, echo = TRUE}
rep1_df <- idr2d:::chipseq$rep1_df
rep2_df <- idr2d:::chipseq$rep2_df
```
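
Your own peak calls can be arranged in the same layout. The following is a
minimal sketch with made-up peaks, assuming the four-column layout shown in the
tables below (chromosome, start coordinate, end coordinate, value). The column
names and all values are hypothetical and are not taken from the example data.

```r
# Hypothetical peaks for one replicate; all values are made up for illustration.
toy_rep_df <- data.frame(
  chr = c("chr1", "chr1", "chr2"),
  start = c(1000L, 5200L, 300L),
  end = c(1400L, 5650L, 720L),
  value = c(35.2, 12.7, 88.1),
  stringsAsFactors = FALSE
)

# Basic sanity checks before passing a data frame of this shape on:
# four columns, and every peak ends after it starts.
stopifnot(ncol(toy_rep_df) == 4, all(toy_rep_df$end > toy_rep_df$start))
str(toy_rep_df)
```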

## Example data - replicate 1

```{r}
library(DT)
header <- htmltools::withTags(table(
  class = 'display',
  thead(
    tr(
      th("chromosome"),
      th("start coordinate"),
      th("end coordinate"),
      th("score")
    )
  )
))
datatable(rep1_df[seq_len(min(nrow(rep1_df), 1000)), ],
          container = header, rownames = FALSE,
          options = list(searching = FALSE)) %>%
  formatRound("value", 3)
```

Only the first 1000 peaks are shown.

## Example data - replicate 2

```{r}
datatable(rep2_df[seq_len(min(nrow(rep2_df), 1000)), ], container = header,
          rownames = FALSE, options = list(searching = FALSE)) %>%
  formatRound("value", 3)
```

Only the first 1000 peaks are shown.

# Analysis

Load the package:
```{r, echo = TRUE}
library(idr2d)
```

Estimate IDR:
```{r, echo = TRUE}
idr_results <- estimate_idr1d(rep1_df, rep2_df, 
                              value_transformation = "log")
rep1_idr_df <- idr_results$rep1_df
```

Note that the appropriate value transformation depends on the semantics of the
*value* column (always the fourth column) in `rep1_df` and `rep2_df`. This
column is used to rank the peaks, with highly significant peaks at the top of
the list and the least significant peaks (i.e., most likely noise) at the
bottom. The ranking is established by sorting the transformed *value* column in
descending order, so high values must indicate high significance. Our *value*
column contains peak scores (the higher, the more significant), so the
order-preserving `log` transformation is used here. For p-values and
p-value-derived measures (like q-values), where lower values are more
significant, the `log_additive_inverse` transformation (`-log(x)`) is
recommended instead.
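
The effect of the transformation on the ranking can be illustrated in base R.
This is a minimal sketch with made-up values (the peak names, scores, and FDRs
are hypothetical), not the package's internal code. It only shows that `log(x)`
preserves the order of scores, whereas `-log(x)` reverses the order of FDRs so
that the most significant peak ends up on top.

```r
# Hypothetical values for three peaks (not from the example data).
scores <- c(peak_a = 250, peak_b = 40, peak_c = 900)   # higher = more significant
fdrs <- c(peak_a = 1e-3, peak_b = 0.2, peak_c = 1e-6)  # lower = more significant

# Rank peaks by sorting the transformed values in descending order.
order_by_score <- names(sort(log(scores), decreasing = TRUE))
order_by_fdr <- names(sort(-log(fdrs), decreasing = TRUE))

# Without a transformation, the least significant FDR would come first.
order_by_raw_fdr <- names(sort(fdrs, decreasing = TRUE))

print(order_by_score)
print(order_by_fdr)
print(order_by_raw_fdr)
```

In this toy example, both transformed rankings put `peak_c` first, while
sorting the raw FDRs in descending order puts the weakest peak, `peak_b`, first.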

## Results

```{r}
chr <- start <- end <- rank <- rep_rank <- value <- rep_value <- idr <- NULL
header <- htmltools::withTags(table(
  class = 'display',
  thead(
    tr(
      th("chr."),
      th("start coordinate"),
      th("end coordinate"),
      th("rank in R1"),
      th("rank in R2"),
      th("transformed value in R1"),
      th("transformed value in R2"),
      th("IDR")
    )
  )
))

df <- dplyr::select(rep1_idr_df, chr, start, end,
                    rank, rep_rank, value, rep_value, idr)
datatable(df[seq_len(min(nrow(df), 1000)), ], 
          rownames = FALSE,
          options = list(searching = FALSE),
          container = header) %>%
  formatRound(c("value", "rep_value", "idr"), 3)
```

Only the first 1000 observations are shown.
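
A typical next step is to keep only the peaks that pass an IDR cutoff. The
sketch below uses a made-up data frame with the same `chr`, `start`, `end`, and
`idr` columns as the results table. The values, the 0.05 cutoff, and the
handling of missing IDRs are assumptions for illustration, not recommendations
from the package.

```r
# Hypothetical results; NA stands for a peak without an IDR estimate.
toy_idr_df <- data.frame(
  chr = c("chr1", "chr1", "chr2", "chr3"),
  start = c(1000L, 5200L, 300L, 9100L),
  end = c(1400L, 5650L, 720L, 9500L),
  idr = c(0.004, 0.310, NA, 0.049),
  stringsAsFactors = FALSE
)

idr_cutoff <- 0.05  # assumed threshold; choose one suited to your analysis

# Keep peaks with a non-missing IDR below the cutoff.
reproducible_df <- toy_idr_df[!is.na(toy_idr_df$idr) &
                                toy_idr_df$idr < idr_cutoff, ]
print(reproducible_df)
```

With the real results, the same filter could be applied to `rep1_idr_df`.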

### Summary

```{r, echo = TRUE}
summary(idr_results)
```

### Distribution of IDRs

```{r, echo = TRUE}
draw_idr_distribution_histogram(rep1_idr_df)
```

### Rank - IDR dependence

```{r, echo = TRUE}
draw_rank_idr_scatterplot(rep1_idr_df)
```

### Value - IDR dependence

```{r, echo = TRUE}
draw_value_idr_scatterplot(rep1_idr_df)
```

# Additional information

Most of the functionality of the IDR2D package is also offered through 
the website at https://idr2d.mit.edu.

For a more detailed discussion of IDR2D, see the IDR2D paper:

**IDR2D identifies reproducible genomic interactions**  
Konstantin Krismer, Yuchun Guo, and David K. Gifford  
Nucleic Acids Research, Volume 48, Issue 6, 06 April 2020, Page e31; DOI: https://doi.org/10.1093/nar/gkaa030

# References