---
title: "flowPloidy: Determining Genome Size and Ploidy from Flow Cytometry Histograms in R"
author: \includegraphics[width=1\textwidth]{ajbFigs/flowShiny01Crop.png}\newline{}Tyler Smith$^1$, Paul Kron$^2$ and Sara Martin$^1$
institute: $^1$Agriculture and Agri-Food Canada, $^2$University of Guelph
output:
    beamer_presentation:
        template: pandoc/ajb.beamer
vignette: >
  %\VignetteIndexEntry{flowPloidy: BSA 2017}
  %\VignetteEngine{knitr::rmarkdown}
theme: "metropolis"
themeoptions: [numbering=none]
---

```{r setup, echo = FALSE}
library(knitr)
library(flowPloidy)
library(flowPloidyData)
```

# Motivation

Evolutionary biologists working with polyploid taxa

Population screening:

- large sample sizes
- poor tissue quality

Genome size assessment:

- repeat measurements of fresh/greenhouse tissue
- high precision required

Need **high-throughput**, **high-precision** genome size estimates

# Histogram Analysis
## Tissue preparation

Chop and stain tissue

- solution contains 1000s of individual nuclei (and debris)

## Flow Cytometer

Measure the fluorescence properties of each nuclei:
    
- size (forward scatter)
- DNA content (fluorescence)
- granularity (side scatter)

## Histogram Construction

- aggregate data into bins (256, 512, 1024)

# Ideal Histogram

\center

```{r ideal histogram, echo = FALSE}
plot(x = 1, type = 'n', xlim = c(0, 256), ylim = c(0, 500),
     ylab = "Nuclei", xlab = "Fluorescence", main = "")
abline(h = 0)
lines(x = c(80, 80), y = c(0, 325), lwd = 3)
lines(x = c(160, 160), y = c(0, 150), lwd = 3)
tmpfun <- function(x) 100 * exp(-x/80)
curve(tmpfun, 80, 160, add = TRUE, lwd = 3)
text(x = 80, y = 325, "G1 Peak", pos = 3, cex = 2)
text(x = 160, y = 150, "G2 Peak", pos = 3, cex = 2)
text(x = 120, y = 30, "S Phase", pos = 3, cex = 2)
```

# Histogram Components

**G1 peak** Gap 1 diploid cells

**G2 peak** Gap 2 tetraploid cells (pre-division)

**S Phase** Synthesis cells (actively duplicating DNA)

\transparent{0}

**Debris** Damaged nuclei, cell components, contaminants

**Aggregates** clusters of two or more nuclei stuck together

**NOISE** Measurement error, capriciousness of life

\transparent{0.4}
(**Co-Chopped Standard Peak** ignored for now)

# Empirical Histogram

```{r real histogram, echo = FALSE}
set.seed(1234)
plot(x = 1, type = 'n', xlim = c(0, 256), ylim = c(0, 500),
     ylab = "Nuclei", xlab = "Fluorescence")
abline(h = 0)

vals <- rep(0, 256)
vals <- jitter(3500 * dnorm(0:256, mean = 80, sd = 80 * 0.05), amount = 20)
vals <- vals + jitter(200 * exp ((0:-256)/60), amount = 5)
vals <- vals +
  jitter(2500 * dnorm(0:256, mean = 160, sd = 160 * 0.05), amount = 20)
vals[80:160] <- vals[80:160] + jitter(100 * exp((-80:-160)/80),
                                      amount = 5)
vals[vals < 0] <- 0
points(vals, type = 'l')

text(x = 80, y = 450, "G1 Peak", pos = 3, cex = 2)
text(x = 160, y = 200, "G2 Peak", pos = 3, cex = 2)
text(x = 120, y = 100, "S Phase", pos = 3, cex = 2)
text(x = 40, y = 170, "Debris", pos = 3, cex = 2)
text(x = 240, y = 40, "Aggregates", pos = 3, cex = 2)
```

# Histogram Components

\transparent{0.4}

**G1 peak** Gap 1 diploid cells

**G2 peak** Gap 2 tetraploid cells (pre-division)

**S Phase** Synthesis cells (actively duplicating DNA)

\transparent{1}

**Debris** Damaged nuclei, cell components, contaminants

**Aggregates** clusters of two or more nuclei stuck together

**NOISE** Measurement error, capriciousness of life

\transparent{0.4}

(**Co-Chopped Standard Peak** ignored for now)

# Manual Histogram Analysis

```{r manual analysis, echo = FALSE}
set.seed(1234)
plot(x = 1, type = 'n', xlim = c(0, 256), ylim = c(0, 500),
     ylab = "Nuclei", xlab = "Fluorescence",
     main = "")
abline(h = 0)

vals <- rep(0, 256)
vals <- jitter(3500 * dnorm(0:256, mean = 80, sd = 80 * 0.05), amount = 20)
vals <- vals + jitter(200 * exp ((0:-256)/60), amount = 5)
vals <- vals +
  jitter(2500 * dnorm(0:256, mean = 160, sd = 160 * 0.05), amount = 20)
vals[80:160] <- vals[80:160] + jitter(100 * exp((-80:-160)/80),
                                      amount = 5)
vals[vals < 0] <- 0
points(vals, type = 'l')
rect(70, 0, 95, 475, border = 2, lwd = 3)
```

# Manual Histogram Analysis

## Advantages

- Intuitive
- Several Programs Available
- Can be done ad-hoc in R

## Disadvantages

- Subjective
    - CV estimate depends on user
- Doesn't account for overlapping components
    - G1 cell count estimate inflated by debris and S-phase

# Overlapping Histogram Components

```{r individual components, echo = FALSE}
set.seed(1234)
plot(x = 1, type = 'n', xlim = c(0, 256), ylim = c(0, 500),
     ylab = "Nuclei", xlab = "Fluorescence",
     main = "")
abline(h = 0)

vals <- rep(0, 257)
sphase <- vals
G1 <- jitter(3500 * dnorm(0:256, mean = 80, sd = 80 * 0.05), amount = 20)
G1[G1<0] <- 0
vals <- G1
debris <- jitter(200 * exp ((0:-256)/60), amount = 5)
debris[debris<0] <- 0
vals <- vals + debris
G2 <- jitter(2500 * dnorm(0:256, mean = 160, sd = 160 * 0.05), amount = 20)
G2[G2<0] <- 0
vals <- vals + G2
sphase[80:160] <- jitter(100 * exp((-80:-160)/80), amount = 5)
sphase[sphase<0] <- 0
vals <- vals + sphase
vals[vals < 0] <- 0
points(vals, type = 'l')
G1[c(1:70, 90:256)] <- 0
points(debris, type = 'l', col = 3, lwd = 2)
points(G1, type = 'l', col = 4, lwd = 3)
sphase[c(1:70, 170:256)] <- 0
points(sphase, type = 'l', col = 2, lwd = 2)
rect(70, 0, 95, 475, border = 2, lwd = 3)
text(x = 25, y = 50, col = 3, "Debris", cex = 2)
text(x = 120, y = 350, col = 4, "Actual G1", cex = 2)
text(x = 120, y = 120, col = 2, "S-Phase", cex = 2)
text(x = 35, y = 450, col = 2, "G1 Estimate", cex = 2)
```

# Non-linear Regression Histogram Analysis

Model histogram components using mathematical functions:

- G1 and G2 peaks fit as Normal curves
- Debris and aggregates fit using theoretical models

## Advantages
- objective
- estimates taken directly from the data

## Disadvantages
- availability (few programs, expensive licenses)
- conceptually complex

## Source

\footnotesize

Bagwell, C. B. (1993). Chapt. 3 _In_ K. D. Bauer et al., Clinical flow
cytometry: principles and applications. Williams \& Wilkins.

# Non-linear Regression Histogram Analysis

```{r histogram modeling, echo = FALSE}
set.seed(1234)
plot(x = 1, type = 'n', xlim = c(0, 256), ylim = c(0, 500),
     ylab = "Nuclei", xlab = "Fluorescence",
     main = "")
abline(h = 0)

vals <- rep(0, 257)
sphase <- vals
G1 <- jitter(3500 * dnorm(0:256, mean = 80, sd = 80 * 0.05), amount = 20)
G1[G1<0] <- 0
vals <- G1
debris <- jitter(200 * exp ((0:-256)/60), amount = 5)
debris[debris<0] <- 0
vals <- vals + debris
G2 <- jitter(2500 * dnorm(0:256, mean = 160, sd = 160 * 0.05), amount = 20)
G2[G2<0] <- 0
vals <- vals + G2
sphase[80:160] <- jitter(100 * exp((-80:-160)/80), amount = 5)
sphase[sphase<0] <- 0
vals <- vals + sphase
sphase[80:160] <- 100 * exp((-80:-160)/80)
vals[vals < 0] <- 0
points(vals, type = 'l')
points(200 * exp ((0:-256)/60), type = 'l', col = 3, lwd = 2)
points(sphase, type = 'l', col = 2, lwd = 2)
points(2500 * dnorm(0:256, mean = 160, sd = 160 * 0.05), type = 'l',
       lwd = 2, col = 5)
points(3500 * dnorm(0:256, mean = 80, sd = 80 * 0.05), type = 'l',
       col = 4, lwd = 2)
text(x = 25, y = 50, col = 3, "Debris", cex = 2)
text(x = 95, y = 350, col = 4, "G1", cex = 2)
text(x = 120, y = 120, col = 2, "S-Phase", cex = 2)
text(x = 180, y = 100, col = 5, "G2", cex = 2)
```

# Co-Chopped Standard

```{r histogram standard, echo = FALSE}
set.seed(1234)
plot(x = 1, type = 'n', xlim = c(0, 256), ylim = c(0, 500),
     ylab = "Nuclei", xlab = "Fluorescence",
     main = "")
abline(h = 0)

vals <- rep(0, 257)
sphase <- vals
G1 <- jitter(3500 * dnorm(0:256, mean = 80, sd = 80 * 0.05), amount = 20)
G1[G1<0] <- 0
vals <- G1
debris <- jitter(200 * exp ((0:-256)/60), amount = 5)
debris[debris<0] <- 0
vals <- vals + debris
G2 <- jitter(2500 * dnorm(0:256, mean = 160, sd = 160 * 0.05), amount = 20)
G2[G2<0] <- 0
vals <- vals + G2
sphase[80:160] <- jitter(100 * exp((-80:-160)/80), amount = 5)
sphase[sphase<0] <- 0
standard <- jitter(1500 * dnorm(0:256, mean = 130, sd = 130 * 0.01),
                   amount = 10)
standard[standard<0] <- 0
vals <- vals + sphase
vals <- vals + standard
sphase[80:160] <- 100 * exp((-80:-160)/80)
vals[vals < 0] <- 0
points(vals, type = 'l')
text(x = 160, y = 450, "Standard", cex = 2)
text(x = 95, y = 400, "G1", cex = 2)
```

# flowPloidy - Motivation

## Issues with ModFit

- Cost and accessibility

- Functionality (too much and too little)

## flowPloidy Goals

- Streamline our workflow, integrate with R

- Increase our understanding of histogram analysis

- Platform for developing best practices

# Free Software

## Open access for everyone in our lab and yours

\vspace{1em}

## You can learn from my work

\vspace{1em}

## You can improve and extend my work

# Bioconductor

Builds on existing tools for flow cytometry in R

## Installation is simple:

```{r biocLite1, eval=FALSE}
source("https://bioconductor.org/biocLite.R")
biocLite("flowPloidy")
biocLite("flowPloidyData")              # for examples
```

# Loading Data

```{r batchFlowHist, message = 1:5, collapse = TRUE, cache = TRUE, message = FALSE}
library(flowPloidy)
library(flowPloidyData)                 # for examples
batch1 <- batchFlowHist(flowPloidyFiles,
                        channel = "FL3.INT.LIN")
```

```{r browseFlowHist, eval = FALSE}
batch1 <- browseFlowHist(batch1) 
```

# Reviewing Histograms

![](gsFigs/gs_browse1.png)\ 

# Reviewing Histograms

![](ajbFigs/gs_browse1_crop.png)\ 

# Reviewing Histograms

\center

![](ajbFigs/gs_browse1_cropb.png)\ 

# Correcting Histograms

![](ajbFigs/gs_browse2_crop.png)\ 

# Correcting Histograms

![](ajbFigs/gs_browse2c_crop.png)\ 

# Changing Model Components: Debris

![](ajbFigs/gs_browse3b_crop.png)\ 

# Changing Model Components: Debris

\center

![](ajbFigs/gs_browse1_cropb.png)\ 

# Changing Model Components: Debris

![](ajbFigs/gs_browse3b_crop.png)\ 

![](ajbFigs/gs_browse3c_crop.png)\ 

# Changing Model Components: Debris

## One size doesn't fit all

Depends on:

- species
- preparation method
- individual sample quality

## Response

Make switching components quick and easy

Impact on parameter estimates is usually small

RCS provides an objective basis for choice

# Local Minima

There is no direct solution for non-linear regression

Sometimes the algorithm gets stuck in a local minima:

![](ajbFigs/gs_browse11b_crop.png)\ 

# Local Minima

![](ajbFigs/gs_browse11b_crop.png)\ 

![](ajbFigs/gs_browse11c_crop.png)\ 

# Exporting Results

Save to file or use directly in R:

```{r tabulateFlowHist, eval = FALSE}
tabulateFlowHist(batch1)
```

```{r tabulateFlowHist table, echo = FALSE, results = "as-is"}
kable(tabulateFlowHist(batch1)[c(1, 4, 5),
                               c("countsA", "sizeA", "cvA", "AB")],
      digits = 3)
```

# Gating

## Flow data is not always pretty

![](gsFigs/vac1.png)\ 

# Gating

![](gsFigs/vac1.png)\ 

![](gsFigs/gate_ui.png)\ 

# Gating

\center

![](gsFigs/gs_gate.png)\ 

# Gating

\center

![](gsFigs/gs_gate_zoom2.png)\ 

# Gating

\center

![](gsFigs/draw_gate.png)\ 

# Gating

\center

![](gsFigs/gated_analysis.png)\ 

# `flowPloidy` Summary

Compared to modFit:

- parameter estimates within 1%
- simpler interface
- integration with R
- cost

Compared to manual analysis:

- objective and repeatable

Gating:

- less sensitive to subjective gating decisions


# `flowPloidy` Future Development

Gating

- better define or automate best practice
- more sophisticated options for pulse analysis
- impact of gating on theoretical model components

# Thank You!

## `tyler@plantarum.ca`

## `plantarum.github.io/flowPloidy`
![](ajbFigs/webpage.png)\