---
title: "GrafGen: Classifying Subpopulations of H. pylori Genomes"
author: "William Wheeler, Difei Wang, Isaac Zhao, Yumi Jin and Charles Rabkin"
output:
    rmarkdown::html_document
vignette: >
    %\VignetteIndexEntry{GrafGen: Classifying H. pylori genomes}
    %\VignetteEngine{knitr::rmarkdown}
    %\VignetteEncoding{UTF-8}
---

# Introduction
The GrafGen package is for classifying Helicobacter pylori genomes
according to genetic distance from nine reference populations as
defined by equation 2 in Jin (2019).
The main function is this package is `grafGen()` which requires a
file of genotypes that can be either a PLINK bed file or a VCF file.

# Installing the GrafGen package from Bioconductor
```{r, eval=FALSE}
    if (!requireNamespace("BiocManager", quietly = TRUE)) 
        install.packages("BiocManager") 
    BiocManager::install("GrafGen") 
```

# Loading the package
Before using the GrafGen package, it must be loaded into an R session.
```{r}
library(GrafGen)
```

# Example data
The GrafGen package includes example data which is a subset of the
reference data that was used to train the model. 
The data is stored in the extdata folder.
```{r}
    dir <- system.file("extdata", package="GrafGen", mustWork=TRUE)
    geno.file <- paste0(dir, .Platform$file.sep, "data.vcf.gz")
    print(geno.file)
```

# Running grafGen() 
The `grafGen()` function returns a list of class "grafpop"
with two objects: `table` and 
`vertex`. The object `table` is a data frame containing hypothetical
ancestry percents (F_percent, E_percent and A_percent) based on known 
African, European and Asian samples, respectively, normalized 
genetic distance scores (GD1_x, GD2_y, GD3_z), 
the predicted reference population (Refpop), nearest neighboring
reference population, percent separation as defined in the user manual
and the genetic distances to each reference populations 
(hpgpAfrica, hpgpAfrica-distant, hpgpAfroamerica,
hpgpEuroamerica, hpgpMediterranea, hpgpEurope, hpgpEurasia,
hpgpAsia,  and hpgpAklavik86-like).  
The object `vertex` is a list containing the (fixed)
x-y coordinates of the African, European and Asian vertex population 
centroids.
```{r}
ret <- grafGen(geno.file, print=0)
ret$table[seq_len(5), ]
```

Printing the return object from `grafGen()` will display 
a table of frequency 
counts for the predicted reference populations for the user input data.
```{r}
print(ret)
```

Plotting the return object will display a plot of the genetic distance
scores (GD1_x vs GD2_y) for the user input data and the reference data.
Additional plots can be obtained by calling the `grafGenPlot()` function.
```{r, crop=NULL}
plot(ret)
```

# Interactive plots
The functions `interactiveReferencePlot` and `interactivePlot` create 
interactive plots
for the reference data and user input data respectively. 
A call to `interactiveReferencePlot` will all show the results of all samples 
in the reference data.
Hovering over a point in the plot will display three lines of information. 
Line 1 contains the type and id of that sample.
Line 2 contains the sample's reference population, 
next nearest reference population, 
and separation percent to the next nearest reference population as
defined in the user manual.
Line 3 contains the percent African, European and Asian ancestry for 
that sample.
The legend shows the types (which are the source countries in
`interactiveReferencePlot`) for all samples, and clicking the name of a
type will add or remove those samples from the plot.
```{r}
if (interactive()) interactiveReferencePlot()
```

# R shiny app
The `GrafGen` package also includes an R shiny app to view and filter 
the plot using up to two variables.
The function `createApp` returns a list containing the app and data
objects needed with the app. The app then can be launched with the
`runApp` function.
```{r}
tmp <- createApp(ret)
if (interactive()) {
    reference_results <- tmp$reference_results
    user_results      <- tmp$user_results
    user_metadata     <- tmp$user_metadata
    shiny::runApp(tmp$app)
}
```

# Session Information
```{r}
sessionInfo()
```