---
title: "Step 3: Post-Processing"
author: "Tyler J Burns"
date: "October 2, 2017"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Final Post-Processing Steps for Scone}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, 
                      results = "markup", 
                      message = FALSE, 
                      warning = FALSE)
knitr::opts_chunk$set(fig.width=6, fig.height=4) 
```

### The post-processing function:  
This vignette covers what takes place following the generation of SCONE output 
detailed in TheSconeWorkflow.Rmd. The obvious step that needs to take place is 
the Scone generated columns being merged into the original input data. The user 
gets the option of log base 10 transforming q values, which is easier to 
visualize. The user also gets the option to run t-SNE on the data, such that 
said maps can be colored by SCONE generated values. In this case, t-SNE is run 
utilizing the Rtsne package, using the same markers that were used as input for 
the KNN. generation. 

```{r}
library(Sconify)
wand.final <- PostProcessing(scone.output = wand.scone,
                         cell.data = wand.combined,
                         input = input.markers)


wand.combined # input data
wand.scone # scone-generated data
wand.final # the data after post-processing

# tSNE map shows highly responsive population of interest
TsneVis(wand.final, 
        "pSTAT5(Nd150)Di.IL7.change", 
        "IL7 -> pSTAT5 change")

# tSNE map now colored by q value
TsneVis(wand.final, 
        "pSTAT5(Nd150)Di.IL7.qvalue", 
        "IL7 -> pSTAT5 -log10(qvalue)")

# tSNE map colored by KNN density estimation
TsneVis(wand.final, "density")

```

### Subsampling your data prior to running t-SNE:
If one has a large number of cells in the dataset (>100K), then t-SNE can 
become time-consuming and produce results that are less clean. 
As such, I provide a wrapper that allows one to subsample the final data and 
run t-SNE on the subsampled data, producing a new tibble that contains the 
subsampled data along with two t-SNE dimensions added to it. Note the two added 
dimensions at the end of the tibble are called "bh-SNE11" and "bh-SNE21". This 
is because the dimensions "bh-SNE1" and "bh-SNE2" are already in the data, 
because t-SNE was run during the post processing step in this example. As I 
have stated, a user would realistically use this function with a much larger 
number of cells, in which case the user would have selected "tsne = FALSE" in 
the post.processing function detailed above in this vignette.

```{r}
wand.final.sub <- SubsampleAndTsne(dat = wand.final, 
                                   input = input.markers, 
                                   numcells = 500)
wand.final.sub
```