---
title: tidyverse and ggplot integration with destiny
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{tidyverse and ggplot integration with destiny}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
Interaction with the tidyverse and ggplot2
==========================================
The [tidyverse](https://www.tidyverse.org/), [ggplot2](http://ggplot2.tidyverse.org/), and destiny are a great fit!
```{r}
suppressPackageStartupMessages({
  library(destiny)
  library(tidyverse)
  library(forcats)  # loaded explicitly, in case your tidyverse version does not attach it
})
```
ggplot2 has a peculiar way of setting default scales: you just define a variable named like the default scale function, here `scale_colour_continuous` for continuous colour mappings.
```{r}
scale_colour_continuous <- scale_color_viridis_c
```
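To check that such a variable really acts as the default, here is a small sketch that compares the colours ggplot2 computes with the implicit default against those from an explicit `scale_colour_viridis_c()`. It uses the built-in `mtcars` data as a stand-in (not part of destiny) and assumes ggplot2 ≥ 3.0.0, which provides `scale_colour_viridis_c()`. Note the British spelling: ggplot2 looks up `scale_colour_continuous`, so a variable named `scale_color_continuous` would presumably be ignored.

```{r}
library(ggplot2)

# Define the default continuous colour scale by name, as in the chunk above.
# ggplot2 uses the British spelling "colour" when it looks up default scales.
scale_colour_continuous <- scale_colour_viridis_c

# `hp` is numeric, so ggplot2 needs a continuous colour scale
# and should pick up the variable defined above.
p_default  <- ggplot(mtcars, aes(wt, mpg, colour = hp)) + geom_point()
p_explicit <- p_default + scale_colour_viridis_c()

# layer_data() returns the built layer data, including the mapped colours
default_colours  <- layer_data(p_default)$colour
explicit_colours <- layer_data(p_explicit)$colour
```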
When working mainly with dimension reductions, I suggest hiding the (useless) axis ticks and tick labels:
```{r}
theme_set(theme_gray() + theme(
  axis.ticks = element_blank(),
  axis.text  = element_blank()))
```
Let’s load our dataset, the `guo_norm` example data that comes with destiny:
```{r}
data(guo_norm)
```
Of course you could use [tidyr](http://tidyr.tidyverse.org/)::[gather()](https://rdrr.io/cran/tidyr/man/gather.html)
to tidy or transform the data now, but the data is already in the right form for destiny, and [R for Data Science](http://r4ds.had.co.nz/tidy-data.html) covers tidying far better than this vignette could. In long form, a single-cell `ExpressionSet` would look like this:
```{r}
guo_norm %>%
  as('data.frame') %>%
  gather(Gene, Expression, one_of(featureNames(guo_norm)))
```
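In tidyr 1.0.0 and later, `gather()` is superseded by `pivot_longer()`. The following sketch performs the same wide-to-long reshaping on a tiny, made-up stand-in for `as(guo_norm, 'data.frame')`: the `Cell` column, the gene values, and the `num_cells` entries are hypothetical toy data, not real measurements.

```{r}
library(tidyr)
library(tibble)

# Toy wide data: one row per cell, one column per gene, plus a covariate.
# All values are made up for illustration.
wide <- tibble(
  Cell      = c("c1", "c2", "c3"),
  num_cells = c(1, 2, 2),
  Klf2      = c(0.5, 1.2, 0.1),
  Sox2      = c(2.0, 0.3, 1.1)
)
genes <- c("Klf2", "Sox2")

# Counterpart of gather(Gene, Expression, one_of(genes)):
# gene columns become (Gene, Expression) pairs, covariates are repeated per gene
long <- pivot_longer(wide, all_of(genes), names_to = "Gene", values_to = "Expression")
long
```

Only the row order differs from `gather()`, which stacks all values of one gene before the next.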
But destiny doesn’t use long-form data as input: single-cell data naturally has a more compact structure, a genes×cells matrix plus a number of per-cell covariates (the structure of an `ExpressionSet`).
```{r}
dm <- DiffusionMap(guo_norm)
```
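If your own data arrives in long form (one row per cell and gene), `tidyr::pivot_wider()` brings it back into the compact one-row-per-cell shape. The sketch below uses hypothetical column names (`Cell`, `Gene`, `Expression`) and made-up values; as far as we know, `DiffusionMap()` also accepts a numeric cells×genes matrix like `expr` besides an `ExpressionSet`, but check its documentation for your destiny version.

```{r}
library(tidyr)
library(tibble)

# Hypothetical long-form data: one row per (cell, gene) measurement
long <- tibble(
  Cell       = rep(c("c1", "c2", "c3"), each = 2),
  Gene       = rep(c("Klf2", "Sox2"), times = 3),
  Expression = c(0.5, 2.0, 1.2, 0.3, 0.1, 1.1)
)

# Back to one row per cell and one column per gene
wide <- pivot_wider(long, names_from = Gene, values_from = Expression)

# A plain numeric cells x genes matrix, with cell IDs as row names
expr <- as.matrix(wide[, c("Klf2", "Sox2")])
rownames(expr) <- wide$Cell
expr
```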
`names(dm)` shows what names can be used in `dm$`, `as.data.frame(dm)$`, or `ggplot(dm, aes())`:
```{r}
names(dm) # namely: Diffusion Components, Genes, and Covariates
```
Because destiny defines a `fortify` method for `DiffusionMap` objects (which here just means `as.data.frame`), `ggplot` accepts them directly:
```{r}
ggplot(dm, aes(DC1, DC2, colour = Klf2)) +
  geom_point()
```
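The mechanism behind this can be sketched with a toy S3 class: `ggplot()` calls `fortify()` on inputs that are not data frames, so a `fortify` method that delegates to `as.data.frame` is enough to make an object directly plottable. The class `toy_embedding` and its constructor `new_toy_embedding()` are hypothetical illustrations, not destiny’s actual implementation.

```{r}
library(ggplot2)

# Hypothetical stand-in for a DiffusionMap: coordinates plus covariates.
# This is NOT destiny's class, only an illustration of the mechanism.
new_toy_embedding <- function(coords, covariates) {
  structure(list(coords = coords, covariates = covariates),
            class = "toy_embedding")
}

# Conversion to a data frame: one row per cell, one column per variable
as.data.frame.toy_embedding <- function(x, row.names = NULL, optional = FALSE, ...) {
  cbind(as.data.frame(x$coords), x$covariates)
}

# ggplot() calls fortify() on non-data-frame inputs;
# delegating to as.data.frame() makes the object directly plottable
fortify.toy_embedding <- function(model, data, ...) as.data.frame(model)

set.seed(42)
coords <- matrix(rnorm(20), ncol = 2, dimnames = list(NULL, c("DC1", "DC2")))
emb <- new_toy_embedding(coords, data.frame(num_cells = rep(c(1, 2), 5)))

p <- ggplot(emb, aes(DC1, DC2, colour = factor(num_cells))) + geom_point()
built <- layer_data(p)
```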
When you want to use a Diffusion Map in a dplyr pipeline, you need to convert it with `fortify`/`as.data.frame` first, since dplyr verbs operate on data frames:
```{r}
fortify(dm) %>%
  mutate(
    EmbryoState = factor(num_cells) %>%
      lvls_revalue(paste(levels(.), 'cell state'))
  ) %>%
  ggplot(aes(DC1, DC2, colour = EmbryoState)) +
  geom_point()
```
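What the `mutate()` step above does to the factor can be checked on a small, made-up vector. In the inner pipe, `.` appears only nested inside `paste(levels(.), ...)`, so magrittr still passes the factor as the first argument of `lvls_revalue()`. `forcats::fct_relabel()` is shown as one alternative that computes new labels from the old ones; it is our suggestion, not what this vignette uses.

```{r}
library(magrittr)
library(forcats)

# Made-up stand-in for the num_cells covariate
num_cells <- c(1, 2, 4, 2, 1)

# The expression used in mutate() above: `.` is only nested inside paste(),
# so magrittr also inserts the factor as lvls_revalue()'s first argument
state_a <- factor(num_cells) %>%
  lvls_revalue(paste(levels(.), 'cell state'))

# Alternative: relabel each level with a function of its old label
state_b <- fct_relabel(factor(num_cells), ~ paste(.x, 'cell state'))

levels(state_a)  # "1 cell state" "2 cell state" "4 cell state"
```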
Like the genes in the input `ExpressionSet`, the Diffusion Components of a converted Diffusion Map are separate columns (`DC1`, `DC2`, and so on) rather than a key column and a value column in long form, but sometimes it can be useful to “tidy” them, e.g. to facet by component:
```{r}
fortify(dm) %>%
  gather(DC, OtherDC, num_range('DC', 2:5)) %>%
  ggplot(aes(DC1, OtherDC, colour = factor(num_cells))) +
  geom_point() +
  facet_wrap(~ DC)
```
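A sketch of what this reshaping produces, on made-up coordinates rather than a real Diffusion Map: `DC1` stays a regular column and is repeated once for each of `DC2` to `DC5`, so every facet plots `DC1` against one other component. `pivot_longer()` (tidyr ≥ 1.0.0) yields the same rows, only possibly in a different order.

```{r}
library(tidyr)
library(tibble)

# Made-up stand-in for fortify(dm): 3 cells, DC1..DC5 and a covariate
set.seed(1)
dcs <- as_tibble(matrix(rnorm(15), ncol = 5,
                        dimnames = list(NULL, paste0("DC", 1:5))))
dcs$num_cells <- c(1, 2, 4)

# As in the chunk above: DC2..DC5 become (DC, OtherDC) pairs, DC1 stays
long_gather <- gather(dcs, DC, OtherDC, num_range("DC", 2:5))

# The same reshaping with pivot_longer()
long_pivot <- pivot_longer(dcs, num_range("DC", 2:5),
                           names_to = "DC", values_to = "OtherDC")
```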
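Another tip: ggplot2 draws points in row order, so if the rows are sorted by a covariate, later groups systematically cover earlier ones. Shuffling the rows with `sample_frac(., 1.0, replace = FALSE)` (the default arguments) in a pipeline removes this bias.
Adding a constant `alpha` (transparency) reduces overplotting further, and also helps you see density: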
```{r}
fortify(dm) %>%
  sample_frac() %>%  # shuffle rows so no group is always drawn on top
  ggplot(aes(DC1, DC2, colour = factor(num_cells))) +
  geom_point(alpha = .3)
```
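A small sketch on made-up, group-sorted data shows that `sample_frac()` with its defaults returns every row exactly once, only in a new order. In dplyr 1.0.0 and later, `slice_sample(prop = 1)` is the suggested replacement for the superseded `sample_frac()`. The `set.seed()` call is our addition, to make the order reproducible.

```{r}
library(dplyr)

# Made-up data sorted by group: plotted as-is, "late" points would cover "early" ones
sorted <- tibble(
  id    = 1:6,
  group = rep(c("early", "late"), each = 3)
)

set.seed(123)  # make the random order reproducible
shuffled_a <- sample_frac(sorted)             # defaults: size = 1, replace = FALSE
shuffled_b <- slice_sample(sorted, prop = 1)  # dplyr >= 1.0.0 equivalent
```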