---
title: "Visualize DAGs"
author: "Zuguang Gu ( z.gu@dkfz.de )"
date: '`r Sys.Date()`'
output:
  html_vignette:
    css: main.css
    toc: true
vignette: >
  %\VignetteIndexEntry{07. Visualize DAGs}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, echo = FALSE}
knitr::knit_hooks$set(pngquant = knitr::hook_pngquant)
knitr::opts_chunk$set(
    message = FALSE,
    dev = "ragg_png",
    fig.align = "center",
    pngquant = "--speed=10 --quality=30"
)
```
There are two functions for visualizing DAGs: `dag_graphviz()` uses the **DiagrammeR** package to visualize
small DAGs as HTML widgets, and `dag_circular_viz()` uses a circular layout to visualize large DAGs.

## Small DAGs

Let's first create a small DAG.
```{r}
library(simona)
parents = c("a", "a", "b", "b", "c", "d")
children = c("b", "c", "c", "d", "e", "f")
dag_small = create_ontology_DAG(parents, children)
```
```{r}
dag_graphviz(dag_small)
```
The argument `node_param` can be set to a list of graphical parameters for nodes. In the following example, `color` and `shape` each provide one value per term:
```{r}
color = 2:7
shape = c("polygon", "box", "oval", "egg", "diamond", "parallelogram")
dag_graphviz(dag_small, node_param = list(color = color, shape = shape))
```
A graphical parameter does not need to be a full-length vector. It can also be a
single value, which is applied to all nodes, or a named vector that customizes
only the nodes it names.
```{r}
color = c("a" = "red", "d" = "blue")
dag_graphviz(dag_small, node_param = list(color = color))
```
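The three accepted forms can be illustrated with a small, self-contained sketch. `expand_node_param()` below is a hypothetical helper written only for this explanation, not a simona function. It assumes that a full-length unnamed vector is matched to terms by position and that nodes not listed in a named vector simply keep the Graphviz default (represented here as `NA`).

```r
# Hypothetical helper for illustration only (not part of simona):
# expand one node parameter into a per-term vector.
expand_node_param = function(value, terms, default = NA) {
    if (is.null(names(value))) {
        if (length(value) == 1) {
            # a single unnamed value is applied to every node
            return(setNames(rep(value, length(terms)), terms))
        }
        # an unnamed vector must provide one value per term
        stopifnot(length(value) == length(terms))
        return(setNames(value, terms))
    }
    # a named vector only customizes the nodes it names;
    # all other nodes keep `default` (NA stands for "not set")
    out = setNames(rep(default, length(terms)), terms)
    hit = intersect(names(value), terms)
    out[hit] = value[hit]
    out
}

toy_terms = c("a", "b", "c", "d", "e", "f")
expand_node_param("red", toy_terms)
expand_node_param(2:7, toy_terms)
expand_node_param(c("a" = "red", "d" = "blue"), toy_terms)
```

With the named vector, only `a` and `d` receive a value; the other four nodes stay unset.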
The full list of node attributes is available at https://graphviz.org/docs/nodes/. All of them can
be set in the same way as `color` above.
The argument `edge_param` can be set to a list of graphical parameters for
edges. Edge colors can be controlled in two ways: by relation type, or for
individual edges. For the first way, we add relation types to the DAG:
```{r}
parents = c("a", "a", "b", "b", "c", "d")
children = c("b", "c", "c", "d", "e", "f")
relations = c("is_a", "is_a", "part_of", "part_of", "is_a", "is_a")
dag_small = create_ontology_DAG(parents, children, relations = relations)
```
Since each edge is now associated with a relation type, the color can be set
with a vector named by relation types:
```{r}
edge_color = c("is_a" = "red", "part_of" = "blue")
dag_graphviz(dag_small, edge_param = list(color = edge_color))
```
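Conceptually, a vector named by relation types acts as a lookup table: each edge takes the value stored under its relation type. The base-R sketch below shows this lookup on the same toy edges; it only illustrates the idea and is not how `dag_graphviz()` is implemented internally.

```r
# Toy edge list matching the example above
rel_parents  = c("a", "a", "b", "b", "c", "d")
rel_children = c("b", "c", "c", "d", "e", "f")
rel_types    = c("is_a", "is_a", "part_of", "part_of", "is_a", "is_a")
rel_color    = c("is_a" = "red", "part_of" = "blue")

# Index the named vector with each edge's relation type to get one color per edge
per_edge_color = unname(rel_color[rel_types])
data.frame(from = rel_parents, to = rel_children,
    relation = rel_types, color = per_edge_color)
```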
To highlight specific edges, the parameter can be set to a named vector whose
names specify individual edges in the form `"parent -> child"`:
```{r}
edge_color = c("a -> b" = "red", "c -> e" = "blue")
dag_graphviz(dag_small, edge_param = list(color = edge_color))
```
The direction of the arrow does not matter: all of the following specify the
same edge. However, there must be a space on each side of the arrow.

```
"a -> b" = "red"
"a <- b" = "red"
"b -> a" = "red"
"b <- a" = "red"
"a <-> b" = "red"
"a - b" = "red"
```
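One way to see why the direction is irrelevant is that each specification can be reduced to an unordered pair of terms. The sketch below is a hypothetical parser written for illustration, not simona's own code; it assumes term IDs contain no spaces and enforces the rule that the arrow must have a space on each side.

```r
# Hypothetical helper: reduce an edge specification to an unordered pair.
# The arrow ("->", "<-", "<->" or "-") must be surrounded by spaces.
edge_key = function(spec) {
    parts = strsplit(spec, " (<->|->|<-|-) ")[[1]]
    if (length(parts) != 2) {
        stop("cannot parse edge specification: ", spec)
    }
    # sorting the two terms makes the key independent of direction
    paste(sort(parts), collapse = " - ")
}

specs = c("a -> b", "a <- b", "b -> a", "b <- a", "a <-> b", "a - b")
vapply(specs, edge_key, character(1))
```

All six forms map to the same key `"a - b"`, while a specification without spaces such as `"a->b"` cannot be split and raises an error.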

The full list of edge attributes is available at
https://graphviz.org/docs/edges/. All of them can be set in the same way as
`edge_color` above.
Internally, `dag_graphviz()` generates DOT code for Graphviz
visualization. The DOT code can be obtained with `dag_as_DOT()`:
```{r, comment = ''}
dag_as_DOT(dag_small, node_param = list(color = color, shape = shape)) |> cat()
```
You can paste the DOT code into http://magjac.com/graphviz-visual-editor/ to render the diagram.
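To give an idea of what DOT code looks like, the following sketch assembles a minimal DOT digraph for the small DAG in base R. `edges_to_dot()` is a hypothetical helper; its output is much simpler than what `dag_as_DOT()` produces, but it is valid DOT and can be pasted into the same online editor.

```r
# Hypothetical helper for illustration only: build a bare-bones DOT digraph
# from an edge list, optionally coloring some nodes.
edges_to_dot = function(parents, children, node_color = character(0)) {
    nodes = unique(c(parents, children))
    node_lines = vapply(nodes, function(n) {
        if (n %in% names(node_color)) {
            sprintf('  "%s" [color="%s"];', n, node_color[[n]])
        } else {
            sprintf('  "%s";', n)
        }
    }, character(1))
    edge_lines = sprintf('  "%s" -> "%s";', parents, children)
    paste(c("digraph {", node_lines, edge_lines, "}"), collapse = "\n")
}

dot = edges_to_dot(c("a", "a", "b", "b", "c", "d"),
    c("b", "c", "c", "d", "e", "f"),
    node_color = c("a" = "red", "d" = "blue"))
cat(dot, "\n")
```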
`dag_graphviz()` is particularly useful for visualizing a sub-DAG extracted from
a large DAG, for example, all upstream terms of a GO term. In the following example,
`dag[, "GO:0010228"]` returns the sub-DAG consisting of `GO:0010228` and all of its upstream terms.
```{r}
dag = create_ontology_DAG_from_GO_db()
dag_graphviz(dag[, "GO:0010228"],
    node_param = list(
        fillcolor = c("GO:0010228" = "pink"),
        style = c("GO:0010228" = "filled")
    ),
    edge_param = list(
        color = c("is_a" = "purple", "part_of" = "darkgreen"),
        style = c("is_a" = "solid", "part_of" = "dashed")
    ), width = 600, height = 600)
```
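The set of upstream terms can be computed by repeatedly following parent links until no new terms appear. The base-R sketch below does this on the small edge list from the beginning of this vignette; `upstream_terms()` is a hypothetical helper for illustration, and simona's own `dag[, term]` subsetting may be implemented differently.

```r
# Hypothetical helper: collect a term and all of its ancestors by walking
# parent links upward, one level of the frontier at a time.
upstream_terms = function(parents, children, term) {
    visited = term
    frontier = term
    while (length(frontier) > 0) {
        up = unique(parents[children %in% frontier])
        frontier = setdiff(up, visited)   # only terms not seen before
        visited = c(visited, frontier)
    }
    visited
}

up_parents  = c("a", "a", "b", "b", "c", "d")
up_children = c("b", "c", "c", "d", "e", "f")
upstream_terms(up_parents, up_children, "e")
```

For `"e"` this returns `e`, `c`, `a` and `b`: `c` is its parent, and `a` and `b` are reached through `c`.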
## Large DAGs
Visualizing large DAGs is difficult because a term can have more than
one parent. `dag_circular_viz()` uses a circular layout to visualize
large DAGs.
```{r, fig.width = 9, fig.height = 7}
dag_circular_viz(dag)
```
In the circular layout, each circle corresponds to a specific depth (the maximal
distance to the root). The radius of a circle is proportional to the logarithm
of the number of terms whose depth is equal to or less than the depth of that
circle. On each circle, each term is assigned a width (a sector of the circle),
and its offspring terms are drawn only within that sector. The width is
proportional to the number of leaf terms in the sub-DAG rooted at that term.
Dot size corresponds to the number of child terms.
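The layout rules above can be made concrete on the small DAG from the beginning of this vignette. The base-R sketch below computes each term's depth (its maximal distance to the root), the radius of each depth's circle as the logarithm of the cumulative number of terms, and the number of leaf terms reachable from each term, which determines its sector width. It only illustrates the quantities described above; the exact scaling and drawing in `dag_circular_viz()` may differ.

```r
# Small DAG from the beginning of this vignette
lay_parents  = c("a", "a", "b", "b", "c", "d")
lay_children = c("b", "c", "c", "d", "e", "f")
lay_terms = unique(c(lay_parents, lay_children))

# Depth = maximal distance to the root. Relax every edge until no depth
# changes; in a DAG this terminates after a finite number of passes.
depth = setNames(rep(0, length(lay_terms)), lay_terms)
repeat {
    old = depth
    for (i in seq_along(lay_parents)) {
        depth[lay_children[i]] = max(depth[lay_children[i]],
            depth[lay_parents[i]] + 1)
    }
    if (identical(old, depth)) break
}
depth

# Radius for depth d: log of the number of terms with depth <= d
n_upto = cumsum(table(factor(depth, levels = 0:max(depth))))
radius = log(n_upto)
radius

# Sector width: number of leaf terms reachable from each term (itself included)
leaves = setdiff(lay_terms, lay_parents)
reachable = function(term) {
    found = term
    frontier = term
    while (length(frontier) > 0) {
        down = unique(lay_children[lay_parents %in% frontier])
        frontier = setdiff(down, found)
        found = c(found, frontier)
    }
    found
}
n_leaves = vapply(lay_terms, function(t) sum(reachable(t) %in% leaves), integer(1))
n_leaves
```

Note that `c` has depth 2 although it is a direct child of the root, because the longer path `a -> b -> c` defines its depth.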
By default, the DAG is cut below the root term, and each resulting sub-DAG is
assigned a different color. The child terms of the root are shown in the
legend. If the metadata data frame of the DAG contains a "name" column, its values
are used as legend labels; otherwise, term IDs are used.
The level at which the DAG is split is controlled by the argument `partition_by_level`.
Alternatively, the split can be controlled by `partition_by_size`, which sets roughly the maximal number of terms in each sub-DAG.
```{r, fig.width = 9, fig.height = 7}
dag_circular_viz(dag, partition_by_size = 5000)
```
`dag_treelize()` converts a DAG into a tree in which every term has only one parent.
The circular visualization of the resulting tree is as follows:
```{r, fig.width = 10, fig.height = 7}
tree = dag_treelize(dag)
dag_circular_viz(tree)
```
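One way to reduce a DAG to a tree is to keep, for each term, a single parent that lies exactly one level above it (depth being the maximal distance to the root), visiting parents from the top of the DAG downward. The base-R sketch below applies this rule to the small DAG. It is an illustration under that assumption, and `dag_treelize()` may choose parents differently; the depths are written out by hand for this toy DAG.

```r
# Small DAG from the beginning of this vignette
tr_parents  = c("a", "a", "b", "b", "c", "d")
tr_children = c("b", "c", "c", "d", "e", "f")
# Depths (maximal distance to the root), written out by hand for this DAG
tr_depth = c(a = 0, b = 1, c = 2, d = 2, e = 3, f = 3)

# Visit edges from shallow to deep parents; keep an edge only if the child
# sits exactly one level below the parent and has not been assigned yet.
keep = logical(length(tr_parents))
assigned = character(0)
for (i in order(tr_depth[tr_parents])) {
    child = tr_children[i]
    if (!(child %in% assigned) &&
        tr_depth[[child]] == tr_depth[[tr_parents[i]]] + 1) {
        keep[i] = TRUE
        assigned = c(assigned, child)
    }
}
data.frame(parent = tr_parents[keep], child = tr_children[keep])
```

Here `c` keeps only `b` as its parent: `b` is one level above `c`, whereas `a` is two levels above it, so the edge `a -> c` is dropped.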
One useful application is to map GO terms of interest (e.g. significant GO terms
from a functional enrichment analysis) onto the DAG. In the following example, `go_tb`
contains the results of a GO enrichment analysis, including term IDs (column `ID`)
and adjusted p-values (column `p.adjust`).
```{r, fig.width = 10, fig.height = 7}
go_tb = readRDS(system.file("extdata", "sig_go_tb.rds", package = "simona"))
sig_go_ids = go_tb$ID[go_tb$p.adjust < 0.01]
# keep only the IDs that exist in the installed GO.db version
sig_go_ids = intersect(sig_go_ids, dag_all_terms(dag))
dag_circular_viz(dag, highlight = sig_go_ids)
```
In the next example, we will map `-log10(p.adjust)` to the node size.
```{r}
# look up adjusted p-values by ID so that they stay aligned with `sig_go_ids`
p.adjust = go_tb$p.adjust[match(sig_go_ids, go_tb$ID)]
```
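Because `intersect()` may drop IDs that are missing from the installed GO.db version, the adjusted p-values are safest looked up by ID rather than filtered by position, so that each value stays paired with its own term. A toy illustration with made-up IDs and p-values:

```r
# Toy enrichment table (IDs and p-values are made up for illustration)
toy_tb = data.frame(ID = c("T1", "T2", "T3", "T4"),
    p.adjust = c(0.001, 0.005, 0.02, 0.0001),
    stringsAsFactors = FALSE)
toy_dag_terms = c("T1", "T3", "T4")   # suppose T2 is missing from the DAG

toy_ids = intersect(toy_tb$ID[toy_tb$p.adjust < 0.01], toy_dag_terms)
# Filtering by p-value alone would return 3 values for only 2 IDs;
# matching by ID keeps each p-value paired with its term.
toy_p = toy_tb$p.adjust[match(toy_ids, toy_tb$ID)]
setNames(toy_p, toy_ids)
```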
`dag_circular_viz()` has a `node_size` argument for setting the sizes of individual terms,
so we only need to convert the adjusted p-values into node sizes.
In the following code, we define a simple `node_size_fun()` function that
linearly maps values to node sizes in [2, 10]; values above the 95th percentile are capped at the maximal size.
```{r}
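node_size_fun = function(x, range = c(2, 10), ref = x) {
    # linearly map `x` onto `range`, using the minimum and the 95th percentile
    # of `ref` as anchors; values outside the anchors are clamped to `range`
    ref_min = min(ref)
    ref_max = quantile(ref, 0.95, names = FALSE)
    s = (range[2] - range[1])/(ref_max - ref_min) * (x - ref_min) + range[1]
    pmin(pmax(s, range[1]), range[2])
}
```

We also generate a legend for the node sizes. The legend values are scaled with
the observed `-log10(p.adjust)` as the reference (argument `ref`), so that the
point sizes in the legend match the node sizes in the plot:

```{r}
library(ComplexHeatmap)
lgd_at = -log10(c(0.01, 0.001, 0.0001))
lgd = Legend(title = "p.adjust", at = lgd_at,
    labels = c("0.01", "0.001", "0.0001"), type = "points",
    size = unit(node_size_fun(lgd_at, ref = -log10(p.adjust)), "pt"))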
```
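Anchoring the upper end of the mapping at the 95th percentile rather than at the maximum presumably keeps a few extremely small p-values from squeezing all other nodes toward the minimal size. The standalone sketch below repeats the same linear rule on made-up scores with one extreme value and compares it with scaling by the maximum; `scale_sizes()` is a hypothetical name used only here.

```r
# Hypothetical name, same rule as node_size_fun(): map [min, 95th percentile]
# linearly onto [2, 10] and cap larger values at 10.
scale_sizes = function(x, range = c(2, 10)) {
    lo = min(x)
    hi = quantile(x, 0.95, names = FALSE)
    s = (range[2] - range[1]) / (hi - lo) * (x - lo) + range[1]
    pmin(s, range[2])
}

toy_x = c(2:20, 100)   # made-up scores with one extreme value
round(scale_sizes(toy_x), 2)
# for comparison: anchoring the upper end at the maximum instead
round((toy_x - min(toy_x)) / (max(toy_x) - min(toy_x)) * 8 + 2, 2)
```

With the percentile anchor, the score 20 gets a size of about 8.5; anchored at the maximum, it would get only about 3.5, and most nodes would look alike.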
Next, we calculate the node sizes. Terms that are not significant get the minimal size of 2:
```{r}
node_size = rep(2, dag_n_terms(dag))
names(node_size) = dag_all_terms(dag)
node_size[sig_go_ids] = node_size_fun(-log10(p.adjust))
```
Finally, we make the circular plot:
```{r, fig.width = 10, fig.height = 7}
dag_circular_viz(dag,
    highlight = sig_go_ids,
    node_size = node_size,
    edge_transparency = 0.92,
    other_legends = lgd)
```
## Session Info
```{r}
sessionInfo()
```