--- title: "Visualize DAGs" author: "Zuguang Gu ( z.gu@dkfz.de )" date: '`r Sys.Date()`' output: html_vignette: css: main.css toc: true vignette: > %\VignetteIndexEntry{07. Visualize DAGs} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, echo = FALSE} knitr::knit_hooks$set(pngquant = knitr::hook_pngquant) knitr::opts_chunk$set( message = FALSE, dev = "ragg_png", fig.align = "center", pngquant = "--speed=10 --quality=30" ) ``` There are two functions for visualizing DAGs. `dag_graphviz()` uses the **DiagrammeR** package to visualize small DAGs as HTML widgets. `dag_circular_viz()` uses a circular layout for large DAGs. ## Small DAGs Let's first create a small DAG. ```{r} library(simona) parents = c("a", "a", "b", "b", "c", "d") children = c("b", "c", "c", "d", "e", "f") dag_small = create_ontology_DAG(parents, children) ``` ```{r} dag_graphviz(dag_small) ``` The argument `node_param` can be set to a list of graphical parameters. ```{r} color = 2:7 shape = c("polygon", "box", "oval", "egg", "diamond", "parallelogram") dag_graphviz(dag_small, node_param = list(color = color, shape = shape)) ``` The graphical parameters are not necessary to be a vector. It can be a single value which affects all nodes, or a named vector that contains a subset of nodes to be customized. ```{r} color = c("a" = "red", "d" = "blue") dag_graphviz(dag_small, node_param = list(color = color)) ``` The full set of node-level parameters can be found at: https://graphviz.org/docs/nodes/. They can all be set in the same format as `color` demonstrated above. The argument `edge_param` can be set to a list of graphical parameters for configuring edges. There are two ways to control edge colors. In the following code, we additionally add the relation types to the DAG. ```{r} parents = c("a", "a", "b", "b", "c", "d") children = c("b", "c", "c", "d", "e", "f") relations = c("is_a", "is_a", "part_of", "part_of", "is_a", "is_a") dag_small = create_ontology_DAG(parents, children, relations = relations) ``` Now since each edge is associated with a relation type, the color can be set by a vector with relation types as names: ```{r} edge_color = c("is_a" = "red", "part_of" = "blue") dag_graphviz(dag_small, edge_param = list(color = edge_color)) ``` To highlight specific edges, the parameter can be set to a named vector where names directly contain relations. ```{r} edge_color = c("a -> b" = "red", "c -> e" = "blue") dag_graphviz(dag_small, edge_param = list(color = edge_color)) ``` The direction in the specification does not matter. The following ways are all the same, but there must be spaces before and after the arrow. ``` "a -> b" = "red" "a <- b" = "red" "b -> a" = "red" "b <- a" = "red" "a <-> b" = "red" "a - b" = "red" ``` The full set of edge-level parameters can be found at https://graphviz.org/docs/edges/. They can all be set in the same format as `edge_color` demonstrated above. Internally, `dag_graphviz()` generates the "DOT" code for graphiviz visualization. The DOT code can be obtained with `dag_as_DOT()`: ```{r, comment = ''} dag_as_DOT(dag_small, node_param = list(color = color, shape = shape)) |> cat() ``` You can paste the DOT code to http://magjac.com/graphviz-visual-editor/ to generate the diagram. `dag_graphviz()` is very useful for visualizing a sub-DAG derived from the global DAG. For example, all upstream terms of a GO term. Recall in the following example, `dag[, "GO:0010228"]` returns a sub-DAG of all upstream terms of `GO:0010228`. ```{r} dag = create_ontology_DAG_from_GO_db() dag_graphviz(dag[, "GO:0010228"], node_param = list( fillcolor = c("GO:0010228" = "pink"), style = c("GO:0010228" = "filled") ), edge_param = list( color = c("is_a" = "purple", "part_of" = "darkgreen"), style = c("is_a" = "solid", "part_of" = "dashed") ), width = 600, height = 600) ``` ## Large DAGs Visualizing large DAGs is not an easy job because a term can have more than one parents. Here the `dag_circular_viz()` uses a circular layout to visualize large DAGs. ```{r, fig.width = 9, fig.height = 7} dag_circular_viz(dag) ``` In the circular layout, each circle correspond to a specific depth (maximal distance to root). The distance of a circle to the circle center is proportional to the logorithm of the number of terms with depth equal to or less than the current depth of this circle. On each circle, each term has a width (or a sector on the circle) associated where offspring terms are only drawn within that section. The width is proportional to the number of leaf terms in the corresponding sub-DAG. Dot size corresponds to the number of child terms. By default, the DAG is cut after the root term, and each sub-DAG is assigned with a different color. Child terms of root is added in the legend in the plot. If there is a "name" column in the meta data frame, texts in the "name" column are used as the legend labels, or else term IDs are used. By default the DAG is split on a certain level controlled by the argument `partition_by_level`. It can also be controlled by setting the possible number of terms in each sub-DAG. ```{r, fig.width = 9, fig.height = 7} dag_circular_viz(dag, partition_by_size = 5000) ``` `dag_treelize()` can convert a DAG to a tree where a term only has one parent. The circular visualization on the reduced tree is as follows: ```{r, fig.width = 10, fig.height = 7} tree = dag_treelize(dag) dag_circular_viz(tree) ``` One useful application is to map GO terms of interest (e.g. significant GO terms from function enrichment analysis) to the DAG. In the following example, `go_tb` contains GO terms from an enrichment analysis. ```{r, fig.width = 10, fig.height = 7} go_tb = readRDS(system.file("extdata", "sig_go_tb.rds", package = "simona")) sig_go_ids = go_tb$ID[go_tb$p.adjust < 0.01] # make sure `sig_go_ids` all in current GO.db version sig_go_ids = intersect(sig_go_ids, dag_all_terms(dag)) dag_circular_viz(dag, highlight = sig_go_ids) ``` In the next example, we will map `-log10(p.adjust)` to the node size. ```{r} p.adjust = go_tb$p.adjust[go_tb$p.adjust < 0.01] ``` `dag_circular_viz()` has a `node_size` argument which allows to set node sizes for terms, thus, we only need to calculate node sizes by the adjusted p-values. In the following code, we defined a simple `node_size_fun()` function that linearly interpolates values to node sizes within [2, 10]. ```{r} node_size_fun = function(x, range = c(2, 10)) { s = (range[2] - range[1])/(quantile(x, 0.95) - min(x)) * (x - min(x)) + range[1] s[s > range[2]] = range[2] s } ``` We also generate a legend for the node sizes: ```{r} library(ComplexHeatmap) lgd = Legend(title = "p.adjust", at = -log10(c(0.01, 0.001, 0.0001)), labels = c("0.01", "0.001", "0.0001"), type = "points", size = unit(node_size_fun(-log10(c(0.01, 0.001, 0.0001))), "pt")) ``` Calculate node sizes: ```{r} node_size = rep(2, dag_n_terms(dag)) names(node_size) = dag_all_terms(dag) node_size[sig_go_ids] = node_size_fun(-log10(p.adjust)) ``` And finally make the circular plot: ```{r, fig.width = 10, fig.height = 7} dag_circular_viz(dag, highlight = sig_go_ids, node_size = node_size, edge_transparency = 0.92, other_legends = lgd) ``` ## Session Info ```{r} sessionInfo() ```