Making A Single Heatmap
========================================

**Author**: Zuguang Gu ( z.gu@dkfz.de )

**Date**: `r Sys.Date()`

-------------------------------------------------------------

```{r global_settings, echo = FALSE, message = FALSE}
library(markdown)
options(markdown.HTML.options = c(options('markdown.HTML.options')[[1]], "toc"))

library(knitr)
knitr::opts_chunk$set(
    error = FALSE,
    tidy = FALSE,
    message = FALSE,
    fig.align = "center",
    fig.width = 5,
    fig.height = 5)
options(markdown.HTML.stylesheet = "custom.css")

options(width = 100)
```

A single heatmap is mostly used for a quick view of the data. It is a special case of
a heatmap list which contains only one heatmap. Compared with other available tools, the **ComplexHeatmap**
package provides a more flexible way to visualize a single heatmap. In the following examples, we will
demonstrate how to set parameters to visualize a single heatmap.

First let's load the packages and generate a random matrix:

```{r data}
library(ComplexHeatmap)
library(circlize)

set.seed(123)

mat = cbind(rbind(matrix(rnorm(16, -1), 4), matrix(rnorm(32, 1), 8)),
            rbind(matrix(rnorm(24, 1), 4), matrix(rnorm(48, -1), 8)))

# permute the rows and columns
mat = mat[sample(nrow(mat), nrow(mat)), sample(ncol(mat), ncol(mat))]

rownames(mat) = paste0("R", 1:12)
colnames(mat) = paste0("C", 1:10)
```

Plot the heatmap with default settings. The default style of the heatmap is much the same
as that generated by other similar heatmap functions.

```{r default}
Heatmap(mat)
```

## Colors

In most cases, the heatmap visualizes a matrix with continuous values.
In this case, the user should provide a color mapping function. A color mapping function
should accept a vector of values and return a vector of corresponding colors. The `colorRamp2()` function from
the **circlize** package is helpful for generating such functions. The two arguments for `colorRamp2()`
are a vector of break values and a vector of corresponding colors.
Currently `colorRamp2()` linearly interpolates colors in every interval through LAB color space.
In the following example, values between -3 and 3 are linearly interpolated to obtain corresponding colors,
values larger than 3 are all mapped to red and values less than -3 are all mapped to green (so the color mapping
function demonstrated here is robust to outliers).

```{r color_fun}
mat2 = mat
mat2[1, 1] = 100000
Heatmap(mat2, col = colorRamp2(c(-3, 0, 3), c("green", "white", "red")),
    cluster_rows = FALSE, cluster_columns = FALSE)
```

If the matrix is continuous, you can also provide a vector of colors, and colors will be interpolated
according to the 'k'th quantile. But remember this method is not robust to outliers.

```{r color_vector}
Heatmap(mat, col = rev(rainbow(10)))
```

If the matrix contains discrete values (either numeric or character), colors should be specified as
a named vector so that the discrete values can be mapped to colors. If the colors have no names,
the order of colors corresponds to the order of `unique(mat)`.

```{r discrete_matrix}
discrete_mat = matrix(sample(1:4, 100, replace = TRUE), 10, 10)
colors = structure(circlize::rand_color(4), names = c("1", "2", "3", "4"))
Heatmap(discrete_mat, col = colors)
```

Or a character matrix:

```{r discrete_character_matrix}
discrete_mat = matrix(sample(letters[1:4], 100, replace = TRUE), 10, 10)
colors = structure(circlize::rand_color(4), names = letters[1:4])
Heatmap(discrete_mat, col = colors)
```

As you can see, for a numeric matrix (whether the mapping is continuous or discrete),
clustering is applied on both dimensions by default, while for a character matrix, clustering is suppressed.

``NA`` is allowed in the heatmap. You can control the color of `NA` with the `na_col` argument.
A matrix which contains `NA` can also be clustered by `Heatmap()` (since `dist()` accepts `NA` values),
but clustering a matrix with `NA` values by the "pearson", "spearman" or "kendall" method gives warning messages.

```{r na_value}
mat_with_na = mat
mat_with_na[sample(c(TRUE, FALSE), nrow(mat)*ncol(mat), replace = TRUE, prob = c(1, 9))] = NA
Heatmap(mat_with_na, na_col = "orange", clustering_distance_rows = "pearson")
```

Color space is important for interpolating colors. By default, colors are linearly interpolated in
[LAB color space](https://en.wikipedia.org/wiki/Lab_color_space), but you can select the color space
in the `colorRamp2()` function. Compare the following two plots (the `+` operation on two heatmaps will be introduced in
the [**Making a list of heatmaps**](s3.a_list_of_heatmaps.html) vignette):

```{r, fig.width = 10}
f1 = colorRamp2(seq(min(mat), max(mat), length = 3), c("blue", "#EEEEEE", "red"))
f2 = colorRamp2(seq(min(mat), max(mat), length = 3), c("blue", "#EEEEEE", "red"), space = "RGB")
Heatmap(mat, col = f1, column_title = "LAB color space") +
Heatmap(mat, col = f2, column_title = "RGB color space")
```

In the following figure, corresponding values change evenly along the folded axis, so you can see how colors change
under different color spaces (the plot is made by the **HilbertCurve** package). Choosing a proper color space is a little
bit subjective and depends on the specific data and color theme. Sometimes you need to try several color spaces
to determine the one which best reveals the potential structure of your data.
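The effect of the color space can also be inspected on a handful of values rather than on a whole plot. The following is a minimal sketch, not part of the original vignette: the break values and the test vector `x` are chosen only for illustration, and it relies on `colorRamp2()` and its `space` argument as used above.

```r
library(circlize)

breaks = c(-1, 0, 1)
cols = c("blue", "#EEEEEE", "red")

# same breaks and colors, different interpolation space
f_lab = colorRamp2(breaks, cols)                  # default: LAB
f_rgb = colorRamp2(breaks, cols, space = "RGB")

# illustrative values: the breaks themselves plus two interior points
x = c(-1, -0.5, 0, 0.5, 1)
data.frame(value = x, LAB = f_lab(x), RGB = f_rgb(x))

# values outside the breaks are mapped to the colors of the outermost breaks
f_lab(100)
```

Printing the data frame shows the hex codes side by side, which can make it easier to see where two color spaces diverge between the breaks.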
```{r, fig.width = 14, fig.height = 14/5, echo = FALSE, message = FALSE}
suppressPackageStartupMessages(library(HilbertCurve))
suppressPackageStartupMessages(library(IRanges))
space = c("RGB", "LAB", "XYZ", "sRGB", "LUV")
pushViewport(viewport(layout = grid.layout(nr = 1, nc = length(space))))
for(i in seq_along(space)) {
    pushViewport(viewport(layout.pos.row = 1, layout.pos.col = i))
    hc = HilbertCurve(1, 100, level = 4, newpage = FALSE, title = space[i])
    ir = IRanges(start = 1:99, end = 2:100)
    f = colorRamp2(c(-1, 0, 1), c("green", "black", "red"), space = space[i])
    col = f(seq(-1, 1, length = 100))
    hc_points(hc, ir, np = 3, gp = gpar(col = col, fill = col))
    upViewport()
}
upViewport()
grid.newpage()
pushViewport(viewport(layout = grid.layout(nr = 1, nc = length(space))))
for(i in seq_along(space)) {
    pushViewport(viewport(layout.pos.row = 1, layout.pos.col = i))
    hc = HilbertCurve(1, 100, level = 4, newpage = FALSE, title = space[i])
    ir = IRanges(start = 1:99, end = 2:100)
    f = colorRamp2(c(-1, 0, 1), c("blue", "white", "red"), space = space[i])
    col = f(seq(-1, 1, length = 100))
    hc_points(hc, ir, np = 3, gp = gpar(col = col, fill = col))
    upViewport()
}
upViewport()
```

## Titles

The name of the heatmap is used by default as the title of the heatmap legend.
The name also serves as a unique id when you plot more than one heatmap together.
Later we can use this name to go to the corresponding heatmap and add more graphics
(see the [**Heatmap Decoration**](s6.heatmap_decoration.html) vignette).

```{r with_matrix_name}
Heatmap(mat, name = "foo")
```

The title of the heatmap legend can be modified by `heatmap_legend_param`
(see the [**Heatmap and Annotation Legends**](s5.legend.html) vignette for more control over the legend).

```{r heatmap_legend_title}
Heatmap(mat, heatmap_legend_param = list(title = "legend"))
```

You can put heatmap titles either by the rows or by the columns. Note that at any one time
a title can only be on one side, e.g. the column title is either on the top or at the bottom of the heatmap.
The graphic parameters can be set by `row_title_gp` and `column_title_gp` respectively.
Please remember to use `gpar()` to specify graphic parameters.

```{r row_column_title}
Heatmap(mat, name = "foo", column_title = "I am a column title",
    row_title = "I am a row title")
Heatmap(mat, name = "foo", column_title = "I am a column title at the bottom",
    column_title_side = "bottom")
Heatmap(mat, name = "foo", column_title = "I am a big column title",
    column_title_gp = gpar(fontsize = 20, fontface = "bold"))
```

Rotations for titles can be set by `row_title_rot` and `column_title_rot`, but only horizontal and vertical
rotations are allowed.

```{r title_rotation}
Heatmap(mat, name = "foo", row_title = "row title", row_title_rot = 0)
```

## Clustering

Clustering may be the key feature of heatmap visualization. In the **ComplexHeatmap** package, clustering is supported
with high flexibility. You can specify the clustering either by a pre-defined method (e.g. "euclidean" or "pearson"),
or by a distance function, or by an object that already contains a clustering, or directly by a clustering function.
It is also possible to render your dendrograms with different colors and styles for different branches to better
reveal the structure of your data.

First there are general settings for the clustering, e.g. whether to perform clustering or show dendrograms, the side
of the dendrograms and the size of the dendrograms.

```{r cluster_basic}
Heatmap(mat, name = "foo", cluster_rows = FALSE)
Heatmap(mat, name = "foo", show_column_dend = FALSE)
Heatmap(mat, name = "foo", row_dend_side = "right")
Heatmap(mat, name = "foo", column_dend_height = unit(2, "cm"))
```

There are three ways to specify the distance metric for clustering:

- specify the distance as a pre-defined option. The valid values are the methods supported by the `dist()`
  function, plus `pearson`, `spearman` and `kendall`. `NA` values are ignored
  for pre-defined clustering, but warnings are given (see the example in the **Colors** section).
- a self-defined function which calculates distances from a matrix. The function should
  have only one argument. Please note that for clustering on columns, the matrix will be transposed
  automatically.
- a self-defined function which calculates the distance between two vectors. The function should
  have exactly two arguments.

```{r cluster_distance}
Heatmap(mat, name = "foo", clustering_distance_rows = "pearson")
Heatmap(mat, name = "foo", clustering_distance_rows = function(m) dist(m))
Heatmap(mat, name = "foo", clustering_distance_rows = function(x, y) 1 - cor(x, y))
```

Based on this feature, we can apply clustering which is robust to outliers by defining the pair-wise distance ourselves.

```{r cluster_distance_advanced}
mat_with_outliers = mat
for(i in 1:10) mat_with_outliers[i, i] = 1000
robust_dist = function(x, y) {
    qx = quantile(x, c(0.1, 0.9))
    qy = quantile(y, c(0.1, 0.9))
    l = x > qx[1] & x < qx[2] & y > qy[1] & y < qy[2]
    x = x[l]
    y = y[l]
    sqrt(sum((x - y)^2))
}
Heatmap(mat_with_outliers, name = "foo",
    col = colorRamp2(c(-3, 0, 3), c("green", "white", "red")),
    clustering_distance_rows = robust_dist,
    clustering_distance_columns = robust_dist)
```

If a suitable distance method is provided, you can also cluster a character matrix.
The `cell_fun` argument will be explained in a later section.
```{r cluster_character_matrix}
mat_letters = matrix(sample(letters[1:4], 100, replace = TRUE), 10)
# distance in the ASCII table
dist_letters = function(x, y) {
    x = strtoi(charToRaw(paste(x, collapse = "")), base = 16)
    y = strtoi(charToRaw(paste(y, collapse = "")), base = 16)
    sqrt(sum((x - y)^2))
}
Heatmap(mat_letters, name = "foo", col = structure(2:5, names = letters[1:4]),
    clustering_distance_rows = dist_letters, clustering_distance_columns = dist_letters,
    cell_fun = function(j, i, x, y, w, h, col) {
        grid.text(mat_letters[i, j], x, y)
    })
```

The method for hierarchical clustering can be specified by `clustering_method_rows` and
`clustering_method_columns`. Possible methods are those supported by the `hclust()` function.

```{r cluster_method}
Heatmap(mat, name = "foo", clustering_method_rows = "single")
```

By default, clustering is performed by `hclust()`. But you can also use clustering results
which are generated by other methods by setting `cluster_rows` or `cluster_columns` to a
`hclust` or `dendrogram` object. In the following examples, we use the `diana()` and `agnes()` methods
from the **cluster** package to perform the clustering.

```{r cluster_object}
library(cluster)
Heatmap(mat, name = "foo", cluster_rows = as.dendrogram(diana(mat)),
    cluster_columns = as.dendrogram(agnes(t(mat))))
```

In the native `heatmap()` function, dendrograms on rows and on columns are reordered so that features with larger
differences are separated more from each other. By default, the reordering of the dendrograms is turned on in `Heatmap()` as well.

Besides the default reordering method, you can first generate a dendrogram, apply another reordering
method and then send the reordered dendrogram to the `cluster_rows` argument.
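As a small illustration of that workflow, a dendrogram can be rotated with the `reorder()` method for dendrograms from base R before it is passed on. This is only a sketch under assumptions: the matrix `m` and the choice of negative row means as weights are made up for the example and are not what `Heatmap()` uses internally.

```r
set.seed(123)
# illustrative data: 8 rows, 5 columns
m = matrix(rnorm(40), nrow = 8, dimnames = list(paste0("R", 1:8), NULL))

# build the row dendrogram once
dend = as.dendrogram(hclust(dist(m)))

# rotate branches so that rows with larger means tend to come first;
# the tree topology itself is unchanged, only the leaf order may differ
dend2 = reorder(dend, wts = -rowMeans(m), agglo.FUN = mean)

order.dendrogram(dend)   # leaf order before reordering
order.dendrogram(dend2)  # leaf order after reordering
```

The reordered object could then be used as `Heatmap(m, cluster_rows = dend2, row_dend_reorder = FALSE)`, so that the custom leaf order is kept.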
Compare the following three plots:

```{r cluster_dendsort, fig.width = 14}
pushViewport(viewport(layout = grid.layout(nr = 1, nc = 3)))
pushViewport(viewport(layout.pos.row = 1, layout.pos.col = 1))
draw(Heatmap(mat, name = "foo", row_dend_reorder = FALSE, column_title = "no reordering"), newpage = FALSE)
upViewport()

pushViewport(viewport(layout.pos.row = 1, layout.pos.col = 2))
draw(Heatmap(mat, name = "foo", row_dend_reorder = TRUE, column_title = "applied reordering"), newpage = FALSE)
upViewport()

library(dendsort)
dend = dendsort(hclust(dist(mat)))
pushViewport(viewport(layout.pos.row = 1, layout.pos.col = 3))
draw(Heatmap(mat, name = "foo", cluster_rows = dend, row_dend_reorder = FALSE,
    column_title = "reordering by dendsort"), newpage = FALSE)
upViewport(2)
```

You can render your `dendrogram` object with the **dendextend** package and make a more customized
visualization of the dendrogram.

```{r cluster_dendextend}
library(dendextend)
dend = hclust(dist(mat))
dend = color_branches(dend, k = 2)
Heatmap(mat, name = "foo", cluster_rows = dend)
```

More generally, `cluster_rows` and `cluster_columns` can be functions which calculate the clustering.
The input argument of the self-defined function should be a matrix and the returned value should be a `hclust` or `dendrogram`
object. Please note that when `cluster_rows` is executed internally, the argument `m` is the input `mat` itself,
while `m` is the transpose of `mat` when `cluster_columns` is executed.

```{r cluster_function}
Heatmap(mat, name = "foo", cluster_rows = function(m) as.dendrogram(diana(m)),
    cluster_columns = function(m) as.dendrogram(agnes(m)))
```

`fastcluster::hclust` implements a faster version of `hclust`. We can define `cluster_rows` and
`cluster_columns` to use the faster version of `hclust`. But note that `fastcluster::hclust` only speeds up
the clustering step, not the calculation of the distance matrix.
```{r}
# code not run when building the vignette
Heatmap(mat, name = "foo", cluster_rows = function(m) fastcluster::hclust(dist(m)),
    cluster_columns = function(m) fastcluster::hclust(dist(m))) # for column cluster, m will be automatically transposed
```

To make it more convenient to use the faster version of `hclust` (assuming you have many heatmaps to be concatenated),
it can be set as a global option:

```{r}
# code not run when building the vignette
ht_global_opt(fast_hclust = TRUE)
# now hclust from fastcluster package is used in all heatmaps
Heatmap(mat, name = "foo")
```

Clustering can help to adjust the order of rows and columns. But you can still set the order manually with `row_order`
and `column_order`. Note that you need to turn off clustering
if you want to set the order manually. `row_order` and `column_order` can also be set according to the matrix row names and column names if they exist.

```{r manual_order}
Heatmap(mat, name = "foo", cluster_rows = FALSE, cluster_columns = FALSE,
    row_order = 12:1, column_order = 10:1)
```

Note that `row_dend_reorder` and `row_order` are different. `row_dend_reorder` is applied on the dendrogram.
For any node in the dendrogram, rotating its two branches gives an equivalent dendrogram.
Thus, reordering the dendrogram by automatically rotating the sub-dendrogram at every node helps
to place elements with larger differences farther from each other. In contrast, `row_order` is
applied on the matrix and dendrograms are suppressed.

## Dimension names

Side, visibility and graphic parameters for dimension names can be set as follows.
```{r dimension_name}
Heatmap(mat, name = "foo", row_names_side = "left", row_dend_side = "right",
    column_names_side = "top", column_dend_side = "bottom")
Heatmap(mat, name = "foo", show_row_names = FALSE)
Heatmap(mat, name = "foo", row_names_gp = gpar(fontsize = 20))
Heatmap(mat, name = "foo", row_names_gp = gpar(col = c(rep("red", 4), rep("blue", 8))))
```

Currently, rotations for column names and row names are not supported (this may change in future versions),
because after text rotation the dimension names would extend into other heatmap components, which
would mess up the heatmap layout. However, as will be introduced in the [**Heatmap Annotation**](s4.heatmap_annotation.html)
vignette, text rotation is allowed in heatmap annotations. Thus, users can provide a row annotation
or column annotation which only contains rotated text to simulate rotated row/column names (you will see an example
in the [**Heatmap Annotation**](s4.heatmap_annotation.html) vignette).

## Split heatmap by rows

A heatmap can be split by rows. This enhances the visualization of group separation in the heatmap.
A `km` argument with a value larger than 1 means that k-means clustering is applied on the rows, and
hierarchical clustering is then applied within every k-means cluster.

```{r k_means}
Heatmap(mat, name = "foo", km = 2)
```

More generally, `split` can be set to a vector or a data frame in which different combinations of levels
split the rows of the heatmap. In fact, k-means clustering just generates a vector of row classes and appends
it to `split` as one additional column. The combined row titles for each row slice can be controlled by the `combined_name_fun` argument.
The order of the slices can be controlled by the `levels` of each variable in `split`.
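To get a feeling for which row slices a data frame defines, the combinations of levels can be tabulated with base R. This is only an analogy sketched under assumptions (the column names `g1` and `g2` and the `/` separator are made up here); it is not a description of how `Heatmap()` builds the slices internally.

```r
# two grouping variables for 12 rows, mirroring the data frame form of `split`
split_df = data.frame(g1 = rep(c("A", "B"), 6),
                      g2 = rep(c("C", "D"), each = 6))

# every observed combination of levels corresponds to one candidate row slice
slice = interaction(split_df, sep = "/", drop = TRUE)
table(slice)

# the order of levels in a factor is what a level-based ordering would follow
g1 = factor(split_df$g1, levels = c("B", "A"))
levels(g1)
```

Here four combinations occur, each covering three rows, so a heatmap split by this data frame would be expected to show four row slices.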
```{r split}
Heatmap(mat, name = "foo", split = rep(c("A", "B"), 6))
Heatmap(mat, name = "foo", split = data.frame(rep(c("A", "B"), 6), rep(c("C", "D"), each = 6)))
Heatmap(mat, name = "foo", split = data.frame(rep(c("A", "B"), 6), rep(c("C", "D"), each = 6)),
    combined_name_fun = function(x) paste(x, collapse = "\n"))
Heatmap(mat, name = "foo", km = 2, split = factor(rep(c("A", "B"), 6), levels = c("B", "A")),
    combined_name_fun = function(x) paste(x, collapse = "\n"))
Heatmap(mat, name = "foo", km = 2, split = rep(c("A", "B"), 6), combined_name_fun = NULL)
```

If you are not happy with the default k-means partitioning method, it is easy to use other partitioning methods
by just assigning the partitioning vector to `split`.

```{r pam}
pa = pam(mat, k = 3)
Heatmap(mat, name = "foo", split = paste0("pam", pa$clustering))
```

If ``row_order`` is set, rows are still ordered accordingly within each slice.

```{r split_row_order}
Heatmap(mat, name = "foo", row_order = 12:1, cluster_rows = FALSE, km = 2)
```

The height of the gaps between row slices can be controlled by `gap` (a single unit or a vector of units).

```{r split_gap}
Heatmap(mat, name = "foo", split = paste0("pam", pa$clustering), gap = unit(5, "mm"))
```

A character matrix can only be split by the `split` argument.

```{r split_discrete_matrix}
Heatmap(discrete_mat, name = "foo", col = 1:4,
    split = rep(letters[1:2], each = 5))
```

When rows are split, graphic parameters for the row title and row names can be specified as vectors with the same length
as the number of row slices.

```{r split_graphical_parameter}
Heatmap(mat, name = "foo", km = 2, row_title_gp = gpar(col = c("red", "blue"), font = 1:2),
    row_names_gp = gpar(col = c("green", "orange"), fontsize = c(10, 14)))
```

Users may already have a dendrogram for rows and want to split the rows by splitting the dendrogram into k sub trees.
In this case, `split` can be specified as a single number:

```{r split_dendrogram}
dend = hclust(dist(mat))
dend = color_branches(dend, k = 2)
Heatmap(mat, name = "foo", cluster_rows = dend, split = 2)
```

Or they can simply split rows by specifying `split` as an integer without providing a dendrogram. Note that this is different from using `km`.
If `km` is set, k-means clustering is applied first and hierarchical clustering is applied within every k-means cluster,
while if `split` is an integer, clustering is applied to the whole matrix and the dendrogram is later split by `cutree()`.

```{r}
Heatmap(mat, name = "foo", split = 2)
```

## Self define the heatmap body

The `rect_gp` argument provides basic graphic settings for the heatmap body (note that the `fill` parameter is disabled).

```{r rect_gp}
Heatmap(mat, name = "foo", rect_gp = gpar(col = "green", lty = 2, lwd = 2))
```

The heatmap body can be self-defined. By default the heatmap body is composed of an array of rectangles (called cells here)
filled with different colors. If `type` in `rect_gp` is set to `none`, the array of cells is initialized but
no graphics are drawn. Users can then define their own graphic function with `cell_fun`. `cell_fun` is applied to every cell in the heatmap
and provides the following information on the 'current' cell:

- `j`: column index in the matrix. The column index corresponds to the x-direction in the viewport, which is why `j`
  is put as the first argument.
- `i`: row index in the matrix.
- `x`: x coordinate of the middle point of the cell, measured in the viewport of the heatmap body.
- `y`: y coordinate of the middle point of the cell, measured in the viewport of the heatmap body.
- `width`: width of the cell.
- `height`: height of the cell.
- `fill`: color of the cell.
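The argument list above can be turned into a small reusable function. The following is a minimal sketch under assumptions: the function name `label_large`, the matrix `m` and the threshold of 1 are hypothetical and chosen only for illustration. It uses the seven arguments in the order listed above and only labels cells whose absolute value exceeds the threshold.

```r
library(grid)

set.seed(1)
# illustrative matrix, not the `mat` used elsewhere in this vignette
m = matrix(rnorm(20), nrow = 4)

# hypothetical cell function: j and i index the matrix, x and y locate the cell center
label_large = function(j, i, x, y, width, height, fill) {
    # only annotate cells with a large absolute value (threshold is arbitrary)
    if(abs(m[i, j]) > 1) {
        grid.text(sprintf("%.1f", m[i, j]), x, y, gp = gpar(fontsize = 8))
    }
}
```

It would then be passed as `Heatmap(m, name = "foo", cell_fun = label_large)`. Defining the function separately keeps the `Heatmap()` call short when the same drawing logic is reused.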
The most common use is to add numeric values to the heatmap:

```{r}
Heatmap(mat, name = "foo", cell_fun = function(j, i, x, y, width, height, fill) {
    grid.text(sprintf("%.1f", mat[i, j]), x, y, gp = gpar(fontsize = 10))
})
```

In the following example, we make a heatmap which shows a correlation matrix, similar to the **corrplot** package:

```{r cell_fun, fig.width = 6.5, fig.height = 6}
cor_mat = cor(mat)
od = hclust(dist(cor_mat))$order
cor_mat = cor_mat[od, od]
nm = rownames(cor_mat)
col_fun = circlize::colorRamp2(c(-1, 0, 1), c("green", "white", "red"))
# `col = col_fun` here is used to generate the legend
Heatmap(cor_mat, name = "correlation", col = col_fun, rect_gp = gpar(type = "none"),
    cell_fun = function(j, i, x, y, width, height, fill) {
        grid.rect(x = x, y = y, width = width, height = height, gp = gpar(col = "grey", fill = NA))
        if(i == j) {
            grid.text(nm[i], x = x, y = y)
        } else if(i > j) {
            grid.circle(x = x, y = y, r = abs(cor_mat[i, j])/2 * min(unit.c(width, height)),
                gp = gpar(fill = col_fun(cor_mat[i, j]), col = NA))
        } else {
            grid.text(sprintf("%.1f", cor_mat[i, j]), x, y, gp = gpar(fontsize = 8))
        }
    }, cluster_rows = FALSE, cluster_columns = FALSE,
    show_row_names = FALSE, show_column_names = FALSE)
```

Note that `cell_fun` is applied to every cell through a `for` loop, so it will be a little bit slow
for a large matrix.

One last example is to visualize a [GO game](https://en.wikipedia.org/wiki/Go_(game)). The input data
records the moves in the game.
```{r}
str = "B[cp];W[pq];B[dc];W[qd];B[eq];W[od];B[de];W[jc];B[qk];W[qn]
;B[qh];W[ck];B[ci];W[cn];B[hc];W[je];B[jq];W[df];B[ee];W[cf]
;B[ei];W[bc];B[ce];W[be];B[bd];W[cd];B[bf];W[ad];B[bg];W[cc]
;B[eb];W[db];B[ec];W[lq];B[nq];W[jp];B[iq];W[kq];B[pp];W[op]
;B[po];W[oq];B[rp];W[ql];B[oo];W[no];B[pl];W[pm];B[np];W[qq]
;B[om];W[ol];B[pk];W[qp];B[on];W[rm];B[mo];W[nr];B[rl];W[rk]
;B[qm];W[dp];B[dq];W[ql];B[or];W[mp];B[nn];W[mq];B[qm];W[bp]
;B[co];W[ql];B[no];W[pr];B[qm];W[dd];B[pn];W[ed];B[bo];W[eg]
;B[ef];W[dg];B[ge];W[gh];B[gf];W[gg];B[ek];W[ig];B[fd];W[en]
;B[bn];W[ip];B[dm];W[ff];B[cb];W[fe];B[hp];W[ho];B[hq];W[el]
;B[dl];W[fk];B[ej];W[fp];B[go];W[hn];B[fo];W[em];B[dn];W[eo]
;B[gp];W[ib];B[gc];W[pg];B[qg];W[ng];B[qc];W[re];B[pf];W[of]
;B[rc];W[ob];B[ph];W[qo];B[rn];W[mi];B[og];W[oe];B[qe];W[rd]
;B[rf];W[pd];B[gm];W[gl];B[fm];W[fl];B[lj];W[mj];B[lk];W[ro]
;B[hl];W[hk];B[ik];W[dk];B[bi];W[di];B[dj];W[dh];B[hj];W[gj]
;B[li];W[lh];B[kh];W[lg];B[jn];W[do];B[cl];W[ij];B[gk];W[bl]
;B[cm];W[hk];B[jk];W[lo];B[hi];W[hm];B[gk];W[bm];B[cn];W[hk]
;B[il];W[cq];B[bq];W[ii];B[sm];W[jo];B[kn];W[fq];B[ep];W[cj]
;B[bk];W[er];B[cr];W[gr];B[gk];W[fj];B[ko];W[kp];B[hr];W[jr]
;B[nh];W[mh];B[mk];W[bb];B[da];W[jh];B[ic];W[id];B[hb];W[jb]
;B[oj];W[fn];B[fs];W[fr];B[gs];W[es];B[hs];W[gn];B[kr];W[is]
;B[dr];W[fi];B[bj];W[hd];B[gd];W[ln];B[lm];W[oi];B[oh];W[ni]
;B[pi];W[ki];B[kj];W[ji];B[so];W[rq];B[if];W[jf];B[hh];W[hf]
;B[he];W[ie];B[hg];W[ba];B[ca];W[sp];B[im];W[sn];B[rm];W[pe]
;B[qf];W[if];B[hk];W[nj];B[nk];W[lr];B[mn];W[af];B[ag];W[ch]
;B[bh];W[lp];B[ia];W[ja];B[ha];W[sf];B[sg];W[se];B[eh];W[fh]
;B[in];W[ih];B[ae];W[so];B[af]"
```

Then we convert it into a matrix:

```{r}
# remove the line breaks, then split the record into single moves
str = gsub("\\n", "", str)
step = strsplit(str, ";")[[1]]
# player ("B" or "W") and the two board coordinates of each move
type = gsub("(B|W).*", "\\1", step)
row = gsub("(B|W)\\[(.).\\]", "\\2", step)
column = gsub("(B|W)\\[.(.)\\]", "\\2", step)

mat = matrix(nrow = 19, ncol = 19)
rownames(mat) = letters[1:19]
colnames(mat) = letters[1:19]
for(i in seq_along(row)) {
    mat[row[i], column[i]] = type[i]
}
mat
```

Black and white stones are placed according to the values in the matrix:

```{r, fig.width = 8, fig.height = 8}
Heatmap(mat, name = "go", rect_gp = gpar(type = "none"),
    cell_fun = function(j, i, x, y, w, h, col) {
        grid.rect(x, y, w, h, gp = gpar(fill = "#dcb35c", col = NA))
        if(i == 1) {
            grid.segments(x, y-h*0.5, x, y)
        } else if(i == nrow(mat)) {
            grid.segments(x, y, x, y+h*0.5)
        } else {
            grid.segments(x, y-h*0.5, x, y+h*0.5)
        }
        if(j == 1) {
            grid.segments(x, y, x+w*0.5, y)
        } else if(j == ncol(mat)) {
            grid.segments(x-w*0.5, y, x, y)
        } else {
            grid.segments(x-w*0.5, y, x+w*0.5, y)
        }

        if(i %in% c(4, 10, 16) & j %in% c(4, 10, 16)) {
            grid.points(x, y, pch = 16, size = unit(2, "mm"))
        }

        r = min(unit.c(w, h))*0.45
        if(is.na(mat[i, j])) {
        } else if(mat[i, j] == "W") {
            grid.circle(x, y, r, gp = gpar(fill = "white", col = "white"))
        } else if(mat[i, j] == "B") {
            grid.circle(x, y, r, gp = gpar(fill = "black", col = "black"))
        }
    },
    col = c("B" = "black", "W" = "white"),
    show_row_names = FALSE, show_column_names = FALSE,
    column_title = "One famous GO game",
    heatmap_legend_param = list(title = "Player", at = c("B", "W"),
        labels = c("player1", "player2"), grid_border = "black")
)
```

## Set heatmap body as raster image

Saving plots in PDF format is generally the best practice for preserving quality. However, when there are too many rows (say, > 10000),
the output PDF file will be huge and it takes time and memory to read the plot. On the other hand, details of
the huge matrix cannot be seen anyway in a PDF of limited size. Rendering heatmaps as raster images effectively
reduces the file size. In the `Heatmap()` function, there are four options which control how the raster image is generated:
`use_raster`, `raster_device`, `raster_quality` and `raster_device_param`.
You can choose the graphic device (`png`, `jpeg` or `tiff`) by `raster_device`, control the quality of the raster image by `raster_quality`,
and pass further parameters for a specific device by `raster_device_param`.

[Check this web page for better demonstrations.](http://zuguang.de/blog/html/d3aa6e2b289514ecddded64a467d1961.html)

## Session info

```{r}
sessionInfo()
```