%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%Comments:
% * set opts_chunk$set(eval=FALSE) for 'knitting'
%   without running R code
%
\documentclass{article}

%\VignetteEngine{knitr::knitr}
%\VignetteIndexEntry{nethet}

\newcommand{\X}{\mathbf{X}}
\newcommand{\argmin}{\mathop{\arg \min}\limits}
\newcommand{\argmax}{\mathop{\arg \max}\limits}

\usepackage{amsmath}
\usepackage{amsfonts}

<<style, eval=TRUE, echo=FALSE, results="asis">>=
BiocStyle::latex()
@

\bioctitle[nethet]{A Bioconductor package for investigation of network
  heterogeneity from high-dimensional data
% Heterogeneous Clustering and Network Inference with the MixGLasso
}

\author{Nicolas St\"{a}dler\footnote{staedler.n@gmail.com}, Frank
  Dondelinger\footnote{fdondelinger.work@gmail.com}}

\begin{document}

\maketitle

<<>>=
# Load package
library(nethet)

set.seed(1)
@

\section{Introduction}

Data analysis in systems biology and medicine often requires analysing data whose dynamics can be described as a network of observed and unobserved variables. A simple example is a protein signalling network in a cell.

Simplifying the process greatly, signalling proteins known as kinases can be unphosphorylated (inactive) or phosphorylated (active). Cell signalling uses the phosphorylation machinery to pass messages from the exterior of the cell to the interior, where they will be acted upon. This message passing is achieved via a relay of kinases and other proteins (the signalling pathway), which can be thought of as a network.

Numerous software packages exist for reconstructing networks from observational data (e.g. \cite{ARACNE}, \cite{bnlearn}, \cite{RTN}, \cite{EDISON}). However, most of these packages assume that there is a single underlying network. Package \texttt{nethet} was designed with the intent of handling heterogeneous datasets arising from a collection of (possibly related) networks.

Take for example protein measurements of breast cancer tumor cells. It is known that there exist several subtypes of breast cancer with different molecular profiles \cite{tcga_breast}. We might be interested in whether the signalling pathways (networks) reconstructed from two subtypes are statistically different. If they are not, then we might want to identify new subtypes that present different molecular profiles, and reconstruct the networks for each identified subtype. The \texttt{nethet} package contains functionalities to tackle all of these tasks.

To the best of our knowledge, \texttt{nethet} is currently the only implementation of solid statistical methodology enabling the analysis of network heterogeneity from high-dimensional data. Package \texttt{nethet} combines several implementations of recent statistical innovations useful for estimation and comparison of networks in a heterogeneous, high-dimensional setting. In particular, we provide code for formal two-sample testing in Gaussian graphical models (differential network and GGM-GSA; \cite{staedler_diffnet}, \cite{ggmgsa2014}) and make a novel network-based clustering algorithm available (mixed graphical lasso, \cite{staedler_mixglasso}).

\section{Statistical setup}

We consider independent samples $X_i\in \mathbb{R}^p$ ($i=1,\ldots,n$), measuring $p$ molecular variables. We assume that the collected data can be divided into $K$ different groups. Let $S_i\in\{1,\ldots,K\}$ be the group assignment of sample $i$, denote with $n_k$ the group-specific sample size and write $\X_k$ for the $n_k\times p$ data matrix consisting of all samples belonging to group $k$.
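As a minimal illustration of this notation, the following sketch (illustrative only, not package functionality; it assumes a data matrix \texttt{X} and a label vector \texttt{S} are already available in the workspace) constructs the group-wise data matrices $\X_k$ and sample sizes $n_k$:

<<notation-sketch, eval=FALSE>>=
# Illustrative sketch only: split a data matrix X (n x p) into the
# group-specific matrices X_k using the group assignments S in {1,...,K}.
X.k <- lapply(1:K, function(k) X[S == k, , drop=FALSE])
n.k <- sapply(X.k, nrow)  # group-specific sample sizes
@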
To describe networks we use Gaussian graphical models (\emph{GGMs}, \cite{rue2005}). These models use an undirected graph (or network) to describe probabilistic relationships between variables. For each group $k$, we assume that $\X_k$ is sampled from a multivariate Gaussian distribution with (unknown) mean $\mu_k$ and (unknown) $p \times p$ concentration matrix $\Omega_k=\Sigma_k^{-1}$. The matrix $\Omega_k$ defines the group-specific graph $G_k$ via
\begin{eqnarray*}
  &(j,j') \in E(G_k) \Leftrightarrow \Omega_{k;jj'}\neq 0,\\
  &j,j' \in \{ 1, \ldots, p \}\;\textrm{and}\; j \neq j',
\end{eqnarray*}
where $E(G)$ denotes the edge set of graph $G$.

%The different groups are characterized by group-specific mean vectors
%$\mu_k$ and networks $G_k$ ($k=1,\ldots,K$). These are unknown at the
%outset and have to be estimated from the data.

Learning the networks $G_k$ is a so-called high-dimensional statistical problem.
% because of the large
%number of parameters relative to the sample sizes available at the
%group level.
We employ regularization to learn sparse, parsimonious networks and thereby control over-fitting. In particular, we use the popular \emph{graphical Lasso} \cite{friedman2007sic,huge2012}.

Frequently the group assignments $S_i$, as well as the number of groups $K$, are unknown at the outset and have to be inferred simultaneously with the group-specific mean vectors and networks. The method \emph{mixglasso}, implemented in this package, is a novel tool for high-dimensional, network-based clustering. It is based on a finite mixture of GGMs and employs an adaptive and automatic penalization scheme \cite{staedler_mixglasso}.

Network inference is subject to statistical uncertainty, and observed differences between estimated networks may be due to noise in the data and variability in estimation rather than any true difference in underlying network topology. Testing hypotheses of the form
\begin{eqnarray*}
\label{eq:hypothesis}
  \mathbf{H_0}: G_k=G_{k'}, \quad k,k'\in \{1,\ldots,K\},\; k\neq k'
\end{eqnarray*}
is challenging.
%and involves non-nested model comparison
%\citep{vuong1989}.
We build upon a recent approach called \emph{differential network} \cite{staedler_diffnet,ggmgsa2014} which allows formal two-sample testing in high-dimensional GGMs.

\section{Package functionalities}

The package consists of the following main parts:
\begin{itemize}
\item Simulation functions for creating synthetic data from the underlying Gaussian mixture (network) model.
\item Network inference using the \texttt{het\_cv\_glasso} function for reconstructing heterogeneous networks from data with the graphical Lasso \cite{glasso} when the group structure is known.
\item High-dimensional hypothesis testing capabilities, including the \texttt{diffnet} functions implementing a statistical test for whether the networks underlying a pair of datasets are different, the \texttt{ggmgsa} functions allowing for differential gene-set testing, and the \texttt{diffregr} functions testing whether two high-dimensional regression models are statistically different \cite{staedler_diffnet,ggmgsa2014}.
\item The \texttt{mixglasso} functions implementing a network-based clustering and reconstruction algorithm, also based on the graphical Lasso, for unknown group structure \cite{staedler_mixglasso}.
\item Plotting and export functions for displaying and saving the results of the analysis in a sensible way.
\end{itemize}

\section{Simulate data}
\label{sec:simulation}

In order to demonstrate the functionalities of the package, we will first simulate data from a Gaussian mixture model with a known covariance structure. The \texttt{nethet} package includes code for generating random covariance matrices with a given sparsity, and for simulating from a Gaussian mixture model with given means and covariances. The function \texttt{sim\_mix\_networks} provides a convenient wrapper for both:

<<>>=
# Specify number of simulated samples and dimensionality
n = 100
p = 25

# Specify number of components of the mixture model and mixture probabilities
n.comp = 4
mix.prob = c(0.1, 0.4, 0.3, 0.2)

# Specify sparsity in [0,1], indicating the fraction of off-diagonal zero
# entries.
s = 0.9

# Generate networks with random means and covariances. Means will be drawn
# from a standard Gaussian distribution, non-zero covariance values from a
# Beta(1,1) distribution.
sim.result = sim_mix_networks(n, p, n.comp, s, mix.prob)
@

The data is contained in \texttt{sim.result\$data}, and the components that each data point belongs to are contained in \texttt{sim.result\$comp}. Let's check that the mixture probabilities are correct and then plot the first two dimensions of the data. Note that we do not expect these to be well-separated in any way.

<<>>=
print(table(sim.result$comp)/n)

component = as.factor(sim.result$comp)

library('ggplot2')
qplot(x=sim.result$data[,1], y=sim.result$data[,2], colour=component) +
  xlab('Dimension 1') +
  ylab('Dimension 2')
@

The means and covariances of the data are contained in \texttt{sim.result\$Mu} and \texttt{sim.result\$Sig}. If desired, they can also be specified when calling \texttt{sim\_mix\_networks}.

<<>>=
# Generate new dataset with the same covariances, but different means
sim.result.new = sim_mix_networks(n, p, n.comp, s, mix.prob,
                                  Sig=sim.result$Sig)

component = as.factor(sim.result.new$comp)

qplot(x=sim.result.new$data[,1], y=sim.result.new$data[,2],
      colour=component) +
  xlab('Dimension 1') +
  ylab('Dimension 2')
@

When the covariance matrices for the components are not specified in advance, the \texttt{sim\_mix\_networks} function implicitly assumes that they are generated independently of each other. In order to test the \texttt{diffnet} functions, we also want to be able to generate simulated data from pairs of networks that present some common edges. The \texttt{generate\_2networks} function is used to generate pairs of networks with an arbitrary overlap.

<<>>=
## Sample size and number of nodes
n <- 40
p <- 10

## Specify sparse inverse covariance matrices,
## with number of edges in common equal to ~ 0.8*p
gen.net <- generate_2networks(p,graph='random',n.nz=rep(p,2),
                              n.nz.common=ceiling(p*0.8))
invcov1 <- gen.net[[1]]
invcov2 <- gen.net[[2]]

plot_2networks(invcov1,invcov2,label.pos=0,label.cex=0.7)
@

\section{Network estimation with known group labels}

If it is known a priori to which component each sample belongs, then the problem of reconstructing the network reduces to a simple application of the graphical Lasso to each component. For convenience, we have included a wrapper function \texttt{het\_cv\_glasso} in \texttt{nethet} that applies the graphical Lasso \cite{glasso} to each component in a heterogeneous dataset with specified component labels. The penalisation hyperparameter is tuned individually for each component using cross-validation.
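To give an idea of what such a cross-validation scheme can look like, here is a simplified sketch under our own assumptions (it is not the internal implementation of \texttt{het\_cv\_glasso}; it calls the \texttt{glasso} package directly, and the helper name \texttt{cv.glasso.lambda} is hypothetical). It selects the penalty for a single component by cross-validated hold-out log-likelihood:

<<cv-glasso-sketch, eval=FALSE>>=
# Simplified sketch (our assumptions, not the package internals): choose the
# graphical Lasso penalty for one component by K-fold cross-validation,
# scoring each fold with the held-out Gaussian log-likelihood.
library(glasso)

cv.glasso.lambda <- function(X, lambdas, K=5) {
  folds <- sample(rep(1:K, length.out=nrow(X)))
  cv.score <- sapply(lambdas, function(lam) {
    mean(sapply(1:K, function(k) {
      fit <- glasso(cov(X[folds != k, , drop=FALSE]), rho=lam)
      S.out <- cov(X[folds == k, , drop=FALSE])
      # Held-out log-likelihood, up to additive constants:
      # log det(Omega) - trace(S.out %*% Omega)
      as.numeric(determinant(fit$wi, logarithm=TRUE)$modulus) -
        sum(S.out * fit$wi)
    }))
  })
  lambdas[which.max(cv.score)]
}
@

\texttt{het\_cv\_glasso} tunes each component separately in this spirit; see the package documentation for the exact scheme used.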
To demonstrate \texttt{het\_cv\_glasso}, we will generate some data in the same way as in the previous section:

<<>>=
n = 100
p = 25

# Generate networks with random means and covariances.
sim.result = sim_mix_networks(n, p, n.comp, s, mix.prob)

test.data = sim.result$data
test.labels = sim.result$comp

# Reconstruct networks for each component
networks = het_cv_glasso(data=test.data, grouping=test.labels)
@

One way of checking whether the reconstructed networks are sensible is to plot the covariance matrices used for generating the networks against the reconstructed covariance matrices.

<<>>=
# Component labels for covariance values
components = as.factor(rep(1:n.comp, each=p^2))

qplot(x=c(networks$Sig), y=c(sim.result$Sig), colour=components) +
  xlab('Reconstructed Covariances') +
  ylab('True Covariances')
@

\section{High-dimensional two-sample testing}\label{sec:diffnet}

We have demonstrated how to use our package to estimate networks from heterogeneous data. Often, we would like to perform a statistical comparison between networks. \emph{Differential network} allows formal hypothesis testing regarding network differences. It is based on a novel and very general methodology for high-dimensional two-sample testing. Other useful tools based on this technology are \emph{GGM-GSA} (``multivariate gene-set testing based on GGMs'') and \emph{differential regression}, which allows formal two-sample testing in the high-dimensional regression model. For details on this methodology we refer the reader to \cite{staedler_diffnet,ggmgsa2014}.

\subsection{Differential network}

Let us consider datasets generated from GGMs $G_1$ and $G_2$, respectively. We would like to know whether networks inferred from these datasets differ in a statistically significant manner, that is, we would like to test the hypothesis
\begin{eqnarray*}
  \mathbf{H}_0:\quad G_1=G_2.
\end{eqnarray*}
The function \texttt{diffnet\_multisplit} uses repeated sample splitting to address this task. The main steps are:

\begin{enumerate}
\item Both datasets are randomly split into two halves: the ``\emph{in}-'' and ``\emph{out}-sample''.
\item Networks are inferred using only the \emph{in}-sample (``screening step'').
\item Based on the \emph{out}-sample, a p-value is computed which compares the networks obtained in step 2 (``cleaning step'').
\item Steps 1--3 are repeated many times (e.g. 50 times); the resulting p-values are aggregated and the final aggregated p-value is reported.
\end{enumerate}

We now illustrate the use of \texttt{diffnet\_multisplit} with an example. We consider GGMs (i.e. inverse covariance matrices) previously generated in Section \ref{sec:simulation}.

<<>>=
## Set seed
set.seed(1)

## Number of nodes
p <- 30

## Specify sparse inverse covariance matrices
gen.net <- generate_2networks(p,graph='random',n.nz=rep(p,2),
                              n.nz.common=ceiling(p*0.8))
invcov1 <- gen.net[[1]]
invcov2 <- gen.net[[2]]

## Get corresponding correlation matrices
cor1 <- cov2cor(solve(invcov1))
cor2 <- cov2cor(solve(invcov2))
@
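Before running the procedure, a brief remark on the aggregation in step 4 above: a standard rule from the multi sample splitting literature is quantile aggregation, which at the median doubles the median p-value and caps the result at one. The one-liner below sketches this rule (our illustration of a typical choice; see \cite{staedler_diffnet} for the aggregation schemes actually implemented):

<<pval-agg-sketch, eval=FALSE>>=
# Median (gamma = 0.5) quantile aggregation of per-split p-values:
# an illustrative sketch of a typical aggregation rule.
agg.pval <- function(pvals) min(1, 2 * median(pvals))
@

We start by generating data under the ``null-scenario'' where both datasets have the same underlying network.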
<<>>=
## Generate data under null hypothesis
library(mvtnorm) # To generate multivariate Gaussian random samples

## Sample size
n <- 70

x1 <- rmvnorm(n,mean = rep(0,dim(cor1)[1]), sigma = cor1)
x2 <- rmvnorm(n,mean = rep(0,dim(cor1)[1]), sigma = cor1)
@

Then, we run a differential network analysis:

<<>>=
## Run diffnet (under null hypothesis)
dn.null <- diffnet_multisplit(x1,x2,b.splits=1,verbose=FALSE)
@

We obtain the p-value \Sexpr{dn.null$ms.pval}, which is stored in \texttt{dn.null\$ms.pval}. The same analysis can be performed for data generated under the alternative hypothesis.

<<>>=
## Generate data under alternative hypothesis (datasets have different networks)
x1 <- rmvnorm(n,mean = rep(0,dim(cor1)[1]), sigma = cor1)
x2 <- rmvnorm(n,mean = rep(0,dim(cor2)[1]), sigma = cor2)

## Run diffnet (under alternative)
dn.altn <- diffnet_multisplit(x1,x2,b.splits=1,verbose=FALSE)
@

The resulting p-value is \Sexpr{dn.altn$ms.pval}, which indicates a highly significant network difference.

The variable \texttt{b.splits} specifies the number of data splits used in the differential network procedure. The p-values in the previous examples were obtained using only a single data split (\texttt{b.splits=1}). P-values heavily depend on the random split of the data; this amounts to a ``p-value lottery''. To get stable and reproducible results we would therefore typically choose a larger number for the variable \texttt{b.splits} and report the aggregated p-value.

<<>>=
## Typically we would choose a larger number of splits
# Use parallel library (only available under Unix) for computational efficiency
if(.Platform$OS.type == "unix") {
  dn.altn <- diffnet_multisplit(x1,x2,b.splits=50,verbose=FALSE,mc.flag=TRUE)
} else {
  dn.altn <- diffnet_multisplit(x1,x2,b.splits=25,verbose=FALSE,mc.flag=FALSE)
}

par(cex=0.7)
plot(dn.altn, cex=0.5) # histogram over the b.splits p-values
cat('p-value:',dn.altn$medagg.pval,'\n') # median aggregated p-value
@

\subsection{Multivariate gene-set testing based on GGMs}

In the case where molecular variables can be grouped into various sets of biologically related features (e.g. gene-sets or pathways), \texttt{ggmgsa\_multisplit} can be used to perform differential network analyses iteratively for all gene-sets. This allows us to identify gene-sets which show a significant network difference. For illustration we consider data generated from the following networks.

<<>>=
## Generate new networks
set.seed(1)
p <- 9 # network with p nodes
n <- 40

hub.net <- generate_2networks(p,graph='hub',n.hub=3,n.hub.diff=1) # generate hub networks
invcov1 <- hub.net[[1]]
invcov2 <- hub.net[[2]]

plot_2networks(invcov1,invcov2,label.pos=0,label.cex=0.7,
               main=c('network 1', 'network 2'),cex.main=0.7)

## Generate data
library('mvtnorm')
x1 <- rmvnorm(n,mean = rep(0,p), sigma = cov2cor(solve(invcov1)))
x2 <- rmvnorm(n,mean = rep(0,p), sigma = cov2cor(solve(invcov2)))
@

%## Run DiffNet
%# fit.dn <- diffnet_multisplit(x1,x2,b.splits=2,verbose=FALSE)
%# fit.dn$medagg.pval

The nodes can be grouped into three gene-sets where only the first has a different underlying network.

<<>>=
## Identify groups with 'gene-sets'
gene.names <- paste('G',1:p,sep='')
gsets <- split(gene.names,rep(1:3,each=3))
@
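Conceptually, \texttt{ggmgsa\_multisplit} amounts to running a differential network test on the columns belonging to each gene-set. The following rough sketch conveys this idea (for intuition only, not the actual implementation, which additionally aggregates p-values over data splits):

<<ggmgsa-idea-sketch, eval=FALSE>>=
# Rough conceptual sketch: one differential network test per gene-set.
pvals.per.gset <- sapply(gsets, function(gs) {
  idx <- match(gs, gene.names)
  diffnet_multisplit(x1[,idx], x2[,idx], b.splits=1, verbose=FALSE)$ms.pval
})
@

We now run GGM-GSA with a single data split (\texttt{b.splits=1}) and note that only the p-value for the first gene-set is small. Again, we would typically use a larger number of data splits in order to obtain stable p-values.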
<<results='asis'>>=
## Run GGM-GSA
fit.ggmgsa <- ggmgsa_multisplit(x1,x2,b.splits=1,gsets,gene.names,verbose=FALSE)

library(xtable)
print(xtable(summary(fit.ggmgsa),digits=6))
@

\subsection{Differential regression}

In addition to differential network, this \textbf{R}-package also provides an implementation of differential regression. In particular, the function \texttt{diffregr\_multisplit} allows formal two-sample testing in the high-dimensional regression model. It is also based on sample splitting and is very similar to the previously introduced \texttt{diffnet\_multisplit}.

Consider the following sparse regression models.

<<>>=
## Number of predictors and sample size
p <- 100
n <- 80

## Predictor matrices
x1 <- matrix(rnorm(n*p),n,p)
x2 <- matrix(rnorm(n*p),n,p)

## Active-sets and regression coefficients
act1 <- sample(1:p,5)
act2 <- c(act1[1:3],sample(setdiff(1:p,act1),2))
beta1 <- beta2 <- rep(0,p)
beta1[act1] <- 0.7
beta2[act2] <- 0.7
@

We generate data under the null hypothesis and run differential regression. The histogram shows the distribution of the p-values obtained from ten data splits.

<<>>=
## Response vectors under null hypothesis
y1 <- x1%*%as.matrix(beta1)+rnorm(n,sd=1)
y2 <- x2%*%as.matrix(beta1)+rnorm(n,sd=1)

## Differential regression; b.splits=10
fit.null <- diffregr_multisplit(y1,y2,x1,x2,b.splits=10)

par(cex=0.7)
plot(fit.null,cex=0.5) # histogram of p-values from b.splits data splits
cat('p-value: ',fit.null$medagg.pval,'\n') # median aggregated p-value
@

The following example illustrates differential regression in a scenario where the two regression models differ.

<<>>=
## Response vectors under alternative hypothesis
y1 <- x1%*%as.matrix(beta1)+rnorm(n,sd=1)
y2 <- x2%*%as.matrix(beta2)+rnorm(n,sd=1)

## Differential regression (asymptotic p-values)
fit.alt <- diffregr_multisplit(y1,y2,x1,x2,b.splits=10)

par(cex=0.7)
plot(fit.alt)
cat('p-value: ',fit.alt$medagg.pval,'\n')
@

For differential regression we have the option of computing permutation-based p-values by choosing a number of permutations \texttt{n.perm}.

<<>>=
## Differential regression (permutation-based p-values; 100 permutations)
fit.alt.perm <- diffregr_multisplit(y1,y2,x1,x2,b.splits=5,n.perm=100)
@

The default option (\texttt{n.perm=NULL}) uses an asymptotic approximation to calculate p-values.

\section{Network estimation and model-based clustering with unknown group
  labels}
% Network reconstruction with unknown components using \texttt{mixglasso}

Often we do not know a priori which component each sample belongs to. For example, in the case of samples corresponding to protein measurements in breast cancer patients, the particular subtype of breast cancer that a patient suffers from may be unknown. In these cases, our package allows for network-based clustering of the samples using the mixture graphical Lasso (mixglasso), which jointly clusters the samples and reconstructs the networks for each group or cluster.

To demonstrate the \texttt{mixglasso} function, let us first generate some data in the same way as before, but with means defined to ensure separability of the groups:

<<>>=
# Generate networks with random means and covariances.
n = 1000
p = 10
s = 0.9
n.comp = 3

# Create different mean vectors
Mu = matrix(0,p,n.comp)

# Define non-zero means in each group (non-overlapping)
nonzero.mean = split(sample(1:p),rep(1:n.comp,length=p))

# Set non-zero means to fixed value
for(k in 1:n.comp){
  Mu[nonzero.mean[[k]],k] = -2/sqrt(ceiling(p/n.comp))
}

# Generate data
sim.result = sim_mix_networks(n, p, n.comp, s, Mu=Mu)
@

Now we will run mixglasso on this dataset to retrieve the original clustering and reconstruct the underlying networks.

<<>>=
# Run mixglasso
mixglasso.result = mixglasso(sim.result$data, n.comp=3)

# Calculate adjusted rand index to judge how accurate the clustering is
# Values > 0.7 indicate good agreement.
library(mclust, quietly=TRUE)
adj.rand = adjustedRandIndex(mixglasso.result$comp, sim.result$comp)
cat('Adjusted Rand Index', round(adj.rand, digits=2), '\n')
@

Table \ref{table:crosstab} shows the cross-tabulation of the number of samples in predicted versus true groups.

<<results='asis'>>=
crosstab <- function(class1,class2){
  tab <- matrix(NA,length(levels(class1)),length(levels(class2)))
  colnames(tab) <- levels(class2)
  rownames(tab) <- levels(class1)
  for (i in levels(class1)){
    tab[i,] <- table(class2[which(class1==i)])
  }
  return(tab)
}

# Relabel true groupings to avoid confusion
sim.grouping = c('A', 'B', 'C')[sim.result$comp]

cross.table = crosstab(factor(mixglasso.result$comp),
                       factor(sim.grouping))

# Generate table
library(xtable)
latex.table = xtable(cross.table,
                     caption=paste('Cross-tabulation of mixglasso clusters (rows) with',
                                   'true group assignments (columns).'),
                     label='table:crosstab')
print(latex.table)
@

What if we don't know the true number of groups? Luckily, \texttt{mixglasso} supports model comparison using BIC \cite{BIC} and minimum description length \cite{MMDL}. In the following example we will use BIC to find the correct number of components:

<<>>=
# Run mixglasso over a range of numbers of components
mixglasso.result = mixglasso(sim.result$data, n.comp=1:6)

# Repeat with lambda=0 and lambda=Inf for comparison
mixglasso.result.0 = mixglasso(sim.result$data, n.comp=1:6, lambda=0)
mixglasso.result.Inf = mixglasso(sim.result$data, n.comp=1:6, lambda=Inf)

# Aggregate BIC results for plotting
BIC.vals = c(mixglasso.result$bic, mixglasso.result.0$bic,
             mixglasso.result.Inf$bic)

lambda.labels = rep(c('Default', 'Lambda = 0', 'Lambda = Inf'), each=6)

# Plot to verify that the minimum BIC value corresponds to the true number
# of components
library(ggplot2)
plotting.frame <- data.frame(BIC=BIC.vals,
                             Num.Comps=rep(1:6, 3),
                             Lambda=lambda.labels)

p <- ggplot(plotting.frame) +
  geom_line(aes(x=Num.Comps, y=BIC, colour=Lambda)) +
  geom_vline(xintercept=3, linetype='dotted')

print(p)
@

We note that mixglasso involves a penalization parameter $\lambda$ which trades off goodness-of-fit and model complexity. We recommend using the default, which employs an adaptive and automatic penalization scheme \cite{staedler_mixglasso}. Note that in this simplified example, $\lambda=0$ (no penalization) performs well because the true inverse covariance matrices are not particularly sparse and $n \gg p$. $\lambda=\infty$ constrains inverse covariance matrices to be diagonal, which explains its inferior performance.
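For reference, the BIC score used above has the familiar form (a sketch of the standard definition; the precise degrees-of-freedom computation for the penalized mixture model is described in \cite{staedler_mixglasso}):
\begin{eqnarray*}
  \mathrm{BIC}(K) = -2\,\hat{\ell}(K) + \mathrm{df}(K)\log(n),
\end{eqnarray*}
where $\hat{\ell}(K)$ denotes the maximized log-likelihood of the model with $K$ components and $\mathrm{df}(K)$ its number of effective parameters; the number of components is chosen to minimize this score.

\section{Plotting and exporting results}

Our package includes several functions for plotting and exporting the networks and results that have been obtained.

\subsection{Plotting results}

The output of \texttt{het\_cv\_glasso} and \texttt{mixglasso} can be plotted either in network form or as individual edges in the networks.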
For the network plots, we use the \texttt{network} package \cite{network_package}. This is the default plotting behaviour when \texttt{plot} is invoked on an object of class \texttt{nethetclustering}, and produces one global plot showing edges that occur in any group, as well as one plot for each group. For this example we will use the networks and clustering obtained using \texttt{mixglasso} in the previous section.

<<>>=
# Retrieve best clustering and networks by BIC
mixglasso.clustering = mixglasso.result$models[[mixglasso.result$bic.opt]]

# Plot networks, omitting edges with absolute partial correlation < 0.5 in
# every group.
# NOTE: Not displayed.
# plot(mixglasso.clustering, p.corrs.thresh=0.5)
@

Usually we are only interested in specific edges, and perhaps we wish to compare them among groups. Function \texttt{dot\_plot} generates a plot with edges above a certain threshold along the y-axis, and one circle for each group showing the smallest mean of the two nodes that make up the edge. We use the \texttt{ggplot2} package to make the plots \cite{ggplot2_package}.

<<>>=
# Plot edges, omitting those with absolute partial correlation < 0.5 in every
# group.
g = dot_plot(mixglasso.clustering, p.corrs.thresh=0.5, dot.size.range=c(1,5))
@

Finally, we might want to compare the observed values of the nodes linked by specific edges across groups. Function \texttt{scatter\_plot} will generate plots for a specified list of edges.

<<>>=
# Specify edges
node.pairs = rbind(c(9,10), c(2,5), c(4,9))

# Create scatter plots of specified edges
g = scatter_plot(mixglasso.clustering, data=sim.result$data,
                 node.pairs=node.pairs, cex=0.5)
@

\subsection{Exporting Results}

Our package offers the option to export the inferred networks as a comma-separated values (CSV) text file. Like the plotting functions, function \texttt{export\_network} can be invoked on the output of \texttt{het\_cv\_glasso} and \texttt{mixglasso}.

<<>>=
# Save network in CSV format, omitting edges with absolute partial correlation
# less than 0.25.
#export_network(mixglasso.clustering, file='nethet_network.csv',
#               p.corrs.thresh=0.25)
@

This creates a CSV file encoding a table with one row for each edge with partial correlation above the threshold, and columns indicating the nodes linked by the edge, the absolute partial correlation, the sign of the partial correlation, and the group or cluster in which the edge occurred.

If the user wishes to use the Cytoscape \cite{cytoscape} software to analyse the network further, we note that the output of \texttt{export\_network} can be loaded into Cytoscape, provided the option \texttt{quote=FALSE} is set.

<<>>=
# Save network in CSV format suitable for Cytoscape import
#export_network(mixglasso.clustering, file='nethet_network.csv',
#               p.corrs.thresh=0.25, quote=FALSE)
@

<<>>=
sessionInfo()
@

\bibliography{nethet}

\end{document}