\documentclass[a4paper]{article} \title{Usage of AMOUNTAIN} \author{Dong Li \\ dxl466@cs.bham.ac.uk\\ School of Computer Science, The University of Birmingham, UK } \date{Date modified: \Sexpr{Sys.Date()}} %\VignetteIndexEntry{Usage of AMOUNTAIN} <>= BiocStyle::latex() @ \begin{document} \SweaveOpts{concordance=TRUE} \maketitle In this example we embed parts of the examples from the \texttt{AMOUNTAIN} help page into a single document. And show how to use \texttt{AMOUNTAIN} step by step. \section{Network simulation} We follow \cite{Tensor} to construct gene co-expression networks for simulation study. Let $n$ be the number of genes, and edge weights $W$ as well as node score $z$ follow the uniform distribution in range $[0,1]$. A module contains $k$ genes inside which the edge weights as well as node score follow the uniform distribution in range $[\theta,1]$, where $\theta=\{0.5,0.6,0.7,0.8,0.9\}$. <<>>= library(AMOUNTAIN) n=100 k=20 theta=0.5 pp <- networkSimulation(n, k, theta) moduleid <- pp[[3]] netid <- 1:100 restp<- netid[-moduleid] groupdesign=list(moduleid,restp) names(groupdesign)=c('module','background') @ Figure \ref{fig:single} shows the weighted co-expression network when $n=100,k=20$ and red nodes indicate module members and wider edges mean larger similarities. Visualization is based on \texttt{qgraph} \cite{Qgraph}. \begin{figure} \centering <>= library(qgraph) pg<-qgraph(pp[[1]],groups=groupdesign,legend=TRUE) @ \caption{Simulated single layer network} \label{fig:single} \end{figure} When simulating a two-layer network, the basic method is to connect two independent networks with a inter-layer weight matrix $A$, which is designed to has larger weights between two modules. <<>>= n1=100 k1=20 theta1 = 0.5 n2=80 k2=10 theta2 = 0.5 ppresult <- twolayernetworkSimulation(n1,k1,theta1,n2,k2,theta2) A <- ppresult[[3]] pp <- ppresult[[1]] moduleid <- pp[[3]] netid <- 1:n1 restp<- netid[-moduleid] pp2 <- ppresult[[2]] moduleid2 <- pp2[[3]] netid2 <- 1:n2 restp2<- netid2[-moduleid2] library(qgraph) ## labelling the groups groupdesign=list(moduleid,restp,(moduleid2+n1),(restp2+n1)) names(groupdesign)=c('module1','background1','module2', 'background2') twolayernet<-matrix(0,nrow=(n1+n2),ncol=(n1+n2)) twolayernet[1:n1,1:n1]<-pp[[1]] twolayernet[(n1+1):(n1+n2),(n1+1):(n1+n2)]<-pp2[[1]] twolayernet[1:n1,(n1+1):(n1+n2)] = A twolayernet[(n1+1):(n1+n2),1:n1] = t(A) @ Figure \ref{fig:twolayer} shows the the two-layer weighted co-expression network based on above simulation. \begin{figure} \centering <>= g<-qgraph(twolayernet,groups=groupdesign,legend=TRUE) @ \caption{Simulated two-layer network} \label{fig:twolayer} \end{figure} \section{Module identification for single layer network} The algorithm for module identification requires the input With edge weights $W$ and vertex weight ${\bf z}$ for weighted network $G$. Here We show how to use the following function in the package to find active module for above simulated single layer network. With groundtruth in hand we can evaluate the quality of identified modules by F-score \cite{AMOUNT}. In order to get higher quality, we need to tune parameter $\alpha$ in the elastic net palnety and $\lambda = 1$ in the objective function. The common way to select two optimal parameters is grid search. <<>>== n = 100 k = 20 theta = 0.5 pp <- networkSimulation(n,k,theta) moduleid <- pp[[3]] alphaset = seq(0.1,0.9,by=0.1) lambdaset <- 2^seq(-5,5) ## using a grid search to select lambda and alpha Fscores <- matrix(0,nrow = length(alphaset), ncol = length(lambdaset)) for (j in 1:length(alphaset)) { for (k in 1:length(lambdaset)) { x <- moduleIdentificationGPFixSS(pp[[1]],pp[[2]], rep(1/n,n),maxiter = 500, a=alphaset[j],lambda = lambdaset[k]) predictedid<-which(x[[2]]!=0) recall <- length(intersect(predictedid,moduleid))/ length(moduleid) precise <- length(intersect(predictedid,moduleid))/ length(predictedid) Fscores[j,k] <- 2*precise*recall/(precise+recall) } } @ We can show $gridFscore$ by 3-D plot Figure \ref{fig:psselect} to see how these parameters affect the performance. By certain combination of these two parameters, we can almost exactly find the target model nodes with $F-score=1$. \begin{figure} \centering <>= persp(Fscores, theta = 45, phi = 30, col = "gray", scale = FALSE, xlab = 'alpha', ylab = 'lambda', zlab = 'F-score', main = 'Fscores of identified module', box = TRUE) @ \caption{Simulated two-layer network} \label{fig:psselect} \end{figure} \section{Module identification for two-layer network} The basic idea to identification modules on two-layer network is to find two active modules on each layer, at the same time with maximal inter-later links. we have function $moduleIdentificationGPFixSSTwolayer$ in the package. Following the two-layer network simulation in section 1, we call the method. <<>>== ## network simulation is the same as before modres=moduleIdentificationGPFixSSTwolayer(pp[[1]],pp[[2]], rep(1/n1,n1),pp2[[1]],pp2[[2]],rep(1/n2,n2),A) predictedid<-which(modres[[1]]!=0) recall = length(intersect(predictedid,moduleid))/ length(moduleid) precise = length(intersect(predictedid,moduleid))/ length(predictedid) F1 = 2*precise*recall/ (precise+recall) predictedid2<-which(modres[[2]]!=0) recall2 = length(intersect(predictedid2,moduleid2))/ length(moduleid2) precise2 = length(intersect(predictedid2,moduleid2))/ length(predictedid2) F2 = 2*precise2*recall2/(precise2+recall2) @ And we can also select optimal paramters combination in a more sophscated way based on the example in section 2. \section{Module identification for real-world data} The usage of the package functions is the same for real-world data, but we need to be aware about two aspects. First of all the way to calculate edges score and nodes score in weighted network can make an impact on the performance. Different input $W$ and $\bf z$ in the objective function may lead to different modules. For instance, we may need to try several different forms of $W$ other than using just the correlation matrix in practice. We tried the same differential analysis method of gene pairs as \cite{hsu2015functional}. See \cite{AMOUNT} for more details. Secondly we do not have groundtruth about module membership for real world data. In this case we may need to select the proper parameter so that the desired module size can be archived. When fixing $\lambda=0.01$, we use a binary search method to select $\alpha$ for elastic net penalty which control the sparsity of the module. <>== ## binary search parameter to fix module size to 100~200 abegin = 0.01 aend = 0.9 maxsize = 200 minsize = 100 for (i in 1:100) { x <- moduleIdentificationGPFixSS(W,z,rep(1/n,n), a=(abegin+aend)/2,lambda = 0.001,maxiter = 500) predictedid<-which(x[[2]]!=0) if(length(predictedid) > maxsize){ abegin = (abegin+aend)/2 }else if (length(predictedid) < minsize){ aend = (abegin+aend)/2 }else break } @ \section{Biological explanation} Finally we can do gene annotation enrichment analysis with intergative tools like DAVID\footnote{https://david.ncifcrf.gov} or Enrichr\footnote{http://amp.pharm.mssm.edu/Enrichr}, to see whether a module gene list can be explained by existing biological process, pathways or even diseases. \section{Session info} <>= toLatex(sessionInfo()) @ \begin{thebibliography}{1} \bibitem{Tensor} Li W, Liu C C, Zhang T, et al. Integrative analysis of many weighted co-expression networks using tensor computation. PLoS Comput Biol, 2011, 7(6): e1001106. \bibitem{Qgraph} Epskamp S, Cramer A O J, Waldorp L J, et al. qgraph: Network visualizations of relationships in psychometric data. Journal of Statistical Software, 2012, 48(4): 1-18. \bibitem{hsu2015functional}Hsu, Chia-Lang, Hsueh-Fen Juan, and Hsuan-Cheng Huang. Functional Analysis and Characterization of Differential Coexpression Networks. Scientific reports 5 (2015). \bibitem{AMOUNT} Dong Li et al. Active modules for multilayer weighted gene co-expression networks: a continuous optimization approach. 2016. \end{thebibliography} \end{document}