%\VignetteIndexEntry{Main vignette: reconstruction and analysis of transcriptional networks in R} %\VignetteKeywords{Reconstruction and analysis of transcriptional networks in R} %\VignettePackage{RTN} \documentclass[11pt]{article} \usepackage{Sweave,fullpage} \usepackage{float} \SweaveOpts{keep.source=TRUE,eps=FALSE,width=4,height=4.5} \newcommand{\Robject}[1]{\texttt{#1}} \newcommand{\Rpackage}[1]{\textit{#1}} \newcommand{\Rfunction}[1]{\textit{#1}} \usepackage{subfig} \usepackage{color} \usepackage{hyperref} \definecolor{linkcolor}{rgb}{0.0,0.0,0.75} \hypersetup{colorlinks=true, linkcolor=linkcolor, urlcolor=cyan} \bibliographystyle{unsrt} %-------------------------------------------- %-------------------------------------------- %-------------------------------------------- \title{ Vignette for \emph{RTN}: reconstruction of transcriptional networks \\ and analysis of master regulators. } \author{ Mauro AA Castro, Xin Wang, Michael NC Fletcher, \\ Florian Markowetz and Kerstin B Meyer \thanks{Cancer Research UK - Cambridge Research Institute, Robinson Way Cambridge, CB2 0RE, UK.} \\ \texttt{\small kerstin.meyer@cancer.org.uk} \\ } \begin{document} \maketitle \tableofcontents <>= options(width=70) @ %-------------------------------------------- %-------------------------------------------- %-------------------------------------------- \newpage \section{Overview} The package \emph{RTN} is designed for reconstruction and analysis of transcriptional networks (TN) using mutual information \cite{Margolin2006a}. It is implemented by S4 classes in \emph{R} \cite{Rcore} and extends several methods previously validated for assessing transcriptional regulatory units, or regulons (\textit{e.g.} MRA \cite{Carro2010}, GSEA \cite{Subramanian2005}, synergy and shadow \cite{Lefebvre2010}). The package computes the mutual information (MI) between annotated transcription factors (TFs) and all potential targets using gene expression data. It is tuned to deal with large gene expression datasets in order to build genome-wide transcriptional networks centered on TFs and regulons. Using a robust statistical pipeline, \emph{RTN} allows user to set the stringency of the analysis in a stepwise process, including a boostrep routine designed to remove unstable associations. Parallel computing is available for critical steps demanding high-performance. %-------------------------------------------- %-------------------------------------------- %-------------------------------------------- \section{Quick start} \subsection{Transcriptional network inference} \begin{itemize} \item 1 - Load a sample dataset The \Robject{dt4rtn} dataset consists of a list with 6 objects used for demonstration purposes only. It was extracted, pre-processed and size-reduced from \cite{Fletcher2013} and \cite{Curtis2012} and contains a named gene expression matrix (\Robject{gexp}), a data frame of with \Robject{gexp} annotation (\Robject{gexpIDs}), a named numeric vector with differential gene expression data (\Robject{pheno}), a data frame with \Robject{pheno} annotation (\Robject{phenoIDs}), a character vector with genes differentially expressed (\Robject{hits}), and a named vector with transcriptions factors (\Robject{tfs}). \begin{small} <>= library(RTN) data(dt4rtn) @ \end{small} \item 2 - Create a new TNI object and run pre-processing Objects of class \Rfunction{TNI} provide a series of methods to do transcriptional network inference from high-throughput gene expression data. In this 1st step, the generic function \Rfunction{tni.preprocess} is used to run several checks on the input data. \begin{scriptsize} <>= #Input 1: 'gexp', a named gene expression matrix (samples on cols) #Input 2: 'transcriptionFactors', a named vector with TF ids (3 TFs for quick demonstration!) #Input 3: 'gexpIDs', an optional data frame with gene annotation (it can be used to remove duplicated genes) rtni <- new("TNI", gexp=dt4rtn$gexp, transcriptionFactors=dt4rtn$tfs[c("PTTG1","E2F2","FOXM1")] ) rtni<-tni.preprocess(rtni,gexpIDs=dt4rtn$gexpIDs) @ \end{scriptsize} \item 3 - Run permutation analysis The \Rfunction{tni.permutation} function takes the pre-processed \Rfunction{TNI} object and returns a transcriptional network inferred by mutual information (with multiple hypothesis testing corrections). \begin{small} <>= rtni<-tni.permutation(rtni) @ \end{small} \item 4 - Run bootstrap analysis In an additional step, unstable interactions can be removed by bootstrap analysis using the \Rfunction{tni.bootstrap} function, which creates a consensus bootstrap network (referred here as \textit{refnet}). \begin{small} <>= rtni<-tni.bootstrap(rtni) @ \end{small} \item 5 - Run DPI filter In the TN each target can be linked to multiple TFs and regulation can occur as a result of both direct (TF-target) and indirect interactions (TF-TF-target). The Data Processing Inequality (DPI) algorithm \cite{Meyer2008} is used to remove the weakest interaction in any triangle of two TFs and a target gene, thus preserving the dominant TF-target pairs, resulting in the filtered transcriptional network (referred here as \textit{tnet}). The filtered TN has less complexity and highlights the most significant interactions. \begin{small} <>= rtni<-tni.dpi.filter(rtni) @ \end{small} \item 6 - Get results All results available in the \Rfunction{TNI} object can be retrieved using the \Rfunction{tni.get} function: \begin{small} <>= tni.get(rtni,what="summary") refnet<-tni.get(rtni,what="refnet") tnet<-tni.get(rtni,what="tnet") @ \end{small} \item 7 - Build a graph The inferred transcriptional network can also be retrieved as an \Rfunction{igraph} \cite{igraph} object using the \Rfunction{tni.graph} function. The graph object includes some basic network attributes pre-formatted for visualization in the R package \Rpackage{RedeR} \cite{Castro2012}. \begin{small} <>= g<-tni.graph(rtni) @ \end{small} \end{itemize} \subsection{Transcriptional network analysis} \begin{itemize} \item 1 - Create a new TNA object (and run TNI-to-TNA pre-processing) Objects of class \Rfunction{TNA} provide a series of methods to do enrichment analysis on transcriptional networks. In this 1st step, the generic function \Rfunction{tni2tna.preprocess} is used to convert the pre-processed \Rfunction{TNI} object to \Rfunction{TNA}, also running several checks on the input data. \begin{scriptsize} <>= #Input 1: 'object', a TNI object with a pre-processed transcripional network #Input 2: 'phenotype', a named numeric vector of phenotypes #Input 3: 'hits', a character vector of gene ids considered as hits #Input 4: 'phenoIDs', an optional data frame with anottation used to aggregate genes in the phenotype rtna<-tni2tna.preprocess(object=rtni, phenotype=dt4rtn$pheno, hits=dt4rtn$hits, phenoIDs=dt4rtn$phenoIDs ) @ \end{scriptsize} \item 3 - Run MRA analysis pipeline The \Rfunction{tna.mra} function takes the \Rfunction{TNA} object and returns the results of the Master Regulator Analysis (RMA) \cite{Carro2010} over a list of regulons from a transcriptional network (with multiple hypothesis testing corrections). The MRA computes the overlap between the transcriptional regulatory unities (regulons) and the input signature genes using the hypergeometric distribution (with multiple hypothesis testing corrections). \begin{small} <>= rtna<-tna.mra(rtna) @ \end{small} \item 4 - Run overlap analysis pipeline A simple overlap among all regulons can also be tested using the \Rfunction{tna.overlap} function: \begin{small} <>= rtna<-tna.overlap(rtna) @ \end{small} \item 5 - Run GSEA analysis pipeline Alternatively, the gene set enrichment analysis (GSEA) can be used to assess if a given transcriptional regulatory unit is enriched for genes that are differentially expressed among 2 classes of microarrays (\textit{i.e.} a differentially expressed phenotype). The GSEA uses a rank-based scoring metric in order to test the association between gene sets and the ranked phenotypic difference. Here regulons are treated as gene sets, an extension of the GSEA statistics as previously described \cite{Subramanian2005}. \begin{small} <>= rtna<-tna.gsea1(rtna) @ \end{small} \item 6 - Get results All results available in the \Rfunction{TNA} object can be retrieved using the \Rfunction{tna.get} function: \begin{small} <>= tna.get(rtna,what="summary") tna.get(rtna,what="mra") tna.get(rtna,what="overlap") tna.get(rtna,what="gsea1") @ \end{small} \item 7 - Plot GSEA To visualize the GSEA distributions, the user can apply the \Rfunction{tna.plot.gsea1} function that plots the one-tailed GSEA results for individual regulons: \begin{small} <>= tna.plot.gsea1(rtna, file="tna_test", width=6, height=4, heightPanels=c(1,0.7,3), ylimPanels=c(0,3.5,0,0.8)) @ \end{small} \end{itemize} \newpage %%%%%% %Fig1% %%%%%% \begin{figure}[htbp] \begin{center} \includegraphics[width=0.85\textwidth]{tna_test.pdf} \end{center} \label{fig1} \caption{GSEA analysis showing genes in each regulon (as hits) ranked by their differential expression (as phenotype). This toy example illustrates the output from the \emph{TNA} pipeline evaluated by the \emph{tna.gsea1} method.} \end{figure} %-------------------------------------------- %-------------------------------------------- %-------------------------------------------- \clearpage \section{Session information} \begin{scriptsize} <>= print(sessionInfo(), locale=FALSE) @ \end{scriptsize} \newpage \bibliography{bib} \end{document}