%\VignetteIndexEntry{pwOmics} \documentclass[a4paper]{article} \usepackage[utf8]{inputenc} \usepackage{graphicx} \usepackage{enumitem} \usepackage{epstopdf} \usepackage{csquotes} \usepackage[english]{babel} %\usepackage[sort, numbers]{natbib} \usepackage[backend = bibtex, style=numeric]{biblatex} \usepackage{subcaption} \usepackage{hyperref} \addbibresource{pwOmics_vignette.bib} \setlength{\parindent}{0pt} \title{pwOmics - \\ Pathway-based Integration of time-series Omics Data using public database knowledge} \author{Astrid Wachter\\ Medical Statistics, University Medical Center Göttingen, Germany} \begin{document} %\bibliographystyle{ieeetr} \SweaveOpts{concordance=TRUE} \maketitle \tableofcontents \section{Introduction} Characterization of biological processes can be performed in great detail with the increased generation of omics data on different functional levels of the cell. Especially interpretation of time-series omics data measured in parallel with different platforms is a complex but promising task, needing consideration of time-independent combination of omics data and additionally time-dependent signaling analysis. As each measurement technique shows a certain bias and has natural limitations in identifying full signaling responses \cite{yeger-lotem_bridging_2009}, such cross-platform analysis is an up-to-date approach in order to connect biological implications on different signaling levels. Using diverse data types is expected to provide a deeper understanding of global biological functions and the underlying complex processes \cite{Kholodenko2012}.\\ This is why computational data analysis tools for interpretation of data from proteomics and transcriptomics measurements in parallel are needed.\\ \textit{pwOmics} is a tool for pathway-based level-specific data comparison and analysis of single time point or time-series omics data measured in parallel. It provides individual analysis workflows for the different omics data sets (see Figure~\ref{fig:analysis}) and in addition enables consensus analysis of omics data as shown in the workflow overview in Figure~\ref{fig:workflow}. \begin{figure}[htbp] \centering \includegraphics [angle=270, width = 320pt]{updown.pdf} \caption{\textit{pwOmics} downstream and upstream analysis.} \label{fig:analysis} \end{figure} \begin{figure}[htbp] \centering \includegraphics [width = 360pt]{workflow.pdf} \caption{\textit{pwOmics} workflow overview.} \label{fig:workflow} \end{figure} Up to this point analysis is restricted to human species. In future an expansion of the package is possible dependent on available online open access database information. \section{Databases} \label{sec:Databases} As \textit{pwOmics} is a package for data integration based on prior pathway and transcription knowledge data, it is necessary to define the databases to work with. Three different kinds of databases are necessary to do all analyses steps:\\ \begin{enumerate}[topsep=0pt,itemsep=-1ex,partopsep=1ex,parsep=1ex] \item Pathway databases:\\ The user can choose from Biocarta \cite{nishimura_biocarta_2001}, Reactome \cite{milacic_annotating_2012, croft_reactome_2014}, PID \cite{schaefer_pid:_2009} from the National Cancer Institute (NCI) and KEGG \cite{kanehisa_data_2014, kanehisa_kegg:_2000}.\\ \item Protein-protein interaction (PPI) database:\\ STRING \cite{franceschini_string_2013}.\\ \item Transcription factor (TF) - target gene databases:\\ The user can choose from ChEA \cite{lachmann_chea:_2010}, Pazar \cite{portales-casamar_pazar_2009, portales-casamar_pazar:_2007} and/or decide to specify an own file e.g. based on a commercial database.\\ \end{enumerate} The pathway database information is used to identify the pathways of the differentially abundant proteins in the downstream analysis as well as upstream protein regulators of TFs in the upstream analysis. The PPI database STRING \cite{franceschini_string_2013} was chosen to define the protein net for the consensus analysis. The TF - target gene databases information is necessary for the TF identification in pathways in the downstream analysis. Additionally the upstream TFs of differentially expressed genes/transcripts are identified in the upstream analysis based on this information.\\ In downstream analysis the pathway gene set information is used, whereas in the upstream analysis also the pathway topology information is exploited.\\ The database information is downloaded internally via \textit{STRINGdb} and \textit{AnnotationHub} \cite{Morgan} package. In case the author is interested also in the metadata of the pathway database and TF - target database it can be received by <>= library(pwOmics) library(AnnotationHub) ah = AnnotationHub() #pathway databases pw = query(ah, "NIH Pathway Interaction Database") pw[1] #TF-target databases chea = query(ah, "ChEA") chea[1] pazar = query(ah, "Pazar") pazar[1] @ In case you want to use TF - target gene information which is not part of the mentioned databases but e.g. part of a commercial database, a user-specified file can be used for the analysis. This file should be a `.txt' file with first column transcription factors and second column target gene symbols without a header, e.g.:\\ \begin{center} \fbox{\parbox[c]{3cm}{ GATA-4 HAMP\\ c-Jun IL18\\ NF-kappaB TLR2\\ MYB LTB\\ FOXO1A TGFBR1\\ ...}} \end{center} The STRING PPI-information is downloaded automatically while processing and analyzing the data: The \textit{STRINGdb} package \cite{franceschini_string_2013} is used here. \section{Example dataset} The example dataset used here for demonstration purposes is the one presented in \cite{Waters2012}, which comprises the mitogenic response of human mammary epithelial cells to epidermal growth factor (EGF). This dataset includes whole genome time course microarray data measured with NimbleGen whole genome 60-mer oligonucleotide arrays (Design Version 2003\_10\_27) at time points 0, 1, 4, 8, 13, 18 and 24 hr after EGF stimulation. The complementary proteomics data was measured with LC-FTICR (Fourier-transform ion cyclotron resonance-mass spectrometry coupled with advanced capillary liquid chromatography) at time points 0.25, 1, 4, 8, 13, 18 and 24 hours after EGF stimulation. Preprocessing of data was done as described in \cite{Waters2012} resulting in lists of significant genes and proteins for each time point as log10 expression ratios relative to the time 0 hr controls. \section{Data pre-processing} \label{sec:Datapre} \textit{pwOmics} is a package for secondary data analysis, i.e. it needs already pre-processed data as input for the analysis. The input required is\\ \begin{enumerate}[topsep=0pt,itemsep=-1ex,partopsep=1ex,parsep=1ex] \item a list of all protein IDs measured,\\ \item a list of all gene/transcript IDs measured,\\ \item a list of differentially abundant proteins + log fold changes,\\ \item a list of differentially expressed genes/transcripts + log fold changes.\\ \end{enumerate} The IDs need to be gene symbols, both for protein and gene/transcript data. In case time-series data is analyzed inputs 3. and 4. needs to be specified for each time point. It is absolutely necessary, that all proteins and genes/transcript in inputs 3. and 4. are part of the lists of all protein IDs and all gene/transcript IDs, respectively. \\ The OmicsData object is the format used for data analysis in \textit{pwOmics} package. It contains a list of four main elements:\\ \begin{enumerate}[topsep=0pt,itemsep=-1ex,partopsep=1ex,parsep=1ex] \item OmicsD - here the omics data set, its description and the results are stored\\ \item PathwayD - here the chosen pathway databases and the generated Biopax model is stored\\ \item TFtargetsD - here the chosen TF-target gene databases and the combined TF-target gene information is stored\\ \item Status - The status variable equals `1' in case not all information needed for the analysis is read in yet and `2' after identification of the first upstream/downstream signaling levels. As the enrichment step is not necessarily part of the analysis and dependent on the pathway database and the TF-target gene database the identification of signaling molecules in further levels might not be successful, the status variable is not used in the further analysis.\\ \end{enumerate} Thus \textit{pwOmics} reads in the omics data set provided by the user to the first element of the OmicsData object and further on stores all the results in this part as well.\\ This is why the user has to provide the omics data set in a special format: A list needs to be generated with a protein list named `P' as first element and a gene/transcript list named `G' as second element. These lists contain as first element a data frame with all (unique) protein IDs and gene/transcript IDs in the first column, respectively, and as second element a list with data frames for each time point of measurement. The data frames have two columns with the first one containing the differentially abundant/expressed proteins or genes/transcripts as gene symbols and the second column containing the corresponding log fold changes, e.g.:\\ <>= data(OmicsExampleData) OmicsExampleData @ Generated as in the following example: \begin{verbatim} OmicsExampleData = list(P = list(allPIDs, list(PIDstp0.25, PIDstp1, PIDstp4, PIDstp8, PIDstp13, PIDstp18, PIDstp24)), G = list(allGIDs, list(GIDstp1, GIDstp4, GIDstp8, GIDstp13, GIDstp18, GIDstp24))) \end{verbatim} <>= head(OmicsExampleData$P[[2]][[1]]) @ In case the user only wants to analyze omics data from a single time point just one data frame has to be specified.\\ The time points do not have to be the same for protein and gene/transcript data and need to be specified when reading in the omics data set separately via the `tp\_prots' and `tp\_genes' parameters of the `readOmics' function. <>= data_omics = readOmics(tp_prots = c(0.25, 1, 4, 8, 13, 18, 24), tp_genes = c(1, 4, 8, 13, 18, 24), OmicsExampleData, PWdatabase = c("biocarta", "kegg", "nci", "reactome"), TFtargetdatabase = c("chea", "pazar")) @ If data from a single timepoint measurement should be analyzed the user simply assigns the experiment number `1' for these parameters:\\ \begin{verbatim} #for single time point data set: omics = list(P = list(allPIDs, list(PIDs_1)), G = list(allGIDs, list(GIDs_1))) data_omics = readOmics(tp_prots = c(1), tp_genes = c(1), OmicsExampleData, PWdatabase = c("biocarta", "kegg", "nci", "reactome"), TFtargetdatabase = c("chea", "pazar")) \end{verbatim} Additionally the selected databases have to be specified.\\ The stored information can be easily accessed via the following functions: <>= getOmicsTimepoints(data_omics) head(getOmicsallProteinIDs(data_omics)) head(getOmicsallGeneIDs(data_omics)) head(getOmicsDataset(data_omics, writeData = FALSE)[[1]]) @ \section{Individual analysis} As shown in Figure~\ref{fig:analysis} the analysis is based on an individual analysis of the proteomic and the genomic/transcriptomic data. The downstream analysis and upstream analysis are described in the following subsections. Prior to that the database information has to be read in. In a first step the TF- target information can be made accessible to the OmicsData object by: \begin{verbatim} data_omics = readTFdata(data_omics) \end{verbatim} Via the `TF\_target\_path' parameter the path of the user-specified file can be given. This information can be used additionally to the selected database content. Secondly, the `readPWdata' function takes the OmicsData object with the provided information about the omics data set and the path of the prepared `.RData' files from the pathway databases (see section~\ref{sec:Databases}) and automatically generates the corresponding genelists of the pathway data if `loadgenelists = FALSE'. In this step the automatic definition of internal differing IDs for different pathway databases is necessary, which are stored in a new biopax model in the OmicsData object. \begin{verbatim} data_omicsPW = readPWdata(data_omics, loadgenelists = FALSE) \end{verbatim} As the process of generating genelists with these IDs can take some time - especially for rather big databases such as Reactome \cite{milacic_annotating_2012, croft_reactome_2014} - the genelists for the different databases are automatically stored in the working directory and can be reused in another analysis when the corresponding path containing these files is given to the `readPWdata' function as loadgenelists parameter. \begin{verbatim} data_omics = readPWdata(data_omics, loadgenelists = "Genelist_reactome.RData") \end{verbatim} Automatically the information of the selected databases and/or the corresponding user-specified file are merged. The file format (if this option is used) should be exactly as specified in section~\ref{sec:Databases}.\\ \subsection{Downstream analysis} The downstream analysis is starting with the provided proteomic data (either single time point data or time-series data). The first step is the identification of the pathways in which the differentially abundant proteins play a role. \textit{pwOmics} performs this searching step on the basis of the provided proteomic data set and the selected pathway database(s). After reading in these information the user can follow the workflow for downstream analysis and identify the pathways in which the differentially abundant proteins are present: \begin{verbatim} data_omics = identifyPWs(data_omics) \end{verbatim} In a next step pathway enrichment can be conducted. The user can specify the multiple testing correction method as well as the significance level for this step. In case of few identified pathways this might result in too few pathways for further analysis. In this case the enrichment step should be skipped. \begin{verbatim} data_omics = enrichPWs(data_omics, "BH", alpha = 0.05) \end{verbatim} Following the workflow the next step is the identification of the transcription factors in these (enriched) pathways, which is done with the information provided by the chosen TF-target gene database. The user can choose if only the enriched pathways or all pathways should be considered for further analysis: \begin{verbatim} data_omics = identifyPWTFTGs(data_omics, only_enriched = FALSE) \end{verbatim} For use of this function the working directory should contain the previously generated genelists.\\ The results of the downstream analysis can be easily accessed by the following functions: \begin{verbatim} getDS_PWs(data_omics) getDS_TFs(data_omics) getDS_TGs(data_omics) #Access biopax model generated newly on basis of selected #pathway databases: getBiopaxModel(data_omics) \end{verbatim} \subsection{Upstream analysis} The upstream analysis is starting with the provided gene/transcript data (either single time point data or time-series data). It first of all identifies the upstream TFs of the differentially expressed genes/transcripts. This step is done with the provided/selected information of TF-target gene pairs. Given this information, the identification of upstream TFs can be done: \begin{verbatim} data_omics = identifyTFs(data_omics) \end{verbatim} Similarly as in the downstream analysis also in the upstream analysis an optional enrichment step can be conducted, but here on the TF level. \begin{verbatim} data_omics = enrichTFs(data_omics, "BH", alpha = 0.05) \end{verbatim} Upstream of the (enriched) TFs the regulator proteins can be identified with the following function: \begin{verbatim} data_omics = identifyRsofTFs(data_omics, only_enriched = FALSE, noTFs_inPW = 1, order_neighbors = 10) \end{verbatim} Again, the user can specify if only the enriched TFs or all TFs should be considered for further analysis. The identification of upstream regulators is done in the following way:\\ \begin{enumerate}[topsep=0pt,itemsep=-1ex,partopsep=1ex,parsep=1ex] \item Identification of the pathways the previously identified TFs are part of.\\ \item Selection of pathways according to the user-specified parameter `noTFs\_inPW': Only those pathways are considered in further analysis with at least this number of TFs in the pathway. Minimum number of TFs in the pathway is 2.\\ \item Upstream regulators are identified for these TFs. This is done by finding first for each TF the pathway neighborhood according to the user-specified parameter `order\_neighbors'. This parameter specifies the order of the identified pathway neighborhood. Then the intersection of all identified neighborhoods for all TFs in a pathway is determined. The resulting pathway node set is defined here as the set of regulator proteins.\\ \end{enumerate} In case the pathways under consideration do not have pathway components in the downloaded biopax model, this will be indicated with a warning. This warning can be ignored by the user in regard to the following analysis steps. The results of the upstream analysis can be accessed with the following functions: \begin{verbatim} getUS_TFs(data_omics) getUS_PWs(data_omics) getUS_regulators(data_omics) \end{verbatim} \section{Consensus analysis} The consensus analysis combines the results from upstream and downstream analysis by constituting in particular the comparative analysis of the results of the two different data sets. The intersection analysis simply compares the results of the separate upstream and downstream analysis. The static consensus analysis enables setting up static consensus graphs for each time point measured in parallel. Finally, the consensus-based dynamic analysis provides the user with one final dynamic network obtained from the data changes over time based on dynamic bayesian network inference. The consensus-based dynamic analysis is self-evidently only conductable with time-series data sets measured for proteome and genome/transcriptome data in parallel. \subsection{Intersection analysis} The intersection analysis of \textit{pwOmics} is a simple comparative analysis of the results of upstream and downstream analysis. Thus, it enables a comparison of single time point data and time-series data, the latter also for non-corresponding time points in the different data sets. The comparison is possible on the three different functional levels considered in this package: On the proteome level, the transcription factor level and gene/transcript level. \begin{verbatim} getProteinIntersection(data_omics, tp_prot = 4, tp_genes = 4) getTFIntersection(data_omics, tp_prot = 4, tp_genes = 1) getGenesIntersection(data_omics, tp_prot = 4, tp_genes = 13) \end{verbatim} These functions not only enable a comparison of the same timepoints on the distinct levels, but for time-series data sets also for non-matching time points:\\ With the time resolution of measuring omics data in most cases being pre-defined by expected signaling changes and financial limitations the potential in the interpretation of the results is strongly confined to the experimental design decisions. Thus, measured signaling changes, which naturally consist of a superposition of diverse time-scales of transcriptional and translational processes and comprehend diverse frequency patterns \cite{yosef_impulse_2011}, are dependent on the sampling. This means for some of the signaling axes it might be the case, that \begin{itemize} \item changes are not detected at all as their rate is too high, \item hopefully most are represented in the data and \item some might be so slow that their change is not considered significant and thus are excluded from analysis. \end{itemize} As the corresponding signaling changes are not expected to be seen at the same time point in proteome data and gene/transcript data it is necessary to enable also the comparison of non-corresponding time points.\\ The possibility to compare such time points naturally cannot account for the changes not captured during measurement, however, it gives the possibility to consider also the time needed for regulatory control mechanisms in the interpretation of the measurement results - even if this shows considerable variations as well.\\ In case the user wants to compare the corresponding time points on the three levels simultaneously he can do so by using the following function: \begin{verbatim} gettpIntersection(data_omics) \end{verbatim} \subsection{Static consensus analysis} The static consensus analysis goes one step ahead and integrates the results gained from the comparative analysis of the corresponding time points to a consensus net for each time point. The change of this consensus net over time gives a first insight into the changes seen statically at the different time points. However, the static consensus nets do not yet include information gathered over time - as it is the case in the consensus-based dynamic analysis (see section~\ref{sec:DynCons}). This is why the static consensus analysis is also applicable for single time point measurements.\\ The static consensus analysis is conducted by generation of a Steiner tree \cite{Kleinberg_AlgorithmDesign_2006} on basis of matching proteins and TFs identified in downstream and upstream analysis for each corresponding time point. The underlying graph used is the protein-protein-interaction (PPI) graph from the STRING database reduced to the connected nodes. The matching proteins and TFs are considered as terminal nodes and are connected via a shortest path-based approximation of the Steiner tree algorithm \cite{Takahashi_Steiner_1980, Sadeghi2013} across the reduced PPI-STRING-graph. Subsequently knowledge of TF-target gene pairs from the chosen database is used to expand the graph with matching genes/transcripts from both upstream and downstream analysis. In case the consensus graph contains corresponding proteins and genes/transcripts, feedback loops are added automically. \begin{verbatim} consensusGraphs = staticConsensusNet(data_omics) \end{verbatim} \subsection{Consensus-based dynamic analysis} \label{sec:DynCons} Unlike the static consensus analysis, the consensus-based dynamic analysis takes into consideration also the signaling changes over time by applying dynamic bayesian network inference. The packages used for the consensus-based dynamic analysis are \textit{longitudinal} \cite{Opgen_longi_2006, rainer_opgen-rhein_inferring_2006} to adjust the format of the data and the actual network inference part is done via the \textit{ebdbNet} \cite{Rau_ebdbnet_2010} package. This package includes an iterative empirical Bayesian procedure with a Kalman filter estimating the posterior distributions of the network parameters. The defined prior structure of the network is used for a straightforward estimation of hyperparameters via an expectation maximization (EM)-like algorithm and the dimension of the hidden states are determined via the singular value decomposition (SVD) of a block-Hankel matrix.\\ The nodes included into the network inference step are nodes which are part of the static consensus graphs from corresponding time points of the two different measurement types, i.e.\\ \begin{enumerate}[topsep=0pt,itemsep=-1ex,partopsep=1ex,parsep=1ex] \item proteins identified in upstream and downstream analysis at the same time points,\\ \item Steiner nodes identified via static consensus analysis,\\ \item TFs identified in upstream and downstream analysis at the same time points and\\ \item genes/transcripts identified in upstream and downstream analysis at the same time points.\\ \end{enumerate} To apply dynamic network inference a reasonable number of measurements needs to be available. As in most cases of parallel protein and gene/transcript measurements only very few corresponding time steps are available it is necessary to artificially introduce additional time steps. This is done by generating smoothing splines applied on the log fold changes provided by the user under the simplifying assumption of a gradual change of signaling between the different time points.\\ This assumption, however, has to be applied consciously and carefully, as there might be higher frequency signaling components superimposed (see for a comprehensive analysis of temporal dynamics of gene expression \cite{yosef_impulse_2011}). In theory a signal has to be sampled 2 times its maximal frequency in order to be able to reconstruct it exactly from time discrete measurements (Nyquist-Shannon sampling theorem \cite{Shannon_theorem_1949, Nyquist_theorem_1928}). This means only exact interpretation of those signaling axes are possible that have a frequency which is smaller than half of the sampling frequency. However, under certain preconditions on signal structure and the sampling operator reconstruction of the original signal can be done with a lower sampling rate \cite{DBLP:journals/tit/BlumensathD09}. This is an interesting starting point for a more comprehensive dynamic analysis of the expected signals and the sampling needed for an extensive data mining of omics data sets measured in parallel, but exceeds the scope of this package. \\ The number of time points generated additionally via smoothing splines is based on simulation results of \textit{ebdbNet} analysis for median area under the curve (AUC) values of receiver operating characteristic (ROC) curves: In their results it was shown that a plateau at around 50 to 75 time points was reached. Thus in \textit{pwOmics} 50 time points are predicted with smoothing splines in order to apply dynamic bayesian network inference on omics data sets measured in parallel.\\ After generation of these time points a block-Hankel matrix of autocovariances is constructed based on the time series abundance/expression data. For this the user needs to provide the \textit{laghankel} parameter, specifying the maximum relevant time lag to be used in constructing the block-Hankel matrix. With a singular value decomposition (see function `hankel' of \textit{ebdbNet} package) the number of hidden states can be determined. Here, the user can specify the \textit{cutoffhankel} parameter to choose the cutoff to determine the desired percent of total variance explained by the singular values. Additional parameters on convergence criteria and iterations performed can be specified. For further details the user is referred to \cite{Rau_ebdbnet_2010}. \begin{verbatim} library(ebdbNet) library(longitudinal) dynInferredNet = consDynamicNet(data_omics, consensusGraphs, laghankel = 3, cutoffhankel = 0.9) \end{verbatim} \section{Time profile clustering} An additional analysis option is clustering of co-regulation patterns over time. It provides information about the signaling molecules with common dynamic behaviour and thus allows to draw conclusions in terms of signaling chronology. Time profile clustering is performed as soft clustering based on the \textit{Mfuzz} package \cite{Mfuzz2012}. The advantage of this clustering method is that a protein, TF or gene/transcript can be assigned to several clusters, thus reducing the sensitivity to noise and the information loss hard clustering exhibits. It is implemented as fuzzy c-means algorithm \cite{Hathaway_Pattern1986} and iteratively optimizes the objective function to minimize the variation of objects within the clusters. The user needs to provide a `min.std' threshold parameter if proteins or genes/transcripts with a low standard deviation should be excluded. In addition the maximum number of cluster centers which should be tested in the `minimum distance between cluster centroid test' has to by given. This number is used as initial number to determine the data-specific maximal cluster number based on the number of distinct data points. For more details see \cite{Mfuzz2012} and \cite{schwammle_simple_2010}. \begin{verbatim} library(Mfuzz) fuzzyClusters = clusterTimeProfiles(dynInferredNet, min.std = 0, ncenters = 12) \end{verbatim} \section{Visualization} To complement the results from the different comparisons and analyses (accessible via the `get...' functions) the \textit{pwOmics} package provides visualization functions for the different analyses. The consensus graphs of the static analysis for one or more corresponding time points can be plotted with the following function (see Figure~\ref{fig:Consgraph}): \begin{verbatim} plotConsensusGraph(consensusGraphs, data_omics) \end{verbatim} \begin{figure}[htbp] \centering \includegraphics{ConsensusGraphs_1.pdf} \caption{\textit{pwOmics} static consensus graph: Time point 1 hr.} \end{figure} \begin{figure}[htbp] \centering \includegraphics{ConsensusGraphs_2.pdf} \caption{\textit{pwOmics} static consensus graph: Time point 4 hrs.} \end{figure} The consensus-based dynamic analysis result can be visualized as follows (see Figure~\ref{fig:DynConsgraph}): \begin{verbatim} plotConsDynNet(dynInferredNet, sig.level = 0.65) \end{verbatim} Here, the parameter `sig.level' is the significance level used as cutoff for plotting edges in the network and has to be specified in the range between 0 and 1. Furthermore the user can indicate if unconnected nodes should be removed and provide additional \textit{igraph} \cite{igraph2006} layout parameters.\\ \begin{figure}[htbp] \centering \includegraphics{dynConsensusNet_2.pdf} \caption{\textit{pwOmics} dynamic network graph.} \label{fig:DynConsgraph} \end{figure} However, as the user can access the networks easily \textit{tkplot} from the \textit{igraph} R package is a nice interactive graph drawing alternative. In addition plot parameters can be easily changed as the result networks are of class `igraph'. In order to plot the results from time profile clustering (see Figure~\ref{fig:cluster}) the following function can be used: \begin{verbatim} plotTimeProfileClusters(fuzzyClusters) \end{verbatim} The different colours represent the different clusters. The legend is only shown if the number of genes and proteins is not too large. Otherwise the user can easily access this information by having a look to the output of the `clusterTimeProfiles' function which provides information about cluster centers, the number of data points in each cluster of the closest hard clustering, cluster indices, and additional parameters explained in detail in the `mfuzz' documentation. In the legend the attachments `\_g' and `\_p', respectively, indicate, if the node originally derives from protein or gene/transcript measurements. \begin{figure}[htbp] \centering \includegraphics{TimeProfiles.pdf} \caption{\textit{pwOmics} time profile clusters.} \label{fig:cluster} \end{figure} \newpage \section{Session Information} \begin{itemize}[topsep=0pt,itemsep=-1ex,partopsep=1ex,parsep=1ex] \item R Under development (unstable) (2015-03-11 r67980), Platform: x86\_64-unknown-linux-gnu (64-bit), Running under: Ubuntu precise (12.04.5 LTS)\\ \item Locale: LC\_CTYPE=en\_US.UTF-8, LC\_NUMERIC=C,\\ LC\_TIME=en\_US.UTF-8, LC\_COLLATE=en\_US.UTF-8,\\ LC\_MONETARY=en\_US.UTF-8, LC\_MESSAGES=en\_US.UTF-8,\\ LC\_PAPER=en\_US.UTF-8, LC\_NAME=C, LC\_ADDRESS=C,\\ LC\_TELEPHONE=C, LC\_MEASUREMENT=en\_US.UTF-8,\\ LC\_IDENTIFICATION=C\\ \item Attached base packages: tcltk, parallel, stats, graphics, grDevices, utils, datasets, methods, base\\ \item Other attached packages: ebdbNet\_1.2.3, Mfuzz\_2.27.1, DynDoc\_1.45.0, widgetTools\_1.45.0, e1071\_1.6-4, Biobase\_2.27.3, BiocGenerics\_0.13.11, longitudinal\_1.1.11, corpcor\_1.6.7, igraph\_0.7.1, pwOmics\_0.99.2\\ \item Loaded via namespace (and not attached): Rcpp\_0.11.5, XVector\_0.7.4, BiocInstaller\_1.17.7, GenomeInfoDb\_1.3.19, plyr\_1.8.1,\\ AnnotationHub\_1.99.82, tkWidgets\_1.45.0, class\_7.3-12, bitops\_1.0-6,\\ biomaRt\_2.23.5, digest\_0.6.8, RSQLite\_1.0.0, shiny\_0.11.1, DBI\_0.3.1,\\ rBiopaxParser\_2.5.0, stringr\_0.6.2, httr\_0.6.1, S4Vectors\_0.5.23,\\ gtools\_3.4.2, caTools\_1.17.1, IRanges\_2.1.44, stats4\_3.2.0,\\ data.table\_1.9.4, R6\_2.0.1, AnnotationDbi\_1.29.24, XML\_3.98-1.1,\\ RJSONIO\_1.3-0, gdata\_2.13.3, reshape2\_1.4.1, STRINGdb\_1.7.0,\\ gplots\_2.16.0, htmltools\_0.2.6, GenomicRanges\_1.19.54, mime\_0.3,\\ interactiveDisplayBase\_1.5.6, xtable\_1.7-4, httpuv\_1.3.2,\\ KernSmooth\_2.23-14, RCurl\_1.95-4.5, chron\_2.3-45\\ \end{itemize} \printbibliography \end{document}