\documentclass{article} \renewcommand{\thefigure}{\thesection-\arabic{figure}} \usepackage[utf8]{inputenc} \usepackage{chngcntr} \usepackage{hyperref} \usepackage{url} %Make section number before figure number. \counterwithout{figure}{section} \title{How to use the BiGGR package} \author{Anand K. Gavai \& Hannes Hettling} %\date{\vspace{-5ex}} \date{} %\VignetteIndexEntry{BiGGR} \begin{document} %\setkeys{Gin}{width=0.6\textwidth} \maketitle \section{Introduction} The main purpose of this package is to analyze metabolic systems and estimate the biochemical reaction rates in metabolic networks. BiGGR works with the BiGG \cite{Schellenberger2010} database and with files encoded in the Systems Biology Markup Language (SBML) from other sources. The BiGG database stores reconstructions of metabolic networks and is freely accessible. BiGGR is an entirely open source alternative for a more extensive software package, COBRA 2.0, which is available for Matlab \cite{Schellenberger2011}. BiGGR makes it easy to apply a big variety of open source R packages to the analysis of metabolic systems. Although it contains less functionality than COBRA, BiGGR may be convenient for R users. The BiGG system provides metabolic reconstructions on humans, \textit{M. barkeri}, \textit{S. cerevisiae}, \textit{H. pylori}, \textit{E. coli} and \textit{S. aureus}. BiGGR also works with the new reconstruction of human metabolism Recon 2 \cite{Thiele2013}. These reconstructions consist of genes, metabolites, reactions and proteins that are identified and connected with each other to form a network structure. The BiGGR package provides various functions to interface to the BiGG database, and to perform flux balance analysis (FBA) after importing selected reactions or pathways from the database. Other functions included in this package allow users to create metabolic models for computation, linear optimization routines, and likelihood based ensembles of possible flux distributions fitting measurement data. To this end, BiGGR interfaces with the LIM package \cite{Oevelen2009}. BiGGR provides models in standard SBML R object format for each organism within the BiGG database as well as the new reconstruction of human metabolism from the Biomodels database \cite{Thiele2013} (see 'data' directory in the package). This format allows easy construction of the stoichiometric matrix of the entire system which may serve as the core of further computational analysis. Finally, the package allows automatic visualization of reaction networks based on a hypergraph framework using the hyperdraw \cite{MurrellXXXX} package. \section{Installation} BiGGR is installed as follows from the R console: <>= source("http://bioconductor.org/biocLite.R") biocLite("BiGGR") @ BiGGR depends on the Bioconductor packages rsbml \cite{Lawrence2013}, hyperdraw \cite{MurrellXXXX} (which in turn requires the package hypergraph) and the CRAN package LIM \cite{Oevelen2009}. For detailed installation instructions of the dependencies we refer to the package documentations at \url{http://www.bioconductor.org/} and \url{http://www.cran.r-project.org/}. \section{Example: A flux balance analysis with BiGGR} The package is imported as follows: <>= library("BiGGR") @ To get an overview about the functions and databases available in the package, you can use: <>= library(help="BiGGR") @ The reference manual which describes all functions of BiGGR in detail can be found in the documentation directory ('doc') of the package. In the following we will provide a step-by-step guide demonstrating a flux estimation procedure in a model of glycolysis and TCA cycle. The general work flow using this package consists of the following steps: \begin{itemize} \item{Retrieve a model in SBML object format as provided in the package (alternatively an R object containing the model can be generated from an SBML file)} \item{Specify optimization objective and model constraints and create a LIM model file as input for the linear programmimg package LIM} \item{Estimate the reaction fluxes with linear programming} \item{Visualize the results using the hypergraph framework} \end{itemize} \subsection{Generate Model} There are several ways to create a model within BiGGR: \begin{itemize} \item{Query one of the databases contained in the BiGGR package (use the command \verb|data()| to see all available databases). You can query with a list of genes (function \verb|buildSBMLFromGenes|), a list of reaction identifiers (\verb|buildSBMLFromReactionIDs|) or for specific pathways \\ (\verb|buildSBMLFromPathways|).} \item{Alternatively: Retrieve a text file with metabolic reactions from the web interface of the BiGG database (\url{http://bigg.ucsd.edu/bigg/main.pl}). The user can query and select reactions from BiGG which can then be exported in SBML or text format. BiGG reactions saved in text format can be converted to an internal SBML object by the function \verb|buildSBMLFromBiGG|. An SBML file can be imported using the \verb|rsbml_read| function from the rsbml package.} \end{itemize} Below we will demonstrate how to build an SBML model from a set of reaction identifiers using the Recon 1 database. The list of reaction IDs can be found in the extdata subdirectory in the package. The model comprises the reactions of glycolysis and TCA cycle \cite{VanBeek2011}: <>= ##load reaction identifiers from package examples file.name <- system.file("extdata", "Glycolysis_TCA_recon1_reactionIDs.txt", package="BiGGR") reaction.ids <- scan(file.name, what=" ") ##load database data("H.sapiens_Recon_1") ##build SBML model sbml.model <- buildSBMLFromReactionIDs(reaction.ids, H.sapiens_Recon_1) @ The model object \verb|sbml.model| is an rsbml object of class \verb|Model|. It has 80 metabolites participating in 63 reactions in 3 compartments. \subsection{Specify constraints, optimization objective and estimate fluxes} After building the model, we will specify the additional parameters necessary to run the flux estimation. In the present model, several metabolites are unbalanced because not all the biochemical reactions involving them are represented inside the model. Another unbalanced situation is when metabolites accumulate inside or outside the cell. These metabolites must therefore not be subjected to the equality constraint of the linear programming routine for flux estimation. These metabolites are termed external metabolites or, in short, externals. Next we specify the term to be maximized. Additional equality constraints are given in list form in the variable eqns as well as bounded constraints for specific reaction fluxes (variable constraints). Finally a temporary LIM model file is created using the function \verb|createLIMFromSBML|. <>= ##following term is to be maximized maximize <- "R_ATPS4m - R_NDPK1m - R_HEX1 - R_PFK - R_PGK + R_PYK" ##specify the external metabolites of the system externals <- c("M_glc_DASH_D_e", "M_lac_DASH_L_e", "M_ala_DASH_L_e", "M_gln_DASH_L_c", "M_h2o_e", "M_co2_e", "M_o2_e", "M_h_e", "M_pi_c", "M_o2s_m", "M_nh4_m", "M_adp_c", "M_atp_c", "M_nadp_c", "M_nadph_c", "M_h_c") ##specify the values of following fluxes: ##R_GLCt1r=0.4, R_O2t=2.4, R_L_LACt2r=R_GLNtm=0 equation.vars <- c("R_GLCt1r", "R_O2t", "R_L_LACt2r", "R_GLNtm") equation.values <- c(0.4, 2.4, 0.0, 0.0) eqns <- list(equation.vars, equation.values) ##create LIM file which is written to the ##hard disk in the system's temporary directory limfile.path <- tempfile() createLIMFromSBML(sbml.model, maximize, equations=eqns, externals=externals, file.name=limfile.path) @ \subsection{Running simulations to estimate fluxes} \label{calc_fluxes} BiGGR uses Linear Inverse Models for estimating the fluxes as provided by the LIM package from Bioconductor. All the functionality of this package can be used in this framework. For example LIM contains a function \verb|xsample| to generate ensembles of feasible fluxes using Markov chain Monte Carlo methods. The function \verb|lsei| in LIM provides least squares estimation with equalities and inequalities which is useful to fit the model to biochemical measurements of metabolite exchange. The interface for some of the key functions provided by BiGGR is \verb|getRates| which takes the model file as an input parameter to estimate fluxes which finds the solution where one linear function (i.e. the sum of unknowns) is either minimized (a "cost" function) or maximized (a "profit" function). <>= rates <- getRates(limfile.path) @ \subsection{Visualization of networks and fluxes} BiGGR provides visualization using hypergraphs. To this end, BiGGR uses the BioConductor package hyperdraw which in turn uses the Graphviz engine. Hypergraphs are graphs which can connect multiple nodes by one edge. Metabolites are represented by nodes and reactions are represented by edges connecting the nodes. The flow rates (fluxes) of the biochemical reactions can be represented by the width of the edges (a wider edge corresponds to a higher flux value). An SBML model can be converted into a hyperdraw object using the function \verb|sbml2hyperdraw|. Since many models contain numerous metabolites and reactions, a 'human readable' automatic graphical representation of the system in one single plot is often infeasible. Therefore, specific subsets of metabolites and/or reactions can be passed as an argument to the \verb|sbml2hyperdraw| function and only metabolites or reactions belonging to the specified sets are visualized. Below we will visualize a subset of metabolites and reactions in the glycolytic pathway, which is a subset of our example model. As a second argument we pass the reaction rates calculated in \ref{calc_fluxes} in order to represent the reaction rates by the width of the edges. <>= relevant.species <- c("M_glc_DASH_D_c", "M_g6p_c", "M_f6p_c", "M_fdp_c", "M_dhap_c", "M_g3p_c", "M_13dpg_c", "M_3pg_c", "M_2pg_c", "M_pep_c", "M_pyr_c") hd <- sbml2hyperdraw(sbml.model, rates=rates, relevant.species=relevant.species, layoutType="dot", plt.margins=c(20, 0, 20, 0)) @ The hypergraph object can then simply be plotted using the plot function: <>= plot(hd) @ \begin{figure}[ht!] \begin{center} <>= plot(hd) @ \end{center} \caption{\textit{Estimated fluxes in the glycolytic pathway. For each reaction, the arrow points in the direction of the calculated flux. If that is backward relative to the direction defined as forward in the metabolic reconstruction, the arrow is colored red. Note that only a subset of all metabolites and reactions is plotted.}} \label{fig:glyc} \end{figure} The resulting plot is shown in Figure \ref{fig:glyc}. Flux values are displayed following each reaction identifier. The forward direction is defined in the BiGG database according to biochemical conventions, but if the actual calculated flux is backwards accroding to the definition the arrow is colored red. Additional graphical arguments are documented in the help file (see \verb|?sbml2hyperdraw|). Below, we give various reactions and metabolites in the TCA cycle which are present in our example model and plot all components using a circular layout (see Figure \ref{fig:tca}): <>= relevant.species <- c("M_cit_m", "M_icit_m" , "M_akg_m", "M_succoa_m", "M_succ_m", "M_fum_m", "M_mal_DASH_L_m", "M_oaa_m") relevant.reactions <- c("R_CSm", "R_ACONTm", "R_ICDHxm", "R_AKGDm", "R_SUCOAS1m", "R_SUCD1m", "R_FUMm", "R_MDHm", "R_ICDHyrm", "R_ME1m", "R_ME2m", "R_ASPTAm","R_AKGMALtm", "R_GLUDym", "R_ABTArm", "R_SSALxm","R_CITtam") hd <- sbml2hyperdraw(sbml.model, rates=rates, relevant.reactions=relevant.reactions, relevant.species=relevant.species, layoutType="circo", plt.margins=c(100, 235, 100, 230)) dev.new() ##Open a new plotting device plot(hd) @ \begin{figure}[htp!] \begin{center} <>= relevant.species <- c("M_cit_m", "M_icit_m" , "M_akg_m", "M_succoa_m", "M_succ_m", "M_fum_m", "M_mal_DASH_L_m", "M_oaa_m") relevant.reactions <- c("R_CSm", "R_ACONTm", "R_ICDHxm", "R_AKGDm", "R_SUCOAS1m", "R_SUCD1m", "R_FUMm", "R_MDHm", "R_ICDHyrm", "R_ME1m", "R_ME2m", "R_ASPTAm","R_AKGMALtm", "R_GLUDym", "R_ABTArm", "R_SSALxm","R_CITtam") hd <- sbml2hyperdraw(sbml.model, rates=rates, relevant.reactions=relevant.reactions, relevant.species=relevant.species, layoutType="circo", plt.margins=c(150, 235, 150, 230)) plot(hd) @ \end{center} \caption{\textit{Estimated fluxes in the citric acid cycle in the mitochondrion.}} \label{fig:tca} \end{figure} In this example, reactions with a flux equal to zero are displayed in grey. Note that metabolites which are not specified are not plotted, even if reactions in which they participate are drawn. This is for instance the case for the exchange reaction below: \begin{verbatim} M_akg_m + M_mal_DASH_L_c -> M_akg_c + M_mal_DASH_L_m \end{verbatim} The visualization function \verb|sbml2hyperdraw| is not restricted to FBA models, but \verb|sbml2hyperdraw| can be used as a generic plotting function for SBML models. To this end, in case that no reaction rates are given as argument, all edges are plotted with the same width and in the same color. \section{Troubleshooting BiGGR} Model building is an iterative process and requires careful selection of parameters and arguments. Some of the most common problems and solutions are described below: \begin{itemize} \item{ \textbf{Infeasible solution:} This problem can be encountered when using the \verb|linp| method form the LIM package. This problem occurs when the constraints provided by the user for the model are conflicting. (A trivial example is that a constraint says that a specific flux is greater than 5 units and another constraint says the same flux is smaller than 4. Such conflicts can be much more subtle). The reactions in the model file may sometimes be defined incorrectly, for instance with regard to their reversibility.} \item{ \textbf{Visualizing too many metabolites and reactions:} If the plotting area is too small to fit all boxes for metabolites, the following error is produced by the hyperdraw package: \begin{verbatim} Error in `[.unit`(pts$x, ref + step) : Index out of bounds (unit subsetting) \end{verbatim} In case you encounter this error when plotting your model, you can consider several possibilities: \begin{itemize} \item{ Increase the size of the plotting area: When plotting to the screen, width and height of the plotting window can be set with the \verb|x11()| command. Type \verb|?x11| for more information. Similarly, figure dimensions can be set when plotting to a jpeg, png, pdf, eps etc. device. Type for instance \verb|?pdf| for the documentation.} \item{ Consider plotting only a subset of the metabolites and reactions in the model. It is possible to pass a list or vector of relevant species and/or relevant reactions to the function \verb|sbml2hyperdraw|. See \verb|?sbml2hyperdraw| for more information.} \end{itemize} } \item{ \textbf{Resizing the plotting window:} Resizing the plotting window after plotting a model can cause the edges to get distorted. We advice not to manually resize the plotting window. Instead, if a larger plotting area is desired, the dimensions of the plotting area can be set as described above.} \end{itemize} \bibliographystyle{ieeetr} \bibliography{BiGGR} \end{document}