This vignette accompanies the CoRegFlux package. It provides additional information about the methods as well as examples of how to use the functions. Feel free to send any questions to the package maintainer (coregflux at gmail dot com).

4 User guide

4.1 Computing influence using a CoRegNet package function

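library(CoRegFlux)
data("SC_GRN_1")      # regulatory network (CoRegNet object)
data("SC_EXP_DATA")   # training gene expression data
data("SC_Test_data")  # gene expression data of the test experiments

# Influence of each regulator in each test sample
Testing_influence_matrix <- CoRegNet::regulatorInfluence(SC_GRN_1,SC_Test_data)
# Keep the influence values of the first test sample
experiment_influence<- Testing_influence_matrix[,1]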

The main functionalities of CoRegFlux are presented in the following sections.

4.2 Predict gene state/gene expression level from a condition-specific experiment using a linear model

data("aliases_SC")
data("iMM904")
PredictedGeneState <- predict_linear_model_influence(network = SC_GRN_1,
                    experiment_influence = experiment_influence,
                    train_expression = SC_EXP_DATA,
                    min_Target = 4,
                    model = iMM904,
                    aliases = aliases_SC)

GeneState<-data.frame("Name" = names(PredictedGeneState),
                     "State" = unname(PredictedGeneState))
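To give an idea of what a linear model of influence can look like, here is a minimal base-R sketch on toy data: it regresses the expression of one target gene on the influence of two regulators across training samples, then predicts the expression for a new condition. The regulator names RegA and RegB, the simulated data and the model form are assumptions made for illustration only; predict_linear_model_influence may build its models differently.

```r
# Toy illustration: predict one target gene's expression from the influence
# of its regulators with an ordinary least-squares linear model
set.seed(1)
n_train <- 30

# Simulated influence of two hypothetical regulators in each training sample
train <- data.frame(RegA = rnorm(n_train), RegB = rnorm(n_train))
# Simulated target expression: linear in the influences plus small noise
train$target <- 1 + 0.8 * train$RegA - 0.5 * train$RegB +
  rnorm(n_train, sd = 0.05)

# Fit expression ~ regulator influence on the training samples
fit <- lm(target ~ RegA + RegB, data = train)

# Influence values of the two regulators in a new condition
new_condition <- data.frame(RegA = 1.2, RegB = -0.4)
predicted_target <- unname(predict(fit, newdata = new_condition))
predicted_target
```

With these toy coefficients the noise-free value for the new condition is 1 + 0.8 * 1.2 + 0.5 * 0.4 = 2.16, so the prediction should be close to it.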

4.3 Simulations

For each simulation step, the function receives a metabolic model and:

updates the flux constraints according to the metabolite concentrations
updates the flux constraints according to the CoRegNet network and the regulator influence values
updates the flux constraints according to the gene states given by the GRN simulator

The simulation result is a list containing:

objective_history: time series of the objective function value of the linear program
metabolites: metabolite concentrations over time
fluxes_history: time series of the values of all fluxes
met_concentration_history: time series of metabolite concentrations
metabolites_fluxes_history: time series of the metabolite fluxes during the simulation
rate_history: time series of the growth rate over the whole simulation
biomass_history: time series of the biomass values
time: vector containing the simulation times
gene_state_history: list containing the gene states during the simulation

The flux values over the simulation are stored in a matrix whose row names are the reaction IDs and whose columns correspond to the simulation time points.

data("aliases_SC")
data("iMM904")
metabolites<-data.frame("names" = c("D-Glucose","Ethanol"),
                        "concentrations" = c(16.6,0))

Simulation1<-Simulation(model = iMM904,
                        time = seq(1,20,by = 1),
                        metabolites = metabolites,
                        initial_biomass = 0.45,
                        aliases = aliases_SC)
# Default biomass flux index use is 1577 corresponding to  Biomass SC5 notrace
# Joining by: metabolites_id
# simulation step
# ... (this message is printed once per step, 19 times in total)

Simulation1$fluxes_history[1:10,1:5]
#                 [,1]    [,2]          [,3]       [,4]       [,5]
# 13BGH     0.00000000 0.00000  0.000000e+00 0.00000000 0.00000000
# 13BGHe    0.00000000 0.00000  0.000000e+00 0.00000000 0.00000000
# 13GS      0.32667000 0.32667  2.063497e-01 0.03309391 0.03309391
# 16GS      0.00000000 0.00000  0.000000e+00 0.00000000 0.00000000
# 23CAPPD   0.00000000 0.00000  0.000000e+00 0.00000000 0.00000000
# 2DDA7Ptm -0.07608291 0.00000  0.000000e+00 0.00000000 0.00000000
# 2DHPtm    0.00000000 0.00000 -1.136868e-13 0.00000000 0.00000000
# 2DOXG6PP  0.00000000 0.00000  0.000000e+00 0.00000000 0.00000000
# 2HBO      0.00000000 0.00000  0.000000e+00 0.00000000 0.00000000
# 2HBt2     0.00000000 0.00000  0.000000e+00 0.00000000 0.00000000

To access the gene-protein-reaction (GPR) rules, users can call the gpr() accessor of the sybil package, which returns a character vector with one rule per reaction (flux), listing the associated genes.

library(sybil)
gpr(iMM904)[1:5]
# [1] "YGR282C"
# [2] "(YDR261C or YOR190W or YLR300W)"
# [3] "((YLR342W and YCR034W and YMR307W) or (YLR342W and YCR034W and YMR215W) or (YMR306W and YCR034W and YLR343W) or (YMR306W and YCR034W and YOL030W) or (YGR032W and YCR034W and YOL030W) or (YLR342W and YCR034W and YOL132W) or (YGR032W and YCR034W and YMR215W) or (YGR032W and YCR034W and YLR343W) or (YMR306W and YCR034W and YMR215W) or (YMR306W and YCR034W and YOL132W) or (YLR342W and YCR034W and YOL030W) or (YLR342W and YCR034W and YLR343W) or (YGR032W and YCR034W and YOL132W) or (YGR032W and YCR034W and YMR307W) or (YMR306W and YCR034W and YMR307W))"
# [4] "(YPR159W or YGR143W)"
# [5] "YGR247W"

If you only need to know which genes affect which reactions, sybil model objects also store the reaction-gene matrix, accessible with rxnGeneMat():

rxnGeneMat(iMM904)[1:10,1:10]
# 10 x 10 sparse Matrix of class "lgCMatrix"
#                          
#  [1,] | . . . . . . . . .
#  [2,] . | | | . . . . . .
#  [3,] . . . . | | | | | |
#  [4,] . . . . . . . . . .
#  [5,] . . . . . . . . . .
#  [6,] . . . . . . . . . .
#  [7,] . . . . . . . . . .
#  [8,] . . . . . . . . . .
#  [9,] . . . . . . . . . .
# [10,] . . . . . . . . . .
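The reaction-gene matrix can be read as a summary of the GPR rules: a reaction and a gene are linked whenever the gene appears in the reaction's rule. The base-R sketch below builds such a logical matrix from three of the rule strings printed above. The reaction labels R1 to R3 are placeholders, and this only illustrates the idea; it is not how sybil computes rxnGeneMat().

```r
# Illustrative sketch: derive a reaction x gene logical matrix from GPR strings
gpr_rules <- c(R1 = "YGR282C",
               R2 = "(YDR261C or YOR190W or YLR300W)",
               R3 = "YGR247W")

# Split each rule into tokens, then drop empty strings and logical keywords
genes_per_rxn <- lapply(gpr_rules, function(rule) {
  tokens <- unlist(strsplit(rule, "[[:space:]()]+"))
  setdiff(tokens, c("", "and", "or"))
})

all_genes <- sort(unique(unlist(genes_per_rxn)))
# One row per reaction, TRUE where the gene occurs in the reaction's rule
rxn_gene <- t(vapply(genes_per_rxn, function(g) all_genes %in% g,
                     logical(length(all_genes))))
colnames(rxn_gene) <- all_genes
rxn_gene
```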

4.3.1 Simulate a dFBA over time (here 20 h) without constraints


metabolites<-data.frame("names" = c("D-Glucose","Ethanol"),
                        "concentrations" = c(16.6,0))

Simulation1<-Simulation(model = iMM904,
                        time = seq(1,20,by = 1),
                        metabolites = metabolites,
                        initial_biomass = 0.45,
                        aliases = aliases_SC)
    
Simulation1$biomass_history
#  [1] 0.4500000 0.6001102 0.8002939 0.9598884 0.9882935 1.0175393 1.0476505
#  [8] 1.0786527 1.1105724 1.1434366 1.1772734 1.2121114 1.2479804 1.2849109
# [15] 1.3229342 1.3620827 1.4023896 1.4175428 1.4175428 1.4175428
Simulation1$met_concentration_history
#           [,1]      [,2]      [,3]     [,4]     [,5]     [,6]     [,7]
# D-Glucose 16.6 11.385409  4.431344  0.00000  0.00000  0.00000  0.00000
# Ethanol    0.0  8.247123 19.245307 26.42949 24.77683 23.07527 21.32335
#               [,8]     [,9]    [,10]    [,11]    [,12]    [,13]    [,14]
# D-Glucose  0.00000  0.00000  0.00000  0.00000  0.00000 0.000000 0.000000
# Ethanol   19.51959 17.66245 15.75036 13.78168 11.75474 9.667829 7.519157
#              [,15]    [,16]     [,17] [,18] [,19] [,20]
# D-Glucose 0.000000 0.000000 0.0000000     0     0     0
# Ethanol   5.306901 3.029179 0.6840547     0     0     0
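The biomass and glucose trajectories above are consistent with the classic static-optimization update used in dynamic FBA, in which the growth rate \(\mu\) and the exchange fluxes \(v\) obtained from FBA are held constant over each time step \(\Delta t\): \(X(t+\Delta t) = X(t)\,e^{\mu\Delta t}\) and \(S(t+\Delta t) = S(t) + \frac{v}{\mu}X(t)\left(e^{\mu\Delta t}-1\right)\). The base-R sketch below implements one such step as an illustration of the technique; it is not necessarily CoRegFlux's internal code. The glucose exchange flux of -10 is an assumption chosen for the example, and \(\mu\) is recovered from the first two values of Simulation1$biomass_history.

```r
# One step of a static-optimization dynamic FBA update (illustrative sketch).
# mu and the exchange fluxes are assumed constant over the step dt.
dfba_step <- function(biomass, concentrations, mu, exchange_fluxes, dt) {
  growth_factor <- exp(mu * dt)
  new_biomass <- biomass * growth_factor
  # Integrate dS/dt = v * X(t), with X(t) = X0 * exp(mu * t), over the step
  if (abs(mu) > 1e-12) {
    delta <- exchange_fluxes / mu * biomass * (growth_factor - 1)
  } else {
    delta <- exchange_fluxes * biomass * dt  # mu = 0: biomass stays constant
  }
  # A concentration cannot drop below zero once a substrate is exhausted
  new_conc <- pmax(concentrations + delta, 0)
  list(biomass = new_biomass, concentrations = new_conc)
}

# Growth rate implied by the first two biomass values of Simulation1
mu <- log(0.6001102 / 0.45)
step <- dfba_step(biomass = 0.45,
                  concentrations = c("D-Glucose" = 16.6),
                  mu = mu,
                  exchange_fluxes = c("D-Glucose" = -10),  # assumed uptake flux
                  dt = 1)
step$biomass
step$concentrations
```

With these inputs the step gives a biomass of 0.6001102 and a glucose concentration of about 11.3854, close to the second column of the output above.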

4.4 Constraining the model

When provided with different kind of constraints, CoRegFlux process the given information in the following order:

Gene expression is first integrated
TF KO or OV is carried out, starting with the first line in the regulator table and going along the rows.
Gene KO or OV is carried out. starting with the first line in the gene table and going along the rows. Thus order in the regulator table and gene table might play a role and potentially give different results.

The functions used by CoRegFlux to constrain the model are individually accessible, so that CoRegFlux models can be combined with other algorithms and parameters, for instance those provided by sybil. A model can be constrained iteratively by calling these functions one after another; in that case, the recommended order is: uptake constraints, gene expression, TF KO or OV, gene KO or OV.


regulator_table <- data.frame("regulator" = c("MET32","CAT8"),
                              "influence" =  c(-1.20322,-2.4),
                              "expression" = c(0,0),
                              stringsAsFactors = FALSE)

model_TF_KO_OV_constraints <- update_fluxes_constraints_influence(model= iMM904,
                                           coregnet = SC_GRN_1,
                                           regulator_table = regulator_table,
                                           aliases = aliases_SC )

sol<-sybil::optimizeProb(model_TF_KO_OV_constraints)
# Additional sybil parameters, such as the choice of algorithm, can also be
# passed to optimizeProb()

sol
# solver:                                   glpkAPI
# method:                                   simplex
# algorithm:                                fba
# number of variables:                      1577
# number of constraints:                    1226
# return value of solver:                   solution process was successful
# solution status:                          solution is optimal
# value of objective function (fba):        0.287866
# value of objective function (model):      0.287866

4.4.1 Simulate a dFBA with gene expression as a constraint

Simulation2<-Simulation(model = iMM904,
                        time = seq(1,20,by = 1),
                        metabolites = metabolites,
                        initial_biomass = 0.45,
                        aliases = aliases_SC,
                        gene_state_function = function(a,b){GeneState})
    
Simulation2$biomass_history
#  [1] 0.4500000 0.5248670 0.6121898 0.7140405 0.8328362 0.9713961 1.0679966
#  [8] 1.0996010 1.1321406 1.1656430 1.2001369 1.2356516 1.2722172 1.2729969
# [15] 1.2729969 1.2729969 1.2729969 1.2729969 1.2729969 1.2729969
Simulation2$met_concentration_history
#           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]     [,7]
# D-Glucose 16.6 14.477830 12.002591  9.115544  5.748176  1.820576  0.00000
# Ethanol    0.0  3.005114  6.510192 10.598414 15.366798 20.928504 22.88805
#               [,8]     [,9]   [,10]    [,11]    [,12]     [,13] [,14]
# D-Glucose  0.00000  0.00000  0.0000  0.00000 0.000000 0.0000000     0
# Ethanol   17.76373 15.87053 13.9213 11.91439 6.156059 0.2273263     0
#           [,15] [,16] [,17] [,18] [,19] [,20]
# D-Glucose     0     0     0     0     0     0
# Ethanol       0     0     0     0     0     0

4.4.2 Simulate a dFBA with TF knock-out (KO) while constraining the model with gene expression

If the simulated mutant has several TF KOs or OVs, CoRegFlux constrains the model following the order of the TFs in the regulator table. Although this example also constrains the model with gene expression, the simulation can be run without such constraints.

# expression = 0 simulates the knock-out of MET32
regulator_table <- data.frame("regulator" = "MET32",
                              "influence" =  -1.20322,
                              "expression" = 0,
                              stringsAsFactors = FALSE)

SimulationTFKO<-Simulation(model = iMM904,
                        time = seq(1,20,by = 1),
                        metabolites = metabolites,
                        initial_biomass = 0.45,
                        aliases = aliases_SC,
                        coregnet = SC_GRN_1,
                        regulator_table = regulator_table ,
                        gene_state_function = function(a,b){GeneState})

SimulationTFKO$biomass_history ## This KO is predicted as non-lethal 
#  [1] 0.4500000 0.5248670 0.6121898 0.7140405 0.8328362 0.8574816 0.8828564
#  [8] 0.9089820 0.9358807 0.9635755 0.9920898 1.0214478 1.0515062 1.0515062
# [15] 1.0515062 1.0515062 1.0515062 1.0515062 1.0515062 1.0515062

4.4.3 Simulate a dFBA with TF over-expression (OV) while constraining the model with gene expression

# expression = 3 simulates the over-expression of MET32
regulator_table <- data.frame("regulator" = "MET32",
                              "influence" = -1.20322,
                              "expression" = 3,
                              stringsAsFactors = FALSE)

SimulationTFOV<-Simulation(model = iMM904,
                            time = seq(1,20,by = 1),
                            metabolites = metabolites,
                            initial_biomass = 0.45,
                            aliases = aliases_SC,
                            coregnet = SC_GRN_1,
                            regulator_table = regulator_table,
                            gene_state_function = function(a,b){GeneState})

SimulationTFOV$biomass_history ## This OV is predicted as non-lethal
#  [1] 0.4500000 0.5248670 0.6121898 0.7140405 0.8328362 0.9276605 0.9551120
#  [8] 0.9833758 1.0124761 1.0424374 1.0424374 1.0424374 1.0424374 1.0424374
# [15] 1.0424374 1.0424374 1.0424374 1.0424374 1.0424374 1.0424374

4.4.4 Simulate a dFBA with gene knock-out(s) or over-expression(s) while constraining the model with gene expression

If the simulated mutant has several gene KOs or OVs, CoRegFlux constrains the model following the order of the genes in the gene table. Although this example also constrains the model with gene expression, the simulation can be run without such constraints.

# YJL026W is over-expressed (expression = 2) and YIL162W knocked out (expression = 0)
gene_table <- data.frame("gene" = c("YJL026W","YIL162W"),
                         "expression" = c(2,0),
                         stringsAsFactors = FALSE)

SimulationGeneKO_OV<-Simulation(model = iMM904,
                                time = seq(1,20,by = 1),
                                metabolites = metabolites,
                                initial_biomass = 0.45,
                                aliases = aliases_SC,
                                coregnet = SC_GRN_1,
                                gene_table = gene_table,
                                gene_state_function = function(a,b){GeneState})

SimulationGeneKO_OV$biomass_history ## This combined gene OV/KO is predicted as non-lethal
#  [1] 0.4500000 0.5248670 0.6121898 0.7140405 0.8328362 0.8574816 0.8828564
#  [8] 0.9089820 0.9358807 0.9635755 0.9920898 1.0214478 1.0516747 1.0516747
# [15] 1.0516747 1.0516747 1.0516747 1.0516747 1.0516747 1.0516747

4.4.5 Constraining the model with gene expression, TF KO or OV, and gene KO or OV to run various FBA analyses with sybil


metabolites_rates <- data.frame("name"=c("D-Glucose"),
                               "concentrations"=c(16.6),
                               "rates"=c(-2.81))

model_uptake_constraints <- adjust_constraints_to_observed_rates(model = iMM904, 
                                    metabolites_with_rates = metabolites_rates)

model_gene_constraints <- coregflux_static(model= iMM904,
                                           predicted_gene_expression = 
                                               PredictedGeneState,
                                           aliases = aliases_SC)$model

model_TF_KO_OV_constraints <- update_fluxes_constraints_influence(model= iMM904,
                                           coregnet = SC_GRN_1,
                                           regulator_table = regulator_table,
                                           aliases = aliases_SC )

model_gene_KO_OV_constraints <- update_fluxes_constraints_geneKOOV(
                                            model= iMM904,
                                            gene_table =  gene_table,
                                            aliases = aliases_SC)

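# Each constrained model above is derived independently from iMM904. To
# combine constraints, pass the model produced by each step (for
# coregflux_static(), its $model element) as the `model` argument of the
# next function, following the recommended order.
sol <- sybil::optimizeProb(model_TF_KO_OV_constraints)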

sol
# solver:                                   glpkAPI
# method:                                   simplex
# algorithm:                                fba
# number of variables:                      1577
# number of constraints:                    1226
# return value of solver:                   solution process was successful
# solution status:                          solution is optimal
# value of objective function (fba):        0.287866
# value of objective function (model):      0.287866

4.5 From observations to fluxes

Here we compute the fluxes from an observed growth rate, which can be obtained directly from the growth curves.

Assuming an observed growth rate of 0.3:

fluxes_obs <- 
  get_fba_fluxes_from_observations(iMM904,0.3)
fluxes_obs[1:10,]
#    13BGH   13BGHe     13GS     16GS  23CAPPD 2DDA7Ptm   2DHPtm 2DOXG6PP 
#  0.00000  0.00000  0.32667  0.00000  0.00000  0.00000  0.00000  0.00000 
#     2HBO    2HBt2 
#  0.00000  0.00000
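The observed growth rate used above is usually estimated from a growth curve. During exponential growth, \(\ln(OD)\) increases linearly with time and its slope is the specific growth rate \(\mu\). The minimal base-R sketch below estimates \(\mu\) by linear regression on synthetic OD readings; it assumes that all points lie in the exponential phase, that OD is proportional to biomass, and that times are in hours.

```r
# Synthetic OD readings, assumed to lie in the exponential growth phase
obs_times <- c(0, 1, 2, 3, 4)              # hours
obs_ods   <- 0.1 * exp(0.3 * obs_times)    # generated with mu = 0.3 per hour

# ln(OD) = ln(OD0) + mu * t, so the slope of the fit on log(OD) estimates mu
growth_fit <- lm(log(obs_ods) ~ obs_times)
observed_mu <- unname(coef(growth_fit)["obs_times"])
observed_mu
```

The estimate can then be passed as the observed growth rate, e.g. get_fba_fluxes_from_observations(iMM904, observed_mu).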

Because the FBA solution is generally not unique, flux variability analysis (FVA) should be used to obtain the minimum and maximum flux allowed for each reaction:

fluxes_intervals_obs <-
  get_fva_intervals_from_observations(iMM904,0.3) 
# calculating 3154 optimizations ...
# 
# |            :            |            :            | 100 %
# |===================================================| :-)
# OK
# Done.
fluxes_intervals_obs[1:10,]
#                    min          max
# 13BGH     0.000000e+00 2.962010e-05
# 13BGHe    0.000000e+00 0.000000e+00
# 13GS      3.266692e-01 3.266988e-01
# 16GS      0.000000e+00 0.000000e+00
# 23CAPPD   0.000000e+00 0.000000e+00
# 2DDA7Ptm -7.609314e-02 0.000000e+00
# 2DHPtm    0.000000e+00 5.558111e-06
# 2DOXG6PP  0.000000e+00 0.000000e+00
# 2HBO     -1.880648e-05 0.000000e+00
# 2HBt2    -1.880648e-05 0.000000e+00

Note that neither method guarantees that the observed growth rate is reached: here, the biomass flux only reaches about 0.288 instead of 0.3.

fluxes_obs[get_biomass_flux_position(iMM904),]
# BIOMASS_SC5_notrace 
#           0.2878657
fluxes_intervals_obs[get_biomass_flux_position(iMM904),]
#       min       max 
# 0.2878650 0.2878657

This may indicate that the uptake rate of the limiting substrate (most commonly the glucose uptake rate) does not allow higher growth.

To constrain the model with the substrate uptake rates, also provide the metabolites_rates argument:

metabolites_rates <- data.frame("name"=c("D-Glucose","Ethanol"),
                               "rates"=c(-10,-1))
fluxes_obs <- 
  get_fba_fluxes_from_observations(
    model = iMM904,
    observed_growth_rate =  0.3,
    metabolites_rates = metabolites_rates) 
# Joining by: metabolites_id

fluxes_obs[get_biomass_flux_position(iMM904),]
# BIOMASS_SC5_notrace 
#           0.2878657

fluxes_interval_obs <- 
  get_fva_intervals_from_observations(
    model = iMM904,
    observed_growth_rate =0.3,
    metabolites_rates = metabolites_rates) 
# Joining by: metabolites_id
# calculating 3154 optimizations ...
# 
# |            :            |            :            | 100 %
# |===================================================| :-)
# OK
# Done.
fluxes_interval_obs[get_biomass_flux_position(iMM904),]
#       min       max 
# 0.2878650 0.2878657

4.5.1 Adjusting the flux bounds based on observed growth rates and visualizing the effects on metabolic genes

During this step, R.cache may ask you to choose where the cached files should be saved. Since these files are temporary, you can create a dedicated folder in your working directory and remove it afterwards, or pick a location near the installation folder of the R.cache package.


FBA_bounds_from_growthrate<- get_fba_fluxes_from_observations(
    model = iMM904,observed_growth_rate = 0.3,
    metabolites_rates = metabolites_rates)

FVA_bounds_from_growthrate<- get_fva_intervals_from_observations(
    model = iMM904,observed_growth_rate = 0.3,
    metabolites_rates = metabolites_rates)
# 
# |            :            |            :            | 100 %
# |===================================================| :-)

ODs<-seq.int(0.099,1.8,length.out = 5)
times = seq(0.5,2,by=0.5)

ODcurveToMetCurve<- ODCurveToMetabolicGeneCurves(times = times,
                             ODs = ODs,
                             model = iMM904,
                             aliases = aliases_SC,
                             metabolites_rates = metabolites_rates) 
# 
# |            :            |            :            | 100 %
# |===================================================| :-) 
# 
# |            :            |            :            | 100 %
# |===================================================| :-) 
# 
# |            :            |            :            | 100 %
# |===================================================| :-)

visMetabolicGeneCurves(ODcurveToMetCurve,genes = "YJR077C")


ODtoflux<-ODCurveToFluxCurves(model = iMM904,
                              ODs = ODs,
                              times = times,
                              metabolites_rates = metabolites_rates)
# 
# |            :            |            :            | 100 %
# |===================================================| :-) 
# 
# |            :            |            :            | 100 %
# |===================================================| :-) 
# 
# |            :            |            :            | 100 %
# |===================================================| :-)

visFluxCurves(ODtoflux, genes ="ADK3")

4.6 Calibration: identifying the softplus parameter using Bayesian optimization

To translate gene expression into fluxes of the GEM, CoRegFlux uses the softplus function, \(\mathrm{softplus}(x) = \ln\left(1 + e^{x}\right)\), which involves the softplus parameter \(\theta\), applied to all fluxes, and \(gpr_{i}\left(X\right)\), the result of evaluating the gene-protein-reaction rules of reaction \(i\) for a set of expression levels \(X\) of the metabolic genes. These rules relate genes to reactions and are expressed in logical form. CoRegFlux transforms them as follows:

AND is substituted by MIN()
OR is substituted by SUM()
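The two substitution rules and the softplus function can be sketched in base R as follows. eval_gpr() and softplus() are hypothetical helpers written for illustration: they show the AND to MIN / OR to SUM evaluation of a rule string and a numerically stable softplus, but not how CoRegFlux combines \(\theta\) with the GPR value internally.

```r
# Evaluate a GPR rule on a named vector of gene expression levels,
# replacing AND by min() and OR by a sum, as described above.
# Assumes gene IDs are syntactic R names (e.g. no hyphens).
eval_gpr <- function(rule, gene_levels) {
  if (!nzchar(trimws(rule))) return(NA_real_)  # reaction without a GPR rule
  # Rewrite the logical keywords as R operators so the rule can be parsed
  r <- gsub("\\band\\b", "&", rule, perl = TRUE)
  r <- gsub("\\bor\\b", "|", r, perl = TRUE)
  walk <- function(e) {
    if (is.symbol(e)) return(unname(gene_levels[[as.character(e)]]))
    op <- as.character(e[[1]])
    if (op == "(") return(walk(e[[2]]))
    left <- walk(e[[2]])
    right <- walk(e[[3]])
    switch(op,
           "&" = min(left, right),  # AND -> MIN
           "|" = left + right,      # OR  -> SUM
           stop("unexpected operator in GPR rule: ", op))
  }
  walk(parse(text = r)[[1]])
}

# Numerically stable softplus: ln(1 + e^x) = max(x, 0) + ln(1 + e^-|x|)
softplus <- function(x) pmax(x, 0) + log1p(exp(-abs(x)))

gene_levels_demo <- c(YDR261C = 1, YOR190W = 2, YLR300W = 3, YGR282C = 0.5)
eval_gpr("(YDR261C or YOR190W or YLR300W)", gene_levels_demo)   # 1 + 2 + 3 = 6
eval_gpr("(YDR261C and YOR190W) or YGR282C", gene_levels_demo)  # min(1, 2) + 0.5 = 1.5
softplus(0)                                                     # log(2)
```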

Given a known growth rate and the gene expression predicted with predict_linear_model_influence, users can adjust the softplus parameter \(\theta\) to calibrate the integration of gene expression into the GEM. This step requires the rBayesianOptimization package.

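library(rBayesianOptimization)
library(sybil)
gRates <- 0.1   # observed growth rate used as calibration target

# Objective: constrain the model with softplus parameter p, solve the FBA
# and score how close the predicted growth rate is to the observed one
opF <- function(p){
    CoRegFlux_model <- coregflux_static(model = model_uptake_constraints,
                                        gene_parameter = p,
                                        predicted_gene_expression =
                                            PredictedGeneState)
    ts <- optimizeProb(CoRegFlux_model$model)
    list(Score = -1*log(abs(lp_obj(ts) - gRates)/gRates), Pred = 0)
}

# Initial design: a grid of p values, followed by 10 Bayesian iterations
result<-BayesianOptimization(FUN = opF,
                             bounds = list(p = c(-10,10)),
                             init_grid_dt = data.frame(p = seq(-10,10,by = 0.5)),
                             n_iter = 10,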
                             verbose = TRUE)
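The score returned by opF is minus the log of the relative error between the predicted and observed growth rates, so it grows as the prediction approaches gRates. The base-R sketch below makes this explicit and, as a simpler stand-in for Bayesian optimization, scans a grid of \(\theta\) values exhaustively (plain grid search). predict_growth() is a hypothetical, made-up linear response used only so the example runs; in practice it would wrap coregflux_static() and optimizeProb() as opF does.

```r
# Score used by opF: minus the log of the relative error between the
# predicted and the observed growth rates (higher is better)
calibration_score <- function(predicted, observed) {
  -log(abs(predicted - observed) / observed)
}

# Hypothetical stand-in for "constrain the model with parameter p and return
# the optimal growth rate"; a made-up linear response, not CoRegFlux
predict_growth <- function(p) 0.05 + 0.01 * p

observed_rate <- 0.1
theta_grid <- seq(-10, 10, by = 0.5)
scores <- vapply(theta_grid,
                 function(p) calibration_score(predict_growth(p), observed_rate),
                 numeric(1))
# Grid search: keep the parameter value with the highest score
best_theta <- theta_grid[which.max(scores)]
best_theta
```

A prediction that matches the observed rate exactly would give an infinite score, which which.max() still treats as the maximum. Grid search is easy to reason about but needs one FBA per grid point, which is why the Bayesian approach above is preferable for expensive models.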

CoRegFlux

Pauline Trébulle, Daniel Trejo-Banos, Mohamed Elati

December 2018

1 Installation

2 Introduction

3 Data requirement