Chemical and Pharmaceutical Engineering, Singapore-MIT Alliance, Singapore, 117576, Singapore

Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA

Institute for Chemical and Bioengineering, ETH Zürich, 8093, Zürich, Switzerland

Abstract

Background

An efficient and reliable parameter estimation method is essential for the creation of biological models using ordinary differential equations (ODEs). Most existing estimation methods involve finding the global minimum of data fitting residuals over the entire parameter space simultaneously. Unfortunately, the associated computational requirement often becomes prohibitively high due to the large number of parameters and the lack of complete parameter identifiability (i.e. not all parameters can be uniquely identified).

Results

In this work, an incremental approach was applied to the parameter estimation of ODE models from concentration time profiles. In particular, the method was developed to address a commonly encountered circumstance in the modeling of metabolic networks, where the number of metabolic fluxes (reaction rates) exceeds that of metabolites (chemical species). Here, the minimization of model residuals was performed over a subset of the parameter space that is associated with the degrees of freedom in the dynamic flux estimation from the concentration time-slopes. The efficacy of this method was demonstrated using two generalized mass action (GMA) models, where the method significantly outperformed single-step estimations. In addition, an extension of the estimation method to handle missing data is presented.

Conclusions

The proposed incremental estimation method is able to address the lack of complete parameter identifiability and to significantly reduce the computational effort in estimating model parameters, which will facilitate kinetic modeling of genome-scale cellular metabolism in the future.

Background

The estimation of unknown kinetic parameters from time-series measurements of biological molecules is a major bottleneck in the ODE model building process in systems biology and metabolic engineering.

Here, we consider the modeling of cellular metabolism using the canonical power-law formalism, specifically the generalized mass action (GMA) systems.

The integration of ODEs often constitutes a major part of the computational cost in parameter estimation, especially when the ODE model is stiff.

As noted above, the new parameter estimation method in this work is built on the concept of incremental identification.

Methods

The generalized mass action model of cellular metabolism describes the mass balance of metabolites, taking into account all metabolic influxes and effluxes and their stoichiometric ratios, as follows:

$$\dot{\mathbf{X}}(t,\mathbf{p}) = \mathbf{S}\,\mathbf{v}(\mathbf{X},\mathbf{p}) \qquad (1)$$

where $\mathbf{X}(t,\mathbf{p})$ is the vector of metabolic concentration time profiles, $\mathbf{S} \in \mathbb{R}^{m \times n}$ is the stoichiometric matrix for $m$ metabolites participating in $n$ reactions, and $\mathbf{v}(\mathbf{X},\mathbf{p})$ denotes the vector of metabolic fluxes (i.e. reaction rates). Here, each flux is described by a power-law equation:

$$v_j(\mathbf{X},\mathbf{p}) = \gamma_j \prod_{i} X_i^{f_{ji}} \qquad (2)$$

where $\gamma_j$ is the rate constant of the $j$-th flux and $f_{ji}$ is the kinetic order parameter, representing the influence of metabolite $X_i$ on the $j$-th flux (positive: $X_i$ is an activating factor or a substrate, negative: $X_i$ is an inhibiting factor). In incremental parameter identification, a data pre-processing step (e.g. smoothing or filtering) is usually applied to the noisy time-course concentration data $\mathbf{X}_m(t_k)$, in order to improve the time-slope estimates $\dot{\mathbf{X}}_m(t_k)$. Subsequently, the dynamic metabolic fluxes $\mathbf{v}(t_k)$ are estimated from Equation (1) by substituting $\dot{\mathbf{X}}(t_k)$ with $\dot{\mathbf{X}}_m(t_k)$. Finally, the kinetic parameters ($\gamma_j$ and $f_{ji}$'s) can be calculated using a least square regression of the power law flux function in Equation (2) against the estimated $v_j(t_k)$. Note that for GMA models, the least square parameter regressions in the last step are linear in the logarithmic scale and thus, can be performed very efficiently.
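The log-linear regression noted above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the helper names `solve_linear` and `fit_power_law`, the symbols `gamma` and `f`, and the synthetic noise-free data are all hypothetical. Taking logarithms of a power-law flux gives log v = log(gamma) + sum_i f_i log X_i, which is an ordinary linear least-squares problem in log(gamma) and the kinetic orders.

```python
import math

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit_power_law(X, v):
    """Fit v = gamma * prod_i X_i**f_i by least squares in log space.

    X: list of K concentration vectors; v: list of K positive flux values.
    Returns (gamma, [f_1, ..., f_m]).
    """
    rows = [[1.0] + [math.log(x) for x in xk] for xk in X]
    y = [math.log(vk) for vk in v]
    p = len(rows[0])
    # Normal equations (A^T A) beta = A^T y for the log-linear model.
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Aty = [sum(r[i] * yk for r, yk in zip(rows, y)) for i in range(p)]
    beta = solve_linear(AtA, Aty)
    return math.exp(beta[0]), beta[1:]

# Synthetic, noise-free flux data generated from assumed "true" parameters.
X = [[0.5, 2.0], [1.0, 1.5], [2.0, 1.0], [3.0, 0.7], [4.0, 0.4]]
true_gamma, true_f = 2.0, [0.5, -0.8]
v = [true_gamma * xk[0] ** true_f[0] * xk[1] ** true_f[1] for xk in X]

gamma, f = fit_power_law(X, v)
```

With noise-free fluxes the regression recovers the generating parameters to numerical precision; with noisy flux estimates, the log transform also changes the error weighting, which is a known trade-off of this shortcut.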

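A unique set of dynamic flux values $\mathbf{v}(t_k)$ can only be computed from $\dot{\mathbf{X}}_m(t_k)$ when the number of fluxes does not exceed the number of metabolites. When fluxes outnumber metabolites ($n > m$), there exist infinitely many $\mathbf{v}(t_k)$ that satisfy $\dot{\mathbf{X}}_m(t_k) = \mathbf{S}\mathbf{v}(t_k)$, and the dimension of this solution set is given by the degrees of freedom (DOF), $n_{DOF} = n - m$ (assuming that $\mathbf{S}$ has a full row rank, i.e. there is no redundant ODE in Equation (1)). The positive DOF means that the values of $n_{DOF}$ selected fluxes can be independently set, from which the remaining fluxes can be computed. This relationship forms the basis of the proposed estimation method, in which the model goodness of fit to data is optimized by adjusting only a subset of parameters associated with the independent fluxes above.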
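As a hedged illustration of the degree-of-freedom count, the sketch below computes the number of fluxes minus the rank of the stoichiometric matrix for a small network. The 4 × 6 matrix is a hypothetical branched topology invented for this example (it is not the model studied in this work), and `matrix_rank` is an illustrative helper.

```python
def matrix_rank(A, tol=1e-10):
    """Rank of a dense matrix via Gaussian elimination with partial pivoting."""
    M = [[float(x) for x in row] for row in A]
    n_rows, n_cols = len(M), len(M[0])
    rank = 0
    for c in range(n_cols):
        if rank == n_rows:
            break
        piv = max(range(rank, n_rows), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) < tol:
            continue  # no usable pivot in this column
        M[rank], M[piv] = M[piv], M[rank]
        for r in range(rank + 1, n_rows):
            factor = M[r][c] / M[rank][c]
            for k in range(c, n_cols):
                M[r][k] -= factor * M[rank][k]
        rank += 1
    return rank

# Hypothetical network: rows are metabolites X1..X4, columns are fluxes v1..v6.
S = [
    [1, -1, -1,  0,  0,  0],   # dX1/dt = v1 - v2 - v3
    [0,  1,  0, -1,  0,  0],   # dX2/dt = v2 - v4
    [0,  0,  1,  0, -1,  0],   # dX3/dt = v3 - v5
    [0,  0,  0,  1,  1, -1],   # dX4/dt = v4 + v5 - v6
]

n_fluxes = len(S[0])
n_dof = n_fluxes - matrix_rank(S)   # number of fluxes that can be set freely
```

Using the rank rather than the row count keeps the count correct even when the mass balances contain a redundant equation.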

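Specifically, we start by decomposing the fluxes into two groups: $\mathbf{v}(t_k) = [\,\mathbf{v}_I(t_k)^T \;\; \mathbf{v}_D(t_k)^T\,]^T$, where the subscripts $I$ and $D$ denote the independent and dependent subsets, respectively. The parameter vector $\mathbf{p}$ and the stoichiometric matrix $\mathbf{S}$ can be structured correspondingly as $\mathbf{p} = [\,\mathbf{p}_I \;\; \mathbf{p}_D\,]$ and $\mathbf{S} = [\,\mathbf{S}_I \;\; \mathbf{S}_D\,]$. The relationship between the independent and dependent fluxes can be formulated by rearranging $\dot{\mathbf{X}}_m(t_k) = \mathbf{S}\mathbf{v}(t_k)$ into

$$\mathbf{v}_D(t_k) = \mathbf{S}_D^{-1}\left(\dot{\mathbf{X}}_m(t_k) - \mathbf{S}_I\,\mathbf{v}_I(t_k)\right) \qquad (3)$$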
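The rearrangement described above amounts to solving a small linear system for the dependent fluxes at each time point, once the slopes and the independent fluxes are known. A minimal sketch follows, assuming a hypothetical three-flux chain (v1 feeds X1, v2 converts X1 to X2, v3 drains X2) with v1 chosen as the independent flux; the helper names and numbers are illustrative only.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def dependent_fluxes(S_I, S_D, xdot, v_I):
    """Solve S_D v_D = xdot - S_I v_I for the dependent fluxes at one time point."""
    rhs = [xd - sum(s * v for s, v in zip(row, v_I)) for xd, row in zip(xdot, S_I)]
    return solve_linear(S_D, rhs)

# Chain: dX1/dt = v1 - v2, dX2/dt = v2 - v3, with v1 independent.
S_I = [[1.0], [0.0]]                  # column of S for v1
S_D = [[-1.0, 0.0], [1.0, -1.0]]      # columns of S for v2 and v3 (invertible)
xdot = [0.5, 0.5]                     # slopes consistent with v = (2.0, 1.5, 1.0)

v_D = dependent_fluxes(S_I, S_D, xdot, [2.0])
```

Solving the linear system directly avoids forming the explicit inverse of the dependent block, which is usually the more stable choice numerically.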

In this case, given $\mathbf{p}_I$, one can compute the independent fluxes $\mathbf{v}_I(\mathbf{X}_m(t_k),\mathbf{p}_I)$ using the concentration data $\mathbf{X}_m(t_k)$, and subsequently obtain $\mathbf{v}_D(t_k)$ from Equation (3). Finally, $\mathbf{p}_D$ can be estimated by a simple least square fitting of $\mathbf{v}_D(\mathbf{X}_m(t_k),\mathbf{p}_D)$ to the computed $\mathbf{v}_D(t_k)$, one flux at a time, provided that there are more time points than the number of parameters in each flux.

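In this study, two formulations of the parameter estimation of ODE models in Equation (1) are investigated, involving the minimization of concentration and slope errors. The objective function for the concentration error over the $K$ measurement time points is given by

$$\Phi_C(\mathbf{p},\mathbf{X}_m) = \frac{1}{mK}\sum_{k=1}^{K}\left(\mathbf{X}_m(t_k) - \mathbf{X}(t_k,\mathbf{p})\right)^T\left(\mathbf{X}_m(t_k) - \mathbf{X}(t_k,\mathbf{p})\right) \qquad (4)$$

and that for the slope error is given by

$$\Phi_S(\mathbf{p},\mathbf{X}_m) = \frac{1}{mK}\sum_{k=1}^{K}\left(\dot{\mathbf{X}}_m(t_k) - \mathbf{S}\mathbf{v}(\mathbf{X}_m(t_k),\mathbf{p})\right)^T\left(\dot{\mathbf{X}}_m(t_k) - \mathbf{S}\mathbf{v}(\mathbf{X}_m(t_k),\mathbf{p})\right) \qquad (5)$$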

where $\mathbf{X}(t_k,\mathbf{p})$ is the concentration prediction (i.e. the solution to the ODE model in Equation (1)). The flowchart below summarizes the incremental parameter estimation. Evaluating Φ_{C} requires an integration of the ODE model and thus, the estimation using this objective function is expected to be computationally costlier than that using Φ_{S}. On the other hand, metabolic mass balance is only approximately satisfied at discrete time points $t_k$ during the parameter estimation using Φ_{S}, as the ODE model is not integrated.
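The slope-error variant of the incremental scheme can be sketched end to end on a toy problem. Everything in the example is an assumption made for illustration: the two-metabolite chain (dX1/dt = v1 - v2, dX2/dt = v2 - v3), the parameter values, the noise-free analytical slopes, the mean-squared form of the slope error, and the plain grid search that stands in for the global optimizer used in this study. For each candidate value of the single independent parameter, the dependent fluxes follow from the mass balances, their parameters are obtained by log-linear regression, and the slope error is then evaluated.

```python
import math

def loglinear_fit(x, v):
    """Fit v = gamma * x**f by simple linear regression of log v on log x."""
    lx = [math.log(a) for a in x]
    lv = [math.log(b) for b in v]
    mx, mv = sum(lx) / len(lx), sum(lv) / len(lv)
    f = sum((a - mx) * (b - mv) for a, b in zip(lx, lv)) / sum((a - mx) ** 2 for a in lx)
    return math.exp(mv - f * mx), f

def phi_s(gamma1, X1, X2, dX1, dX2):
    """Mean squared slope error for a candidate independent parameter gamma1."""
    v1 = [gamma1] * len(X1)                    # independent flux: v1 = gamma1
    v2 = [a - d for a, d in zip(v1, dX1)]      # from dX1/dt = v1 - v2
    v3 = [b - d for b, d in zip(v2, dX2)]      # from dX2/dt = v2 - v3
    if min(v2) <= 0 or min(v3) <= 0:
        return math.inf, None                  # power-law fit needs positive fluxes
    g2, f2 = loglinear_fit(X1, v2)             # dependent parameters of v2
    g3, f3 = loglinear_fit(X2, v3)             # dependent parameters of v3
    err = 0.0
    for x1, x2, d1, d2 in zip(X1, X2, dX1, dX2):
        m2, m3 = g2 * x1 ** f2, g3 * x2 ** f3
        err += (d1 - (gamma1 - m2)) ** 2 + (d2 - (m2 - m3)) ** 2
    return err / (2 * len(X1)), (g2, f2, g3, f3)

# Assumed "true" model: v1 = 2.0, v2 = 1.5 * X1**0.5, v3 = 1.0 * X2**0.8.
X1 = [0.5, 1.0, 2.0, 3.0, 4.0]
X2 = [0.3, 0.8, 1.5, 2.5, 3.5]
dX1 = [2.0 - 1.5 * a ** 0.5 for a in X1]
dX2 = [1.5 * a ** 0.5 - b ** 0.8 for a, b in zip(X1, X2)]

# Outer search over the independent parameter only (grid search as a stand-in).
candidates = [i / 10 for i in range(5, 41)]
best_gamma1 = min(candidates, key=lambda g: phi_s(g, X1, X2, dX1, dX2)[0])
best_phi, best_dependent = phi_s(best_gamma1, X1, X2, dX1, dX2)
```

The outer search here is one-dimensional even though the toy model has five parameters, which is the source of the reduction in search space that the method relies on.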

Flowchart of the incremental parameter estimation


There are several important practical considerations in the implementation of the proposed method. The first concerns the selection of the independent fluxes. Here, the set of these fluxes is selected such that (i) $\mathbf{S}_D$ is invertible, (ii) the total number of independent parameters $\mathbf{p}_I$ is small, and (iii) the prior knowledge of the corresponding $\mathbf{p}_I$ is maximized. The last two aspects should lead to a reduction in the parameter search space and the cost of finding the global optimal solution of the minimization problem in the flowchart above.

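So far, we have assumed that the time-course concentration data are available for all metabolites. However, the method above can be modified to accommodate more general circumstances, in which data for one or several metabolites are missing. In this case, the ODE model is first rewritten to separate the mass balances associated with measured and unmeasured metabolites, such that

$$\begin{bmatrix}\dot{\mathbf{X}}_M \\ \dot{\mathbf{X}}_U\end{bmatrix} = \begin{bmatrix}\mathbf{S}_M \\ \mathbf{S}_U\end{bmatrix}\mathbf{v}(\mathbf{X}_M,\mathbf{X}_U,\mathbf{p}) \qquad (6)$$

where the subscripts $M$ and $U$ denote the measured and unmeasured metabolites, respectively. Partitioning the fluxes into $\mathbf{v}_I$ and $\mathbf{v}_D$ as above, the following relationship still applies for the measured metabolites:

$$\mathbf{v}_D(t_k) = \mathbf{S}_{D,M}^{-1}\left(\dot{\mathbf{X}}_M(t_k) - \mathbf{S}_{I,M}\,\mathbf{v}_I(t_k)\right) \qquad (7)$$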

Naturally, the degree of freedom associated with the dynamic flux estimation is higher than before by the number of components in $\mathbf{X}_U$. The flowchart below outlines the modified procedure, in which $\mathbf{X}_M$ is set as an external variable, whose time-profiles are interpolated from the measured concentrations. The set of independent fluxes $\mathbf{v}_I$ is now selected to include all fluxes that appear in $\dot{\mathbf{X}}_U$, together with any additional fluxes needed so that the dependent fluxes can be determined from $\mathbf{S}_{D,M}$. If $\mathbf{S}_{D,M}$ is a non-square matrix, then a pseudo-inverse is used in Equation (7). Of course, the same considerations mentioned above are equally relevant in this case. Note that the initial conditions of $\mathbf{X}_U$ will also need to be estimated.

Flowchart of the incremental parameter estimation when metabolites are not completely measured


Results

Two case studies: a generic branched pathway
_{C} = 10^{3} for the branched pathway and 10^{5} for the glycolytic pathway). Alternatively, one could also set a maximum allowable integration time and penalize the associated parameter values upon violation, as described above. In this study, the optimization problems were solved in MATLAB using the publicly available eSSM GO (Enhanced Scatter Search Method for Global Optimization) toolbox, a population-based metaheuristic global optimization method incorporating probabilistic and deterministic strategies.

**Incremental Estimation Code.** Additional file


A generic branched pathway

The generic branched pathway in this example consists of four metabolites and six fluxes, describing the transformations among the metabolites (double-line arrows), with feedback activation and inhibition (dashed arrows with plus or minus signs, respectively), as shown in Figure
^{2}

A generic branched pathway. (A) Metabolic pathway map and (B) the GMA model equations


Here, $v_1$ and $v_6$ were chosen as the independent fluxes, as they comprise the fewest kinetic parameters and lead to an invertible $\mathbf{S}_D$. The two rate constants and two kinetic orders were constrained to within [0, 25] and [0, 2], respectively. In addition, all the reactions are assumed to be irreversible.

Table
_{C} in less than 96 seconds on average with good concentration fit and parameter accuracy (see Figure
_{S}, the simultaneous estimation of parameters could be completed in roughly 10 minutes, but this was much slower than the incremental estimation using Φ_{C}. In this case, the incremental method was able to converge in under 2 seconds, or over 250 times faster. The goodness of fit to concentration data and the accuracy of parameter estimates were comparable for all three completed estimations (see Figure

**Supplementary Tables.** Additional file 2 contains the parameter estimation results of the branched pathway model using noise-free data and analytical slopes, the parameter estimates of the two case studies, and the parameter estimation results of five repeated runs.


| | **Simultaneous method, min Φ**_{C} | **Simultaneous method, min Φ**_{S} | **Incremental method, min Φ**_{C} | **Incremental method, min Φ**_{S} |
|---|---|---|---|---|
| CPU time (sec) ^{a} | 56.00 h | 620.81 ± 64.30 | 95.95 ± 11.09 | 1.56 ± 0.19 |
| eSSM GO iterations | 323 | 4390 ± 391 | 14 ± 4 | 10 ± 2 |
| Parameter error (%) | 49.10 | 36.91 ± 1.09 | 21.56 ± 7.57 × 10^{-2} | 36.85 ± 6.48 × 10^{-3} |
| Φ_{C} | <u>4.54 × 10^{…}</u> | 6.54 × 10^{-3} ± 5.20 × 10^{-5} | <u>4.03 × 10^{…}</u> | 6.00 × 10^{-3} ± 5.05 × 10^{-7} |
| Φ_{S} | 7.01 × 10^{-2} | <u>2.72 × 10^{…}</u> | 3.92 × 10^{-2} ± 9.86 × 10^{-6} | <u>2.76 × 10^{…}</u> |

a. CPU time was based on a workstation with dual Intel Quad-Core 2.83 GHz processors.

b. Only one out of five runs completed with a relative improvement of the objective function below 1% between iterations. The rest did not converge within the 5-day time limit after iterating for 583, 989, 777, and 661 times. The corresponding Φ_{C} at termination were 4.85 × 10^{-2}, 1.39 × 10^{-2}, 1.75 × 10^{-2} and 3.75 × 10^{-2}, respectively.

c. Mean value ± standard deviation out of five repeats.

d. Root mean square error of model predictions, where the underlined value refers to the objective function of the minimization.

**Simultaneous and incremental estimation of the branched pathway using in silico noise-free data (×).** (**A**) concentration predictions using parameter estimates from incremental method by Φ_{C} minimization (–––); (**B**) concentration predictions using parameter estimates from simultaneous method (○) and proposed method (---) by Φ_{S} minimization.

Table
_{C} encountered stiffness problems, and three out of five runs did not finish within the five-day time limit. The incremental approach using either objective function offered a significant reduction in computational time over the simultaneous estimation using Φ_{S}, while providing comparable parameter accuracy and concentration and slope fits (see Figure

| | **Simultaneous method, min Φ**_{C} (two completed runs) | **Simultaneous method, min Φ**_{S} | **Incremental method, min Φ**_{C} | **Incremental method, min Φ**_{S} |
|---|---|---|---|---|
| CPU time (sec) | 17.86 h; 44.63 h | 534.83 ± 22.12 | 71.88 ± 6.33 | 1.17 ± 0.12 |
| eSSM GO iterations | 254; 426 | 3494 ± 348 | 12 ± 2 | 10 ± 3 |
| Parameter error (%) | 75.42; 34.98 | 54.36 ± 4.47 | 75.77 ± 6.11 × 10^{-3} | 51.15 ± 1.38 × 10^{-3} |
| Φ_{C} | <u>3.62 × 10^{…}</u>; <u>3.27 × 10^{…}</u> | 6.06 × 10^{-2} ± 1.14 × 10^{-3} | <u>3.52 × 10^{…}</u> | 4.76 × 10^{-2} ± 3.81 × 10^{-7} |
| Φ_{S} | 2.06 × 10^{-1}; 1.60 × 10^{-1} | <u>1.34 × 10^{…}</u> | 1.64 × 10^{-1} ± 2.23 × 10^{-5} | <u>1.38 × 10^{…}</u> |

a. Two out of five runs completed with a relative improvement of the objective function below 1% between iterations. The rest did not converge within the 5-day time limit after iterating for 805, 699, and 568 times. The corresponding Φ_{C} at termination were 4.08 × 10^{-2}, 5.05 × 10^{-2} and 6.25 × 10^{-2}, respectively.

**Simultaneous and incremental estimation of the branched pathway using in silico noisy data (×).** (**A**) concentration predictions using parameter estimates from incremental method by Φ_{C} minimization (–––); (**B**) concentration predictions using parameter estimates from simultaneous method (○) and proposed method (---) by Φ_{S} minimization.

Finally, the estimation strategy for incompletely measured metabolites, described in the corresponding flowchart above, was applied to the branched pathway for the case in which the $X_3$ data were missing. Fluxes $v_3$ and $v_4$, which appear in $\dot{X}_3$, were chosen as independent fluxes, and $v_1$ was also added to the set such that the dependent fluxes can be uniquely determined from Equation (7). In addition to the parameters associated with the aforementioned fluxes, the initial condition $X_3(t_0)$ was also estimated. The bounds for the rate constants and kinetic orders were kept the same as above, while the initial concentration was bounded within [0, 5].

Table
_{C} simultaneous optimization were again prematurely terminated after 5 days. Meanwhile, the rest of the estimations could provide reasonably good data fitting with the exception of fitting to _{
3
} data as expected (see Figure
**X**
_{
U
} and the larger number of independent parameters. The detailed values of the parameter estimates in this case study can be found in the Additional file

| | **Simultaneous method, min Φ**_{C} | **Simultaneous method, min Φ**_{S} | **Incremental method, min Φ**_{C} | **Incremental method, min Φ**_{S} |
|---|---|---|---|---|
| CPU time (sec) | 85.03 h | 4002.01 ± 696.11 | 1404.22 ± 120.71 | 445.47 ± 35.94 |
| eSSM GO iterations | 308 | 365 ± 91 | 67 ± 10 | 48 ± 10 |
| Parameter error (%) | 71.90 | 43.50 ± 2.34 | 68.85 ± 4.57 | 40.47 ± 0.59 |
| Φ_{C} | <u>4.54 × 10^{…}</u> | 6.46 × 10^{-3} ± 4.08 × 10^{-4} | <u>3.38 × 10^{…}</u> | 5.94 × 10^{-3} ± 3.23 × 10^{-5} |
| Φ_{S} | 1.03 | <u>2.99 × 10^{…}</u> | 8.32 × 10^{-2} ± 4.04 × 10^{-3} | <u>2.94 × 10^{…}</u> |

a. Only one out of five runs completed with a relative improvement of the objective function below 1% between iterations. The rest did not converge within the 5-day time limit after iterating for 471, 435, 863 and 786 times. The corresponding Φ_{C} at termination were 4.99 × 10^{-2}, 4.92 × 10^{-2}, 1.17 × 10^{-2} and 1.57 × 10^{-2}, respectively.

**Simultaneous and incremental estimation of the branched pathway with missing X_{3}: in silico noise-free data (×).** (**A**) concentration predictions using parameter estimates from incremental method by Φ_{C} minimization (---); (**B**) concentration predictions using parameter estimates from simultaneous method (○) and proposed method (–––) by Φ_{S} minimization.

The glycolytic pathway in L. lactis

The second case study was taken from the GMA modeling of the glycolytic pathway in L. lactis. The model comprises six metabolites: glucose 6-phosphate (G6P) – $X_1$, fructose 1,6-bisphosphate (FBP) – $X_2$, 3-phosphoglycerate (3-PGA) – $X_3$, phosphoenolpyruvate (PEP) – $X_4$, pyruvate – $X_5$ and lactate – $X_6$, and nine metabolic fluxes. In addition, external glucose (Glu), ATP and Pi are treated as off-line variables, whose values were interpolated from measurement data. The pathway connectivity is given in the figure below.

**L. lactis glycolytic pathway.** (**A**) Metabolic pathway map (double-lined arrows: flow of material; dashed arrows with plus or minus signs: activation or inhibition, respectively) and (**B**) the GMA model equations

The time-course concentration dataset of all metabolites were measured using _{
6
}, were directly used for the concentration slope calculation in this case study. In the case of _{
6
}, a saturating Hill-type equation: _{
1
}
^{
n
} / (_{
2
} ^{
n
}) where _{
1
}, _{
2
},

**Incremental estimation of the L. lactis model.**

Fluxes $v_4$, $v_7$ and $v_9$ were selected as the DOF, again to give the least number of $\mathbf{p}_I$ and to ensure that $\mathbf{S}_D$ is invertible. All rate constants were constrained to within [0, 50], while the independent and dependent kinetic orders were allowed within [0, 5] and [-5, 5], respectively. The bounds for the independent and dependent kinetic orders were deliberately set differently to simulate a scenario where the signs of the independent kinetic orders were known.

Table
_{C} and Φ_{S}. The values of the parameter estimates are given in the Additional file
_{C} simultaneous minimization converged within the five-day time limit, even after relaxing the convergence criteria of the objective function to 1%. On the other hand, the incremental estimation using Φ_{C} was not only able to converge, but was also faster than the simultaneous estimation of Φ_{S} that did not require any ODE integration. The incremental estimation using Φ_{C} was able to provide parameters with the best overall concentration fit (see Figure
_{S} does not guarantee that the resulting ODE is numerically solvable, as was the case of simultaneous estimation, due to numerical stiffness. But the incremental parameter estimation from minimizing Φ_{S} can produce solvable ODEs with good concentration and slope fits.

| | **Simultaneous method, min Φ**_{C} | **Simultaneous method, min Φ**_{S} | **Incremental method, min Φ**_{C} | **Incremental method, min Φ**_{S} |
|---|---|---|---|---|
| CPU time (sec) | >5 days | 3476.89 ± 349.63 | 976.72 ± 31.01 | 20.82 ± 2.71 |
| eSSM GO iterations | — | 1662 ± 282 | 4 ± 1 | 33 ± 7 |
| Φ_{C} | — | Stiff ODE | <u>2.20</u> | 6.18 ± 7.28 × 10^{-2} |
| Φ_{S} | — | <u>2.67</u> | 1.51 × 10^{3} ± 52.50 | <u>5.79</u> |

a. None of five runs finished with a relative improvement of the objective function below 1% within the 5-day time limit, after iterating for 60, 147, 93, 79 and 31 times. The corresponding Φ_{C} at termination were 9.31, 7.57, 8.77, 9.39 and 12.9, respectively.

Discussion

In this study, an incremental strategy is used to develop a computationally efficient method for the parameter estimation of ODE models. Unlike most commonly used methods, where the parameter estimation is performed to minimize model residuals over the entire parameter space simultaneously, here the estimation is done in two incremental steps, involving the estimation of dynamic reaction rates or fluxes and flux-based parameter regressions. Importantly, the proposed strategy is designed to handle systems in which there exist extra degrees of freedom in the dynamic flux estimation, when the number of metabolic fluxes exceeds that of metabolites. The positive DOF means that there exist infinitely many solutions to the dynamic flux estimation, which is one of the factors underlying the parameter identifiability issues plaguing many estimation problems in systems biology

The main premise of the new method is in recognizing that while many equivalent solutions exist for the dynamic flux estimation, the subsequent flux-based regression will give parameter values with different goodness-of-fit, as measured by Φ_{C} or Φ_{S}. In other words, given any two dynamic flux vectors $\mathbf{v}(t_k)$ satisfying $\dot{\mathbf{X}}_m(t_k) = \mathbf{S}\mathbf{v}(t_k)$, the associated parameter estimates ($\mathbf{p}_I$, $\mathbf{p}_D$) may not predict the slope or concentration data equally well, due to differences in the quality of parameter regression for each $\mathbf{v}(t_k)$. Also, because of the DOF, the minimization of model residuals needs to be done only over a subset of parameters that are associated with the flux degrees of freedom, resulting in a much reduced parameter search space and correspondingly much faster convergence to the (global) optimal solution. The superior performance of the proposed method over simultaneous estimation was demonstrated in the two GMA modeling case studies in the previous section. The minimization of slope error, also known as slope-estimation-decoupling strategy method

There are many factors, including data-related, model-related, computational and mathematical issues, which contribute to the difficulty in estimating kinetic parameters of ODE models from time-course concentration data

The appropriateness of using a particular mathematical formulation, like power law, is an example of model-related issues. As discussed above, this issue can be addressed after the dynamic fluxes are estimated, where the chosen functional dependence of the fluxes on a specific set of metabolite concentrations can be tested prior to the parameter regression
**X**
_{
m
}(_{
k
}) and thus, can be straightforwardly detected.

The proposed estimation method has several weaknesses that are common among incremental estimation methods. As demonstrated in the first case study, the accuracy of the identified parameters relies on the ability to obtain good estimates of the concentration slopes. Direct slope estimation from the raw data, for example using a central finite difference approximation, is usually not advisable due to the high degree of noise in typical biological data. Hence, pre-smoothing of the time-course data is often required, as done in this study. Many algorithms are available for this purpose, from simple polynomial regression and splines to more advanced artificial neural network

In addition to the drawback discussed above, the proposed strategy requires

Conclusions

The estimation of kinetic parameters of ODE models from time-course concentration data remains a key bottleneck in model building in systems biology. The lack of complete parameter identifiability has been blamed as the root cause of the difficulty in such estimation. In this study, a new incremental estimation method is proposed that is able to overcome the existence of extra degrees of freedom in the dynamic flux estimation from concentration slopes and to significantly reduce the computational requirements in finding parameter estimates. The method can also be applied, after minor modifications, to circumstances where concentration data for a few molecules are missing. While the present work concerns the GMA modeling of metabolic networks, the estimation strategies discussed in this work have general applicability to any kinetic models that can be written as $\dot{\mathbf{X}} = \mathbf{S}\mathbf{v}(\mathbf{X},\mathbf{p})$.

Competing interest

The authors declare that they have no competing interests.

Authors’ contributions

GJ conceived of the study, carried out the parameter estimation and wrote the manuscript. GS participated in the design of the study. RG conceived and guided the study and wrote the manuscript. All authors have read and approved the final manuscript.

Funding

Singapore-MIT Alliance and ETH Zurich.