Whitaker Biomedical Engineering Institute, The Johns Hopkins University, Baltimore, MD 21218, USA

Abstract

Background

The dynamics of biochemical reaction systems are constrained by the fundamental laws of thermodynamics, which impose well-defined relationships among the reaction rate constants characterizing these systems. Constructing biochemical reaction systems from experimental observations often leads to parameter values that do not satisfy the necessary thermodynamic constraints. This can result in models that are not physically realizable and may lead to inaccurate, or even erroneous, descriptions of cellular function.

Results

We introduce a thermodynamically consistent model calibration (TCMC) method that can be effectively used to provide thermodynamically feasible values for the parameters of an

Conclusions

TCMC is a simple and flexible method for obtaining physically plausible values for the kinetic parameters of open biochemical reaction systems. It can be effectively used to recalculate a thermodynamically consistent set of parameter values for existing thermodynamically infeasible biochemical reaction models of cellular function as well as to estimate thermodynamically feasible values for the parameters of new models. Furthermore, TCMC can provide dimensionality reduction, better estimation performance, and lower computational complexity, and can help to alleviate the problem of data overfitting.

Background

Physical systems are constrained to operate according to the fundamental laws of thermodynamics. The conservation of mass and energy and the production of entropy (or heat dissipation) dictate that certain events are physically impossible. A broken glass, for example, will not spontaneously reassemble, and a bar of gold will not fortuitously appear out of thin air. Not all physical constraints imposed by thermodynamics are intuitively obvious. As a matter of fact, thermodynamic constraints imposed on biochemical reaction systems are routinely overlooked in the literature, either because their existence is not widely known or because the implications of modern non-equilibrium thermodynamics are difficult to understand. There is an increasing consensus, however, that care must be taken to ensure that the kinetic parameters of a biochemical reaction system meet these thermodynamic constraints.

There are many publications discussing the problem of estimating the kinetic parameters of a biochemical reaction system from experimental data of molecular concentrations, when the underlying stoichiometry is known

Recently, there have been several attempts to address the issue of thermodynamic constraints in chemical kinetics

To address these problems, we have recently proposed two techniques for estimating the kinetic parameters of

In this paper, we propose a method for calibrating the kinetic parameters of biochemical reaction models of cellular function so that the resulting systems are thermodynamically feasible. The method, which we refer to as

We exemplify practical aspects of the proposed technique by recalculating the kinetic parameters of a well-known model of the EGF/ERK signaling cascade

We believe that TCMC can be effectively used to recalculate the parameter values of

Results

Biochemical Reaction Systems

Most biological processes of interest to systems biology are modeled by means of

An open biochemical reaction system is comprised of _{1}, _{2},..., _{N }

where the subscript _{nm} refers to the n^{th} molecular species associated with the m^{th} reaction. The system is initialized by setting the value of each species concentration at time 0. The net flux of the m^{th} reaction depends on **k**, a vector of kinetic parameters. The parameters **k** characterize the biochemical reaction system at hand and are independent of the molecular concentrations.

By appropriately pruning and modifying an open biochemical reaction system, we can derive a _{0 }⊆ ℳ. Finally, we remove all species that are no longer involved with any of the reactions in ℳ_{0}, leaving only the molecular species

The main rationale behind the second step is that the kinetic parameters **k **considered in this paper are assumed to be independent of the molecular concentrations. As a consequence, the values of these parameters will not change if the concentrations of the clamped species are allowed to vary. Therefore, we can construct a (possibly artificial) situation in which the concentrations of the clamped species vary as if they were dynamic species. Because our goal in creating the closed subsystem is to discover and enforce thermodynamic constraints on the kinetic parameters, we must include the clamped species in our model. This is contrary to simply removing all clamped species and the associated reactions, since this approach will not allow us to determine thermodynamically feasible values for the kinetic parameters of the removed reactions.

The third step is due to a simplification imposed on us by the current state of non-equilibrium thermodynamics. Thermodynamically dependent reactions influence each other, since the thermodynamic force of one reaction may drive the other reaction and vice versa. Unfortunately, it is not clear at this point how to deal with thermodynamically coupled reactions. Future research may be necessary to address this issue.

The resulting closed biochemical reaction subsystem is comprised of N_{0} molecular species and M_{0} coupled reactions. For every reaction m in ℳ_{0}, the net flux is given by the generalized mass-action rate law, where k_{2m-1} and k_{2m} are the (generalized) rate constants of the m^{th} forward and reverse reactions, respectively.

It is a direct consequence of thermodynamic analysis that a closed biochemical reaction system will asymptotically reach a

As a consequence,

These constraints must be satisfied by the rate constants in order for the closed biochemical reaction system to be thermodynamically feasible.

The constraints implied by (5) correspond to the reaction 'cycles' in the system. A reaction cycle is comprised of those reactions corresponding to the nonzero elements of a vector in the null space of the stoichiometry matrix of the closed subsystem. The resulting constraints, given by (6), are the well-known Wegscheider conditions. A vector **k** of rate constants that satisfies (6) is thermodynamically feasible.
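The cycle condition can be illustrated with a minimal sketch. It assumes the standard form of the Wegscheider conditions: for a cycle vector b, the product over reactions of (k_{2m-1}/k_{2m}) raised to the power b_m equals one, or equivalently the sum of b_m (ln k_{2m-1} - ln k_{2m}) is zero. The three-reaction network, the rate values, and the function name `wegscheider_residual` are hypothetical and chosen only for illustration.

```python
import math

def wegscheider_residual(k_forward, k_reverse, cycle):
    """Return sum_m b_m * (ln k_fwd[m] - ln k_rev[m]).

    A value of zero means the rate constants satisfy the cycle condition.
    """
    return sum(b * (math.log(kf) - math.log(kr))
               for b, kf, kr in zip(cycle, k_forward, k_reverse))

# Hypothetical closed cycle: A <-> B, B <-> C, C <-> A.
# Traversing each reaction once returns to A, so b = (1, 1, 1) is a cycle vector.
cycle = [1, 1, 1]

k_fwd = [2.0, 3.0, 0.5]
k_rev_feasible = [1.0, 6.0, 0.5]     # ratios 2, 1/2, 1: product is 1
k_rev_infeasible = [1.0, 6.0, 0.25]  # ratios 2, 1/2, 2: product is 2

res_ok = wegscheider_residual(k_fwd, k_rev_feasible, cycle)
res_bad = wegscheider_residual(k_fwd, k_rev_infeasible, cycle)
print(res_ok, res_bad)
```

A residual that is zero up to rounding indicates a feasible set of rate constants for that cycle, whereas a nonzero residual (here ln 2) flags a violation.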

We want to emphasize that, in open biochemical reaction systems, the rate constants of the reversible reactions must also be constrained by the Wegscheider conditions, even if the system is far from equilibrium. To identify these constraints, we need to prune an open biochemical reaction system into a closed subsystem, by employing the technique discussed previously, and use the resulting stoichiometry matrix

An equally important observation is that the rate constants of the reactions pruned from an open system are not constrained by the Wegscheider conditions, since (4) must only be satisfied by the reactions in the closed subsystem. Furthermore, if a reaction _{m }
**
b
**∈ null (

It can be shown (see Additional file _{0 }equal to the elements of a vector in the null space of the stoichiometry matrix

**In this document, we provide supplementary mathematical and computational details required to fully understand the material presented in the Main Text**.

where the Avogadro number is approximately 6.022 × 10^{23} mol^{-1} and k_{B} ≈ 1.381 × 10^{-23} J K^{-1} is the Boltzmann constant. According to the second law of thermodynamics, the entropy production rate must always be greater than or equal to zero, with equality if and only if the system is at thermodynamic equilibrium. It is therefore clear from (7) that the Wegscheider conditions imply that the entropy production rate must be zero in this case (i.e., the system must be at thermodynamic equilibrium). As a consequence, the chemical motive force (which is the amount of energy added to the system per unit time due to mass exchange through its boundary) and the heat dissipation rate must also be zero. This makes intuitive sense, since a reaction cycle leaves all molecular concentrations unchanged and, therefore, there is no change in internal energy or mass flow through the system boundary. Clearly, we can think of the Wegscheider conditions as being a direct consequence of the thermodynamic requirement that the entropy production rate associated with **b** be zero, for every vector **b** in the null space of the stoichiometry matrix.

Linear constraints

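Unfortunately, (6) imposes a possibly infinite number of non-linear constraints on the rate constants of a closed biochemical reaction system. However, it is sufficient to satisfy (6) for M_{2} = M_{0} - M_{1} basis vectors of the null space of the stoichiometry matrix. Taking logarithms turns the resulting conditions into linear constraints on **κ** := ln(**k**), the vector of logarithms of the kinetic parameters, with coefficients determined by the components of the basis vectors.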
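As a sketch of how the cycle conditions become linear in the log-parameters, the snippet below maps each cycle basis vector to one constraint row. It assumes the ordering κ = (ln k_1, ln k_2, ..., ln k_{2M}), with odd indices for forward and even indices for reverse rate constants, as in the text. The helper name `wegscheider_rows` and the example values are hypothetical.

```python
import math

def wegscheider_rows(cycle_basis):
    """Map each cycle vector b (length M) to a row w (length 2M) such that
    w . kappa = sum_m b_m * (kappa_{2m-1} - kappa_{2m})."""
    rows = []
    for b in cycle_basis:
        row = []
        for b_m in b:
            row.extend([b_m, -b_m])  # forward entry, then reverse entry
        rows.append(row)
    return rows

# One independent cycle over three reversible reactions (hypothetical example).
cycle_basis = [[1, 1, 1]]
rows = wegscheider_rows(cycle_basis)

# Rate constants ordered as (k1, k2, ..., k6); ratios 2, 1/2, 1 multiply to 1.
k = [2.0, 1.0, 3.0, 6.0, 0.5, 0.5]
kappa = [math.log(value) for value in k]

residuals = [sum(w * x for w, x in zip(row, kappa)) for row in rows]
print(rows, residuals)
```

Each row has a zero right-hand side, so a thermodynamically feasible **κ** lies in the null space of the stacked rows.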

We should note that there might be additional linear constraints that we may wish to impose on the logarithms of the kinetic parameters of a biochemical reaction system. Here are some examples:

• By using an appropriate experimental procedure, we may be able to accurately measure the equilibrium constant K_{m} of the m^{th} reversible reaction. For a (generalized) mass-action reaction, we have K_{m} = k_{2m-1}/k_{2m} and thus we obtain the linear constraint κ_{2m-1} - κ_{2m} = ln K_{m} for the m^{th} reaction, where κ_{j} := ln k_{j}.

• For a reversible Michaelis-Menten reaction, the Haldane relationship implies a linear constraint between the logarithms of the kinetic parameters of a reversible Michaelis-Menten reaction and

• By using experimental techniques, such as plasmon resonance or atomic force microscopy, it may be possible to obtain a highly accurate measurement of the j^{th} kinetic parameter k_{j}. In this case, we must impose the (trivial) constraint that fixes the j^{th} element of **κ** to the logarithm of the measured value.

• To reduce the dimensionality of parameter estimation, we may employ a sensitivity analysis approach, such as the one proposed in

• We may want to expand an existing (validated) thermodynamically feasible model to include additional reactions and molecular species. We can do this by fixing the parameters of the existing model using linear constraints that set the j^{th} element of **κ** to the logarithm of the j^{th} parameter value of the existing model. Then estimation takes place only on the parameters associated with the new reactions.

For a given biochemical reaction system, we can combine all possible linear constraints on the logarithms **κ** of the kinetic parameters into a single system of linear equations, given by (10), whose right-hand side is the vector **c**.

Model calibration

We will now assume that we have obtained noisy measurements **y**, and that we want to determine values for the log kinetic parameters **κ** that best explain these measurements in some well-defined sense.

There are many estimation techniques we can use to address the previous problem, such as maximum likelihood or Bayesian inference. The final product of these techniques is a cost function of **κ** given the measurements **y**, which must be minimized over all **κ**'s satisfying the linear constraints given by (10).

This error criterion is a consequence of a maximum likelihood approach to parameter estimation under the assumption of normally distributed observation errors. Note that the cost depends on **κ** through the molecular concentrations.
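To illustrate how such a cost depends on the log-parameters only through the predicted concentrations, the sketch below uses an unweighted sum of squared errors, which is the form a maximum likelihood criterion takes under independent normal errors of equal variance. The exact weighting used in the paper is not recoverable here, so this form is an assumption. The model in `simulate` is a hypothetical stand-in (a single reversible reaction A <-> B with a closed-form solution), not the EGF/ERK model.

```python
import math

def simulate(kappa, times):
    """Hypothetical stand-in model: A <-> B with A(0) = 1 and B(0) = 0.

    kappa holds the logarithms of the forward and reverse rate constants.
    Returns the concentration of B at each requested time.
    """
    k_f, k_r = math.exp(kappa[0]), math.exp(kappa[1])
    total = k_f + k_r
    return [k_f / total * (1.0 - math.exp(-total * t)) for t in times]

def sse_cost(kappa, times, measurements):
    """Sum of squared differences between measured and predicted values."""
    predicted = simulate(kappa, times)
    return sum((y - x) ** 2 for y, x in zip(measurements, predicted))

times = [0.5, 1.0, 2.0, 4.0]
true_kappa = [math.log(2.0), math.log(0.5)]
data = simulate(true_kappa, times)          # noiseless synthetic data

cost_true = sse_cost(true_kappa, times, data)
cost_off = sse_cost([0.0, 0.0], times, data)
print(cost_true, cost_off)
```

With noiseless synthetic data the cost vanishes at the generating parameters and is positive elsewhere; with real data the minimum is attained only approximately.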

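In this paper, we refer to the constrained optimization problem given by (11) as thermodynamically consistent model calibration (TCMC). The linear constraints can be eliminated by expressing every feasible **κ** in terms of a particular solution **κ**_{0} of (10) and a lower dimensional vector **v** in ℝ^{d}, which turns (11) into the unconstrained problem (13) over **v**.

Note that we assume here that there is more than one solution to (10). If (10) has no solution, then there is no **κ** that will simultaneously satisfy all necessary constraints, indicating that we must reformulate the problem.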
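One way to picture the reduction to a lower dimensional search is to write every solution of the linear constraints as a particular solution plus a combination of null-space vectors. The sketch below does this with exact rational arithmetic. The function names, the constraint matrix `A`, and the right-hand side `c` are hypothetical, and the elimination routine is a generic Gauss-Jordan procedure rather than the authors' implementation.

```python
from fractions import Fraction

def affine_solution(A, c):
    """Return (kappa0, basis) such that A @ (kappa0 + sum_j v_j * basis[j]) = c."""
    rows = [[Fraction(x) for x in row] + [Fraction(ci)] for row, ci in zip(A, c)]
    n = len(A[0])
    pivots, r = [], 0
    for col in range(n):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        pv = rows[r][col]
        rows[r] = [x / pv for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(col)
        r += 1
        if r == len(rows):
            break
    if any(row[-1] != 0 for row in rows[r:]):
        raise ValueError("constraints are inconsistent: no feasible kappa")
    kappa0 = [Fraction(0)] * n
    for i, col in enumerate(pivots):
        kappa0[col] = rows[i][-1]
    basis = []
    for f in (j for j in range(n) if j not in pivots):
        vec = [Fraction(0)] * n
        vec[f] = Fraction(1)
        for i, col in enumerate(pivots):
            vec[col] = -rows[i][f]
        basis.append(vec)
    return kappa0, basis

def feasible_point(kappa0, basis, v):
    """Map a free vector v to a kappa that satisfies the constraints."""
    return [k0 + sum(vj * b[i] for vj, b in zip(v, basis))
            for i, k0 in enumerate(kappa0)]

# Hypothetical constraints: one cycle condition and one fixed log-parameter.
A = [[1, -1, 1, -1, 1, -1],
     [1, 0, 0, 0, 0, 0]]
c = [0, 2]
kappa0, basis = affine_solution(A, c)
kappa = feasible_point(kappa0, basis, [1, -3, 2, 5])
print(len(basis), kappa)
```

Here six log-parameters subject to two independent constraints leave a four dimensional free vector, so any optimizer can search over `v` without ever leaving the feasible set.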

The objective function in (13) is non-convex with possibly many local minima. As a consequence, a gradient-based optimization algorithm for solving (13) may prematurely terminate at a local minimum with much larger cost than the globally minimum cost. To ameliorate this problem, we have decided in this paper to use a stochastic optimization algorithm, namely simulated annealing (SA). At each step of this algorithm, a new value of **v** is proposed near the current value. The proposed value becomes the new value with a certain probability based on cost improvement. If the proposed value is not accepted, then the current value is retained. The proposed value is usually drawn from an appropriately chosen probability distribution around the current value (e.g., a Gaussian distribution centered at the current value). See the Additional file for details.
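The propose-and-accept loop described above can be sketched as follows. The Metropolis acceptance rule, the linear cooling schedule, the step size, and the toy cost function are all assumptions made for illustration; the paper's actual settings are given in its Additional file and are not reproduced here.

```python
import math
import random

def simulated_annealing(cost, v0, steps=2000, step_size=0.5, t0=1.0, seed=0):
    """Minimize cost(v) with Gaussian proposals and Metropolis acceptance."""
    rng = random.Random(seed)
    current, current_cost = list(v0), cost(v0)
    best, best_cost = list(current), current_cost
    for i in range(steps):
        # Linear cooling schedule (an assumed choice); kept strictly positive.
        temperature = t0 * (1.0 - i / steps) + 1e-9
        # Propose a new value near the current one.
        proposal = [x + rng.gauss(0.0, step_size) for x in current]
        proposal_cost = cost(proposal)
        delta = proposal_cost - current_cost
        # Always accept improvements; accept worse values with some probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = proposal, proposal_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
    return best, best_cost

def toy_cost(v):
    """Hypothetical non-convex cost with several local minima."""
    return sum(x * x + 2.0 * math.sin(3.0 * x) ** 2 for x in v)

start = [3.0, -2.5]
best_v, best_cost = simulated_annealing(toy_cost, start)
print(best_v, best_cost)
```

Because the loop keeps track of the best value visited, the returned cost never exceeds the starting cost, even though individual moves may go uphill.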

A natural question that arises here is whether different choices for **
κ
**

Example

We now demonstrate the proposed TCMC method by re-estimating the kinetic parameters of a classical model of the EGF/ERK signaling cascade

Although the Schoeberl model has provided valuable insights into the biological mechanisms underlying EGF/ERK signaling, the values of the kinetic parameters published in the literature are thermodynamically infeasible. As a consequence, the concentration dynamics produced by the published model are physically impossible and could not occur in nature. By using TCMC to recompute thermodynamically feasible values for the kinetic parameters, we can construct a physically realizable model whose dynamics are expected to reflect the true behavior of EGF/ERK signaling more accurately than the dynamics produced by the published model.

We use the version of the Schoeberl model published in the BioModels database

We implement the following TCMC procedure to re-estimate the values of the kinetic parameters in a thermodynamically consistent manner. First, we find the closed subsystem of the Schoeberl model (see the Additional file), which consists of N_{0} = 93 molecular species and M_{0} = 83 reversible elementary (monomolecular and bimolecular) reactions. Then, we determine the thermodynamic constraints by using the 93 × 83 stoichiometry matrix of the closed subsystem. This results in M_{2} = 83 - 65 = 18 independent reaction cycles, determined by the columns of the matrix given by **Equation (S.1)** of the Additional file (see also **Table S.2** in the Additional file).
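The count of independent cycles follows from the rank-nullity theorem: a stoichiometry matrix with M_0 columns has M_0 minus its rank independent null-space vectors. The sketch below assumes that M_1 denotes this rank, which is consistent with 83 - 65 = 18 but is not stated explicitly in the surviving text. The four-reaction network and the helper name `matrix_rank` are hypothetical.

```python
from fractions import Fraction

def matrix_rank(matrix):
    """Rank of a matrix by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

# Hypothetical closed network with species A, B, C, D (rows) and
# reactions A <-> B, B <-> C, C <-> A, C <-> D (columns).
S = [[-1,  0,  1,  0],
     [ 1, -1,  0,  0],
     [ 0,  1, -1, -1],
     [ 0,  0,  0,  1]]

n_reactions = len(S[0])
n_cycles = n_reactions - matrix_rank(S)
print(n_cycles)
```

In this toy network only the triangle A, B, C forms a cycle, so a single Wegscheider condition is needed; the reaction C <-> D is unconstrained.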

Next, we construct the constraint matrix and the vector **c** by combining the 18 thermodynamic constraints with 167 linear equality constraints originally built into the model that relate various parameters across reactions (see the Additional file).

We take the thermodynamically feasible log kinetic parameter values **
κ
**

In Figure **Figure S1 **of Additional file

**ERK-PP concentration dynamics, measured in mol/m^{3}, under two different input EGF concentrations**.

We should note here that the ERK-PP dynamics originally published in

The solid curves in Figure **Figure S1 **of Additional file

To separate the effect of collective fitting versus imposing the underlying thermodynamic constraints on the kinetic parameters, we use the same simulated annealing algorithm employed by TCMC to estimate the kinetic parameters without including the thermodynamic constraints. The resulting (thermodynamically infeasible) estimated dynamics are depicted by the dashed curves in Figure **Figure S1 **of Additional file

Discussion

Qualitative/quantitative value of thermodynamic consistency

Due to lack of thermodynamic consistency in the parameter values of the published Schoeberl model, the molecular dynamics produced by this model cannot possibly occur in nature. Because the values estimated by the proposed TCMC method satisfy all necessary thermodynamic constraints, it is expected that the resulting TCMC model will provide a more accurate representation of EGF/ERK signaling than the published Schoeberl model.

To provide an example of a potentially important difference between the published model and the TCMC model calculated in this paper, we consider the

In view of the fact that differences in the integrated response of ERK-PP activity may cause distinct biological outcomes, it is reasonable to believe that a key objective of EGF/ERK signaling is to maintain a robust integrated response to changes in input EGF concentration while producing a quick and sharp 'switch-like' transition between states of differing biological outcomes. When the EGF concentration decreases below 10^{-2} ng/mL, the integrated response of ERK-PP activity predicted by the TCMC model exhibits a sharper transition from large to small values than the one predicted by the published model. As a matter of fact, the TCMC model predicts a decrease of seven orders of magnitude in the integrated response when the input EGF concentration decreases from 10^{-2} ng/mL to 10^{-3} ng/mL, whereas the published model predicts a decrease of only four orders of magnitude. By considering the discussion in

**Integrated response of ERK-PP activity, measured in mol · min/m^{3}, as a function of input EGF concentrations**.
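The units of the integrated response (mol · min/m^{3}) indicate a time integral of the ERK-PP concentration. As a minimal sketch under that reading, the integral of a sampled trajectory can be approximated with the trapezoidal rule. The function name and the sample values below are hypothetical and do not come from the model.

```python
def integrated_response(times, concentrations):
    """Trapezoidal approximation of the time integral of a concentration."""
    return sum(0.5 * (c0 + c1) * (t1 - t0)
               for t0, t1, c0, c1 in zip(times, times[1:],
                                         concentrations, concentrations[1:]))

# Hypothetical trajectory: time in minutes, concentration in mol/m^3.
times = [0.0, 1.0, 2.0, 4.0]
conc = [0.0, 2.0, 2.0, 0.0]

area = integrated_response(times, conc)
print(area)  # 1.0 + 2.0 + 2.0 = 5.0 mol * min / m^3
```

Non-uniform time grids are handled naturally, since each interval contributes its own width.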

We will now show that the TCMC model may result in a biologically plausible prediction of ERK-PP activity that can also differ from the one produced by the thermodynamically infeasible model estimated using collective fitting. In Figure

**Long-term ERK-PP concentration dynamics, measured in mol/m^{3}, predicted by the estimated thermodynamically infeasible model (dashed curves) and the TCMC model (solid curves) under 0.125 ng/mL (blue curves) and 0.0625 ng/mL (red curves) input EGF concentrations**. The inset shows the short-term behavior predicted by the two models as well as the corresponding densitometric data (blue and red circles).

Our previous examples show that thermodynamic consistency may result in model behavior that differs from the behavior predicted by thermodynamically infeasible models of cellular function. However, more research is needed to experimentally validate the observed differences and to demonstrate that lack of thermodynamic consistency may indeed result in inaccurate (or even false) biological predictions.

Flux analysis

Flux-based analysis of biochemical reaction systems is a widely used method for understanding the principles underlying the production and regulation of mass flow in cellular systems, such as signaling or metabolic pathways. It turns out that the Wegscheider conditions, given by (6), constrain the reaction fluxes. If flux analysis does not take these constraints into account, then it may lead to inaccurate or misleading conclusions about the behavior and properties of mass flow in biochemical reaction systems.

If {**
b
**

where ^{th }element of the basis vector **
b
**

Since TCMC always leads to a thermodynamically feasible biochemical reaction system with parameters satisfying the Wegscheider conditions, the flux constraints imposed by (15) are satisfied as well. Thus, thermodynamically consistent flux analysis can be performed on the resulting system without any additional considerations, and the behavior of the system is always physically realizable.

Bias-variance tradeoff and overfitting

In addition to the previous advantages, there is an important statistical benefit to thermodynamically constraining the parameters of a biochemical reaction system. By searching for kinetic parameter values within a thermodynamically consistent subset of the parameter space, we may reduce the variance of estimation and thus lower the estimation error through the well-known bias-variance tradeoff.

The mean-square error (MSE) **
κ
**

Generally speaking, imposing constraints on the parameters may increase the bias term but decrease the variance. However, since the true parameter values must satisfy the thermodynamic constraints, we expect a decrease in variance without an increase in bias. As a consequence, searching for parameter values within the thermodynamically consistent subspace of the parameter space may lead to a lower mean square error in cost due to smaller variance. Since the volume of a search space grows exponentially in the dimension of the space, gains in variance (and hence improvements in the mean square error) are expected to be large.
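For reference, the bias-variance argument rests on the standard decomposition of the mean-square error of an estimator. The statement below is written for a generic scalar quantity θ with estimator θ̂; these symbols are assumptions introduced for illustration and are not the paper's notation.

```latex
\begin{align}
\mathrm{MSE}(\hat{\theta})
  &= \mathbb{E}\big[(\hat{\theta} - \theta)^2\big] \\
  &= \underbrace{\big(\mathbb{E}[\hat{\theta}] - \theta\big)^2}_{\text{squared bias}}
   + \underbrace{\mathbb{E}\big[(\hat{\theta} - \mathbb{E}[\hat{\theta}])^2\big]}_{\text{variance}}
\end{align}
```

If a constraint is satisfied by the true value, restricting the estimator to the constrained set need not increase the first term, while it can shrink the second.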

A related statistical notion in estimation problems is data overfitting.

Most often, the behavior of biochemical reaction systems is only influenced by a small number of parameters (due to robustness of such systems to the underlying kinetic parameters). This reduces the actual number of parameters that must be estimated with precision. Moreover, the thermodynamic constraints further reduce the number of parameters to be estimated, alleviating some overfitting concerns. Imposing additional parameter constraints, such as the ones employed by the Schoeberl model, may further be used to combat this issue. Unfortunately, model complexity is much higher than the amount and quality of available data in most problems of systems biology and overfitting remains a major concern even when using TCMC. In the example considered in this paper, time series data is only available for one crucial chemical species. As a consequence, it is natural to expect that the dynamics produced by the TCMC model overfit the available data to a certain extent. Thus, when more experimental data become available, TCMC must be rerun in order to produce a better calibration of the model, with a new cost function that includes the additional data.

In light of these concerns, some may argue that collective fitting of model parameters is not the correct approach, and that a reductionist approach is more appropriate (i.e., attempting to measure parameters individually and then combine the results to determine an appropriate model calibration). Unfortunately, the reductionist approach is time consuming, extremely expensive, and, in most cases, impossible with current experimental techniques. Moreover, incorrect implementation of a reductionist approach may lead to a thermodynamically infeasible model calibration. This is clearly the case with the Schoeberl model (and most probably with other models published in the literature).

TCMC is a collective fitting procedure, but offers a pragmatic compromise to the reductionist approach. In light of the fact that some parameters may be measured individually with extreme precision, TCMC allows for these parameters to be fixed to their measured values using matrix

Computational advantages

According to the 'curse of dimensionality,' which refers to an exponential increase in the volume of the parameter space as its dimension grows, estimation becomes substantially harder in high dimensional spaces. A 'naïve' search of that space, in an effort to find the 'true' parameter values of a biochemical reaction system [assuming that these values minimize the cost function given by (12)], is hopeless. As a matter of fact, the probability of obtaining parameter values that satisfy the Wegscheider conditions and other underlying log-linear constraints by uniformly sampling the entire parameter space (which is an 'easier' problem than finding the 'true' parameter values) is zero. As a consequence, the constraints must be explicitly considered by the optimization problem at hand to have any hope of successfully solving the problem of model calibration.

As a matter of fact, since the feasible manifold is of lower dimension than the entire parameter space, methods that do not consider the underlying thermodynamic and non-thermodynamic constraints will spend most of their time searching the immense infeasible portions of the parameter space. The imposition of constraints among the kinetic parameters of a biochemical reaction system reduces the parameter space to a smaller feasible region of lower dimension and makes parameter estimation computationally easier. TCMC makes this explicit by performing optimization over a lower dimensional space, spanned by the lower dimensional vectors **v**, instead of the entire parameter space, spanned by the higher dimensional vectors **κ**.

Conclusions

For a biochemical reaction system to be physically realizable, it is required that the underlying kinetic parameters satisfy certain thermodynamic constraints, known as Wegscheider conditions. This issue has been largely ignored in the literature, as evidenced by the fact that many published models violate these constraints. The model calibration method we have proposed in this paper can be effectively used to determine thermodynamically consistent values for the kinetic parameters of any set of reactions in an ideal homogeneous mixture at constant temperature and volume. Our method is simple to understand and implement. Moreover, it can be easily incorporated into any existing or newly proposed calibration technique in order to make sure that the resulting model satisfies the fundamental laws of thermodynamics as well as other desirable conditions and constraints.

There are two major issues associated with calibrating biochemical reaction systems:

1. The quality and quantity of available data are inadequate to allow sufficient estimation of all underlying parameter values.

2. Biochemical reaction models contain many parameters whose numbers dramatically increase with model size and detail. As a consequence, the curse of dimensionality seriously hampers estimation algorithms.

The first issue is primarily associated with current limitations of experimental methods and approaches. To address this issue, we need substantial improvements in experimental equipment and methodologies. However, TCMC scales well with future improvements in data quality and quantity. Matrix

The second issue is the largest obstacle facing model calibration techniques. To reduce dimensionality, we must attempt to exploit mathematical structure particular to the biological problem at hand. TCMC attempts to address this problem in two ways:

• First, TCMC uses the fact that there are fundamental physical principles underlying biochemical reaction systems that may constrain the set of possible kinetic parameter values. As a consequence of the fundamental laws of thermodynamics, most complex biochemical networks contain reaction cycles that constrain the kinetic parameters according to (9). These constraints allow TCMC to reduce dimensionality by restricting the estimation problem on a smaller thermodynamically feasible subset of the parameter space.

• Second, experimental data and mathematical analysis can often provide other forms of constraints on the underlying parameters (e.g., through directly measuring rate or equilibrium constants, by determining Haldane relationships between enzymatic parameters, etc.). In particular, sensitivity analysis may reveal non-influential kinetic parameters that can be set to some nominal values without appreciably affecting system behavior. All these additional constraints can be accounted for by the

As we mentioned before, dimensionality reduction is made explicit by TCMC, since optimization takes place over a lower dimensional vector **v**, instead of the higher dimensional vector **κ**.

Recently, a method has been proposed in the literature for inferring a complete and consistent set of kinetic parameter values from incomplete and inconsistent data

A problem that we have not addressed in this paper is the influence of ions, such as K^{+} and Ca^{2+}, and certain environmental factors, such as temperature and pH, on the thermodynamic behavior of a biochemical reaction system.

Another problem that we have not addressed here is constructing new biochemical reaction models of cellular function. Since, in this paper, we only address the model calibration problem, we take (1) as given and proceed to determine the parameter vector **k **from data. In general, determining the structure (i.e., the stoichiometry) of a biochemical reaction network is an extremely laborious task. Preliminary work indicates that thermodynamics can also play a key role in estimating the structural complexity of biochemical reaction systems

Authors' contributions

GJ developed the general methodology and coded a substantial portion of the software. JG derived many theoretical results and ideas and wrote substantial portions of the final document. GJ and JG both interpreted the obtained computational results and approved the final version of the paper.

Acknowledgements

This research was supported in part by DoD, High Performance Computing Modernization Program, National Defense Science and Engineering Graduate (NDSEG) Fellowship, 32 CFR 168a, and in part by the National Science Foundation (NSF), Grant CCF-0849907.