The importance and power of isotope-based metabolic flux analysis and its contribution to understanding the metabolic network is increasingly recognized. Its application is, however, still limited partly due to computational inefficiency. 13C metabolic flux analysis aims to compute in vivo metabolic fluxes in terms of metabolite balancing extended by carbon isotopomer balances and involves a nonlinear least-squares problem. To solve the problem more efficiently, improved numerical optimization techniques are necessary.
For flux computation, we developed a gradient-based hybrid optimization algorithm. Here, independent flux variables were compactified into [0, 1)-ranged variables using a single transformation rule. The compactified parameters could be discriminated between non-identifiable and identifiable variables after model linearization. The developed hybrid algorithm was applied to the central metabolism of Bacillus subtilis with only succinate and glutamate as carbon sources. This creates difficulties caused by symmetry of succinate leading to limited introduction of 13C labeling information into the system. The algorithm was found to be superior to its parent algorithms and to global optimization methods both in accuracy and speed. The hybrid optimization with tolerance adjustment quickly converged to the minimum with close to zero deviation and exactly re-estimated flux variables. In the metabolic network studied, some fluxes were found to be either non-identifiable or nonlinearly correlated. The non-identifiable fluxes could correctly be predicted a priori using the model identification method applied, whereas the nonlinear flux correlation was revealed only by identification runs using different starting values a posteriori.
This fast, robust and accurate optimization method is useful for high-throughput metabolic flux analysis, a posteriori identification of possible parameter correlations, and also for Monte Carlo simulations to obtain statistical qualities for flux estimates. In this way, it contributes to future quantitative studies of central metabolic networks in the framework of systems biology.
In recent years, metabolic flux analysis (MFA) has become an important tool for quantifying metabolic pathways which is essential for in-depth understanding of biological systems. Among the developed tools, 13C flux analysis utilizing 13C labeling patterns of metabolic products that result from feeding 13C-labeled substrates provides detailed information about intracellular pathway fluxes in vivo [1-3]
13C-based MFA requires carbon flux modeling through the metabolic network, which describes the mathematical relationship between unknown fluxes and the available measurement data set. It requires modeling two connected equation systems, which describe reaction stoichiometry between metabolites and between carbon isotopomers, respectively. Using the model, fluxes can be computed from the measurements by solving a nonlinear least-squares problem (NLSP). The stoichiometric network is a linear equation system of material balances given for the reactions between metabolites, i.e., S·ν = 0 with S denoting the stoichiometric matrix and ν the fluxes. The fluxes consist of non-measured intracellular fluxes νu and measured effluxes νm, i.e., ν = (νu, νm)T. Typically, realistic models are underdetermined, that is, the rank of S is smaller than the number of entries in νu . This difference between rank(S) and the number of entries of νu equals the number of fluxes that have to be chosen as the design parameters Θ (independent flux variables) and are required when parametrizing the network such that ν = Fflux(Θ).
Typically, modeling carbon isotopomer networks involves an equation system containing a few to several hundred variables to balance reactions between isotopomers [1,3,4]. The equation system of carbon isotopomer reactions is an implicit function Fcarbon(x, ν) = 0, bilinear but square with respect to carbon isotopomer fractions (x) and nonlinear with respect to fluxes, where x consists of non-measured xu and measured xm. Among the available tools, the cumomer concept developed by Wiechert et al. (1999) provides explicit solutions for carbon isotopomer fractions (x) by transforming the bilinear system into a cascade of linear systems . In addition to this, the explicit partial derivatives of the cumomer network with respect to fluxes are obtainable, which is useful for gradient-based optimization algorithms. When 13C labeling information is applied, intracellular fluxes are determined by means of numerical optimization that seeks a constrained minimum of independent flux variables (Θ) to an objective function, e.g.,
Here f(Θ) denotes the objective function to be minimized with respect to Θ and (Θ) signifies the model function corresponding to the measured data set η = (η1, η2,..., ηn)T consisting of measured 13C labeling data (xm). Such data are typically received as mass isotopomer distributions measured by mass spectrometry (MS) and/or fractional carbon labeling measured by nuclear magnetic resonance (NMR) techniques, and effluxes (νm). The measurement error ε is typically assumed to have a normal distribution such that ε ∈ N(0, Ση), where Ση is the covariance matrix of measurements.
For 13C-based MFA, the applied algorithms for numerical flux estimation are mainly gradient-based local optimization [6,7] or gradient-free global optimization [8-11] such as simulated annealing (SA) or genetic algorithms (GAs). Also, a hybrid technique of global-local optimization has been applied . Such algorithms are described in detail elsewhere [13-15]. The stochastic global optimization methods can be inefficient due to the time required to obtain the so-called asymptotic convergence or reachability in high dimensional parameter spaces [11,16-21]. Moreover, such algorithms of random nature may fail to find the global solution unless the number of samplings tends to infinity, which is practically impossible [21,22]. In comparison, the gradient-based local optimizations have a much higher convergence speed, but the solution quality depends heavily on starting points [14,19]. Further to this, reaching the global optimum is ensured only for convex problems, whereas it is nontrivial to determine the convexity of general nonlinear problems with nonlinear constraints . Thus, one may obtain solutions that are not necessarily global  and that might vary depending on starting points.
In this regard, a robust method is highly desirable which guarantees a convergence with speed and accuracy for the nonlinear flux estimation problem. In the present work, we developed a method for efficient parametrization of metabolic network by compactification. Additionally, a mathematical method is suggested for model identification that solves a priori flux identifiability problems regarding 13C labeling experiments in terms of model linearization. On this basis, an optimization algorithm was developed that hybridizes two gradient-based optimization tools. The developed approaches were evaluated using the central metabolism of Bacillus subtilis with succinate and glutamate feeding as the only carbon sources. We examined whether global flux solutions are obtainable using the gradient-based deterministic algorithm.
In this section, we describe the mathematical and computational procedures developed and their application to the nonlinear problem of metabolic flux estimation in a realistic metabolic network of Bacillus subtilis. The model represents an inherent difficulty arising from only succinate and glutamate feeding as carbon sources. Succinate is a symmetric molecule having only two distinguishable carbon atoms. On the other hand, glutamate is also converted into succinate. Due to this, the information that is introduced by the 13C-labels of the substrates is severely limited.
Parametrization of stoichiometric network
To formulate a non-linear least squares problem (NLSP), the system of interest has to be parametrized. The stoichiometric network, which is a rank-deficient linear system (S·ν = 0), was parametrized using the following three steps.
The stoichiometric matrix S is transformed into its reduced row echelon form SRRE using Gauss-Jordan elimination with partial pivoting. Subsequently, each row and column of SRRE is analyzed to identify the dependent and independent variables. The first non-zero element in each row of SRRE that is always 1 is called 'leading 1'. If the ith column of SRRE contains only zeros and one leading 1, then the ith element of ν is dependent, otherwise it is independent. The number of fluxes selected as independent equals the degrees of freedom in the stoichiometric network, that is, the difference between the number of variables and the rank of S. Once this step has been completed, we will see that certain intracellular fluxes and all effluxes (νm) are chosen as independent.
Physiologically, metabolic fluxes are constrained such that 0 ≤ ν < ∞, if reversible reactions are considered as two independent reactions. For the effluxes, the corresponding measurements are available. Hence, the effluxes can be bounded, e.g., mean value ± × standard deviation, where denotes the inverse of χ2-cumulative distribution function at a certain confidence level of φ, and the resulting range covers 100 × φ % of possible experimental observations.
The intracellular fluxes selected as independent can be compactified using a single rule such that:
The above compactified flux variables ϕ, the [0, 1)-fluxes, are a bijection of [0, ∞) domain onto [0, 1) range: if νi → 0, ϕi → 0 and if νi → ∞, ϕi → 1 for a certain real positive number of the parameter scaling constant α. These [0, 1)-fluxes can potentially elevate the output sensitivity and, thus, the convergence speed. The output sensitivities of a carbon flux system with respect to fluxes and [0, 1)-fluxes are ∂xm/∂ν and ∂xm/∂ϕ, respectively, where the latter equals (∂xm/∂ν)·(∂ν/∂ϕ) by the chain rule. Differentiating νi with respect to ϕi results in
from (2). If α > (1 - ϕi)2 holds and, thus, dνi/dϕi > 1, the sensitivity given by ∂xm/∂ϕ = (∂xm/∂ν)·(∂ν/∂ϕ) increases. Particularly, a higher sensitivity can always be obtained by setting α ≥ 1 due to finite values of fluxes, that is, 0 ≤ νi < ∞ or 0 ≤ ϕi < 1. Hence, setting the parameter scaling constant α ≥ 1 is more preferable for numerical optimization than α > 0. Moreover, the mapping such as (2) has proven to yield an extremely low curvature of 13C labeling in the parameter space and is advantageous for model linearization . The constant α can be adjusted during the optimization. This is described in detail in the Appendix.
By symbolically solving the equation system consisting of stoichiometric balances and flux constraints such as (2) for the dependent fluxes (νdepend), we get explicit expressions for all dependent fluxes such that νdepend = Fflux(Θ) with Θ = (ϕ1, ϕ2,..., νm1, νm2,...)T.
Flux identifiability by model linearization
One important question to be answered prior to the numerical flux computation is the a priori parameter identifiability. To this end, the so-called Buchberger's algorithm  can be applied to compute the Gröbner basis . However, this polynomial algebraic method can be very time consuming and, thus, is restricted to small systems, corresponding to the fact that the Gröbner basis can be extremely large for a metabolic carbon labeling system [27,28].
In previous works [28,29], the Jacobian matrix of the measurement model has been utilized for model identification: the matrix determinant or rank were applied to the flux identifiability analysis in carbon labeling systems. However, these approaches do not tell how to sort out identifiable flux variables in case not all fluxes are identifiable.
As an alternative, the first order optimality condition and null space analysis can be applied to model identification. For this purpose, the linear algebra approach is now extended to answer the question of the a priori flux identifiability problem regarding 13C isotopomer analysis, which begins with model linearization. At the solution of the nonlinear problem (1), the gradient ∇f is expected to be 0 by the first order optimality condition, as in:
Here, J( ) represents the Jacobian matrix that equals ∂ /∂Θ evaluated at . Assuming that the parameter estimate is close to its true value , the above equation can linearly be approximated in the neighborhood of by the first-order Taylor series expansion such that:
Solving the above equation for gives the particular solution of (1) as follows:
Assuming that the above linearization is a good approximation in the vicinity of solution, the a priori identification problem can be answered using linear system theory, similarly as applied for stoichiometric metabolite balancing . The theory is now extended to the linearized carbon labeling system: the general solution of (1) can be formulated as a linear combination of the particular solution (6) with its homogeneous part, i.e.,
Here, null(J( )) is the null space of J( ) and β is a vector containing arbitrary non-zero values. The ith element of the search step whose corresponding row of null(J( )) consists of zeros can be determined uniquely from the measurement η. Thus, the unique solution of the corresponding ith design parameter can be computed numerically.
Hybrid optimization algorithm
Due to the parametrization introduced above, the flux estimation problem (1) can now be formulated such that:
where f(Θ) denotes the objective function specified in (1) and Θ = (ϕ1, ϕ2,...,ϕn, νm1, νm2,...)T the design parameter vector consisting of the [0, 1)-fluxes ϕi. The effluxes νm comprise substrate uptake as well as product and biomass formation with known standard deviation σm, and νdepend corresponds to all unknown intracellular fluxes.
To solve the above constrained NLSP, we developed a logical algorithm (Figure 1) that interactively hybridizes two gradient-based local optimization methods, that is, the sequential quadratic programming (SQP)  and the subspace trust-region method based on the interior-reflective Newton method (STRiN) . The developed method performs a series of sub-optimization trials by interactively switching between SQP and STRiN using the following features.
Figure 1. Developed hybrid optimization algorithm with tolerance adjustment consisting of the features: initialization within the feasible region (A); initial optimization using the SQP (B); interactive hybrid process using SQP (C); STRiN (D); and optimization control algorithm (E). f(Θ*k): objective function value at the current local minimizer Θ*k; χ2UL: upper limit of f(Θ*k) to invoke STRiN (if fΘ*k) <χ2UL; α: parameter scaling constant; Θ0: initial guess; Θ*: local minimizer from a successful sub-optimization; Θ°: iterate recorded for the smallest function value up to the current optimization trial; : ultimate minimizer.
Analytical gradient and Hessian
Typically, a gradient-based optimization problem is solved more accurately and with higher efficiency when analytical gradients are provided compared to numerical gradients, which are calculated by finite-difference approximation. By providing the analytical Jacobian matrix consisting of ∂xm/∂ϕ and ∂νm/∂Θ, the gradient of the objective function formulated in (A.1) of the Appendix can be calculated. The analytical Hessian (A.2) is calculated when STRiN is used. In comparison to this, SQP updates the Hessian using the Broyden-Fletcher-Goldfarb-Shanno (BFGS) formula based on linearization of the gradient .
Adjustment of tolerance and parameter scaling constant
Usually, the tolerances placed on parameters or on the objective function which are utilized as termination criteria for numerical optimization cannot clearly be defined in advance. They depend on the errors associated with user-supplied data as well as output sensitivities. Thus, tolerances are set rather empirically. Here, we suggest a method to adjust tolerance values and find a proper value by repeatedly restarting optimization trials. The tolerance adjustment by the restart is implemented in the developed algorithm as shown in Figure 1-E. Due to this, the complete process consists of a series of sub-optimization trials. At each kth trials, tolerance values are adjusted.
Further to this, the parameter scaling constant α is also adjusted during the optimization. As mentioned in the previous section, α ≥ 1 is the preferable choice due to the finite values of metabolic fluxes. In practice, α is regulated to render the model's Jacobian matrix into better condition. This is obviously advantageous for the gradient-based method, which typically involves matrix inversion to compute the search direction as shown by (6). The exact procedure how to adjust tolerance and the parameter scaling constant is described in the Appendix.
SQP represents the state-of-the-art for solving constrained nonlinear optimization problems. By introducing the Lagrangian function, SQP solves a series of quadratic programming sub-problems, each involving the minimization of a quadratic approximation of the objective function subject to a linear approximation of the constraints. It shows its strength when solving problems with significant nonlinearity and from remote starting points [14,33].
STRiN solves a nonlinear large problem by linear approximation and the method of preconditioned conjugate gradients. This algorithm is based on the trust-region algorithm, is computationally inexpensive, and provides rapid convergence in the vicinity of the solution. A downside of STRiN is that it accepts any direction of negative curvature, even when this direction gives an insignificant reduction in the objective function 
The key concept of the developed algorithm is hybridizing the merits of these two optimization algorithms, i.e., the loose starting-point-dependency of SQP and the convergence speed of STRiN in the solution vicinity. During the optimization trails with the tolerance adjustment, we get a feasible point that may be a more suitable initial guess for the next trial. Accordingly, we initiate a few trials under "relaxed" tolerance conditions using a robust algorithm that is less dependent on the quality of the initial points. Subsequently, the local minima obtained can be tested by another algorithm whether it gives a significant improvement with a rapid convergence.
In this context, the hybridization is carried out as shown in the flow chart in Figure 1. The algorithm is forced to initiate within the feasible region by generating an arbitrary starting point Θ0 subject to the inequality constraint νdepend(Θ0) ≥ 0 (Figure 1-A). This can be done by generating random numbers of design parameters within their bounds given by (8). For cases in which the random generation is expensive, e.g., when a design parameter has a very narrow range that satisfies the constraints, a deterministic method might be employed to get a starting point in the physically possible region. This is discussed in the Appendix.
As soon as a feasible set of initial values is obtained, the optimization starts using the SQP algorithm under relaxed tolerance criteria (Figure 1-B). After a few successful trials, an upper limit (χ2UL) that schedules the initiation of STRiN is placed, e.g., half of the current objective function value, χ2UL = 1/2 f(Θ*k), where Θ*k is the local minimizer isolated from the kth sub-optimization trial. Afterwards, if the objective value f(Θ*k) resulting from the current SQP trial (Figure 1-C) is smaller than χ2UL, the STRiN optimization is activated (Figure 1-D); otherwise, the SQP is repeated. During the STRiN optimization, the progress of trials is monitored to prevent the non-reducing problem associated with the STRiN algorithm mentioned above. As a criterion of the progress check, the relative improvement of |f(Θ*k) - f(Θ*k-1)|/f(Θ*k) can be measured at each STRiN termination. If the improvement is insignificant, i.e., less than a user-specified value (e.g., 0.05), the STRiN receives penalty. In case a few successive trials show insignificant improvement, the algorithm sets a new upperlimit χ2UL and returns to the SQP optimization. For the restart, the tolerance of the previous SQP trial is reutilized. This prevents too large changes in tolerance, which can be caused by the STRiN trials with insignificant improvement.
As a termination criterion of the hybrid optimization, the changes in local optima are measured (Figure 1-C), e.g., the absolute value of the slope resulting from the linear regression of y = (f(Θ*k-4), f(Θ*k-3),...f(Θ*k))T with respect to x = (1, 2,..., 5)T. If the absolute slope is less than a user-specified small value for a few further trials, the optimization can be terminated ultimately.
Application to Bacillus subtilis metabolic network
To evaluate the developed methods, a metabolic network of the wild type B.subtilis was constructed as shown in Figure 2 based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) database specified for the strain. The network is composed of catabolic reactions of the central metabolism incorporating glycolysis/gluconeogenesis, pentose phosphate pathway, TCA cycle, C3/C4 inter-conversion and anabolic reactions. All effluxes including the biomass yield YXS in Figure 2 were assumed to be measured experimentally from a batch-cultivation on succinate and glutamate as carbon sources. All flux values specified in Figure 2 are generated from arbitrary values of Θdefault and normalized by the glutamate uptake rate. Note that each anabolic flux given in Figure 2 is the product of YXS (biomass production [gDW L-1 h-1] normalized by glutamate uptake [mM h-1]) and a value that specifies the precursor requirement for growth (mmol precursor per g biomass) adopted from literature data . For bidirectional reactions, the fluxes in the gluconeogenetic direction were declared as forward.
Figure 2. Metabolic network of the central metabolism of Bacillus subtilis utilizing glutamate and succinate as co-substrates. All flux values denoted in parentheses were generated by obeying the given stoichiometry. Effluxes and biomass yield were measured experimentally. The symbol 'ν' indicates the flux, the subscript 'r' the reverse flux of the bidirectional flux pair, the subscript 'ex' extracellular pools of substrates and products and YXS the biomass yield in g(biomass)/mmol(glutamate). All flux values are normalized by the glutamate uptake rate.
The metabolic network in Figure 2 has 27 intracellular fluxes (10 bidirectional and 17 unidirectional fluxes), 5 effluxes, and 10 anabolic fluxes expressed in terms of YXS. For 15 intracellular metabolites defined in the network, 15 flux balances were set up that are linearly independent of each other (Appendix). The stoichiometric matrix S was obtained by symbolic differentiation of the balances with respect to the whole fluxes including YXS. When considering that the flux of glutamate uptake is unity due to the normalization, the network owns 17 degrees of freedom. Accordingly, 12 intracellular fluxes ν2r, ν3r, ν4r, ν5r, ν6r, ν7r, ν8r, ν14, ν14r, ν15r, ν16, ν17r, 4 effluxes, and YXS were recognized as independent variables from the reduced row echelon form SRRE of S. These independent intracellular fluxes were transformed into [0, 1)-fluxes as given by (2). Note that the counterpart of a backward flux of a bidirectional reaction can also be selected as an independent variable. This just depends on how to arrange the entries in the flux vector. For instance, when ν2r is followed by ν2 in the flux vector, ν2 will be recognized as an independent variable instead of ν2r.
Solving the equation system of the flux balances (Appendix) and the [0, 1)-flux equations for the dependent fluxes gives their explicit analytical expressions with respect to design parameters, i.e., νi = nflux(Θ) with 19 design parameters of Θ = (ϕ2r, ϕ3r, ϕ4r, ϕ5r, ϕ6r, ϕ7r, ϕ8r, ϕ14, ϕ14r, ϕ15r, ϕ16, ϕ17r, succinateex, acetateex, α-ketoglutarateex, fumarateex, YXS, pCO2,1, pCO2,2)T. The parameters pCO2,1 and pCO2,2 estimate the CO2 labeling pattern for each 13C labeling experiment: CO2 has only one carbon and is incorporated into other metabolites by carboxylation. Thus, its labeling state can simply be determined from other 13C labeling data as an additional parameter of NLSP in case it is not measured.
For flux re-estimation studies, the default values of the design parameter were Θdefault = (0, 0.6341, 0.4337, 0.3148, 0.5162, 0.3436, 0.5884, 0.7066, 0.0643, 0.6835, 0.7125, 0.6756, 1.164, 0.021, 0.006, 0.02, 0.0859, 0.5601, 0.5601) at α = 1, which results in the flux values given in Figure 2.
For the identifiability studies, the main question was whether Θ has one unique solution set that can be estimated from the available mass isotopomer measurements in terms of the gradient-based optimization. The effluxes (succinateex, acetateex, α-ketoglutarateex, fumarateex) and YXS are excluded for the identifiability analysis because they can be estimated as long as the corresponding measurements are available. To examine various flux scenarios, 200 stochastic simulations were performed for each of the 630 possible 13C-experimental designs. To this end, uniform random numbers between 0 and 0.95 were created for Θ. At each simulation the null space of J(Θ) was calculated.
According to the stochastic simulations, none of the designs had an empty null space and the entry of null(J(Θ)) corresponding to ϕ2r was always non-zero disregarding the flux state and designs. Most designs, including the design considered here, yielded an identical null space of:
Accordingly, the design parameter ϕ2r is not expected to have a unique solution, i.e., it has infinitively many solutions. Except ν2 and ν2r, other fluxes were not functions of ϕ2r. This means that the bidirectional glucose 6-P isomerase cannot be determined from any 13C substrates or dual substrate combinations. Only the net flux of the reaction can be calculated from the stoichiometry. Therefore, one can either regard the glucose 6-P isomerase as a unidirectional reaction or set ϕ2 as an arbitrary constant. This renders the carbon flux network to have an empty null space of J(Θ), i.e., full rank for J(Θ), and all other design parameters are theoretically expected to have unique solutions. This theoretical expectation is examined later using the developed algorithm.
Hybrid optimization with tolerance adjustment
To evaluate the efficiency of the developed optimization algorithm, the flux values calculated in advance were re-estimated numerically from the carbon mass isotopomer distributions (MDVs) of output metabolites matching the default flux values (νdefault). To this end, the default flux values, depicted in Figure 2, were calculated by the flux function Fflux(Θdefault) obtained by parametrization. Subsequently, the 13C labeling data of output metabolites were calculated from νdefault and Θdefault. The default fluxes were re-estimated by solving the constrained NLSP (8) whose inputs (η) were the effluxes, YXS, and the MDVs.
Prior to testing the hybrid optimization with tolerance adjustment (HATA), we examined whether the tolerance adjustment is beneficial for optimization. This was checked by performing the SQP optimization by providing the gradients for the objective function and for the flux inequality constraints (∇c = (-∂νdepend/∂Θ)T) analytically. As shown in Figure 3-A, the objective function value f(Θ*) of each optimization trial decreased with respect to tolerance adjusted at each optimization restart. It was observed that f(Θ*k) <f(Θ*k-1) always holds when starting the kth trial from the (k - 1)th local minimizer Θ*k-1. Restarting the failed (k - 1)th trial from the feasible iterate Θ° recorded for the smallest function value up to the current trial and increasing the tolerance was observed to give the same result, i.e., f(Θ*k) <f(Θ°). The efficiency of tolerance adjustment was further compared to the SQP optimization carried out at a constant tolerance of 1 × 10-20 (Figure 3-B). The SQP using the tolerance adjustment was observed to be more efficient in accuracy but more time-consuming than the case without adjusting. The SQP without adjustment reached a local minimum of around 10-7 much more rapidly but did not result in any further improvement, whereas the SQP with adjustment made slower but continuous progression. This gives an idea that the tolerance adjustment strategy might be useful to escape from possible local stationary regions and to achieve a lower minimum.
Figure 3. Decrease of the objective function at each termination of SQP sub-optimization using tolerance adjustment (A) and its comparison with SQP carried out at a constant tolerance during optimization (B).
Using the tolerance adjustment, the hybrid algorithm consisting of SQP and STRiN (HATA, Figure 1) was compared with its parent algorithms and two global optimization methods. All optimizations except the genetic algorithm (GA) were initiated from an identical starting point for the numerical flux re-estimation. The GA applied does not need an external initial value set. At each initiation of the HATA trials, α was updated by choosing an integer between 1 and 10 that yields the best-conditioned Jacobian matrix of the model as mentioned previously. As shown in Figure 4-A, HATA accomplished the re-estimation with the best efficiency regarding its accuracy and speed. It took about 300 seconds until the objective function became 10-16. The SQP optimization with analytical gradient (SQP ∇user) yielded the next satisfactory result. In comparison, the SQP optimization using the numerical gradient obtained by the finite difference (SQP ∇finite) resulted in a poorer progress and was much slower and less accurate than HATA or SQP ∇user. During the optimization using SQP ∇finite, we observed a discrepancy between the gradient obtained by the finite difference and the analytical approach (A.1). The inaccuracy of the finite difference seemed to cause the poorer result of SQP ∇finite.
Figure 4. Comparison of the hybrid optimization with tolerance adjustment (HATA) with its parent algorithms of the SQP with user-supplied analytical gradient (SQP ∇user) and the STRiN with user-supplied analytical gradient and Hessian (STRiN ∇user Huser) as well as with other algorithms, such as SQP with numerical gradient by finite differentiation (SQP ∇finite), genetic algorithm (GA), and simulated annealing (SA) (A). All algorithms were initiated from an identical starting point. The objective function value at the ith iterate is registered only if f(Θi) ≤ f(Θi-1). Time efficiency of the HATA represented by histogram plot of time taken for termination of 200 runs of optimization using different starting points (B).
The worst algorithm among the local methods was the STRiN using analytical gradient and Hessian (STRiN ∇user Huser). It was rapid at the beginning but improved the objective only from 107 to 105. Afterwards, the optimization crashed with the carbon labeling system becoming completely singular. As previously mentioned, this seems to be due to the STRiN limitation of accepting any direction of negative curvature even if it does not give significant reduction in the objective function. Also, the global optimization methods tested, which were the GA and the SA, were notably time-inefficient compared to the local methods. For 1500 seconds of optimization time, the objective function was very slowly decreased by two orders of magnitude only. SA decreased f(Θ) only to 1 × 103 after 5.1 hours and GA to 2 × 102 after 6.7 hours. Thus, global optimization is expected to demand much greater time to get the equivalent result as HATA.
HATA optimization from different starting points
The flux re-estimation was also performed from 200 different starting points created randomly and by simultaneously generating uniform random numbers for the [0, 1)-flux ϕ2r = ν2r/(α + ν2r). This was to examine whether the quality and speed of convergence has any starting-point-dependency and whether ϕ2r, which a priori expected to have infinite solutions, has any influence on optimization results.
All 200 optimization runs using the hybrid algorithm were terminated successfully. On average, HATA required 5.3 ± 1.6 min to converge to a weighted objective function value smaller than 1 × 10-10, which equals the non-weighted sum of squares less than 2 × 10-18. As shown in Figure 4-A, 99.5 % of the flux re-estimations were completed within 10 min and about 76 % in less than 6 min.
Further to this, the flux re-estimates resulting from the 200 random simulations were compared with the true flux solutions calculated from Θdefault in advance. As shown in Figure 5-A, all fluxes except the flux pairs of transketolase 1 (TK1; ν3, ν3r) and transaldolase (TA; ν4, ν4r) could be re-estimated correctly. When plotting the flux re-estimates versus the true flux values given in Figure 5, the data points except ν3, ν3r, ν4, and ν4r lie exactly on a line with a slope of 1 (1:1 line). In comparison to this, only the net fluxes of the TK1 (ν3 - ν3r) and TA (ν4 - ν4r) reactions could be recalculated properly: the net fluxes could be calculated from stoichiometric relations with other flux re-estimates. The same was observed for the net flux of glucose 6-P isomerase, of which [0, 1)-flux ϕ2r had an arbitrary value at each simulation run.
Figure 5. Flux re-estimation initiated from 200 random starting points. The fluxes which were successfully re-estimated lie on the 1:1 line when plotted against true flux values (A). Correlation analysis by plotting the transaldolase (TA) [0, 1)-flux re-estimates (ϕ4r) versus those of transketolase 1 (TK1) (ϕ3r) (B) and the [0, 1)-flux re-estimate of TK1 (C) as well as of TA (D) versus ϕ2r of glucose 6-P isomerase. The dotted line indicates the solution of ϕ3r = 0.4337 and ϕ3r = 0.6341, respectively.
By the null space investigation (9) based on (7), the design parameters ϕ3r and ϕ4r were not expected to be correlated. In accordance with this, the fluxes could rarely be re-fitted in case one or both of these parameters were set arbitrarily. This indicates that the parameters ϕ3r and ϕ4r have a certain significant correlation that must be fulfilled at termination to yield unique solutions for other fluxes. As expected, the flux estimates of TK1 and those of TA were found to be nonlinearly correlated with each other as depicted in Figure 5-B for ν3r and ν4r. The true solution, (ϕ3r, ϕ4r) = (0.6341, 0.4337) also lies on the curve. At the termination, it is obvious that the output sensitivities of ∂xm/∂ϕ3r and ∂xm/∂ϕ4r are low: the objective function reached a very small value of nearly zero (f( ) < 1 × 10-10) and we get arbitrary but correlated values for ϕ3r and ϕ4r. Due to this, we cannot practically get ϕ3r and ϕ4r estimated correctly and, accordingly, ν3, ν3r, ν4, and ν4r, while the corresponding net fluxes can always be calculated from other correct flux estimates in terms of stoichiometry.
We observed that a unique estimation of all four fluxes would be possible only if the mass isotopomers of sedoheptulose 7-phosphate were additionally measured. It was observed that providing these mass isotopomers renders the output sensitivities of ∂xm/∂ϕ3r and ∂xm/∂ϕ4r much less dependent on starting points. In contrast, the output sensitivities vary strongly and even become zero when the mass isotopomers of sedoheptulose 7-phosphate are not involved in the objective.
Further to this, the fluxes were not found to be influenced by the glucose 6-phosphate isomerase fluxes represented by ϕ2r. Disregarding the random values of ϕ2r, the fluxes except ν3, ν3r, ν4, and ν4r converged to their true values. Also ϕ3r and ϕ 4r did not show any correlation with ϕ2r as depicted in Figure 5-C and Figure 5-D, respectively. Hence, we can conclude that the observed nonlinear correlation between ϕ3r and ϕ4r becomes visible solely from using the different starting points for the optimization. In addition, including ϕ2r as an additional design parameter gives the same result for the fluxes, whereas ϕ2r results in randomly distributed values without any obvious pattern. This shows that the flux identification carried out by the model linearization and null space investigation is appropriate for its practical use.
The tradeoff between robustness and convergence speed is a central issue in numerical optimization . The developed method fulfills both criteria and enables exact metabolic flux estimation. This was accomplished on the basis of parametrizing the metabolic flux network by compactification and hybridizing the merits of two different gradient-based algorithms. Interestingly, HATA, which hybridizes the worst local method with SQP ∇user and utilizes the tolerance adjustment strategy, has proven to be superior to other approaches. This may be by virtue of providing STRiN with a more appropriate starting point advanced by SQP. Thus, because STRiN is computationally inexpensive and rapid in the vicinity of solution, it achieves a fast convergence. Furthermore, the interactive switching between the algorithms prevents STRiN from accepting an unfavorable search direction without significant objective improvement.
Further to this, we observed that the parametrization by the compactification of independent [0, ∞)-fluxes into [0, 1)-variables is advantageous compared to the non-compactified case. When the optimization was performed by selecting the [0, ∞)-fluxes (ν2r, ν3r, ν4r, ν5r, ν6r, ν7r, ν8r, ν14, ν14r, ν15r, ν16, ν17r with upper bounds of 200) directly as design parameters, the convergence took drastically longer (27 ± 45 min) than the cases using the compactified fluxes. Moreover, optimization runs often failed to converge or terminated at suboptimal points. In comparison to this, introducing [0, 1)-fluxes improved both the robustness and speed of convergence as demonstrated above. The same advantage is probably achieved when using the [0, 1)-compactifications such as exchange fluxes or flux partitioning ratios applied elsewhere [9,24,35]. Our compactification approach allows a straight parametrization because independent fluxes can easily be recognized from the stoichiometric matrix and compactified using a single rule of (2). Moreover, it is straightforwardly differentiable when considering that compactified [0, 1)-fluxes are continuous and smooth in the [0, ∞)-flux space and vice versa.
The parameter scaling constant α introduced during the compactification provides a way to implement numerical flux estimation with efficiency. In particular, the parameter scaling constant α is updated during optimization to render the model's Jacobian matrix into better condition. This is advantageous for the gradient-based methods for computing the search direction because it typically involves the inverse of the Jacobian matrix as given by (6). We examined whether different α values affect the efficiency of optimization. The α value was observed to affect the convergence speed but not the accuracy of the optimization. When starting from the same initial guess applied to the optimization trial in Figure 4, the fastest convergence speed was obtained when α is fixed at 10 (about 200 s for f(Θ) < 10-14). The speed efficiency decreased if α is smaller or larger than 10. A trial with α adjustment initiated at α = 1 automatically updated α to 10 after a few sub-optimization trials and took about 300 s to achieve f(Θ) < 10-14. Although it was slower than when α was fixed at 10, the proposed strategy with α adjustment recognized the optimal value. Among the 200 HATA optimization runs with α adjusting between 1 and 10 initialized with α = 1, 112 cases updated α to 10 after a few to several sub-optimization trials while α remained 1 in 78 cases. Moreover, the time taken for the optimization runs using different starting points was not necessarily shorter when α was updated to 10. This indicates that the most adequate α value for the optimization may depend on starting conditions.
We could find unique flux solutions except the fluxes recognized a priori as non-identifiable or those that were a posteriori found to have a significant nonlinear correlation. Such nonlinear parameter correlation can be identified only by the tedious process of executing flux estimation using different sets of starting values a posteriori . The non-correlated [0, 1)-fluxes of which true values could be re-estimated, e.g., ϕ5r and ϕ6r, give a unique minimum of f(Θ), which is obviously achieved disregarding starting points (Figure 6-A). In comparison to this, the nonlinearly correlated [0, 1)-fluxes ϕ3r and ϕ4r result in an infinite number of minima that meet the correlation shown in Figure 5-B depending on starting points (Figure 6-B). Because the objective function value is nearly zero at each minimum (f( ) < 1 × 10-10 with a weighting factor of 108), the system can be considered to have an infinite number of global minima when parameters are nonlinearly correlated. In this case, a global optimization method can be extremely expensive for identifying such unknown parameter correlations due to its time-inefficiency. As a consequence, a fast and reliable numerical method of flux estimation is highly desirable to repeat optimization runs at different starting points.
Figure 6. Behaviors of the objective function in the parameter space of non-correlated ϕ5r and ϕ6r (A) as well as correlated ϕ3r and ϕ4r (B). Two parameters were varied from 0.05 to 0.95 with a step of 0.005 while other parameters were fixed at the default solution Θdefault.
One interest of numerical optimization is whether the global solution can be obtained by the algorithm applied. We observed that the local minimizers from 200 random simulations agreed with each other and also with the true flux values, the global solution. Depending on experimental noise or metabolic state, the problem may contain a few to several local saddle points. In this case, one can consider a further hybridization with a global optimization approach. Global optimization algorithms are inefficient in terms of convergence speed. On the other hand, gradient-based algorithms can converge rapidly but lack a global perspective for non-convex problems. Also in other cases, the combination of global and local search procedures has proven to offer the advantages of both methods while offsetting their disadvantages [17,37-39]. Such hybridization with a global optimization algorithm is useful for non-convex problems containing several local minima and can be done in many ways [17,39,40]. For instance, the local minima obtained from a gradient-based method can be used to update the individuals in the population to prepare the next generation when combining with the GA [17,40,39,41]. Because we could prove the speed and accuracy of the developed optimization method, we consider it very useful for investigating metabolic fluxes and for developing efficient algorithms combined with a global optimization method.
In practice, the quality of the flux fitting can be judged by the χ2-test [42,43] when using noisy measurement data. In our case, the upper critical χ2 limit is 206 at 95 % confidence level at the given degree of freedom of 174 (188 13C labeling data + 5 effluxes - 19 design parameters). Hence, if the weighted objective value (1) is smaller than the critical value of 103 (this is due to the factor 1/2 in the objective function), the fit can be regarded as acceptable. This value can also be used as a termination criterion for the optimization control in Figure 1-E. As shown in Figure 4-A, the value was achieved at 150 s by HATA, at 170 s by SQP with ∇user, and at 750 s of optimization time by SQP with ∇finite. In comparison to this, GA or SA was expected to achieve the acceptable f(Θ) after 7-8 hours or more (by extrapolation of log10 f(Θ)with respect to optimization time). Hence, an efficient Monte Carlo simulation cannot be carried out by the global optimization approaches.
Further, we compared HATA with SQP with ∇user by conducting 100 stochastic simulations under identical conditions. At each simulation run, normal random numbers were generated for the experimental data set (13C mass isotopomer fractions and extracellular fluxes) by assuming a unit standard deviation of 1 × 10-3 and uniform random numbers for the starting points of design parameters. Subsequently, parameter re-estimation was performed in terms of HATA and SQP with ∇user, respectively. Each optimization run was stopped if f(Θ) becomes smaller than the critical χ2 limit of 103 or does not give any further improvement. On average, f(Θ) at termination was 102 ± 26 for HATA and 167 ± 567 for SQP with ∇user, and the time taken for the optimization was 353 ± 216 s for HATA and 1234 ± 1356 s for SQP with ∇ user. The results support that the developed hybrid algorithm using tolerance adjustment will be more efficient and robust when computing metabolic fluxes from noisy experimental data in a real-case application. In contrast, the efficiency of the SQP algorithm seems to be affected by initial conditions (starting points and experimental data set) when considering the large variation in f(Θ) as well as in the computation time. In this regard, HATA will also achieve the most efficient optimization in the practical sense whenever repeated optimization runs are necessary. A further reduction in computation time can be obtained by utilizing a minimal set of isotopomer balances described by Antoniewicz et al. (2007) .
We developed and examined a developed hybrid algorithm for numerical 13C flux estimation using a metabolic network model parametrized using a single compactification rule. We could prove its practical usefulness using the central metabolic network of Bacillus subtilis, which is a prokaryotic model organism and an industrially relevant strain. Based on the parametrization by compactification and model identification, the hybrid algorithm with tolerance adjustment allowed a fast and robust flux computation from the 13C labeling data created from 13C-labeled succinate and glutamate feeding. Such fast optimization was also found to be essential for the a posteriori identification of possible parameter correlations and also for the Monte Carlo approach to obtain statistical qualities for metabolic flux estimates. Therefore, this work represents an important contribution to the quantitative study of metabolic networks in the framework of systems biology.
For the flux re-estimation simulations of the central metabolic network of B. subtilis, the following inputs and outputs were applied.
13C Input Substrates
The choice of input tracer substrates was made by means of the D-optimality criterion . Since parallel experiments using different 13C input labels yield more information as theoretically shown for the respirometric flux analysis utilizing CO2 labeling measurement . Therefore, experimental designs for two parallel cultivations were also considered here. Among the investigated experimental designs (630 possible designs from commercially available 13C glutamate and 13C succinate species; refer to Additional file 1!), a design combining data from a 13C labeling experiment using non-labeled glutamate and [2,3-13C2] succinate with those from an experiment using [1,2-13C2] glutamate and [1,4-13C2] succinate was expected to provide the richest information at a reasonable expense. This design was selected to study flux identifiability and to examine the developed hybrid optimization.
13C Output Metabolites
For simulation studies, we assumed that the mass isotopomers of proteinogenic amino acids such as alanine, valine, serine, threonine, glycine, aspartate, isoleucine, leucine, phenylalanine, tyrosine, arginine, histidine, and glutamate are measurable. In addition, the mass isotopomers of ribose 5-phosphate from RNA hydrolysate and hexose from carbohydrate hydrolysate are also assumed to be measurable. This totally gives a mass isotopomer data size of 188 for the parallel experiment. The measurements of mass isotopomer distributions were assumed to have a unit standard deviation of 1 × 10-4.
All simulations were carried out on a PC equipped with a CPU of 2.4 GHz. The hybrid optimization algorithm depicted in Figure 1 was implemented in MATLAB (The Mathworks Inc., Natick, MA). Numerical optimization was carried out using the Optimization Toolbox (Version 3.0.4) and the Genetic Algorithm and Direct Search Toolbox (Version 2.1) of MATLAB. The symbolic operations required for parametrization and for computing partial derivatives were conducted using the Symbolic Math Toolbox (Version 3.1.4) of MATLAB. During simulations, random numbers were generated using the Statistics Toolbox (Version 5.2) of MATLAB. The random numbers for [0, 1)-fluxes (ϕi) were bounded such that the relative values of the corresponding dependent fluxes (νdepend) lie between 0 and 20.
An example of the hybrid optimization for the problem described in the current work was coded using MATLAB (Hybridoptimizer.m) and supplemented (see Additional file 2). The method can be incorporated with any isotopomeric models [2,3,5,8] that are analytically differentiable.
Format: EXE Size: 183KB Download file
List of abbreviations used
BFGS: Broyden-Fletcher-Goldfarb-Shanno; GA: genetic algorithm; HATA: hybrid algorithm with tolerance adjustment; KEGG: Kyoto Encyclopedia of Genes and Genomes; MDV: mass isotopomer distribution vector; MFA; metabolic flux analysis; NLSP: nonlinear least-squares problem; SA: simulated annealing; SQP: sequential quadratic programming SQP∇user: SQP optimization with analytical gradient; SQP∇finite: SQP optimization using the numerical gradient obtained by finite difference; STRiN: subspace trust-region method based on the interior-reflective Newton method; STRiN∇userHuser: STRiN using analytical gradient and Hessian; TA: transaldolase; TK: transketolase
TH Yang conceived the study, developed the concepts, implemented simulation, and drafted the manuscript. O. Frick prepared the data required for modeling of the stoichiometric and isotopomer network of B. subtilis. E. Heinzle supervised the work and was involved in writing the manuscript. All authors read and approved the final manuscript.
Gradient and Hessian
At each kth iteration of (1), the model function is evaluated at the kth iterate Θk to get the values corresponding to η = (νm, xm)T. This is done by generating a certain flux state by νk = Fflux(Θk) and, subsequently, solving Fcarbon(xk, νk) = 0 for xk at the kth flux state νk. Since is differentiable with respect to Θk, the gradient ∇f can be computed analytically by:
In practice, the constrained problem (1) is formulated as the Lagrangian function, a linear combination of the objective function and the constraints .
Adjustment of tolerance and parameter scaling constant
The tolerance adjustment by the restart in Figure 1-E requires a series of sub-optimization trials. The first optimization trial starts with "sufficiently relaxed" tolerances both placed on parameters and objective function, e.g., of 100. When the kth trial is acceptable (e.g., first order optimality conditions are satisfied to the specified tolerance or changes in parameters or function are smaller than the specified tolerance), the (k + 1)th trial restarts with a 10-fold decreased tolerance, where the kth local minimizer Θ*k is used for the new starting point. The previous trial provides a more adequate starting point for the next, and it can be advantageous for local search methods. This is because the efficiency of local methods depends heavily on the quality of starting points [14,19]. In case the current trial fails (e.g., by exceeding the maximum number of iterations allowed in the estimation process), the tolerance is increased and the next trial starts from the iterate Θ° recorded for the smallest function value up to the current trial in the feasible region (Figure 1-E). The tolerance at which an optimization trial fails is registered. In case optimization trials successively fail, which is rarely the case, the step of tolerance increase is set smaller such as 5-, 10-, 50-, or 100-fold, etc., of the registered value. Correspondingly, the tolerance decrease is also made by this reduced step until the current tolerance matches the registered. This allows finding an appropriate tolerance value for NLSP.
At the beginning and at every failed trial, α is re-adjusted by referencing the condition number of the model's Jacobian matrix. When the new starting point for the kth trial is Θ° that is isolated from the previous (k - n)th trial with αk-n, the [0, 1)-fluxes have to be rescaled in accordance with the new scaling constant αk. Since ν = αk-nϕk-n/(1 - ϕk-n) from (2), substituting ν in ϕk = ν/(αk + ν) results in:
Deterministic Method of Feasible Starting Point Generation
When the random generation of starting points (see Figure 1-A!) is difficult, the feasible starting points can be found deterministically by minimizing a suitable objective. For instance, one can minimize the following objective:
Here, Nneg.flux signifies the number of negative-valued fluxes, bflux the steady-state stoichiometric flux balances set up for metabolites (see the next section!) evaluated at the current iterate Θk, and n the number of the balances. By the first term in the objective and the first constraint, the minimization is directed such that all fluxes become non-negative. The second term is to prevent flux values calculated that may not obey stoichiometry due to the limited floating-point accuracy (round-off error). This approach is useful for preparing initial values while preserving random nature for Monte Carlo simulations. In practice, the upper bound of ϕi can be set to a value smaller than 1 that gives sufficiently large flux values so that ϕi has a closed interval.
Stationary State Stoichiometric Balances are set up around 15 metabolites specified in Figure 2 as follows. The values multiplied with YXS equals the strain specific precursor demand for growth (mmol precursor per gram biomass).
glucose 6-P: ν2 - (ν1 + ν2r + 0.4217YXS) = 0
fructose 6-P: (ν2r + ν4 + ν5 + ν6) - (ν2 + ν4r + ν5r + ν6r) = 0
glyceraldehyde 3-P: (ν3 + ν4r + ν5 + 2ν6r + ν7) - (ν3r + ν4 + ν5r + 2ν6 + ν7r +0.4659YXS) = 0
3-phosphoglycerate: (ν7r + ν8) - (ν7 + ν8r + 0.2209YXS) = 0
phosphoenolpyruvate: (ν16 + ν8r) - (ν8 + ν9 + 1.7854YXS) = 0
pyruvate: (ν9 + ν15) - (ν10 + ν15r + 0.9964YXS) = 0
pentose 5-P: (ν1 + 2ν3r + ν5r) - (2ν3 + ν5 + 1.1622YXS) = 0
erythrose 4-P: (ν4 + ν5r) - (ν4r + ν5 + 0.4659YXS) = 0
sedoheptulose 7-P: (ν3 + ν4r) - (ν3r + ν4) = 0
acetyl-CoA: ν10 - (ν11 + acetateex + 3.1585YXS) = 0
isocitrate: ν11 - ν12 = 0
α-ketoglutarate: (ν12 + ν17) - (ν13 + ν17r + α-ketoglutarateex) = 0
glutamate: (ν17r + glutamateex) - (ν17 + 1.4769YXS) = 0
succinate: (ν13 + ν14r + succinateex) - (ν14 + fumarateex) = 0
oxaloacetate: (ν14 + ν15r) - (ν11 + ν14r + ν15 + ν16 + 1.3131YXS) = 0
O. Frick thanks DFG (Deutsche Forschungsgemeinschaft), project He 3092/6 for the support of his Ph.D. We wish to thank Andrew Marsh at the University of Louisville's James Graham Brown Cancer Center for editing.
J Biotechnol 1999, 71(1-3):175-189. PubMed Abstract
Wiechert W: Algebraic methods for the analysis of redundancy and identifiability in metabolic 13C labelling systems. In Bioinformatics: From nucleic acids and proteins to cell metabolism. Edited by Lessel U. Weinheim , Verlag Chemie; 1995:169-184.
Ground water 2003, 41(4):420-430. PubMed Abstract
Antoniewicz MR, Kraynie DF, Laffend LA, Gonzalez-Lergier J, Kelleher JK, Stephanopoulos G: Metabolic flux analysis in a nonstationary system: fed-batch fermentation of a high yielding strain of E. coli producing 1,3-propanediol.