Abstract
Background
The ability to perform quantitative studies using isotope tracers and metabolic flux analysis (MFA) is critical for detecting pathway bottlenecks and elucidating network regulation in biological systems, especially those that have been engineered to alter their native metabolic capacities. Mathematically, MFA models are traditionally formulated using separate state variables for reaction fluxes and isotopomer abundances. Analysis of isotope labeling experiments using this set of variables results in a nonconvex optimization problem that suffers from both implementation complexity and convergence problems.
Results
This article addresses the mathematical and computational formulation of ^{13}C MFA models using a new set of variables referred to as fluxomers. These composite variables combine both fluxes and isotopomer abundances, which results in a simplyposed formulation and an improved error model that is insensitive to isotopomer measurement normalization. A powerful fluxomer iterative algorithm (FIA) is developed and applied to solve the MFA optimization problem. For moderatesized networks, the algorithm is shown to outperform the commonly used 13CFLUX cumomerbased algorithm and the more recently introduced OpenFLUX software that relies upon an elementary metabolite unit (EMU) network decomposition, both in terms of convergence time and output variability.
Conclusions
Substantial improvements in convergence time and statistical quality of results can be achieved by applying fluxomer variables and the FIA algorithm to compute bestfit solutions to MFA models. We expect that the fluxomer formulation will provide a more suitable basis for future algorithms that analyze very large scale networks and design optimal isotope labeling experiments.
Background
Metabolic Pathway Analysis
Metabolism is the complete set of chemical reactions taking place in living cells. These chemical processes form the basis of all life, allowing cells to grow, reproduce, maintain their structure and respond to environmental changes. Metabolic reactions are divided into groups called metabolic pathways, which are typically constructed heuristically according to their connectivity and presumed function [1]. Each metabolic pathway is characterized by a set of chemical reactions that transform substrates into end products while generating intermediate byproducts. Due to its importance in medicine and biotechnology, metabolic pathway research has become a highly active field of investigation [2].
Initially, the structure of metabolic pathways was examined by identifying their intermediate compounds. Subsequently, the various biochemical reactions connecting these compounds were mapped. Due to the success of this research, the topological structure of many metabolic pathways is nowadays fully documented [3]. The next step in the progression of metabolic pathway research involves quantification of the rates of these various chemical reactions, also known as "fluxes". The values of these rates are affected by various environmental conditions and can change rapidly in response to perturbations. Nevertheless, if the environmental parameters are held fixed and stable, the network can attain a steady state in which the concentrations of all network metabolites are assumed constant over time. This, of course, implies that the rates of their input and output reactions must balance. The latter imposes a set of linear constraints on the metabolic fluxes, known as "stoichiometric balance equations" [4]. Unfortunately, since the number of unknown fluxes typically exceeds the number of independent stoichiometric balances, these constraints are insufficient to completely identify the metabolic network. In order to overcome this lack of information, additional constraints must be provided to the stoichiometric mathematical model to estimate the values of the network fluxes [5].
^{13}C Isotope Labeling Experiments
Various experimental techniques have been developed to enable measurement of intracellular metabolic fluxes, either directly or indirectly. One of these approaches makes use of isotope labeling experiments. In this method, the metabolic system is administered a known amount of an isotopically labeled substrate (such as glucose labeled with ^{13}C at specific atom positions). By measuring the resulting labeling patterns of intracellular metabolites after steady state has been achieved, additional flux information is obtained.
One major drawback of this experimental approach is the high complexity and computational intensity of the metabolic flux analysis (MFA) required to interpret these labeling measurements. In their series of articles, Wiechert et al. [69] constructed a systematic approach for performing this analysis. They show that measurements of the relative abundance of various isotope isomers, also known as "isotopomers", contain enough information to fully identify the metabolic fluxes of the network. Formulating the problem using isotopomer variables (or a transformed set of isotopomer variables referred to as "cumomers"), Wiechert et al. posed the flux estimation problem as a nonconvex leastsquares minimization, assuming random error is added to their isotopomer measurements. The resulting highdimensional nonconvex problem suffers from various drawbacks, such as slow convergence and notable probability of attaining local minima. Several optimization algorithms have been developed in order to address these drawbacks. Early approaches used iterative parameterfitting algorithms [8], evolutionary algorithms [10] and simulated annealing [11]. Furthermore, several investigations have been conducted in order to assess the accuracy of these results [9,12,13]. Recently, a novel method to decompose the metabolic network into Elementary Metabolite Units (EMUs) was introduced [14] and implemented into the OpenFLUX software package [15]. This decomposition effectively reduces the size of the optimization problem by efficiently simulating only those isotopomers that contribute to the measurement residuals. Nevertheless, all of these algorithms suffer from several major drawbacks due to the standard isotopomerflux variables used in formulating the optimization problem:
• Presence of unstable local minima: due to the nonconvex nature of the objective function.
• Complex mathematical representation and computational implementation. This results in the need for adhoc algorithms and mathematical analysis, and long running times are required for reliable convergence.
The OpenFLUX implementation, for example, may require several dozens of convergence iterations with various initial values in order to achieve acceptable probability of obtaining the optimal set of fluxes in any one of its attempts. In addition, due to the chosen objective function, it is also commonly required to estimate scaling factors for each isotopomer measurement, because of the fact that the available experimental techniques are only capable of measuring isotopomer fractions up to a proportional scaling factor (see Mollney et al. [9] for further details).
Our Contribution
This article introduces a new set of variables for simulating ^{13}C isotope labeling experiments. The main idea underlying this reformulation is that, instead of treating fluxes and isotopomer variables separately, we identify a set of "isotopically labeled fluxes" as our state variables of interest. We refer to these variables as fluxomers. Fluxomers combine flux variables with isotopomer variables and consequently reduce the complexity and nonlinearity of the original isotopomer balance equations. In this article, we show that by reformulating the flux estimation problem in terms of fluxomer variables, it is possible to construct an algorithm that has the following key benefits:
• Provides efficient computation of all isotopomers in a metabolic pathway
• Is robust to measurement noise (i.e., suppresses the effects of measurement errors) and initial conditions
• Eliminates the need for measurement scaling factor estimation
• Poses the problem using simple mathematical expressions, allowing the use of generic optimization algorithms
The rest of the article is constructed as follows. The Results and Discussion section illustrates the advantage of our approach via simulation results comparing fluxomer variables to the commonly used cumomer approach and the more recently introduced EMU approach. The Methods section presents the detailed formulation of the fluxomers optimization problem and the fluxomers iterative algorithm (FIA) that provides a reliable and efficient method for solving it. All source code and executables for our algorithms are freely available at the author's website [16].
Results and Discussion
We compared our FIA algorithm to the widely used MFA software 13CFLUX [17], which relies on the cumomer approach, and to the more recent OpenFLUX [15] software, which is based on the EMU [14] approach. In order to compare the methods, we conducted flux estimations for various wellstudied metabolic pathways. Our first example is based upon the tutorial which Wiechert et al. provide with their 13CFLUX software: the EmbdenMeyerhof and Pentose Phosphate metabolic pathways of Escherichia coli [17]. This example compares the running time and robustness of both algorithms in response to input noise. Our second example compares the results and performance of FIA to both an adhoc method and the OpenFLUX algorithm for the analysis of lysine production by C. glutamicum, as described by Becker et al. [18] and Quek et al. [15].
FIA vs. 13CFLUX Comparison: EmbdenMeyerhof and Pentose Phosphate Pathways
In this section we examine a network representing the EmbdenMeyerhof and Pentose Phosphate pathways of E. coli, which is based upon the tutorial supplied by Wiechert et al. as part of their 13CFLUX software package. Since our FIA implementation natively supports 13CFLUX input files (i.e. "FTBL" files), the same input files can be used for both algorithms. (Note, however, that FIA does not require definition of free fluxes nor initial values, and thus these are simply ignored when imported). Figure 1 shows the simple network used along with the nomenclature used in previous publications. In addition to the network structure, the models are provided with flux and isotopic measurements as shown in Table 1.
Figure 1. E.Coli EMP and PPP Metabolic Pathways. The EmbdenMeyerhof and Pentose Phosphate metabolic pathways of Escherichia coli.
Table 1. EMP & PPP simulation data
First, we examined the output of the two algorithms for the traditional "noiseless" input file. In order to run the analysis, 13CFLUX requires the user to define a set of "free fluxes" along with their associated initial values [7]. Note that a bad choice of free fluxes or their associated values can result in poor algorithmic performance (both in computation time and accuracy). In fact, under various initial guesses the algorithm did not converge at all. As for FIA, none of the above is required. Since the network along with the given measurements are well defined, in the noiseless case the two algorithms returned similar values for unidirectional fluxes, as can be seen in Table 2. Some slight disagreements were observed for the bidirectional fluxes, which are more poorly identified.
Table 2. Comparison of FIA with 13CFLUX for the simple E.coli metabolic network
We next compared the algorithms' sensitivities to noise. In a series of 10 experiments, white Gaussian noise was added to all of the measured isotopomer values, and the outputs and computation times for both algorithms were recorded. As can be seen in Figure 2, unidirectional fluxes remain quite constant and hardly suffer from the added experimental error (for both algorithms). However, the bidirectional fluxes are affected by the added noise. 13CFLUX suffers from a higher variance spread of the estimated values than FIA (thus is more sensitive to the added measurement noise). Note that the difference arises not only due to the mathematical model used, but also due to the stability properties of the optimization method chosen.
Figure 2. Measured fluxes values. Bidirectional fluxes calculated using FIA and 13CFLUX for noisy measurement set.
We next examined the computational performance of the two methods. Table 3 shows the algorithm running time for convergence (in seconds). The average running time for 13CFLUX was 133 seconds, while for FIA this time was 7 seconds. The running time ratio (13CFLUX/FIA) for individual experiments varied between ×9 to ×75.
Table 3. Algorithm running time comparison for FIA vs. 13CFLUX
FIA vs. OpenFLUX Comparison: Lysine Production by C. glutamicum
In this section we examine the analysis of the central metabolism of two lysineoverproducing strains of Corynebacterium glutamicum: ATCC 13032 (lysC^{fbr}) and its P_{EFTU}fbp mutant. Both express feedbackresistant isoforms of the aspartokinase enzyme lysC, while the latter is additionally engineered to overexpress the glycolytic enzyme fructose1,6bisphosphatase. The example is based upon the measurements provided by Becker et al. [18], who implemented an adhoc program to estimate the values of various metabolic fluxes. In their more recent article introducing the OpenFLUX software package [15], Quek et al. chose to compare their results to those of Becker et al. Therefore, we will expand upon their comparison using our FIA implementation. The input file for FIA was constructed using the measurements and pathway structure given in [18] and [15]. As described in [15], the published mass isotopomer fractions were modified for mass interference from noncarbon backbone isotopes using the molecular formula of the amino acid fragments. FIA supports automatic generation of the naturally occurring isotopes correction matrix when the measured molecular formulas are supplied. This adjusts the measured fluxomers vector appearing in the objective function during the process of optimization. If necessary, it is possible not to use this feature but instead to directly supply the algorithm with the corrected measurement values.
When comparing the running times of FIA with OpenFLUX, the different algorithmic approaches of the two must be kept in mind. While OpenFLUX requires the user to supply it with sets of free fluxes, FIA requires no free fluxes nor initial values. OpenFLUX rapidly evaluates dozens of different optimization cycles with random initial values and seeks the best fitting result among them, while FIA uses only one single (longer) run. As such, the convergence probability of OpenFLUX depends on the number of attempts and random values generated during its operation, while the FIA results do not depend on any random value. Furthermore, in its analysis, EMU based algorithms evaluate only the fluxes necessary for measurement comparison, and thus their running time depends both on the metabolic network structure and the amount and location of the given measurements. FIA, on the other hand, can supply the entire set of metabolic fluxes at any given time, with no additional computation requirement (which depends mainly on the network structure).
Measured fluxes as constants
First, we ran the exact same simulation as Quek et al. performed in their article. They supply very accurate (mean error in the order of 0.15 mol%) values for the label measurements, and used the given measured fluxes as if they were noiseless measurements (thus as constants). We start by comparing the simulation time for this simple case. According to [15] and as validated by us using our computer, OpenFLUX required 50 iterations of about 16 seconds each in order to find a decent minimal point, hence about 800 seconds in total. While so, the FIA analysis took 60 seconds for initial analysis and matrices creation, and 300 further seconds for convergence, thus 360 seconds as a whole. Regarding the simulation results, as one can see in Table 4 and Table 5 the fluxes are very close to those calculated before, and the estimated fluxes FIA returned had the lowest residual value compared to the other methods.
Table 4. Relative mass isotopomer fractions comparison for wildtype and mutant C. glutamicum
Table 5. Metabolic fluxes comparison for wildtype and mutant C. glutamicum
Measured fluxes as measurements
We can also run the same optimization, but weight the given flux measurements by their variances. When running this optimization using OpenFLUX (again using 50 iterations), the amount of time was greatly increased, and ended in around 48 minutes. For FIA, on the other hand, the running time was the same as before, thus about 6 minutes. Comparing the results of the algorithms, OpenFLUX suffered from severe convergence problems. Most of its iterations ended without converging at all, while those that did converge yielded useless results, far from the measurements. FIA, on the other hand, succeeded in converging for all scenarios. For the wildtype lysine producing pathway, the results were very close to the ones before (since the fluxes and measurements were quite accurate). For the mutant example, which was less accurate, a reduction of the residual value was achieved by small changes to the measured fluxes. fluxes and residual values can be examined in Table 4 and Table 5.
Using nonnormalized MS measurements
We now show that FIA can easily use incomplete or nonnormalzied measurements by examining its performance in the example above. The supplied MS measurements were normalized to the n +1 backbone carbon atoms of the measured metabolites. Instead of using the supplied normalized data, we multiply each set of metabolite measurements by a random constant number. By doing so, we simulate the case in which only the first 3 (2 for GLY) MS peaks were measured, and had not been normalized. The original and supplied nonnormalized measurement values can be found in Table 4. Note that the values were corrected by the molecular formulas of the measured fragments (again, can be automatically performed by FIA). In the absence of normalized data, FIA gave estimated fluxes very close to the previous cases, with very low residual values, as can be seen in Table 5. The running time of the algorithm was not affected by the change.
Conclusions
The main contribution of this article is the introduction of fluxomersa new set of state variables used to simulate ^{13}C metabolic tracer experiments. The fluxomers approach allows the central optimization problem of MFA to be reformulated as a sequence of quadratic programs, which form the basis of the fluxomers iterative algorithm (FIA). Both fluxomers and FIA result in several important benefits compared to fluxisotopomer variables. Among these advantages are (i) a reduction in algorithm running time required for simulation of isotopomer distributions and metabolic flux estimation, (ii) reduced sensitivity to measurement noise and initial flux values and (iii) availability of complete isotopomer information for a given network (as opposed to the EMU approach, which only supplies partial information) without the need for user specification of free fluxes or initial flux values. Additionally, the error model used by the FIA algorithm has the advantage that it depends solely upon isotopomer ratios rather than complete isotopomer fractions, and therefore it eliminates the need to estimate a normalization factor for each measured isotopomer distribution. Our current results show significant improvements even with regards to simplistic tracer experiments (the running times have been improved by an order of ×3 to ×20 compared to the 13CFLUX algorithm, and about ×2 to ×8 compared to the OpenFLUX implementation). It is important to note that the total time required to obtain an MFA solution is controlled both by (i) the time of each iteration and (ii) the number of optimization iterations that are required to achieve a reliable solution. While a single OpenFLUX iteration is certainly faster than a single iteration of FIA, the FIA algorithm was expressly constructed to provide high reliability in achieving the optimal solution. Therefore, FIA was able to consistently find a better optimal solution in less total time in comparison to the other algorithms examined. Furthermore, extending the fluxomers formulation to other global optimization techniques is straightforward. We expect that reformulating more sophisticated MFA problemsfor example, involving optimal experimental design or largescale metabolic networksin terms of fluxomer variables will lead to dramatic enhancements of algorithmic efficiency and robustness.
Methods
In the following, we show how to construct and solve MFA problems using fluxomer variables. First we define and explain the basic properties of fluxomers. Then we show how to express MFA balance equations and measurements in terms of fluxomers. Finally, we formulate the MFA optimization problem and present the FIA algorithm for solving it. Throughout this section we use boldface uppercase letters A to denote matrices, lowercase boldface letters x to denote vectors, and lowercase letters u for scalars. We use the < ○ > product z = x○y to represent the elementwise product vector, i.e. z_{i }= x_{i}y_{i}. The model formulation will be illustrated using the simple metabolic network shown in Figure 3.
Figure 3. Simple metabolic network. (a) Standard network representation. Carbon atoms are drawn explicitly with arrows to indicate atom transitions. Unidirectional arrows represent unidirectional fluxes while bidirectional fluxes (such as flux 5) are represented by bidirectional arrows. (b) Fluxomers representation. Each arrow is a group of fluxomers. X's appear on the appropriate atom positions to indicate summation of divergent fluxomers.
Fluxomers overview
Traditional MFA approaches construct distinct variables for each flux and for each possible labeling state (isotopomer) associated with all metabolites in the network. Fluxomers, on the other hand, are a composite of these two and therefore allow the network state to be described using only one variable type.
Definition 1 (Fluxomer) A fluxomer is the rate that a metabolic reaction transfers labeling from one or more specific substrate isotopomers into product isotopomers.
Taking each fluxomer to be a transformation from one set of labeled atoms into others, we can write its labeling state as an array of binary elements representing the state of each atom it consumes (0 representing an unlabeled atom and 1 representing a labeled atom). Thus, f_{i}(1001) is a fluxomer of reaction i consuming 4 atoms, with its first and last atoms labeled and two middle atoms unlabeled. When using x as an index for one (or more) of the atoms, we denote a sum of fluxomers where the indicated atom can be either labeled or unlabeled (e.g., f_{i}(1x01) is the sum of f_{i}(1001) and f_{i}(1101)). See Figure 3b for a detailed example.
Traditional metabolic fluxes and isotopomer variables can be easily expressed using fluxomers. We start with metabolic fluxes, which are just a sum of their associated fluxomers. For the simple network in Figure 3b we have:
We can also express isotopomer abundances in terms of fluxomer variables for the same example. Because of the assumption that enzymes do not differentiate between the various isotopomers of a given metabolite, the isotopomers within each metabolite pool are distributed uniformly across the outgoing fluxes emanating from that pool. Therefore, the fractional abundance of a given isotopomer within a metabolite pool will determine the fractional contribution of its corresponding fluxomers to the fluxes leaving that pool:
Fluxomer balance equations
We now examine the fluxomer balance equations that describe how fluxomers are propagated through the metabolic network. These balance equations represent the main mathematical device for calculating steadystate fluxomer values for a given network. For ease of notation, let us define the vector of metabolic fluxes in our system by and the vector of fluxomers as . As shown above, the metabolic fluxes are calculated from a linear transformation of the fluxomers. Denoting this linear transformation matrix as U, we can write u = Ux. We now assume that we are given a certain u vector and wish to calculate the fluxomers in our system. We start by considering balances on "simple fluxomers", i.e. those that originate exclusively from a single metabolite pool. (An example of a simple fluxomer is f_{5}(01) in Figure 3, which derives solely from pool B.) Under conditions of metabolic and isotopic steady state, the rate of 01labeled molecules entering pool B must balance the rate that 01labeled molecules leave that pool. Therefore, we can construct a balance on fluxomers around pool B as
However, according to eq. 2 the lefthand side of this equation can be reexpressed as . Substituting this latter result into the flux balance equation and solving for the fluxomer f_{5}(01) yields
where g(u) is a function of u alone, and h is a constant vector. Thus, for this simple case we can solve for the outgoing fluxomer f_{5}(01) directly in terms of the fluxomers entering pool B and the total fluxes f_{2 }and f_{5 }leaving pool B.
We now turn to the more complex situation in which the output fluxomer originates from more than one metabolic pool. For example, consider fluxomer f_{4}(0001) coming from pools C and D. Here, the fraction of 0001labeling carried by flux f_{4 }is proportional to the abundance of 01labeling in C and 00labeling in D:
As before, the outgoing fluxomer f_{4}(0001) can be expressed solely in terms of ga pure function of u (always a rational function of outgoing fluxes)and a product of linear projections of x.
Without loss of generality, we restrict ourselves to fluxes coming from at most 2 metabolic pools (referred to subsequently as the "left" and "right" pools). When the system reaches steady state, we have
where g is a function , and (H_{1}, H_{2}) are two m × m matrices. This equation allows for the output fluxomers emanating from a specific metabolite pool to be expressed in terms of the total flux vector u and the fluxomers entering the pool. This enables each outgoing fluxomer to be solved "locally" for the incoming fluxomers. Note that this local calculation does not involve any matrix inversions or other expensive computational procedures. If there are no recycle loops in the network so that all possible paths through the network are nonselfintersecting, this equation can be used to solve sequentially for all "downstream" fluxomers in terms of previously calculated "upstream" fluxomers. In the presence of recycle loops an iterative approach can be constructed to solve for the fluxomers while still avoiding repeated matrix inversions.
Constructing the system matrices
The matrices H_{1}, are defined by
In matrix form,
Isotopomer measurement formulation
In the following, we develop a systematic method for expressing measured isotopomer variables using fluxomer notation. The final result of the analysis shows that isotopomer measurements can be written simply as the norm of a linear transformation of fluxomers, thus Err ~ Ax^{2}. First, we briefly summarize the available isotopomer measurements provided by Nuclear Magnetic Resonance (NMR) and Mass Spectrometry (MS) methods. We then discuss the mathematical modeling of these measurements using fluxomer variables.
Available isotopomer measurements
MFA experiments are typically carried out by (i) introducing a labeled substrate into a cell culture at metabolic steady state, (ii) allowing the system to reach an isotopic steady state, and (iii) measuring isotopomer abundances of metabolic intermediates and byproducts using either MS or NMR analysis. These two measurement techniques provide qualitatively different information about isotopic labeling.
• ^{1}H NMR: Measures the fractional ^{13}C enrichment of each protonbound carbon atom, irrespective of the labeling of its neighboring carbon atoms. Both ^{12}C and ^{13}C atoms are distinguishable in the same spectrum, and therefore the peak areas corresponding to different carbon isotopes can be normalized directly.
• ^{13}C NMR: Quantifies isotopomers based on the presence of multiplet peaks (e.g., doublets, triplets, doublet doublets, etc.) in the spectrum caused by two or more neighboring ^{13}C atoms. Because ^{12}C atoms are undetectable by ^{13}C NMR, it is impossible to quantify the overall fraction of each isotopomer unless ^{1}H NMR spectra are simultaneously obtained. Instead, only the relative ratio of different isotopomers can be assessed by ^{13}C NMR.
• MS: This technique is usually preceded by some form of chromatographic separation (GC or LC) to resolve mixtures into their individual components. These components are then ionized and fragmented in the MS ion source. The ionized particles are separated according to their masses by an electromagnetic filter, and a detector measures the relative abundance of each mass isotopomer. These abundances can be normalized to a fractional scale if all MS peaks corresponding to a particular ion are simultaneously measured.
Previous studies based on fluxisotopomer variables have modeled the measurement error as Gaussian noise added to the fractional isotopomer enrichments. Therefore, if is the vector of measured isotopomer fractions, this model states that , where e is the Gaussian error term. However, a more accurate error model would add the measurement noise directly to the physically measured values. The motivation for the traditionally chosen error model is its relative simplicity when expressed using fluxisotopomer variables. Furthermore, since some isotopomers of a specific metabolite may be unmeasurable, the isotopomer fractions cannot be experimentally determined in many cases. This implies the need for an alternative error model that avoids these shortcomings.
Measurement Error Model
We denote the measured isotopomer abundances by a vector . For NMR analysis, the elements of are proportional to the areas under the different spectral peaks. For MS, they are proportional to the integrated ion counts associated with each mass isotopomer. Since is the measured quantity, the correct error model is an addition of Gaussian noise so that , where m is the "true" measurement value. The measured isotopomer fractions are then expressed as
Let ε_{j }represent the residual between the modelpredicted and experimentally measured abundance of a single isotopomer. After multiplying eq. 11 by and rearranging, the residual expression becomes
where ε_{j }is a sum of Gaussian variables. Noting that each measurement m_{j }is simply proportional to a linear combination of fluxomers, the residual expression eq. 12 takes the form
where T and V are transformation matrices needed to convert fluxomers to isotopomer measurements and the diag operator converts its vector argument into a diagonal matrix. The resulting expression is both a simple sum of Gaussian vectors and affine in x.
The advantage of this objective function is that it only depends upon the relative isotopomer intensities in the vector but does not depend upon how these intensities have been normalized (as long as the transformation matrix T is constructed accordingly). This eliminates the need to estimate optimal normalization factors that are required by previous algorithms in order to convert experimental measurements into isotopomer fractions. This is true for both MS and NMR measurements, either when conducted alone or used together in the same experiment.
The MFA optimization problem using fluxomers
Now that we have defined both the isotopomer measurements and the feasible solution set, we can formulate the leastsquares MFA optimization problem in terms of fluxomer variables. Our objective is to find the flux vector u that minimizes the measurement error. In addition to the fluxomer balances, usually upper bounds u_{ub }are provided for all fluxes. As has been proven by Wiechert et al. [69], once the inputs to the system and u are set, the solution (x, u) is unique. In other words, the steadystate fluxomer balance equation, eq. 6, is actually an implicit definition of x(u). With this in mind, the MFA optimization problem can be simply defined as
with
where A selects the measured elements of the fluxomers vector x(u), b contains their associated values, and S is the stoichiometric matrix of the reaction network. Note that b may contain nonzero elements only when associated with measurements of absolute flux values. For isotopomer measurements, the associated elements of b are zero.
Eq. 14 can be solved using various nonconvex global optimizing techniques. These optimizers typically require the user to provide subroutines for computing the value of the objective function and its first derivatives at various points along their convergence path. Furthermore, evaluation of the function x(u) and its derivatives are the main (practically only) timeconsuming procedures when solving the optimization problem in eq. 14. The mathematical formulation of eq. 14 is similar to the optimization problem resulting when using the labels and fluxes variables, with one exception  the implicit formula for x(u). As shown above, using fluxomers we are able to formulate the propagation equation (and thus solving x(u)) as a multiplication of homogeneous functions of fluxes, and second order functions of fluxomers. Using labels and fluxes, formulating the same equation results in a sum of functions of the same structure, and the homogeneous separation property vanishes. The following sections exploit this unique property of the fluxomers propagation equation in order to achieve great reduction in the system computational complexity, leading to the FIA algorithm.
Fluxomers Iterative Algorithm (FIA)
This section deals with the evaluation of x(u) along with its gradient using the fluxomer formulation. First, we show that x(u) can be calculated iteratively while avoiding repeated matrix inversions. Then, we demonstrate how the number of iterations can be reduced using a Newtontype gradientbased algorithm. Finally, we explain how it is possible to greatly increase the sparsity of the system using a simple linear transformation of variables, which further reduces the number of iterations needed for convergence.
Solving the fluxomer balance equations
A simple approach for computing x given u is to imitate nature. Once a metabolic network reaches steady state (namely, when u is constant), changing its input labeling does not affect its flux values u, but only influences the labeling of its intermediate metabolite pools. The metabolite labeling patterns become gradually mixed and propagated throughout the network until isotopic equilibrium is reached. Accordingly, a simple approach for solving eq. 6 is by using its iterative representation (which is similar to the process taking place in nature):
where x_{t }is the fluxomer vector at iteration t and x_{t+1 }is the fluxomer vector at iteration t + 1. In order to simulate the steadystate labeling, we initialize the system with the vector x_{0 }in which only the input fluxomers are labeled and all others are unlabeled. By recursively substituting x back into the equation, steady state is eventually reached and the final value of x is obtained. (This equation represents a nonlinear timeinvariant Markov chain.) For the EmbdenMeyerhof and Pentose Phosphate Pathway example in the Results and Discussion section, it takes a few hundred iterations to achieve complete stability of the solution (maximal fluxomer value change on the order of 1e17). Algorithm convergence for a given input vector is retrieved exactly as in the real biological system, and thus a unique solution always exists (for realistic metabolic networks).
We now show it is possible to reach pathway convergence in much fewer iterations. First, we write eq. 6 as
Now, in order to find the values of (x, u) one needs to solve F(x, u) = 0 while holding u constant. We choose to use one of the classic and powerful algorithms for finding roots of an equation, the well known NewtonRaphson [1921] method. Roughly speaking, the change of the x vector at each iteration is calculated by
with . The main concern now is the evaluation of the expression . Here, it turns out that due to the decomposable nature of F(x, u), the derivative at a point (x, u) is the simple matrix
Therefore, finding is equivalent to solving the linear system of equations
In order to determine the root of the propagation equation, FIA starts with an iteration or two using Newton's correction and then continues with the simple "natural" approach. Applying this method to the EmbdenMeyerhof and Pentose Phosphate Pathway example in the Results and Discussion section, only a few dozen iterations are now needed. In the next section we show how to reduce both the number of variables and the number of iterations required for convergence by another order of magnitude, without affecting system convergence stability.
Reducing system complexity
The following section introduces a mathematical approach for reducing the number of nonzero elements in our system. Variable reduction techniques such as the recently developed Elementary Metabolite Unit (EMU) network decomposition [14] were developed for application to systems that are modeled using fluxisotopomer variables. Fluxomers and the FIA algorithm, as opposed to prior approaches, allow us to effectively reduce the number of system variables using a simple linear transformation on x. Our main goal here is to find a transformation for the fluxomer vector x, y = Kx that:
• Reduces the number of its nonzero elements.
• Reduces the computational complexity of solving eq. 16.
• Eases the evaluation of eq. 15.
From eq. 16 we see that the greatest expense is due to inversion of a sum of two linear transformations (H_{1 }and H_{2}) of x. From the iterative propagation equation, eq. 15, we see that x is iteratively calculated by computing the product of the same two matrices. Had it been possible to find a sparse, closetodiagonal representation for both H_{1 }and H_{2 }by simply multiplying them by the matrix from the right, both problems would be solved.
In order to acomplish the above, we examine the properties of the concatenation of these two matrices which we denote by H. Next we find the LU factorization of H,
with L_{H }lower triangular and U_{H }upper triangular matrices. The matrix L_{H1 }contains the first m rows of L_{H }and L_{H2 }contains the last m rows of L_{H}. Our new set of variables now becomes y = U_{H}x, and the new propagation equation is
When expressed in terms of the variable y, our system becomes much more sparse. This is illustrated in Figure 4 which shows H_{1}, H_{2}, L_{H1}, L_{H2 }and U_{H }for the EmbdenMeyerhof and Pentose Phosphate Pathway example. The transformation has two essential benefits:
Figure 4. System matrices complexity reduction. H_{1}, H_{2}, L_{H1}, L_{H2 }and U_{H }for the simple E. coli example. A substantial reduction in nonzero elements between the H and L matrices can clearly be seen.
1. Reduced computational complexity  note that the derivative depends upon the matrices H_{1 }and H_{2 }which have already been factored, and thus solving Newton's step is straightforward.
2. Fewer iterations needed for convergence.
As a matter of fact, this transformation reduced the number of iterations needed for convergence of the simple E. coli example to a total of 5.
As discussed above, our optimization problem seeks the minimum of Ax(u)  b^{2}. In order to converge rapidly, the gradient of the objective function must be computed at each iteration of the algorithm. The key step for computing it is the evaluation of (the derivative of the fluxomers as a function of the metabolic fluxes). Since we have an implicit function F(x, u) along with a valid solution for F(x, u) = 0, the implicit function theorem [22,23] can be used to compute . Because F(x, u) is continuously differentiable around its root, we can write
In the previous section we showed that can be directly expressed in terms of the system matrices and the vectors x and u. The same procedure can be carried out in order to determine
Keeping in mind that F_{x}(x, u) is in its reduced form due to our variable transformation, solving the equation can be accomplished efficiently.
The initial point
The generation of the initial point for the FIA algorithm is very similar to the standard method used by many iterior point algorithms for finding a valid initial point over a convex linear set. We added a fluxesmeasurement regularization factor λ in order to generate an initial point closer to the final estimation (and thus speed up the convergence process). The initial point is generated by solving the following simple convex optimization problem:
with
with A a matrix that selects the measurable elements of u, the meaured elements of u (if they exist), 0 a vector of zeros, and u_{b }a vector of the flux upper bounds. The regularization factor λ starts with some large value, and if necessary is reduced until the optimal value of s is greater than 0. Note that when λ → 0 the problem reduces to finding a feasibile solution of u, and thus always has a solution (for wellstructured networks).
The algorithm
Summarizing the above discussion leads to the following algorithm for efficient solution of the MFA optimization problem using fluxomers:
i. Matrix preparation (run once per network):
0. Calculate using LU (PQ) factorization.
ii. Call the optimizer in order to solve
When requested by the optimizer, return x(u) and its first derivative:
1. Set y_{0 }= y_{init}.
2. Set i = 1.
3. Calculate
y_{i }= U_{H }[g(u) ○ (L_{H1}y_{i  1}) ○ (L_{H2}y_{i  1})].
4. If y_{i } y_{i  1}^{2 }> ε_{N}
(b) Set y_{i }= y_{i } r according to Newton's method.
5. If y_{i } y_{i  1}^{2 }> ε_{e }go to 3.
6. Calculate x = [g(u) ○ (L_{H1}y_{i}) ○ (L_{H2}y_{i})].
The supplied software uses a variant of the "sequential leastsquares" algorithm [24,25] for solving the nonconvex optimization problem in eq. 14. This essentially breaks the problem into a sequence of convex optimization problems for which the solution can be readily computed. Note that other algorithms can be easily used with the same procedures described above.
Authors' contributions
OS developed and implemented the theory, algorithm, simulations and software. YCE and JDY initiated and supervisied the study, and contributed to writing the manuscript. All authors reviewed, edited and approved the final manuscript.
Acknowledgements
JDY is supported by an NSF CAREER Award (CBET0955251) and by the Vanderbilt Discovery Award program.
References

Alberts B, Johnson A, Lewis J, Raff M, Roberts K, Walter P: [http:/ / www.amazon.com/ exec/ obidos/ redirect?tag=citeulike0720&path=AS IN/ 0815332181] webcite
Molecular Biology of the Cell, Fourth Edition Garland. 2008.

Steuer R: Review: On the analysis and interpretation of correlations in metabolomic data. [http://bib.oxfordjournals.org/cgi/content/abstract/7/2/151] webcite
Brief Bioinform 2006, 7(2):151158. PubMed Abstract  Publisher Full Text

Berg JM, Tymoczko JL, Stryer L: [http:/ / www.amazon.com/ exec/ obidos/ redirect?tag=citeulike0720&path=AS IN/ 0716746840] webcite
Biochemistry, Fifth Edition : International Version. Freeman WH; 2002.

Varma A, Palsson BO: Metabolic Flux Balancing: Basic Concepts, Scientific and Practical Use. [http://dx.doi.org/10.1038/nbt1094994] webcite
Nat Biotech 1994, 12(10):994998. Publisher Full Text

Marx A, de Graaf AA, Wiechert W, Eggeling L, Sahm H: Determination of the Fluxes in the Central Metabolism of Corynebacterium Glutamicum by Nuclear Magnetic Resonance Spectroscopy Combined with Metabolite Balancing.
Biotechnol Bioeng 1996, 49:111129. PubMed Abstract  Publisher Full Text

Wiechert W, Mollney M, Isermann N, Wurzel M, de Graaf AA: Bidirectional reaction steps in metabolic networks: III. Explicit solution and analysis of isotopomer labeling systems.
Biotechnol Bioeng 1999, 66(2):6985. PubMed Abstract  Publisher Full Text

Wiechert W, de Graaf AA: Bidirectional reaction steps in metabolic networks: I. Modeling and simulation of carbon isotope labeling experiments.
Biotechnol Bioeng 1997, 55:10117. PubMed Abstract  Publisher Full Text

Wiechert W, Siefke C, de Graaf AA, Marx A: Bidirectional reaction steps in metabolic networks: II. Flux estimation and statistical analysis.
Biotechnol Bioeng 1997, 55:11835. PubMed Abstract  Publisher Full Text

Mollney M, Wiechert W, Kownatzki D, de Graaf AA: Bidirectional reaction steps in metabolic networks: IV. Optimal design of isotopomer labeling experiments.
Biotechnol Bioeng 1999, 66(2):86103. PubMed Abstract  Publisher Full Text

Schmidt K, Nielsen J, Villadsen J: Quantitative analysis of metabolic fluxes in Escherichia coli, using twodimensional NMR spectroscopy and complete isotopomer models.
J Biotechnol 1999, 71(13):17589. PubMed Abstract  Publisher Full Text

Schmidt K, Norregaard LC, Pedersen B, Meissner A, Duus JO, Nielsen JO, Villadsen J: Quantification of intracellular metabolic fluxes from fractional enrichment and 13C13C coupling constraints on the isotopomer distribution in labeled biomass components.
Metab Eng 1999, 1(2):16679. PubMed Abstract  Publisher Full Text

Antoniewicz MR, Kelleher JK, Stephanopoulos G: Determination of confidence intervals of metabolic fluxes estimated from stable isotope measurements.
Metab Eng 2006, 8(4):32437. PubMed Abstract  Publisher Full Text

Yang TH, Heinzle E, Wittmann C: Theoretical aspects of 13C metabolic flux analysis with sole quantification of carbon dioxide labeling.
Comput Biol Chem 2005, 29(2):12133. PubMed Abstract  Publisher Full Text

Antoniewicz MR, Kelleher JK, Stephanopoulos G: Elementary metabolite units (EMU): a novel framework for modeling isotopic distributions. [http://dx.doi.org/10.1016/j.ymben.2006.09.001] webcite
Metabolic engineering 2007, 9:6886. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Quek LE, Wittmann C, Nielsen L, Kromer J: OpenFLUX: efficient modelling software for 13Cbased metabolic flux analysis. [http://www.microbialcellfactories.com/content/8/1/25] webcite
Microbial Cell Factories 2009, 8:25. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Srour O: FIA Software. [http:/ / webee.technion.ac.il/ Sites/ People/ YoninaEldar/ Info/ software/ FIA/ FIA/ FIA%20Software.html] webcite

Wiechert W, Möllney M, Petersen S, de Graaf AA: A universal framework for 13C metabolic flux analysis. [http://dx.doi.org/10.1006/mben.2001.0188] webcite
Metabolic engineering 2001, 3(3):265283. PubMed Abstract  Publisher Full Text

Becker J, Klopprogge C, Zelder O, Heinzle E, Wittmann C: Amplified expression of fructose 1,6bisphosphatase in Corynebacterium glutamicum increases in vivo flux through the pentose phosphate pathway and lysine production on different carbon sources.
Appl Environ Microbiol 2005, 71(12):85878596. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Ypma TJ: Historical Development of the NewtonRaphson Method. [http://link.aip.org/link/?SIR/37/531/1] webcite
SIAM Review 1995, 37(4):531551. Publisher Full Text

Deuflhard P: [http:/ / www.amazon.com/ NewtonMethodsNonlinearProblemsD euflhard/ dp/ 3540210997%3FSubscriptionId%3D0JYN1 NVW651KCA56C102%26tag%3Dtechkie20% 26linkCode%3Dxm2%26camp%3D2025%26cr eative%3D165953%26creativeASIN%3D35 40210997] webcite

Ortega JM, Rheinboldt WC: [http:/ / www.amazon.com/ IterativeNonlinearEquationsVaria blesMathematics/ dp/ 0898714613] webcite
Iterative Solution of Nonlinear Equations in Several Variables (Classics in Applied Mathematics, 30). Society for Industrial Mathematics; 1987.

Jittorntrum K: An implicit function theorem. [http://www.springerlink.com/content/x746u2457466u402/] webcite
Journal of Optimization Theory and Applications 1978, 25(4):575577. Publisher Full Text

Krantz SG, Parks HR: [http:/ / www.amazon.com/ ImplicitFunctionTheoremHistoryA pplications/ dp/ 0817642854%3FSubscriptionId%3D0JYN1 NVW651KCA56C102%26tag%3Dtechkie20% 26linkCode%3Dxm2%26camp%3D2025%26cr eative%3D165953%26creativeASIN%3D08 17642854] webcite
The Implicit Function Theorem: History, Theory, and Applications. Birkhäuser Boston; 2002.

Kraft D: Algorithm 733: TOMPFortran modules for optimal control calculations.
ACM Trans Math Softw 1994, 20(3):262281. Publisher Full Text

Gill PE, Murray W, Saunders MA: SNOPT: An SQP Algorithm for LargeScale Constrained Optimization.
2002, 12(4):9791006.