Integrative BioSystems Institute and The Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, 313 Ferst Drive, Atlanta, GA, 30332, USA

Abstract

Background

Advances in modern high-throughput techniques of molecular biology have enabled top-down approaches for the estimation of parameter values in metabolic systems, based on time series data. Special among them is the recent method of dynamic flux estimation (DFE), which uses such data not only for parameter estimation but also for the identification of functional forms of the processes governing a metabolic system. DFE furthermore provides diagnostic tools for the evaluation of model validity and of the quality of a model fit beyond residual errors. Unfortunately, DFE works only when the data are more or less complete and the system contains as many independent fluxes as metabolites. These drawbacks may be ameliorated with other types of estimation and information. However, such supplementations incur their own limitations. In particular, assumptions must be made regarding the functional forms of some processes and detailed kinetic information must be available, in addition to the time series data.

Results

The authors propose here a systematic approach that supplements DFE and overcomes some of its shortcomings. Like DFE, the approach is model-free and requires only minimal assumptions. If sufficient time series data are available, the approach allows the determination of a subset of fluxes that enables the subsequent applicability of DFE to the rest of the flux system. The authors demonstrate the procedure with three artificial pathway systems exhibiting distinct characteristics and with actual data of the trehalose pathway in

Conclusions

The results demonstrate that the proposed method successfully complements DFE under various situations and without

Background

A grand challenge of biomathematical modeling is the conversion of a biological system into a computational structure that formalizes the underlying system. An important and very challenging component of this process is the estimation of parameter values. The task is typically pursued with one of two generic approaches, namely a forward (bottom-up) or an inverse (top-down) method. Until recently, essentially all models of metabolic pathway systems were developed according to the first strategy, that is, by characterizing model components and processes one at a time and subsequently merging all “local” information about kinetic reaction steps into one comprehensive dynamic model. Although this forward approach is theoretically straightforward, implementation procedures often fail and, moreover, have intrinsic disadvantages

The second, top-down approach uses data that characterize the entire system and attempts to estimate all parameter values at once with a sophisticated optimization algorithm. Specifically, this type of method employs time series data that describe the full dynamic response of a pathway system to some stimulus, such as an environmental stress (

Whether a forward or inverse approach is used, the estimation of parameter values necessitates assumptions regarding the functions or rate laws that describe the reactions of interest. As a prominent example, the typical default for enzymatic reactions in a metabolic pathway is the Michaelis-Menten rate law (MMRL) or one of its variations. While such assumptions are understandable, they create an immediate conundrum. Namely, the true mechanisms governing a biological process are in reality unknown or at least unclear. As a result, the estimation process is from the start unguided, uncertain, or maybe even based on modestly or entirely wrong assumptions. Also, descriptions of more complex enzyme mechanisms contain numerous parameters if several substrates or reactions are involved, so that the alleged functions cannot be identified from the typically sparse data

In addition to the troublesome issue of model selection, most proposed methods for estimation from time series data face significant problems related to the data themselves, to inefficient algorithms, and to a variety of computational issues. To complicate matters further, these issues are usually superimposed. The data may be overly noisy, incomplete, collinear with each other, or non-informative. The computational algorithms are often slow to converge, converge to a locally but not globally optimal solution, or do not converge at all. Finally, there is a mathematical issue, especially for systems with many parameters, namely that a system may admit solutions that are distinctly different yet equivalent, or essentially equivalent, with respect to the residual error. This type of result, referred to as sloppiness and unidentifiability, may be due to redundancies in candidate parameter sets and has received much attention in recent times

A different type of sloppiness may be caused by the fact that different model structures may give essentially identical residual errors. For instance, several probability density functions often model the same data equally well

Recently we proposed a novel approach to metabolic systems estimation, called

The left-hand side of this ODE can be interpreted as the slope of the time course of the variable X_{i} at a given point in time. Therefore, assuming that the time series data are more or less complete and smooth, or can be validly smoothed (see later), one can estimate the slope of the time course at each time point and substitute the slopes for the derivatives. If the system contains

The result of this first phase of DFE is a representation of each flux as a numerically characterized function of time and as a function of all contributing metabolites. This representation is not explicit, but purely numerical and consists of points in plots of flux

DFE offers substantial advantages. It makes almost no assumptions and is straightforward if the right data are available. It reveals inconsistencies within the data, avoids compensation among and within equations, and permits quantitative diagnostic tools of whether the assumed mathematical formulations are appropriate or in need of improvement. In addition, since DFE identifies parameters based on explicit single-flux representations, the estimation of parameter values is much easier and more reliable than in other top-down approaches. As a result, DFE promises significantly improved extrapolation capacity toward new data or experimental conditions.

Alas, DFE also has limitations and drawbacks. First, it requires more or less complete time series data that characterize the investigated system. Such data are still relatively scarce, although they are being generated at an increasing rate and with rapidly improving quality. Second, and arguably more limiting, a unique solution of the flux equations in the first phase of DFE is only possible if the flux system is of full rank. However, most actual pathway systems contain more fluxes than metabolites and are therefore underdetermined.

Several constraint-based optimization techniques have been proposed for stoichiometric analyses of underdetermined metabolic systems

In contrast to these methods that require objective functions, we proposed extending DFE with the infusion of additional information

The first issue might be ameliorated by methods developed for structure identification of unknown or ill-characterized pathways. These methods include a wide spectrum of techniques, such as perturbation methods, causality models, correlation-based approaches, or probabilistic models, some of which are based on time series data (see

Although we presented proof of concept that the different approaches described above can be used to supplement DFE, these approaches are not always optimal, because they require additional information and assumptions that are

Specifically, we propose here a distinct approach to supplementing DFE with information hidden in suitable metabolic time series. Extracting this information permits the determination of a sufficient subset of fluxes to execute DFE on the rest of the flux system. In contrast to all other solutions presented so far for the complementation of DFE, the method proposed here does not require any assumptions regarding the mathematical representation of the fluxes. Furthermore, kinetic information or knowledge of the functional forms of the enzymatic reactions is not required. We will demonstrate in the following that the proposed method can succeed even if some of the time series data are not measured or when there is mass leakage in the pathway systems. In addition, the new method allows us to address a recurring unanswered question, namely how many time series data are needed to estimate the structure and parameters of a system.

Specific details of the proposed approach are presented in the _{j} and _{i}, combined with the degradation of _{i} within a linear section of a pathway system:

Suppose we have time series data, so that we can estimate _{i} has the same value (_{i}), whereas _{j} has a different value at each of these time points. It is reasonable to assume that _{i}. If so, we have _{j} and _{i} always has the same value. Using these quantities, the methods proposed here allow us to estimate the functional format of _{j} values. Once we know

**This file contains: (1) details regarding the process of merging pairs of points; (2) the estimation procedure for a four-variable branched pathway and results of two cases where fluxes contain more than one variable; and (3) the results of the method for a five-variable system where different levels of artificial noise were added to the time series data and sub-datasets were randomly picked from data generated with ten sets of initial conditions.**


Methods

The proposed method offers a systematic strategy to extend DFE and to ameliorate its limitations. Just like DFE, the proposed method starts with an optional data preprocessing step, but without any assumption regarding the functional formats of the fluxes in the system. First, the experimental data are tested for mass conservation to make sure no mass is lost or gained during the observed time period. If the data do indicate losses or gains in mass, it is useful to locate possible branches off the main pathway(s) and to account for the changes in total mass of the metabolites in the pathway
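The mass-conservation test mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `total_mass_drift`, the tolerance `rel_tol`, and the assumption that all pools are expressed in the same mass units within a closed system are hypothetical choices made for the example.

```python
def total_mass_drift(time_series, rel_tol=0.05):
    """Check whether the summed metabolite pool stays constant over time.

    time_series : dict mapping a metabolite name to its list of measurements
                  (all lists share the same time points; hypothetical layout).
    rel_tol     : assumed tolerance for the relative change in total mass.

    Returns (drift, conserved), where drift is the largest relative deviation
    of the total mass from its initial value.
    """
    columns = list(time_series.values())
    # Total mass at each time point, summed over all metabolite pools.
    totals = [sum(values) for values in zip(*columns)]
    if totals[0] == 0:
        raise ValueError("initial total mass is zero; relative drift undefined")
    drift = max(abs(total - totals[0]) for total in totals) / abs(totals[0])
    return drift, drift <= rel_tol


closed = {"X1": [5.0, 4.0, 3.0], "X2": [0.0, 1.0, 2.0]}
leaky = {"X1": [5.0, 4.0, 3.0], "X2": [0.0, 0.5, 1.0]}
print(total_mass_drift(closed))
print(total_mass_drift(leaky))
```

A drift above the tolerance would suggest looking for unmeasured branches or losses before slopes are estimated.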

where X_{i} denotes the concentration or amount of a variable or variable pool and _{i}, respectively. Substituting slope estimates for the differentials in this system of equations decouples the ordinary differential equations (ODEs) and results in a system of fluxes that is linear at each time point

where **s** is a vector of slopes, **N** is the stoichiometric matrix, **v** is a vector of fluxes, and t_{1}, t_{2}, …, t_{j}, …, t_{K} are the time points where measurements are available.
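For orientation, the linear flux system described by these definitions can be written compactly as below. This is a reconstruction from the verbal definitions of **s**, **N**, and **v** only; the time-point symbol $t_k$ is an assumed notation.

```latex
\mathbf{s}(t_k) \;=\; \mathbf{N}\,\mathbf{v}(t_k), \qquad k = 1, \dots, K
```

At each measured time point the slopes are known numbers, so the only unknowns are the entries of the flux vector.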

Next we check the rank of the linear set of algebraic equations in Eq. (2). The system can be easily solved at each time step to obtain dynamic profiles of all fluxes if the system has full rank. Over-determined systems may be solved by pooling fluxes, the use of pseudo-inverse methods, or regression. However, if the system is underdetermined, the solution space is infinite. To overcome this issue, some of the fluxes need to be estimated independently, until the system has full rank and can be solved uniquely. Elsewhere we showed that additional information may be used to characterize selected fluxes
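The rank check and the role of independently estimated fluxes can be illustrated with a short NumPy sketch. It assumes that slopes are stored as a metabolites-by-time array; the function name `solve_flux_profiles` and the `known` argument are hypothetical. Independently estimated fluxes are moved to the right-hand side, and the remaining system is solved by least squares once it has full column rank.

```python
import numpy as np


def solve_flux_profiles(N, slopes, known=None):
    """Solve N v = s at all time points, optionally with some fluxes known.

    N      : (n_metabolites, n_fluxes) stoichiometric matrix.
    slopes : (n_metabolites, n_times) slope estimates.
    known  : dict {flux index: n_times values} of independently estimated fluxes.
    """
    N = np.asarray(N, dtype=float)
    rhs = np.array(slopes, dtype=float)
    known = known or {}
    unknown = [j for j in range(N.shape[1]) if j not in known]
    # Move the contribution of every known flux to the right-hand side.
    for j, values in known.items():
        rhs -= np.outer(N[:, j], np.asarray(values, dtype=float))
    N_unknown = N[:, unknown]
    if np.linalg.matrix_rank(N_unknown) < len(unknown):
        raise ValueError("flux system is underdetermined; estimate more fluxes first")
    # Least squares covers both the square and the over-determined case.
    solution, *_ = np.linalg.lstsq(N_unknown, rhs, rcond=None)
    v = np.empty((N.shape[1], rhs.shape[1]))
    v[unknown] = solution
    for j, values in known.items():
        v[j] = values
    return v


# Hypothetical chain: v0 feeds X1, v1 converts X1 to X2, v2 removes X2.
N = [[1, -1, 0], [0, 1, -1]]
v_true = np.array([[2.0, 2.0, 2.0], [1.0, 3.0, 2.0], [0.5, 1.0, 4.0]])
s = np.array(N) @ v_true
print(solve_flux_profiles(N, s, known={2: v_true[2]}))
```

With two metabolites and three fluxes the call without `known` raises an error, whereas supplying one flux makes the remaining system uniquely solvable.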

As an introductory example, consider a linear part of a pathway with feedback inhibition as shown in Figure

The system could be part of a larger pathway system, but for this illustration the context is not relevant. For the illustration, fluxes were generated with a mix of power-law and Hill functions, namely

where V_{max} = 5 and K_{M} = 2. We use these settings to create artificial data, but subsequently assume no knowledge of the functions or parameters in Eq. (4).

**(a) Generic three-variable linear pathway with feedback inhibition (Eqs. (3–4)).** **(b)** Time series data, consisting of 50 artificial “measurements” that were generated with initial conditions X_{1}(t_{0}) = 5, X_{2}(t_{0}) = 0.1, and X_{3}(t_{0}) = 8; X_{1}, X_{2}, X_{3} are represented by blue, green, and orange dots, respectively.


Suppose time series data were measured and they are without noise (Figure

Generically, we intend to solve the fluxes in the i^{th} equation, which here happens to have only two fluxes, namely one influx (v_{in}) going into the pool X_{i}, and one efflux (v_{out}) leaving this pool. The flux v_{in} depends only on the precursor X_{in} of X_{i} and v_{out} depends only on X_{i} itself; to minimize confusion, we call this variable generically X_{out}. Extracting the i^{th} equation from Eq. (1), we thus obtain, in general terms,

The functional form of neither flux is assumed to be known. Substitution of derivatives with slopes results in

As a specific illustration, consider the second equation, where v_{2} depends only on the precursor X_{1} and v_{3} depends only on X_{2}. We substitute the derivative

It is reasonable to assume that the in- and effluxes are true functions in a mathematical sense. Thus, since v_{in} depends only on X_{in}, v_{in} must have one and only one value for every given value of X_{in}. In particular, if X_{in} assumes the same value at two different time points, v_{in} must have the same (so far unknown) value at both time points as well. In the illustration example, v_{2} depends only on X_{1}. Thus, for every value of X_{1} there is one and only one value of v_{2}. The proposed method therefore requires a screening of the available datasets with the goal of identifying different situations where X_{in} has some fixed value X_{in_const}. For all these situations, v_{in} also has some fixed value v_{in_const}. Since we do not know the functional form of v_{in}, we cannot directly compute this value v_{in_const}. However, we do know that this value is very similar for all situations where X_{in} ≈ X_{in_const}. Thus, for the set of all X_{in} ≈ X_{in_const}, Eq. (6) has the form

In the illustrative example, we screen the available data sets and search for different situations where X_{1} has the same fixed value X_{1c} and, thus, v_{2} also has the same (yet unknown) value v_{2c}. Thus, for the entire set of all X_{1} ≈ X_{1c} the second system equation has the form

For instance, X_{1} has similar values (~0.26) at time points 4, 4.8, 8.8, and 9.2, while X_{2} has different values at these time points (Figure

**(a) Fixing X_{1} within a narrow range (~0.26), four instances of X_{1} are found (solid red circles).** Fixing X_{1} within another narrow range (~0.6) provides three instances of X_{1} (solid orange circles). Similarly, two instances of X_{1} are found for X_{1} ~1.26 (solid blue circles). **(b)** Collection of 34 “bins” that exhibit the number of times X_{1} has approximately the same value given on the _{1}; all other bins are discarded. **(c)** Representation of different X_{2} values corresponding to at least two X_{1} values in each of the 9 remaining bins. The bars connect the two or more X_{2} values in each bin.


We repeat this type of screening for different sets of the same or very similar values of X_{in}. The result is a set of sets with equal X_{in_const} values within each set but different X_{in_const} values for different sets. These sets form a histogram with a bin for each X_{in_const}. If the range of each bin is small enough, we can assume every X_{in} in the same bin to have very similar values, so that their corresponding v_{in_const} are also very similar. Henceforth, we only retain bins with at least two entries. A further instance in the illustrative example consists of time points 3.4 and 9.6, where X_{1} has again similar values. In this case, the value is ~1.26, which is different from the value we screened before. Similarly, for time points 1, 8.4, and 9.4, X_{1} has a value of ~0.6 (Figure _{1} has approximately some fixed value, and these sets of X_{1} are reflected in a “bin database of values.” Within each bin, the corresponding value of v_{2c} is very similar as well.
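The screening step can be sketched as a simple binning of the precursor time course. The function name `bin_repeated_values`, the fixed bin origin at zero, and the sample numbers are assumptions made for illustration; the article does not prescribe a particular binning scheme.

```python
import math
from collections import defaultdict


def bin_repeated_values(x_in, bin_width):
    """Group time indices at which x_in falls into the same narrow bin.

    x_in      : sequence of precursor values, one per time point.
    bin_width : width of each bin (assumed constant).

    Returns {bin label: [time indices]} and keeps only bins with at least
    two entries, since a single instance carries no information here.
    """
    bins = defaultdict(list)
    for index, value in enumerate(x_in):
        bins[math.floor(value / bin_width)].append(index)
    return {label: members for label, members in bins.items() if len(members) >= 2}


# Hypothetical precursor values; three levels recur, one value occurs once.
x1 = [0.27, 1.27, 0.62, 0.28, 0.63, 1.28, 2.03]
print(bin_repeated_values(x1, bin_width=0.05))
```

Within each retained bin, the influx is treated as approximately constant, even though its numerical value remains unknown.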

Suppose we have identified _{in}. For these bins we determine the corresponding X_{out} values, which are typically different from each other. Suppose that bin

where v_{in_const}(_{p}) always has the same value, but _{i}(_{p}) and _{out}(_{p}) have different values. For our illustration we specify nine bins (_{1} (Figure _{2} at the same time points are shown in Figure ^{th} of the nine bins (shown as the orange bin in Figure _{1}. Therefore, we obtain three equations of the type

Equation (10) is formulated analogously for each bin _{out}(_{p}) can be represented as at least two equations of the type

Since we do not know the functional form of v_{in}, we do not know the numerical value of v_{in_const}(_{p}). However, since v_{in_const}(_{p}) is a constant for each bin, the relative positions of a group of values of v_{out}(_{p}) depend on each value –S_{i}(_{p}) within a given bin, and the slope values can be measured directly from the time series data. In addition, since v_{out}(_{p}) is solely determined by X_{out}(_{p}), we can characterize the relative positions of a set of v_{out}(_{p}) and their corresponding values –S_{i}(_{p}). Collecting these relationships, we can establish a plot of X_{out}(_{p}) versus –S_{i}(_{p}). If the bin contains only two points of X_{out}, we consider them as a pair and link them with a connecting line. If the bin contains _{out} (where _{out} based on their values and connect every two adjacent points as a pair to form a total of _{out}(_{r})(1), –_{i}(_{r})(1)) for the first point and (_{out}(_{r})(2), –_{i}(_{r})(2)) for the second point, where

To continue the illustration, the 8^{th} bin contains two instances of X_{1} ~1.26. The corresponding values of X_{2} are 1.54 and 2.93, and the –S_{2} values are −1.35 and 0.93, respectively. The points in the plot of X_{2}(_{8}) versus –S_{2}(_{8}) are therefore represented as (1.54, –1.35) and (2.93, 0.93). We consider these two points as a pair and link them using a red line (Figure ). The 5^{th} bin contains four instances of X_{1} ~0.26. Their corresponding values of X_{2} are 1.20, 1.37, 1.66, and 1.99, and the –S_{2} values are 0.05, 0.35, 1.02, and 1.65, respectively. The points in the plot of X_{2}(_{5}) versus –S_{2}(_{5}) are therefore represented as (1.20, 0.05), (1.37, 0.35), (1.66, 1.02), and (1.99, 1.65). Two points each are considered a pair and linked with a red line (Figure _{out}(_{r})(1) and _{out}(_{r})(2) is below some threshold _{r} is set as 0.2 in the examples shown in this article, but it will generally depend on the accuracy and quantity of the data. The higher the value is, the fewer pairs will remain after filtration. However, as long as the remaining pairs cover most of the spectrum in the _{r} might be preferable. Suppose

**(a) The 8^{th} bin in Figure (b) contains two different X_{2} values corresponding to the “blue” instances in Figure (a) for X_{1} ~1.26.** The corresponding values of X_{2} and –S_{2}, obtained from the plot of X_{2} versus –S_{2}, are (1.54, –1.35) and (2.93, 0.93). These two points are considered a pair and linked with a red line. **(b)** The 5^{th} bin of Figure (b) contains four instances of X_{1} ~0.26. Their corresponding values of X_{2} and –S_{2} are (1.20, 0.05), (1.37, 0.35), (1.66, 1.02), and (1.99, 1.65). Two points each are considered a pair and linked with a red line.


**(a) Pairs of points satisfying a threshold value greater than 0.2.** Seven pairs (X_{2} versus v_{3}, which in an actual situation is not known. **(b)** Pairs in (a) are merged, based on the distances between points in each “node” and the distances between two points in a pair. **(c)** Subgroups of pairs in (b) are merged. **(d)** If the value of v_{3} is known for X_{2} = 1 or for some other value, the entire cluster of lines is vertically shifted accordingly. If small values of X_{2} are covered by the pairs, the shift is determined by the observation that a flux is usually zero if the substrate concentration is zero. Here, the sum of errors between the estimated points and corresponding points on the true green line is 0.0354.


Equation (12) indicates that v_{out}(_{p}) and –S_{i}(_{p}) differ by a constant, since we do not know the value of v_{in_const}(_{p}). This fact translates into a constant vertical shift in the _{out} versus _{out}, and it is reasonable to assume that this graph is continuous and usually even monotonic. Therefore, the next step is to merge the individual pairs by determining a proper shift for each pair.
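The construction of pairs within bins can be sketched as follows. Within one bin the influx is constant, so the efflux equals the negative slope plus an unknown bin-specific constant, and only differences between points of the same bin are informative. The function name `build_pairs`, the argument layout, and the interpretation of the threshold as a minimal difference in the substrate values are assumptions; the numbers are the bin 8 and bin 5 values quoted in the text.

```python
def build_pairs(x_out, neg_slope, bins, min_gap=0.2):
    """Connect adjacent points within each bin and drop pairs that are too short.

    x_out     : substrate values of the efflux, one per time point.
    neg_slope : negative slope estimates of the pool, one per time point.
    bins      : {bin label: [time indices]} with at least two indices each.
    min_gap   : assumed threshold on the substrate difference within a pair.
    """
    pairs = []
    for label, indices in bins.items():
        # Sort the points of this bin by substrate value.
        points = sorted((x_out[k], neg_slope[k]) for k in indices)
        # Link every two adjacent points: n points give n - 1 candidate pairs.
        for left, right in zip(points, points[1:]):
            if right[0] - left[0] >= min_gap:
                pairs.append((label, left, right))
    return pairs


x2 = [1.54, 2.93, 1.20, 1.37, 1.66, 1.99]
neg_s2 = [-1.35, 0.93, 0.05, 0.35, 1.02, 1.65]
bins = {8: [0, 1], 5: [2, 3, 4, 5]}
for pair in build_pairs(x2, neg_s2, bins):
    print(pair)
```

In this example the pair (1.20, 1.37) from bin 5 is discarded because its substrate values differ by less than the threshold.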

Intuitively, it is easy to see how to shift all pairs so that they are close to one continuous line. Automation of the process requires an algorithm that is not quite straightforward, but can be facilitated with a graphical user interface; technical details of a possible merging process are presented in Figure S1 of the Additional file

**SET** each pair of points as a node

**SET** each node as a subgraph

**WHILE** the graph is not connected

**FOR** each subgraph in the graph

**FOR** each node in the current subgraph

**SET** other-subgraphs as the subgraphs; exclude the current subgraph

**CALCULATE** the distance from the current node to every node contained in other-subgraphs

**END FOR**

**FIND** the shortest distance and its corresponding nodes

**CONNECT** these two nodes

**END FOR**

**END WHILE**
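A simplified Python sketch of such a merging loop is given below. It is not the procedure of the Additional file: the distance between subgraphs is taken here as the horizontal gap between their substrate ranges, and "connecting" two subgraphs is implemented as a vertical shift that makes their piecewise-linear interpolants agree at a reference point. Both choices, and the names `x_gap`, `merge_pairs`, and `anchor_curve`, are assumptions.

```python
import numpy as np


def x_gap(a, b):
    """Horizontal gap between the x-ranges of two point sets (0 if they overlap)."""
    lo = max(a[:, 0].min(), b[:, 0].min())
    hi = min(a[:, 0].max(), b[:, 0].max())
    return max(0.0, lo - hi)


def merge_pairs(pairs):
    """Greedily merge pairs ((x1, y1), (x2, y2)) into one curve by vertical shifts."""
    groups = [np.array(sorted(pair), dtype=float) for pair in pairs]
    while len(groups) > 1:
        # Find the two closest groups.
        candidates = [(x_gap(groups[i], groups[j]), i, j)
                      for i in range(len(groups)) for j in range(i + 1, len(groups))]
        _, i, j = min(candidates)
        a, b = groups[i], groups[j]
        # Reference point: middle of the overlap, or of the gap if there is none
        # (np.interp then holds the end values constant, a crude assumption).
        lo = max(a[:, 0].min(), b[:, 0].min())
        hi = min(a[:, 0].max(), b[:, 0].max())
        x_ref = 0.5 * (lo + hi)
        shift = np.interp(x_ref, a[:, 0], a[:, 1]) - np.interp(x_ref, b[:, 0], b[:, 1])
        merged = np.vstack([a, b + [0.0, shift]])
        merged = merged[np.argsort(merged[:, 0], kind="stable")]
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    return groups[0]


def anchor_curve(curve, x_known, v_known):
    """Shift the merged curve so that it passes through one known flux value."""
    offset = v_known - np.interp(x_known, curve[:, 0], curve[:, 1])
    return curve + [0.0, offset]


# Three pairs sampled from v = 2 x, each with a different unknown offset.
pairs = [((0.0, 0.0), (1.0, 2.0)), ((0.5, 4.0), (1.5, 6.0)), ((1.2, 1.4), (2.0, 3.0))]
curve = merge_pairs(pairs)
print(anchor_curve(curve, x_known=1.0, v_known=2.0))
```

After merging, the curve is determined only up to one common offset, which `anchor_curve` fixes from a single known flux value.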

When the merging is completed, all pairs of points are close to a relatively smooth line, but the overall shift of the group of pairs is not known. We do know that essentially all metabolic fluxes will have values close to zero when their substrate concentration approaches zero. Thus, if sufficiently small substrate values are available in one of the bins, one easily estimates a reasonable shift. Should the flux value associated with some substrate concentration be known, the shift can be determined from this information. A further alternative is the following. If the inferred trend line suggests that the flux follows some rate law, such as a Hill function, the parameters of this function, together with the appropriate shift, can be obtained in a single optimization step.

Figure v_{3} is known for X_{2} = 1. If so, we ultimately shift the entire trend accordingly. The result is shown in Figure

Finally, based on the numerical or graphical flux profile thus determined, one may test candidate functions that capture the flux-substrate relationship. For instance, the result in the illustrative example shows that the functional relationship between X_{2} and v_{3} is

Now that we have determined v_{3}, it is easy to compute v_{2} from the measured slopes of X_{2}. The plot of _{3} is slightly curved, which is consistent with its power-law function in Eq. (4), although again, there is no proof. The

The parameters of any candidate functional form are easily estimated, because no differential equations are involved and the problem is of low dimension; they represent a fully parameterized kinetic model for the flux term itself and, subsequently, for the differential equation. Due to this simplicity, it is even possible to scan a variety of candidate functions and assess their appropriateness. If a suitable functional format can be determined with appropriate parameter values, the task is completed. If not, one may represent the flux-substrate plot with a piecewise-polynomial function, such as a cubic spline. Even in this non-explicit, numerical format, the result is sufficient to reduce one or two degrees of freedom in the overall DFE task. Figure

**Flowchart of the proposed method.** Starting with experimental time series, the data are smoothed and balanced for mass conservation, if necessary. The slopes of the time series at each time point are estimated. Combined with the knowledge of the system topology, substitution of the derivatives in the ODE with slope information yields a linear system of fluxes. If the system has full rank, solve the system with techniques from linear algebra. If the system is underdetermined, use auxiliary steps, as proposed in this article, to solve a subset of the fluxes until the system is of full rank. The results are the dynamic profiles of all extra- and intra-cellular fluxes in the system. If desired, make assumptions regarding the functional forms of the fluxes. These functions correspond to symbolic flux representations that can be independently fitted to the respective dynamic flux profiles and result in a fully parameterized kinetic model. As an alternative each process may be approximated as a piecewise function, for instance using spline methods


The procedure described above has generated one or two additional flux estimates. For the example in Eq. (3), the determination of v_{2} and v_{3} “fills” the rank, and the system can be uniquely solved. In fact, only one of the two is needed. For examples where one or two additional fluxes are not sufficient for a unique solution, the same procedure has to be performed with other equations until enough fluxes are determined to make the flux system full rank. DFE subsequently identifies all other fluxes as plots against time or against their substrates and modulators.
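Once a flux is available as a set of flux-substrate points, fitting a candidate function is a low-dimensional regression task. As one hedged example, a single-substrate power-law candidate v = a·X^g can be fitted by linear regression in log-log space. The function name `fit_power_law` and the sample data are hypothetical, and the approach presumes strictly positive values.

```python
import numpy as np


def fit_power_law(x, v):
    """Fit v = a * x**g by linear regression of log(v) on log(x)."""
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    if np.any(x <= 0) or np.any(v <= 0):
        raise ValueError("log-log regression requires strictly positive values")
    # polyfit returns the slope (kinetic order g) first, then the intercept log(a).
    g, log_a = np.polyfit(np.log(x), np.log(v), 1)
    return float(np.exp(log_a)), float(g)


# Hypothetical noise-free flux-substrate points generated from v = 3 * x**0.5.
x = np.array([0.5, 1.0, 2.0, 4.0])
v = 3.0 * x ** 0.5
a, g = fit_power_law(x, v)
print(round(a, 6), round(g, 6))
```

Other candidates, such as a Hill function, would require a nonlinear fit, but the problem stays small because no differential equations need to be integrated.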

In cases where fluxes contain more than one variable, the time courses have to be screened for combinations where the contributing variables have the same values. The concepts of the procedure are exactly the same as for the univariate case, but the implementation is obviously more involved (see Additional file

Results

The simple linear pathway shown in the previous section illustrated the concepts of the proposed extension to DFE. This section describes applications of the proposed methods in the context of further didactic and actual examples that become increasingly complicated. We begin with two artificial cases with distinct characteristics and conclude with the analysis of experimental observations describing trehalose metabolism in the yeast

Branched pathway with feedforward activation and feedback inhibition

Consider a branched pathway with fluxes represented by various functional forms, including Michaelis-Menten and Hill functions with inhibition and activation. The pathway, shown in Figure

The kinetic descriptions for each of the reactions are:

As before, we use these formats to generate artificial data, but subsequently assume no knowledge of their characteristics.

**(a) Metabolic network with positive feedforward and negative feedback.** All enzymatic reactions are assumed to follow Michaelis-Menten or Hill kinetics except for those corresponding to v_{2} and v_{5}, which are assumed to be represented with an Irreversible General Hyperbolic Modifier Kinetic function and with an Irreversible Hill function with one modifier, respectively (see Eq. (14) for details). **(b)** Sets of initial conditions used to generate six different datasets. **(c)** Time series data corresponding to the first set of initial values in (b); X_{1}, X_{2}, X_{3}, X_{4} are represented by blue, red, orange, and green dots, respectively.


The system in Eq. (13) is not of full rank. Thus, some of the fluxes need to be determined with the proposed method. For our illustration, we select the third equation in Eq. (13), because it contains only two fluxes; also, v_{3} depends only on X_{2}, and v_{4} depends only on X_{3}, which we know from the topology of the pathway. In the previous example, all time series were oscillating and it was easy to find enough data points where one variable is fixed and other variables display different values. In the present example, each single dataset displays changes over time that show few repeated concentration values (see Figure

For this illustration, we simulated multiple datasets with the initial values presented in Figure _{3} by using the first four datasets in Figure _{2} (Figure _{2} in the four datasets (from 0.25 to 2.34). The merging process of pairs is shown in Figure _{2} (~0.25) to be close to zero and shift the entire set of merged pairs up by about six units to obtain the estimates of _{3}. Indeed, this step recoups the true flux, which is shown in green, but would be unknown in a real application. Once _{3} is determined, the system of Eq. (13) is still underdetermined and another flux needs to be estimated to make the system full rank. The most straightforward choice is _{4}, which is directly computed from _{3} and the measured slopes of _{3}.
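The binning step can be illustrated with a minimal Python sketch. It assumes an equation with exactly two fluxes, each depending on a single metabolite, so that samples sharing (almost) the same value of one metabolite also share the value of the flux that depends on it; within such a bin, differences in the measured slope then reflect differences in the other flux only. The function name `collect_pieces`, the tuple layout, and the toy numbers are hypothetical and are not taken from the authors' implementation.

```python
import math
from collections import defaultdict


def collect_pieces(samples, bin_width):
    """Group samples by the binned value of the 'fixed' metabolite.

    samples: iterable of (x_fixed, x_other, slope) tuples, where slope is the
             estimated time derivative of the metabolite in the chosen equation.
    Returns one list per bin that holds at least two samples; each list contains
    (x_other, slope) tuples sorted by x_other. Within a bin, the slopes give the
    flux that depends on x_other only up to an unknown additive constant.
    """
    bins = defaultdict(list)
    for x_fixed, x_other, slope in samples:
        bins[math.floor(x_fixed / bin_width)].append((x_other, slope))
    # Bins with a single sample carry no relative information and are discarded.
    return [sorted(members) for _, members in sorted(bins.items())
            if len(members) >= 2]


# Hypothetical toy samples pooled from several datasets.
toy_samples = [
    (0.55, 0.9, 2.5),
    (0.52, 0.3, 1.0),
    (0.81, 0.4, 0.2),   # alone in its bin, so it is discarded
    (0.13, 1.0, 0.7),
    (0.17, 2.0, 1.1),
]
pieces = collect_pieces(toy_samples, bin_width=0.1)
```

Each returned piece corresponds to one connected bar in a plot of the second metabolite against relative flux; merging the pieces and fixing the overall vertical shift are separate steps that this sketch does not attempt.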

**(a) Bins of instances of** _{3} **for different values; the range of each bin is chosen as 0.033.** Among the 26 bins, 13 bins have at least two _{3} values; the others are discarded. **(b)** Representation of 13 sets of corresponding _{2} values in those bins that have at least two _{3}. The bars connect two or more _{2} values within each _{3} bin.

**(a) Collection of****pieces exceeding a chosen threshold****(here****= 12 and****= 0.2; see****)**. The green line is the “true” functional representation of _{2} versus _{3}. **(b)** Pairs in (a) are merged based on their distances and on the distances between two points in a pair. **(c)** The subgroups of pairs in (b) are merged. **(d)** The sigmoidal shape of points in (c) suggests that the flux of the smallest _{2} (~0.25) should be close to zero. The sum of errors between the estimated points and their corresponding true values (on the green line) is 0.0551

Instead of _{4}, one could also estimate an additional flux from another equation in Eq. (13) using the same procedure, for example, by solving for _{5} and _{6} in the fourth equation. Flux _{6} depends only on _{4}, but _{5} depends on two variables, _{1} and _{2}. The steps of estimating _{5} and _{6} are described in Additional file

The proposed method was also tested on a five-variable system that has been used as a benchmark problem in many articles (

Glycolysis and trehalose production

This last example describes in a simplified fashion how the baker’s yeast

**Schematic representation of a simplified model of glycolysis and the trehalose cycle in the yeast****(adapted from****).**_{i} and _{i} represent dependent variables and fluxes, respectively. One inhibitory interaction is shown in red. Abbreviations: _{1}, extracellular glucose; _{2}, intracellular glucose; _{3}, glucose 6-phosphate; _{4}, trehalose; _{5}, fructose 1,6-bisphosphate; _{6}, extracellularly accumulating end products (ethanol, glycerol and acetate); _{7}, mass diverted into the pentose phosphate pathway; _{8}, mass consumed by other pathways (_{1}, glucose transport; _{2}, hexokinase and glucokinase; _{3}: aggregated step of all enzymatic steps between glucose 6-phosphate and the production of trehalose; _{4}, trehalase; _{5}, phosphoglucose isomerase and phosphofructokinase; _{6}, aggregated step of all enzymatic steps between fructose 1,6-bisphosphate aldolase and the release of end-products; _{7}, flux into the pentose phosphate pathway; _{8}, flux towards other pathways (leakage). Metabolites without available experimental measurements are shown in gray. The flux _{6} (blue) is directly measurable from the time series of _{6}. Fluxes _{3} and _{4} (green) were estimated using the proposed method

**Experimental metabolite time courses of glucose metabolism determined by**^{13} **C-NMR in****grown under optimal temperature (30 °C) with a single pulse of glucose (65 mM) (adapted from****).** The dots for _{1}, …, _{6} are experimental measurements, while _{7} was determined from the flux _{7}, which was inferred with the methods described in the _{6}

The model contains eight dependent variables and eight fluxes, as shown in Eq. (15), where _{ext} and _{int} represent the extracellular (0.05 L) and intracellular (0.00717 L) volume of the bioreactor and the cell population, respectively. Each of the fluxes is a function of some of the variables, as shown in Eq. (16), but it is important to note that we do not make any assumptions regarding the functional forms of the fluxes. In principle, DFE seems to be directly applicable. However, the time series data contain the measurements of only five of the metabolites, namely Glc (_{1}), G6P (_{3}), Tre (_{4}), FBP (_{5}), and extracellularly accumulated end products (EtOH, Gly, and Ace; _{6}). Without the measurements of _{2}, _{7}, and _{8}, the system in Eq. (15) is not of full rank and, due to the experimental set-up, _{7} and _{8} cannot be measured or determined directly by estimating slopes.
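Whether a flux system is of full rank can be checked directly from its stoichiometric matrix. The following Python sketch uses a small hypothetical network, not the matrix of Eq. (15), and a plain Gaussian elimination with exact fractions; `matrix_rank` and `drop_columns` are illustrative helper names. It shows how treating some fluxes as known (for example, because they were estimated beforehand) can turn an underdetermined system into one with as many independent equations as unknown fluxes.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a small matrix via Gaussian elimination with exact arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def drop_columns(rows, known):
    """Remove the columns of fluxes that are treated as already known."""
    return [[v for j, v in enumerate(row) if j not in known] for row in rows]


# Hypothetical toy network (rows: metabolites A, B; columns: fluxes v0..v3):
#   v0: -> A,   v1: A -> B,   v2: B ->,   v3: A ->
N = [
    [1, -1, 0, -1],
    [0, 1, -1, 0],
]

n_fluxes = len(N[0])
underdetermined = matrix_rank(N) < n_fluxes    # 2 equations, 4 unknown fluxes

# Suppose v0 and v3 have been obtained by other means.
reduced = drop_columns(N, known={0, 3})
solvable = matrix_rank(reduced) == len(reduced[0])
```

In this toy case two fluxes must be supplied before the remaining two can be solved point by point from the slopes; the number of fluxes that must be supplied in a real system depends on its own stoichiometry.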

To complement the rank of the flux system, we use the proposed method of flux estimation. First, one should note that the measurements of Glc (_{1}) concern extracellular glucose. Thus, _{1} is easy to measure experimentally, but it is very difficult to obtain good measurements of intracellular glucose (_{2}), because it is immediately converted into G6P (_{3}). Thus, the proportion of Glc (_{2}) is negligible in comparison to Glc (_{1}), and because the measured concentration of glucose is close to the sum of Glc (_{1}) and Glc (_{2}), we merge _{1} and _{2} into one pool, which is represented by the sum of the first two equations in Eq. (15). Furthermore, the amount of material entering the pentose phosphate pathway (PPP; _{7}) is not directly measurable, but independent lab experiments had indicated that it has a value of approximately 5% of the glycolytic flux; thus

To supplement the underdetermined DFE, we select the equation _{3} and _{4} are available. As before, we fix _{3} at some values (Figure _{4} and –_{4} (Figure _{4}) and the minimum of _{4} is very close to zero. For a concentration close to zero, the value of the flux should be close to zero as well. Therefore, the entire cluster of pairs is moved up by about 4 units, and the updated functional plot is shown in Figure _{3} can now be calculated accordingly and expressed as fluxes versus time. After the determination of _{3} and _{4}, the system of Eq. (15) becomes full rank, and the remaining fluxes at each time point can be solved with DFE even without knowledge of the time series of _{7} and _{8}. Indeed, the time courses of _{7} and _{8} can be calculated via point-by-point integration of _{7} and _{8}. Upon the determination of the concentrations of all variables, the total mass over time can be calculated, confirming no significant loss or gain of mass (Figure
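Two of the operations in this paragraph, the vertical shift of the merged flux-concentration points and the point-by-point integration of a flux, can be sketched in a few lines of Python. The sketch rests on assumptions that go beyond the text: the shift anchors the point with the smallest concentration at a flux of exactly zero, the integration uses the trapezoidal rule, and volume factors and signs are ignored. The function names and numbers are hypothetical.

```python
def anchor_at_zero(points):
    """Shift relative flux values so the lowest-concentration point has flux 0.

    points: list of (concentration, relative_flux) tuples from the merged pieces.
    This is only reasonable if the smallest observed concentration is close to zero.
    """
    _, flux_at_min = min(points)          # tuple with the smallest concentration
    return [(x, f - flux_at_min) for x, f in points]


def integrate_flux(times, flux, initial=0.0):
    """Cumulative trapezoidal integration of a flux time series.

    Returns the accumulated amount at every time point, starting at `initial`.
    """
    amount = [initial]
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        amount.append(amount[-1] + 0.5 * (flux[i] + flux[i - 1]) * dt)
    return amount


# Hypothetical merged points whose relative fluxes sit about 4 units too low.
merged = [(0.5, -3.0), (0.01, -4.0), (1.0, -2.5)]
shifted = anchor_at_zero(merged)

# Hypothetical flux time series for an unmeasured pool.
pool = integrate_flux([0.0, 1.0, 2.0], [0.0, 2.0, 4.0])
```

Summing the measured and integrated pools at each time point, weighted by their carbon content, would then give the kind of mass-balance check described in the text.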

**(a) The experimental concentration data (29 time points) were smoothed and interpolated with a spline function, thereby yielding metabolite levels of** _{3} **at about 300 time points.** These _{3} values were put into 186 different bins with size 0.03. Among these, 54 bins have at least two _{3} values. **(b)** Graph of _{4} values, corresponding to at least two _{3} values in each of the _{3} bins. Selection of **(c)** Pairs in (b) are merged. **(d)** Resulting functional plot of _{4}_{4}; the blue dots represent the dots in (c), while the red triangles represent the true plot of _{4}_{4} in the dynamic model; in reality, these would not be known. **(e)** Functional plot of _{3}_{3} (blue dots), calculated from the blue dots in (d), and true values of _{3} (red triangles) according to the dynamic model. **(f)** Confirmation that the total mass (represented as the number of 3-carbon units) does not change appreciably over time.

Once we have obtained the time series of all fluxes, we can generate the plots of concentrations of metabolites that are involved in the enzymatic reactions (see Eq. (16)) versus a flux. The results are shown in Figure

**Results from the proposed method and subsequent application of DFE to yeast data from the model in Figure****.** Shown are metabolite concentrations against fluxes at different time points (blue dots), connected by inferred trend lines for all fluxes (green lines)

Discussion

Of all steps in the generic mathematical modeling process, parameter estimation and structure identification continue to be among the most severe bottlenecks for modeling biological systems. Until recently, this task was typically pursued from the bottom up by using local data from individual enzymatic steps. However, modern techniques of molecular biology have provided us with a strikingly different estimation strategy, namely a top-down or inverse approach, which is based on dynamic time series data that are being generated with rapidly increasing frequency and quality. Many recent articles have proposed various methods to tackle this inverse estimation problem using time series data. However, none of these methods is effective in all cases. Furthermore, almost all methods have focused on the goodness of fit and the speed of the algorithm, but not necessarily on the quality of fit in terms of the validity of the model, extrapolation ability, and predictive power with respect to data not used in the estimation. In addition, there has been little discussion of diagnostic tools for data fits beyond the residual error. For instance, it is possible that a fit is good in terms of the residual error, but that the estimated fluxes are incorrect because of numerical compensations between terms within the model.

Dynamic Flux Estimation (DFE)

In this article we propose a model-free approach with minimal assumptions to supplement DFE with information already embedded in the time series data. The proposed method starts with the selection of a decoupled equation, preferably one that contains a minimal number of terms and contributing metabolites. Within this equation, we repeatedly fix one or a few variables that have constant or very similar values within certain small ranges, and find the corresponding values of the variables that appear in another flux of the equation. The result of this step is a plot of a flux versus a metabolite, with several pairs of points showing the relative positions of the true metabolite concentration and the flux values in each pair. The position of each pair is initially subject to shifting in the

The proposed method may appear cumbersome or even baroque. However, one should consider that it solves a problem that so far has not even been addressed—let alone solved—with any systematic approach. Also, the method is presently likely to suffer from a lack of suitable data. But judging by the development of high-throughput experimental methods and the number and increasing quality of published time series over the past decade, this issue seems to be primarily a matter of time. Indeed, one should expect that it will soon be feasible to generate strategically selected, multiple datasets for the identification of a system, which differ slightly in their settings. These datasets must come from experiments that do not alter the functional characteristics of the fluxes in the system but might, for instance, measure system responses under modestly different substrate or inhibitor conditions. At the same time, the data should be representative of the dynamics of the system within the pertinent ranges of its variables.

The method involves one step that is subject to bias. Namely, the overall shifting of the flux-metabolite relationship requires extrapolation or some other information, unless metabolite concentrations close to zero are available. To resolve this issue, it might be possible to determine a reference point for the shift from enzymatic or kinetic information. However, in many cases, this information will have been obtained

Outside these remaining details, the proposed method has several notable advantages. First, no assumptions are needed regarding the mathematical representations when determining the individual fluxes. Second, the application of the method is not limited to a small range of a metabolite or its flux. Instead, it allows the modeler to examine the full spectrum of the functional form, depending on how widely the available time series data cover metabolite concentrations along the

Conclusions

In this article we propose a systematic strategy to supplement and ameliorate the limitations of the method of Dynamic Flux Estimation (DFE). The proposed strategy makes no

Competing interests

The authors declare no competing interests.

Authors’ contributions

Ideas and concepts were jointly discussed among both authors. ICC developed and implemented the project under the supervision of EOV. Both authors contributed to the writing of the manuscript. Both authors read and approved the final manuscript.

Acknowledgments

The authors are grateful to Dr. Luis L. Fonseca for constructive discussions and for allowing us to use some of his data. They also acknowledge David Fieni’s work on an automated shifting algorithm. This work was supported in part by a Molecular and Cellular Biosciences Grant (MCB-0946595; E.O. Voit, PI) from the National Science Foundation, a grant from the National Institutes of Health (R01 GM063265; Y.A. Hannun, PI), and an endowment from the Georgia Research Alliance. The work was also in part funded by the BioEnergy Science Center (BESC), which is a U.S. Department of Energy Bioenergy Research Center supported by the Office of Biological and Environmental Research in the DOE Office of Science. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsoring institutions.