Abstract
Background
The increasing availability of models and data for metabolic networks poses new challenges in what concerns optimization for biological systems. Due to the high level of complexity and uncertainty associated to these networks the suggested models often lack detail and liability, required to determine the proper optimization strategies. A possible approach to overcome this limitation is the combination of both kinetic and stoichiometric models. In this paper three control optimization methods, with different levels of complexity and assuming various degrees of process information, are presented and their results compared using a prototype network.
Results
The results obtained show that BiLevel optimization lead to a good approximation of the optimum attainable with the full information on the original network. Furthermore, using Pontryagin's Maximum Principle it is shown that the optimal control for the network in question, can only assume values on the extremes of the interval of its possible values.
Conclusions
It is shown that, for a class of networks in which the product that favors cell growth competes with the desired product yield, the optimal control that explores this tradeoff assumes only extreme values. The proposed BiLevel optimization led to a good approximation of the original network, allowing to overcome the limitation on the available information, often present in metabolic network models. Although the prototype network considered, it is stressed that the results obtained concern methods, and provide guidelines that are valid in a wider context.
Background
Current metabolic engineering processes allow to manipulate metabolic networks to improve the desired characteristics of biochemical systems [1]. These manipulations may lead to the maximization of the normal product yield or redirect the production to a flux that was residual or nonsignificant in the original network. The high level of uncertainty in metabolic network models knowledge makes it extremely difficult to determine what are the required manipulations needed to attain a given objective. Since an heuristic approach to such problems does not allow to explore the maximum potential of metabolic engineering, two approaches are usually considered when modeling metabolic networks. Kinetic models describe the complete dynamics of the network, and have proven useful to implement optimization and control over the network, such as in [2]. The creation of reliable kinetic models involves the estimation of parameters, the complexity of this task increasing with the size of the network considered.
The second approach models the networks on the basis of reaction stoichiometry. Although easier to obtain, these models lack the ability to directly predict the dynamics of the system.
Several techniques have been proposed to optimize and infer network characteristics from these models. In [3] a platform that combines many of these methods is presented. Flux Balance Analysis (FBA) allows the determination of the optimal flux distribution on a network described in terms of the stoichiometry of the reactions and yields reliable results in the study of metabolic systems [46]. A review of the method can be seen in [7].
When optimizing a metabolic network for a given objective two distinct problems must be addressed. The first is to find which branch or branches must be manipulated. The second is to determine what type of alterations must be done. Strategies such as OptKnock [8] and the work in [9] address the first problem. In this work a strategy for the second problem is described.
The simulation and engineering of metabolic network models typically involves complex optimization procedures. Geometric Programming (GP), one of the techniques used in this paper, is a powerful mathematical optimization tool that can be applied to problems where the objective and constraint functions have a special form [10]. GP is of particular interest because it can solve large scale problems with extreme efficiency and reliability [11]. Furthermore it has been shown that a problem formulated in SSystems form can be solved with GP after a minimum adaptation [12].
A common optimization problem is the maximization of the final concentration of a metabolite whose formation competes with the natural objective of the cell (e.g. maximization of biomass). In this work, a prototype network with such behavior is taken as example and the corresponding optimization problem is solved with three alternative methods.
It is stressed that the emphasis of this work is on the methods and not the specific network considered. The key point of the paper consists in establishing properties of a number of optimization methods that may serve as guidelines when considering more complex networks.
This will be further explained in the next section.
Results and Discussion
An overall view of the problem considered and paper contributions is first presented. Details may then be seen in subsequent sections.
The problem to consider consists in finding a control function, defined over a finite interval of operation time, such that the final concentration of a desired product is maximum. This product is yielded by a metabolic network that, depending on the control function, either produces it or a product that favors cell growth. In order to settle ideas, assume that the control variable u is such that it is constrained to be in the interval [0, 1], with u = 0 corresponding to only production of cell growth product and no production of the desired product, and u = 1 corresponds to the inverse situation. Values of u in between 0 and 1 correspond to a mixed production in a way that depends on the network dynamics.
Since the optimization is with respect to a time function, this is an in finite dimensional problem. However we prove in this paper, using Pontryagin's Maximum Principle [13], that the optimal control only assumes values of 0 and 1. This is a priori assumed by other authors [14] and receives now a solid justification. It is a result valid for similar metabolic network problems that aim at optimizing a final yield (e.g. a concentration at the end of the optimization time interval, such as in [15]) and such that the control enters linearly in the network equations.
The significance of this result consists in the fact that, instead of searching the optimal control among piecewise continuous functions assuming values between 0 and 1, one only has to look functions assuming the extreme values of 0 and 1.
Furthermore, in the case study considered, it is shown that the optimum has only one switch between 0 and 1. Therefore the search for the optimum is reduced to find the switching instant, t_{reg}, that leads to the maximum final yield. Considering the structure of the metabolic network, this is intuitive: the optimum is achieved by first applying all cell resources to population growth and, after t_{reg}, to redirect them to desired production. If t_{reg }is too small, the desired production rate is higher during more time, but the cell population to which it applies is small. If t_{reg }is too big, there are many cells to produce, but they only act during a small time interval. Hence, there is an optimum value for t_{reg}.
As mentioned in the Background section, a major problem is the high level of uncertainty in the knowledge about metabolic network dynamics. In this respect we consider different optimization algorithms that assume various degrees of information about the system to be optimized.
The first is direct optimization. This assumes complete knowledge about the system and is included to establish a benchmark with which other methods may be compared.
The other two methods are variants of a bilevel algorithm designed in order to accommodate missing information on the network kinetics. Both cases differ from the type of inneroptimization: Geometric Programming in one case and Linear Programming in the other. Both methods lead to good approximations of the optimal control, with a slight advantage of the one relying on Geometric Programming.
Prototype network model
The optimization strategies were tested on a prototype network that is a modified version of a previously one suggested in [16]. The choice of this network was due to its widespread use as a test benchmark for several optimization algorithms. A graphical representation of the network is shown in Figure 1 associated with the following set of ordinary differential equations:
Figure 1. Prototype network. The circles correspond to metabolites and the arrows to fluxes with the reaction rates indicated.
Here the states x_{i }, i = 1,..., 5 are metabolite concentrations at the network nodes, v_{i}, i = 1,..., 4 are fluxes associated to the metabolic network branches and k is a constant parameter that represents the uptake of x_{1}. In the equations, u represents a control function that allows to redirect the flux between the branches x_{2 }→ x_{3 }and x_{2 }→ x_{4}. Assuming that x_{3 }represents a precursor of the cellular objective (such as growth) and x_{5 }the desired product, if u(t) is biased towards the branch of v_{2 }this yields the formation of x_{3 }but little or no production of x_{5}. If u(t) is biased towards the branch of v_{3 }the production of x_{5 }will be affected by the low concentration of x_{3 }(since there is a forward feedback). Thus, there is an optimal profile for u(t) to maximize the concentration of x_{5 }at the final time t_{final}.
In the framework of Ssystems [16] the prototype network is described by:
where β_{i }are the rate constants, g_{ij }and h_{ij }are the kinetic orders. Table 1 shows the list of parameters. All the simulations using the prototype network assume x(0) = [0.8 0 1 0 0].
Table 1. Parameters used in the prototype network.
Direct optimization
Direct optimization uses model (2) with the set of parameters from Table 1.
On a first approach, all possible integer values of t_{reg }in the interval t_{reg }= [1, 30] were used to compute the final product concentration x_{5}(t_{final}) where t_{final }= 30s. Figure 2 plots the resulting function J(t_{reg}) = x_{5}(t_{final}).
Figure 2. Results of the simulation using Direct optimization. The final product concentration is shown as a function of T_{reg}. For the value of T_{reg }corresponding to the dotted line there is a maximum yield.
It is clear from Figure 2 that there is an optimal value for the time of regulation that maximizes the yield of x_{5}. For the network considered, the optimal time of regulation is t_{reg }= 9s. If u(t) switches from 0 to 1 before t_{reg }the formed biomass will not be enough to maximize x_{5}(t_{final}). On the other hand, if u(t) switches from 0 to 1 after t_{reg}, there will be more biomass but there time will not be enough time to produce the maximum possible amount of x_{5}.
To illustrate better the behavior of the prototype network, simulations were made for t_{reg }= 4, t_{reg }= 9 and t_{reg }= 14. The obtained optimal t_{reg }= 9 is compared in Figure 3 with lower and upper values in order to show the different time evolution of the metabolites.
Figure 3. Comparison of three u(t) profiles. Three time profiles for the control function u(t) (above) and the corresponding product yield (below). The solid line is the optimal T_{reg }obtained by Direct Optimization.
BiLevel Optimization
The BiLevel optimization was used to test all the possible values of t_{reg}. Figure 4 plots the normalized curves for J(t_{reg}) = x_{5}(t_{final}) for the two optimizations, inneroptimization using Geometric Programming (GP) and inneroptimization using Linear Programming (LP). By comparing Figure 4 with Figure 2 it can be seen that the profiles remain similar. The final product yield, x_{5}(t_{final}), increases with t_{reg }until the optimal value is reached, then it starts decreasing.
Figure 4. Result of the optimization using the Inner Optimization with Geometric Programming (left) and Linear Programming (right). The profiles of the production of x_{5 }remain similar to the simulation using Direct optimization.
The optimal time of regulation obtained with both GP and LP on the inner optimization was t_{reg }= 9. As shown mathematically in the methods section, the optimal control function is either 0 or 1, provided that the dynamics depends linearly on the control and the cost to optimize has only a final term.
In this case the dependency of the Hamiltonian function on u is linear (as given by (8) below). For the prototype considered, . Figure 5 shows a plot of ϕ(λ(t), x(t)) obtained with a nearoptimal control function u(t). As expected, ϕ(λ(t), x(t)) is negative for values smaller than t_{reg}, leading to an optimal control u(t) = 0 and becomes positive for values larger than t_{reg}, leading to u(t) = 1. Thus, the optimal control is obtained on the extremes of the allowed interval and furthermore, one single switch (from 0 to 1) is enough to achieve the optimal control. It should be remarked that, since ϕ(λ(t), x(t)) is close to zero around t = t_{reg}, in practice, when using a numeric method there can be some jittering in the transition of the manipulated variable.
Figure 5. Plot of ϕ(λ(t), x(t)) (9). This function changes sign at the optimal instant of control switching T_{reg}.
Conclusions
For a class of networks in which the yield of the product that favors cell population growth (the "natural" product) competes with the desired product yield, with the manipulated variable affecting linearly the fluxes, it has been shown that the optimal control assumes only extreme values. While the implementation of this optimal control poses no challenge on in silico metabolic networks, on real metabolic networks complex bioengineering skills are required. Gene knockout manipulations do not adequate to this kind of control problem due to the long time scale associated with these techniques. The manipulation of specific enzyme levels, controlled by modulating the expression of the corresponding genes using promoter systems and inducers, is a possible solution to this kind of control problem [14].
The use of a bilevel optimization strategy, that maximizes the natural product in the inner level by manipulating the fluxes, leads to a good approximation to the optimal solution, with the advantage of not requiring the full knowledge of the network model. Real networks are extremely complex and exhibit relations between metabolites that are not always expected or fully understood. This gives emphasis to the need of good in silico models and also to the determination of the exact branches to be modified when optimizing a network. Although the example network used is very simple, it has proved to be useful to test the optimization strategies but a more complex network should be used to confirm that the strategy can be scaled to a larger network.
Methods
The optimization problem
The optimization problem consists in relation to a nonlinear state model of a metabolic network like (2), in selecting u(t) for t ∈ [0, t_{final}] such that:
is maximized under the constraint that u(t) ∈ [0, 1], ∇t ≥ 0.
The solution of the optimization problem is obtained using different approaches. Before accomplishing this task, Pontryagin's Maximum Principle is invoked to establish a particular form of the optimal control function for the class of problems at hand.
Optimization
The control function is now optimized in order to obtain a maximum yield of biomass at the end of the runtime (t_{final}). Three different methods, assuming various levels of information about the network, are considered in order to attain this goal.
The first method, direct optimization, is used as a benchmark to compare the results of the other methods. The last two methods rely on a Bilevel optimization and illustrate a possible solution to the optimization problem when the information about the network is incomplete.
Direct optimization
The first method, Direct Optimization, is used mainly as a benchmark, to compare the results of the following methods. Since it is assumed that all the information about the network kinetics is known, the system of differential equations, described in (2) is used. Given a function that receives t_{reg }as input and outputs the final yield of x_{5}, this optimization tests all the possible values of t_{reg }and returns the function J(t_{reg}) = x_{5}(t_{final}). The value of t_{reg }that results on a maximum product yield is then determined by solving a simple optimization problem.
The optimization was tested with two MATLAB functions: fmincon, from the standard optimization toolbox, that finds the minimum of a constrained nonlinear multi variable function, and simannealingSB from Systems Biology Toolbox [17] that performs simulated annealing optimization.
BiLevel Optimization algorithm structure
The BiLevel optimization algorithm was structured so as to accommodate missing information on the network kinetics. The boxed metabolites and fluxes from Figure 1 are a part of the network that might not be fully described in terms of kinetics. In this approach the missing kinetic information is replaced by stoichiometric data and flux balance analysis is used to obtain the proper flux distribution. Then, an inner optimization determines the fluxes during the batch time. The first step of the inner optimization process is to define the initial conditions of the input x_{1 }and outputs x_{3}, x_{5}. A valid distribution for the fluxes v_{1}, v_{2}, v_{3 }and v_{4 }is then obtained.
After obtaining the flux distribution, new values for the input/outputs can be calculated by integrating their expressions in the considered time interval. During this time interval the function u(t) and the values of v_{1}, v_{2}, v_{3 }and v_{4 }are kept constant. This process is repeated along a time grid from t = 0 to t = t_{final}. The time interval for the integration was defined to be 1 second. The inner optimization process allows us to obtain the product yield, x_{5}(t_{final}), given a certain u(t), taking into account a valid approximation of the network dynamics over the simulation time. The detailed fluxogram of the inneroptimization is shown in Figure 6.
Figure 6. InnerOptimization algorithm. Block diagram of the InnerOptimization algorithm.
The bilevel optimization algorithm can be represented schematically as in Figure 7.
Figure 7. BiLevel optimization formulation. Structure of the BiLevel optimization.
Inneroptimization using Geometric Programming
On the first implementation of the BiLevel optimization algorithm the dynamics of the boxed metabolites from Figure 1 are used but, following the algorithm structure, steadystate is assumed. Thus, and from (2) become:
In this algorithm implementation, the inner optimization problem determines the profile of the metabolites, instead of fluxes, due to the nature of the equations. The metabolite concentrations are calculated at the beginning of each time interval, solving a Geometric Programming problem, and used with (2) to integrate the values of x_{1}, x_{3 }and x_{5 }during that interval.
Inneroptimization using Linear Programming
On the second implementation it is assumed that only stoichiometric information is available for the reactions inside the box of Figure 1. Assuming steady state, the equations of and become:
Figure 1 shows a regulation from x_{3 }(Biomass) to flux v_{3}. Since stoichiometric models do not account for feedbacks, the effect of x_{3 }can not be integrated directly in the equations. Assuming that the forward feedback leads to an over expression of flux v_{3}, then a valid solution is to model the forward feedback as a variation of the constraints applied to flux v_{3}. Setting flux v_{2 }(precursor of Biomass formation) as the objective function, the FBA problem is solved with the previous equations to obtain a valid and unique flux distribution at each time step. In the context of the inneroptimization, these fluxes are then used to calculate the values of the input/outputs.
Pontryagin's Maximum Principle
A general tool to solve dynamic optimization problems such as the one considered here is Pontryagin's Maximum Principle PMP [13].
Let x be the state of a dynamical system with control inputs u such that:
where F : ℜ^{n }× ℜ → ℜ^{n}, U is the set of valid control inputs and T is the final time, assumed here to be constant.
The control function u must be chosen in order to maximize the functional J, defined by:
Where ψ is the cost associated with the terminal condition of the system and L the Lagrangian.
According to PMP, a necessary condition for the optimal control is that, along the optimal solution for the state x, costate λ and control u the Hamiltonian H is maximum with respect to u [13].
Comparing the cost (3) with the generalized case (7) and taking into consideration that, in the case at hand, given by (1), the dynamics vector field depends linearly on the control, it follows that
where ϕ(λ, x) is a function that does not explicitly depend on u. Since, according to (8), the Hamiltonian is linear in u, its maximum is obtained at the boundary of the admissible control set U.
Hence, this shows that, for the metabolic network (1), the control that optimizes (3) only assumes the values u = 0 or u = 1.
In the case at hand, we are interested in maximizing the final value of the state x_{5}. Since the Lagrangian (L) is zero, (7) becomes J(u) = ψ(x_{i}(T)). Thus, the functional J to be maximized is:
as shown before in (3).
Taking into account that, L = 0 the adjoint equations are reduced to
The network is described by the system of ordinary differential equations in (2), if we consider the state model in the form of f(x, u), where u is the control function, calculating f_{x}(x, u) is straightforward.
Thus
The terminal conditions for the costates λ are
Since L = 0 the Hamiltonian is given by λ^{T }f(x).
Substituting in the expression and after some manipulation, becomes:
that depends linearly on the control function u, as expected.
The derivative of the Hamiltonian in order to the control function is:
This expression is the one that determines the number of switches between 0 and 1 of the control variable.
Authors' contributions
AD helped in the research of the state of the art, implemented the software and drafted the manuscript. SV was involved in the creation and modeling of the prototype network, formulation of the optimization processes and helped to draft the manuscript. JML provided the mathematical basis for the optimization and control techniques and helped to draft the manuscript. All authors read and approved the final manuscript.
Acknowledgements
The work reported in this paper was performed within the project DynaMo  Dynamical modeling, control and optimization of metabolic networks, supported by FCT (Portugal) under contract PTDC/EEAACR/69530/2006.
References

Nielsen J: Metabolic engineering.
Appl Microbiol Biotechnol 2001, 55(3):26383. PubMed Abstract  Publisher Full Text

Chu WB, Constantinides A: Modeling, optimization, and computer control of the cephalosporin C fermentation process.
Biotechnol Bioeng 1988, 32(3):27788.
[Chu, W B Constantinides, A United States Biotechnology and bioengineering Biotechnol Bioeng. 1988 Jul 20;32(3):27788.]
PubMed Abstract  Publisher Full Text 
Rocha I, Maia P, Evangelista P, Vilaca P, Soares S, Pinto JP, Nielsen J, Patil KR, Ferreira EC, Rocha M: OptFlux: an opensource software platform for in silico metabolic engineering.
BMC Syst Biol 4:45. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Edwards JS, Covert M, Palsson B: Metabolic modelling of microbes: the fluxbalance approach.
Environ Microbiol 2002, 4(3):13340. PubMed Abstract  Publisher Full Text

Varma A, Palsson BO: Stoichiometric flux balance models quantitatively predict growth and metabolic byproduct secretion in wildtype Escherichia coli W3110.
Appl Environ Microbiol 1994, 60(10):372431. PubMed Abstract  PubMed Central Full Text

Schilling CH, Edwards JS, Letscher D, Palsson BO: Combining pathway analysis with flux balance analysis for the comprehensive study of metabolic systems.
Biotechnol Bioeng 2000, 71(4):286306. PubMed Abstract  Publisher Full Text

Llaneras F, Pico J: Stoichiometric modelling of cell metabolism.
J Biosci Bioeng 2008, 105:111.
[Llaneras, Francisco Pico, Jesus Research Support, NonU.S. Gov't Review Japan Journal of bioscience and bioengineering J Biosci Bioeng. 2008 Jan;105(1):111.]
PubMed Abstract  Publisher Full Text 
Burgard AP, Pharkya P, Maranas CD: Optknock: a bilevel programming framework for identifying gene knockout strategies for microbial strain optimization.
Biotechnol Bioeng 2003, 84(6):64757. PubMed Abstract  Publisher Full Text

Rocha M, Maia P, Mendes R, Pinto JP, Ferreira EC, Nielsen J, Patil KR, Rocha I: Natural computation metaheuristics for the in silico optimization of microbial strains.
BMC Bioinformatics 2008, 9:499. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Koh K, Kim S, Mutapic A, Boyd S: GGPLAB: A simple Matlab toolbox for Geometric Programming.
2006.

Boyd SP, Vandenberghe L: Convex Optimization. Cambridge University Press; 2004.

MarinSanguino A, Voit EO, GonzalezAlcon C, Torres NV: Optimization of biotechnological systems through geometric programming.
Theor Biol Med Model 2007, 4:38. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Lewis F, Syrmos V: Optimal Control. 2nd edition. John Wiley & Sons Inc. New York; 1995.

Kapil G, Gadkar RM III, F JD: Optimal genetic manipulations in batch bioreactor control.
Automatica 2006, 42(10):17231733. Publisher Full Text

Gaspar P, Neves AR, Ramos A, Gasson MJ, Shearman CA, Santos H: Engineering Lactococcus lactis for Production of Mannitol: High Yields from FoodGrade Strains Deficient in Lactate Dehydrogenase and the Mannitol Transport System.
Appl Environ Microbiol 70 PubMed Abstract  PubMed Central Full Text

Sorribas A, HernandezBermejo B, Vilaprinyo E, Alves R: Cooperativity and saturation in biochemical networks: a saturable formalism using Taylor series approximations.
Biotechnol Bioeng 2007, 97(5):125977. PubMed Abstract  Publisher Full Text

Schmidt H, Jirstrand M: Systems Biology Toolbox for MATLAB: a computational platform for research in systems biology.
Bioinformatics 2006, 22(4):514515. PubMed Abstract  Publisher Full Text