Abstract
Background
Understanding complex systems through decomposition into simple interacting components is a pervasive paradigm throughout modern science and engineering. For cellular metabolism, complexity can be reduced by decomposition into pathways with particular biochemical functions, and the concept of elementary flux modes provides a systematic way for organizing metabolic networks into such pathways. While decomposition using elementary flux modes has proven to be a powerful tool for understanding and manipulating cellular metabolism, its utility, however, is severely limited since the number of modes in a network increases exponentially with its size.
Results
Here, we present a new method for decomposition of metabolic flux distributions into elementary flux modes. Our method can easily operate on large, genomescale networks since it does not require all relevant modes of the metabolic network to be generated. We illustrate the utility of our method for metabolic engineering of Escherichia coli and for understanding the survival of Mycobacterium tuberculosis (MTB) during infection.
Conclusions
Our method can achieve computational time improvements exceeding 2000fold and requires only several seconds to generate elementary mode decompositions on genomescale networks. These improvements arise from not having to generate all relevant elementary modes prior to initiating the decomposition. The decompositions from our method are useful for understanding complex flux distributions and debugging genomescale models.
Background
Computational analysis of cellular metabolism has recently gained increasing prominence and importance. In particular, genomescale computational models capturing stoichiometric and thermodynamic constraints have been published for over 30 organisms [1] ranging from relatively simple prokaryotes such as E. coli, to complex eukaryotes such as Homo sapiens [2,3]. The application of such computational models is dependent on their accuracy and the tools developed for their analysis. The maintenance of model accuracy, or debugging, is an ongoing process by which model predictions are validated against experimental observations and discrepancies are identified and corrected. This process is clearly linked with the use and analysis of the models. These computational models can be analyzed in a number of ways. Fluxbalance analysis (FBA) and elementary mode analysis are perhaps two of the most popular and powerful.
FBA determines a distribution of steadystate reaction fluxes that satisfies the constraints of the model and that optimizes a biological objective function such as biomass or adenosine triphosphate (ATP) production [4]. With appropriate constraints and a biological objective, FBA has been shown to be an effective method for prediction of phenotypes associated with genetic manipulations such as knockouts [5,6] and of intracellular metabolic fluxes [7]. A significant application for FBA, therefore, is metabolic engineeringusing computational predictions of metabolic phenotype under genetic manipulations to guide the engineering of metabolically optimized strains [8]and computational methods have been devised to search the space of genetic manipulations in silico for those that yield the desired phenotype [913]. But, while FBA has had a number of successful applications, the method gives little insight into its predictions, hindering human understanding and model debugging.
Elementary mode analysis, on the other hand, yields no explicit predictions of metabolic behavior, and seeks primarily to allow understanding of an organism's metabolic capabilities. In elementary mode analysis, the elementary flux modes (EFMs) that define minimal sets of reactions capable of operating at steady state are generated [14]. Elementary flux modes formalize the concept of a biochemical pathway [15], and studying the modes associated with a given metabolic network has been shown to be effective for understanding its function, regulation, and robustness [16,17]. The principal drawback of elementary mode analysis is that the number of EFMs in a network suffers from a combinatorial explosion [18], and the use of complete sets of EFMs gives rise to problems with scalability when applied to genomewide models [19]. For example, more than two million modes exist for a simple model of E. coli central metabolism consisting of 112 reactions [20], which increases to more than 26 million modes when the possible substrates are expanded [21]. iAF1260 [6], the most recent genomescale metabolic network of E. coli, consists of 2077 reactions, and the number of reactions in E. coli metabolic models has increased steadily for the past two decades [22]. Thus, the computation time and memory storage required to enumerate all the EFMs of full and detailed genomescale metabolic networks are prohibitively large.
For many applications, however, our understanding of particular metabolic functions of an organism, such as its ability to produce a desired metabolite, is of greater interest than our understanding of its full metabolic capabilities. In this case, it is not necessary to know all the EFMs of the network, but simply those that combine to give rise to a particular behavior. Previous approaches to this problem have relied on first computing all modes of the network [23,24]. More recently, an approach presented by de Figueiredo et al. [25], found biologically significant EFMs by identifying the Kshortest EFMs of the metabolic network without necessarily enumerating the whole set. Our motivation is similar in that we are not concerned with generating all the EFMs of a network. However, our approach differs greatly in that we wish to determine those elementary modes that combine to yield a given flux distribution. This flux distribution can consist of both measured and computationally predicted fluxes.
We present an algorithm to find the elementary modes that combine to produce any previouslyspecified flux distribution. This links the advantages of elementary mode analysis with the advantages of flux balance analysis, without requiring the prohibitive computation of all elementary modes. Our method is therefore applicable to genomescale models and, because it can take as an input any flux distribution, it can be connected to particular experimental conditions [26] or genetic modifications [10].
To demonstrate the utility of our method and its applicability to genomescale models, we apply it to genomescale models of E. coli and MTB and show how its results can be used to interpret flux distributions related to the metabolic engineering of E. coli and to the survival of MTB during infection.
Results and Discussion
Overview
Our method takes a given steadystate metabolic flux distribution and the corresponding metabolic model, and produces a decomposition of the flux distribution into elementary flux modes. In this paper, we use flux distributions obtained by FBA, but our method can equally be applied to flux distributions obtained by alternative means, such as those derived from experimental measurement or obtained from genetic perturbation analysis methods such as MOMA [27] or ROOM [28]. As an elementary flux mode is itself a set of reactions operating at steady state, any flux distribution composed entirely of elementary flux modes must necessarily operate at steady state too. Therefore, an input flux distribution derived from experimental measurement may need to be balanced first to produce a steady state flux distribution, either by regression to fit the measured data or alternative means.
Our method operates by first selecting the reaction with nonzero flux of maximum magnitude from the given flux distribution. The algorithm then uses mixedinteger linear programming to find an elementary flux mode that both contains the selected reaction and is contained in the given distribution. The contribution of this elementary flux mode to the given distribution is determined before it is removed to give an updated flux distribution. The updated flux distribution is used as the input distribution for the next iteration of the algorithm, and this procedure is repeated until the updated flux distribution is zero (see Methods for details).
Elementary mode decompositions are not unique. Our goal is to assist in biological interpretation, and we are primarily interested in obtaining a valid decomposition rather than any specific decomposition. A valid decomposition will arise irrespective of the choice made at each iteration of the reaction with nonzero flux, but how this choice is made determines the specific decomposition. Our choice has some desirable properties. As the elementary flux mode found by the mixedinteger linear program includes the chosen reaction, the flux through this reaction will upper bound the contribution of the elementary flux mode. By selecting the reaction with nonzero flux of maximum magnitude, we avoid the possibility of generating many elementary flux modes with very small weightings and hence minimal contribution to the flux distribution. This choice also minimizes numerical instabilities arising from the calculations.
Although mixedinteger linear programming is NPhard in general, some large mixedinteger linear programs (MILPs) can be solved with a modest amount of computation by solvers such as SCIP [29], IBM ILOG CPLEX (International Business Machines Corp., Armonk, New York), and Gurobi (Gurobi Optimization, Houston, Texas). In particular, our algorithm, implemented using Gurobi, successfully terminates in at most several seconds in all the genomescale applications mentioned in this paper.
By contrast, previous approaches to decomposing flux distributions into elementary modes [23,24] have relied on first generating the complete set of relevant elementary modes, then calculating a weight distribution among these modes. Since elementary mode decompositions are not unique [23], the principal advantage of these previous approaches in comparison to ours is that they are capable of selecting a particular unique decomposition based on some criterion (typically the minimization of the Euclidean norm of the weight vector), while our approach simply generates a valid decomposition among all possibilities. It is not clear, however, that criteria for selecting the weight distribution, such as the minimization of the Euclidean norm, are biologically meaningful. As we will see with the examples in this paper, simply having a valid decomposition is in itself useful.
The principal drawback of these previous approaches is that generating the complete set of relevant elementary modes can be prohibitive, particularly for genomescale models. To demonstrate this fact, we used efmtool [21] to efficiently generate the relevant elementary modes for the genomescale model of MTB by Jamshidi and Palsson [30], iNJ661, for growth on Middlebrook 7H9a standard growth medium for MTB. With the elementary modes generated by efmtool, we then applied the quadratic programming approach proposed by Schwartz and Kanehisa [23] to obtain a distribution of weights to assign to these modes. The total computational time of this approach was 34 minutes and 27 seconds.
By contrast, our approach generates a valid elementary mode decomposition consisting of 19 modes in 1.0 secondsa computational time improvement exceeding 2000fold. iNJ661 consists of 1,028 reactions, which is modest for current genomescale models [1]. With the ever increasing size and complexity of genomescale models [22] and the exponential manner in which the number of elementary modes increases with the size of the network, it follows that in some cases, the advantages offered by our approach may not simply be a severalthousand fold improvement in computational time, but rather the difference between practical feasibility and infeasibility.
Indeed, iNJ661v [31], a recent model that modifies iNJ661 with the aim of more accurately modeling MTB in in vivo infection, is only slightly larger than iNJ661, with 1,049 reactions. However, a more complex growth medium resulted in a significantly larger number of relevant elementary modes for the flux distribution obtained by FBA than that for iNJ661, and we were unable to generate them all using efmtool because of memory limitations. Using our decomposition method, we generated a valid elementary mode decomposition consisting of 27 modes in 2.5 seconds.
Metabolic engineering of E. coli
To demonstrate the utility of our approach for metabolic engineering, we consider the metabolic engineering of E. coli for increased acetate productiona problem that has received attention owing to the value of acetic acid for its industrial and food uses [32]. Knockout strategies for increased production of acetate based on FBA have previously been reported by Lun et al. [10] using the genomescale metabolic reconstruction of E. coli by Feist et al. [6], iAF1260. These knockout strategies were generated using GDLS (Genetic Design through Local Search) [10], an efficient heuristic for generating metabolic engineering strategies involving multiple knockouts from genomescale models, extending the capabilities of the computationallyexpensive optimal search proposed by OptKnock [9]. The strategies were chosen using FBA to have high predicted production of acetate whilst maintaining required energy production and growth to ensure viability. Specifically, a predicted nongrowthassociated ATP maintenance (NGAM) flux of at least 8.39 h^{1 }and a biomass flux of at least 0.05 h^{1 }were required.
The proposed strategies make sense biologically [10] and include experimentallytested knockouts for acetate production such as alcohol dehydrogenase and ATP synthase [32]. However, the cause for the increased acetate production is not immediately clear from the metabolic phenotype predicted by FBA alone. Table 1 shows the number of reactions that have been added to or removed from the flux distributions of the geneknockout mutants when compared with the wildtype flux distribution. As we can see, the changes in the flux distributions are quite extensive and, in particular, they do not result simply from shunts around the blockages created by the gene deletions. The inherent structure of the network, and hence the mechanisms by which predicted metabolic behavior arises, is difficult to ascertain from the flux distribution as a whole. We, therefore, applied our decomposition algorithm to determine the elementary flux modes that make up the flux distributions under these knockout strategies. We took the best knockout strategies reported by Lun et al. [10] with numbers of knockouts ranging from 1 to 8. For the purposes of these strategies, a single knockout is considered to consist of all genes capable of catalyzing a reaction, i.e. the enzyme complex or all complexes with the same metabolic function.
Table 1. Number of changes to reactions used in each E. coli knockout mutant compared with the wildtype (WT)
Table 2 shows the elementary flux modes we obtained. The flux distribution for each knockout mutant can be decomposed into two elementary modes, which together supply the energy and growth requirements of the organism. We emphasize that elementary mode decomposition will identify the structural components that are important to the metabolic phenotype. The focus on the NGAM and biomass components reflects their biological importance to the underlying biology of the model. The first mode only contributes to the required NGAM flux, and uses relatively few reactions. In comparison, the second mode is solely responsible for producing biomass and involves many more active reactions. This corresponds with the large number of metabolites required for biomass production. Furthermore, as noted by Stelling et al. [16], the unmodified E. coli metabolic network displays a degree of robustness as evidenced by the varied pathways by which biomass and NGAM are produced. The knockouts force flux onto alternative pathways for producing these necessary metabolic components, and these pathways produce acetate as a sideproduct. By finding and examining the elementary modes associated with the acetateenhancing knockout strategies, we uncover the mechanism responsible for the increased acetate flux, which was not apparent from the FBA analysis.
Table 2. Elementary modes for acetateproducing E. coli knockout strategies^{a}
Most interestingly, when the modes are examined in terms of acetate production, we find that the most efficient modes are generally those that only contribute to NGAM flux. However, a biomassproducing mode is necessary to satisfy growth and viability requirements. Thus, the problem of selecting knockouts to maximize acetate production, given a limiting resource such as glucose, is not necessarily about finding a single optimal elementary mode. Rather, competing constraints demand that the chosen modes need to complement each other well. This can be seen in the reported decompositions for the various knockout mutants.
For example, of all the biomassproducing modes, the mode arising from the fiveknockout mutant is the most efficient at 1.496 mmol of acetate per mmol of glucose. When more knockouts are allowed, the overall acetate production is improved despite a decrease in the acetate production efficiency of the biomassproducing mode. This decrease is offset by a shift towards using NGAMproducing modes with significantly more efficient production of acetate. For the six and eight knockout cases, the NGAMproducing mode generates 1.690 mmol of acetate per mmol of glucose.
Finally, as FBA does not necessarily yield a unique distribution, we implemented the recursive MILP algorithm from Lee et al. [33] to find alternate optima and then obtained decompositions for the corresponding flux distributions. Our results (not shown) indicate that the decomposition into two modes with distinct functions related to NGAM and biomass production is preserved for alternate optima. Thus, the decomposition of flux distributions into primarily NGAMproducing and biomassproducing modes is a robust quality.
Understanding the survival of MTB during infection
To illustrate another application of our approach, we consider the utilization of the glyoxylate shunt in MTB. The glyoxylate shunt enzyme isocitrate lyase (ICL), present in MTB as two isoforms, is believed to be required by microorganisms to utilize fatty acids as a source of carbon and energy. This shunt has previously been shown to be required for the in vivo growth and virulence of MTB [34]. Since MTB is believed to subsist on fatty acids during infection [34,35], it is argued that, by removing ICL, MTB is no longer able to utilize fatty acids for carbon and energy and therefore unable to grow in vivo. Indeed, strains of MTB absent in both isoforms of ICL are unable to grow on fatty acid substrates and unable to survive in macrophages and mice [34]. Therefore, ICL has attracted significant attention as a promising drug target for treatment of MTB infection [36].
It is possible, however, that given the robustness that is generally observed in metabolic networks [16], such a vital function would not simply rest on a single enzymeeven one present as two isoforms. Using our method, we can study the metabolic capabilities of MTB growing on fatty acid substrates. In particular, we determined the elementary modes used by MTB at differing uptake rates of octadecenoate using the genomescale metabolic reconstruction of MTB by Beste et al. [37] (see Figure 1). We found that there exist modes that generate biomass and/or NGAM both using and not using ICL. At a given NGAM flux, the modes that use ICL generally produce biomass more efficiently than those that do not. However, for a given uptake of octadecanoate, the modes that produce the most biomass are those that do not use ICL. Therefore, the modes available to maximize the biomass production while maintaining a given NGAM requirement will depend on the supply flux of octadecenoate. When the supply is sufficiently high, the NGAM requirement is easily met by the high efficiency biomass producing modes that do not use ICL, but when it is lower, use of ICL allows the NGAM requirement to be met more efficiently and, hence, allows more biomass to be produced.
Figure 1. Elementary modes used by MTB growing on octadecenoate as the sole carbon source with and without ICL. Five unique elementary modes are identified overall by applying our decomposition method to three representative flux distributions, and these elementary modes are used to characterize optimal metabolic behavior for octadecenoate uptake varying from 0 to 0.08 mmol g^{1 }h^{1}. The weights of these five elementary modes (colored blue, green, red, cyan, and magenta) as octadecenoate uptake varies are shown stacked (a) with and (b) without ICL present. The modes colored blue and green use ICL, while the remainder do not. Each mode is normalized so that the octadecenoate uptake of the mode is 1 mmol g^{1 }h^{1}. (c) The biomass and NGAM generated by each mode with 1 mmol of octadecenoate. The region enclosed by the solid line is the space of achievable pairs of biomass and NGAM when ICL is present, while the region enclosed by the dotted line is that when ICL is not present. When octadecenoate is plentiful, the cyan and magenta modes can be used to meet the NGAM requirement and to produce biomass; when octadecenoate is more limited, the remaining modes are needed to meet the NGAM requirement and, with ICL present, this can be achieved more efficiently. (d) FBApredicted growth rate under varying octadecenoate uptake with ICL present (solid line) and without it (dotted line).
Under the assumption that the metabolic reconstruction by Beste et al. is correct, our analysis implies that MTB is capable of producing both carbon and energy from fatty acids without the use of ICL but, for lower uptake rates of fatty acids, ICL allows for more efficient utilization of the fatty acids. Indeed, we found that the optimal growth rate predicted by FBA differs only slightly with ICL present and without it (see Figure 1d). This presents the intriguing possibility that MTB possesses the metabolic capability to grow on fatty acids without ICL, but does not do so after the knockout of ICL because it has not yet undergone the adaptive evolution necessary to make use of this metabolic capability. This possibility is consistent with the work of Fong and Palsson [38] showing that the growth rate of gene deletion strains of E. coli can change significantly after undergoing adaptive evolution. The possible existence of such inactive routes for metabolizing fatty acids without the use of ICL in MTB has been discussed elsewhere [39] and, if true, would imply that MTB could rapidly evolve resistance to drugs inhibiting ICL.
Closer examination of the elementary modes reveals how MTB grows in silico without ICL and provides a testable means of confirming or rejecting the model's predictions. All the elementary modes that do not use ICL use malate synthase, the second of the two enzymes that form the glyoxylate shunt. In all of these modes, the full flux through malate synthase comes from HtrA, a gene predicted to code for 4hydroxy2oxoglutarate aldolase to complete the hydroxyproline degradation pathway in MTB using the Bayesian method of Green and Karp [40]. HtrA supplies the glyoxylate that is used as a substrate by malate synthase. MTB and other mycobacteria have been observed to grow using hydroxyproline as a carbon source [41,42], suggesting that this pathway may indeed exist. Further studies confirming and characterizing this pathway will shed light on whether it does in fact provide MTB with a viable means of producing glyoxylate.
The elementary mode decomposition analysis we have presented sheds light on the mechanics of the FBA prediction that is difficult to obtain from the FBA results alone. Specifically, HtrA in combination with malate synthase can be used in GSMNTB to generate biomass and NGAM. However, the space of possible biomass and NGAM production rates that can be achieved in this way is smaller than that which is possible using the glyoxylate shunt. At low octadecenoate uptake rates, this difference is important, leading to a lower biomass production to meet the NGAM requirement. Furthermore, it also demonstrates the utility of our method in identifying a potential source of the discrepancy between in silico predictions and observed experimental results. If growth is not experimentally possible, even after adaptive evolution, then it suggests that the model is incorrect and the likely error in the model comes from the inclusion of the predicted gene, HtrA.
Conclusions
We have presented a method for decomposing a given flux distribution into a set of constituent elementary modes. In contrast to previous approaches, our method does not require the initial generation of a full set of elementary modes, which is often computationally demanding and, in some cases, computationally infeasible for practical purposes. In a moderatelysized instance, we observed a computational time improvement of over 2000fold using our method.
Overall, we see that elementary mode analysis offers a great deal that fluxbalance analysis alone does not. FBA yields predictions of overall metabolic behavior, while elementary mode analysis allows understanding of metabolic capabilities. By decomposing flux distributions obtained by FBA into elementary modes, we can gain insight into how metabolic networks achieve their predictions. We exploit modularity to decompose a complex entity into a simpler entity, which enables debugging and understanding and, ultimately, more sophisticated design and engineering.
Methods
Genomescale FBA modeling
We work with the genomescale model of E. coli, iAF1260. This model consists of three parts. First, from m metabolites and n reactions, we form an m × n stoichiometric matrix S, whose ijth element S_{ij }is the stoichiometric coefficient of metabolite i in reaction j. Second, the flux distribution v, whose jth element v_{j }is the flux though reaction j, is constrained by a lower bound vector a, whose jth element a_{j }is the lower bound of the flux through reaction j, and an upper bound vector b, whose jth element b_{j }> 0 is the upper bound of the flux through reaction j. Finally, the linear objective is formed by multiplying the fluxes by an objective vector f, whose jth element f_{j }is the weight of reaction j in the biological objective. Thus, a biologically optimal flux distribution is obtained by solving
Elementary mode decomposition
For a given flux vector ν, we use R(ν) = {i:ν_{i }≠ 0}to denote the set of indices of the reactions participating in ν. We define an elementary flux mode using the following two definitions [4].
Definition
A flux mode, or mode, in a metabolic network specified by a stoichiometric matrix S and lower and upper bound constraints a and b is a nonzero n × 1 vector p satisfying the following two conditions:
1. it is a valid steadystate flux distribution, i.e. Sp = 0;
2. irreversible reactions flow in the right direction, i.e. for all j such that a_{j }≥ 0, we have p_{j }≥ 0.
Definition
We say a flux mode is elementary if it is minimal among all flux modes, i.e. there does not exist any other flux mode such that R(p') ⊂ R(p).
Before stating the algorithm, we require one further definition.
Definition
We say a flux mode p conforms to a flux distribution v if ν_{j }> 0 for all reactions j with p_{j }> 0 and if ν_{j }< 0 for all reactions j with p_{j }< 0.
Our interest is in finding elementary modes that conform to a given flux distribution v since it ensures that v is decomposed into elementary modes that have reactions flowing in the same direction as v, for the purposes of biological interpretation.
Our algorithm takes as input a flux distribution v in the feasible set of optimization problem (1) and outputs an elementary mode decomposition for v, i.e. a set of elementary flux modes {p^{(k)}} that conform to v and a corresponding set of positive weights {w_{k}} such that . Suppose we have a flux distribution v that satisfies Sv = 0. Let k: = 1, and v^{(k): }= v. Choose some j_{k }such that , and then solve the following mixedinteger linear program (MILP):
where M is a large constant and sgn is the sign function, taking the value 1 if its argument is positive and 1 if its argument is negative. This MILP is similar to that used by de Figueiredo et al. [25] for finding the shortest elementary modes in a metabolic network. Our purpose, however, differs significantly: we seek to decompose a given flux distribution into constituent elementary modes. The specific choice of j_{k }is unimportant: all choices such that will generate a valid decomposition, though the specific decomposition achieved will likely vary with the choice. As discussed in the overview, we choose j_{k }= argmax_{j}v_{j}.
Let (p*,q*) be an optimal solution. We then set p^{(k)}: = p* and . Finally, we set v^{(k + 1)}: = v^{(k) } w_{k}p^{(k)}. If v^{(k + 1) }is the zero vector, then we are done. Otherwise, we choose j_{k + 1 }such that , increment k by one, and repeat the above procedure.
The following proposition establishes that this algorithm generates the desired output. In brief, the algorithm works because, at each iteration, the MILP finds a flux mode where p_{jk }is nonzero and where the number of nonzero elements in the flux mode is minimized. Because the number of nonzero elements in the flux mode is minimized, the flux mode is minimal and, hence, elementary. Since each reaction with nonzero flux must participate in at least one elementary mode in the decomposition, it does not matter how j_{k }is chosen, as long as . A valid decomposition will be generated regardless of how j_{k }is chosen, though the particular decomposition that is generated among the nonunique possibilities will depend on this choice.
Proposition
The elementary mode decomposition algorithm stated above terminates after a finite number of iterations K and generates a set of elementary flux modes {p^{(k)}} that conform to v and a corresponding set of positive weights {w_{k}} such that .
Proof
First, to establish that each p^{(k) }is in fact a mode, we observe that any p^{(k) }generated as a solution of problem (2) will meet the steady state condition of the system. Problem (2) has a solution since and q such that q_{j }= 1 if p_{j }≠ 0 and q_{j }= 0 otherwise is in the feasible set of the problem. Further, by constraining the components of p^{(k) }to have the same sign as the corresponding elements of v, we ensure that irreversible reactions flow in the right direction since, for any j, if a_{j }≥ 0 then v_{j }≥ 0, which sets the constraint p_{j }≥ 0 in problem (2).
Second, from the constraints of problem (2), we can see that p^{(k) }conforms to v.
Lastly, we establish that each p^{(k) }is minimal among all flux modes conforming to v and, therefore, elementary in the set of all such modes. We first observe that the optimal cost of problem (2) at iteration k is R (p^{(k)}). Suppose there exists a mode p' that conforms to v with R(p') ⊂ R(p^{(k)}). If , then we assume without loss of generality that , and (p',q'), where q'_{j }= 1 if p'_{j }≠ 0 and q'_{j }= 0 otherwise, is in the feasible set of problem (2) and , in contradiction with p^{(k) }being an optimal solution. If , then let p" = p^{(k) } wp', where . Now p" is a mode that conforms to v with R(p') ⊂ R(p^{(k)}) and (p", q"), where q"j = 1 if p"j ≠ 0 and q"j = 0 otherwise, is in the feasible set of problem (2) and, resulting in the same contradiction. Hence there does not exist a mode p' that conforms to v with R(p') ⊂ R(p^{(k)}), and we conclude that p^{(k) }is elementary.
It is straightforward to see that w_{k }> 0 owing to its definition and that, after each iteration of the algorithm, R(v^{(k)}) will decrease by at least 1, i.e. R(v^{(k + 1)}) < R(v^{(k)}). Thus the algorithm will terminate after a finite number of iterations K ≤ R(v). □
Characterization of optimal metabolic behavior using given elementary modes
When calculating the elementary mode decompositions for a range of related flux distributions, as in our MTB application, it is helpful to use only a subset of all elementary modes obtained, since it likely that a subset of the modes suffices to generate valid decompositions for all the distributions. To do so, we select a subset of K modes {p^{(1)}, ..., p^{(k)}} out of all those obtained and use the following approach to determine if they suffice to support all the flux distributions. We successively remove modes from the subset to arrive at one that is minimal in the sense that no additional modes can be removed and still support all the flux distributions.
We represent each elementary flux mode as a column vector in a matrix P =[p^{(1) }... p^{(K)}] and define a nonnegative weight vector w = [w_{1}, ..., w_{K}] such that a flux distribution v = Pw. Substitution of v = Pw into (1) gives a means of finding a biologicallyoptimal weight vector over the given set of elementary flux modes. Specifically, we solve
If the biomass derived from solving (3) corresponds with that from (1), we conclude that the given elementary modes are sufficient to characterize the flux distribution of interest, and the given modes are utilized according to the weights w* obtained from the optimal solution of (3).
Implementation of FBA and elementary mode decomposition
FBA and our elementary mode decomposition method were implemented using MATLAB R2010b and Gurobi 4.0. This implementation is available in additional file 1.
Additional file 1. Source code for elementary mode decomposition method implemented using MATLAB and Gurobi
Format: ZIP Size: 77KB Download file
Comparison to previous decomposition methods
We used Gurobi 4.0 to solve optimization problem (1) to find a biologically optimal flux distribution for iNJ661 growing on Middlebrook 7H9. The resulting distribution v contained 507 nonzero components. The reactions j for which v_{j }= 0 were removed from the metabolic network, generating a reduced S that was used as input to efmtool. efmtool version 4.7.1 was used with default parameters in MATLAB R2010b to generate the elementary modes for the reduced metabolic network, resulting in a 507 × 131,558 matrix P containing all the elementary modes. Finally, the quadratic program
as proposed by Schwartz and Kanehisa [23] was solved using MOSEK 6.0.0.91 (MOSEK ApS, Copenhagen, Denmark).
efmtool generated the 131,558 elementary modes for the network in 1 minutes 48 seconds, while the quadratic optimization step took 32 minutes and 39 seconds, resulting in a total computational time of 34 minutes and 27 seconds. Computations were carried out on the Mac OS X 10.6.4 platform using a computer with an Intel Core 2 Duo 2.53 GHz processor with 4 GB of RAM.
For iNJ661v, the biologically optimal flux distribution v obtained by solving optimization problem (1) contained 505 nonzero components. Again, a reduced S was generated by removing the reactions j for which v_{j }= 0, and the result used as input to efmtool. With a maximum heap size of 4 GB, efmtool failed before generating all elementary modes, with 657,447 modes generated.
Authors' contributions
DSL and CC conceived the project. DSL and KI designed and implemented the method and performed the computational experiments. DSL, CC, and KI wrote the paper. All authors have read and approved the final manuscript.
Acknowledgements
KI gratefully acknowledges support from the Australian Government through an Australian Postgraduate Award. The authors would like to thank Belinda Chiera and Melissa Yates for careful proofreading of the manuscript.
References

Milne CB, Kim PJ, Eddy JA, Price ND: Accomplishments in genomescale in silico modeling for industrial and medical biotechnology.
Biotechnol J 2009, 4:16531670. PubMed Abstract  Publisher Full Text

Duarte NC, Becker SA, Jamshidi N, Thiele I, Mo ML, Vo TD, Srivas R, Palsson BØ: Global reconstruction of the human metabolic network based on genomic and bibliomic data.
Proceedings of the National Academy of Sciences 2007, 104:17771782. Publisher Full Text

Ma H, Sorokin A, Mazein A, Selkov A, Selkov E, Demin O, Goryanin I: The Edinburgh human metabolic network reconstruction and its functional analysis.

Kauffman KJ, Prakash P, Edwards JS: Advances in flux balance analysis.
Curr Opin Biotechnol 2003, 14:491496. PubMed Abstract  Publisher Full Text

Edwards JS, Ibarra RU, Palsson BØ: In silico predictions of Escherichia coli metabolic capabilities are consistent with experimental data.
Nat Biotechnol 2001, 19:125130. PubMed Abstract  Publisher Full Text

Feist AM, Henry CS, Reed JL, Krummenacker M, Joyce AR, Karp PD, Broadbelt LJ, Hatzimanikatis V, Palsson BØ: A genomescale metabolic reconstruction for Escherichia coli K12 MG1655 that accounts for 1260 ORFs and thermodynamic information.
Mol Syst Biol 2007, 3:121. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Schuetz R, Kuepfer L, Sauer U: Systematic evaluation of objective functions for predicting intracellular fluxes in Escherichia coli.
Mol Syst Biol 2007, 3:119. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Alper H, Miyaoku K, Stephanopoulos G: Construction of lycopeneoverproducing E. coli strains by combining systematic and combinatorial gene knockout targets.
Nat Biotechnol 2005, 23:612616. PubMed Abstract  Publisher Full Text

Burgard AP, Pharkya P, Maranas CD: OptKnock: A bilevel programming framework for identifying gene knockout strategies for microbial strain optimization.
Biotechnol Bioeng 2003, 84:647657. PubMed Abstract  Publisher Full Text

Lun DS, Rockwell G, Guido NJ, Baym M, Kelner JA, Berger B, Galagan JE, Church GM: Largescale identification of genetic design strategies using local search.
Mol Syst Biol 2009, 5:296. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Pharkya P, Burgard AP, Maranas CD: OptStrain: A computational framework for redesign of microbial production systems.
Genome Res 2004, 14:23672376. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Pharkya P, Maranas CD: An optimization framework for identifying reaction activation/inhibition or elimination candidates for overproduction in microbial systems.
Metab Eng 2006, 8:113. PubMed Abstract  Publisher Full Text

Rocha M, Maia P, Mendes R, Pinto JP, Ferreira EC, Nielsen J, Patil KR, Rocha I: Natural computation metaheuristics for the in silico optimization of microbial strains.
BMC Bioinformatics 2008, 9:499. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Schuster S, Hilgetag C: On elementary flux modes in biochemical reaction systems at steady state.
J Biol Syst 1994, 2:165182. Publisher Full Text

Schuster S, Fell DA, Dandekar T: A general definition of metabolic pathways useful for systematic organization and analysis of complex metabolic networks.
Nat Biotechnol 2000, 18:326332. PubMed Abstract  Publisher Full Text

Stelling J, Klamt S, Bettenbrock K, Schuster S, Gilles ED: Metabolic network structure determines key aspects of functionality and regulation.
Nature 2002, 420:190193. PubMed Abstract  Publisher Full Text

Trinh CT, Wlaschin A, Srienc F: Elementary mode analysis: a useful metabolic pathway analysis tool for characterizing cellular metabolism.
Appl Microbiol Biotechnol 2009, 81:813826. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Klamt S, Stelling J: Combinatorial Complexity of Pathway Analysis in Metabolic Networks.
Mol Biol Rep 2002, 29:233236. PubMed Abstract  Publisher Full Text

Acuña V, Chierichetti F, Lacroix V, MarchettiSpaccamela A, Sagot MF, Stougie L: Modes and cuts in metabolic networks: Complexity and algorithms.
Biosystems 2009, 95:5160. PubMed Abstract  Publisher Full Text

Gagneur J, Klamt S: Computation of elementary modes: a unifying framework and the new binary approach.
BMC Bioinformatics 2004, 5:175. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Terzer M, Stelling J: Largescale computation of elementary flux modes with bit pattern trees.
Bioinformatics 2008, 24:22292235. PubMed Abstract  Publisher Full Text

Feist AM, Palsson BØ: The growing scope of applications of genomescale metabolic reconstructions using Escherichia coli.
Nat Biotechnol 2008, 26:659667. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Schwartz JM, Kanehisa M: A quadratic programming approach for decomposing steadystate metabolic flux distributions onto elementary modes.
Bioinformatics 2005, 21:ii204205. PubMed Abstract  Publisher Full Text

Poolman MG, Venkatesh KV, Pidcock MK, Fell DA: A method for the determination of flux in elementary modes, and its application to Lactobacillus rhamnosus.
Biotechnol Bioeng 2004, 88:601612. PubMed Abstract  Publisher Full Text

de Figueiredo LF, Podhorski A, Rubio A, Kaleta C, Beasley JE, Schuster S, Planes FJ: Computing the shortest elementary flux modes in genomescale metabolic networks.
Bioinformatics 2009, 25:31583165. PubMed Abstract  Publisher Full Text

Colijn C, Brandes A, Zucker J, Lun DS, Weiner B, Farhat MR, Cheng TY, Moody DB, Murray M, Galagan JE: Interpreting expression data with metabolic flux models: Predicting Mycobacterium tuberculosis mycolic acid production.
PLoS Comp Biol 2009, 5:e1000489. Publisher Full Text

Segrè D, Vitkup D, Church GM: Analysis of optimality in natural and perturbed metabolic networks.
Proc Natl Acad Sci USA 2002, 99:1511215117. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Shlomi T, Berkman O, Ruppin E: Regulatory on/off minimization of metabolic flux changes after genetic perturbations.
Proc Natl Acad Sci USA 2005, 102:76957700. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Achterberg T: Constraint Integer Programming. Technische Universität Berlin; 2007.

Jamshidi N, Palsson BO: Investigating the metabolic capabilities of Mycobacterium tuberculosis H37Rv using the in silico strain iNJ661 and proposing alternative drug targets.
BMC Syst Biol 2007, 1:26. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Fang X, Wallqvist A, Reifman J: Development and analysis of an in vivocompatible metabolic network of Mycobacterium tuberculosis.
BMC Syst Biol 2010, 4:160. PubMed Abstract  BioMed Central Full Text

Causey TB, Zhou S, Shanmugam KT, Ingram LO: Engineering the metabolism of Escherichia coli W3110 for the conversion of sugar to redoxneutral and oxidized products: Homoacetate production.
Proc Natl Acad Sci USA 2003, 100:825832. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Lee S, Phalakornkule C, Domach MM, Grossmann IE: Recursive MILP model for finding all the alternate optima in LP models for metabolic networks.
Computers & Chemical Engineering 2000, 24:711716. PubMed Abstract

MunozElias EJ, McKinney JD: Mycobacterium tuberculosis isocitrate lyases 1 and 2 are jointly required for in vivo growth and virulence.
Nat Med 2005, 11:638644. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Schnappinger D, Ehrt S, Voskuil MI, Liu Y, Mangan JA, Monahan IM, Dolganov G, Efron B, Butcher PD, Nathan C, Schoolnik GK: Transcriptional Adaptation of Mycobacterium tuberculosis within Macrophages: Insights into the Phagosomal Environment.
J Exp Med 2003, 198:693704. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Sharma V, Sharma S, Hoener zu Bentrup K, McKinney JD, Russell DG, Jacobs WR Jr, Sacchettini JC: Structure of isocitrate lyase, a persistence factor of Mycobacterium tuberculosis.
Nat Struct Biol 2000, 7:663668. PubMed Abstract  Publisher Full Text

Beste DJ, Hooper T, Stewart G, Bonde B, AvignoneRossa C, Bushell ME, Wheeler P, Klamt S, Kierzek AM, McFadden J: GSMNTB: a webbased genomescale network model of Mycobacterium tuberculosis metabolism.
Genome Biol 2007, 8:R89. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Fong SS, Palsson BØ: Metabolic genedeletion strains of Escherichia coli evolve to computationally predicted growth phenotypes.
Nat Genet 2004, 36:10561058. PubMed Abstract  Publisher Full Text

Purohit HJ, Cheema S, Lal S, Raut CP, Kalia VC: In Search of Drug Targets for Mycobacterium tuberculosis.
Infect Disord Drug Targets 2007, 7:245250. PubMed Abstract  Publisher Full Text

Green ML, Karp PD: A Bayesian method for identifying missing enzymes in predicted metabolic pathway databases.
BMC Bioinformatics 2004, 5:76. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Indira M, Sirsi M: Nutritional studies on Mycobacterium tuberculosiseffect of amino acids on the 'in vitro' growth of Mycobacterium tuberculosis.

Krulwich TA, Pelliccione NJ: Catabolic pathways of coryneforms, nocardias, and mycobacteria.
Annu Rev Microbiol 1979, 33:95111. PubMed Abstract  Publisher Full Text