Genome sequencing and bioinformatics are producing detailed lists of the molecular components contained in many prokaryotic organisms. From this 'parts catalogue' of a microbial cell, in silico representations of integrated metabolic functions can be constructed and analyzed using flux balance analysis (FBA). FBA is particularly well-suited to study metabolic networks based on genomic, biochemical, and strain specific information.
Herein, we have utilized FBA to interpret and analyze the metabolic capabilities of Escherichia coli. We have computationally mapped the metabolic capabilities of E. coli using FBA and examined the optimal utilization of the E. coli metabolic pathways as a function of environmental variables. We have used an in silico analysis to identify seven gene products of central metabolism (glycolysis, pentose phosphate pathway, TCA cycle, electron transport system) essential for aerobic growth of E. coli on glucose minimal media, and 15 gene products essential for anaerobic growth on glucose minimal media. The in silico tpi-, zwf, and pta- mutant strains were examined in more detail by mapping the capabilities of these in silico isogenic strains.
We found that computational models of E. coli metabolism based on physicochemical constraints can be used to interpret mutant behavior. These in silica results lead to a further understanding of the complex genotype-phenotype relation.
Supplementary information: http://gcrg.ucsd.edu/supplementary_data/DeletionAnalysis/main.htm webcite
The knowledge of a complete genome sequence holds the potential to reveal the 'blueprints' for cellular life. The genome sequence contains the information to propagate the living system, and this information exists as open reading frames (ORFs) and regulatory information. Computational approaches have been developed (and are continuously being improved) to decipher the information encoded in the DNA [1,2,3,4,5,6,7]. However, it is becoming evident that cellular functions are intricate and the integrated function of biological systems involves many complex interactions among the molecular components within the cell. To understand the complexity inherent in cellular networks, approaches that focus on the systemic properties of the network are also required.
The complexity of integrated cellular systems leads to an important point, namely that the properties of complex biological processes cannot be analyzed or predicted based solely on a description of the individual components, and integrated systems based approaches must be applied . The focus of such research represents a departure from the classical reductionist approach in the biological sciences, and moves toward the integrated approach to understanding the interrelatedness of gene function and the role of each gene in the context of multi genetic cellular functions or genetic circuits [8,9,10].
The engineering approach to analysis and design of complex systems is to have a mathematical or computer model; e.g. a dynamic simulator of a cellular process that is based on fundamental physicochemical laws and principles. Herein, we will analyze the integrated function of the metabolic pathways, and there has been a long history of mathematical modeling of metabolic networks in cellular systems, which dates back to the 1960s [11, 12]. While the ultimate goal is the development of dynamic models for the complete simulation of cellular metabolism, the success of such approaches has been severely hampered by the lack of kinetic information on the dynamics and regulation of metabolism. However, in the absence of kinetic information it is still possible to assess the theoretical capabilities and operative modes of metabolism using flux balance analysis (FBA) [10, 13,14,15,16,17].
We have developed an in silico representation of Escherichia coli (E. coli in silico) to describe the bacterium's metabolic capabilities . E. coli in silico was derived based on the annotated genetic sequence , biochemical literature , and the online bioinformatic databases [21,22,23]. The properties of E. coli in silico were analyzed and compared to the in vivo properties of E. coli, and it was shown that E. coli in silico can be used to interpret the metabolic phenotype of many E. coli mutants . However, the utilization of the metabolic genes is dependent on the carbon source and the substrate availability [24, 25]. Thus, the mutant phenotype is also dependent on specific environmental parameters. Therefore, herein we have utilized E. coli in silico to computationally examine the condition dependent optimal metabolic pathway utilization, and we will show that the FBA can be used to analyze and interpret the metabolic behavior of wildtype and mutant E. coli strains.
Materials and Methods
Flux balance analysis
All biological processes are subjected to physico chemical constraints (such as mass balance, osmotic pressure, electro neutrality, thermodynamic, and other constraints). As a result of decades of metabolic research and the recent genome sequencing projects, the mass balance constraints on cellular metabolism can be assigned on a genome scale for a number of organisms. Methods have been developed to analyze the metabolic capabilities of a cellular system based on the mass balance constraints and this approach is known as flux balance analysis (FBA) [13, 14, 16] (see the supplementary information for an FBA primer). The mass balance constraints in a metabolic network can be represented mathematically by a matrix equation:
S • v = 0 Equation 1
The matrix S is the mxn stoichiometric matrix, where m is the number of metabolites and n is the number of reactions in the network (The E. coli stoichiometric matrix is available in matrix format in the supplementary information and in a reaction list in Appendices 1-3). The vector v represents all fluxes in the metabolic network, including the internal fluxes, transport fluxes and the growth flux.
For the E. coli metabolic network represented by Eqn. 1, the number of fluxes was greater than the number of mass balance constraints; thus, there were multiple feasible flux distributions that satisfied the mass balance constraints, and the solutions (or feasible metabolic flux distributions) were confined to the nullspace of the matrix S.
In addition to the mass balance constraints, we imposed constraints on the magnitude of individual metabolic fluxes.
αi ≤ vi ≤ βi Equation 2
The linear inequality constraints were used to enforce the reversibility of each metabolic reaction and the maximal flux in the transport reactions. The reversibility constraints for each reaction are indicated online. The transport flux for inorganic phosphate, ammonia, carbon dioxide, sulfate, potassium, and sodium was unrestrained (αi = -∞ and βi = ∞). The transport flux for the other metabolites, when available in the in silico medium, was constrained between zero and the maximal level (0 ≤ vi ≤ vimax). The vimax values used in the simulations are noted for each simulation (Fig. 1). When a metabolite was not available in the medium, the transport flux was constrained to zero. The transport flux for metabolites capable of leaving the metabolic network (i.e. acetate, ethanol, lactate, succinate, formate, and pyruvate) was always unconstrained in the net outward direction.
The intersection of the nullspace and the region defined by the linear inequalities defined a region in flux space that we will refer to as the feasible set, and the feasible set defined the capabilities of the metabolic network subject to the imposed cellular constraints. It should be noted that every vector v within the feasible set is not reachable by the cell under a given condition due to other constraints not considered in the analysis (i.e. maximal internal fluxes and gene regulation). The feasible set can be further reduced by imposing additional constraints (i.e. kinetic or gene regulatory constraints), and in the limiting condition where all constraints are known, the feasible set may reduce to a single point.
A particular metabolic flux distribution within the feasible set (vector v which satisfies the constraints in Eqns. 1 and 2) was found using linear programming (LP). A commercially available LP package was used (LINDO, Lindo Systems Inc., Chicago, II). LP identified a solution that minimized a metabolic objective function (subject to the imposed constraints- Eqns. 1 and 2) [16, 48, 49], and was formulated as shown below:
where Z = Σ ci vi = <c • v> Equation 3
The vector c was used to select a linear combination of metabolic fluxes to include in the objective function . Herein, c was defined as the unit vector in the direction of the growth flux, and the growth flux was defined in terms of the biosynthetic requirements:
where dm is the biomass composition of metabolite Xm (we used a constant biomass composition defined from the literature  (see Appendix 4)), and the growth flux was modeled as a single reaction that converts all the biosynthetic precursors into biomass.
Phenotype Phase Plane Analysis
All feasible E. coli in silico metabolic flux distributions are mathematically confined to the feasible set, which is a region in flux space ( n), where each solution in this space corresponds to a feasible metabolic flux distribution.
Phenotype Phase Plane (PhPP): A PhPP is a two-dimensional projection of the feasible set, and below we will briefly discuss the formalism for constructing the PhPP. Two parameters that describe the growth conditions (such as substrate and oxygen uptake rates) were defined as the two axes of the two dimensional space. The optimal flux distribution was calculated (using LP) for all points in this plane by solving the LP problem while adjusting the exchange flux constraints (defining the two-dimensional space). A finite number of qualitatively different patterns of metabolic pathway utilization were identified in such a plane, and lines were drawn to demarcate these regions. Each region is denoted by Pnx, y, where 'P' indicates that the region was defined by a phenotype phase plane analysis, 'n' denotes the number of the demarcated phase (as shown in a particular PhPP figure), and 'x, y' denotes the two uptake rates on the axis of the PhPP. PhPPs were also generated for mutant genotypes; represented as Pgenenx, y.
One demarcation line in the PhPP was defined as the line of optimality (LO). The LO represents the optimal relation between exchange fluxes defined on the axes of the PhPP.
Alterations of the genotype
FBA and E. coli in silico were used to examine the systemic effects of in silico gene deletions. The genes involved in the central metabolic pathways (glycolysis, pentose phosphate pathway, TCA cycle, electron transport) were subjected to removal from E. coli in silico. To simulate a gene deletion, all metabolic reactions catalyzed by a given gene product were simultaneously constrained to zero. Some metabolic reactions were catalyzed by more than one enzyme, and all genes that code for enzymes that catalyze a given reaction were simultaneously removed (i.e. rpiAB). Furthermore, all genes that make up an enzyme complex were also simultaneously removed (i.e. sdhABCD).
The optimal metabolic flux distribution for the generation of biomass was calculated for each in silico deletion strain. The in silico gene deletion analysis was performed with the transport flux constraints defined by the wild-type PhPP. The constraints imposed for each simulation are noted in Fig. 1.
For each in silico deletion strain, the optimal production of the twelve biosynthetic precursors and the metabolic cofactors was also calculated to identify auxotrophic requirements and impaired functions in the metabolic network (Table 1). The optimal production of the biosynthetic precursors was calculated by setting the objective function to the drain of a single metabolite (i.e., ATP → ADP + Pi, or PEP →). The numerical value of the objective function for each in silico deletion strain was reported as a fraction of the wild-type optimal value (Table 1).
Figure 1. Growth phenotypes of in silico deletion strains; maximal biomass yields on glucose for all possible single gene deletions in central intermediary metabolism. The environmental variables (uptake rate/external metabolic fluxes) are set to correspond to a point within each of the phases of the wild-type PhPP (figure inset). The maximal yields were calculated using flux-balance analysis with the objective of maximizing the growth flux. The biomass yields are normalized with respect to the results for the full metabolic genotype. The α and β value for the constraints on the external fluxes for glucose and oxygen uptake are defined as follows (units- mmole g-1 hr-1): Phase 1 - vglc = 10, voxy = 23; LO - vgic = 10, voxy = 20.3; Phase 2 - vglc = 10, voxy = 17; Phase 3 - vglc = 10, voxy = 12; Phase 4 - vglc = 10, voxy = 8; Phase 5 - vglc = 10, voxy = 3; Phase 6 - vglc = 10, voxy = 0.
Table 1. Optimal production of the twelve biosynthetic precursors and the metabolic cofactors.