Department of Computer Science and Engineering, University of South Florida, Tampa, FL 33620, USA

Department of Industrial and Management Systems Engineering, University of South Florida, Tampa, FL 33620, USA

Abstract

Background

Optimization procedures to identify gene knockouts for targeted biochemical overproduction are widely used in modern metabolic engineering. The flux balance analysis (FBA) framework provides conceptual simplifications for genome-scale dynamic analysis at steady states. Based on FBA, many current optimization methods for targeted bio-production have been developed under the maximum cell growth assumption. The optimization problem of deriving gene knockout strategies has recently been formulated in OptKnock as a bi-level programming problem that maximizes targeted bio-production at maximum growth rates. However, it has been shown that knockout mutants in fact reach steady states that minimize the metabolic adjustment (MOMA) from the corresponding wild-type strains, rather than attaining maximal growth rates after genetic or metabolic intervention. In this work, we propose a new bi-level computational framework, MOMAKnock, which can derive robust knockout strategies under the MOMA flux distribution approximation.

Methods

In this new bi-level optimization framework, we aim to maximize the production of targeted chemicals by identifying candidate knockout genes or reactions under phenotypic constraints approximated by the MOMA assumption. Hence, the targeted chemical production is the primary objective of MOMAKnock, while the MOMA assumption is formulated as the inner problem, which constrains the knockout metabolic flux to be as close as possible to the steady-state phenotypes of wild-type strains. Because this new inner problem is a quadratic programming problem, a novel adaptive piecewise linearization algorithm is developed in this paper to obtain the exact optimal solution to this new bi-level integer quadratic programming problem for MOMAKnock.

Results

Our new MOMAKnock model and the adaptive piecewise linearization solution algorithm are tested with a small

Introduction

Metabolic engineering has become an important environmentally friendly process in modern biotechnology, providing new potential solutions to many global problems, including energy and environmental crises

Classical metabolic engineering modifies individual metabolic genes or pathways, typically followed by costly and time-consuming screening processes to select desirable mutants based on their resulting phenotypes

Researchers have proposed different metabolic engineering methods based on these metabolic approximation models, and typically the improved strains are modified sequentially based on FBA with multiple mutants. However, sequential metabolic engineering strategies do not guarantee optimality. In

In this paper, we propose a bi-level programming framework for the identification of optimal genetic manipulations under the MOMA assumption. With the MOMA assumption approximating the condition for maintaining cell viability as the essential phenotypic constraint, the inner optimization problem becomes a quadratic programming (QP) problem rather than the linear programming (LP) problem in OptKnock. To address the increased computational complexity, we develop a novel adaptive solution algorithm to solve this new bi-level optimization problem. The new algorithm under the minimizing flux adjustment assumption is tested on metabolic networks, and our preliminary experimental results show that our framework can generate more practical and robust knockout strategies than OptKnock.

Methods

Background: FBA and MOMA

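Before introducing our new bi-level programming problem to identify optimal metabolic genes or reactions to delete for the maximization of targeted bio-production, we first review the mathematical foundations of FBA and MOMA. FBA describes the network through the stoichiometric coefficients $S_{ij}$ and the reaction fluxes $v_j$, and takes the maximization of the biomass flux $v_{biom}$ as the cellular objective.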

Under MOMA, knockout strains are assumed to minimize the $L_2$ distance between the knockout flux values and the wild-type steady-state flux values:

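where $v_j$ is the flux of reaction $j$ in the knockout strain, compared against the corresponding wild-type value; $v_{biom}$ is the biomass flux; and $v_{glc}$ and $v_{glc\_uptake}$ refer to the glucose flux and the glucose uptake rate.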
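For orientation, the two models can be written in their standard textbook forms. The sketch below is not necessarily the exact formulation used here: the symbols $w_j$ (wild-type flux of reaction $j$), $v_j^{\min}$ and $v_j^{\max}$ (flux bounds) and $\mathcal{A}$ (set of knocked-out reactions) are assumed names, and additional constraints on the glucose uptake or on a minimum biomass flux are left out.

```latex
% FBA: steady-state flux distribution that maximizes growth
\[
\begin{aligned}
\max_{v}\quad & v_{biom} \\
\text{s.t.}\quad & \sum_{j} S_{ij}\, v_j = 0 \quad \forall i, \\
& v_j^{\min} \le v_j \le v_j^{\max} \quad \forall j.
\end{aligned}
\]

% MOMA: knockout flux distribution closest (in L2) to the wild type
\[
\begin{aligned}
\min_{v}\quad & \sum_{j} (v_j - w_j)^2 \\
\text{s.t.}\quad & \sum_{j} S_{ij}\, v_j = 0 \quad \forall i, \\
& v_j^{\min} \le v_j \le v_j^{\max} \quad \forall j, \\
& v_j = 0 \quad \forall j \in \mathcal{A}.
\end{aligned}
\]
```

The only structural difference is the objective: FBA is an LP, whereas the squared distance makes MOMA a convex QP.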

New bi-level programming framework

Following the modeling strategy in OptKnock

Mathematically, we introduce a binary variable for each reaction $j$ to indicate whether that reaction is knocked out.

in which $v_{chemical}$ denotes the flux of the targeted chemical.
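A bi-level problem of this kind can be sketched in OptKnock-style notation as follows. This is an illustrative form under stated assumptions, not a transcription of the model: $y_j$ (with $y_j = 0$ meaning that reaction $j$ is knocked out), $w_j$ (wild-type flux), the flux bounds $v_j^{\min}$, $v_j^{\max}$ and the knockout budget $K$ are assumed symbols, and any glucose uptake or minimum biomass constraints of the inner problem are omitted.

```latex
\[
\begin{aligned}
\max_{y}\quad & v_{chemical} \\
\text{s.t.}\quad & v = \arg\min_{v'} \Big\{ \sum_{j} (v'_j - w_j)^2 \;:\;
      \sum_{j} S_{ij}\, v'_j = 0 \ \ \forall i,\quad
      v_j^{\min} y_j \le v'_j \le v_j^{\max} y_j \ \ \forall j \Big\}, \\
& \sum_{j} (1 - y_j) \le K, \\
& y_j \in \{0, 1\} \quad \forall j.
\end{aligned}
\]
```

The outer level chooses which reactions to delete; the inner level then fixes the flux distribution that the mutant is assumed to adopt, so the outer objective is evaluated only at MOMA-consistent fluxes.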

Adaptive linearization strategy for an exact optimal solution

We emphasize that the nested inner optimization problem is a QP problem with respect to the flux allocation $v_j$.

To derive efficient solution algorithms for our new bi-level programming gene knockout problem, we adopt a novel adaptive linearization solution strategy to tackle the computational complexity introduced by the inner QP problem. Specifically, we propose to adaptively represent the quadratic terms in the objective function of the inner problem using a set of linear functions as illustrated in Figure

Schematic illustration of adaptive linearization solution strategy to the new bi-level programming problem under the MOMA assumption: (A) Piecewise linearization; (B-D) Adaptive solution strategy

The basic idea of adaptive piecewise linearization is illustrated in the schematic figure for a flux value $x_1$, which can be represented by a convex combination of the endpoints of piecewise segments for a given piecewise linearization. The corresponding quadratic objective function value at $x_1$ is denoted by $y_1$, which can be approximated linearly by the same convex combination of the function values at those endpoints.
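The convex-combination idea can be illustrated with a minimal Python sketch. It assumes a single hypothetical flux with wild-type value `w = 3.0` and hand-picked breakpoints; the function names `piecewise_value` and `squared_adjustment` are illustrative and not taken from the paper.

```python
from bisect import bisect_right


def piecewise_value(x, breakpoints, f):
    """Linear interpolation of f at x from the two breakpoints that bracket x."""
    if len(breakpoints) < 2 or not breakpoints[0] <= x <= breakpoints[-1]:
        raise ValueError("need at least two sorted breakpoints that enclose x")
    # Index of the right endpoint of the segment containing x.
    k = min(bisect_right(breakpoints, x), len(breakpoints) - 1)
    left, right = breakpoints[k - 1], breakpoints[k]
    # Convex-combination weights of the two endpoints (they sum to 1).
    beta_right = (x - left) / (right - left)
    beta_left = 1.0 - beta_right
    return beta_left * f(left) + beta_right * f(right)


def squared_adjustment(v, w=3.0):
    """Hypothetical single-flux MOMA term (v - w)^2 with wild-type value w."""
    return (v - w) ** 2


coarse = [0.0, 2.0, 4.0, 6.0]
fine = [0.0, 1.0, 2.0, 4.0, 6.0]  # same grid with one extra point at v = 1
for grid in (coarse, fine):
    approx = piecewise_value(1.0, grid, squared_adjustment)
    gap = approx - squared_adjustment(1.0)
    print(f"{len(grid) - 1} segments: approx={approx:.2f}, gap={gap:.2f}")
```

Because the squared term is convex, the interpolated value never falls below the true value; the gap is 1 on the coarse grid at `v = 1` and vanishes once a breakpoint is placed at that point.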

With this basic understanding of our new bi-level model and adaptive piecewise linearization solution strategy, we describe the detailed algorithm in the following sections.

Piecewise linearized inner problem

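The quadratic objective function of the inner problem, denoting the metabolic adjustment to wild-type steady-state flux allocations, is a sum of one squared term per reaction $j$, so each term can be linearized separately over its own set of segments.

Each flux $v_j$ is written as a convex combination of the endpoints of its segments and, as can be seen from the schematic figure, the corresponding squared term is approximated by the same convex combination of its values at those endpoints.

With this convex approximation strategy, the inner problem with MOMA is transformed to a linear programming problem with respect to the piecewise variables.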

We first give the dual problem of the linearized inner problem:

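where the dual variables correspond to the constraints of the linearized inner problem: the mass-balance constraint of each metabolite $i$, the bound and knockout constraints of each reaction $j$, the glucose uptake constraint ($glc$), and the biomass constraint ($biom$).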

This final single-level MILP problem can be solved effectively by commercial solvers, such as CPLEX
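The reduction from a bi-level problem with an LP inner level to a single-level problem rests on LP strong duality, the same device used in OptKnock. In generic notation (the symbols $A$, $b$, $c$, $x$ and $\lambda$ below are placeholders, not the variables of the linearized inner problem), the primal-dual pair reads:

```latex
\[
\begin{aligned}
\text{(primal)}\quad & \min_{x}\ c^{\top} x \quad \text{s.t.}\quad A x = b,\ \ x \ge 0, \\
\text{(dual)}\quad & \max_{\lambda}\ b^{\top} \lambda \quad \text{s.t.}\quad A^{\top} \lambda \le c .
\end{aligned}
\]

% Weak duality gives b^T lambda <= c^T x for every feasible pair, so
% a feasible pair is optimal exactly when the two objectives coincide:
\[
A x = b,\quad x \ge 0,\quad A^{\top}\lambda \le c,\quad c^{\top} x = b^{\top}\lambda
\;\;\Longleftrightarrow\;\;
x \text{ and } \lambda \text{ are optimal.}
\]
```

Replacing the inner optimization by these feasibility conditions and the equality of objectives removes the nested "argmin", which leaves one mixed integer problem in the knockout variables, the fluxes and the dual variables.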

Adaptive strategy

We have shown in the previous section that we can effectively solve the linearized bi-level programming problem. However, because the original quadratic MOMA objective function is linearized, the result obtained for a given linearization scheme is approximate rather than exact. In addition, the closeness to the exact optimal solution is directly determined by the number of segments used to approximate the quadratic function of each flux $v_j$.

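When only one of the piecewise variables of flux $v_j$ is non-zero, the flux value sits at a segment endpoint, where the piecewise linear approximation coincides with the quadratic function. Otherwise, the approximation differs from the quadratic function by a positive amount $\Delta_j$.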

Based on the differences and the state of vector **β** for all flux values, we adaptively add new piecewise linear segments to better approximate the corresponding contributions from each reaction flux to the quadratic objective function in the inner problem. By repeating the above procedure as shown in Figure

**Algorithm 1** Adaptive bi-level MOMAKnock.

Initialize variables.

Initialize the piecewise linearization with k pieces.

**repeat**

Solve the inner primal problem based on previous knockouts to get a lower bound objL;

Solve the MILP problem with the lower bound objL;

**for** each flux **do**

Compute $\Delta_j$

**if** $\Delta_j$ **then**

Add a segment point at

**end if**

**end for**

**until** added segments do not improve the objective function
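The refine-and-resolve loop can be tried on a one-variable toy problem. The sketch below is not Algorithm 1 itself: it assumes a single flux with a hypothetical wild-type value `w`, it replaces the MILP solve by the observation that a one-variable piecewise linear LP attains its optimum at a breakpoint, and its refinement rule (bisecting the segments next to the current solution until they are shorter than `tol`) is an assumption chosen so that the loop provably brackets the true minimizer of a convex objective.

```python
def adaptive_minimize(w, lo, hi, k=2, tol=1e-6, max_iter=100):
    """Minimize (v - w)^2 on [lo, hi] by adaptively refined piecewise linearization."""

    def f(v):
        return (v - w) ** 2

    # Initial uniform linearization with k pieces.
    pts = [lo + (hi - lo) * i / k for i in range(k + 1)]
    for _ in range(max_iter):
        # With one variable, the piecewise linear LP optimum is the best breakpoint.
        i = min(range(len(pts)), key=lambda n: f(pts[n]))
        # Best breakpoint together with its adjacent breakpoints; for a convex
        # objective this interval contains the true minimizer.
        neighbours = pts[max(i - 1, 0): i + 2]
        if neighbours[-1] - neighbours[0] < tol:
            break
        # Refine only where it matters: bisect the segments next to the solution.
        mids = [0.5 * (a + b) for a, b in zip(neighbours, neighbours[1:])]
        pts = sorted(pts + mids)
    v = min(pts, key=f)
    return v, f(v), len(pts)


v, obj, n_points = adaptive_minimize(w=2.0, lo=0.0, hi=5.0)
print(f"v={v:.6f}, objective={obj:.2e}, breakpoints used={n_points}")
```

A uniform grid would need millions of segments on this interval to reach the same tolerance, whereas the adaptive scheme adds at most two breakpoints per iteration.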

Results and discussion

Succinate production on AntCore metabolic network

First, we implement our new adaptive bi-level programming method, MOMAKnock, to derive optimal knockout strategies for a core E. coli metabolic network. The $L_2$ distance from the optimal knockout flux values to wild-type steady-state flux values is reported in the last column of the result tables.

Results for knockout strains derived by OptKnock on the core E. coli metabolic network

| K | Knockouts | Succinate (OptKnock) | Biomass (OptKnock) | Succinate (MOMA flux) | Biomass (MOMA flux) | $L_2$ distance |
|---|---|---|---|---|---|---|
| 2 | kdpg → pyr + gap (or 6pg → kdpg), fadh2 + 0.5o2 → 2atp (or suc → fum + fadh2) | 102.98 | 14.36 | 26.32 | 13.18 | 398.75 |
| 3 | g6p → 6pg + nadph, 3pg + glu → ser + akg + nadh, nadh → nadph | 121.02 | 7.06 | 24.45 | 5.23 | 633.25 |
| 4 | g6p → 6pg + nadph, dhap → gap, fadh2 + 0.5o2 → 2atp (or suc → fum + fadh2), glyc → glyc(ext) | 118.71 | 5.00 | 84.56 | 5.00 | 482.70 |
| 5 | pep → pyr + atp, mal → pyr + co2 + nadph, dhap + nadh → glyc3p, glyc3p → glyc, fadh2 + 0.5o2 → 2atp (or suc → fum + fadh2) | 126.33 | 10.91 | 38.73 | 12.75 | 518.65 |

Results for knockout strains derived by MOMAKnock on the core E. coli metabolic network

| K | Knockouts | Succinate (MOMAKnock) | Biomass (MOMAKnock) | Succinate (MOMA flux) | Biomass (MOMA flux) | $L_2$ distance |
|---|---|---|---|---|---|---|
| 2 | 6pg → ru5p + co2 + nadph, suc → fum + fadh2 (or fadh2 + 0.5o2 → 2atp) | 54.41 | 13.44 | 40.25 | 12.65 | 124.86 |
| 3 | 6pg → ru5p + co2 + nadph, fadh2 + 0.5o2 → 2atp (or suc → fum + fadh2), ser → gly + meethf | 54.98 | 12.08 | 45.71 | 11.80 | 157.67 |
| 4 | pep → pyr + atp, g6p → 6pg + nadph, 6pg → kdpg (or kdpg → pyr + gap), fadh2 + 0.5o2 → 2atp (or suc → fum + fadh2) | 57.75 | 11.24 | 52.73 | 10.76 | 318.52 |
| 5 | pep → pyr + atp, g6p → 6pg + nadph, 6pg → kdpg (or kdpg → pyr + gap), fadh2 + 0.5o2 → 2atp (or suc → fum + fadh2), nadh → nadph | 65.25 | 7.90 | 53.31 | 7.65 | 352.26 |

Based on the results from OptKnock in Table

The MOMAKnock results table shows that, with the $L_2$-distance-based phenotypic constraints in the inner level of MOMAKnock, the optimal knockout flux distributions from MOMAKnock are always closer to the wild-type flux distribution than those of the OptKnock-suggested knockouts.
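The comparison can be checked directly from the reported numbers. The short script below transcribes the succinate columns and the last column of the two core-network tables; treating that last column as the $L_2$ distance to the wild-type flux distribution is an assumption based on the surrounding text, and the `retention` ratio (MOMA-simulated succinate divided by the succinate predicted by the method itself) is an illustrative measure introduced here, not one defined in the paper.

```python
# K: (succinate predicted by the method, succinate under MOMA flux, last column)
optknock = {
    2: (102.98, 26.32, 398.75),
    3: (121.02, 24.45, 633.25),
    4: (118.71, 84.56, 482.70),
    5: (126.33, 38.73, 518.65),
}
momaknock = {
    2: (54.41, 40.25, 124.86),
    3: (54.98, 45.71, 157.67),
    4: (57.75, 52.73, 318.52),
    5: (65.25, 53.31, 352.26),
}


def retention(row):
    """Fraction of the predicted succinate that survives the MOMA simulation."""
    predicted, simulated, _distance = row
    return simulated / predicted


for k in sorted(optknock):
    print(
        f"K={k}: distance {momaknock[k][2]:.2f} (MOMAKnock) vs "
        f"{optknock[k][2]:.2f} (OptKnock); retention "
        f"{retention(momaknock[k]):.2f} vs {retention(optknock[k]):.2f}"
    )

closer_for_every_k = all(momaknock[k][2] < optknock[k][2] for k in optknock)
print("MOMAKnock closer to wild type for every K:", closer_for_every_k)
```

On these numbers the last-column value is smaller for MOMAKnock at every K, and the MOMAKnock predictions lose less succinate when re-simulated under MOMA than the OptKnock predictions do.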

Biologically, it is interesting to note that our MOMAKnock indeed identifies relevant reactions as suggested knockout reactions. For example, when the knockout number

Based on these preliminary results on this core network model, even though OptKnock takes the maximization of biomass production as the inner cellular objective, the derived knockout strategies do not always achieve high biomass production when we simulate them under the MOMA objective. Sometimes, these knockout strategies cannot even guarantee the minimum biomass requirement. The reason is that the inner optimization in the bi-level framework of OptKnock serves as an additional constraint for the outer optimization problem: the derived optimization procedure treats the outer problem as the primary objective, and only then is the inner problem optimized. The low simulated production rates of the targeted chemical for OptKnock-suggested knockouts under the MOMA flux distribution, together with the abrupt changes in biomass level in OptKnock, illustrate that the biomass maximization assumption for approximating cellular objectives may not provide robust and reliable metabolic reaction deletion strategies. On the other hand, MOMAKnock approximates the inner cellular objective by the MOMA assumption, which assumes that knockout strains stay close to the corresponding wild-type strains. If this is guaranteed, knockout strains can also achieve appropriate biomass flux values. In fact, as shown in Tables

By comparison with OptKnock on this core

Succinate production on iAF1260 network

We further test MOMAKnock on a large

Results for knockout strains derived by MOMAKnock on the iAF1260 E. coli metabolic network

| K | Knockouts | Succinate (MOMAKnock) | Biomass (MOMAKnock) | Succinate (MOMA flux) | Biomass (MOMA flux) | $L_2$ distance |
|---|---|---|---|---|---|---|
| 3 | q8 + succ → fum + q8h2, 6pgl + h2o → 6pgc + h, (2)h2o + o2 + urate → alltn + co2 + h2o2 | 39.30 | 5.02 | 27.45 | 5.02 | 906.49 |
| 4 | q8 + succ → fum + q8h2, ac + atp → actp + adp, h2o + methf → 10fthf + h, r5p + xu5p-D → g3p + s7p | 67.08 | 5.02 | 63.23 | 5.02 | 402.33 |
| 5 | q8 + succ → fum + q8h2, glu-L + h → 4abut + co2, 3pg + nad → 3php + h + nadh, 3php + glu-L → akg + pser-L, 6pgc + nadp → co2 + nadph + ru5p-D | 74.94 | 5.02 | 66.67 | 5.02 | 464.76 |

The MOMA flux distribution: (A) wild-type E. coli network, (B) K = 5 MOMAKnock mutant. (Only a part of the network is presented.)

From Figure

We notice that in Table

Conclusions

In this paper, we have proposed a new bi-level programming optimization framework to identify optimal knockout strategies for maximum targeted bio-productions under the phenotypic constraints approximated by the MOMA assumption. A novel adaptive piecewise linearization solution strategy has been developed to efficiently solve this new mixed integer quadratic bi-level programming problem. The preliminary experiments on both the core

Our new bi-level MOMAKnock model can serve as an alternative to OptKnock, with slightly higher computational complexity. The $L_2$ distance objective function in MOMA may also be replaced by either the $L_0$ or the $L_1$ norm, which will lead to different bi-level optimization problems. We will develop corresponding solution strategies to solve this category of bi-level problems for large-scale networks and compare their performance with respect to the efficacy and robustness of the correspondingly derived intervention strategies.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

Conceived and designed the experiments: XQ. Designed and implemented the algorithm: SR, BZ, XQ. Performed the experiments: SR. Analyzed the results: SR, BZ, XQ. Wrote the paper: SR, BZ, XQ.

Declarations

The publication costs for this article were funded by the corresponding author's institution.

This article has been published as part of

Acknowledgements

XQ was supported in part by Award R21DK092845 from the National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health; and by the University of South Florida Internal Awards Program under Grant No. 78068.