Laboratoire de Physique Théorique et Modèles Statistiques, CNRS and Univ Paris-Sud, UMR 8626, F-91405 Orsay Cedex, France

Max Planck Institute for Mathematics in the Sciences, Inselstr. 22, 04103 Leipzig, Germany

Institute of Evolutionary Biology and Environmental Studies, University of Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland

Swiss Institute of Bioinformatics, Quartier Sorge, Batiment Genopode, 1015 Lausanne, Switzerland

The Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA

INRA, UMR 0320/UMR 8120 Génétique Végétale, Univ Paris-Sud, F-91190 Gif-sur-Yvette, France

Abstract

Background

The ubiquity of modules in biological networks may result from an evolutionary benefit of a modular organization. For instance, modularity may increase the rate of adaptive evolution, because modules can be easily combined into new arrangements that may benefit their carrier. Conversely, modularity may emerge as a by-product of some trait. Here we ask whether this last scenario may play a role in genome-scale metabolic networks that need to sustain life in one or more chemical environments. For such networks, we define a network module as a maximal set of reactions that are fully coupled,

Results

Using recently developed techniques to randomly sample large numbers of viable metabolic networks from a vast space of metabolic networks, we use flux balance analysis to study

Conclusions

Our work shows that modularity in metabolic networks can be a by-product of functional constraints, e.g., the need to sustain life in multiple environments. This organizational principle is insensitive to the environments we consider and to the number of reactions in a metabolic network. Because we observe this principle not just in one or a few biological networks, but in large random samples of networks, we propose that it may be a generic principle of metabolic network organization.

Background

The architectures of most multi-cellular organisms are strikingly modular. On the one hand, such modularity can be spatial. Organisms are partitioned into organs and tissues whose cells have specialized functions

On the other hand, modularity can be topological, as research of the last ten years has shown. Such modularity is evident in biological networks such as protein-protein interaction networks

The prevalence of modularity (both spatial and topological) in living systems might have several ultimate evolutionary origins (see Ref.

In other scenarios for the origin of modularity, natural selection on the rate of adaptation does not shape modularity; instead modular architectures follow from developmental constraints, or from other phenomena related to epistasis and pleiotropy

In the present work we focus on metabolism, and show that modularity in genome-scale metabolic networks may be a by-product of phenotypic constraints. We will show that this scenario is likely to be very general in metabolism for traits related to an organism's ability to live in different environments. We refer to this ability as an organism's metabolic

In contrast to many other networks

To avoid this difficulty, we can take advantage of our ability to create random samples of metabolic genotypes with specific properties, including versatility. (More precisely, the genotypes we consider are discretized binary metabolic genotypes, representations of genotypes that are suitably simplified for our purpose, as explained in Methods.) This approach

Modeling framework

For our study, we use genome-scale metabolic network modeling. The set of chemical reactions that can take place in an organism and their associated metabolites define the organism's metabolic network. Each reaction is typically catalyzed by an enzyme that allows the transformation of substrate molecules into product molecules. With the advent of genome-scale metabolic network modeling

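An organism's set of enzyme-coding genes, identified here with a list of reactions, can be viewed as a discretized binary metabolic genotype; for brevity we refer to it from here on as the organism's genotype or metabolic genotype. Specifically, given a total universe of N reactions, a genotype can be represented as a binary string (b_{1}, b_{2}, ..., b_{N}), where b_{i} = 1 if reaction i is present and b_{i} = 0 if it is absent, so that there are 2^{N} possible genotypes.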

**Figures S1 to S7**. This file contains the following additional figures: Figure S1 - MCMC sampling of genotypes with a given phenotype; Figure S2 - a FCS predominantly consists of reactions belonging to one biochemical pathway; Figure S3 - Example of a frequently arising FCS in sampled genotypes; Figure S4 - Environmental versatility enhances the modularity index _{env}

If an organism can grow in a specific chemical environment (defined through the nutrients it contains), its metabolic network is able to produce all of its biomass precursors (see Methods); we then call the organism (and by extension its metabolic network)

To characterize metabolic networks of a given phenotype, we cannot examine all genotypes because of their astronomical number. Instead, we use a Markov Chain Monte Carlo (MCMC)

Results

Fully Coupled Sets of reactions are proxies of pathways

The analysis of modularity in large graphs or networks is a mature field. Not surprisingly, multiple different measures of modularity have been developed

The simplest possible FCS involves reactions in a linear biochemical pathway, arguably the most intuitive form of a functional module in biochemistry. However, pathways with branches and cycles can also form FCSs

**Example of a FCS in the E. coli metabolic network**. We display a FCS of 12 reactions in the

We first asked how modules, as defined by FCSs, relate to conventional biochemical pathways, the classical functional modules of metabolism. To this end, we mapped reactions in many different FCSs onto biochemical pathways, as defined by standardized annotations

For each FCS, we identified the pathway annotation for all of its reactions. Because each reaction can be annotated as belonging to multiple pathways, we identified for each FCS the pathway annotation that is shared by most of its reactions. We defined the quantity Q as the fraction of reactions that are annotated as belonging to that pathway, and computed Q for each FCS in the metabolic network of
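The computation of Q described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `pathway_coherence`, the reaction identifiers, and the pathway labels are hypothetical and do not come from the annotation database used in this work.

```python
from collections import Counter

def pathway_coherence(fcs, annotations):
    """Return (Q, pathway) for one FCS.

    fcs         : list of reaction identifiers in the FCS
    annotations : dict mapping a reaction to the pathways it is annotated with
    Q is the fraction of reactions in the FCS that carry the most widely
    shared pathway annotation.
    """
    counts = Counter()
    for reaction in fcs:
        # A reaction may belong to several pathways; count each once.
        for pathway in set(annotations.get(reaction, ())):
            counts[pathway] += 1
    if not counts:
        return 0.0, None
    # Break ties by pathway name so that the result is deterministic.
    pathway, n_shared = max(counts.items(), key=lambda kv: (kv[1], kv[0]))
    return n_shared / len(fcs), pathway

# Hypothetical toy annotation, for illustration only.
toy_annotations = {
    "r1": {"glycolysis"},
    "r2": {"glycolysis", "pentose phosphate"},
    "r3": {"pentose phosphate"},
    "r4": {"glycolysis"},
}
q, shared = pathway_coherence(["r1", "r2", "r3", "r4"], toy_annotations)
```

In this toy case three of the four reactions share the "glycolysis" annotation, so Q is 3/4.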

The same analysis can be applied to random samples of metabolic networks with specific properties, as generated by our MCMC sampling procedure (see Methods). Specifically, we first identified FCSs from 1000

Both measures of modularity

We next asked quantitatively how network modularity is affected by environmental versatility. To answer this question, we defined two indices of network modularity, which we call M and s (see Methods), and we quantified versatility through the Environmental Versatility Index V_{env}.

**Tables S1 and S2**. This file contains the following additional tables: Table S1 - The list of 89 minimal environments; Table S2 - The list of autocatalytic metabolites.

In Figure _{env}

**A higher modularity index M and a greater number of modules s are by-products of increasing environmental versatility**. The Environmental Versatility Index (

As a network's versatility rises, does an increase in

Figure _{env}
_{env }

The results of Figure _{env }
_{env }
_{env }

Modular architecture of the

So far we have shown averages of our modularity measures

Figure

**Distribution of M and of the number of modules s for genotypes of phenotype with V**. The horizontal axis shows the modularity index

The architecture of the

Reactions in versatility-dependent FCSs are just downstream of nutrients

Thus far, we saw that metabolic networks sustaining growth on more nutrients have higher modularity, that is, more reactions contained in modules and more modules (FCSs) (see Figure _{env }
_{env }
_{89 }and R_{1}, for the ensembles with _{env }
_{env }
_{1 }also belong to R_{89}. We then examined the reactions that belong to R_{89 }but that are not part of R_{1}, and called this set of reactions R_{89}\R_{1}. Are the reactions in R_{89}\R_{1 }immediately downstream of the nutrients? The notion of downstream can be made quantitative through the _{89}\R_{1 }to the scope distances of all reactions in our universe of reactions. Figure _{89}\R_{1 }generally have smaller scope distance than other reactions. A statistical test (see Methods) shows that this difference is significant with a p-value of 10^{-5}. In sum, reactions of modules involved in increased versatility tend to be more closely downstream of nutrients, suggesting that they typically belong to pathways metabolizing such nutrients. To illustrate this property with concrete examples, we determined which FCSs in R_{89}\R_{1 }involved any of the 24 reactions occurring at scope distance 1 in Figure

**The increase in modularity with V**. The sampled genotypes with

Discussion and conclusions

Our work took advantage of a new computational method

Modularity in metabolic networks has been studied by several other authors

Intriguingly, the extent of modularity found in

Given the ubiquity of modularity in biological systems, it is tempting to propose general principles that might explain its appearance. By comparing natural with man-made systems and following the original insights of Jacob

The question whether biological modularity may have a direct benefit can be addressed in systems where a realistically complex yet computationally tractable genotype to phenotype relationship exists. Genome-scale metabolic network models are such systems

Our analysis shows that modularity can be a by-product of versatility, at least in the framework of our metabolic modeling, because our system has no selective pressure on modularity per se; highly versatile networks that are also highly modular are simply more numerous than less modular ones. In the language of constraint satisfaction problems

Since versatility corresponds to viability in increasing numbers of environments, it can be considered as a trait associated with fitness itself. Our work suggests that modularity can emerge as a consequence of increasing functional constraints. Because our work is based not just on one or a few metabolic networks from well-studied organisms, but on large samples of random viable networks, we also suggest that this scenario may be generally important. Recent observations by Parter et al.

Methods

Flux Balance Analysis (FBA)

Flux balance analysis (FBA) is a constraint-based approach that relies on the stoichiometric matrix **S** of the network, of dimensions (number of metabolites) × (number of reactions). In any metabolic steady state, the vector **v** of metabolic fluxes through the reactions satisfies the equation

$$\mathbf{S}\,\mathbf{v} = 0 \qquad (1)$$

so as to satisfy mass conservation. Eq. 1 represents stoichiometric and mass balance constraints on the metabolic network. For genome-scale metabolic networks, Eq. 1 leads to an under-determined system of linear equations in the metabolic fluxes, and hence to a large solution space of allowable fluxes. The space of allowable solutions can be reduced by incorporating thermodynamic constraints associated with irreversible reactions, as well as flux capacity constraints that limit the maximum flux through certain reactions. To obtain a particular solution, linear programming (LP) is used to find a set of flux values - a point in the solution space - that maximizes a biologically relevant linear objective function:

$$\max_{\mathbf{v}} \; \mathbf{c}^{T}\mathbf{v} \quad \text{subject to} \quad \mathbf{S}\,\mathbf{v} = 0, \quad \mathbf{a} \le \mathbf{v} \le \mathbf{b}$$

where the vector **c** corresponds to the coefficients of the objective function, and the vectors **a** and **b** contain the lower and upper limits of different metabolic fluxes in **v**. The objective function
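As a minimal sketch of such a linear program, the Python example below maximizes one flux of a five-reaction toy network subject to the steady-state condition and flux bounds, using `scipy.optimize.linprog`. The toy network, the bounds, and the helper name `fba` are assumptions made for illustration; they are not the genome-scale model or the solver used in this study.

```python
import numpy as np
from scipy.optimize import linprog

def fba(S, lower, upper, objective):
    """Maximize objective . v subject to S v = 0 and lower <= v <= upper.

    Returns (optimal objective value, flux vector), or (None, None) if the
    linear program has no feasible solution.
    """
    res = linprog(
        c=-np.asarray(objective, dtype=float),  # linprog minimizes, so negate
        A_eq=S,
        b_eq=np.zeros(S.shape[0]),              # steady state: S v = 0
        bounds=list(zip(lower, upper)),
        method="highs",
    )
    if not res.success:
        return None, None
    return -res.fun, res.x

# Hypothetical toy network (rows: metabolites A, B, C; columns: reactions).
# r0: -> A (uptake), r1: A -> B, r2: B -> C, r3: C -> (output), r4: A -> C
S = np.array([
    [1, -1,  0,  0, -1],
    [0,  1, -1,  0,  0],
    [0,  0,  1, -1,  1],
], dtype=float)
lower = [0, 0, 0, 0, 0]        # all reactions irreversible
upper = [5, 10, 10, 10, 10]    # uptake is capped at 5 flux units
objective = [0, 0, 0, 1, 0]    # maximize the output flux through r3

max_flux, fluxes = fba(S, lower, upper, objective)
```

In this toy network the output flux is limited by the uptake bound. A viability test in the spirit of the Methods could then compare the optimal objective value against a small positive threshold; the choice of threshold is left open here.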

Reaction database

In this work, we have used a hybrid database compiled by Rodrigues and Wagner

In addition to the 5870 metabolic reactions, the hybrid database has transport reactions for 143 external metabolites contained in the

Genome-scale metabolic networks typically contain

The

Viable genotypes

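Any subset of the N reactions in the global reaction set can be represented as a binary string (b_{1}, b_{2}, ..., b_{N}), where b_{i} = 1 if reaction i is present in the subset and b_{i} = 0 otherwise; each such string defines a metabolic genotype.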

For any genotype, we can use FBA to determine whether the corresponding metabolic network has the ability to synthesize all biomass components in a given chemical environment (medium). We consider a genotype to be

Chemical environments and phenotypes

For our purpose, the metabolic phenotype of a metabolic network (genotype) is determined by the network's viability in a list of well-defined chemical environments (media). We shall denote the subset of genotypes within Ω(

Environmental versatility index (V_{env})

The Markov Chain Monte Carlo (MCMC) sampling algorithm (see also below) can be used to explore the set of genotypes having a given phenotype. In our case, this phenotype is viability on a given set of minimal environments; if this set consists of V_{env} environments, we say that the genotypes have environmental versatility index V_{env}.

We have used MCMC to sample ensembles of increasingly versatile metabolic networks, that is, ensembles of genotypes with increasing values of V_{env}.

MCMC sampling of viable genotypes

It was shown in previous work _{env }
^{-22 }for genotypes with

This MCMC method starts with a genotype having the desired phenotype and generates a sequence of genotypes in which the (k+1)^{th} genotype is generated from the k^{th} genotype using a probabilistic transition rule. At each transition step, one proposes a small modification to the current genotype in the sequence; if the modified genotype has the correct phenotype, one accepts the modified genotype as the next genotype of the sequence; otherwise the next genotype is identical to the current genotype. The modification introduced at each transition step is a reaction swap: the removal of one reaction from the current genotype, followed by the addition of a new reaction from the global reaction set to generate a modified genotype. Note that the reaction swap preserves the number of reactions in the genotype.

In our simulations, starting from an initial genotype in ^{5 }attempted swaps to erase the memory of the starting genotype. After this initial phase, we continued the MCMC procedure to sample genotypes in ^{th }genotype generated in a sequence of 10^{6 }steps. This procedure produces a random ensemble of 1000 genotypes in
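The sampling procedure (reaction swaps, an initial phase that erases the memory of the starting genotype, and saving genotypes at regular intervals) can be sketched as follows. This is an illustrative toy, not the code used for this study: the function name `mcmc_sample`, the tiny universe of 20 reactions, the parameter values, and the stand-in viability rule are all assumptions. In the actual procedure, viability is decided by FBA.

```python
import random

def mcmc_sample(start, universe_size, is_viable, burn_in, n_samples, thin, seed=0):
    """Sample genotypes (sets of reaction indices) with a fixed phenotype.

    Each step proposes a reaction swap: one present reaction is removed and
    one absent reaction is added, so genotype size is preserved. A proposal
    is accepted only if it is viable; otherwise the chain stays where it is.
    """
    rng = random.Random(seed)
    current = set(start)
    if not is_viable(current):
        raise ValueError("the starting genotype must have the desired phenotype")
    samples = []
    for step in range(1, burn_in + n_samples * thin + 1):
        present = sorted(current)
        absent = [r for r in range(universe_size) if r not in current]
        removed, added = rng.choice(present), rng.choice(absent)
        proposal = (current - {removed}) | {added}
        if is_viable(proposal):
            current = proposal
        # After the burn-in phase, save every `thin`-th genotype.
        if step > burn_in and (step - burn_in) % thin == 0:
            samples.append(frozenset(current))
    return samples

# Stand-in viability rule (an assumption): reactions 0 and 1 are required.
def toy_viable(genotype):
    return {0, 1} <= genotype

samples = mcmc_sample({0, 1, 2, 3, 4}, universe_size=20, is_viable=toy_viable,
                      burn_in=100, n_samples=50, thin=10)
```

Because rejected proposals leave the chain at its current genotype, consecutive saved genotypes can be identical when the thinning interval is short; a longer interval reduces this correlation at the cost of more viability checks.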

To start the MCMC sampling, a first genotype having the correct phenotype is required. To this end, we first determined those reactions in the _{env }
_{env }
_{env }

Fully coupled sets (FCSs) and measures of modularity

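A reaction pair v_{1} and v_{2} is fully coupled if a nonzero flux through v_{1} implies a nonzero and fixed flux through v_{2}, and vice versa.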
We denote the number of FCSs in a metabolic network genotype by s.

If R_{max} = R_{min} then v_{1} and v_{2} are fully coupled. In the above equations, **S** is the stoichiometric matrix, and the vectors **a** and **b** contain the lower and upper limits of different metabolic fluxes in **v**.
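The following Python sketch illustrates the idea of testing full coupling by comparing the maximum and minimum of a flux ratio. It uses a simplified formulation, assumed here for illustration: the reference flux is fixed to 1 and the other flux is minimized and maximized with `scipy.optimize.linprog`, which matches a ratio test only when the flux bounds do not bind. It is not the exact flux coupling algorithm used in this work, and the toy network and function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def flux_ratio_range(S, lower, upper, i, j):
    """Return (min, max) of v_i when v_j is fixed to 1, or None if infeasible."""
    bounds = list(zip(lower, upper))
    bounds[j] = (1.0, 1.0)                 # normalize the reference flux
    c = np.zeros(S.shape[1])
    c[i] = 1.0
    extremes = []
    for sign in (1.0, -1.0):               # +1: minimize v_i, -1: maximize v_i
        res = linprog(sign * c, A_eq=S, b_eq=np.zeros(S.shape[0]),
                      bounds=bounds, method="highs")
        if not res.success:
            return None
        extremes.append(sign * res.fun)
    return extremes[0], extremes[1]

def fully_coupled(S, lower, upper, i, j, tol=1e-7):
    """True if the ratio v_i / v_j is fixed and nonzero."""
    r = flux_ratio_range(S, lower, upper, i, j)
    return r is not None and abs(r[1] - r[0]) < tol and abs(r[0]) > tol

def fully_coupled_sets(S, lower, upper):
    """Group reactions into fully coupled sets containing at least two reactions."""
    n = S.shape[1]
    assigned, sets = set(), []
    for i in range(n):
        if i in assigned:
            continue
        group = [i] + [j for j in range(i + 1, n)
                       if j not in assigned and fully_coupled(S, lower, upper, i, j)]
        if len(group) > 1:
            sets.append(group)
            assigned.update(group)
    return sets

# Hypothetical toy network: r0: -> A, r1: A -> B, r2: B -> C, r3: C ->, r4: A -> C
S_toy = np.array([
    [1, -1,  0,  0, -1],
    [0,  1, -1,  0,  0],
    [0,  0,  1, -1,  1],
], dtype=float)
lo_toy, up_toy = [0.0] * 5, [10.0] * 5
toy_sets = fully_coupled_sets(S_toy, lo_toy, up_toy)
```

In the toy network, uptake and output (r0, r3) form one coupled set and the two steps of the linear route (r1, r2) form another, while the bypass r4 is coupled to nothing. The greedy grouping relies on full coupling being transitive.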

We have used the algorithm of Burgard

Scope algorithm and distance of reactions from nutrient metabolites

Ebenhöh and colleagues

The Scope algorithm iteratively updates a set

A limitation of the Scope algorithm in comparison to constraint-based frameworks like FBA is its inability to deal properly with the self-generating (autocatalytic) nature of certain cofactor metabolites (e.g., ATP, NADH) in the network

We have determined the distance from nutrient metabolites for each reaction in the global reaction set for 89 different seed sets corresponding to the 89 aerobic minimal environments. For each reaction, we have designated the minimum of the 89 distances obtained for the 89 different environments as the scope distance of that reaction.
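A minimal sketch of an iterative network expansion of this kind is given below. It rests on explicit assumptions: the distance of a reaction is taken to be the expansion step at which all of its substrates first become available, reactions are treated as irreversible, and the metabolite and reaction names are hypothetical. The published Scope algorithm and the treatment of autocatalytic cofactors may differ in detail.

```python
def scope_distances(reactions, seeds):
    """Map each reachable reaction to the expansion step at which it first fires.

    reactions : dict name -> (substrates, products)
    seeds     : iterable of initially available metabolites (nutrients)
    """
    available = set(seeds)
    distance = {}
    step = 0
    while True:
        step += 1
        # Reactions whose substrates are all available at the start of this step.
        fired = [name for name, (substrates, _) in reactions.items()
                 if name not in distance and set(substrates) <= available]
        if not fired:
            return distance
        for name in fired:
            distance[name] = step
            available |= set(reactions[name][1])

def min_scope_distance(reactions, seed_sets):
    """Minimum distance of each reaction over several seed sets (environments)."""
    best = {}
    for seeds in seed_sets:
        for name, d in scope_distances(reactions, seeds).items():
            best[name] = min(d, best.get(name, d))
    return best

# Hypothetical toy data, for illustration only.
toy_reactions = {
    "r1": (["glc"], ["g6p"]),
    "r2": (["g6p"], ["f6p"]),
    "r3": (["f6p", "atp"], ["fbp"]),
}
toy_min_dist = min_scope_distance(toy_reactions, [{"glc"}, {"f6p", "atp"}])
```

Reactions that never become feasible from a given seed set receive no distance for that set, so they only appear in the result if some other seed set reaches them.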

Statistical tests for the increase in modularity with V_{env}

Since the modularity index _{env }
_{env }
_{env }
_{env }
_{env }
_{89 }of FCS reactions for _{env }
_{1 }of FCS reactions for our sampled genotypes with _{env }

The consensus set R_{89 }for _{env }
_{1 }for _{env }
_{89}\R_{1 }consisting of the reactions belonging to R_{89 }but not R_{1 }gives the set of reactions that mostly account for the additional FCSs in _{env }
_{89}\R_{1}, the second is the set of all reactions in the global reaction set. (In Figure _{89}\R_{1 }are clearly concentrated at much smaller values than when considering ^{-5}). Further, a two sample Welch t-test allowed us to reject the hypothesis that the means of the two distributions are the same (P < 8.10^{-8}).
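For readers unfamiliar with the two-sample Welch t-test, the sketch below computes its test statistic and the Welch-Satterthwaite degrees of freedom using only the Python standard library. The two samples are made-up numbers that merely stand in for two lists of scope distances; they are not data from this study, and the function name `welch_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x, y):
    """Return (t, df) for a two-sample Welch t-test with unequal variances."""
    vx = variance(x) / len(x)   # squared standard error of the mean of x
    vy = variance(y) / len(y)   # squared standard error of the mean of y
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

# Made-up illustrative samples (NOT the distances measured in this work).
subset_distances = [1, 1, 2, 1, 2, 1]
all_distances = [3, 4, 5, 4, 6, 5, 3, 4]
t_stat, dof = welch_t(subset_distances, all_distances)
```

A negative statistic indicates that the first sample has the smaller mean. The p-value would then be read from a Student t distribution with `dof` degrees of freedom, for instance with a statistics library.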

Use of pathway classification of reactions to characterize biochemical relevance of FCSs

We have classified reactions in our global reaction set into different biochemical pathways using the pathway information

For a given FCS, we define the quantity _{env }
_{env }

To test the significance of the _{env }
^{7 }swaps starting from the merged list before saving a _{env }

Authors' contributions

The project was defined by all three authors. OCM and AS conceived the algorithmic procedures. AS wrote the code and performed the numerical simulations. All authors contributed in designing research, analyzing the data, and writing the paper. All authors have read and approved the manuscript.

Acknowledgements

We thank Dominique de Vienne, Christine Dillmann and Vincent Fromion for comments, and Pierre-Yves Bourguignon for discussions. AS acknowledges support from CNRS GDRE513. AW acknowledges support through Swiss National Science Foundation grants 315200-116814, 315200-119697, and 315230-129708, as well as through the YeastX project of SystemsX.ch, and the University Priority Research Program in Systems Biology at the University of Zurich. OCM acknowledges support from the Agence Nationale de la Recherche, Metacoli grant ANR-08-SYSC-011. The LPTMS is an Unité de Recherche de l'Université Paris-Sud associée au CNRS.