Department of Bioengineering, Rice University, Houston, Texas, USA

Department of Computer Science, Rice University, Houston, Texas, USA

Abstract

Background

Standard graphs, where each edge links two nodes, have been extensively used to represent the connectivity of metabolic networks. It is based on this representation that properties of metabolic networks, such as hierarchical and small-world structures, have been elucidated and null models have been proposed to derive biological organization hypotheses. However, these graphs provide a simplistic model of a metabolic network's connectivity map, since metabolic reactions often involve more than two reactants. In other words, this map is better represented as a hypergraph. Consequently, a question that naturally arises in this context is whether these properties truly reflect biological organization or are merely an artifact of the representation.

Results

In this paper, we address this question by reanalyzing topological properties of the metabolic network of

Conclusions

These results combined suggest that the reported scaling of the clustering coefficients in the metabolic graphs, and its specific power coefficient, may be an artifact of the graph representation and may not be supported when biochemical reactions are treated atomically as hyperedges. This study highlights how the representation chosen for a biological system, and the null model employed, shape the properties elucidated for that system and their statistical support.

Background

Graphs have been used extensively to model the connectivity of cellular processes

To investigate this question, we analyze metabolic network connectivity maps from a

**Illustration of the hypergraph transformations and abstractions**. Left: a hyperedge is turned into a complete graph linking every pair of nodes to obtain the primal graph (I). Middle: the

Some work on metabolic connectivity hypergraphs already exists. For example, Forst et al.

Further, it is worth pointing out that the hypergraph property of the dependence among metabolites participating in the same reaction has already been widely, though implicitly, captured in other modeling techniques, such as network expansion

In this paper, we address the aforementioned question by conducting three tasks on the metabolic network connectivity map of

Further, when clustering is analyzed directly on the hypergraph representation, the scaling property reported in the literature becomes poorly supported. These results combined suggest that the reported scaling of the clustering coefficients in the metabolic graphs, and its specific power coefficient, may be an artifact of the graph structure produced by the abstraction process and may not be supported when biochemical reactions are treated atomically as hyperedges. This study highlights how the system's representation and the null model employed in an analysis affect the hypotheses derived for that system. These results also have implications beyond metabolic networks since, for example, signal transduction networks contain many enzymatic and complexing reactions that form hyperedges. The weakening of statistical support for reported properties of biological networks when the new null model is considered calls into question claims that adaptive evolution is the (only) explanation for the emergence of complex, or non-intuitive, network features. More generally, this study emphasizes that the use of proper representations and null models is fundamental to understanding the biology underlying the abstract model.

Results and Discussion

A Binomial Distribution of Reaction Sizes and Its Effects

When transforming a hypergraph into a standard graph, under any of the aforementioned transformations, the information on the hyperedge cardinality is lost. The question, then, is whether ignoring the hyperedge cardinality distribution affects the properties elucidated from abstracted standard graphs. Further, if the answer is yes, how should this information be integrated into null models for generating random metabolic graphs in analytical studies?

To address the first question, we begin by inspecting the degree distributions of primal graphs generated randomly in a way that accounts for hyperedge constraints. It is analytically very hard to establish the degree distribution of the primal of randomly generated hypergraphs, since the overlap between hyperedges creates dependencies among the degrees of the nodes. Therefore, we study this issue through simulations. Given a metabolic hypergraph

**Additional Information**. The file contains additional information on methods for null model generation, reaction size distribution for four more organisms, other abstraction methods as well as their illustration on a concrete metabolic pathway, discussion on currency metabolites and on other clustering coefficients defined on hypergraphs.

In the case of the

**The degree distributions of the primal graphs of random hypergraphs**. Each of the hypergraphs has 1193 nodes and 1168 hyperedges. Columns from left to right correspond to fixed hyperedge cardinalities of 2, 3, 4, and 5, respectively. The results in each panel are based on 300 randomly generated hypergraphs (replicas). For each well-represented degree value (contained in at least 10 replicas), the median is plotted. Error bars indicate quartiles. Green dots correspond to the degree distribution of the primal graph of the (undirected) metabolic hypergraph of

Notice that hypergraphs with different hyperedge cardinalities give rise to standard graphs with different degree distributions. In general, the degree distribution of the primal of a random undirected hypergraph with hyperedge cardinality larger than 2 has a zig-zag shape when the degree value is low and becomes more complex as the degree value increases. This is due to the fact that the metabolic hypergraphs we consider are very sparse.
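The zig-zag at low degrees can be reproduced with a small simulation. The sketch below is not the authors' code: it assumes that each hyperedge is drawn as a uniformly random k-subset of the nodes, and the function names `random_uniform_hypergraph`, `primal_graph`, and `degree_histogram` are illustrative. In a sparse hypergraph two hyperedges rarely share more than one node, so a node lying in m hyperedges of cardinality k usually has primal degree m(k-1), which concentrates the distribution on multiples of k-1.

```python
import random
from collections import Counter
from itertools import combinations


def random_uniform_hypergraph(n_nodes, n_edges, k, rng):
    """Draw n_edges hyperedges, each a uniformly random k-subset of the nodes.

    Assumption: hyperedges are sampled independently, so repeats are possible.
    """
    nodes = range(n_nodes)
    return [frozenset(rng.sample(nodes, k)) for _ in range(n_edges)]


def primal_graph(hyperedges):
    """Return the primal graph as a dict mapping node -> set of neighbours.

    Every hyperedge is turned into a clique on its member nodes.
    """
    adjacency = {}
    for edge in hyperedges:
        for u, v in combinations(edge, 2):
            adjacency.setdefault(u, set()).add(v)
            adjacency.setdefault(v, set()).add(u)
    return adjacency


def degree_histogram(adjacency, n_nodes):
    """Count how many nodes have each degree; isolated nodes get degree 0."""
    hist = Counter(len(neigh) for neigh in adjacency.values())
    hist[0] += n_nodes - len(adjacency)
    return hist


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so that runs are repeatable
    for k in (2, 3, 4, 5):
        hg = random_uniform_hypergraph(1193, 1168, k, rng)
        hist = degree_histogram(primal_graph(hg), 1193)
        # Show the low-degree part of the histogram for each cardinality.
        print(k, sorted(hist.items())[:8])
```

Repeating the draw many times and taking the median count per degree would give curves comparable in spirit to the panels of the figure, under the sampling assumption stated above.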

In a hypergraph with

In the case of the _{1 }≡ _{2 }(mod

Clearly, the hypergraphs of different hyperedge cardinalities contribute to different but overlapping ranges of degree values. In particular, the leftmost panel of Figure

Indeed, in the case of metabolic hypergraphs, the hyperedge cardinalities neither all take the same value nor follow a simple uniform distribution. Their effect on the properties of the abstracted standard graphs has not been studied. In Figure

**The hyperedge cardinality distribution of the metabolic hypergraph of E. coli**. Poisson distribution and Binomial distribution with different sample sizes are shown in dashed lines. Parameters of these distributions (

Incorporating the Reaction Size Distribution Into a Null Model

Based on the above results, we believe it is important for a null model for generating random graphs in the context of metabolic networks to use both the number and cardinality distribution of hyperedges. We study a null model where a random graph is generated from the metabolic hypergraph by first rewiring the hypergraph (thus, keeping the number and cardinality distribution of hyperedges unchanged) and then abstracting the random hypergraph (through a

**Comparison of the two null models on a toy hypergraph**. The hypergraph-graph abstraction follows the

To rewire the metabolic standard graph of

The degree distributions of the

**Comparison of the degree distributions of the metabolic standard graph of E. coli against two different null models**. The degree distributions are derived based on three versions of the metabolic hypergraph of

We also fit the tail of the degree distribution of the standard graph of ^{-α} using least squares fitting. By inspecting the data, the fitting region for standard graphs is manually set to

Two observations are in order based on Figure

1. The tail shifts to the higher degree region in the graphs abstracted after rewiring the metabolic hypergraph, compared with the graphs rewired after being abstracted from the real metabolic hypergraph. Comparison with the similar situation in undirected hypergraphs (Figure

2. The

These two observations are in agreement with the statement of Wagner and Fell

The scaling of clustering coefficient

It has been proposed that metabolic graphs are ^{-1 }for a variety of metabolic networks, including that of

In Figure

**Scaling of average clustering coefficients C(k)**. (I) The primal of

(I) The primal of the

(II) Erdös-Rényi random graphs with 1193 nodes and 5719 randomly chosen edges.

(III) Random graphs generated by 100,000 rewiring operations applied to the graph in (I), where in each rewiring operation a pair of non-adjacent edges is selected, and the neighbors of an endpoint of one edge are swapped with the neighbors of an endpoint of the other edge. This procedure generates random graphs with the same degree distribution as that of the graph in (I).

(IV) The primal of hypergraphs generated by 100,000 rewiring operations applied to the
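The rewiring operation used for the graphs in (III) can be sketched as a double-edge swap. This is a minimal illustration, not the authors' implementation: the function name `rewire_preserving_degrees`, the `max_tries` cap, and the choice to reject swaps that would create a duplicate edge are assumptions made here so that the result stays a simple graph.

```python
import random


def rewire_preserving_degrees(edges, n_swaps, rng, max_tries=None):
    """Apply up to n_swaps double-edge swaps to an undirected simple graph.

    edges: iterable of (u, v) pairs. Returns a new list of (u, v) pairs in
    which every node keeps the degree it had in the input.
    """
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    if max_tries is None:
        max_tries = 100 * n_swaps  # assumed cap to guarantee termination
    done = tries = 0
    while done < n_swaps and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick which endpoints get exchanged
        if len({a, b, c, d}) < 4:
            continue  # the two edges share an endpoint, i.e. are adjacent
        if frozenset((a, d)) in edge_set or frozenset((c, b)) in edge_set:
            continue  # the swap would duplicate an existing edge
        # Replace (a, b), (c, d) by (a, d), (c, b): each node keeps its degree.
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {frozenset((a, d)), frozenset((c, b))}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list
```

Because each swap removes one edge at every one of the four endpoints and adds one back, the degree sequence is unchanged, which is the property the null model in (III) relies on.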

Very similar patterns were observed when taking

For an Erdös-Rényi random graph with 1193 nodes and 5710 edges, a small value of ^{2 }- ^{2 }- ^{-2 }for large

If we rewire the primal of

We also studied the clustering coefficient on reaction graphs obtained through PLGT (see Figure ^{T}
^{T}
^{0.08 }
^{T}

**The scaling of averaged clustering coefficients in the reaction graph obtained via PLGT**. Left panel: The green dots are the average clustering coefficients of the PLGT of ^{6 }times, to guarantee convergence).

Further, in this case we find that the clustering coefficients are greatly affected by the presence of metabolites that participate in a large number of reactions, or the so-called "currency metabolites", such as water. With water removed from the original hypergraph, the entire rightmost vertical strip in the PLGT's clustering coefficients disappears (red dots in Figure

The results of

The question, then, is: why does this scaling of clustering coefficients arise? Or, why do graphs abstracted from hypergraphs exhibit this hierarchical structure? We believe that this is simply an artifact of the way standard graphs are abstracted from metabolic hypergraphs. For example, the primal of an undirected hypergraph connects all the reactants in the same reaction, thereby forming cliques in the abstracted standard graph. These cliques contribute the same number of 2-paths and triangles in computing the clustering coefficient of a reactant. Since the number and size of such cliques remain unchanged as a hypergraph is rewired, their contribution remains the same as well. The similarity between the scaling of

To determine whether the scaling of clustering coefficients is due to the inherent "hierarchy" of the metabolic graph, or is just a consequence of the graph abstraction process and the hyperedge cardinality distribution, we computed the hypergraph clustering coefficient using a new measure we devised to apply directly to hypergraphs (see Methods). Results are shown in Figure

**The scaling of hypergraph clustering coefficient**. The green dots are the local clustering coefficients. The red dots are the averaged values of the local clustering coefficients for each degree. Left panel:

To summarize, we believe that topological characteristics of metabolic networks, such as scale-free degree distributions and scaling of clustering coefficients, are not necessarily grounds for invoking natural selection or making connections to functional organization. Instead, these properties may lose statistical significance when a null model that takes reaction sizes into account is used, and may even disappear when computations are done on the appropriate representation of metabolic networks.

Conclusion

In this article, we investigated how choosing a null model that incorporates hypergraph properties of the metabolic system, such as the reaction size distribution, affects the network's connectivity analyses. By reanalyzing the degree distribution and clustering coefficient, we found that the reported scaling of the clustering coefficients in the metabolic graphs and its specific power coefficient may be an artifact of the hypergraph abstraction, and is not supported when biochemical reactions are treated atomically as hyperedges. We also found that, by taking the reaction size distribution into account, a null model can explain some details in the shape of the degree distribution that have not been explained otherwise. This further highlights the necessity of using appropriate null models in exploring adaptive evolution, along with the analysis of their support in biological systems.

Methods

Data

We assembled the metabolic hypergraph of

Metabolic Hypergraphs

An _{t}
_{h}
_{t}
_{h}

The

The neighborhood of a set of nodes,

From Hypergraphs to Standard Graphs

A variety of transformations can be applied to a hypergraph to obtain standard graph representations. We now define transformations that are applicable (and have been applied) in the context of representing metabolic networks. Let _{p }
_{p}

The primal of a metabolic hypergraph is also called the _{cp }
_{cp}

This corresponds to the _{tp }
_{tp}

This corresponds to the

Every undirected hypergraph can be completely described by a binary matrix

Finally, common set operations, such as union and intersection, can also be introduced into the hypergraph transformation. One widely, yet implicitly, used case is the generation of enzyme/gene hypergraphs from the underlying reaction hypergraph

Clustering Coefficients on Hypergraphs

A commonly used statistic for elucidating properties of metabolic networks, such as modularity

According to _{local}, for any given node _{local}(_{local }measures, for a node

According to _{global }is defined as the fraction of the number of 2-paths with linked end points (i.e., triangles) over the number of all possible 2-paths. Intuitively, _{global }measures the probability of having an edge (

For a proper extension of _{local }and _{global }to the domain of hypergraphs (denoted by _{local }and _{global}, respectively), the following intuitive properties may be desirable, in addition to reflecting the extent of clustering in a hypergraph:

P1 The values of _{local }and _{global }fall in the range [0, 1].

P2 _{local }and _{global }should reduce to _{local }and _{global}, respectively, when every hyperedge connects exactly two nodes (i.e., the hypergraph is a standard graph).

P3 _{local}(

The rationale behind property P1 is to retain the probabilistic interpretation of the clustering coefficient statistic, as well as to enable comparing two different hypergraphs under the statistic. The rationale behind property P2 is to allow treating hypergraphs and standard graphs (which are a special case of hypergraphs) in a uniform manner. Property P3 reflects the fact that neighbors of a node can also be neighbors of each other simply because all three belong to the same hyperedge, a case that should be treated carefully to reflect a proper notion of clustering.
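As a concrete reference point for property P2, the standard-graph local and global clustering coefficients can be computed as in the sketch below. It uses the usual definitions (local: the fraction of pairs of a node's neighbors that are themselves adjacent; global: the fraction of 2-paths whose end points are linked). The function names and the convention of returning 0 for nodes of degree below 2 are assumptions made here for illustration, not taken from the paper.

```python
from itertools import combinations


def local_clustering(adjacency, v):
    """Fraction of pairs of neighbours of v that are adjacent to each other.

    adjacency: dict mapping node -> set of neighbours (undirected graph).
    Assumed convention: nodes with fewer than two neighbours get 0.0.
    """
    neigh = adjacency[v]
    k = len(neigh)
    if k < 2:
        return 0.0
    linked = sum(1 for u, w in combinations(neigh, 2) if w in adjacency[u])
    return linked / (k * (k - 1) / 2)


def global_clustering(adjacency):
    """Fraction of 2-paths u - v - w whose end points u and w are linked."""
    two_paths = closed = 0
    for v, neigh in adjacency.items():
        for u, w in combinations(neigh, 2):
            two_paths += 1  # one 2-path centred at v
            if w in adjacency[u]:
                closed += 1  # its end points form a triangle with v
    return closed / two_paths if two_paths else 0.0
```

For a triangle on nodes 0, 1, 2 with an extra node 3 attached to node 2, node 2 has three pairs of neighbors of which one is linked, and the graph has five 2-paths of which three are closed. Both values lie in [0, 1], as property P1 requires of the hypergraph extensions.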

Based on these properties, we define _{local}(_{global}(

where, ℐ = {{_{i}
_{j}
_{i }
_{j }
_{i }
_{j}
_{i }
_{j }

where _{ij }
_{i }
_{j}
_{local }under a variety of scenarios. For _{global}, the numerator is the sum of extra overlap between any pairs of hyperedges that contain

**Illustration of the Extra Overlap and the Local Clustering Coefficient for hypergraphs**.

From the definition of

1.

2. For two non-identical, intersecting hyperedges, _{i }
_{j}
_{i}
_{j}
_{i}
_{j}

3. For any two sets

It follows from these observations that _{local }and _{global }satisfy the three aforementioned properties P1--P3.

Note that we are not the first to define clustering coefficient measures for hypergraphs. Estrada and Rodríguez-Velázquez

where a hyper-triangle is a set of three nodes and three hyperedges that connect them, and a 2-path is a sequence {_{1}, _{2}, _{1}, _{2 }are two distinct hyperedges, {_{1 }and {_{2}. The numerator is essentially the number of the closed-walks of length 3 without reusing hyperedges or revisiting nodes except at the end points

To analyze how the two measures of global clustering coefficients compare, we conducted a simple test, where we generated random hypergraphs with increasing connectivity and applied the measures to them. More precisely, we generated a random hypergraph by starting with 30 disconnected nodes, and then, for each subset of

**Comparison of the two global hypergraph clustering coefficient measures on random hypergraphs**. The x-axis shows the probability

Three observations are in order. First, the two measures yield identical results in the case of standard graphs (where

Authors' contributions

All authors read and approved the final manuscript.

Acknowledgements

The authors would like to thank the three anonymous reviewers whose extensive comments helped improve the manuscript significantly. This work was supported in part by NSF grant CCF-0622037, grant R01LM009494 from the National Library of Medicine, and an Alfred P. Sloan Research Fellowship to Luay Nakhleh. The contents are solely the responsibility of the authors and do not necessarily represent the official views of the National Science Foundation, National Library of Medicine, the National Institutes of Health or the Alfred P. Sloan Foundation.