Abstract
Background
With the advent of highthroughput targeted metabolic profiling techniques, the question of how to interpret and analyze the resulting vast amount of data becomes more and more important. In this work we address the reconstruction of metabolic reactions from crosssectional metabolomics data, that is without the requirement for timeresolved measurements or specific system perturbations. Previous studies in this area mainly focused on Pearson correlation coefficients, which however are generally incapable of distinguishing between direct and indirect metabolic interactions.
Results
In our new approach we propose the application of a Gaussian graphical model (GGM), an undirected probabilistic graphical model estimating the conditional dependence between variables. GGMs are based on partial correlation coefficients, that is pairwise Pearson correlation coefficients conditioned against the correlation with all other metabolites. We first demonstrate the general validity of the method and its advantages over regular correlation networks with computersimulated reaction systems. Then we estimate a GGM on data from a large human population cohort, covering 1020 fasting blood serum samples with 151 quantified metabolites. The GGM is much sparser than the correlation network, shows a modular structure with respect to metabolite classes, and is stable to the choice of samples in the data set. On the example of human fatty acid metabolism, we demonstrate for the first time that high partial correlation coefficients generally correspond to known metabolic reactions. This feature is evaluated both manually by investigating specific pairs of highscoring metabolites, and then systematically on a literaturecurated model of fatty acid synthesis and degradation. Our method detects many known reactions along with possibly novel pathway interactions, representing candidates for further experimental examination.
Conclusions
In summary, we demonstrate strong signatures of intracellular pathways in blood serum data, and provide a valuable tool for the unbiased reconstruction of metabolic reactions from largescale metabolomics data sets.
Background
Metabolomics is a newly arising field aiming at the measurement of all endogenous metabolites of a tissue or body fluid under given conditions [13]. The resulting metabolome of a biological system is considered to provide a readout of the integrated response of cellular processes to genetic and environmental factors [4]. Understanding the complex biochemical interplay between hundreds of measured metabolite species is a daunting task, which can be approached by combining advanced computational methods with data from large populationbased studies. On the biochemical level, metabolite concentrations are determined by a set of specific metabolic enzymes. Variabilities in both enzyme activity and metabolite exchange rates  induced by a continuous spectrum of metabolic states throughout measured samples  give rise to characteristic patterns in the metabolite profiles which are directly linked the underlying biochemical reaction network [5,6]. Although human metabolism has been extensively characterized in the past decades [7], the reconstruction of metabolic networks from such metabolite patterns is a key question in the computational research field. Previous attempts focused on linear metabolite associations measured by Pearson correlation coefficients. These include studies utilizing timecourse measurements and clustering [8], theoretical approaches relating metabolite fluctuations to properties of the dynamical system [5] and metabolic control analysis to derive effects of enzyme variability [6]. Other reconstruction methods rely on specific perturbations of the biological system, like the induction of concentration pulses for certain metabolites [9].
A major drawback of correlation networks, however, is their inability to distinguish between direct and indirect associations. Correlation coefficients are generally high in largescale omics data sets, suggesting a plethora of indirect and systemic associations. For example, transcriptional coregulation amongst many genes will give rise to indirect interaction effects in mRNA expression data [10]. Similar effects can be observed in metabolic systems which, in contrast to genetic networks, contain fast biochemical reactions in an open mass flow system. Metabolite levels are supposed to be in quasisteady state compared to the time scales of upstream regulatory processes [11]. That is, metabolites will follow changes in gene expression and physiological processes on the order of minutes and hours, but will appear unchanged on the order of seconds. These properties, even though substantially different from mRNA expression mechanisms, also give rise to indirect, systemwide correlations between distantly connected metabolites.
Gaussian graphical models (GGMs) circumvent indirect association effects by evaluating conditional dependencies in multivariate Gaussian distributions [10]. A GGM is an undirected graph in which each edge represents the pairwise correlation between two variables conditioned against the correlations with all other variables (also denoted as partial correlation coefficients). GGMs have a simple interpretation in terms of linear regression techniques. When regressing two random variables X and Y on the remaining variables in the data set, the partial correlation coefficient between X and Y is given by the Pearson correlation of the residuals from both regressions. Intuitively speaking, we remove the (linear) effects of all other variables on X and Y and compare the remaining signals. If the variables are still correlated, the correlation is directly determined by the association of X and Y and not mediated by the other variables. Partial correlations have recently been applied to biological data sets for the inference of association networks from mRNA expression data [1215], and for the elucidation of relationships between genomic features in the human genome [16]. One previous study used secondorder partial correlations of genetic associations to elucidate genetically determined relations between metabolites [17].
In this manuscript we now study the capabilities of GGMs to recover metabolic pathway reactions solely from measured metabolite concentrations. First, we discuss the quality of the method and possible problems and pitfalls on computersimulated systems. We then apply GGMs to a lipidfocused targeted metabolomics data set of 1020 blood serum samples with 151 measured metabolites from the German population study KORA [18,19]. The GGM is sparse in comparison to the corresponding Pearson correlation network, displays a modular structure with respect to different metabolite classes, and is stable towards changes in the underlying data set. We demonstrate that topranking metabolite pairs and further densely connected subgraphs in the GGM can indeed be attributed to known reactions in the human fatty acid biosynthesis and degradation pathways. In order to systematically verify this finding, we map partial correlation coefficients to the number of reaction steps between all metabolite pairs based on a literaturecurated fatty acid pathway model. We observe statistically significant discriminatory features of GGMs to distinguish between directly and nondirectly interacting metabolites in the metabolic network. In addition, loworder partial correlations turned out to be a suitable alternative to fullorder GGMs for the present dataset. Finally, we will summarize and discuss the relevance of GGMs for metabolomics data sets, point out limitations of the method and suggest future steps. All metabolomics data used in this study, the generated correlation networks, model files and metabolite annotations are available online at http://hmgu.de/cmb/ggm webcite.
Results and Discussion
GGMs delineate direct relationships in artificial reaction systems
Computersimulated reaction systems are a valuable tool for the evaluation of correlationbased measures prior to their application to real metabolomics data sets. Previous works focused on the modeling of biological replicates with intrinsic noise on the metabolite levels [5]. In contrast, we here investigate the effects of variation of enzymatic activity in a human population cohort. Such variation might be genetically determined or, more likely, be the result of distinct regulatory effects and metabolic states between individuals. All reaction systems were implemented as ordinary differential equations with simple massaction kinetics rate laws and reversible MichaelisMententype enzyme kinetics (see Methods). In order to account for the abovementioned enzymatic variability we applied a lognormal noise model, which has been previously described to be a reasonable approximation of cellular rate parameter distributions [20]. The standard deviation σ was set to a value of 0.2 for the underlying normal distribution (note that the results are insensitive to the magnitude of σ). For each parameter sample we calculated the metabolite steady state concentrations on logscale, and subsequently estimated the GGM by calculating partial correlation coefficients. All analyzed systems exhibit single, unique steady states independent of the respective parameter values. This feature was structurally verified using the ERNEST toolbox [21] for all networks except the negative feedback system. For the latter one, we employed empirical initial state sampling to ensure monostability in the given parameter range (see Additional file 1, section 1).
Additional file 1. Further results on computersimulated networks.
Format: PDF Size: 419KB Download file
This file can be viewed with: Adobe Acrobat Reader
The first network we analyzed consists of a linear chain of three metabolites with different variants of reaction reversibility (Figure 1AC). We observe high pairwise correlations for metabolites in mutual equilibrium due to reversible reactions (Figure 1A). This is in accordance with previous findings from [6], where correlationgenerating mechanisms in metabolic reaction networks were identified. Furthermore, this simple example demonstrates how partial correlation coefficients in GGMs discriminate between directly and indirectly related metabolites. If only irreversible reactions are employed in the chain, neither regular correlation networks nor GGMs can distinguish between direct and indirect effects (Figure 1B). Species A is the only input metabolite in the system, and thus completely determines the levels of both B and C. This leads to generally high and nondistinguishable correlations between the three metabolites. However, if we introduce exchange reactions for all species, the GGM again correctly describes the network connectivity (Figure 1C). Such exchange mechanisms are likely to be present for most intracellular metabolites, which usually participate in multiple metabolic pathways (see e.g. KEGG PATHWAY online). Note that for this third case both regular and partial correlation values are notably lower than for the first two chain variants. In addition to linear chains, pathway modules consisting of branched topologies with firstorder, reversible reactions are correctly reconstructed by our method (Figure 1D). An overview of the reconstruction accuracy of GGMs on various types of firstorder networks with different variants of reaction reversibility can be found in Additional file 1, section 2.
Figure 1. Evaluation of correlation networks (CN) and Gaussian graphical models (GGM) on artificial systems. Line widths represent relative edge weights in the respective networks (scaled to the strongest edges). A: Linear chain of three metabolites with reversible intermediate reactions. While the standard Pearson correlation network (CN) is fully connected, implying an overall high correlation of all metabolites, the GGM correctly discriminates between direct and indirect interactions. B: Linear chain with irreversible intermediate reactions. Neither CN nor GGM can distinguish direct from indirect effects, as metabolite A equally determines the levels of both B and C. C: Linear chain with irreversible reactions and input/output reactions for each metabolite. Although the edge weights for both CN and GGM are generally lower, the GGM now correctly predicts the network topology. D: Branchedchain firstorder networks are correctly reconstructed by the GGM. E: Endproduct inhibition modules. When modeled as an open system, A is decoupled from the other metabolites and reconstruction fails at this point. Dashed lines mark enzyme inhibition interactions, larger arrows to the right indicate faster forward than backward reactions. F: Cofactordriven network resembling the first three reactions from the glycolysis pathway. A correlation network fails to predict the correct pathway relationships. G: Nonlinear system with a bimolecular reaction. The GGM predicts only a only weak interaction between B and C. This is due to counterantagonistic processes of isomerization and substrate participation in the same reaction.
Interestingly, for some reaction setups, the accuracy of the method improves drastically with an increasing amount of external noise. Specifically, if the metabolite transport towards a pathway is subject to higher fluctuations, the GGM edge weight difference between directly and indirectly connected metabolites becomes larger. For a detailed discussion of this finding we refer the reader to Additional file 1, section 3. The second question we addressed with artificial reaction networks was the influence of enzymecatalyzed reactions on GGM estimation. Therefore we setup reaction chains with four metabolites incorporating reversible enzymatic reactions. Forward maximal reaction rates V_{max }were set twice as fast as the backward reactions in order to ensure a directed mass flow. We found that the usage of MichaelisMententype enzyme kinetics instead of massaction kinetics does not alter our general findings. When forward reaction rates exceed backward reactions by far, the GGM discrimination quality is impaired. This is in line with the observation that purely irreversible reactions cannot be distinguished in the massaction case (see above). Other specific parameters, like the Michaelis constant K_{M }, did not affect GGM calculation (Additional file 1, section 4). Another important aspect of enzymecatalyzed reactions are allosteric regulation mechanism, like endproduct inhibition for instance, which constitutes a negative feedback from the end to the beginning of a pathway [22]. The reconstruction results differ depending on whether exchange reactions are included in the system for not (Figure 1E). If the inhibitory module represents a closed system (no external fluxes except for the first and last metabolite), the regulatory interaction does not in influence GGM calculation. The net metabolite turnover speed might be drastically affected, but the topological effects of this reaction chain on the correlation structure remain unchanged. In contrast, when exchange reactions are introduced (second example in Figure 1E), the inhibition decouples A from the other metabolites and the reconstruction fails for this metabolite. Detailed results for different strengths of the inhibitory interaction are presented in Additional file 1, section 5.
Next, we studied the influence of cofactordriven reactions on the reconstruction. Cofactors are ubiquitous substances usually involved in the transfer of certain molecular moieties or redox potentials [23]. We investigated such cofactorcoupled reactions (a) because they introduce nonlinearity in the simulated dynamical systems, and (b) because cofactors are usually involved in many reactions and thus generate networkwide metabolite dependencies. We set up a network resembling the first three reactions from the glycolysis pathway. It consists of four metabolites and two energy transferrelated cofactors, ATP and ADP, involved in two phosphorylation reactions [24]. Again the GGM precisely describes metabolite connectivity in the system, whereas a regular correlation graph leads to false interpretations of the network topology (Figure 1F). Cofactors were modeled with input and output reactions to the rest of the metabolic system in order to account for the abovementioned participation of cofactors in various reactions of the system. Again, it makes a substantial difference whether such exchange reactions are included in the model or not. Since our toy model only represents a small part of a larger system, missing exchange reaction for cofactors would create a false mass conservation relation that compromises correlation calculation. Finally, we investigated the effects of rate laws with nonlinear substrate dependencies in the absence of cofactors. Therefore we modeled a reversible, bimolecular split reaction with isomerization of the two substrates (Figure 1G). An example of such a reaction network can be found in the glycolysis pathway between fructose1,6bisphosphate, glyceraldehyde3phosphate and dihydroxyacetone phosphate. Our simulations demonstrate that again a regular Pearson correlation network cannot delineate direct from indirect relationships in the pathway. The GGM only detects a weak association between B and C. This is due to counterantagonistic processes in this reaction setup: isomerization and other reversible reactions generally induce positive correlations, whereas coparticipation as substrates in the same reaction induces negative correlations. Such effects of correlationgenerating mechanisms which cancel each other out have been described before [6] and pose a problem to all reconstruction approaches which rely on linear dependencies.
The drawbacks of correlationbased methods discussed in this section, especially inhibitory mechanisms with exchange reactions and antagonistic mechanism, have to be kept in mind when attempting to reconstruct metabolic reactions from steady state data. For the present study, however, we assume the primarily linear lipid pathways not to contain such problematic reaction motifs.
A GGM inferred from a largescale populationbased data set displays a sparse, modular and robust structure
In the following we estimated a Gaussian graphical model using targeted metabolomics data from the German population study KORA [18] ("Kooperative Gesundheitsforschung in der Region Augsburg"). We used a subset of the data set previously evaluated in a genomewide association study [19], containing 1020 targeted metabolomics fasting blood serum measurements with 151 quantified metabolites. The metabolite panel includes acylcarnitines, four classes of phospholipid species, amino acids and hexoses (see Methods). Both regular Pearson correlation coefficients and partial correlation coefficients (inducing the GGM) were calculated on the logarithmized metabolite concentrations. All edges corresponding to correlation values significantly different from zero now induce the networks displayed in Figure 2A+B. In order to exclude correlation effects generated by genetic variation in the study cohort, we investigated the in influence of SNP allele data from [19] on the GGM calculation. We found genetic effects to be neglectable (see Additional file 2), indicating that GGMs capture intrinsic biochemical properties of the system.
Figure 2. Network properties of the correlation network (CN) and Gaussian graphical model (GGM) inferred from a targeted metabolomics population data set (1020 participants, 151 quantified metabolites). A+B: Graphical depiction of significantly positive edges in both networks, emphasizing local clustering structures. Each circle color represents a single metabolite class. C+D: Histograms of pairwise correlation coefficients (i.e. edge weights) for both networks. Green lines indicate the median values, red lines denote a significance level of 0.01 with Bonferroni correction. The CN displays a general bias towards positive correlations throughout all metabolites. For the GGM, the median value lies around zero and we observe a shift towards significantly positive values. E+F: Modularity between metabolite classes measured as the relative outdegree from each class (rows) to all other classes (columns). The GGM (right) shows a clear separation of metabolite classes, with some overlaps for the different phospholipid species diacylPCs, lysoPCs, acylalkylPCs and sphingomyelins. Values range from white (0.0 outdegree towards this class) to black (1.0). PCs = phosphatidylcholines.
Additional file 2. Effects of genetic variation on GGM calculation.
Format: PDF Size: 129KB Download file
This file can be viewed with: Adobe Acrobat Reader
Pearson correlation coefficients show a strong bias towards positive values in our data set (Figure 2C); a typical feature of highthroughput data sets, also observed e.g. in microrarray expression data, which can be attributed to unspecific or indirect interactions [10]. We obtain 5479 correlation values significantly different from zero with (α = 0.01 after Bonferroni correction), yielding an absolute significance correlation cutoff value of 0.1619 (see Methods). In contrast, the GGM shows a much sparser structure with 417 significant partial correlations after Bonferroni correction (Figure 2D). Most values center around a partial correlation coefficient of zero, whereas we observe a clear shift towards positive significant values. Note that negative partial correlations provide particular information that will be discussed later in this manuscript.
The GGM displays a modular structure with respect to the seven metabolite classes in our panel, while the class separation in the correlation network appears rather blurry (Figure 2E+F). We observe a clear separation of the amino acids and acylcarnitines from all other classes. The four groups of phospholipids (diacylPCs, lysoPCs, acylalkylPCs, and sphingomyelins) still showed locally clustered structures, but are strongly interwoven in the network. This is probably an effect of the dependence of all phospholipids on a similar fatty acid pool and, subsequently, the biosynthesis pathway acting on this substrate pool. In order to get an objective quantification of this observation, we calculated the groupbased modularity Q on all significantly positive GGM edges according to [25] (see Methods). The same measure was calculated for 10^{5 }randomized GGM networks (random edge rewiring). For the original GGM we obtain a modularity of Q = 0.488, and the random networks yield Q = 0.118 ± 0.016, resulting in a highly significant zscore of z = 23.49. Furthermore, the modularity value induced by using the metabolite classes was compared to a partitioning optimized by simulated annealing. The optimized modularity is only slightly higher with Q = 0.557 and the resulting partitioning is very similar to the metabolite classes (see Additional file 3). Performing the modularity analysis with the full, weighted partial correlation matrix produces equivalent results (also shown in S3).
Additional file 3. Modularity: Optimized partitioning and weighted calculation.
Format: PDF Size: 120KB Download file
This file can be viewed with: Adobe Acrobat Reader
An important question for a multivariate statistical measure such as partial correlations is the robustness with respect to changes in the underlying data set. Furthermore, the dependence of the measure on the size of the data set needs to be addressed. To answer these questions, we performed two types of perturbations of our data set. First, we applied sample bootstrapping with 1000 repetitions and compared the resulting partial correlations to the original data set (Additional file 4, Figure S1). We observe small mean differences with low standard deviation (0.03 ± 8. 2 · 10^{4}). This indicates that for a large data set with n = 1020 samples, GGMs are robust against the choice of samples. We assume that each distinct metabolic state in the cohort is captured by a bootstrap sample, and thus all information required to calculate the GGM is contained. In addition to the bootstrap analysis, we estimated partial correlations for continuously decreasing sample sizes (Additional file 4, Figure S2). For each data set size we randomly picked samples from the original data set and repeated the procedure 100 times. The analysis shows that the GGM is stable even under decrease of the sample number. For instance, for a data set containing only around half of the original samples (n = 530) we get a partial correlation difference of 0.03 ± 6.9 10^{4}. Only when the number of samples gets close to the number of variables (m = 151) the correlation matrix becomes illconditioned and strong differences from the original partial correlations occur. These problems of smaller metabolomics studies could be dealt with by regularization approaches or the usage of loworder partial correlation [26]. Taken together, our results demonstrate that the analyzed metabolomics data set is sufficient to robustly elucidate relationships between the measured metabolites.
Additional file 4. Stability of the GGM with respect to changes in the underlying data set.
Format: PDF Size: 75KB Download file
This file can be viewed with: Adobe Acrobat Reader
Strong GGM edges represent known metabolic pathway interactions
The next step in our analysis was the manual investigation of metabolite pairs displaying strong partial correlation coefficients. Remarkably, we are able to provide pathway explanations for most metabolite pairs in the top 20 positive partial correlations (Table 1). In the following, we will specifically discuss interesting, highscoring metabolite pairs along with their responsible enzymes in the metabolic pathways.
Table 1. Top 20 positive GGM edge weights (i.e. partial correlation coefficients, PCC) in our data set along with proposed metabolic pathway explanations
The highest partial correlation in the data set with ζ = 0.821 is found for the two branchedchain amino acids Valine and xLeucine, where the latter compound represents both Leucine and Isoleucine (which have equal masses and are not distinguishable by the present method). The three metabolites are in close proximity in the metabolic network concerning their biosynthesis and degradation pathways. Further related amino acid pairs that display significant partial correlations are Histidine and Glutamine (ζ = 0.383), Glycine and Serine (ζ = 0.326) as well as Threonine and Methionine (ζ = 0.298).
Clearcut signatures of the desaturation and elongation of long chain fatty acids can be seen for various sphingomyelins and lysoPCs (Figure 3A). For example, SM C18:0 and SM C18:1 strongly associate with ζ = 0.767, most probably representing the initial Δ9 desaturation step of the polyunsaturated fatty acid biosynthesis pathway from C18:0 to C18:1Δ9 by SCD (SteaorylCoA desaturase). The similarly high partial correlation between SM C16:1 and SM C18:1 (ζ = 0.765) as well as lysoPC a C16:1 and lysoPC a C18:1 (ζ = 0.315) can be attributed to the ELOVL6dependent elongation from C16:1Δ 9 to C18:1Δ 11. Interestingly, this reaction is not contained in the public reaction databases but has been previously described by [27].
Figure 3. Biochemical subnetworks identified by the GGM. Line widths correspond to partial correlation coefficients. A: Elongation and desaturation signatures, most likely mediated by ELOVL6 and SCD, for C16 and C18 fatty acids incorporated in lysoPCs and sphingomyelins. B: Top: Diacylphosphatidylcholine (PC aa) species with elongation and peroxisomal βoxidation associations. Several combinatorial variants of side chain compositions are possible for C36:4 and C38:4, and thus different enzymes could mediate this connection. Bottom: Alkylacylphosphatidylcholines (PC ae) with supposedly distinct side chain composition, giving rise to a low association with a directly connected species (C36:2). C: Recovered βoxidation pathway from C18 down to C4. Four enzymes with overlapping substrate specificities catalyze the ratelimiting reactions of this pathway. D: Two highscoring triads, where metabolite pairs with a pathway distance of two constitute strong partial correlations. This feature of partial correlations aids in the reconstruction of the network topology beyond the direct neighborhood of each metabolite.
We identify a variety of strong GGM edges between diacylPC (lecithins, PC aa) and acylalkylPC (plasmalogens, PC ae) metabolite pairs (Figure 3B). For instance, PC aa C34:2 and PC aa C36:2 associate strongly with ζ = 0.735, and PC aa C36:4 and PC aa C38:4 show a partial correlation of ζ = 0.672. While the first pair can be precisely explained by an elongation from C16:0 to C18:0 by ELOVL6, different combinatorial variants come into play for the PC aa C36:4/PC aa C38:4 pair. Our massspectrometry technique only measures brutto compositions, that is the bulk side chain carbon content and total degree of desaturation. Depending on the exact composition of both fatty acid residues in the respective lipids, this association could be caused by longchain elongations (C14 to C16 and C16 to C18 through fatty acid synthase and ELOVL6, respectively), by verylongchain elongations (C22:4 to C24:4 through ELOVL2 or ELOVL5) and even by peroxisomal β oxidation of fatty acids (through ACOX1 or ACOX3). An interesting situation arises for the phospholipids PC ae C34:2, PC ae C36:3 and PC ae C36:2. From its brutto formula the latter species could represent an intermediate step between the other two metabolites. However, it associates poorly with both other phospholipids, which in turn display a strong partial correlation (ζ = 0.752). This finding can be explained by distinct fatty acid side chain compositions, showing differential incorporation of C18:0, C18:1 and C18:2 (Figure 3B, bottom).
For the acylcarnitine group we observe a remarkably high partial correlation of ζ = 0.735 for C8carn and C10carn and further acylcarnitine pairs with a carbon atom difference of two (Figure 3C). These associations can be attributed to the βoxidation pathway, i.e. the catabolic breakdown of fatty acids in the mitochondria [23]. During this degradation process, C_{2 }units are continuously split off from the shrinking fatty acid chain. Four acylCoA dehydrogenases, ACADS, ACADM and ACADL, ACADVL, catalyze the rate limiting reactions of βoxidation for different fatty acid chain lengths [28,29]. Our interpretation of acylcarnitine correlations as signatures of mitochondrial βoxidation is in accordance with [19], where we identified associations between C8+C10, C12 and C4 with genetic variation in the ACADM, ACADL and ACADS loci, respectively.
We observe several associations that were not directly attributable to enzymatic interactions in the fatty acid biosynthesis or degradation pathways. For instance, lysoPC a 18:1 and lysoPC a 18:2 share a strong GGM edge (ζ = 0.543) although the Δ12desaturation step from oleic acid to linoleic acid is known to be missing in humans [30]. This missing reaction gives rise to the essentiality of fatty acids in the ω6 unsaturated fatty acid pathway. A functional explanation could be a systemic equilibrium between the two fatty acids or remodeling processes specific for the lysoPC metabolite class. Further examples are high partial correlations between the hydroxy sphingomyelins SM (OH) C22:1 and SM (OH) C22:2 (ζ = 0.743) as well as the sphingomyelins SM C24:0 and SM C24:1 (ζ = 0.577). To the best of our knowledge, there is no evidence for such fatty acid desaturation reactions in humans. The detected associations might therefore represent novel pathway interactions recovered by the Gaussian graphical model.
Negative values play a particular role in the interpretation of partial correlations coefficients. On the one hand, they obviously occur whenever regular negative correlations are involved. Mechanisms giving rise to negative correlations are, for example, coparticipation in the same reaction (cf. Figure 1E), mass conservation relations [6] or opposing regulatory effects. It is to be noted, however, that negative correlations are rare in our specific metabolomics data set (cf. Figure 2C). On the other hand, due to the mathematical properties of partial correlation coefficients negative partial correlation coefficients occur whenever two metabolites A and B have a strong correlation with a third metabolite C, but do not share a high correlation value with each other. Two examples from our data set are shown in Figure 3D. First, SM C18:0 is negatively partially correlated with SM C16:1, and both of these in turn are highly positively partially correlated with SM C18:1. The fatty acids C16:1 and C18:0 have no direct connection in the pathway, causing the strong negative partial correlation value. A similar situation can be found for three diacylPCs: PC aa C34:2 and PC aa C36:1 show a high partial correlation with PC aa C36:2, but a negative partial correlation with each other. Again, there is no possible direct reaction from a C34:2 lipid species to a C36:1 species. Not all metabolite triads in the network show such a onenegative/twopositive motif. But if present, they provide another step in the reconstruction of metabolic pathways (beyond the direct neighborhood of each metabolite) by detecting metabolites which are exactly two steps apart.
Partial correlation coefficients discriminate between directly and indirectly connected metabolites in a literaturecurated fatty acid pathway model
The analyses from the previous section strengthened our conception that a GGM inferred from blood serum metabolomics data represents true metabolite associations. To systematically assess how GGM edges and pathway proximity between our lipid metabolites are related, we generated a literaturebased model of fatty acid biosynthesis (Figure 4A). This model includes reactions from the public databases BiGG (H. sapiens Recon 1) [7], the Edinburgh Human Metabolic Network [31] and KEGG PATHWAY [29]. We then mapped the partial correlation coefficients from the KORA data set onto the minimal number of reaction steps between each pair of metabolites (pathway distance). Since our metabolite panel contains fattyacid based lipids, we project the respective lipid compositions onto the fatty acid biosynthesis pathway (Figure 4BD). For the analysis of acylcarnitines we implemented a model of the βoxidation pathway, consisting of a linear chain of C2 degradation steps (C10→C8→C6 etc.).
Figure 4. Fatty acid biosynthesis model and pathway distance calculation method. A: De novo synthesis of fatty acids with initial SCDdependent desaturations (left), and the ω3 and ω6 polyunsaturated fatty acid pathways (middle and right). Note that we omitted the specific positions of each doublebond since the massspectrometry technique in our study does not resolve positional information. B: Exemplary distance calculation on two lysoPCs. We projected lipid side chain compositions onto the respective fattyacid biosynthesis reactions. Reaction reversibility is not taken into account in our calculation, i.e. distances are always symmetric. C: If no known pathway connection between two fatty acids exists, we assign a formal distance of infinity. D: For phospholipids that contain two fatty acid residues we need to take into account all combinatorial variants. We here show three variants for the connection between PC aa C38:4 and PC aa C38:5. In these examples, PC aa C38:4 could either consist of C18:0+C20:4 or C16:0+C22:4, while PC aa C38:5 could be C18:0+C20:5 or C16:0+C22:5. The shortest possible distance, one in this case, will be used for further calculations.
We observe a strong tendency towards significantly positive partial correlations for a pathway distance of one, i.e. directly connected metabolite pairs, for all five metabolite classes (Figure 5A). In total, 86 out of 130 partial correlations (66%) for a pathway distance of one are significantly positive. For instance, for the lysoPC class (Figure 5A) nearly all partial correlation coefficients for a pathway distance of one are above significance level, whereas most values for a distance of two or larger remain insignificant. Some outliers from this observation, however, require closer inspection: First, for some metabolite classes we observe negative partial correlation values for metabolite pairs that are exactly two steps apart in the metabolic pathway: 10 of 73 partial correlations in the diacylPC class and 2 of 2 partial correlations in the sphingomyelin class are significantly negative for a distance of two. These negative values are effects of the coregulated metabolite triads described previously in this text. Second, we find 91 of 932 (~9:8%) unconnected metabolite pairs (pathway distance = ∞) with a partial correlation above significance level. These pairs represent potentially novel pathway predictions, missing interactions in the model or effects upstream of the metabolic network like enzyme coregulation.
Figure 5. Systematic evaluation of partial correlation coefficients versus pathway distances. Dashed lines in A and B indicate a significance level of 0.01 with Bonferroni correction. A: Pathway distances from our consensus model against partial correlation coefficients for the five lipidbased metabolite classes in our data set. We observe an enrichment of significant partial correlations for a pathway distance of one, which rapidly drops for an increasing number of pathway steps. B: Comparison of partial correlation coefficients and Pearson correlation coefficients. Pearson correlation coefficients are generally high, independent of the actual pathway distance, indicating for systemic coregulation effects throughout the lipid metabolism. C: Wilcoxon rank sum test pvalues between the partial correlation distributions of directly and indirectly connected pairs, and sensitivity/specificity/F_{1 }values measuring the discriminatory power to distinguish direct from indirect pairs.
A direct comparison of both partial and Pearson correlation coefficients for the diacylphosphatidylcholine class is shown in Figure 5B. As described earlier in this manuscript, we observe a general overabundance of significant Pearson correlations independent of the actual pathway distance. Even for the metabolites without a known pathway connection, 1394 of a total of 1569 Pearson correlations are significant (88.85%, over all classes), in contrast to 131 out of 1569 for the partial correlations (8.35%).
The significantly different correlation value distributions between directly and indirectly linked metabolites (Figure 5A+B) barely provide a good quantification of the actual discrimination accuracy of this feature. Therefore we assessed the discriminative power of partial correlations to tell apart direct from indirect interactions by means of sensitivity and specificity. The sensitivity evaluates which fraction of directly connected metabolites in the pathway are recovered by significant GGM edges, whereas the specificity states how many of the significant edges actually represent a direct connection. A commonly used trade off measure between sensitivity and specificity is the F_{1 }score, which is defined as the harmonic mean of both quantities [32] (see Methods). Figure 5C lists sensitivity, specific city and F_{1 }for all 5 metabolite classes along with an evaluation of partial correlation distribution differences between directly and indirectly linked metabolites (determined by Wilcoxon's ranksum test). F_{1 }values over 0.75 and significant pvalues for the ranksum test indicate a strong discrimination effect of partial correlation coefficients concerning direct vs. indirect pathway interactions. Possible reasons for nonperfect sensitivity and specific city values will be discussed in detail at the end of this text.
Loworder partial correlations
The data set from our present study contained enough samples to calculate fullorder partial correlations, that is to calculate pairwise correlations conditioned against all other n2 metabolites. However, previous studies demonstrated that loworder partial correlation approaches can already be sufficient to elucidate direct interactions [12,16]. In order to assess how these measures perform in comparison to the fullorder GGM, we calculated first, second and third order partial correlations using the approach developed by [12] for both computersimulated networks and the metabolomics data (Additional file 5). The toy systems reveal clear cases where loworder approaches fail, for instance in the diamond motif displayed in Figure 1D. Surprisingly, however, especially firstorder partial correlations worked remarkably well in discriminating direct from indirect interactions in the real data (F_{1 }values close to those displayed in Figure 5C). This result provides two valuable pieces of information. First, loworder partial correlation approaches, which require much less samples to obtain stable estimates, appear to be a suitable alternative to GGMs for the metabolite panel used in this study. Second, the high relative scoring of firstorder partial correlations provides insights into the correlation structures in the data set. In particular, this result indicates that the underlying metabolic pathways are primarily composed of acyclic, linear chains, which fits well to the fatty acid pathways dominating our measured lipid species.
Additional file 5. comparison with loworder partial correlation approaches.
Format: PDF Size: 153KB Download file
This file can be viewed with: Adobe Acrobat Reader
Conclusions
In this paper we addressed the reconstruction of metabolic pathway reactions from highthroughput targeted metabolomics measurements. Previous reconstruction approaches employed pairwise association measures, primarily standard Pearson correlation coefficients, to infer network topology information from metabolite profiles [5,6,8,33]. We here demonstrated the usefulness of Gaussian graphical models and their ability to distinguish direct from indirect associations by estimating the conditional dependence between variables. GGMs are based on partial correlation coefficients, that is the Pearson correlation between two metabolites corrected for the correlations with all other metabolites.
From computer simulations of metabolic reaction networks we deduced a set important aspects to be considered when interpreting partial correlation coefficients in reaction systems: (a) Metabolites in equilibrium due to reversible reactions can readily be recovered, whereas irreversible reactions pose a substantial problem to associationbased reconstruction attempts (in concordance with [6]). (b) Input and output reactions for intermediate metabolites, however, improve the reconstruction accuracy. Such exchange reactions are likely to be present for most naturally occurring metabolites due to highly interconnected metabolic pathways. (c) With an increasing amount of fluctuations on the input reaction, the partial correlation difference between direct and indirect interactions increases for certain network topologies (e.g. for the irreversible linear metabolite chains). This indicates that a high heterogeneity of metabolic states in a population data set like the KORA cohort might be beneficial rather than problematic for our approach. (d) Metabolite connectivity in cofactordriven networks can be accurately reconstructed. The presence of exchange reactions for cofactors, as they are likely to be present in real systems, has substantial impact on the reconstruction quality. The connectivity of the cofactors themselves, however, remains spurious. (e) Saturation effects in enzymecatalyzed reactions do not pose a problem for the reconstruction process. However, inhibitory influences in metabolic modules that include exchange reactions might decouple certain metabolites and lead to false negative results. (f) Nonlinear rate laws and antagonistic, correlationgenerating mechanisms might impair reconstruction quality.
In the next step we inferred both a GGM and a regular correlation network from a largescale metabolomics data set with 1020 strictly standardized samples from overnight fasting individuals measured by stateofthe art metabolomics technologies [19]. We investigated the influence of the 15 genomewidesignificant SNPs from this study on our GGM and demonstrated that genetic variation in the general population is neglectable for partial correlation calculation. We found that the GGM displays a much sparser structure than regular correlation networks. Only around 400 partial correlation values were above significance level (~3.6%), whereas half of all Pearson correlation values were significant after Bonferroni correction. This depicted the nature of partial correlation coefficients to neglect indirect associations between distantly related metabolites. We detected a strongly modular structure in the GGM with respect to the different metabolite classes, except for the four types of phospholipids which appear slightly interwoven. This provides a unique picture of the separation of metabolic pathways (synthesis, degradation and amino acid metabolism), but also the interaction between different lipid classes dependent on a single intracellular fatty acid pool. Finally, GGMs were stable with respect to both choice and number of samples in the data set. Even a smaller data set with only a few hundred samples would have been sufficient to achieve the results from this study. The estimation of GGMs for data sets with less samples than metabolites is possible [26], but notable deviations from the true partial correlation coefficient shave to be expected.
Manual investigation of highscoring substructures in the GGM revealed groups of metabolites that could be directly attributed to reaction steps from the human fatty acid biosynthesis and degradation pathways. We detected effects of ELOVLmediated elongations and FADSmediated desaturations of fatty acids as well as signatures of the catabolic βoxidation pathway. For instance, our method successfully recovered a direct elongation from C16:1 to C18:1, which has been experimentally shown by [27] but is not present in the public reaction databases. Furthermore, we identified highly negative partial correlations as an indication for a pathway distance of two, serving as a further hint in the reconstruction of metabolic network topology. In order to systematically evaluate whether high partial correlations represent direct interactions, we generated a consensus model of fatty acid biosynthesis reactions from three publically available reaction databases. By mapping partial correlation coefficients to the number of reaction steps between two metabolites we observed a statistically significant enrichment of high values for a pathway distance of one. We calculated a high accuracy for partial correlations to discriminate between directly and indirectly associated metabolites, as measured by sensitivity, specificity and the F_{1 }measure. Interestingly, we could show that the discrimination quality of loworder partial correlations [12], especially the firstorder variants, is close to the fullorder GGM. Even though this might be a feature specific to the metabolite panel used in this study, loworder partial correlations represent a suitable alternative especially for studies with only few samples. If more samples than variables are available, however, we recommend GGMs as an unbiased approach conditioning against as many parameters as possible.
Taken together, our results demonstrate that GGMs inferred from metabolomics measurements in blood plasma samples reveal strong signatures of intracellular and even innermitochondrial processes. Previous studies on blood plasma samples detected similar relationships with cellular processes based on genetic associations [19] and case/control drug trials [34]. In this work we could now show that metabolite profiles alone are sufficient to capture the dynamics of metabolic pathways.
However, GGMs can never provide a perfect reconstruction of the underlying system. There are several factors that lead to the absence of high partial correlations between interacting metabolites, that is false negative edges in the GGM: (a) Counterantagonistic correlationgenerating processes and bimolecular reactions (see above) might lead to the elimination of pairwise association; cf. [6]. (b) The respective enzyme might not be active in the current metabolic state, or its effects on the respective metabolite pools are neglectable. (c) Contrary to our general finding that even blood plasma metabolites carry strong signatures of metabolic pathways, the signal might be diminished for certain types of metabolites. Furthermore, the actual origins of blood plasma metabolites, e.g. in terms of measured cell types or causal tissue activity, still remain to be unraveled. The abovementioned mechanisms are possible explanations for the nonperfect sensitivity values observed in Figure 5C. False positive GGM edges, on the other hand, provide interesting new metabolic pathway hypothesis. The presence of strong partial correlations in the absence of known metabolic connections could point out missing pathway information or regulatory effects not captured in a simple stoichiometric representation of the pathway.
Conclusively, this study presented Gaussian graphical models as a valuable tool for the recovery of biochemical reactions from highthroughput targeted metabolomics data. The present work could be extended by comparing high partial correlation coefficients with enzyme activity or expression data, or by the experimental validation of promising interaction candidates. We suggest using GGMs as a standard tool of investigation in future metabolomics studies, utilizing the upcoming wealth of metabolic profiling data to form a more comprehensive picture of cellular metabolism.
Methods
In silico simulation of artificial reaction networks
Let x = (x_{1},..., x_{r}) be a vector of metabolite concentrations and S ∈ ℤ^{m×r }the stoichiometry matrix of a dynamical system with m metabolites and r reactions. Each column in S represents the compound stoichiometry of a single reaction, with negative values for the educts of a reaction and positive values for its products (cf. [35]). Furthermore, we define an educt stoichiometry matrix S^{e}, which only contains the negative values from S. The reaction rate laws v can be written as v(x, k) = diag(k)c(x), where k := (k_{1},...,k_{r}) represents a vector of elementary rate constants and contains the products of substrate concentrations according to the law of mass action[36]. For example, for the reaction x_{1 }+ x_{2 }→ x_{3 }we obtain c = x_{1}x_{2}, and 2x_{1 }+ 3x_{2 }→ 2x_{3 }yields c = x_{1}^{2}x_{2}^{3}. For enzymecatalyzed reactions i, the corresponding entries in v are formulated using reversible MichaelisMententype kinetics [37,38] instead of the massaction term above:
Where and are the product and substrate formation constants, respectively, and represent the Michaelis constants for substrate and product, [S] represents the substrate concentration and [P] represents the product concentration. Note that we omitted reactionspecific parameter indices for simplicity here. Allosteric regulation was modeled using a mixed inhibition mechanism, which extends the rate law from equation (1) as follows:
with [I] being the inhibitor concentration, K_{i }the binding rate of the inhibitor to the enzyme and K_{ii }the binding rate of the inhibitor to the substrateenzyme (or productenzyme) complex. In a simple mixed (noncompetitive) inhibition scenario, we assume K_{i }= K_{ii}.
The ordinary differential equations describing the temporal evolution of the system are now given as
To introduce variability each parameter is subject to fluctuations according to a lognormal distribution with mean 1 and changing variances: . Finally, for fixed S and k, Pearson and partial correlations (see below) are calculated by drawing the vector k multiple times from the parameter distribution, calculating the corresponding metabolite steady state concentrations and logarithmizing the obtained values. If the system contains only zerothorder and firstorder reactions (i.e. input reactions and reactions with only one substrate), the steady state concentrations for a given k can be readily computed by equating (2) to zero and solving for c using linear algebra techniques. On the other hand, if higher order reactions are present, the ODEs are integrated numerically and simulated until equilibrium to get corresponding steady states. For this purpose, a variableorder solver for stiff differential equations (ode15s) from MATLAB was used [39]. The presence of a unique, single positive steady state was shown for each network individually using the ERNEST toolbox [21], or by empirical evaluation (parameter and initial value sampling). For a detailed analysis we refer the reader to Additional file 1, section 1.
Computation of correlation network and Gaussian graphical model
Let X = (x_{kl}) be the ℝ^{n×m }matrix of logarithmized metabolite concentrations (either measured data samples or computersimulated steady states), where n is the number of samples, and m again represents the number of metabolites. Then the standard Pearson productmoment correlation coefficients P = (ρ_{ij}) between metabolites are calculated as
where represents the mean value of metabolite i. Since we use a Gaussian graphical model, the conditional distributions are also Gaussian. Their width and the corresponding partial correlation coefficients can be calculated as
A partial correlation value ζ_{ij }denotes the pairwise correlation of metabolites i and j corrected for the effects of all remaining metabolites. Since our study design contains more samples than measured variables, the correlation matrix has full rank and its inverse can be straightforwardly determined. First,second, and thirdorder partial correlations were calculated using the software published in [12]. To assess the significance of partial correlations, pvalues p(ζ_{ij}) were calculated using Fisher's ztransform [40]:
where ϕ stands for the cumulative distribution function of the standard normal distribution. In order to account for multiple testing, Bonferroni correction was applied to obtain an estimate of the significance level. Note that Bonferroni correction is the most conservative approach for multiple testing; it assumes independence of all tested values, which is certainly not the case for partial correlation coefficients. Based on a nominal significance level of α = 0.01, we retrieve an adjusted level of after Bonferroni correction. Solving equation (3) for ζ_{ij }yields a minimum absolute partial correlation coefficient of 0.1619 for the given significance level. That is, all partial correlations smaller than 0.1619 or larger than 0.1619 are considered significant.
Bootstrapping was performed by randomly drawing 1020 samples with replacement from the original data set. For the second stability analysis, the investigation of different data set sizes, the respective number of samples was randomly drawn from the original data set. The whole procedure was repeated 100 times to get a stable estimate of the deviation.
Network modularity calculation
We define the adjacency matrix ξ_{ij }of a new unweighted, undirected graph induced by all significantly positive partial correlations in ζ_{ij}:
where represents the significance level after multiple testing correction. Now let (V_{1},...,V_{6}) be the partitioning of the metabolites into the six metabolite classes: acylcarnitines, diacylPCs, lysoPCs, acylalkylPCs, sphingomyelins and amino acids (the hexose is left out as only a single metabolite belongs to that class). We calculated the relative outdegree R_{ij }∈ ℝ^{6×6 }from each class to the other classes, (i.e. the proportion of its edges each class shares with the other classes) as:
where represents the total number of edges between V' and V", and V = UV_{i }contains all metabolites in the network. The total network modularity Q of the network can be quantified according to [41] as:
Intuitively, this measure compares the withinclass edges with the edges to the rest of the network. The more edges there are within each class in comparison to the other classes, the higher Q will be. Note that equation (4) can be applied to both weighted and unweighted graphs. To assess the significance of the observed value, we performed graph randomization by edge rewiring [42,43] and subsequent calculation of Q. During the rewiring process we randomly pick two edges from the network and exchange the target nodes of each edge. In order to achieve sufficient randomization, this operation is repeated 5 · e times, where e represents the number of edges in the graph. To perform edge reshuffling on weighted graphs, we decided on a neighborpreserving variant as described in [44].
Study cohort and metabolite panel
KORA (Kooperative Gesundheitsforschung in der Region Augsburg) is a research platform in southern Germany with a primary focus on cardiovascular diseases, Diabetes mellitus type 2, and genetic epidemiology [18]. Fasting serum concentrations from n = 1020 individuals in the KORA F4 were determined by electrospray ionization tandem mass spectrometry (ESIMS/MS) using the Biocrates AbsoluteIDQ™targeted metabolomics kit technology. These samples represent a subset of the data set previously evaluated in a genomewide association study in [19].
A total of m = 151 metabolites were measured in the experiments: 14 amino acids including 13 proteinogenic amino acids and ornithine; hexose (sugars with 6 carbon atoms, e.g. glucose and fructose); 23 acylcarnitines [Cx:ycarn] (with x carbon atoms and y double bonds), 7 hydroxyacylcarnitines [Cx:yOHcarn], 6 dicarboxyacylcarnitines [Cx:yDCcarn], and 2 methylated dicarboxyacylcarnitines variants [Cx:yMDCcarn]; 9 sphingomyelins [SM Cx:y] and 5 hydroxysphingomyelins [SM Cx:yOH]; and 87 phosphatidylcholines (PC). These glycerophospholipids are further subdivided with respect to the presence of ester and ether bonds of fatty acid residues with the glycerol moiety. The set contains 36 diacylPCs with two esterified fatty acid residues [PC aa Cx:y], 38 acylalkylPCs with one etherbond at the sn2 position [PC ae Cx:y] and 13 lysoPCs with only one esterified fatty acid residue at the sn1 or sn2 position [lysoPC a Cx:y]. Our mass spectometry technology cannot distinguish between the side chains of diacylphospholipids. The measured compounds are thus associated with the sum of carbon atoms and double bounds for both fatty acid residues. To ensure lognormality, we compared QQplots against normal distributions [45] for both nonlogarithmized and logarithmized metabolite concentrations. All distributions were closer to lognormality than to regular normality (not shown), so we logarithmized the metabolite concentrations for the following analysis steps.
Sensitivity and specificity
In order to objectively evaluate the discrimination between directly and indirectly connected metabolites, we calculated sensitivity and specificity as:
with TP true positives, FP false positives, TN true negatives, FN false negatives [46].
A metabolite pair is considered true positive if it exhibits a partial correlation above the threshold and has a direct pathway connection; a false positive represents a metabolite pair also above the threshold but with no direct pathway connection; a false negative pair lies below the threshold but does have a direct pathway connection; and finally a true negative pair lies below the threshold and also has no direct pathway connection. The F_{1 }score was calculated as the harmonic mean of both quantities:
Pathway model
Pathway reactions in the human fatty acid metabolism were drawn from three independent databases: (1) H. sapiens Recon 1 from the BiGG databases (confidence score of at least 4) [7], (2) the Edinburgh Human Metabolic Network reconstruction [31] and (3) the KEGG PATHWAY database [29] as of July 2010. A complete list of all curated reactions and the corresponding database identifiers can be found in Additional file 6. The reaction set was subdivided into two groups: (1) Fatty acid biosynthesis reactions which apply to the metabolite classes lysoPC, diacylPC, acylalkylPC and sphingomyelins. (2) βoxidation reactions representing fatty acid degradation to model reactions between the acylcarnitines. The βoxidation model consists of a linear chain of C2 degradation steps (C10→C8→C6 etc.).
Additional file 6. Literaturecurated pathway model of human fatty acid biosynthesis and degradation.
Format: XLS Size: 14KB Download file
This file can be viewed with: Microsoft Excel Viewer
Fatty acid residues with identical masses, that cannot be distinguished by our massspectrometry technology, are merged into a single metabolite in the reaction set. For instance, the polyunsaturated fatty acids C20:4Δ8,11,14,17 from the omega3 pathway and C20:4 Δ5,8,11,14 from the omega6 pathway have identical numbers of carbon atoms and double bonds and are thus merged into a single metabolite C20:4.
Authors' contributions
JK, KS and FJT conceived this data analysis project. TI and JA performed the sample preparation and data acquirement. JK performed the analysis and wrote the primary manuscript. All authors approved the final manuscript.
Acknowledgements
The authors thank the anonymous reviewers for valuable comments and suggestions to improve the original manuscript. This research was partially supported by the Initiative and Networking Fund of the Helmholtz Association within the Helmholtz Alliance on Systems Biology (project CoReNe), by a grant from the German Federal Ministry of Education and Research (BMBF) to the German Center Diabetes Research (DZD e.V.), and by the BMBFfunded "Medizinische Systembiologie  MedSys" initiative (subproject SysMBo, project label 0315494A). Jan Krumsiek is supported by a PhD student fellowship from the "Studienstiftung des Deutschen Volkes". Thanks to Harold Gutch for critically proofreading and correcting this manuscript.
References

Tweeddale H, NotleyMcRobb L, Ferenci T: Effect of slow growth on metabolism of Escherichia coli, as revealed by global metabolite pool ("metabolome") analysis.
J Bacteriol 1998, 180(19):51095116. PubMed Abstract  PubMed Central Full Text

Wenk MR: The emerging field of lipidomics. [http://dx.doi.org/10.1038/nrd1776] webcite
Nat Rev Drug Discov 2005, 4(7):594610. PubMed Abstract  Publisher Full Text

Griffin JL: The Cinderella story of metabolic profiling: does metabolomics get to go to the functional genomics ball? [http://dx.doi.org/10.1098/rstb.2005.1734] webcite
Philos Trans R Soc Lond B Biol Sci 2006, 361(1465):147161. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Fiehn O: Metabolomicsthe link between genotypes and phenotypes.
Plant Mol Biol 2002, 48(12):155171. PubMed Abstract  Publisher Full Text

Steuer R, Kurths J, Fiehn O, Weckwerth W: Observing and interpreting correlations in metabolomic networks.
Bioinformatics 2003, 19(8):10191026. PubMed Abstract  Publisher Full Text

Camacho D, de la Fuente A, Mendes P: The origin of correlations in metabolomics data. [http://dx.doi.org/10.1007/s1130600511073] webcite
Metabolomics 2005, 1:5363. Publisher Full Text

Duarte NC, Becker SA, Jamshidi N, Thiele I, Mo ML, Vo TD, Srivas R, Palsson BO: Global reconstruction of the human metabolic network based on genomic and bibliomic data. [http://dx.doi.org/10.1073/pnas.0610772104] webcite
Proc Natl Acad Sci USA 2007, 104(6):17771782. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Arkin A, Shen P, Ross J: A Test Case of Correlation Metric Construction of a Reaction Pathway from Measurements. [http://www.sciencemag.org/cgi/content/abstract/277/5330/1275] webcite
Science 1997, 277(5330):12751279. Publisher Full Text

Vance W, Arkin A, Ross J: Determination of causal connectivities of species in reaction networks. [http://dx.doi.org/10.1073/pnas.022049699] webcite
Proc Natl Acad Sci USA 2002, 99(9):58165821. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Schäfer J, Strimmer K: Learning LargeScale Graphical Gaussian Models from Genomic Data. [http://link.aip.org/link/?APC/776/263/1] webcite

Lee JM, Lee JM, Gianchandani EP, Eddy JA, Papin JA: Dynamic analysis of integrated signaling, metabolic, and regulatory networks. [http://dx.doi.org/10.1371/journal.pcbi.1000086] webcite
PLoS Comput Biol 2008, 4(5):e1000086. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

de la Fuente A, Bing N, Hoeschele I, Mendes P: Discovery of meaningful associations in genomic data using partial correlation coefficients. [http://dx.doi.org/10.1093/bioinformatics/bth445] webcite
Bioinformatics 2004, 20(18):35653574. PubMed Abstract  Publisher Full Text

Magwene PM, Kim J: Estimating genomic coexpression networks using firstorder conditional independence. [http://dx.doi.org/10.1186/gb2004512r100] webcite
Genome Biol 2004, 5(12):R100. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Schäfer J, Strimmer K: An empirical Bayes approach to inferring largescale gene association networks. [http://dx.doi.org/10.1093/bioinformatics/bti062] webcite
Bioinformatics 2005, 21(6):754764. PubMed Abstract  Publisher Full Text

Wille A, Zimmermann P, Vranová E, Fürholz A, Laule O, Bleuler S, Hennig L, Prelic A, von Rohr P, Thiele L, Zitzler E, Gruissem W, Bühlmann P: Sparse graphical Gaussian modeling of the isoprenoid gene network in Arabidopsis thaliana. [http://dx.doi.org/10.1186/gb2004511r92] webcite
Genome Biol 2004, 5(11):R92. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Freudenberg J, Wang M, Yang Y, Li W: Partial correlation analysis indicates causal relationships between GCcontent, exon density and recombination rate in the human genome. [http://dx.doi.org/10.1186/1471210510S1S66] webcite
BMC Bioinformatics 2009, 10(Suppl 1):S66. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Keurentjes JJB, Fu J, de Vos CHR, Lommen A, Hall RD, Bino RJ, van der Plas LHW, Jansen RC, Vreugdenhil D, Koornneef M: The genetics of plant metabolism. [http://dx.doi.org/10.1038/ng1815] webcite
Nat Genet 2006, 38(7):842849. PubMed Abstract  Publisher Full Text

Holle R, Happich M, Löwel H, Wichmann HE, Group MONICAORAS: KORAa research platform for population based health research.
Gesundheitswesen 2005, 67(Suppl 1):S19S25. PubMed Abstract  Publisher Full Text

Illig T, Gieger C, Zhai G, RömischMargl W, WangSattler R, Prehn C, Altmaier E, Kastenmüller G, Kato BS, Mewes HW, Meitinger T, de Angelis MH, Kronenberg F, Soranzo N, Wichmann HE, Spector TD, Adamski J, Suhre K: A genomewide perspective of genetic variation in human metabolism. [http://dx.doi.org/10.1038/ng.507] webcite
Nat Genet 2010, 42(2):137141. PubMed Abstract  Publisher Full Text

Liebermeister W, Klipp E: Bringing metabolic networks to life: integration of kinetic, metabolic, and proteomic data. [http://dx.doi.org/10.1186/17424682342] webcite
Theor Biol Med Model 2006, 3:42. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Soranzo N, Altafini C: ERNEST: a toolbox for chemical reaction network theory. [http://dx.doi.org/10.1093/bioinformatics/btp513] webcite
Bioinformatics 2009, 25(21):28532854. PubMed Abstract  Publisher Full Text

Winicov I, Pizer LI: The mechanism of end product inhibition of serine biosynthesis. IV. Subunit structure of phosphoglycerate dehydrogenase and steady state kinetic studies of phosphoglycerate oxidation.
J Biol Chem 1974, 249(5):13481355. PubMed Abstract  Publisher Full Text

Berg JM, Tymoczko JL, Stryer L: [http://www.worldcat.org/isbn/0716787245] webcite

Hynne F, Danø S, Sørensen PG: Fullscale model of glycolysis in Saccharomyces cerevisiae. [http://linkinghub.elsevier.com/retrieve/pii/S03014622(01)002290] webcite
Biophys Chem 2001, 94(12):121163. PubMed Abstract  Publisher Full Text

Newman MEJ, Girvan M: Finding and evaluating community structure in networks.
Phys Rev E 2004, 69(2):026113. Publisher Full Text

Schäfer J, Strimmer K: A shrinkage approach to largescale covariance matrix estimation and implications for functional genomics. [http://dx.doi.org/10.2202/15446115.1175] webcite
Statistical applications in genetics and molecular biology 2005., 4

Matsuzaka T, Shimano H, Yahagi N, Kato T, Atsumi A, Yamamoto T, Inoue N, Ishikawa M, Okada S, Ishigaki N, Iwasaki H, Iwasaki Y, Karasawa T, Kumadaki S, Matsui T, Sekiya M, Ohashi K, Hasty AH, Nakagawa Y, Takahashi A, Suzuki H, Yatoh S, Sone H, Toyoshima H, ichi Osuga J, Yamada N: Crucial role of a longchain fatty acid elongase, Elovl6, in obesityinduced insulin resistance. [http://dx.doi.org/10.1038/nm1662] webcite
Nat Med 2007, 13(10):11931202. PubMed Abstract  Publisher Full Text

Eaton S, Bartlett K, Pourfarzam M: Mammalian mitochondrial betaoxidation.
Biochem J 1996, 320(Pt 2):345357. PubMed Abstract  PubMed Central Full Text

Kanehisa M, Goto S: KEGG: kyoto encyclopedia of genes and genomes.
Nucleic Acids Res 2000, 28:2730. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Spector A: Essentiality of fatty acids. [http://dx.doi.org/10.1007/BF02562220] webcite
Lipids 1999, 34(0):S1S3. PubMed Abstract  Publisher Full Text

Ma H, Sorokin A, Mazein A, Selkov A, Selkov E, Demin O, Goryanin I: The Edinburgh human metabolic network reconstruction and its functional analysis. [http://dx.doi.org/10.1038/msb4100177] webcite
Mol Syst Biol 2007, 3:135. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Van Rijsbergen CJ: [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.36.2325] webcite
Information Retrieval. 2nd edition. Dept. of Computer Science, University of Glasgow; 1979.

Steuer R: Review: on the analysis and interpretation of correlations in metabolomic data. [http://dx.doi.org/10.1093/bib/bbl009] webcite
Brief Bioinform 2006, 7(2):151158. PubMed Abstract  Publisher Full Text

Altmaier E, Ramsay SL, Graber A, Mewes HW, Weinberger KM, Suhre K: Bioinformatics analysis of targeted metabolomicsuncovering old and new tales of diabetic mice under medication. [http://dx.doi.org/10.1210/en.20071747] webcite
Endocrinology 2008, 149(7):34783489. PubMed Abstract  Publisher Full Text

Palsson BO: [http://www.worldcat.org/isbn/0521859034] webcite
Systems Biology: Properties of Reconstructed Networks. 1st edition. Cambridge University Press; 2006.

Famili I, Mahadevan R, Palsson BO: kCone analysis: determining all candidate values for kinetic parameters on a network scale. [http://dx.doi.org/10.1529/biophysj.104.050385] webcite
Biophys J 2005, 88(3):16161625. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Dräger A, Kronfeld M, Ziller MJ, Supper J, Planatscher H, Magnus JB, Oldiges M, Kohlbacher O, Zell A: Modeling metabolic networks in C. glutamicum: a comparison of rate laws in combination with various parameter optimization strategies. [http://dx.doi.org/10.1186/1752050935] webcite
BMC Syst Biol 2009, 3:5. PubMed Abstract  PubMed Central Full Text

Shampine LF, Reichelt MW: The MATLAB ODE Suite.
SIAM Journal on Scientific Computing 1997, 18:122. Publisher Full Text

Fisher RA: The Distribution of the Partial Correlation Coefficient. [http://hdl.handle.net/2440/15182] webcite

White S, Smyth P: A Spectral Clustering Approach To Finding Communities in Graphs. [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.59.8978] webcite

Maslov S, Sneppen K: Specificity and stability in topology of protein networks. [http://dx.doi.org/10.1126/science.1065103] webcite
Science 2002, 296(5569):910913. PubMed Abstract  Publisher Full Text

Wong P, Althammer S, Hildebrand A, Kirschner A, Pagel P, Geissler B, Smialowski P, Blöchl F, Oesterheld M, Schmidt T, Strack N, Theis FJ, Ruepp A, Frishman D: An evolutionary and structural characterization of mammalian protein complex organization. [http://dx.doi.org/10.1186/147121649629] webcite
BMC Genomics 2008, 9:629. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Hartsperger ML, Blöchl F, Stümpflen V, Theis FJ: Structuring heterogeneous biological information using fuzzy clustering of kpartite graphs. [http://dx.doi.org/10.1186/1471210511522] webcite
BMC Bioinformatics 2010, 11:522. PubMed Abstract  BioMed Central Full Text

Altman DG, Bland JM: Diagnostic tests. 1: Sensitivity and specificity.
BMJ 1994, 308(6943):1552. PubMed Abstract  PubMed Central Full Text