<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1753-6561-5-S2-S9</ui>
   <ji>1753-6561</ji>
   <fm>
      <dochead>Proceedings</dochead>
      <bibl>
         <title>
            <p>MetaPath: identifying differentially abundant metabolic pathways in metagenomic datasets</p>
         </title>
         <aug>
            <au id="A1"><snm>Liu</snm><fnm>Bo</fnm><insr iid="I1"/><insr iid="I2"/><email>boliu@umiacs.umd.edu</email></au>
            <au ca="yes" id="A2"><snm>Pop</snm><fnm>Mihai</fnm><insr iid="I1"/><insr iid="I2"/><email>mpop@umiacs.umd.edu</email></au>
         </aug>
         <insg>
            <ins id="I1"><p>Center for Bioinformatics and Computational Biology, Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, USA</p></ins>
            <ins id="I2"><p>Department of Computer Science, University of Maryland-College Park, College Park, MD 20742, USA</p></ins>
         </insg>
         <source>BMC Proceedings</source>
         
         
         <supplement><title><p>Selected Proceedings of the 6th International Symposium on Bioinformatics Research and Applications (ISBRA'10)</p></title><editor>Ion Mandoiu, J Peter Gogarten and Alex Zelikovsky</editor><note>Proceedings</note></supplement><conference><title><p>6th International Symposium on Bioinformatics Research and Applications (ISBRA'10)</p></title><location>Storrs, CT, USA</location><date-range>23-26 May 2010</date-range><url>http://www.cs.gsu.edu/isbra10/</url></conference><issn>1753-6561</issn>
         <pubdate>2011</pubdate>
         <volume>5</volume>
         <issue>Suppl 2</issue>
         <fpage>S9</fpage>
         <url>http://www.biomedcentral.com/1753-6561/5/S2/S9</url>
         <xrefbib><pubid idtype="doi">10.1186/1753-6561-5-S2-S9</pubid></xrefbib>
      </bibl>
      <history><pub><date><day>28</day><month>4</month><year>2011</year></date></pub></history>
      <cpyrt><year>2011</year><collab>Liu and Pop; licensee BioMed Central Ltd.</collab><note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
                <p>Enabled by rapid advances in sequencing technology, metagenomic studies aim to characterize entire microbial communities, bypassing the need to culture individual bacterial members. One major goal of metagenomic studies is to identify specific functional adaptations of microbial communities to their habitats. The functional profile and the abundances for a sample can be estimated by mapping metagenomic sequences to the global metabolic network, which consists of thousands of molecular reactions. Here we describe a powerful analytical method (MetaPath) that can identify differentially abundant pathways in metagenomic datasets, relying on a combination of metagenomic sequence data and prior metabolic pathway knowledge.</p>
            </sec>
            <sec>
               <st>
                  <p>Methods</p>
               </st>
                <p>First, we introduce a scoring function for an arbitrary subnetwork and find the max-weight subnetwork in the global network using a greedy search algorithm. Then we compute two <it>p</it> values (<it>p<sub>abund</sub></it> and <it>p<sub>struct</sub></it>) using nonparametric approaches to answer two different statistical questions: (1) Is this subnetwork differentially abundant? (2) What is the probability of finding such a high-scoring subnetwork by chance, given the data and the network structure? Finally, significant metabolic subnetworks are identified based on these two <it>p</it> values.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
                <p>To validate our methods, we designed a simulated metabolic pathway dataset and showed that MetaPath outperforms other commonly used approaches. We also demonstrate the power of our methods by analyzing two publicly available metagenomic datasets, and show that the subnetworks identified by MetaPath provide valuable insights into the biological activities of the microbiome.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusions</p>
               </st>
               <p>We have introduced a statistical method for finding significant metabolic subnetworks from metagenomic datasets. Compared with previous methods, results from MetaPath are more robust against noise in the data, and have significantly higher sensitivity and specificity (when tested on simulated datasets). When applied to two publicly available metagenomic datasets, the output of MetaPath is consistent with previous observations and also provides several new insights into the metabolic activity of the gut microbiome. The software is freely available at <url>http://metapath.cbcb.umd.edu</url>.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
          <p>Metagenomics is a new scientific field that involves the analysis of organismal DNA sequences obtained directly from an environmental sample, enabling studies of microorganisms that are not easily cultured in a laboratory <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Metagenomic studies, pioneered in the early 2000s <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>, have recently increased in number and scope due to the emergence of next-generation sequencing technologies. Because it is difficult to assemble entire genomes from a metagenomic dataset <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>, most analyses take a gene-centric view, treating the community as an aggregate and ignoring the exact assignment of genes to individual organisms. In fact, it can be argued that an environment is better characterized by its gene complement than by its taxonomic composition, given that similar biological functions can be performed by microbes of distinct taxonomic origins. Supporting this view is the observation that, while the taxonomic composition of the human gut microbiome varies significantly between people, the functional profile is remarkably stable across samples <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. The functional profile for a sample can be recovered by mapping sequences to gene families <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>, subsystems <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> or metabolic pathways <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. The relative abundance of each functional category can be estimated by counting how many sequences are assigned to it, and this information is the basis for detailed comparisons of the functional potential of different communities.</p>
          <p>In a typical comparative metagenomics experiment, random shotgun sequences are generated from a collection of samples belonging to two groups, for example, obese and lean twins <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, or healthy infants and adults <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. An important biological problem is to find differentially abundant functional signatures (e.g., genes or metabolic pathways) that are selected for by their local environments. Traditional analysis approaches compare the relative abundances of the categories one at a time between different phenotypes, and compute the significance using one of several statistical approaches <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>. When communities are compared at the gene family level, many functional categories are commonly found to be differentially abundant, even after correcting for multiple hypothesis testing <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B7">7</abbr></abbrgrp>, and the interpretation of these data can be daunting. An alternative approach focuses on comparisons of functional subsystems and metabolic pathways <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, which are far fewer in number than gene families. Results at these levels are easier to interpret and can provide stronger evidence of distinct functional capacities than results at the level of individual gene families. Such analyses, however, can be unnecessarily coarse. For example, the use of KEGG pathways as a basis for analysis is complicated by the following issues: (1) the definitions of pathways in KEGG are rigid, and the interactions between these pathways are ignored; (2) the genes in a pathway may not be fully covered by the genes identified in a metagenomic sample; (3) significant differences in the abundance of certain genes may be masked once the abundances of all genes in a pathway are aggregated.</p>
          <p>To address these problems, we introduce a general method (MetaPath) for searching the global metabolic network to find differentially abundant finer-level subnetworks. For the purposes of this paper we define a subnetwork to be a connected set of genes that is statistically enriched or depleted in one group of samples. Underlying our approach is a statistical scoring system that captures the differential abundance of a given subnetwork, combined with a greedy search algorithm for a maximum-weight subgraph, to identify the highest-scoring subnetworks. Unlike previous approaches, MetaPath explicitly searches for significant subnetworks in the global metabolic network (rather than within KEGG-defined pathways), enabling us to detect subnetworks that span predefined &#8220;containers&#8221;. In addition, we developed rigorous statistical methods that take into account the topology of the network when testing the significance of the subnetworks.</p>
          <p>Using simulated datasets, we demonstrate that MetaPath outperforms previously described approaches for comparing biological networks based on abundance data. We show that our findings are more robust to noisy data than the results of single-gene comparisons, and that MetaPath can find finer-level subnetworks than those found by comparing predefined KEGG pathways. We also discuss the biological significance of the results derived from the application of MetaPath to actual metagenomic datasets, demonstrating that the output from MetaPath is easy to interpret and provides valuable biological insights. The software is freely available at <url>http://metapath.cbcb.umd.edu</url>.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Datasets</p>
            </st>
             <p>We tested our methods on two previously published metagenomic datasets, downloaded from the NCBI Trace Archive or Short Read Archive databases: (1) gut microbiomes from obese and lean twins <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>; (2) metagenomes from adult- and infant-type gut microbiomes <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. Each dataset is divided into two populations of distinct phenotypes. The metabolic pathway data were downloaded from the KEGG pathways database <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. The metabolic network is represented as a graph in which nodes are metabolic substrates and edges are molecular reactions (Fig. <figr fid="F1">1</figr>). An edge is bidirectional if the corresponding reaction is reversible and unidirectional otherwise (as specified in the KEGG database). KEGG aggregates multiple reactions related to the same biological process into a &#8220;pathway&#8221; (e.g., the glycolysis pathway). We refer to the network comprising all metabolic pathways in KEGG as the &#8220;global metabolic network&#8221;. Metagenomic sequences are annotated through BLASTX searches against the KEGG GENES database. The abundance of each molecular reaction is estimated as the number of metagenomic sequences mapped to it. Note that more accurate abundance estimates can be obtained by taking into account the length of individual genes <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, and we plan to explore the use of such estimates (and the associated statistics) in future versions of our software.</p>
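             <p>The counting step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not MetaPath's own code: the Hit record and the function names are hypothetical, each read is assumed to be counted once through its highest-bitscore hit, and each hit is assumed to map to a single KEGG reaction. The default cutoffs (E value below 1e-5, bitscore and percent identity above 50) are the ones reported for the obese and lean twins dataset.</p>
```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One BLASTX hit of a metagenomic read (hypothetical record layout)."""
    read_id: str
    sample: str
    reaction: str     # KEGG reaction linked to the hit gene, e.g. "R01676"
    evalue: float
    bitscore: float
    identity: float   # percent identity


def passes(hit: Hit, max_evalue=1e-5, min_bitscore=50.0, min_identity=50.0) -> bool:
    # Defaults mirror the cutoffs quoted for the obese/lean twins data:
    # E value strictly below 1e-5, bitscore above 50, identity above 50%.
    return max_evalue > hit.evalue and hit.bitscore > min_bitscore and hit.identity > min_identity


def abundance_matrix(hits, samples):
    """Count reads per reaction per sample: rows are reactions, columns are samples."""
    best = {}
    for h in hits:
        if not passes(h):
            continue
        key = (h.sample, h.read_id)
        # Assumption: each read is counted once, via its highest-bitscore hit.
        if key not in best or h.bitscore > best[key].bitscore:
            best[key] = h
    col = {s: i for i, s in enumerate(samples)}
    counts = defaultdict(lambda: [0] * len(samples))
    for h in best.values():
        counts[h.reaction][col[h.sample]] += 1
    return dict(counts)
```
             <p>In this sketch, two passing hits of the same read contribute a single count, to the reaction of the higher-scoring hit.</p>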
            <fig id="F1"><title><p>Figure 1</p></title><caption><p>Schematic diagram of the MetaPath methods</p></caption><text>
   <p><b>Schematic diagram of the MetaPath methods</b>.
Sequences from each sample are annotated against the KEGG GENES database and
mapped to reactions in the metabolic network, resulting in an abundance matrix whose
rows are reactions and whose columns are samples. <it>p</it> values are then computed for all
reactions using Metastats <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> and converted into <it>Z</it> values, and a greedy search is
performed on the edge-weighted graph to find max-weight subnetworks. Finally, we
calculate the <it>p<sub>abund</sub></it> and <it>p<sub>struct</sub></it> significance values of each max-weight subnetwork.</p>
</text><graphic file="1753-6561-5-S2-S9-1"/></fig>
         </sec>
         <sec>
            <st>
               <p>Scoring metabolic subpathways</p>
            </st>
             <p>To score the biological activity of a particular subnetwork, we first use Metastats <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> to calculate the significance of differential abundance for each reaction between the two phenotypic groups under comparison. Under the null hypothesis, the relative abundances in the different phenotypic groups are drawn from the same distribution, so the <it>p</it> value for each feature (metabolic reaction) is uniformly distributed between 0 and 1. Based on this assumption, <it>p</it> values can be converted to <it>Z</it> scores <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> through the standard normal (Gaussian) distribution. Because Metastats performs a two-tailed test for each reaction, the two-tailed <it>p</it> values are converted to signed <it>Z</it> values using the following equation:</p>
            <p>
               <display-formula>
                  <graphic file="1753-6561-5-S2-S9-i1.gif"/>
               </display-formula>
            </p>
             <p>where <inline-formula><graphic file="1753-6561-5-S2-S9-i2.gif"/></inline-formula> is the inverse cumulative distribution function (CDF) of the standard normal distribution, and G1 and G2 represent the two phenotypic groups. Using this formula, a reaction that is more abundant in population G1 receives a positive <it>Z</it> score, and vice versa. We are specifically interested in finding pathways whose reactions are either enriched or depleted as a whole, as opposed to previous approaches <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp> that identify active or perturbed subnetworks, which may contain a mixture of enriched and depleted components. Similar to the approach of <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>, we define the aggregate score for a particular subnetwork to be the sum of the <it>Z</it> scores over all reactions contained within it: <inline-formula><graphic file="1753-6561-5-S2-S9-i3.gif"/></inline-formula>, where <it>k</it> is the size (number of metabolic reactions) of the subnetwork.</p>
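             <p>The conversion and the subnetwork score can be sketched in Python, using the standard library's NormalDist for the inverse standard normal CDF. The sketch assumes that the sign of <it>Z</it> follows the direction rule stated above (positive when the reaction is more abundant in G1) and that, as in Ideker <it>et al.</it> <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>, the summed <it>Z</it> scores are scaled by 1/sqrt(<it>k</it>); the scaling is our reading of the role of <it>k</it> in the formula, so the plain sum remains available through normalize=False. The function names are illustrative.</p>
```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def signed_z(p_two_tailed: float, mean_g1: float, mean_g2: float) -> float:
    """Map a two-tailed p value back to a signed Z score.

    |Z| = Phi^-1(1 - p/2), computed as -Phi^-1(p/2) for numerical stability;
    the sign is positive when the reaction is more abundant in G1.
    """
    p = max(p_two_tailed, 1e-300)          # keep inv_cdf's argument strictly positive
    magnitude = -STD_NORMAL.inv_cdf(p / 2)
    return magnitude if mean_g1 > mean_g2 else -magnitude


def aggregate_score(z_scores: list[float], normalize: bool = True) -> float:
    """Subnetwork score: sum of reaction Z scores, optionally scaled by 1/sqrt(k)."""
    total = sum(z_scores)
    return total / math.sqrt(len(z_scores)) if normalize else total
```
             <p>Computing |<it>Z</it>| as -Phi<sup>-1</sup>(p/2) instead of Phi<sup>-1</sup>(1 - p/2) gives the same value by symmetry of the normal distribution, but avoids rounding 1 - p/2 to exactly 1 for very small <it>p</it>.</p>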
         </sec>
         <sec>
            <st>
               <p>Identifying high-scoring pathways</p>
            </st>
             <p>As proposed in <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>, we attempt to find subnetworks that maximize the aggregate <it>Z</it>-score defined above. Unfortunately, this problem is equivalent to finding a maximum-weight connected subgraph, which is NP-hard <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. Several approaches to this problem have been proposed: Ideker <it>et al.</it> 2002 <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> used simulated annealing, but this heuristic is slow; Dittrich <it>et al.</it> 2008 <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> used integer linear programming, which can find provably optimal subpathways quickly but requires the commercial software package CPLEX, which is not freely available to the general public (using a freely available ILP solver would require re-implementing the entire algorithm, as the software is provided as a binary-only release). Here we rely on a greedy search heuristic that is fast and, while not guaranteed to find maximally scoring pathways, performs well in practice. The algorithm we employ is described below:</p>
            <p>
               <display-formula>
                  <graphic file="1753-6561-5-S2-S9-i4.gif"/>
               </display-formula>
            </p>
             <p>This algorithm tries to find a connected metabolic subnetwork of arbitrary structure with maximum weight. However, chains are thought to be especially biologically meaningful in metabolic networks, because they capture the structure of a series of successively connected reactions. To accommodate this idea, we modify line 8 of the above algorithm to &#8220;Pick an edge e<sub>j</sub> which has the highest weight of the edges that are adjacent to and have the same direction as e<sub>j-1</sub>&#8221;. Both search algorithms are implemented in our program and can be selected through command-line parameters. To find all significant subnetworks (computing significance is discussed below), we iteratively remove from the global network the edges contained in previously found significant subnetworks, and rerun our greedy search on the rest of the network until no additional significant subnetworks can be found. Note that, unlike the original version of our code <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, the search algorithm is not limited to a given subnetwork size; rather, it finds all significant subnetworks irrespective of size.</p>
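             <p>Because the algorithm itself is shown only as an image, the Python sketch below is a hedged reconstruction of a greedy search of this kind, not a transcription of it. Assumptions: edges are (reaction, source metabolite, target metabolite) triples, with a reversible reaction listed once per direction; the search starts from the heaviest edge and stops when the best adjacent edge no longer increases the 1/sqrt(<it>k</it>)-scaled score; and is_significant is a hypothetical placeholder for the significance test described in the next section. Setting chain=True restricts each step to edges that leave the metabolite reached by the previous edge, in the spirit of the modified line 8.</p>
```python
import math
from typing import Callable

Edge = tuple[str, str, str]  # (reaction_id, source_metabolite, target_metabolite)


def score(weights: list[float]) -> float:
    """Aggregate Z score, assuming the 1/sqrt(k) scaling of Ideker et al."""
    return sum(weights) / math.sqrt(len(weights))


def greedy_subnetwork(edges: list[Edge], w: dict[str, float], chain: bool = False) -> list[Edge]:
    """Grow a connected subnetwork from the heaviest edge while the score keeps increasing."""
    if not edges:
        return []
    picked = [max(edges, key=lambda e: w[e[0]])]
    nodes = {picked[0][1], picked[0][2]}
    while True:
        used = set(picked)
        if chain:
            # chain mode: the next edge must leave the metabolite the last edge points to
            tail = picked[-1][2]
            cands = [e for e in edges if e not in used and e[1] == tail]
        else:
            # general mode: any unused edge sharing a metabolite with the subnetwork
            cands = [e for e in edges if e not in used and (e[1] in nodes or e[2] in nodes)]
        if not cands:
            break
        best = max(cands, key=lambda e: w[e[0]])
        current = [w[e[0]] for e in picked]
        # the score grows with the added weight, so testing the heaviest candidate suffices
        if not score(current + [w[best[0]]]) > score(current):
            break
        picked.append(best)
        nodes.update((best[1], best[2]))
    return picked


def all_subnetworks(edges: list[Edge], w: dict[str, float],
                    is_significant: Callable[[list[Edge]], bool],
                    chain: bool = False) -> list[list[Edge]]:
    """Search, remove the edges of each significant hit, and search again until none is found."""
    remaining = list(edges)
    found = []
    while remaining:
        sub = greedy_subnetwork(remaining, w, chain)
        if not is_significant(sub):
            break
        found.append(sub)
        remaining = [e for e in remaining if e not in sub]
    return found
```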
         </sec>
         <sec>
            <st>
               <p>Computing the significance of subnetwork</p>
            </st>
             <p>The null score distribution for a specific subnetwork can be estimated by permuting the sample labels (columns of the abundance matrix) and computing the subnetwork scores from the permuted abundance matrix. The significance <it>p</it> value is estimated as the fraction of random permutations that produce higher scores than the original subnetwork. The <it>p</it> value computed through this approach (termed <it>p<sub>abund</sub></it> throughout the rest of the paper), however, ignores the topology of the underlying global metabolic network, and can lead to incorrect conclusions. For example, consider a densely connected metabolic network in which every edge is adjacent to all other edges. The best subnetwork is then simply composed of the top differentially abundant metabolic reactions. Consequently, whenever some reactions appear significant, which may simply result from random noise given the large number of edges, they will form a significant subnetwork purely because of biases arising from the network topology (Fig. <figr fid="F2">2</figr>). To address this problem, we compute a second <it>p</it> value (termed <it>p<sub>struct</sub></it>), relying on a topological definition of the null distribution of subnetwork scores. Specifically, instead of treating each subnetwork as a bag of genes, we estimate the distribution of scores for actual subnetworks identified within the underlying global metabolic network. Since this null distribution depends on the size (number of edges) of the subnetwork, let <it>k</it> be the size of a subnetwork generated by the greedy search algorithm described above, and <it>Z</it> be its <it>Z</it>-score.
The <it>p<sub>struct</sub></it> value for this subnetwork can be calculated as follows: (i) permute the edge weights (row labels of the abundance matrix) of the global metabolic network; (ii) perform a greedy search to find a maximal weighted subnetwork of size <it>k</it>; (iii) repeat steps (i) and (ii) 1000 times, generating 1000 weights of the max-weight subnetwork (the null distribution); (iv) the <it>p<sub>struct</sub></it> value is the proportion of these 1000 weights from step (iii) that are higher than the original observation <it>Z</it>.</p>
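             <p>Both permutation schemes can be sketched in Python. This is a simplified illustration under explicit assumptions, not the MetaPath implementation: a Welch <it>t</it> statistic stands in for the Metastats-derived <it>Z</it> score when sample labels are shuffled for <it>p<sub>abund</sub></it>; for <it>p<sub>struct</sub></it>, step (ii) is approximated by growing from the heaviest edge and adding the heaviest adjacent edge until the subnetwork has <it>k</it> edges; observed_z must be computed with the same 1/sqrt(<it>k</it>) scaling; and permutation scores equal to the observed one are counted as exceedances, a conservative choice. The abundance matrix is a dict from reaction to per-sample values, with sample labels "G1" or "G2"; all function names are hypothetical.</p>
```python
import math
import random
from statistics import mean, variance


def welch_t(x, y):
    """Welch t statistic, used here only as a stand-in for the Metastats-based signed Z."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se if se else 0.0


def sub_score(matrix, labels, reactions):
    """Subnetwork score under a given sample labelling: sum of statistics / sqrt(k)."""
    total = 0.0
    for r in reactions:
        g1 = [v for v, lab in zip(matrix[r], labels) if lab == "G1"]
        g2 = [v for v, lab in zip(matrix[r], labels) if lab == "G2"]
        total += welch_t(g1, g2)
    return total / math.sqrt(len(reactions))


def p_abund(matrix, labels, reactions, n_perm=1000, seed=0):
    """Topology-free p value: shuffle sample labels (columns of the abundance matrix)."""
    rng = random.Random(seed)
    observed = sub_score(matrix, labels, reactions)
    perm = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if sub_score(matrix, perm, reactions) >= observed:  # ties counted conservatively
            hits += 1
    return hits / n_perm


def best_score_of_size(edges, w, k):
    """Fixed-size greedy: from the heaviest edge, keep adding the heaviest adjacent edge up to k edges."""
    picked = [max(edges, key=lambda e: w[e[0]])]
    nodes = {picked[0][1], picked[0][2]}
    while k > len(picked):
        cands = [e for e in edges if e not in picked and (e[1] in nodes or e[2] in nodes)]
        if not cands:
            break
        best = max(cands, key=lambda e: w[e[0]])
        picked.append(best)
        nodes.update((best[1], best[2]))
    return sum(w[e[0]] for e in picked) / math.sqrt(len(picked))


def p_struct(edges, w, k, observed_z, n_perm=1000, seed=0):
    """Topology-aware p value: shuffle edge weights (row labels) over the whole network."""
    rng = random.Random(seed)
    ids = [e[0] for e in edges]
    vals = [w[i] for i in ids]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(vals)
        if best_score_of_size(edges, dict(zip(ids, vals)), k) >= observed_z:
            hits += 1
    return hits / n_perm
```
             <p>The key difference is what gets shuffled: p_abund keeps the network fixed and reassigns samples to groups, whereas p_struct keeps the per-reaction weights but reassigns them to edges, so the greedy search itself, and hence the network topology, shapes the null distribution.</p>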
            <fig id="F2"><title><p>Figure 2</p></title><caption><p>Significant subnetworks that are caused by structural biases</p></caption><text>
   <p><b>Significant subnetworks that are caused by structural biases</b>.
On the left side, the two pathways have equal weight, indicating equal
significance of differential abundance. The high weight of the second pathway,
however, comes mainly from the thick middle edge, which has weight 7. On the right side,
in a densely connected network, any random high-weight edges will form a
subnetwork with high weight (correlated noise).</p>
</text><graphic file="1753-6561-5-S2-S9-2"/></fig>
         </sec>
         <sec>
            <st>
               <p>MetaPath methods summary</p>
            </st>
            <p>To summarize the methods described above, the MetaPath algorithm proceeds as follows:</p>
            <p>1. Differential abundance is assessed on an edge-by-edge basis (reaction-by-reaction) using Metastats;</p>
             <p>2. The significance estimates (<it>p</it>-values) from Metastats are fed into a greedy search algorithm to determine all maximally weighted subnetworks (in terms of statistical <it>Z</it>-scores) in the global metabolic network;</p>
             <p>3. The significance of each subnetwork detected by the greedy search algorithm is assessed using both a topology-independent permutation approach (<it>p<sub>abund</sub></it>) and a topology-dependent permutation approach (<it>p<sub>struct</sub></it>);</p>
            <p>4. The subnetworks determined to be significant (<it>p<sub>abund</sub></it> &#8804; <it>0.05</it> and <it>p<sub>struct</sub></it> &#8804; <it>0.05</it>) are reported to the user (Note: the threshold for significance can be adjusted through command-line parameters). The pathways are ranked by <it>p<sub>abund</sub></it> values.</p>
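             <p>Step 4 amounts to a filter followed by a sort, sketched below in Python. The Subnetwork record and the report function are hypothetical names, and both cutoffs default to the 0.05 values above; MetaPath exposes these thresholds as command-line parameters whose actual names are not given here.</p>
```python
from typing import NamedTuple


class Subnetwork(NamedTuple):
    reactions: tuple[str, ...]
    p_abund: float
    p_struct: float


def report(subnets, alpha_abund=0.05, alpha_struct=0.05):
    """Keep subnetworks passing both cutoffs (inclusive) and rank them by p_abund."""
    keep = [s for s in subnets if alpha_abund >= s.p_abund and alpha_struct >= s.p_struct]
    return sorted(keep, key=lambda s: s.p_abund)
```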
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results and discussions</p>
         </st>
         <sec>
            <st>
               <p>Performance evaluation using simulated datasets</p>
            </st>
             <p>To validate our methods, we designed a simulated metagenomic study and compared the results with three previous approaches: (i) identifying significantly active subnetworks using simulated annealing and greedy search <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>; (ii) discovering significant individual reactions using Metastats <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>; and (iii) finding differentially abundant KEGG-defined pathways, an approach widely used in metagenomic functional comparison <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B7">7</abbr><abbr bid="B10">10</abbr></abbrgrp>. We chose these tools because they address similar biological problems. However, they do not solve exactly the same problem as this paper, namely finding differentially abundant subnetworks that may span two or more KEGG-defined pathways (see discussion in the Background section). The goal of this simulation study is thus to show that the computational problem addressed in this paper cannot be solved directly by applying methods previously developed in a related context.</p>
             <p>We designed a simulated metabolic pathway dataset in which five subjects are created for each of the two groups with distinct phenotypes. To generate the artificial reaction abundance matrix (where rows represent reactions and columns represent subjects), a Gaussian distribution is created for each reaction, whose mean is randomly chosen from a real metagenomic dataset (gut microbiome from obese and lean subjects <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>). The variance of each distribution is set by fixing the relative standard deviation (standard deviation divided by the mean) at 0.2. If a reaction is defined to be equally abundant in the two groups under comparison, the abundance value for each subject is drawn from the same distribution. Otherwise, if a reaction is defined to be significantly enriched in one group, a second normal distribution is created for this reaction by increasing the mean such that the <it>p</it> value of the difference between the two distributions is less than a predefined value (0.05 and 0.01 were used). In this study, we chose one subnetwork (a series of 5 or 10 reactions) to be enriched in one population. The goal is to compare the ability of different methods to recover this significant subnetwork (a set of significant reactions) from the simulated abundance matrix. Biologically, the enriched pathways represent functional enrichment of certain biological processes in a microbial community.</p>
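             <p>The simulation design can be sketched in Python as below. The description above leaves the exact mean shift open, so the sketch assumes one reading of it: an enriched reaction's G1 mean is raised by z<sub>1-alpha/2</sub> times sd times sqrt(2/<it>n</it>), which places a two-sided z-test on the two group means, with <it>n</it> subjects per group, right at the alpha cutoff. The enriched group keeps a standard deviation of 0.2 times the original mean, and the caller supplies the per-reaction means (in the study, drawn from the obese and lean twins data). Function names are hypothetical.</p>
```python
import math
import random
from statistics import NormalDist


def mean_shift(mu, n_per_group, rsd, alpha):
    """Mean increase for an enriched reaction (one assumed reading of the design)."""
    sd = rsd * mu
    z = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    return z * sd * math.sqrt(2 / n_per_group)


def simulate(means, enriched, n_per_group=5, rsd=0.2, alpha=0.05, seed=0):
    """Return {reaction: G1 values followed by G2 values}; enriched reactions are shifted in G1."""
    rng = random.Random(seed)
    data = {}
    for r, mu in means.items():
        sd = rsd * mu                          # relative standard deviation fixed at rsd
        shift = mean_shift(mu, n_per_group, rsd, alpha) if r in enriched else 0.0
        g1 = [rng.gauss(mu + shift, sd) for _ in range(n_per_group)]
        g2 = [rng.gauss(mu, sd) for _ in range(n_per_group)]
        data[r] = g1 + g2
    return data
```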
             <p>The receiver operating characteristic (ROC) curve is plotted for each method (Fig. <figr fid="F3">3</figr>). As Fig. <figr fid="F3">3</figr> shows, MetaPath substantially outperforms all other methods in finding the significant subnetwork. Note that the results on our simulated datasets can be considered baseline performance, because each dataset contains only one significant subnetwork, whereas real metagenomic datasets typically contain multiple significant pathways. The most commonly used approach, comparing KEGG-defined pathways, performs the worst in our simulation study (Fig. <figr fid="F3">3</figr>).</p>
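             <p>The ROC computation is not detailed above; one common way, sketched below in Python as an assumption, is to rank reactions by a method's evidence score (for example 1 minus a <it>p</it> value), sweep the threshold down the ranking, and record the false-positive and true-positive rates against the planted subnetwork. The area under the curve then follows from the trapezoid rule. Names are illustrative, and ties in the scores are broken by input order.</p>
```python
def roc_points(scores, truth):
    """(FPR, TPR) points from sweeping a threshold down the ranked scores.

    scores: higher means "more likely significant" (e.g. 1 - p value);
    truth:  True for reactions inside the planted subnetwork.
    Both classes must be present; ties are broken by input order.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    pos = sum(truth)
    neg = len(truth) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i in order:
        if truth[i]:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))
```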
            <fig id="F3"><title><p>Figure 3</p></title><caption><p>Comparison of statistical methods for discovering significant
reactions in simulated datasets</p></caption><text>
   <p><b>Comparison of statistical methods for discovering significant
reactions in simulated datasets</b>.
Four methods are evaluated: discovering active subnetworks using simulated
annealing (Anneal) and greedy search (Greedy) <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>, discovering significant
individual reactions using Metastats <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, finding differentially abundant KEGG-defined
pathways (KEGGPath), and MetaPath. Four datasets are created by varying
the number of significant reactions <it>n</it> and their significances.</p>
</text><graphic file="1753-6561-5-S2-S9-3"/></fig>
         </sec>
         <sec>
            <st>
               <p>Obese and lean twins</p>
            </st>
             <p>We used MetaPath to compare the abundances of the metabolic networks of the gut microbiome in lean and obese subjects, relying on data from <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. This metagenomic dataset comprises 6 samples from obese subjects and 6 samples from lean subjects. The sequences are annotated and mapped to KEGG reactions using BLASTX (E value &lt; 10<sup>-5</sup>, bitscore &gt; 50, and %identity &gt; 50; parameters suggested in the original study), resulting in a total of 1832 unique reactions across the 12 metagenomic samples. First, we computed <it>p</it> values <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> using Metastats to find differentially abundant reactions. Using a <it>p</it> value cutoff of 0.05, 92.7&#177;9.1 (mean&#177;standard deviation) reactions are significant, including 37.1&#177;6.6 and 55.6&#177;3.1 reactions enriched in the obese and lean groups, respectively, based on 10 runs of Metastats. The high variance in the number of significant reactions has two main causes: (1) some reactions lie slightly below or above the significance cutoff (0.05), so their <it>p</it> values, computed through bootstrapping, switch between significant and not significant across runs (Fig. <figr fid="F4">4</figr>); (2) abundance values vary widely among individuals within the same phenotypic group. In addition to <it>p</it> values, Metastats also provides an estimate of the false discovery rate (<it>q</it> value), information that is not used by MetaPath. The <it>q</it> values for all reactions are 1 (except R01676, where <it>q</it>=0.73), i.e., a literal interpretation of the Metastats results would indicate that no reactions are significantly different between the two populations. This result can be explained by the flat distribution of the <it>p</it> values (Fig. <figr fid="F4">4</figr>), from which the <it>q</it> values are estimated.
This observation highlights a limitation of relying on the false discovery rate, which requires estimating the proportion of features that are truly null <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>, an approach that does not perform well when only a few features are truly significant.</p>
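             <p>The effect of a flat <it>p</it> value distribution on <it>q</it> values can be illustrated with Storey's estimator, sketched below in Python. This is an assumption-laden illustration: Metastats' exact <it>q</it> value settings are not given here, and the tuning parameter lambda = 0.5 is our choice. With uniformly spread <it>p</it> values the estimated proportion of true nulls, pi<sub>0</sub>, is close to 1, which pushes every <it>q</it> value toward 1; a strongly skewed distribution yields a small pi<sub>0</sub> and small <it>q</it> values.</p>
```python
def storey_qvalues(pvals, lam=0.5):
    """Return (pi0, q values) using Storey's pi0 estimate and the step-up q value formula."""
    m = len(pvals)
    # pi0: share of p values above lam, rescaled; a flat histogram gives pi0 near 1
    pi0 = min(1.0, sum(p > lam for p in pvals) / (m * (1 - lam)))
    order = sorted(range(m), key=lambda i: pvals[i])   # indices by ascending p
    q = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):                       # from the largest p value down
        i = order[rank - 1]
        running = min(running, pi0 * m * pvals[i] / rank)
        q[i] = running
    return pi0, q
```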
            <fig id="F4"><title><p>Figure 4</p></title><caption><p>p values distributions from comparing individual metabolic
reactions by Metastats and from comparing metabolic networks by MetaPath</p></caption><text>
   <p><b><it>p</it> values distributions from comparing individual metabolic
reactions by Metastats and from comparing metabolic networks by MetaPath</b>.
The top histogram is the distribution of the <it>p</it> values of individual metabolic reactions
calculated by Metastats. The bottom histogram is the distribution of the <it>p<sub>abund</sub></it> values
of the subnetworks calculated by MetaPath.</p>
</text><graphic file="1753-6561-5-S2-S9-4"/></fig>
             <p>We then applied MetaPath to this dataset and found 9 differentially abundant subnetworks (Fig. <figr fid="F5">5</figr>) using a 0.05 cutoff for both <it>p<sub>abund</sub></it> and <it>p<sub>struct</sub></it>. All of these subnetworks are enriched in obese subjects; none was found to be enriched in lean subjects. These 9 significant subnetworks contain 48 unique reactions, 22 of which are individually significant. It is worth pointing out that the number of significant reactions varies between different runs of statistical permutations (using Metastats), as shown above, but the significant pathways identified by MetaPath stay the same (Fig. <figr fid="F5">5</figr>). This observation confirms that the results from MetaPath are more robust to noise in the data than the gene-by-gene approach. In the distribution of subnetwork <it>p</it> values (Fig. <figr fid="F4">4</figr>), most subnetworks are either highly significant or clearly insignificant, and very few are near the <it>p</it> value cutoff, making the results easy to interpret.</p>
            <fig id="F5"><title><p>Figure 5</p></title><caption><p>9 statistically significant subnetworks are found in the comparison
of the gut microbiome from the obese and lean subjects</p></caption><text>
   <p><b>9 statistically significant subnetworks are found in the comparison
of the gut microbiome from the obese and lean subjects</b>.
All these subnetworks are enriched in the obese subjects. p<sub>abund</sub> and p<sub>struct</sub> significance
values are shown above each subnetwork. p values for each reaction are shown with
the KEGG reaction number. Five subnetworks (a)-(e) belong to the Fatty Acid
Metabolism pathway in KEGG. Four subnetworks (f)-(i) contain the L-homocysteine
molecule.</p>
</text><graphic file="1753-6561-5-S2-S9-5"/></fig>
             <p>Five subnetworks (Fig. <figr fid="F5">5a-5e</figr>) are completely contained in the KEGG Fatty Acid Metabolism pathway, which includes catabolic processes that generate energy and primary metabolites from fatty acids. Our findings are consistent with previous observations and biochemical analyses in microbiota transplantation experiments in germ-free mice <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, where the concentrations of short-chain fatty acids in the caeca of obese mice were higher than in those of lean mice, suggesting that the gut microbiome of obese subjects has an increased capacity for dietary energy harvest.</p>
            <p>Another interesting significant subnetwork consists of 10 reactions (Fig. <figr fid="F5">5f</figr>), 8 of which belong to Cysteine and Methionine Metabolism and 2 to Sulfur Metabolism. Many reactions in this subnetwork are linked through the metabolite L-homocysteine. Three other subnetworks we discovered (Fig. <figr fid="F5">5g-5i</figr>) also contain L-homocysteine as a metabolite, further supporting its potential involvement in obesity. It is well known that a high serum homocysteine level is a risk factor for cardiovascular disease <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, and obesity, an increasingly prevalent metabolic disorder, is closely associated with heart disease <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. Significant correlations between plasma homocysteine concentrations and obesity have been reported previously <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>. The increased potential for homocysteine metabolism in the obese gut microbiome suggests a hypothesis for future studies: the gut microbiome may either play a direct role in elevating plasma homocysteine levels or indirectly affect the hepatic biosynthesis of this amino acid in the human body.</p>
         </sec>
         <sec>
            <st>
               <p>Infant and adult individuals</p>
            </st>
            <p>A second dataset comprises gut microbiome samples from 4 infants and 9 adults, sequenced by Kurokawa <it>et al</it>. (2007) <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. The sequences were annotated and mapped to KEGG reactions using BLASTX (E value &lt; 10<sup>-8</sup>, alignment covering &#8805; 50% of the query sequence), yielding a total of 1781 unique reactions across the 13 metagenomic samples. Over 10 runs of Metastats, 383.7&#177;1.56 reactions are significant at a <it>p</it> value cutoff of 0.05, including 268.7&#177;1.56 enriched in infant subjects and 115&#177;0 enriched in adult subjects. At a <it>q</it> value cutoff of 0.05, 167.2&#177;2.7 reactions are significant, including 133.2&#177;2.7 enriched in infant subjects and 34&#177;0 enriched in adult subjects. Compared with the previous dataset (obese and lean twins), the sets of significant reactions are much more consistent across permutation runs.</p>
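            <p>The annotation filter described above can be sketched in Python as follows. The sketch assumes that BLASTX hits have already been parsed into records holding the read identifier, read length, the start and end of the alignment on the read, the E value and the matched KEGG gene; it assumes that each read is assigned only to its best (lowest E value) hit among those passing both thresholds; and it uses a hypothetical gene-to-reaction table. These choices and all identifiers (<it>Hit</it>, <it>count_reactions</it>, gene and reaction names) are illustrative assumptions, not the exact pipeline used for this dataset.</p>
```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    query: str       # read identifier
    query_len: int   # read length in nucleotides
    q_start: int     # alignment start on the read (1-based)
    q_end: int       # alignment end; smaller than q_start for reverse-frame hits
    evalue: float
    gene: str        # matched KEGG gene (hypothetical identifiers below)


def query_coverage(hit):
    # BLASTX reports nucleotide coordinates on the query read.
    return (abs(hit.q_end - hit.q_start) + 1) / hit.query_len


def count_reactions(hits, gene_to_reactions, max_evalue=1e-8, min_coverage=0.5):
    """Count reads per KEGG reaction after filtering BLASTX hits."""
    passing = defaultdict(list)
    for hit in hits:
        if hit.evalue >= max_evalue:           # keep only E value below 1e-8
            continue
        if min_coverage > query_coverage(hit):  # keep only coverage of at least 50%
            continue
        passing[hit.query].append(hit)
    counts = Counter()
    for read_hits in passing.values():
        best = min(read_hits, key=lambda h: h.evalue)  # one annotation per read
        for reaction in gene_to_reactions.get(best.gene, ()):
            counts[reaction] += 1
    return counts


# Hypothetical example data.
gene_to_reactions = {"geneA": ["rxn_1"], "geneB": ["rxn_2", "rxn_3"]}
hits = [
    Hit("read1", 300, 1, 240, 1e-20, "geneA"),    # 80% coverage, passes
    Hit("read1", 300, 1, 240, 1e-10, "geneB"),    # passes, but worse E value than geneA hit
    Hit("read2", 300, 250, 100, 1e-12, "geneB"),  # reverse frame, 151/300 coverage, passes
    Hit("read3", 300, 1, 90, 1e-30, "geneA"),     # 30% coverage, rejected
    Hit("read4", 300, 1, 300, 1e-5, "geneB"),     # E value too high, rejected
]
print(count_reactions(hits, gene_to_reactions))
```
            <p>Summing such per-sample counts over all samples and collecting the distinct reaction identifiers gives the set of unique reactions that are then tested for differential abundance.</p>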
            <p>Applying MetaPath with the same parameters as before, we found 10 significant subnetworks: 6 enriched in infant subjects (Fig. <figr fid="F6">6a-6f</figr>) and 4 enriched in adult subjects (Fig. <figr fid="F6">6g-6j</figr>). These subnetworks contain 55 unique reactions (35 in subnetworks enriched in infants and 20 in those enriched in adults), of which 38 are significant according to Metastats (22 enriched in infants and 16 in adults) and 17 are not. Three subnetworks enriched in infant subjects (Fig. <figr fid="F6">6a, 6c and 6d</figr>) involve the metabolite L-homocysteine, and a fourth (Fig. <figr fid="F6">6b</figr>) involves the related amino acid L-cysteine. This is consistent with the previous observation that breastfed infants have higher plasma homocysteine levels, possibly caused by suboptimal availability of folate in breast milk <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. Folate concentration is negatively correlated with homocysteine concentration, as folate is a necessary coenzyme for reactions that metabolize homocysteine. In addition, infants normally have a high-protein diet, which may also increase homocysteine concentrations. The subnetwork in Fig. <figr fid="F6">6e</figr> involves the substrates citrate and succinate and is closely related to the oxidative tricarboxylic acid (TCA) cycle. The TCA cycle is part of carbohydrate metabolism and can convert carbohydrates into usable energy in aerobic organisms. Because the adult gut ecosystem is dominated by strict anaerobes, it is reasonable to find this subpathway enriched in infants, whose gut microbiota also includes aerobes. This finding is consistent with results obtained by comparing COG functional categories <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. We also found a subpathway belonging to atrazine metabolism to be enriched in infant subjects (Fig. <figr fid="F6">6f</figr>). Atrazine is one of the most widely used herbicides and contaminates water and soil throughout the world; this finding may indicate a side effect of such contamination.</p>
            <fig id="F6"><title><p>Figure 6</p></title><caption><p>10 statistically significant subpathways are found in the infant and
adult individuals dataset</p></caption><text>
   <p><b>10 statistically significant subpathways are found in the infant and
adult individuals dataset</b>.
Six subpathways are enriched in the infant subjects (Fig. 6a-6f), and four subpathways are
enriched in the adult subjects (Fig. 6g-6j). p<sub>abund</sub> and p<sub>struct</sub> significance values are
shown above each subpathway. <it>p</it> values for each reaction are shown next to the KEGG
reaction number.</p>
</text><graphic file="1753-6561-5-S2-S9-6"/></fig>
            <p>The pathway in Fig. <figr fid="F6">6i</figr> (enriched in adult subjects) is part of lipopolysaccharide biosynthesis. Lipopolysaccharides are a major component of the outer membrane of Gram-negative bacteria, so the enrichment of this pathway in adults may reflect the greater abundance of Gram-negative bacteria in the adult gut. In particular, <it>Bacteroides</it>, a genus of Gram-negative bacteria, is a major constituent of the adult gut microbiome but is not highly prevalent in infants. The pathways in Fig. <figr fid="F6">6h</figr> and Fig. <figr fid="F6">6j</figr> (enriched in adults) are related to pyrimidine metabolism. The metabolites RNA, cytidine and uridine, which take part in pyrimidine metabolism, are normally obtained from RNA-rich foods such as organ meats, broccoli and brewer&#8217;s yeast; these foods are not available to unweaned infants, and these metabolites are not present in high abundance in milk. The pathway in Fig. <figr fid="F6">6g</figr> (enriched in adults) is part of fructose and mannose metabolism, a pathway related to carbohydrate metabolism. This is also consistent with COG-based analyses indicating that many genes involved in mono- or disaccharide metabolism are enriched in adults <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, which has been attributed to the adult colonic microbiota using indigestible polysaccharides as a resource for energy production and the biosynthesis of cellular components.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusions</p>
         </st>
         <p>We have introduced a statistical method for finding significant metabolic subpathways in metagenomic datasets. Compared with previous methods, MetaPath produces results that are more robust to noise in the data and, when tested on simulated datasets, achieves significantly higher sensitivity and specificity. When applied to two publicly available metagenomic datasets, MetaPath produced results consistent with previous observations and also provided several new insights into the metabolic activity of the gut microbiome. Finally, MetaPath is efficient: a typical metagenomic dataset and the corresponding metabolic network (about 2000 edges) can be analyzed in half an hour on a single processor.</p>
         <p>Despite these promising results, our method has several limitations that we plan to address in the near future. First and foremost, we restrict ourselves to pathways of a fixed length, a restriction necessary for accurately computing the null distribution of pathway scores. This can severely limit our ability to discover long pathways whose abundance differs only slightly, but significantly, between samples. Second, we currently estimate gene abundances by simply counting the sequencing reads that map to each gene. This approach ignores differences in gene length: at equal abundance, a longer gene is expected to attract more reads than a shorter one, potentially leading to incorrect conclusions. We plan to address this issue by incorporating a recently published method <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> that can accurately correct for gene-length effects. The software described in this paper is freely available under an open-source license from <url>http://metapath.cbcb.umd.edu</url>.</p>
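         <p>The gene-length effect can be illustrated with a simple correction that divides each read count by the gene length in kilobases (reads per kilobase). The Python sketch below applies this normalization to hypothetical counts; it only illustrates the bias and is not the statistical method of Sharon <it>et al</it>. <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> that we plan to incorporate. The function name <it>reads_per_kilobase</it>, the gene names and all values are assumptions.</p>
```python
def reads_per_kilobase(read_counts, gene_lengths_bp):
    """Normalize raw read counts by gene length (reads per kilobase of gene)."""
    return {
        gene: count / (gene_lengths_bp[gene] / 1000)
        for gene, count in read_counts.items()
    }


# Hypothetical example: two genes assumed to be equally abundant in the sample,
# but gene_long is three times as long and therefore attracts three times as many reads.
raw = {"gene_short": 100, "gene_long": 300}
lengths = {"gene_short": 500, "gene_long": 1500}
print(reads_per_kilobase(raw, lengths))  # {'gene_short': 200.0, 'gene_long': 200.0}
```
         <p>After normalization the two genes, which were assumed to be equally abundant, receive the same value, whereas their raw counts differ threefold.</p>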
      </sec>
      <sec>
         <st>
            <p>Competing interests</p>
         </st>
         <p>The authors declare that they have no competing interests.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>BL and MP conceived the project, designed the algorithm and wrote the manuscript. BL implemented the algorithm and analyzed the data. Both authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank Niranjan Nagarajan, Carl Kingsford, James White, Saket Navlakha and Theodore Gibbons for helpful discussions. This work was supported in part by grants R01-HG004885 from the NIH, and IIS-0812111 from the NSF, both to MP.</p>
            <p>This article has been published as part of <it>BMC Proceedings</it> Volume 5 Supplement 2, 2011: Proceedings of the 6th International Symposium on Bioinformatics Research and Applications (ISBRA'10). The full contents of the supplement are available online at <url>http://www.biomedcentral.com/1753-6561/5?issue=S2</url>.</p>
         </sec>
      </ack>
      <refgrp><bibl id="B1"><title><p>Metagenomics: genomic analysis of microbial communities</p></title><aug><au><snm>Riesenfeld</snm><fnm>CS</fnm></au><au><snm>Schloss</snm><fnm>PD</fnm></au><au><snm>Handelsman</snm><fnm>J</fnm></au></aug><source>Annu Rev Genet</source><pubdate>2004</pubdate><volume>38</volume><fpage>525</fpage><lpage>552</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1146/annurev.genet.38.072902.091216</pubid><pubid idtype="pmpid" link="fulltext">15568985</pubid></pubidlist></xrefbib></bibl><bibl id="B2"><title><p>Bacterial rhodopsin: evidence for a new type of phototrophy in the sea</p></title><aug><au><snm>Beja</snm><fnm>O</fnm></au><au><snm>Aravind</snm><fnm>L</fnm></au><au><snm>Koonin</snm><fnm>EV</fnm></au><au><snm>Suzuki</snm><fnm>MT</fnm></au><au><snm>Hadd</snm><fnm>A</fnm></au><au><snm>Nguyen</snm><fnm>LP</fnm></au><au><snm>Jovanovich</snm><fnm>SB</fnm></au><au><snm>Gates</snm><fnm>CM</fnm></au><au><snm>Feldman</snm><fnm>RA</fnm></au><au><snm>Spudich</snm><fnm>JL</fnm></au><etal/></aug><source>Science</source><pubdate>2000</pubdate><volume>289</volume><fpage>1902</fpage><lpage>1906</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.289.5486.1902</pubid><pubid idtype="pmpid" link="fulltext">10988064</pubid></pubidlist></xrefbib></bibl><bibl id="B3"><title><p>A core gut microbiome in obese and lean twins</p></title><aug><au><snm>Turnbaugh</snm><fnm>PJ</fnm></au><au><snm>Hamady</snm><fnm>M</fnm></au><au><snm>Yatsunenko</snm><fnm>T</fnm></au><au><snm>Cantarel</snm><fnm>BL</fnm></au><au><snm>Duncan</snm><fnm>A</fnm></au><au><snm>Ley</snm><fnm>RE</fnm></au><au><snm>Sogin</snm><fnm>ML</fnm></au><au><snm>Jones</snm><fnm>WJ</fnm></au><au><snm>Roe</snm><fnm>BA</fnm></au><au><snm>Affourtit</snm><fnm>JP</fnm></au><etal/></aug><source>Nature</source><pubdate>2009</pubdate><volume>457</volume><fpage>480</fpage><lpage>484</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature07540</pubid><pubid idtype="pmcid">2677729</pubid><pubid 
idtype="pmpid" link="fulltext">19043404</pubid></pubidlist></xrefbib></bibl><bibl id="B4"><title><p>The COG database: a tool for genome-scale analysis of protein functions and evolution</p></title><aug><au><snm>Tatusov</snm><fnm>RL</fnm></au><au><snm>Galperin</snm><fnm>MY</fnm></au><au><snm>Natale</snm><fnm>DA</fnm></au><au><snm>Koonin</snm><fnm>EV</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2000</pubdate><volume>28</volume><fpage>33</fpage><lpage>36</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/28.1.33</pubid><pubid idtype="pmcid">102395</pubid><pubid idtype="pmpid" link="fulltext">10592175</pubid></pubidlist></xrefbib></bibl><bibl id="B5"><title><p>The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes</p></title><aug><au><snm>Meyer</snm><fnm>F</fnm></au><au><snm>Paarmann</snm><fnm>D</fnm></au><au><snm>D'Souza</snm><fnm>M</fnm></au><au><snm>Olson</snm><fnm>R</fnm></au><au><snm>Glass</snm><fnm>EM</fnm></au><au><snm>Kubal</snm><fnm>M</fnm></au><au><snm>Paczian</snm><fnm>T</fnm></au><au><snm>Rodriguez</snm><fnm>A</fnm></au><au><snm>Stevens</snm><fnm>R</fnm></au><au><snm>Wilke</snm><fnm>A</fnm></au><etal/></aug><source>BMC Bioinformatics</source><pubdate>2008</pubdate><volume>9</volume><fpage>386</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-9-386</pubid><pubid idtype="pmcid">2563014</pubid><pubid idtype="pmpid" link="fulltext">18803844</pubid></pubidlist></xrefbib></bibl><bibl id="B6"><title><p>KEGG for linking genomes to life and the 
environment</p></title><aug><au><snm>Kanehisa</snm><fnm>M</fnm></au><au><snm>Araki</snm><fnm>M</fnm></au><au><snm>Goto</snm><fnm>S</fnm></au><au><snm>Hattori</snm><fnm>M</fnm></au><au><snm>Hirakawa</snm><fnm>M</fnm></au><au><snm>Itoh</snm><fnm>M</fnm></au><au><snm>Katayama</snm><fnm>T</fnm></au><au><snm>Kawashima</snm><fnm>S</fnm></au><au><snm>Okuda</snm><fnm>S</fnm></au><au><snm>Tokimatsu</snm><fnm>T</fnm></au><au><snm>Yamanishi</snm><fnm>Y</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2008</pubdate><volume>36</volume><fpage>D480</fpage><lpage>484</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkm882</pubid><pubid idtype="pmcid">2238879</pubid><pubid idtype="pmpid" link="fulltext">18077471</pubid></pubidlist></xrefbib></bibl><bibl id="B7"><title><p>Comparative metagenomics revealed commonly enriched gene sets in human gut microbiomes</p></title><aug><au><snm>Kurokawa</snm><fnm>K</fnm></au><au><snm>Itoh</snm><fnm>T</fnm></au><au><snm>Kuwahara</snm><fnm>T</fnm></au><au><snm>Oshima</snm><fnm>K</fnm></au><au><snm>Toh</snm><fnm>H</fnm></au><au><snm>Toyoda</snm><fnm>A</fnm></au><au><snm>Takami</snm><fnm>H</fnm></au><au><snm>Morita</snm><fnm>H</fnm></au><au><snm>Sharma</snm><fnm>VK</fnm></au><au><snm>Srivastava</snm><fnm>TP</fnm></au><etal/></aug><source>DNA Res</source><pubdate>2007</pubdate><volume>14</volume><fpage>169</fpage><lpage>181</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/dnares/dsm018</pubid><pubid idtype="pmcid">2533590</pubid><pubid idtype="pmpid" link="fulltext">17916580</pubid></pubidlist></xrefbib></bibl><bibl id="B8"><title><p>An application of statistics to comparative metagenomics</p></title><aug><au><snm>Rodriguez-Brito</snm><fnm>B</fnm></au><au><snm>Rohwer</snm><fnm>F</fnm></au><au><snm>Edwards</snm><fnm>RA</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2006</pubdate><volume>7</volume><fpage>162</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-7-162</pubid><pubid 
idtype="pmcid">1473205</pubid><pubid idtype="pmpid" link="fulltext">16549025</pubid></pubidlist></xrefbib></bibl><bibl id="B9"><title><p>Statistical methods for detecting differentially abundant features in clinical metagenomic samples</p></title><aug><au><snm>White</snm><fnm>JR</fnm></au><au><snm>Nagarajan</snm><fnm>N</fnm></au><au><snm>Pop</snm><fnm>M</fnm></au></aug><source>PLoS Comput Biol</source><pubdate>2009</pubdate><volume>5</volume><fpage>e1000352</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pcbi.1000352</pubid><pubid idtype="pmcid">2661018</pubid><pubid idtype="pmpid" link="fulltext">19360128</pubid></pubidlist></xrefbib></bibl><bibl id="B10"><title><p>Quantifying environmental adaptation of metabolic pathways in metagenomics</p></title><aug><au><snm>Gianoulis</snm><fnm>TA</fnm></au><au><snm>Raes</snm><fnm>J</fnm></au><au><snm>Patel</snm><fnm>PV</fnm></au><au><snm>Bjornson</snm><fnm>R</fnm></au><au><snm>Korbel</snm><fnm>JO</fnm></au><au><snm>Letunic</snm><fnm>I</fnm></au><au><snm>Yamada</snm><fnm>T</fnm></au><au><snm>Paccanaro</snm><fnm>A</fnm></au><au><snm>Jensen</snm><fnm>LJ</fnm></au><au><snm>Snyder</snm><fnm>M</fnm></au><etal/></aug><source>Proc Natl Acad Sci U S A</source><pubdate>2009</pubdate><volume>106</volume><fpage>1374</fpage><lpage>1379</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.0808022106</pubid><pubid idtype="pmcid">2629784</pubid><pubid idtype="pmpid" link="fulltext">19164758</pubid></pubidlist></xrefbib></bibl><bibl id="B11"><title><p>Comparative metagenomics of microbial communities</p></title><aug><au><snm>Tringe</snm><fnm>SG</fnm></au><au><snm>von 
Mering</snm><fnm>C</fnm></au><au><snm>Kobayashi</snm><fnm>A</fnm></au><au><snm>Salamov</snm><fnm>AA</fnm></au><au><snm>Chen</snm><fnm>K</fnm></au><au><snm>Chang</snm><fnm>HW</fnm></au><au><snm>Podar</snm><fnm>M</fnm></au><au><snm>Short</snm><fnm>JM</fnm></au><au><snm>Mathur</snm><fnm>EJ</fnm></au><au><snm>Detter</snm><fnm>JC</fnm></au><etal/></aug><source>Science</source><pubdate>2005</pubdate><volume>308</volume><fpage>554</fpage><lpage>557</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1107851</pubid><pubid idtype="pmpid" link="fulltext">15845853</pubid></pubidlist></xrefbib></bibl><bibl id="B12"><title><p>A Statistical Framework for the Functional Analysis of Metagenomes</p></title><aug><au><snm>Sharon</snm><fnm>I</fnm></au><au><snm>Pati</snm><fnm>A</fnm></au><au><snm>Markowitz</snm><fnm>VM</fnm></au><au><snm>Pinter</snm><fnm>RY</fnm></au></aug><source>Proceedings of the 13th Annual International Conference on Research in Computational Molecular Biology</source><publisher>Tucson, Arizona: Springer-Verlag</publisher><pubdate>2009</pubdate></bibl><bibl id="B13"><title><p>Discovering regulatory and signalling circuits in molecular interaction networks</p></title><aug><au><snm>Ideker</snm><fnm>T</fnm></au><au><snm>Ozier</snm><fnm>O</fnm></au><au><snm>Schwikowski</snm><fnm>B</fnm></au><au><snm>Siegel</snm><fnm>AF</fnm></au></aug><source>Bioinformatics</source><pubdate>2002</pubdate><volume>18</volume><issue>Suppl 1</issue><fpage>S233</fpage><lpage>240</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/18.suppl_1.S233</pubid><pubid idtype="pmpid" link="fulltext">12169552</pubid></pubidlist></xrefbib></bibl><bibl id="B14"><title><p>Identifying functional modules in protein-protein interaction networks: an integrated exact 
approach</p></title><aug><au><snm>Dittrich</snm><fnm>MT</fnm></au><au><snm>Klau</snm><fnm>GW</fnm></au><au><snm>Rosenwald</snm><fnm>A</fnm></au><au><snm>Dandekar</snm><fnm>T</fnm></au><au><snm>Muller</snm><fnm>T</fnm></au></aug><source>Bioinformatics</source><pubdate>2008</pubdate><volume>24</volume><fpage>i223</fpage><lpage>231</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btn161</pubid><pubid idtype="pmcid">2718639</pubid><pubid idtype="pmpid" link="fulltext">18586718</pubid></pubidlist></xrefbib></bibl><bibl id="B15"><title><p>Identifying Differentially Abundant Metabolic Pathways in Metagenomic Datasets</p></title><aug><au><snm>Liu</snm><fnm>B</fnm></au><au><snm>Pop</snm><fnm>M</fnm></au></aug><source>Bioinformatics Research and Applications
Lecture Notes in Computer Science</source><pubdate>2010</pubdate><volume>6053/2010</volume><fpage>101</fpage><lpage>112</lpage></bibl><bibl id="B16"><title><p>Statistical significance for genomewide studies</p></title><aug><au><snm>Storey</snm><fnm>JD</fnm></au><au><snm>Tibshirani</snm><fnm>R</fnm></au></aug><source>Proc Natl Acad Sci U S A</source><pubdate>2003</pubdate><volume>100</volume><fpage>9440</fpage><lpage>9445</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.1530509100</pubid><pubid idtype="pmcid">170937</pubid><pubid idtype="pmpid" link="fulltext">12883005</pubid></pubidlist></xrefbib></bibl><bibl id="B17"><title><p>An obesity-associated gut microbiome with increased capacity for energy harvest</p></title><aug><au><snm>Turnbaugh</snm><fnm>PJ</fnm></au><au><snm>Ley</snm><fnm>RE</fnm></au><au><snm>Mahowald</snm><fnm>MA</fnm></au><au><snm>Magrini</snm><fnm>V</fnm></au><au><snm>Mardis</snm><fnm>ER</fnm></au><au><snm>Gordon</snm><fnm>JI</fnm></au></aug><source>Nature</source><pubdate>2006</pubdate><volume>444</volume><fpage>1027</fpage><lpage>1031</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature05414</pubid><pubid idtype="pmpid" link="fulltext">17183312</pubid></pubidlist></xrefbib></bibl><bibl id="B18"><title><p>Insulin is an independent correlate of plasma homocysteine levels in obese children and adolescents</p></title><aug><au><snm>Gallistl</snm><fnm>S</fnm></au><au><snm>Sudi</snm><fnm>K</fnm></au><au><snm>Mangge</snm><fnm>H</fnm></au><au><snm>Erwa</snm><fnm>W</fnm></au><au><snm>Borkenstein</snm><fnm>M</fnm></au></aug><source>Diabetes Care</source><pubdate>2000</pubdate><volume>23</volume><fpage>1348</fpage><lpage>1352</lpage><xrefbib><pubidlist><pubid idtype="doi">10.2337/diacare.23.9.1348</pubid><pubid idtype="pmpid" link="fulltext">10977031</pubid></pubidlist></xrefbib></bibl><bibl id="B19"><title><p>Obesity and heart disease: a statement for healthcare professionals from 
the Nutrition Committee, American Heart Association</p></title><aug><au><snm>Eckel</snm><fnm>RH</fnm></au></aug><source>Circulation</source><pubdate>1997</pubdate><volume>96</volume><fpage>3248</fpage><lpage>3250</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">9386201</pubid></xrefbib></bibl><bibl id="B20"><title><p>Occurrence of hyperhomocysteinemia 1 year after gastroplasty for severe obesity</p></title><aug><au><snm>Borson-Chazot</snm><fnm>F</fnm></au><au><snm>Harthe</snm><fnm>C</fnm></au><au><snm>Teboul</snm><fnm>F</fnm></au><au><snm>Labrousse</snm><fnm>F</fnm></au><au><snm>Gaume</snm><fnm>C</fnm></au><au><snm>Guadagnino</snm><fnm>L</fnm></au><au><snm>Claustrat</snm><fnm>B</fnm></au><au><snm>Berthezene</snm><fnm>F</fnm></au><au><snm>Moulin</snm><fnm>P</fnm></au></aug><source>J Clin Endocrinol Metab</source><pubdate>1999</pubdate><volume>84</volume><fpage>541</fpage><lpage>545</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1210/jc.84.2.541</pubid><pubid idtype="pmpid" link="fulltext">10022413</pubid></pubidlist></xrefbib></bibl><bibl id="B21"><title><p>Body mass index and serum folate in childbearing age women</p></title><aug><au><snm>Mojtabai</snm><fnm>R</fnm></au></aug><source>Eur J Epidemiol</source><pubdate>2004</pubdate><volume>19</volume><fpage>1029</fpage><lpage>1036</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1007/s10654-004-2253-z</pubid><pubid idtype="pmpid" link="fulltext">15648596</pubid></pubidlist></xrefbib></bibl><bibl id="B22"><title><p>Serum homocysteine, B12 and folic acid concentration in Thai overweight and obese 
subjects</p></title><aug><au><snm>Tungtrongchitr</snm><fnm>R</fnm></au><au><snm>Pongpaew</snm><fnm>P</fnm></au><au><snm>Tongboonchoo</snm><fnm>C</fnm></au><au><snm>Vudhivai</snm><fnm>N</fnm></au><au><snm>Changbumrung</snm><fnm>S</fnm></au><au><snm>Tungtrongchitr</snm><fnm>A</fnm></au><au><snm>Phonrat</snm><fnm>B</fnm></au><au><snm>Viroonudomphol</snm><fnm>D</fnm></au><au><snm>Pooudong</snm><fnm>S</fnm></au><au><snm>Schelp</snm><fnm>FP</fnm></au></aug><source>Int J Vitam Nutr Res</source><pubdate>2003</pubdate><volume>73</volume><fpage>8</fpage><lpage>14</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1024/0300-9831.73.1.8</pubid><pubid idtype="pmpid">12690905</pubid></pubidlist></xrefbib></bibl><bibl id="B23"><title><p>Serum folate and homocysteine levels in obese females with non-alcoholic fatty liver</p></title><aug><au><snm>Hirsch</snm><fnm>S</fnm></au><au><snm>Poniachick</snm><fnm>J</fnm></au><au><snm>Avendano</snm><fnm>M</fnm></au><au><snm>Csendes</snm><fnm>A</fnm></au><au><snm>Burdiles</snm><fnm>P</fnm></au><au><snm>Smok</snm><fnm>G</fnm></au><au><snm>Diaz</snm><fnm>JC</fnm></au><au><snm>de la Maza</snm><fnm>MP</fnm></au></aug><source>Nutrition</source><pubdate>2005</pubdate><volume>21</volume><fpage>137</fpage><lpage>141</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.nut.2004.03.022</pubid><pubid idtype="pmpid" link="fulltext">15723740</pubid></pubidlist></xrefbib></bibl><bibl id="B24"><title><p>Plasma total homocysteine increases from day 20 to 40 in breastfed but not formula-fed low-birthweight infants</p></title><aug><au><snm>Fokkema</snm><fnm>MR</fnm></au><au><snm>Woltil</snm><fnm>HA</fnm></au><au><snm>van Beusekom</snm><fnm>CM</fnm></au><au><snm>Schaafsma</snm><fnm>A</fnm></au><au><snm>Dijck-Brouwer</snm><fnm>DA</fnm></au><au><snm>Muskiet</snm><fnm>FA</fnm></au></aug><source>Acta Paediatr</source><pubdate>2002</pubdate><volume>91</volume><fpage>507</fpage><lpage>511</lpage><xrefbib><pubidlist><pubid 
idtype="doi">10.1080/080352502753711605</pubid><pubid idtype="pmpid" link="fulltext">12113317</pubid></pubidlist></xrefbib></bibl></refgrp>
   </bm>
</art>