Critical assessment of human metabolic pathway databases: a stepping stone for future integration
1 Bioinformatics Laboratory, Academic Medical Center, University of Amsterdam, PO Box 22700, 1100 DE, Amsterdam, the Netherlands
2 Biosystems Data Analysis, Swammerdam Institute for Life Sciences, University of Amsterdam, Science Park 904, 1098 XH, Amsterdam, the Netherlands
3 Netherlands Bioinformatics Centre, Geert Grooteplein 28, 6525 GA, Nijmegen, the Netherlands
4 Netherlands Consortium for Systems Biology, University of Amsterdam, PO Box 94215, 1090 GE, Amsterdam, the Netherlands
5 Department of Clinical Chemistry, Laboratory Genetic Metabolic Diseases, Academic Medical Center, University of Amsterdam, PO Box 22700, 1100 DE, Amsterdam, the Netherlands
6 Department of Pediatrics, Emma Children's Hospital, Academic Medical Center, University of Amsterdam, PO Box 22700, 1100 DE, Amsterdam, the Netherlands
BMC Systems Biology 2011, 5:165 doi:10.1186/1752-0509-5-165
Published: 14 October 2011
Multiple pathway databases are available that describe the human metabolic network and have proven their usefulness in many applications, ranging from the analysis and interpretation of high-throughput data to their use as a reference repository. However, so far the various human metabolic networks described by these databases have not been systematically compared and contrasted, nor has the extent to which they differ been quantified. For a researcher using these databases for particular analyses of human metabolism, it is crucial to know the extent of the differences in content and their underlying causes. Moreover, the outcomes of such a comparison are important for ongoing integration efforts.
We compared the genes, EC numbers and reactions of five frequently used human metabolic pathway databases. The overlap is surprisingly low, especially at the reaction level, where the databases agree on only 3% of the 6968 reactions they contain in total. Even for the well-established tricarboxylic acid cycle, the databases agree on only 5 of the 30 reactions they describe in total. We identified the main causes for the lack of overlap. Importantly, the databases are partly complementary. Other explanations include differences in the number of steps used to describe a conversion and in the number of possible alternative substrates listed. Missing metabolite identifiers and ambiguous metabolite names also affect the comparison.
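The agreement figures above can be read as the share of all distinct entries (the union across databases) that every database contains (the intersection). The following is a minimal sketch of that calculation under this assumed reading; the function name, database labels and reaction identifiers are hypothetical toy values, not data or code from the study.

```python
from typing import Dict, Set


def consensus_fraction(databases: Dict[str, Set[str]]) -> float:
    """Return the fraction of all distinct items that occur in every database.

    Assumption: "agreement" means intersection size divided by union size.
    """
    sets = list(databases.values())
    union = set().union(*sets)
    if not union:
        # No databases, or only empty ones: nothing to agree on.
        return 0.0
    shared = set.intersection(*sets)
    return len(shared) / len(union)


# Hypothetical toy input: three databases with made-up reaction identifiers.
toy_reactions = {
    "db_a": {"R1", "R2", "R3"},
    "db_b": {"R1", "R2", "R4"},
    "db_c": {"R1", "R5"},
}

fraction = consensus_fraction(toy_reactions)
print(f"Consensus: {fraction:.0%} of {len(set().union(*toy_reactions.values()))} reactions")
```

In this toy input only one of the five distinct reactions appears in all three databases. A measure like this depends entirely on how two entries are judged to be the same, which is why missing identifiers and ambiguous metabolite names lower the apparent overlap.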
Our results show that each of the five networks compared provides a valuable piece of the puzzle of the complete reconstruction of the human metabolic network. To enable integration of the networks, metabolite names and identifiers need to be standardized and, in addition, the conceptual differences between the databases should be resolved. Considerable manual intervention is required to reach the ultimate goal of a unified and biologically accurate model for studying the systems biology of human metabolism. Our comparison provides a stepping stone for such an endeavor.