Research article (Open Access)

Critical assessment of human metabolic pathway databases: a stepping stone for future integration

Miranda D Stobbe1,3, Sander M Houten5,6, Gerbert A Jansen1,3, Antoine HC van Kampen1,2,3,4 and Perry D Moerland1,3*

Author Affiliations

1 Bioinformatics Laboratory, Academic Medical Center, University of Amsterdam, PO Box 22700, 1100 DE, Amsterdam, the Netherlands

2 Biosystems Data Analysis, Swammerdam Institute for Life Sciences, University of Amsterdam, Science Park 904, 1098 XH, Amsterdam, the Netherlands

3 Netherlands Bioinformatics Centre, Geert Grooteplein 28, 6525 GA, Nijmegen, the Netherlands

4 Netherlands Consortium for Systems Biology, University of Amsterdam, PO Box 94215, 1090 GE, Amsterdam, the Netherlands

5 Department of Clinical Chemistry, Laboratory Genetic Metabolic Diseases, Academic Medical Center, University of Amsterdam, PO Box 22700, 1100 DE, Amsterdam, the Netherlands

6 Department of Pediatrics, Emma Children's Hospital, Academic Medical Center, University of Amsterdam, PO Box 22700, 1100 DE, Amsterdam, the Netherlands

BMC Systems Biology 2011, 5:165  doi:10.1186/1752-0509-5-165

Published: 14 October 2011

Additional files

Additional file 1:

Transferred and obsolete identifiers and EC numbers per database. Number of transferred and obsolete EC numbers, gene and metabolite identifiers for each of the five pathway databases.

Format: PDF Size: 9KB

Additional file 2:

Results of the FatiGO analyses. GO biological processes enriched according to FatiGO for the following comparisons: WS1) genes in the consensus on gene level versus the union of the remaining genes, WS2) all unique genes versus the union of the remaining genes, WS3-WS7) unique genes per database (BiGG, EHMN, HumanCyc, KEGG, Reactome) versus the remaining genes contained in the union, WS8) genes contained in the majority of the databases versus the union of the remaining genes.

Format: XLS Size: 120KB

Additional file 3:

Consensus reactions. Overview of the reactions part of the consensus of all five pathway databases (when not taking into account e-, H+ and H2O). For each consensus reaction the corresponding EC numbers, genes (Entrez Gene IDs), and pathways are also given for each database.

Format: XLS Size: 119KB

Additional file 4:

Pairwise comparison of the five databases on gene, EC, metabolite, and reaction level. Consensus between pairs of databases is calculated as in the main text: (|C_DB1 ∩ C_DB2| / |C_DB1 ∪ C_DB2|) × 100%, where C is the set of entities under consideration. Databases are compared on Entrez Gene IDs, EC numbers, metabolites, and reactions, which were not required to match on e-, H+ and/or H2O.
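
The pairwise consensus measure can be illustrated with a minimal sketch. It assumes the measure is the size of the intersection of two entity sets divided by the size of their union, expressed as a percentage. The function names and the toy identifiers below are hypothetical and are not taken from the additional file or from the authors' code.

```python
from itertools import combinations


def consensus_pct(entities_a, entities_b):
    """Return |A ∩ B| / |A ∪ B| × 100 for two collections of entities."""
    a, b = set(entities_a), set(entities_b)
    union = a | b
    if not union:
        # Two empty sets share nothing; report 0% rather than divide by zero.
        return 0.0
    return 100.0 * len(a & b) / len(union)


def pairwise_consensus(db_entities):
    """Compute the consensus percentage for every pair of databases.

    db_entities maps a database name to its entities (for example gene IDs).
    """
    return {
        (x, y): consensus_pct(db_entities[x], db_entities[y])
        for x, y in combinations(sorted(db_entities), 2)
    }


# Toy data, invented for illustration only.
toy = {
    "DB_A": {"g1", "g2", "g3"},
    "DB_B": {"g2", "g3", "g4"},
    "DB_C": {"g5"},
}
result = pairwise_consensus(toy)
```

With the toy data, DB_A and DB_B share two of four distinct entities, so their consensus is 50%; DB_C shares nothing with either and scores 0%.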

Format: PDF Size: 49KB

Additional file 5:

TCA cycle as represented in each of the five metabolic pathway databases. Adapted version of Figure 2 in the main text for each of the metabolic pathway databases separately. Reactions occurring in the TCA cycle for the selected database are highlighted. Metabolites are represented by rectangles, genes by rounded rectangles, and EC numbers by parallelograms. Color indicates how many of the five databases include a specific entity. Color of an arrow indicates the number of databases that agree upon an entire reaction, i.e., all its metabolites (except H+ which was matched separately). 'x' denotes a missing EC number.

Format: PDF Size: 169KB

Additional file 6:

TCA cycle as represented in each of the five metabolic pathway databases. Breakdown of the TCA cycle per database. WS1) Overview of all reactions, plus corresponding EC numbers and genes. WS2) Reactions of each database; matching reactions are aligned. For reactions that are not part of the TCA cycle consensus an explanation for the differences observed is given in column B (see also section 'Analysis of differences between databases' in the main text). WS3) Metabolites of each database; matching metabolites are aligned. WS4) EC numbers of each database; matching EC numbers are aligned. WS5) Genes of each database; matching genes are aligned. In WS3-WS5 metabolites, EC numbers, and genes are matched across the entire TCA cycle.

Format: XLS Size: 103KB

Additional file 7:

Grouping of pathways into categories. Overview of the manual grouping of pathways of each database into one of eleven categories, see Materials and Methods.

Format: XLS Size: 94KB

Additional file 8:

Identifier types for genes and metabolites present in each of the databases.

Format: PDF Size: 8KB

Additional file 9:

Metabolite counts per database. For each of the five databases the percentage of metabolites without a chemical formula and the percentage of metabolites without an identifier is indicated. Furthermore, for each pathway database the percentage of metabolites linked to a particular metabolite database (KEGG Compound, KEGG Glycan, ChEBI, PubChem Compound, and CAS) is indicated. We also included the instances of metabolite classes for HumanCyc and members of sets for Reactome, see Materials and Methods.

Format: PDF Size: 7KB

Additional file 10:

Names of metabolites without a match in any of the four other databases and without any of the five types of metabolite identifiers.

Format: XLS Size: 416KB

Additional file 11:

Metabolite counts per database for the comparison of core metabolic processes. For the metabolites of the core metabolic processes in each of the five pathway databases the percentage of metabolites without a chemical formula and the percentage of metabolites without an identifier is indicated. Furthermore, for each pathway database the percentage of metabolites linked to a particular metabolite database (KEGG Compound, KEGG Glycan, ChEBI, PubChem Compound, and CAS) is indicated. We also included the instances of metabolite classes for HumanCyc and members of sets for Reactome, see Materials and Methods.

Format: PDF Size: 7KB

Additional file 12:

TCA cycle: majority vote. Adapted version of Figure 2 in the main text when retaining only the entities that at least three out of five databases agree on. Reactions occurring in the majority are highlighted. Metabolites are represented by rectangles, genes by rounded rectangles, and EC numbers by parallelograms. Color indicates how many of the five databases include a specific entity. Color of an arrow indicates the number of databases that agree upon an entire reaction, i.e., all its metabolites (except H+ which was matched separately).
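
The majority-vote criterion (keeping only entities that at least three of the five databases include) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name, the `min_databases` parameter, and the toy entity names are hypothetical.

```python
from collections import Counter


def majority_vote(db_entities, min_databases=3):
    """Return the entities present in at least min_databases databases.

    db_entities maps a database name to its entities (genes, EC numbers,
    metabolites, or reactions).
    """
    counts = Counter()
    for entities in db_entities.values():
        # set() ensures each database votes at most once per entity.
        counts.update(set(entities))
    return {entity for entity, n in counts.items() if n >= min_databases}


# Toy data, invented for illustration only.
toy = {
    "BiGG": {"citrate", "malate", "fumarate"},
    "EHMN": {"citrate", "malate"},
    "HumanCyc": {"citrate", "fumarate"},
    "KEGG": {"citrate", "malate"},
    "Reactome": {"citrate"},
}
majority = majority_vote(toy)
```

In the toy data, "citrate" appears in all five databases and "malate" in three, so both are retained; "fumarate" appears in only two and is dropped.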

Format: PDF Size: 44KB

Additional file 13:

Overview of all reactions and their matches. Overview of all reactions and their matches (when not taking into account e-, H+ and H2O). Rows are colored according to the number of databases that agree on a reaction. For each reaction the corresponding EC numbers, genes (Entrez Gene IDs), and pathways are also given for each database.

Format: XLS Size: 3.5MB

Additional file 14:

Top-level pathways from Reactome (not) considered in the comparison.

Format: PDF Size: 10KB

Additional file 15:

Instantiating reactions containing sets of metabolites. Reactome contains reactions defined in terms of sets of metabolites. For five such reactions the specific instantiations could not be derived automatically; this document gives two detailed examples.

Format: PDF Size: 14KB