Evaluation method for the potential functionome harbored in the genome and metagenome
1 Microbial Genome Research Group, Japan Agency for Marine-Earth Science & Technology (JAMSTEC), 2-15 Natsushima, Yokosuka, 237-0061, Japan
2 Advanced Science & Innovation Group, Mitsubishi Research Institute Inc. (MRI), 2-10-3, Nagata-cho, Chiyoda-ku, Tokyo, 100-8141, Japan
3 Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto, 611-0011, Japan
4 Department of Microbiology, Faculty of Medicine, Kagawa University, 1750-1 Miki, Kagawa, 761-0793, Japan
BMC Genomics 2012, 13:699. doi:10.1186/1471-2164-13-699. Published: 12 December 2012
Additional file 1:
Tables S1–S7.
Table S1. List of the 768 prokaryotic species used in this study. Additional data are available with the online version of this paper.
Table S2. Taxonomic patterns of the prokaryotes that complete the KEGG modules (205 pathway, 263 structural complex, 4 functional set, and 3 signature modules). The functional annotation of each module is listed in Tables S3–S5. Figures S1–S3 were drawn based on this table.
Table S3. Characterization of the 205 KEGG pathway modules, including submodules, based on the module completion patterns in the 768 prokaryotic species.
Table S4. Characterization of the 263 KEGG structural complex modules, including submodules, based on the module completion patterns in the 768 prokaryotic species.
Table S5. Characterization of the 7 KEGG modules (4 functional sets and 3 signatures) based on the module completion patterns in the 768 prokaryotic species.
Table S6. Boolean algebra-like equations defining all KEGG modules, including the redefined ones.
Table S7. Summary of the metagenomic sequences of the human gut microbiome.
Format: PDF, Size: 2.4 MB
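Table S6 gives the module definitions as Boolean algebra-like equations, and Table S2 and Figures S1–S3 summarize which species complete each module. As an illustration only, the following sketch evaluates a KEGG-style definition string against the set of KOs found in one genome, then computes the fraction of species that complete each module and flags "rare" modules using the fewer-than-10% criterion from the Figure S5 and S6 captions. The grammar is an assumption modeled on the public KEGG module notation, not the authors' code: a space joins required steps (AND), a comma separates alternatives (OR), "+" joins the components of a complex (AND), parentheses group, and optional "-" components are not handled. The class and function names (`ModuleParser`, `is_complete`, `completion_fraction`, `rare_modules`) are hypothetical.

```python
import re
from typing import Dict, List, Set

# Assumption: a KO identifier is "K" followed by five digits.
KO_PATTERN = re.compile(r"K\d{5}")


class ModuleParser:
    """Hypothetical recursive-descent evaluator for a KEGG-style definition.

    Assumed precedence, loosest to tightest:
      space -> AND between steps
      comma -> OR between alternatives
      plus  -> AND between components of a complex
    Parentheses group sub-expressions; optional '-' components are not handled.
    """

    def __init__(self, definition: str, kos: Set[str]) -> None:
        self.text = " ".join(definition.split())  # collapse runs of whitespace
        self.pos = 0
        self.kos = kos

    def peek(self) -> str:
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def parse(self) -> bool:
        value = self.steps()
        if self.pos != len(self.text):
            raise ValueError(f"unexpected {self.peek()!r} at position {self.pos}")
        return value

    def steps(self) -> bool:
        value = self.alternatives()
        while self.peek() == " ":
            self.pos += 1
            step = self.alternatives()  # always parse so the cursor advances
            value = value and step
        return value

    def alternatives(self) -> bool:
        value = self.components()
        while self.peek() == ",":
            self.pos += 1
            alt = self.components()
            value = value or alt
        return value

    def components(self) -> bool:
        value = self.factor()
        while self.peek() == "+":
            self.pos += 1
            part = self.factor()
            value = value and part
        return value

    def factor(self) -> bool:
        if self.peek() == "(":
            self.pos += 1
            value = self.steps()
            if self.peek() != ")":
                raise ValueError(f"expected ')' at position {self.pos}")
            self.pos += 1
            return value
        match = KO_PATTERN.match(self.text, self.pos)
        if match is None:
            raise ValueError(f"expected a KO identifier at position {self.pos}")
        self.pos = match.end()
        return match.group() in self.kos


def is_complete(definition: str, kos: Set[str]) -> bool:
    """True if the KO set satisfies every step of the definition."""
    return ModuleParser(definition, kos).parse()


def completion_fraction(definition: str, species_kos: Dict[str, Set[str]]) -> float:
    """Fraction of species whose KO set completes the module."""
    completed = sum(is_complete(definition, kos) for kos in species_kos.values())
    return completed / len(species_kos)


def rare_modules(definitions: Dict[str, str], species_kos: Dict[str, Set[str]],
                 threshold: float = 0.10) -> List[str]:
    """Modules completed by fewer than `threshold` of the species."""
    return [module_id for module_id, definition in definitions.items()
            if completion_fraction(definition, species_kos) < threshold]


if __name__ == "__main__":
    # Toy definition with placeholder KO identifiers (not a real KEGG module).
    toy = "K00001 (K00002,K00003+K00004) K00005"
    print(is_complete(toy, {"K00001", "K00002", "K00005"}))
    print(is_complete(toy, {"K00001", "K00003", "K00005"}))
```

Under this reading, a branched module such as the one in Figure S7 would simply be represented by one definition string per submodule, each evaluated independently.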
Additional file 2:
Figures S1–S9.
Figure S1. Distribution patterns of the completion ratio of the KEGG pathway modules in 768 prokaryotic species. The completion ratios of 205 pathway modules, including submodules, were evaluated in this study.
Figure S2. Distribution patterns of the completion ratio of the KEGG structural complex modules in 768 prokaryotic species. The completion ratios of 263 structural complex modules, including submodules, were evaluated in this study.
Figure S3. Distribution patterns of the completion ratio of the KEGG functional set and signature modules in 768 prokaryotic species. The completion ratios of 7 functional set and signature modules were evaluated in this study.
Figure S4. Distribution of the KO identifiers mapped to the glyoxylate cycle module (M00012) across other pathway modules. All KO identifiers except K01637 and K01638 (colored light green) are also shared with several other modules.
Figure S5. Module completion patterns in 8 phenotypically different Bacillus-related species. (A) Pathway modules. (B) Structural complex modules. bsu, Bacillus subtilis; bao, Bacillus amyloliquefaciens; bli, Bacillus licheniformis; bha, Bacillus halodurans; bpf, Bacillus pseudofirmus; oih, Oceanobacillus iheyensis; gka, Geobacillus kaustophilus; gth, Geobacillus thermoglucosidasius. Green characters indicate rare modules, i.e., modules completed by fewer than 10% of the 768 prokaryotic species.
Figure S6. Module completion patterns in humans and human gut microbiomes. (A)-1–3, Pathway modules. (B)-1–3, Structural complex modules. The upper histogram shows module completion patterns in the gut microbiomes of 13 healthy individuals, the middle histogram shows those in humans, and the lower histogram shows those in the human gut microbiomes combined with humans. Green characters indicate rare modules, i.e., modules completed by fewer than 10% of the 768 prokaryotic species.
Figure S7. Definition of submodules for KEGG modules with branching.
The heme biosynthesis pathway module (glutamate => protoheme => siroheme; M00121) branches at the intermediate uroporphyrinogen III (C01051), where the module was divided into two parts. The submodules are defined as M00121_1 (original), M00121_2 (left-side branch), and M00121_3 (right-side branch). Ovals with C numbers, rectangles with R numbers, and K numbers represent metabolites, enzymatic reactions, and KOs, respectively. KOs are used to map the functional annotations of genes to the modules. Black K numbers indicate KOs common to all 3 redefined submodules (M00121_1, M00121_2, and M00121_3), and blue and red K numbers correspond to reactions specific to M00121_2 and M00121_3, respectively.
Figure S8. Positive predictive values (PPV) of the KO reassignment tests by KAAS. We performed KO reassignment tests for 30 species (7 eukaryotes, 20 bacteria, and 3 archaea) with the original (old) and improved (new) KAAS and found that the new KAAS improved the PPV by 2–5% compared with the old KAAS. Three-letter codes on the x axis denote species, as follows: hsa, Homo sapiens; dre, Danio rerio; dme, Drosophila melanogaster; cel, Caenorhabditis elegans; ath, Arabidopsis thaliana; sce, Saccharomyces cerevisiae; cho, Cryptosporidium hominis; eco, Escherichia coli; nme, Neisseria meningitidis; hpy, Helicobacter pylori; rpr, Rickettsia prowazekii; bsu, Bacillus subtilis; sau, Staphylococcus aureus; lmo, Listeria monocytogenes; lla, Lactococcus lactis; lpl, Lactobacillus plantarum; cau, Chloroflexus aurantiacus; mge, Mycoplasma genitalium; mtu, Mycobacterium tuberculosis; blo, Bifidobacterium longum; ctr, Chlamydia trachomatis; pcu, Protochlamydia amoebophila; bbu, Borrelia burgdorferi; syn, Synechocystis sp.; bth, Bacteroides thetaiotaomicron; dra, Deinococcus radiodurans; aae, Aquifex aeolicus; mja, Methanocaldococcus jannaschii; ape, Aeropyrum pernix; neq, Nanoarchaeum equitans. Blue bars, old KAAS; red bars, new KAAS.
Figure S9.
Effect of database dependency on the accuracy of KO assignment. The query was an Escherichia coli isolate from a Norwegian infant (draft genome sequenced with the 454 GS FLX Titanium). Blue diamonds show the results obtained with a data set excluding proteins from the genera Escherichia, Salmonella, Shigella, and Yersinia (1,239 species). Similarly, red squares, green triangles, and purple dots show the results obtained without proteins from the order Enterobacteriales (1,200 species), the class Gammaproteobacteria (1,040 species), and the phylum Proteobacteria (755 species), respectively. KO identifiers specific to the genera Escherichia, Salmonella, Shigella, and Yersinia (16 KO identifiers), the order Enterobacteriales (90), the class Gammaproteobacteria (203), or the phylum Proteobacteria (370) were removed from the protein data set in advance. Here, accuracy is defined as the sensitivity, TP/(TP + FN), where TP and FN are the numbers of true positives and false negatives, respectively. We also used truncated proteins to examine the effect of amino acid (a.a.) sequence length on the accuracy of KO assignment: the 4,410 proteins from the E. coli isolate were randomly fragmented into lengths of 50, 60, 80, 100, 120, 150, and 200 a.a., and the fragments of each length were used to assess the accuracy of KO assignment.
Format: PDF, Size: 13.5 MB
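Figures S8 and S9 report KO assignment accuracy as the positive predictive value, TP/(TP + FP), and as the sensitivity, TP/(TP + FN), and Figure S9 additionally evaluates randomly fragmented proteins. The sketch below is a minimal illustration of those calculations under stated assumptions, not the procedure actually run with KAAS: assignments are stored as hypothetical (protein_id, KO) pairs, a protein given the wrong KO is scored as one false positive plus one false negative, and fragmentation takes a single random contiguous window per protein, skipping proteins shorter than the target length. The names `confusion_counts`, `ppv`, `sensitivity`, `random_fragment`, and `fragment_proteins` are illustrative only.

```python
import random
from typing import Dict, Optional, Set, Tuple

# Hypothetical representation: one (protein_id, KO identifier) pair per assignment.
Pair = Tuple[str, str]

# Fragment lengths (amino acids) listed in the Figure S9 caption.
FRAGMENT_LENGTHS = (50, 60, 80, 100, 120, 150, 200)


def confusion_counts(predicted: Set[Pair], reference: Set[Pair]) -> Tuple[int, int, int]:
    """Return (TP, FP, FN) by comparing predicted and reference assignments.

    A protein given the wrong KO contributes one FP (the wrong pair) and
    one FN (the missed reference pair).
    """
    tp = len(predicted & reference)
    fp = len(predicted - reference)
    fn = len(reference - predicted)
    return tp, fp, fn


def ppv(tp: int, fp: int) -> float:
    """Positive predictive value, TP / (TP + FP); 0.0 if nothing was predicted."""
    return tp / (tp + fp) if tp + fp else 0.0


def sensitivity(tp: int, fn: int) -> float:
    """Sensitivity, TP / (TP + FN); 0.0 if the reference is empty."""
    return tp / (tp + fn) if tp + fn else 0.0


def random_fragment(seq: str, length: int, rng: random.Random) -> Optional[str]:
    """One random contiguous window of `length` residues, or None if too short."""
    if len(seq) < length:
        return None
    start = rng.randint(0, len(seq) - length)  # randint includes both endpoints
    return seq[start:start + length]


def fragment_proteins(proteins: Dict[str, str], length: int, seed: int = 0) -> Dict[str, str]:
    """Fragment every protein to `length` a.a.; shorter proteins are skipped."""
    rng = random.Random(seed)  # fixed seed for reproducible fragments
    fragments = {}
    for protein_id, seq in proteins.items():
        fragment = random_fragment(seq, length, rng)
        if fragment is not None:
            fragments[protein_id] = fragment
    return fragments


if __name__ == "__main__":
    reference = {("p1", "K00001"), ("p2", "K00002"), ("p3", "K00003"), ("p4", "K00004")}
    predicted = {("p1", "K00001"), ("p2", "K00002"), ("p3", "K00009")}
    tp, fp, fn = confusion_counts(predicted, reference)
    print(tp, fp, fn, ppv(tp, fp), sensitivity(tp, fn))
```

Looping `fragment_proteins` over `FRAGMENT_LENGTHS` yields one fragment set per length listed in the Figure S9 caption; each set would then be annotated and scored with `sensitivity`. How the original study treated proteins shorter than a given length is not stated in the caption, so the skipping rule here is an assumption.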