Additional file 2.
Figures S1–S9.
Figure S1. Distribution patterns of the completion ratio of the KEGG pathway modules in 768 prokaryotic species. The completion ratio of 205 pathway modules containing submodules was evaluated in this study.
Figure S2. Distribution patterns of the completion ratio of the KEGG structural complex modules in 768 prokaryotic species. The completion ratio of 263 structural complex modules containing submodules was evaluated in this study.
Figure S3. Distribution patterns of the completion ratio of the KEGG functional set and signature modules in 768 prokaryotic species. The completion ratio of 7 functional set and signature modules was evaluated in this study.
Figure S4. Distribution of KO identifiers mapped to the module for the glyoxylate cycle (M00012) in other pathway modules. All KO identifiers except K01637 and K01638, which are colored light green, are also shared with several other modules.
Figure S5. Module completion patterns in 8 phenotypically different Bacillus-related species. (A) Pathway module. (B) Structural complex module. bsu, Bacillus subtilis; bao, Bacillus amyloliquefaciens; bli, Bacillus licheniformis; bha, Bacillus halodurans; bpf, Bacillus pseudofirmus; oih, Oceanobacillus iheyensis; gka, Geobacillus kaustophilus; and gth, Geobacillus thermoglucosidasius. Green characters indicate rare modules, which are completed by fewer than 10% of the 768 prokaryotic species.
Figure S6. Module completion patterns in human and human gut microbiomes. (A)-1–3, Pathway module. (B)-1–3, Structural complex module. The upper histogram shows the module completion pattern in gut microbiomes from 13 healthy individuals. The middle histogram shows module completion patterns in humans. The lower histogram shows module completion patterns in human gut microbiomes plus humans. Green characters indicate rare modules, which are completed by fewer than 10% of the 768 prokaryotic species.
Figure S7. Definition of submodules for the KEGG module with branching.
The heme biosynthesis pathway (glutamate => protoheme => siroheme) module (M00121) branches at the intermediate compound uroporphyrinogen III (C01051), where the module was divided into 2 parts. Submodules are defined as M00121_1 (original), M00121_2 (left-side branching), and M00121_3 (right-side branching). Ovals with C numbers, rectangles with R numbers, and K numbers represent metabolites, enzymatic reactions, and KO, respectively. KO is used for mapping functional annotation of genes to the modules. Black K numbers indicate KO common to all 3 newly redefined submodules (M00121_1, M00121_2, and M00121_3), and blue and red K numbers correspond to reactions specific to M00121_2 and M00121_3, respectively.
Figure S8. Positive predictive values (PPV) of the KO reassignment tests by KAAS. We performed KO reassignment tests for 30 species (7 eukaryotes, 20 bacteria, 3 archaea) with the original (old) and improved (new) KAAS and found that the new KAAS showed 2–5% improvements compared with the old KAAS. Three-letter codes on the x-axis indicate species, abbreviated as follows: hsa: Homo sapiens, dre: Danio rerio, dme: Drosophila melanogaster, cel: Caenorhabditis elegans, ath: Arabidopsis thaliana, sce: Saccharomyces cerevisiae, cho: Cryptosporidium hominis, eco: Escherichia coli, nme: Neisseria meningitidis, hpy: Helicobacter pylori, rpr: Rickettsia prowazekii, bsu: Bacillus subtilis, sau: Staphylococcus aureus, lmo: Listeria monocytogenes, lla: Lactococcus lactis, lpl: Lactobacillus plantarum, cau: Chloroflexus aurantiacus, mge: Mycoplasma genitalium, mtu: Mycobacterium tuberculosis, blo: Bifidobacterium longum, ctr: Chlamydia trachomatis, pcu: Protochlamydia amoebophila, bbu: Borrelia burgdorferi, syn: Synechocystis sp., bth: Bacteroides thetaiotaomicron, dra: Deinococcus radiodurans, aae: Aquifex aeolicus, mja: Methanocaldococcus jannaschii, ape: Aeropyrum pernix, neq: Nanoarchaeum equitans. Blue bar: old KAAS. Red bar: new KAAS.
Figure S9.
Effect of database dependency on accuracy of the KO assignment. The test genome was Escherichia coli isolated from a Norwegian infant (draft genome sequenced by 454 GS FLX Titanium). Blue diamonds show the results using the data set without proteins from the genera Escherichia, Salmonella, Shigella, and Yersinia (1,239 species). Similarly, red squares, green triangles, and purple dots show the results without proteins from the order Enterobacteriales (1,200 species), class Gammaproteobacteria (1,040 species), and phylum Proteobacteria (755 species), respectively. KO identifiers specific to the genera Escherichia, Salmonella, Shigella, and Yersinia (16 KO identifiers), order Enterobacteriales (90), class Gammaproteobacteria (203), or phylum Proteobacteria (370) were removed in advance from the protein data set. Here, accuracy is defined as the sensitivity TP/(TP+FN), where TP and FN are the numbers of true positives and false negatives, respectively. We also used truncated proteins to assess the effect of amino acid (a.a.) sequence length on the accuracy of KO assignment. The 4,410 proteins from the E. coli isolate were randomly fragmented into lengths of 50, 60, 80, 100, 120, 150, and 200 a.a., and the sequences of each length were used to verify the accuracy of KO assignment.
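The two accuracy measures named in the captions for Figures S8 and S9 can be illustrated with a minimal Python sketch. Sensitivity follows the definition given above, TP/(TP+FN). The positive predictive value uses its standard definition, TP/(TP+FP), which the caption for Figure S8 does not spell out. The fragmentation helper is an assumption: the caption does not describe the exact procedure, so this sketch simply draws one randomly placed window per protein and skips proteins shorter than the requested length. All function names are hypothetical and are not taken from KAAS or from the authors' pipeline.

```python
import random
from typing import Optional

# Fragment lengths (a.a.) listed in the caption for Figure S9.
FRAGMENT_LENGTHS = (50, 60, 80, 100, 120, 150, 200)


def sensitivity(tp: int, fn: int) -> float:
    """Sensitivity = TP / (TP + FN), the accuracy measure named for Figure S9."""
    if tp + fn == 0:
        raise ValueError("TP + FN must be positive")
    return tp / (tp + fn)


def ppv(tp: int, fp: int) -> float:
    """Positive predictive value = TP / (TP + FP) (standard definition)."""
    if tp + fp == 0:
        raise ValueError("TP + FP must be positive")
    return tp / (tp + fp)


def random_fragment(seq: str, length: int, rng: random.Random) -> Optional[str]:
    """Return one randomly placed window of `length` residues.

    Assumption: one window per protein; returns None when the protein
    is shorter than the requested length.
    """
    if len(seq) < length:
        return None
    start = rng.randint(0, len(seq) - length)  # randint is inclusive at both ends
    return seq[start:start + length]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    protein = "ACDEFGHIKLMNPQRSTVWY" * 12  # toy 240-residue sequence, not real data
    for length in FRAGMENT_LENGTHS:
        fragment = random_fragment(protein, length, rng)
        print(length, fragment is not None and len(fragment) == length)
```

Each set of fragments would then be passed to the KO assignment step, with TP and FN counted against the KO identifiers of the full-length proteins; that step is outside this sketch.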
Format: PDF. Size: 13.5 MB.
Takami et al. BMC Genomics 2012 13:699 doi:10.1186/1471-2164-13-699