Open Access Research article

The genome sequence of E. coli W (ATCC 9637): comparative genome analysis and an improved genome-scale reconstruction of E. coli

Colin T Archer1, Jihyun F Kim2, Haeyoung Jeong2, Jin Hwan Park3, Claudia E Vickers1*, Sang Yup Lee3 and Lars K Nielsen1

Author Affiliations

1 Australian Institute for Bioengineering and Nanotechnology, Cnr Cooper and College Rds, The University of Queensland, St Lucia, Queensland 4072 Australia

2 Industrial Biotechnology and Bioenergy Research Center, Korea Research Institute of Bioscience and Biotechnology, 111 Gwahangno, Yuseong-gu, Daejeon, Korea

3 Department of Chemical and Biomolecular Engineering (BK21 program) and Center for Systems and Synthetic Biotechnology, Institute for the BioCentury, KAIST, 335 Gwahangno, Yuseong-gu, Daejeon 305-701, Republic of Korea

BMC Genomics 2011, 12:9  doi:10.1186/1471-2164-12-9

Published: 6 January 2011

Additional files

Additional file 1:

List of CDSs which occur once in the genome of one safe strain but more than once in genomes of other safe strains. A list of CDSs that have only one copy in one safe strain but more than one ortholog in one or more other safe strains. For example, hokE occurs once in the K-12 genome but multiple times in the W genome. Per-strain CDS counts cannot be reconciled unless these one-to-many and many-to-many relationships are taken into account; detailed CDS counts are provided within the file. These counts explain the CDS skew that arises when counting the number of CDSs in Figure 2 for K-12, B, or ATCC 8739. For example, ATCC 8739 carries one copy of EcolC_3064, whereas W carries two (ECW_m0635 and ECW_m0636). When shared orthologs are counted, the ATCC 8739-W region can therefore contain one or two CDSs, depending on whether the orthologs are counted in the W or the ATCC 8739 context. We have thus listed all orthologous CDSs that occur in different copy numbers in the genomes of the other safe strains.

Format: XLS Size: 24KB
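
The counting skew described for Additional file 1 can be reproduced with a small, purely illustrative Python sketch. It assumes orthologs are stored as (ATCC 8739 CDS, W CDS) pairs; this data layout and the function names are hypothetical, and only the three CDS identifiers come from the description above. Counting distinct CDSs on each side of the same pair list gives different totals whenever a one-to-many relationship is present.

```python
from collections import defaultdict

# Hypothetical ortholog pairs (ATCC 8739 CDS, W CDS). One ATCC 8739 CDS
# (EcolC_3064) maps to two W CDSs, as described for Additional file 1.
ortholog_pairs = [
    ("EcolC_3064", "ECW_m0635"),
    ("EcolC_3064", "ECW_m0636"),
]


def shared_cds_counts(pairs):
    """Count shared CDSs from each strain's side of the comparison."""
    atcc_side = {a for a, _ in pairs}  # distinct ATCC 8739 CDSs with a W ortholog
    w_side = {b for _, b in pairs}     # distinct W CDSs with an ATCC 8739 ortholog
    return len(atcc_side), len(w_side)


def one_to_many(pairs):
    """Return CDSs of the first strain that have more than one ortholog in the second."""
    partners = defaultdict(set)
    for a, b in pairs:
        partners[a].add(b)
    return {a: sorted(bs) for a, bs in partners.items() if len(bs) > 1}


print(shared_cds_counts(ortholog_pairs))  # (1, 2)
print(one_to_many(ortholog_pairs))        # {'EcolC_3064': ['ECW_m0635', 'ECW_m0636']}
```

Listing the one-to-many entries explicitly, as the supplementary file does, is what allows the two per-side totals to be reconciled.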

Additional file 2:

Description of supplementary files and instructions for use thereof. Detailed description of the contents of each additional file.

Format: DOC Size: 31KB

Additional file 3:

Integrative elements found in Group B1 strains. Overview and analysis of the integrative elements present in each sequenced group B1 strain. Sheet "Group B1 IEs" presents the attachment sites and significant fitness or virulence factors present in each integrative element. Sheet "IE sizes" shows the assumed start and end sites of each integrative element and the element's size. These sizes were used to calculate the genome backbone size of each group B1 strain.

Format: XLS Size: 20KB
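
The description does not state exactly how the backbone size was derived from the integrative element (IE) sizes. A minimal sketch, assuming the backbone size is the chromosome length minus the total length covered by IEs given as (start, end) coordinates, is shown below. The function name and all coordinates are hypothetical and are not taken from the supplementary file; overlapping intervals are merged so that no base is subtracted twice.

```python
def backbone_size(chromosome_bp, ie_intervals):
    """Return chromosome length minus the total length covered by integrative elements.

    ie_intervals: (start, end) pairs, 1-based and inclusive. Overlapping or
    adjacent intervals are merged so that no base is subtracted twice.
    Elements spanning the origin of the circular chromosome are not
    handled (an assumption of this sketch).
    """
    merged = []
    for start, end in sorted(ie_intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current block
        else:
            merged.append([start, end])              # start a new block
    ie_total = sum(end - start + 1 for start, end in merged)
    return chromosome_bp - ie_total


# Hypothetical chromosome length and IE coordinates, for illustration only.
print(backbone_size(5_000_000, [(100_001, 150_000), (140_001, 160_000), (2_000_001, 2_040_000)]))
# 4900000
```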

Additional file 4:

Plasmids found in Group B1 strains. Analysis of the plasmids found in sequenced group B1 strains, including plasmid size and the fitness/virulence factors present on each plasmid.

Format: XLS Size: 71KB

Additional file 5:

iCA1273 GSR. A list of the reactions present in iCA1273, including their gene-protein-reaction (GPR) associations and constraints (lower bound, upper bound, objective function).

Format: XLS Size: 739KB

Additional file 6:

iCA1273 GSR. iCA1273 in XML format for use with the COBRA Toolbox.

Format: XML Size: 3MB

Additional file 7:

List of unique iCA1273 reactions and metabolites compared to iAF1260. A list of new reactions and metabolites in iCA1273 that are not found in iAF1260. Data columns are as follows:
1. Reaction abbreviation
2. Function of the reaction
3. Reaction catalysed
4. Genes required for the reaction to be catalysed, in Boolean format
5. Notes on the reaction, including references to literature reporting experimental evidence for the reaction and the PubMed ID of each paper

Format: XLS Size: 43KB
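
Gene requirements for reactions are stored as Boolean expressions over gene identifiers. As a sketch of how such a rule can be evaluated against the set of genes present in a strain, the hypothetical evaluator below assumes rules written with `and`, `or` and parentheses, with `and` binding more tightly than `or`. The function name and the second example rule are illustrative and are not part of the supplementary files.

```python
import re


def eval_gpr(rule, present):
    """Evaluate a Boolean GPR rule such as "(b0001 and b0002) or b0003".

    Returns True if the genes in `present` satisfy the rule. An empty rule
    is treated as having no gene requirement (an assumption of this sketch).
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", rule)
    if not tokens:
        return True
    pos = 0

    def peek():
        # Next token, lower-cased so operators are case-insensitive; not consumed.
        return tokens[pos].lower() if pos < len(tokens) else None

    def parse_or():
        # or-expression := and-expression ("or" and-expression)*
        nonlocal pos
        value = parse_and()
        while peek() == "or":
            pos += 1
            rhs = parse_and()  # always parse so the tokens are consumed
            value = value or rhs
        return value

    def parse_and():
        # and-expression := atom ("and" atom)*
        nonlocal pos
        value = parse_atom()
        while peek() == "and":
            pos += 1
            rhs = parse_atom()
            value = value and rhs
        return value

    def parse_atom():
        # atom := gene identifier | "(" or-expression ")"
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("incomplete GPR rule")
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_or()
            if peek() != ")":
                raise ValueError("unbalanced parentheses in GPR rule")
            pos += 1
            return value
        return token in present

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected tokens at end of GPR rule")
    return result


print(eval_gpr("b3386 or b4301", {"b3386"}))              # True
print(eval_gpr("(b0001 and b0002) or b0003", {"b0001"}))  # False
```

A hand-written recursive-descent parser is used here instead of Python's `eval` so that arbitrary strings from a spreadsheet are never executed as code.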

Additional file 8:

List of unique iAF1260 features compared to iCA1273. A list of reactions that are present in iAF1260 but either do not occur in iCA1273 or occur with different gene-protein-reaction associations. This file contains the following sheets:
1. "Missing iAF1260 reactions" lists reactions that occur in iAF1260 but are not present in W.
2. "iCA1273 rxns miss K12 ortho" lists iAF1260 reactions that still occur in iCA1273 but lack one or more genes that have no ortholog in the W genome. For example, reaction "RPE" in iAF1260 can be catalysed by the enzyme encoded by either b3386 or b4301. W has an ortholog of b3386 but not of b4301, so the reaction still occurs in W.

Format: XLS Size: 246KB
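
The distinction between reactions that are lost entirely and reactions that are retained despite missing genes (as in the RPE example) can be sketched as follows. This assumes each GPR is stored as a list of alternative gene sets (an OR of ANDs) and that the set of K-12 genes with W orthologs is known; the function name, the data layout and `HYPOTHETICAL_RXN` are illustrative only. The first two returned lists roughly correspond to the two sheets described for Additional file 8.

```python
def classify_reactions(gprs, w_orthologs):
    """Classify K-12 reactions by whether W can still catalyse them.

    gprs: reaction ID -> list of gene sets. Each set is one alternative
        (an isozyme or a complex) whose genes are all required, so the
        rule as a whole is an OR of ANDs.
    w_orthologs: set of K-12 gene IDs that have an ortholog in W.
    Returns (missing, retained_with_gaps, unchanged) lists of reaction IDs.
    """
    missing, retained_with_gaps, unchanged = [], [], []
    for rxn, alternatives in gprs.items():
        if not alternatives:
            # No gene requirement recorded: assume the reaction is kept as-is.
            unchanged.append(rxn)
            continue
        usable = [alt for alt in alternatives if alt <= w_orthologs]
        all_genes = set().union(*alternatives)
        if not usable:
            missing.append(rxn)             # no alternative is complete in W
        elif all_genes - w_orthologs:
            retained_with_gaps.append(rxn)  # still catalysed, but some genes lack W orthologs
        else:
            unchanged.append(rxn)
    return missing, retained_with_gaps, unchanged


gprs = {
    "RPE": [{"b3386"}, {"b4301"}],             # example from the description above
    "HYPOTHETICAL_RXN": [{"geneA", "geneB"}],  # hypothetical two-subunit complex
}
w_orthologs = {"b3386", "geneA"}
print(classify_reactions(gprs, w_orthologs))
# (['HYPOTHETICAL_RXN'], ['RPE'], [])
```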

Additional file 9:

Growth phenotype data for E. coli W (ATCC 9637). Results of the Biolog™ growth phenotype assays for E. coli W and E. coli K-12 on a wide range of carbon and nitrogen sources.

Format: XLS Size: 54KB

Additional file 10:

Comparison between predictions and experimental growth data for the K-12 GEM and the W GSR. A comparison of the growth phenotypes predicted by the K-12 GEM (iAF1260) with Biolog™ growth data, and of those predicted by the W GSR (iCA1273) with Biolog™ growth data. Overlap between predicted and observed growth phenotypes is higher for W than for K-12.

Format: XLS Size: 37KB
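
One simple way to quantify the overlap between predicted and observed growth phenotypes is a confusion matrix over substrates tested both in silico and on Biolog™ plates. The sketch below assumes growth calls have already been binarised as True/False per substrate; the function name, the binarisation and the substrate labels are hypothetical, and the agreement fraction is not necessarily the measure used by the authors.

```python
def growth_agreement(predicted, observed):
    """Compare predicted and observed growth (True = growth) on shared substrates.

    Returns confusion counts and the fraction of substrates where they agree.
    """
    shared = sorted(predicted.keys() & observed.keys())
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for substrate in shared:
        p, o = predicted[substrate], observed[substrate]
        if p and o:
            counts["TP"] += 1  # predicted growth, growth observed
        elif not p and not o:
            counts["TN"] += 1  # predicted no growth, none observed
        elif p:
            counts["FP"] += 1  # predicted growth, none observed
        else:
            counts["FN"] += 1  # growth observed but not predicted
    agreement = (counts["TP"] + counts["TN"]) / len(shared) if shared else float("nan")
    return counts, agreement


# Hypothetical calls for illustration only; not the Biolog data in Additional file 9.
predicted = {"substrate_A": True, "substrate_B": True, "substrate_C": False, "substrate_D": False}
observed = {"substrate_A": True, "substrate_B": False, "substrate_C": False, "substrate_D": True}
print(growth_agreement(predicted, observed))
# ({'TP': 1, 'TN': 1, 'FP': 1, 'FN': 1}, 0.5)
```

Computing the same counts for iAF1260 and iCA1273 against their respective Biolog™ data would allow the two models' agreement to be compared directly.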