Open Access Highly Accessed Research article

Estimating variation within the genes and inferring the phylogeny of 186 sequenced diverse Escherichia coli genomes

Rolf S Kaas1*, Carsten Friis1, David W Ussery2 and Frank M Aarestrup1

Author Affiliations

1 DTU Food, The Technical University of Denmark, Kgs Lyngby, Denmark

2 Department of Systems Biology, Center for Biological Sequence Analysis, The Technical University of Denmark, Kgs Lyngby, Denmark

BMC Genomics 2012, 13:577  doi:10.1186/1471-2164-13-577

Published: 31 October 2012

Additional files

Additional file 1:

Genes used in MLST schemes. Lists of the three groups of genes used in the MLST schemes of Mark Achtman, the Pasteur Institute, and T. Whittam.

Format: PDF. Size: 27 KB.

This file can be viewed with: Adobe Acrobat Reader

Additional file 2:

MLST phylogenies of O157:H7. Four phylogenetic trees inferred from four different MLST schemes. Tree A is inferred from Mark Achtman’s MLST scheme, tree B from the Pasteur MLST scheme, tree C from T. Whittam’s MLST scheme, and tree D from the alternative MLST scheme used in this proof-of-concept case.

Format: PDF. Size: 382 KB.

Additional file 3:

Core tree with all bootstrap values. The tree was created from the alignment of each of the 1,278 core genes from the 186 E. coli genomes. MLST types are annotated to the far right of each genome name. The phylotypes are marked with the colors blue (A), red (B1), purple (B2), green (D), and the Shigella genomes are marked with the color brown.

Format: PDF. Size: 15 KB.

Additional file 4:

Pan-genome tree with all bootstrap values. The tree was created based on the presence or absence of 16,373 HGCs in the 186 E. coli genomes. MLST types are annotated to the far right of each genome name. The phylotypes are marked with the colors blue (A), red (B1), purple (B2), green (D), and the Shigella genomes are marked with the color brown. Bootstrap values are annotated at each node as a percentage between 0 and 100.

Format: PDF. Size: 446 KB.

Additional file 5:

Annotation of highly deviating HGCs. Manual annotation of the 10 HGCs with the highest standard deviation in gene size. The annotation is based on BLAST searches of the gene members against the nr database and UniProt, and on running the sequences through InterProScan.

Format: PDF. Size: 22 KB.

Additional file 6:

Complete versus draft nucleotide diversity distributions. The nucleotide diversity distribution is plotted for both the core-HGCs and the pan-HGCs of the three datasets: complete (red), draft1 (blue), and draft2 (green).

Format: PDF. Size: 266 KB.

Additional file 7:

Table of complete dataset. The table shows the dataset used for the article. The “GB genes” column indicates the number of genes annotated in the corresponding GenBank file. The “Prod genes” column indicates the number of genes that were found with Prodigal for this study.

Format: PDF. Size: 141 KB.
