Research article

Investigating the global genomic diversity of Escherichia coli using a multi-genome DNA microarray platform with novel gene prediction strategies

Scott A Jackson1*, Isha R Patel1, Tammy Barnaba1, Joseph E LeClerc1 and Thomas A Cebula2

Author affiliations

1 Division of Molecular Biology, Office of Applied Research and Safety Assessment, Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration, Laurel, Maryland 20708, USA

2 Department of Biology, Johns Hopkins University, 3400 N. Charles St., Baltimore, MD 21218, USA



BMC Genomics 2011, 12:349 doi:10.1186/1471-2164-12-349

Published: 6 July 2011



The gene content of a diverse group of 183 unique Escherichia coli and Shigella isolates was determined using the Affymetrix GeneChip® E. coli Genome 2.0 Array, originally designed for transcriptome analysis, as a genotyping tool. The array's probe set design allowed the gene content of each strain to be determined accurately and reliably. The array represents 10,112 independent genes from four individual E. coli genomes and therefore makes it possible to survey genes from several different pathogen types. Our analysis included the entire ECOR (E. coli Reference) collection, 80 enterohemorrhagic E. coli (EHEC)-like isolates, and a diverse set of isolates from our FDA strain repository.
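As a rough illustration of how an expression array can be repurposed for genotyping, the sketch below turns probe-level hybridization intensities into present/absent calls for each probe set. It is not the authors' method. The rule (a probe set is called present when at least a fraction `min_fraction` of its probes exceed `intensity_threshold`), the function name `call_presence`, the probe set identifiers and the intensity values are all hypothetical.

```python
from typing import Dict, List


def call_presence(
    probe_intensities: Dict[str, List[float]],
    intensity_threshold: float,
    min_fraction: float = 0.5,
) -> Dict[str, bool]:
    """Call each probe set present (True) or absent (False).

    Hypothetical rule: a probe set is present when at least `min_fraction`
    of its individual probes have an intensity above `intensity_threshold`.
    """
    calls: Dict[str, bool] = {}
    for probe_set, intensities in probe_intensities.items():
        if not intensities:
            # No probe data at all: nothing supports a "present" call.
            calls[probe_set] = False
            continue
        hits = sum(1 for value in intensities if value > intensity_threshold)
        calls[probe_set] = hits / len(intensities) >= min_fraction
    return calls


# Toy intensities (hypothetical, not data from the study).
example = {
    "geneA_at": [850.0, 920.0, 40.0, 1100.0],  # 3 of 4 probes above 200
    "geneB_at": [35.0, 60.0, 410.0, 22.0],     # 1 of 4 probes above 200
}
print(call_presence(example, intensity_threshold=200.0))
# {'geneA_at': True, 'geneB_at': False}
```

Working at the level of individual probes means a gene that is only partly conserved in a given strain may fall below the cutoff; under this assumed rule, the threshold and fraction would need calibration against strains of known gene content.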


From this study we identified sets of genes that characterize, and therefore define, the EHEC pathogen type. Furthermore, our sampling of 63 unique O157:H7 strains showed that the array can discriminate between closely related strains: individual O157:H7 strains differed, on average, by 197 probe sets. Finally, we describe an analysis method that exploits the probe set design to determine accurately the presence or absence of each gene represented on the array.
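The abstract does not state how the average of 197 differing probe sets was computed. One plausible reading, assumed here, is the mean Hamming distance between strains' presence/absence profiles, taken over every pair of strains. The function `mean_pairwise_differences` and the toy profiles below are hypothetical and are not data from the study.

```python
from itertools import combinations
from typing import Dict, List


def mean_pairwise_differences(profiles: Dict[str, List[bool]]) -> float:
    """Average number of probe sets whose calls differ between two strains.

    `profiles` maps strain name -> presence/absence calls, all listed in the
    same probe-set order. For each pair of strains the Hamming distance
    (count of mismatched calls) is computed, then averaged over all pairs.
    """
    strains = list(profiles)
    if len(strains) < 2:
        raise ValueError("need at least two strains")
    if len({len(calls) for calls in profiles.values()}) != 1:
        raise ValueError("all profiles must cover the same probe sets")
    distances = [
        sum(a != b for a, b in zip(profiles[s1], profiles[s2]))
        for s1, s2 in combinations(strains, 2)
    ]
    return sum(distances) / len(distances)


# Toy presence/absence profiles over four probe sets (hypothetical).
toy = {
    "strain_1": [True, True, False, True],
    "strain_2": [True, False, False, True],
    "strain_3": [False, True, True, True],
}
print(mean_pairwise_differences(toy))  # pair distances 1, 2, 3 -> 2.0
```

Averaging over all pairs, rather than comparing each strain to a single reference genome, keeps the statistic independent of which strain happens to be chosen as the reference.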


These findings provide insight into the microbial diversity that exists within extant E. coli populations. Moreover, the data demonstrate that this novel microarray-based analysis is a powerful tool for molecular epidemiology and for the newly emerging field of microbial forensics.

Keywords: genome; diversity; microarray; Escherichia coli; Shigella; O157:H7; gene content; pathogenic determinants