Open Access Highly Accessed Research article

Investigating the global genomic diversity of Escherichia coli using a multi-genome DNA microarray platform with novel gene prediction strategies

Scott A Jackson1*, Isha R Patel1, Tammy Barnaba1, Joseph E LeClerc1 and Thomas A Cebula2

Author Affiliations

1 Division of Molecular Biology, Office of Applied Research and Safety Assessment, Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration, Laurel, Maryland 20708, USA

2 Department of Biology, Johns Hopkins University, 3400 N. Charles St., Baltimore, MD 21218, USA

BMC Genomics 2011, 12:349  doi:10.1186/1471-2164-12-349

Published: 6 July 2011

Additional files

Additional File 1:

Gene differences matrix. The number of gene differences in each strain-to-strain comparison is shown. A "gene difference" is defined here as a 4-fold difference in the RMA-summarized probe set intensities. The cells are color-coded by the number of gene differences, using the scale shown in the file. Strains are ordered by their relatedness as determined by hierarchical cluster analysis.

Format: PDF. Size: 613KB.

This file can be viewed with: Adobe Acrobat Reader

Open Data
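
The gene-difference definition given for Additional File 1 can be illustrated with a minimal sketch. It assumes the RMA-summarized intensities are on the log2 scale (as RMA normally reports them), so a 4-fold difference corresponds to a gap of 2 log2 units, and it assumes the cutoff is inclusive. The strain names and intensity values are hypothetical toy data, not values from the study.

```python
from itertools import combinations

# Hypothetical toy data: strain -> log2-scale RMA probe set intensities,
# with probe sets in the same order for every strain.
rma_log2 = {
    "strain_A": [10.0, 4.0, 8.0, 6.0],
    "strain_B": [10.5, 7.0, 8.2, 3.5],
    "strain_C": [10.1, 4.2, 5.0, 6.1],
}

# A 4-fold difference equals 2 units on the log2 scale (2**2 == 4).
LOG2_FOLD_CUTOFF = 2.0


def count_gene_differences(a, b, cutoff=LOG2_FOLD_CUTOFF):
    """Count probe sets whose log2 intensities differ by at least `cutoff`."""
    return sum(1 for x, y in zip(a, b) if abs(x - y) >= cutoff)


def gene_difference_matrix(data):
    """Build a symmetric strain-by-strain matrix of gene-difference counts."""
    strains = list(data)
    matrix = {s: {t: 0 for t in strains} for s in strains}
    for s, t in combinations(strains, 2):
        n = count_gene_differences(data[s], data[t])
        matrix[s][t] = matrix[t][s] = n
    return matrix


matrix = gene_difference_matrix(rma_log2)
for strain, row in matrix.items():
    print(strain, row)
```

The diagonal is zero by construction, and the matrix is symmetric because the absolute log2 difference does not depend on which strain is taken as the reference.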

Additional File 2:

Pearson correlation matrix. R-Bioconductor was used to calculate Pearson correlation coefficients from the RMA-summarized probe set intensities. The cells are color-coded to show relatedness and correlation (coefficients from 0 to 1) according to the scale shown in the file. Strains are ordered by their relatedness as determined by hierarchical cluster analysis.

Format: PDF. Size: 593KB.
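
As a rough illustration of the calculation behind Additional File 2, the sketch below computes pairwise Pearson correlation coefficients between strain intensity profiles. The original analysis was done in R-Bioconductor; this is a plain-Python stand-in, and the function names and toy intensities are assumptions made for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(ss_x * ss_y)


def correlation_matrix(data):
    """Strain-by-strain matrix of Pearson coefficients."""
    strains = list(data)
    return {s: {t: pearson(data[s], data[t]) for t in strains} for s in strains}


# Hypothetical toy data: strain -> RMA-summarized probe set intensities.
profiles = {
    "strain_A": [10.0, 4.0, 8.0, 6.0],
    "strain_B": [10.5, 7.0, 8.2, 3.5],
    "strain_C": [10.1, 4.2, 5.0, 6.1],
}

corr = correlation_matrix(profiles)
for strain, row in corr.items():
    print(strain, {t: round(r, 3) for t, r in row.items()})
```

Pearson coefficients can in general range from -1 to 1; the 0 to 1 range described for the file reflects the fact that genomes of related strains give positively correlated profiles.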

Additional File 3:

Hierarchical Cluster Analysis. RMA-summarized probe set intensities were used to hierarchically cluster (Euclidean means) all 207 isolates (top dendrogram) and all 10,208 genes (left dendrogram) in Spotfire. The heatmap shows RMA probe set intensities from low (green) to high (red). The top dendrogram is color-coded based on the 3 large clusters of E. coli.

Format: PDF. Size: 3.7MB.
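
The clustering described for Additional File 3 was performed in Spotfire, and the exact settings are not given here. The following is only a small sketch of agglomerative hierarchical clustering, assuming Euclidean distance between intensity profiles and average linkage between clusters; both choices, the function names, and the toy profiles are assumptions, not the study's configuration.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two intensity profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between members of two clusters."""
    dists = [euclidean(profiles[i], profiles[j]) for i in c1 for j in c2]
    return sum(dists) / len(dists)


def agglomerate(profiles, n_clusters):
    """Merge the two closest clusters until `n_clusters` remain."""
    clusters = [(name,) for name in profiles]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], profiles),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return clusters


# Hypothetical toy data: isolate -> probe set intensities.
profiles = {
    "s1": [1.0, 1.0],
    "s2": [1.2, 0.9],
    "s3": [8.0, 8.0],
    "s4": [8.1, 7.9],
    "s5": [4.0, 12.0],
}

clusters = agglomerate(profiles, n_clusters=3)
print(clusters)
```

Stopping at three clusters mirrors the three large clusters highlighted in the top dendrogram. Clustering genes instead of isolates would use the same procedure on the transposed intensity table.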

Additional File 4:

Conserved, core, backbone genes in E. coli and Shigella. Using the MAS 5.0 gene detection method, we selected the probe sets that were called "present" in all 207 isolates. The 2,256 conserved probe sets are listed here along with their gene descriptions, when available.

Format: PDF. Size: 77KB.
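
The selection rule described for Additional File 4 amounts to keeping a probe set only if its detection call is "present" in every isolate. A minimal sketch follows, assuming the calls are encoded with the usual MAS 5.0 letters (P for present, M for marginal, A for absent); the isolate names, probe set IDs, and calls are hypothetical.

```python
# Hypothetical toy data: isolate -> {probe set ID -> MAS 5.0 detection call}.
detection_calls = {
    "isolate_1": {"probe_1": "P", "probe_2": "P", "probe_3": "A", "probe_4": "P"},
    "isolate_2": {"probe_1": "P", "probe_2": "M", "probe_3": "P", "probe_4": "P"},
    "isolate_3": {"probe_1": "P", "probe_2": "P", "probe_3": "P", "probe_4": "P"},
}


def conserved_probe_sets(calls):
    """Return probe sets called present ("P") in every isolate."""
    # Only probe sets reported for all isolates can qualify.
    shared = set.intersection(*(set(c) for c in calls.values()))
    return sorted(
        probe
        for probe in shared
        if all(c[probe] == "P" for c in calls.values())
    )


core = conserved_probe_sets(detection_calls)
print(core)
```

In this sketch a marginal call is treated as not present, which is the stricter reading of the rule; whether the study handled marginal calls this way is not stated here.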

Additional File 5:

Conserved Intergenic Regions. Using the MAS 5.0 gene detection method, we selected the probe sets that were annotated as "intergenic" and called "present" in all 207 isolates. The 232 conserved intergenic probe sets are listed here along with their genome positions and lengths.

Format: PDF. Size: 17KB.
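
The table in Additional File 5 combines an annotation filter with the "present in all isolates" rule and reports position and length. The sketch below shows one way such a table could be assembled. It assumes 1-based inclusive coordinates (so length is end minus start plus one) and MAS 5.0 call letters; the probe set IDs, coordinates, and calls are hypothetical.

```python
# Hypothetical annotations: probe set ID -> (feature type, start, end),
# with 1-based inclusive genome coordinates.
annotations = {
    "ig_001": ("intergenic", 1200, 1449),
    "gene_001": ("gene", 337, 2799),
    "ig_002": ("intergenic", 5100, 5234),
}

# Hypothetical MAS 5.0 detection calls: isolate -> {probe set ID -> call}.
detection_calls = {
    "isolate_1": {"ig_001": "P", "gene_001": "P", "ig_002": "P"},
    "isolate_2": {"ig_001": "P", "gene_001": "P", "ig_002": "A"},
}


def conserved_intergenic(annotations, calls):
    """List (probe set, start, length) for intergenic probe sets present everywhere."""
    rows = []
    for probe, (feature_type, start, end) in annotations.items():
        if feature_type != "intergenic":
            continue
        if all(c.get(probe) == "P" for c in calls.values()):
            rows.append((probe, start, end - start + 1))
    # Order the table by genome position.
    return sorted(rows, key=lambda row: row[1])


table = conserved_intergenic(annotations, detection_calls)
for probe, start, length in table:
    print(probe, start, length)
```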
