Open Access Highly Accessed Methodology article

Genomic fluidity: an integrative view of gene diversity within microbial populations

Andrey O Kislyuk1,5, Bart Haegeman2, Nicholas H Bergman1,3 and Joshua S Weitz1,4*

Author affiliations

1 School of Biology, Georgia Institute of Technology, Atlanta, GA 30332 USA

2 INRIA Research Team MERE, UMR MISTEA, 34060 Montpellier, France

3 National Biodefense Analysis and Countermeasures Center, Frederick, MD 21702, USA

4 School of Physics, Georgia Institute of Technology, Atlanta, GA 30332 USA

5 Current Address: Pacific Biosciences, Menlo Park, CA 94025 USA

Citation and License

BMC Genomics 2011, 12:32  doi:10.1186/1471-2164-12-32

Published: 13 January 2011

Abstract

Background

The dual concepts of pan and core genomes have been widely adopted as means to assess the distribution of gene families within microbial species and genera. The core genome is the set of genes shared by a group of organisms; the pan genome is the set of all genes seen in any of these organisms. A variety of methods have provided drastically different estimates of the sizes of pan and core genomes from sequenced representatives of the same groups of bacteria.
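The two definitions above can be illustrated with a minimal sketch. It assumes each genome has already been reduced to a set of gene-family identifiers; the function name `pan_and_core` and the toy isolates are hypothetical and are not taken from the article.

```python
def pan_and_core(genomes):
    """Return (pan, core) gene-family sets.

    `genomes` maps a genome name to the set of gene families found in it.
    This input format is an assumption made for illustration.
    """
    gene_sets = list(genomes.values())
    if not gene_sets:
        return set(), set()
    pan = set().union(*gene_sets)        # families seen in any genome
    core = set.intersection(*gene_sets)  # families shared by every genome
    return pan, core


# Hypothetical toy data: three isolates with made-up gene-family labels.
toy_genomes = {
    "isolate_A": {"f1", "f2", "f3", "f4"},
    "isolate_B": {"f1", "f2", "f3", "f5"},
    "isolate_C": {"f1", "f2", "f6"},
}

pan, core = pan_and_core(toy_genomes)
print(len(pan), len(core))  # 6 2
```

These are the observed pan and core genomes of the sampled isolates only. Estimating the sizes for the whole population requires extrapolating beyond the sample, which is the step the abstract identifies as unreliable.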

Results

We use a combination of mathematical, statistical and computational methods to show that current predictions of pan and core genome sizes may have no correspondence to true values. Pan and core genome size estimates are problematic because they depend on estimating the occurrence of rare genes and rare genomes, respectively, and the frequency of rare events is inherently difficult to estimate precisely. Instead, we introduce and evaluate a robust metric - genomic fluidity - to categorize the gene-level similarity among groups of sequenced isolates. Genomic fluidity is a measure of the dissimilarity of genomes evaluated at the gene level.
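The abstract does not state a formula for genomic fluidity. The sketch below assumes the pairwise form given in the full article: for each pair of genomes, the number of gene families found in only one of the two is divided by the sum of the two genomes' gene-family counts, and the ratio is averaged over all pairs. The function name `genomic_fluidity`, the set-based input format, and the toy isolates are hypothetical.

```python
from itertools import combinations


def genomic_fluidity(genomes):
    """Average pairwise gene-level dissimilarity across a group of genomes.

    `genomes` maps a genome name to the set of gene families found in it.
    The pairwise ratio used here is an assumed definition, not one stated
    in the abstract.
    """
    gene_sets = list(genomes.values())
    if len(gene_sets) < 2:
        raise ValueError("at least two genomes are required")
    ratios = []
    for a, b in combinations(gene_sets, 2):
        total = len(a) + len(b)
        if total == 0:
            raise ValueError("a pair of genomes contains no gene families")
        unique = len(a ^ b)  # families present in exactly one genome of the pair
        ratios.append(unique / total)
    return sum(ratios) / len(ratios)


# Hypothetical toy data: three isolates with made-up gene-family labels.
toy_genomes = {
    "isolate_A": {"f1", "f2", "f3", "f4"},
    "isolate_B": {"f1", "f2", "f3", "f5"},
    "isolate_C": {"f1", "f2", "f6"},
}

fluidity = genomic_fluidity(toy_genomes)
print(round(fluidity, 3))  # 0.369
```

Under this definition the value lies between 0 (all genomes carry identical gene families) and 1 (no pair of genomes shares any gene family). Because it is an average over pairs, it does not require extrapolating to unsampled genomes.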

Conclusions

The genomic fluidity of a population can be estimated accurately given a small number of sequenced genomes. Further, the genomic fluidity of groups of organisms can be compared robustly despite variation in algorithms used to identify genes and their homologs. As such, we recommend that genomic fluidity be used in place of pan and core genome size estimates when assessing gene diversity within genomes of a species or a group of closely related organisms.