True differences in genomic fluidity φ can be detected from a small number of sampled genomes. (A) Two species with subtle differences in their true gene distributions: (i) Species A (blue), as in Figure 1, with a pan genome of 10⁵ genes and a core genome of 10³ genes; (ii) Species C (red), with a pan genome of 10⁵ genes and a core genome of 10³ genes. Each genome has 2000 genes drawn at random from the true gene distribution according to their frequencies. (B) The number of genes observed (y-axis) as a function of the number of sampled genomes (x-axis). The observed gene distributions are statistically distinguishable. (C) Fluidity computed from increasing numbers of sampled genomes is an unbiased estimator of the true value (dashed lines within the red and blue shaded regions). The shaded regions show the theoretical prediction of the mean and standard deviation, as inferred from the jackknife estimate (see Methods).
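Panel (C) rests on two quantities: the fluidity estimate from N sampled genomes and its jackknife spread. The sketch below is an illustration, not the authors' code. It assumes each genome is represented as a set of gene-family identifiers. It also assumes fluidity is the mean, over all genome pairs, of (genes unique to genome k + genes unique to genome l) / (genes in k + genes in l). The spread is computed with the standard leave-one-genome-out jackknife variance, which may differ in detail from the exact formula given in Methods. The names `fluidity` and `jackknife_sd` are hypothetical.

```python
import itertools
import math
from typing import List, Set


def fluidity(genomes: List[Set[str]]) -> float:
    """Mean over all unordered genome pairs of the fraction of unique genes.

    For a pair (k, l): (|k - l| + |l - k|) / (|k| + |l|).
    Assumes each genome is a non-empty set of gene-family IDs.
    """
    n = len(genomes)
    if n < 2:
        raise ValueError("fluidity needs at least two genomes")
    total = 0.0
    for a, b in itertools.combinations(genomes, 2):
        unique = len(a - b) + len(b - a)  # genes present in only one of the pair
        total += unique / (len(a) + len(b))
    n_pairs = n * (n - 1) / 2
    return total / n_pairs


def jackknife_sd(genomes: List[Set[str]]) -> float:
    """Standard leave-one-out jackknife estimate of the standard deviation of fluidity.

    Assumption: var = (n - 1) / n * sum_i (phi_{-i} - mean(phi_{-j}))^2,
    where phi_{-i} is the fluidity computed with genome i left out.
    """
    n = len(genomes)
    if n < 3:
        raise ValueError("jackknife needs at least three genomes")
    leave_one_out = [fluidity(genomes[:i] + genomes[i + 1:]) for i in range(n)]
    mean_loo = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((x - mean_loo) ** 2 for x in leave_one_out)
    return math.sqrt(variance)
```

On a toy sample of three genomes, `{"a","b","c"}`, `{"a","b","d"}` and `{"a","e","f"}`, the pairwise unique fractions are 2/6, 4/6 and 4/6. The fluidity is therefore 5/9 (about 0.556), and the jackknife standard deviation works out to 2/9. Genomes with identical gene content give a fluidity of 0.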
Kislyuk et al. BMC Genomics 2011 12:32 doi:10.1186/1471-2164-12-32