Table 1

Effect of gene predictions

Data set        Observed   ORFans   Chao    Bin. mix.
Original NCBI   12599      5438     26614   42640
Reduced 10%     11273      4470     22549   32528
Reduced 50%     9336       3272     17083   27456
Easygene        9211       3121     17041   29818


Number of observed gene families in each data set, number of ORFans (gene families found in only one genome), and Chao and binomial mixture ("Bin. mix.") estimates of pan-genome size, for the original E. coli data, the reduced data sets, and the Easygene data set. "Reduced 10%" means that the shortest 10% of hypothetical proteins were removed from the original data set; "Reduced 50%" likewise means that the shortest 50% were removed. "Easygene" is a new data set with genes predicted by the Easygene gene prediction tool.
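As a rough illustration of how the "Chao" column relates to the "Observed" and "ORFans" columns, the sketch below uses the classic Chao lower-bound formula: observed families plus the squared singleton count divided by twice the doubleton count. This assumes the table's Chao estimates follow that standard form. The function name `chao_lower_bound` and the toy counts are hypothetical, and the doubleton count (families found in exactly two genomes) is not reported in the table.

```python
def chao_lower_bound(observed, singletons, doubletons):
    """Classic Chao lower-bound estimate of the total number of gene families.

    observed   -- gene families seen in at least one genome
    singletons -- families seen in exactly one genome (ORFans)
    doubletons -- families seen in exactly two genomes (not listed in Table 1)
    """
    if doubletons <= 0:
        raise ValueError("the classic Chao formula needs at least one doubleton")
    # Families never observed are estimated from the singleton/doubleton ratio.
    unseen = singletons ** 2 / (2 * doubletons)
    return observed + unseen


# Hypothetical toy counts, not taken from the paper.
print(chao_lower_bound(observed=100, singletons=20, doubletons=10))
```

With these toy counts the estimate is 120: the 100 observed families plus an estimated 20 unseen ones. Because the singleton count enters squared, removing ORFans (as in the reduced data sets) lowers the estimate more than proportionally.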

Snipen et al. BMC Genomics 2009, 10:385. doi:10.1186/1471-2164-10-385
