Finite supragenome model results using K = 6 variable population gene frequency classes. Our previous supragenome analyses of Haemophilus influenzae and Streptococcus pneumoniae used a version of the finite supragenome model that required fixed population gene frequency classes. The model has since been updated so that the optimization function depends on the values of the population gene frequency vector (μ) as well as on the values of the corresponding mixture coefficient vector (π, the probability that a gene in a supragenome belongs to each of the K population gene frequency classes). The optimization function is the log-likelihood of the observed sample gene frequency histogram, i.e., the observed gene frequency class distribution among the |S| strains examined. For a given species, the bottom graph plots the values of the vector μ against the product of the estimated supragenome size and the values of the vector π, all taken at the maximum of the log-likelihood function.
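The quantities named in the caption can be illustrated with a short Python sketch. It is not the authors' implementation. It assumes a binomial mixture: a gene falls in class k with probability π_k and is then present in each of the |S| strains independently with probability μ_k. It also assumes a multinomial likelihood in which genes seen in none of the sampled strains are counted as the supragenome size minus the number of observed genes. The function names and the idea of passing the supragenome size as an argument are hypothetical, chosen only for illustration.

```python
from math import comb, lgamma, log


def class_probabilities(mu, pi, num_strains):
    """Probability that a gene is present in exactly n strains, n = 0..num_strains.

    Assumed model: binomial mixture over the K population frequency classes.
    """
    return [
        sum(p * comb(num_strains, n) * m ** n * (1 - m) ** (num_strains - n)
            for m, p in zip(mu, pi))
        for n in range(num_strains + 1)
    ]


def log_likelihood(observed_counts, mu, pi, supragenome_size):
    """Multinomial log-likelihood of the sample gene frequency histogram.

    observed_counts[n - 1] is the number of genes found in exactly n strains.
    Genes found in no strain are inferred from supragenome_size (assumption).
    """
    num_strains = len(observed_counts)
    unseen = supragenome_size - sum(observed_counts)
    if unseen < 0:
        raise ValueError("supragenome_size is smaller than the observed gene count")
    counts = [unseen] + list(observed_counts)
    probs = class_probabilities(mu, pi, num_strains)
    # log N! - sum(log c_n!) + sum(c_n * log p_n)
    ll = lgamma(supragenome_size + 1)
    for c, p in zip(counts, probs):
        ll -= lgamma(c + 1)
        if c > 0:
            ll += c * log(p)
    return ll


def bottom_graph_points(mu, pi, supragenome_size):
    """(mu_k, N * pi_k) pairs: expected number of supragenome genes per class."""
    return [(m, supragenome_size * p) for m, p in zip(mu, pi)]
```

In this sketch an optimizer would vary μ, π, and the supragenome size together to maximize `log_likelihood`, and `bottom_graph_points` would then give the coordinates described for the bottom graph. Any input values used to try it out are made up, not data from the study.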
Boissy et al. BMC Genomics 2011 12:187 doi:10.1186/1471-2164-12-187