Mathematical estimation of the number of novel gene families, the pan-genome size, and the number of core gene families. The number of novel gene families (A), the pan-genome size (B), and the number of core gene families (C) were estimated for the classical Bordetellae (blue: all eleven genomes), for B. bronchiseptica strains only (green: RB50, 253, 1289, MO149, D445, and Bbr77), and for all strains except the B. pertussis strains (red: RB50, 253, 1289, MO149, D445, Bbr77, 12822, and Bpp5). If n genomes are selected from 11, there are 11!/[(n-1)!*(11-n)!] possible combinations. Each possible combination is plotted as a point, and the line is fitted to the power law model adapted from the methods of Tettelin et al. The value of γ in the power law model for the pan-genome size estimation is reported for each group. The numbers on the right side of each graph are the expected number of novel gene families, pan-genome size, and number of core gene families with 25 genomes.
Park et al. BMC Genomics 2012 13:545 doi:10.1186/1471-2164-13-545
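The counting formula and the extrapolation to 25 genomes described in the caption can be sketched as follows. This is an illustration only, not the authors' code. The expression 11!/[(n-1)!*(11-n)!] equals n times the binomial coefficient C(11, n), so one reading is that each n-genome subset is counted once per choice of the genome added last. The model form y = kappa * n**gamma, the log-log least-squares fit, the function names, and the input values are all assumptions; the numbers are synthetic and are not taken from the figure.

```python
import math

def measurements(n, total=11):
    """Evaluate total! / [(n-1)! * (total-n)!], the count given in the caption."""
    return math.factorial(total) // (
        math.factorial(n - 1) * math.factorial(total - n)
    )

def fit_power_law(ns, ys):
    """Fit y = kappa * n**gamma by ordinary least squares on log-log axes.

    Assumed model form; the caption only says "power law model".
    """
    xs = [math.log(n) for n in ns]
    ls = [math.log(y) for y in ys]
    mean_x = sum(xs) / len(xs)
    mean_l = sum(ls) / len(ls)
    sxy = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, ls))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gamma = sxy / sxx                      # slope on log-log axes
    kappa = math.exp(mean_l - gamma * mean_x)  # intercept, back-transformed
    return kappa, gamma

# Synthetic pan-genome sizes (made-up values, NOT data from the paper).
ns = list(range(1, 12))
ys = [5000.0 * n ** 0.2 for n in ns]

kappa, gamma = fit_power_law(ns, ys)
predicted_25 = kappa * 25 ** gamma  # extrapolated size at 25 genomes

for n in (1, 2, 11):
    print(n, measurements(n))
print(round(gamma, 3), round(predicted_25, 1))
```

A fit on log-transformed values weights points differently from a nonlinear least-squares fit on the raw counts, so the two approaches can give somewhat different values of γ on real data.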