Observed vs. expected frequency of 20-mers in genomes. Mean ratio of observed to expected 20-mer frequencies in real genomes versus randomly generated sequences. Ratios were computed independently for three different genomes and three random sequences of similar %GC composition. Vertical bars show the standard deviation of these ratios. Genomes used for calculations: Escherichia coli str. K-12 substr. MG1655 [50.8% GC], Pseudomonas aeruginosa PAO1 [66.6% GC], Haemophilus influenzae Rd KW20 [38.1% GC], Colwellia psychrerythraea 34H [38.0% GC], Salinibacter ruber DSM 13855 [66.2% GC], Thiobacillus denitrificans ATCC 25259 [66.1% GC], Enterococcus faecalis V583 [37.5% GC], Anaplasma marginale str. St. Maries [49.8% GC], and Nitrosococcus oceani ATCC 19707 [50.3% GC].
Erill and O'Neill, BMC Bioinformatics 2009, 10:57. doi:10.1186/1471-2105-10-57