The density of the human genome in sequence space. For every randomly generated n-mer that was detected in the human genome, we generated all single-basepair variants (3n variants for each n-mer) and tested whether they were also present in the human genome (1nn). We also generated 3n of the 2 bp variants (2nn), 3n of the 3 bp variants, and so on, up to variants that differed by 10 bp from the original human n-mer. Sequences that are only a few single-base substitutions away from the original human n-mer are significantly more likely to be in the human genome than a random n-mer (black bars, "random"). This shows that the human genome is relatively compact in sequence space. The standard error for all points is < 0.003.
Liu et al. BMC Genomics 2008 9:509 doi:10.1186/1471-2164-9-509
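The variant-generation procedure described in the caption can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the function names (`single_variants`, `sample_k_variants`, `neighbor_hit_fraction`) are hypothetical, the genome is assumed to be available as an in-memory set of n-mers, and the k bp variants are assumed to be drawn uniformly at random (possibly with repeats), since the caption does not say how the 3n variants were chosen.

```python
import random

BASES = "ACGT"

def single_variants(nmer):
    """Return all 3n sequences that differ from nmer at exactly one position."""
    out = []
    for i, base in enumerate(nmer):
        for alt in BASES:
            if alt != base:
                out.append(nmer[:i] + alt + nmer[i + 1:])
    return out

def sample_k_variants(nmer, k, count, rng):
    """Draw `count` sequences that differ from nmer at exactly k positions.

    Assumption: uniform random sampling; duplicates are not filtered out.
    """
    out = []
    for _ in range(count):
        seq = list(nmer)
        # Pick k distinct positions, then replace each with a different base.
        for i in rng.sample(range(len(nmer)), k):
            seq[i] = rng.choice([b for b in BASES if b != nmer[i]])
        out.append("".join(seq))
    return out

def neighbor_hit_fraction(nmer, k, genome_nmers, rng):
    """Fraction of tested k bp variants of nmer that occur in genome_nmers.

    For k == 1 all 3n variants are tested; for k > 1 a sample of 3n is tested.
    `genome_nmers` is an assumed stand-in (a set of strings) for a genome index.
    """
    n = len(nmer)
    if k == 1:
        variants = single_variants(nmer)
    else:
        variants = sample_k_variants(nmer, k, 3 * n, rng)
    return sum(v in genome_nmers for v in variants) / len(variants)
```

Averaging `neighbor_hit_fraction` over many genomic n-mers for k = 1 to 10, and comparing against the hit rate of fully random n-mers, would give values analogous to the bars in the figure. A real analysis would need an efficient genome-wide n-mer index rather than a Python set.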