k-mer distribution. Distribution of coverage levels for k-mers in the sequence reads from chromosome 21. The 31.7M observed k-mers that are found in the hg18 reference genome sequence have clearly different coverage levels from the 48.7M k-mers that are not in hg18. Of the k-mers not found in hg18, 44.5M (about 91%) are observed only once and are likely sequencing errors. A small fraction of the k-mers that do not match hg18 are observed many times in the data; these likely represent SNP differences between the sequenced individual and hg18 and would be retained by the Bloom filter.
Melsted and Pritchard BMC Bioinformatics 2011 12:333 doi:10.1186/1471-2105-12-333
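
As a rough illustration of why singleton k-mers are dropped while repeated k-mers are retained, the following minimal sketch uses a Bloom filter to remember first sightings and keeps only k-mers seen at least twice. It is not the authors' implementation: `ToyBloomFilter`, `kmers`, and `count_repeated_kmers` are hypothetical names, the filter size and hash count are arbitrary, and reverse complements are ignored for brevity.

```python
import hashlib
from collections import Counter


class ToyBloomFilter:
    """Hypothetical minimal Bloom filter; sizes are illustrative only."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(read, k):
    """Yield every substring of length k in the read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def count_repeated_kmers(reads, k):
    """Return exact counts for k-mers observed at least twice."""
    bloom = ToyBloomFilter()
    candidates = set()
    # Pass 1: a k-mer becomes a candidate only when the filter has
    # (apparently) seen it before, so most singletons never get stored.
    for read in reads:
        for kmer in kmers(read, k):
            if kmer in bloom:
                candidates.add(kmer)
            else:
                bloom.add(kmer)
    # Pass 2: count candidates exactly and discard any singleton that
    # slipped in through a Bloom filter false positive.
    counts = Counter(
        km for read in reads for km in kmers(read, k) if km in candidates
    )
    return {km: c for km, c in counts.items() if c >= 2}
```

For the toy reads `["ACGTAC", "ACGTTT", "GGGACG"]` with `k = 3`, only `ACG` (seen three times) and `CGT` (seen twice) are kept. Every other k-mer occurs once and is discarded, much as a k-mer created by a single sequencing error would be.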