Additional File 3.

The quality score distribution of artifact reads largely overlaps with that of regular reads. Sequences resulting from crystals, dust and lint particles, as well as other flow cell features, are typically of low complexity (Additional File 2) but only partially of low quality. Plotted is the quality score frequency distribution (PHRED scale, Ibis base caller) for all reads matching the 'GAC' library tag at the beginning of the read (black, n = 557,466,159 bases from 10,930,709 reads) and for all reads matching neither the tag sequence nor any of its single-base substitutions (red, n = 3,481,668 bases from 68,268 reads). The data were obtained from lane 5 of the 080902_BIOLAB29_Run PE51_1 run from the Neandertal Genome project (Green et al., Science 2010).
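
The split into the two read sets described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes reads are available as (sequence, quality string) pairs, that qualities are ASCII-encoded with an offset of 33, and that only A, C, G and T count as substitutions. The function names are hypothetical.

```python
from collections import Counter

TAG = "GAC"  # library tag named in the caption


def tag_variants(tag, alphabet="ACGT"):
    """Return the tag plus every sequence one substitution away from it."""
    variants = {tag}
    for i in range(len(tag)):
        for base in alphabet:
            variants.add(tag[:i] + base + tag[i + 1:])
    return variants


def quality_distributions(reads, tag=TAG, offset=33):
    """Tally per-base quality scores for tagged reads and artifact reads.

    reads: iterable of (sequence, quality_string) pairs.
    offset: ASCII offset of the quality encoding (assumed to be 33 here).
    """
    near_tag = tag_variants(tag)
    tagged, artifact = Counter(), Counter()
    for seq, qual in reads:
        prefix = seq[:len(tag)]
        if prefix == tag:
            target = tagged        # regular reads (black curve)
        elif prefix not in near_tag:
            target = artifact      # candidate artifact reads (red curve)
        else:
            continue               # single-substitution variants are in neither set
        target.update(ord(c) - offset for c in qual)
    return tagged, artifact


def to_frequencies(counts):
    """Convert raw score counts into relative frequencies, sorted by score."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {q: n / total for q, n in sorted(counts.items())}
```

Calling to_frequencies on each of the two counters gives the two frequency distributions that would be plotted against each other. Reads whose first three bases contain an N fall into the artifact set under these assumptions, which may differ from the original analysis.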

Kircher et al. BMC Genomics 2011 12:382   doi:10.1186/1471-2164-12-382