Figure 1.

Types of errors. A screenshot from the IGV browser [21] showing three types of error in reads from an Illumina sequencing experiment: (1) Random error, likely because the position is close to the end of the read. (2) Random error, likely sequence-specific: in this case a run of Cs is probably inducing errors at the end of the low-complexity repeat. (3) Systematic error: although it is likely that the GGT sequence motif and the GGC motifs before it created phasing problems leading to the errors, the extent of error is not explained by a random error model. In this case, all the base calls in one direction are wrong, as revealed by the 11 overlapping mate-pairs. In particular, all differences from the reference genome are base-call errors, as verified by the mate-pair reads, which do not differ from the reference. Given the background error rate, the probability of observing 11 error-pairs at a single location, given that 11 mate-pair reads overlap the location, is 1.5 × 10⁻²⁶. Moreover, given the presence of such errors at a single location, the probability that all of the errors occur on the same strand (i.e., on the forward mate pair) is [expression not recovered from source]. Note that the IGV browser made an incorrect SNP call at the systematic error site (colored bar in top panel).
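
The random-error argument in the caption can be illustrated with a minimal sketch. It assumes that base-call errors are independent across reads and occur at one fixed per-base rate, so the chance that all n overlapping reads are wrong at a site is the rate raised to the power n. The function name and the rate 0.0045 are illustrative assumptions, not values from the source; the caption does not state the background error rate or the exact model the authors used.

```python
import math


def prob_all_reads_in_error(error_rate: float, n_reads: int) -> float:
    """Probability that every one of n_reads reads is miscalled at a site.

    Assumes independent errors at a single per-base rate (a simplification;
    the source does not spell out its background model).
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    if n_reads < 0:
        raise ValueError("n_reads must be non-negative")
    return error_rate ** n_reads


if __name__ == "__main__":
    # Hypothetical background error rate, chosen only for illustration.
    assumed_rate = 0.0045
    n_overlapping = 11  # number of overlapping reads, from the caption
    p = prob_all_reads_in_error(assumed_rate, n_overlapping)
    print(f"P(all {n_overlapping} reads wrong) = {p:.2e}")
    print(f"log10 probability = {math.log10(p):.2f}")
```

Under these assumptions the probability shrinks geometrically with the number of overlapping reads, which is why a site where every read in one direction is wrong is not plausibly explained by random error alone.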

Meacham et al. BMC Bioinformatics 2011 12:451   doi:10.1186/1471-2105-12-451