Additional File 1.

Merging of paired-end reads efficiently removes adapter sequence from short-insert libraries and increases read accuracy. Shown is the average sequencing error of the two simulated raw reads (black) compared with the sequencing error remaining after read merging, for different adapter start points. The remaining error is shown for two types of simulated quality scores (red and green). In red, the quality score is the average error observed for the specific base type in this cycle (i.e. all adenines at this position in the read have the same quality score), while in green an error-informative quality score was simulated. For this type of quality score, a random number between 0 and 10 (uniform sampling) was added to the average quality score of the base when the correct base was simulated, and a random number between 0 and 10 (uniform sampling) was subtracted when a wrong base was simulated. On average, the error (starting from 0.244%) is reduced 1.93-fold (to 0.126%) for the position-dependent quality scores and 4.98-fold (to 0.049%) for the error-informative quality scores. For sequences shorter than or equal to the read length (5-101 nt), the error (starting from 0.146%) is reduced 1.62-fold (to 0.090%) and 20.88-fold (to 0.007%), respectively. Sequences are required to overlap by more than 10 nt for merging, and merged sequences shorter than 5 nt are discarded by the program as adapter dimers.
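The error-informative quality score described in the caption can be illustrated with a minimal Python sketch. It is not the authors' simulation code: the function name `error_informative_quality` is hypothetical, and the caption does not say whether the random offset is an integer or a real number, so a real-valued uniform draw is assumed here.

```python
import random


def error_informative_quality(avg_quality, base_is_correct, rng=random):
    """Perturb a position-dependent quality score (hypothetical helper).

    avg_quality     -- average quality score for this base type and cycle
    base_is_correct -- True if the simulated base matches the true base
    rng             -- source of randomness (module `random` or a Random instance)
    """
    # Assumption: the offset is drawn as a real number from [0, 10];
    # the caption only states "uniform sampling" between 0 and 10.
    offset = rng.uniform(0, 10)
    if base_is_correct:
        # Correct bases receive a higher score than the cycle average.
        return avg_quality + offset
    # Wrong bases receive a lower score than the cycle average.
    return avg_quality - offset


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so repeated runs agree
    print(error_informative_quality(30.0, True, rng))
    print(error_informative_quality(30.0, False, rng))
```

Under this scheme, correct and incorrect calls at the same position tend to receive different scores, which is the property that makes the score informative when the two reads of a pair disagree in their overlap.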


Kircher et al. BMC Genomics 2011 12:382   doi:10.1186/1471-2164-12-382