Table 1

Comparative analysis of the accuracy and quality of sequences

| Sequence set (length, bases) | # of sequences | % of error-free sequences | # of positions | Insertions | Deletions | Mismatch | Ambiguous | Total % of error |
|---|---|---|---|---|---|---|---|---|
| GS20 (101) | 34015 | 82.00% | 32801429 | 0.18% | 0.13% | 0.08% | 0.10% | 0.49% |
| Ref 1 (101) | 16052 | 87.12% | 1605640 | 0.15% | 0.05% | 0.01% | 0.01% | 0.22% |
| Ref 2 (101) | 16466 | 60.01% | 1600327 | 0.42% | 0.23% | 0.04% | 0.01% | 0.70% |
| Ref 3 (101) | 12215 | 72.96% | 1228804 | 0.17% | 0.19% | 0.01% | 0.01% | 0.38% |
| Ref 4 (101) | 9908 | 56.43% | 984452 | 0.30% | 0.37% | 0.03% | 0.00% | 0.70% |
| Ref 5 (101) | 15880 | 50.93% | 1595718 | 0.34% | 0.48% | 0.05% | 0.01% | 0.88% |
| Ref 6 (101) | 15716 | 75.17% | 1581075 | 0.25% | 0.10% | 0.00% | 0.01% | 0.36% |
| Total | 86237 | 67.57% | 8596016 | 0.27% | 0.23% | 0.02% | 0.01% | 0.53% |
| Ref 1 (572) | 16052 | 6.75% | 5359696 | 0.52% | 0.46% | 0.10% | 0.12% | 1.20% |
| Ref 2 (552) | 16466 | 9.75% | 4789285 | 0.89% | 0.28% | 0.10% | 0.08% | 1.35% |
| Ref 3 (500) | 12215 | 18.75% | 4180478 | 0.30% | 0.35% | 0.07% | 0.12% | 0.84% |
| Ref 4 (532) | 9908 | 6.88% | 2572843 | 0.56% | 0.71% | 0.19% | 0.11% | 1.57% |
| Ref 5 (592) | 15880 | 7.46% | 6171098 | 0.38% | 0.38% | 0.06% | 0.07% | 0.89% |
| Ref 6 (516) | 15716 | 11.81% | 6027338 | 0.60% | 0.17% | 0.07% | 0.04% | 0.88% |
| Total | 86237 | 10.09% | 29100738 | 0.54% | 0.36% | 0.09% | 0.09% | 1.07% |

The different types of error are detailed for each reference sequence obtained by 454 sequencing. Errors are classified according to the nomenclature used by Huse et al. (2007): insertions, deletions, mismatches and ambiguous base calls (see Materials and Methods). Error rates are given for two length categories (first 101 bases vs. full length).

Gilles et al. BMC Genomics 2011 12:245   doi:10.1186/1471-2164-12-245
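
In the rows of Table 1, the "Total % of error" column appears to be the sum of the four error categories (insertions, deletions, mismatches, ambiguous base calls), up to rounding of the published percentages. The table itself does not state this formula, so the short Python sketch below treats it as an assumption and simply checks it against a few rows copied from the table. The function names, the row label "Total (full length)", and the tolerance value are illustrative choices, not part of the original article.

```python
# Rows copied from Table 1:
# (insertions, deletions, mismatch, ambiguous, reported total), all in percent.
# "Total (full length)" is an illustrative label for the last row of the table.
TABLE_1_ROWS = {
    "GS20 (101)": (0.18, 0.13, 0.08, 0.10, 0.49),
    "Ref 1 (101)": (0.15, 0.05, 0.01, 0.01, 0.22),
    "Ref 4 (532)": (0.56, 0.71, 0.19, 0.11, 1.57),
    "Total (full length)": (0.54, 0.36, 0.09, 0.09, 1.07),
}


def total_error(ins, dele, mis, amb):
    """Sum the four per-category error rates (assumed relationship)."""
    return round(ins + dele + mis + amb, 2)


def is_consistent(row, tolerance=0.015):
    """Check a row's reported total against the sum of its categories.

    The tolerance allows for rounding of the published two-decimal values.
    """
    *parts, reported = row
    return abs(total_error(*parts) - reported) <= tolerance


for name, row in TABLE_1_ROWS.items():
    print(name, total_error(*row[:4]), row[4], is_consistent(row))
```

The summed categories match the reported total exactly for most rows; for the full-length total the sum of the rounded categories differs from the reported 1.07% by 0.01 percentage points, which is consistent with rounding.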