Table 1

Definition of the terms used.

Term used

Definition



Reads

Sequences passing quality control (QC) criteria after BaseCall, generated from 454 sequencing using manufacturer specifications.


Sequences

Reads remaining in the dataset after the first step of our data processing procedure (Table 2).


Variants

Sequences differing by at least one base pair substitution or by an indel.

Artifactual sequences or artifactual variants

Sequences or variants that resulted from sequencing errors, polymerase errors, or non-specific amplification of paralogues and pseudogenes during PCR (Table 2).

True sequences or true variants

Sequences or variants that were retained after validation at all stages of our stepwise procedure.

Galan et al. BMC Genomics 2010 11:296   doi:10.1186/1471-2164-11-296
