Table 2

Effects of data preprocessing on SNP calling accuracy

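Site discovery, by call set (QUAL ≥ 50)

| Call set | No. SNPs (All) | No. SNPs (Known) | No. SNPs (Novel) | dbSNP% | Ti/Tv ratio (Known) | Ti/Tv ratio (Novel) |
|---|---|---|---|---|---|---|
| raw | 640946 | 499377 | 141569 | 77.91% | 2.19 | 1.65 |
| filterY | 630641 | 490722 | 139919 | 77.81% | 2.19 | 1.65 |
| trim | 651391 | 502951 | 148440 | 77.21% | 2.18 | 1.58 |
| filterY&trim | 640487 | 493741 | 146746 | 77.08% | 2.18 | 1.58 |
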

raw: no preprocessing; filterY: reads that fail the Illumina chastity filter removed; trim: low-quality tails trimmed from reads with the BWA parameter (-q 15); filterY&trim: reads that fail the Illumina chastity filter removed and low-quality tails trimmed. SNPs were called jointly for five samples by GATK using bases with base quality ≥ 20 and reads with mapping quality ≥ 20. Only sites with QUAL ≥ 50 were considered potentially variable sites.
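
The table's internal arithmetic can be checked with a short sketch. It assumes that dbSNP% is the number of known SNPs divided by the number of all called SNPs, times 100; the legend does not define the column, so this reading is an assumption, as are the names `TABLE2`, `dbsnp_percent`, and `check_row`. The counts are copied from the table above.

```python
# Table 2 rows: call set -> (all SNPs, known SNPs, novel SNPs, reported dbSNP%)
TABLE2 = {
    "raw": (640946, 499377, 141569, 77.91),
    "filterY": (630641, 490722, 139919, 77.81),
    "trim": (651391, 502951, 148440, 77.21),
    "filterY&trim": (640487, 493741, 146746, 77.08),
}


def dbsnp_percent(known, total):
    """Percentage of called SNPs already in dbSNP (assumed definition)."""
    return 100.0 * known / total


def check_row(all_snps, known, novel, reported_pct, tol=0.01):
    """Return (counts_ok, pct_ok) for one table row.

    counts_ok: known + novel equals the 'All' column.
    pct_ok: recomputed dbSNP% is within `tol` percentage points of the
            reported value (tolerance absorbs rounding in the table).
    """
    counts_ok = (known + novel) == all_snps
    pct_ok = abs(dbsnp_percent(known, all_snps) - reported_pct) <= tol
    return counts_ok, pct_ok


if __name__ == "__main__":
    for name, row in TABLE2.items():
        counts_ok, pct_ok = check_row(*row)
        print(f"{name}: counts consistent={counts_ok}, dbSNP% consistent={pct_ok}")
```

Under that assumed definition, every row's Known and Novel counts sum to the All column, and the recomputed percentages agree with the reported ones to within 0.01 percentage points.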

Liu et al. BMC Genomics 2012 13(Suppl 8):S8   doi:10.1186/1471-2164-13-S8-S8
