Table 3

Effects of duplicate marking, realignment & recalibration on SNP calling accuracy

Site discovery

| Call set | No. SNPs (all) | No. SNPs (known) | No. SNPs (novel) | dbSNP% | Ti/Tv ratio (known) | Ti/Tv ratio (novel) |
|---|---|---|---|---|---|---|
| Deep coverage with QUAL > 50 | | | | | | |
| initial | 96472 | 71534 | 24938 | 74.15% | 2.50 | 1.73 |
| realignment | 94595 | 71374 | 23221 | 75.45% | 2.50 | 1.84 |
| recalibration | 96316 | 71518 | 24798 | 74.25% | 2.50 | 1.75 |
| mark duplicate | 96303 | 71502 | 24801 | 74.24% | 2.50 | 1.73 |
| Shallow coverage with QUAL > 20 | | | | | | |
| initial | 780490 | 607178 | 173312 | 77.79% | 2.13 | 1.39 |
| realignment | 776560 | 606806 | 169754 | 78.14% | 2.13 | 1.41 |
| recalibration | 783387 | 609601 | 173786 | 77.81% | 2.13 | 1.40 |
| mark duplicate | 738198 | 583829 | 154369 | 79.09% | 2.13 | 1.53 |

SNPs were called jointly for five samples with GATK, using only bases with base quality ≥ 20 and reads with mapping quality ≥ 20. Only sites with QUAL > 50 (deep coverage) or QUAL > 20 (shallow coverage) were considered potentially variable.

Liu et al. BMC Genomics 2012 13(Suppl 8):S8   doi:10.1186/1471-2164-13-S8-S8
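
The dbSNP% column can be sanity-checked from the SNP counts. The sketch below assumes that dbSNP% is the share of all called SNPs that are already known (known / all); this definition is inferred from the numbers rather than stated in the table. Under that assumption the computed values agree with the tabulated ones to within about 0.01 percentage points. The names `CALL_SETS`, `dbsnp_percent` and `check_counts` are illustrative only.

```python
# Counts copied from Table 3: (all, known, novel) SNPs per call set.
CALL_SETS = {
    "deep": {
        "initial": (96472, 71534, 24938),
        "realignment": (94595, 71374, 23221),
        "recalibration": (96316, 71518, 24798),
        "mark duplicate": (96303, 71502, 24801),
    },
    "shallow": {
        "initial": (780490, 607178, 173312),
        "realignment": (776560, 606806, 169754),
        "recalibration": (783387, 609601, 173786),
        "mark duplicate": (738198, 583829, 154369),
    },
}


def dbsnp_percent(all_snps, known):
    """Percentage of called SNPs that are known (assumed definition: known / all)."""
    return 100.0 * known / all_snps


def check_counts(call_sets):
    """Verify known + novel == all for every row and return dbSNP% per row."""
    result = {}
    for coverage, rows in call_sets.items():
        for name, (all_snps, known, novel) in rows.items():
            if known + novel != all_snps:
                raise ValueError(f"{coverage}/{name}: known + novel != all")
            result[(coverage, name)] = dbsnp_percent(all_snps, known)
    return result


if __name__ == "__main__":
    for (coverage, name), pct in check_counts(CALL_SETS).items():
        print(f"{coverage:8s} {name:15s} {pct:.2f}%")
```

The same check confirms that the known and novel counts sum to the "all" count in every row.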
