Open Access Highly Accessed Research article

Improving mapping and SNP-calling performance in multiplexed targeted next-generation sequencing

Abdou ElSharawy1, Michael Forster1, Nadine Schracke2, Andreas Keller3, Ingo Thomsen1, Britt-Sabina Petersen1, Björn Stade1, Peer Stähler2, Stefan Schreiber1,4, Philip Rosenstiel1 and Andre Franke1*

Author Affiliations

1 Institute of Clinical Molecular Biology, Christian-Albrechts-University, Kiel, Germany

2 Febit biomed GmbH, Heidelberg, Germany

3 Biomarker Discovery Center, Heidelberg, Germany

4 Department of General Internal Medicine, Campus Kiel, University Hospital S.-H., Kiel, Germany

BMC Genomics 2012, 13:417  doi:10.1186/1471-2164-13-417

Published: 22 August 2012

Abstract

Background

Unlike classical genotyping, targeted next-generation sequencing (tNGS) can be custom-designed to interrogate entire genomic regions of interest and so detect novel as well as known variants. One approach to bringing down the per-sample cost is to pool barcoded NGS libraries before sample enrichment. Still, we lack a complete understanding of how this multiplexed tNGS approach and the varying performance of the ever-evolving analytical tools affect the quality of variant discovery. Therefore, we evaluated the impact of different software tools and analytical approaches on the discovery of single nucleotide polymorphisms (SNPs) in multiplexed tNGS data. To generate our own test model, we combined a sequence capture method with NGS in three experimental stages of increasing complexity (E. coli genes, multiplexed E. coli, and multiplexed HapMap BRCA1/2 regions).

Results

We successfully enriched barcoded NGS libraries instead of genomic DNA, achieving reproducible coverage profiles (Pearson correlation coefficients of up to 0.99) across multiplexed samples, with <10% strand bias. However, the SNP calling quality was substantially affected by the choice of tools and mapping strategy. With the aim of reducing computational requirements, we compared conventional whole-genome mapping and SNP-calling with a new, faster approach: target-region mapping with subsequent ‘read-backmapping’ to the whole genome to reduce the false detection rate. Consequently, we developed a combined mapping pipeline, which includes standard tools (BWA, SAMtools, etc.), and tested it on public HiSeq2000 exome data from the 1000 Genomes Project. Our pipeline saved 12 hours of run time per HiSeq2000 exome sample and detected ~5% more SNPs than the conventional whole-genome approach. This suggests that more potential novel SNPs may be discovered using both approaches than with just the conventional approach.
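The filtering idea behind ‘read-backmapping’ can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline described in the article: we assume that a read first mapped to the target region is kept only if its whole-genome alignment also falls inside a target and is reasonably unique. The names `Hit`, `Region`, `backmap_filter` and the `min_mapq` threshold are hypothetical, and the coordinates are toy values.

```python
from typing import Dict, List, NamedTuple, Set


class Hit(NamedTuple):
    """One alignment of a read (assumed already in genome coordinates)."""
    chrom: str
    pos: int   # 1-based leftmost mapped position
    mapq: int  # mapping quality reported by the aligner


class Region(NamedTuple):
    """One target interval, inclusive on both ends."""
    chrom: str
    start: int
    end: int


def in_targets(hit: Hit, targets: List[Region]) -> bool:
    """True if the alignment start lies inside any target interval."""
    return any(
        hit.chrom == t.chrom and t.start <= hit.pos <= t.end for t in targets
    )


def backmap_filter(
    target_hits: Dict[str, Hit],
    genome_hits: Dict[str, Hit],
    targets: List[Region],
    min_mapq: int = 20,  # assumed threshold, not taken from the article
) -> Set[str]:
    """Return IDs of target-mapped reads that survive whole-genome backmapping."""
    kept = set()
    for read_id in target_hits:
        genome_hit = genome_hits.get(read_id)
        if genome_hit is None:
            continue  # read did not map to the whole genome at all
        if genome_hit.mapq >= min_mapq and in_targets(genome_hit, targets):
            kept.add(read_id)
    return kept


# Toy data: three reads that all mapped to the target in step one.
targets = [Region("chr17", 1000, 2000)]
target_hits = {
    "r1": Hit("chr17", 1200, 60),
    "r2": Hit("chr17", 1500, 60),
    "r3": Hit("chr17", 1800, 60),
}
genome_hits = {
    "r1": Hit("chr17", 1200, 60),  # confirmed in the target
    "r2": Hit("chr5", 77000, 60),  # maps better elsewhere in the genome
    "r3": Hit("chr17", 1800, 0),   # ambiguous placement (low mapping quality)
}
kept = backmap_filter(target_hits, genome_hits, targets)
print(sorted(kept))
```

In this toy run only `r1` survives: `r2` is placed off-target by the whole-genome step, and `r3` is placed ambiguously. Only the reads that mapped to the target in the first step need to be sent through the slower whole-genome step.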

Conclusions

We recommend applying our general ‘two-step’ mapping approach for more efficient SNP discovery in tNGS. Our study has also shown the benefit of computing inter-sample SNP-concordances and inspecting read alignments to obtain more confident results.
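As an illustration of an inter-sample SNP-concordance check, the sketch below compares genotype calls between every pair of samples at the sites called in both. It is a simplified example, not the article's procedure: the definition used here (fraction of shared sites with identical genotype calls) is an assumption, and the function names, sample names and calls are hypothetical.

```python
from itertools import combinations
from typing import Dict, Optional, Tuple

Site = Tuple[str, int]   # (chromosome, 1-based position)
Calls = Dict[Site, str]  # site -> genotype string, e.g. "A/G"


def concordance(calls_a: Calls, calls_b: Calls) -> Optional[float]:
    """Fraction of sites called in both samples whose genotypes are identical."""
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        return None  # undefined when the samples share no called sites
    agree = sum(1 for site in shared if calls_a[site] == calls_b[site])
    return agree / len(shared)


def pairwise_concordance(
    samples: Dict[str, Calls],
) -> Dict[Tuple[str, str], Optional[float]]:
    """Concordance for every unordered pair of samples."""
    return {
        (a, b): concordance(samples[a], samples[b])
        for a, b in combinations(sorted(samples), 2)
    }


# Toy calls for two hypothetical replicates of the same individual.
samples = {
    "rep1": {("chr17", 100): "A/G", ("chr17", 200): "C/C",
             ("chr17", 300): "T/T", ("chr13", 50): "G/A"},
    "rep2": {("chr17", 100): "A/G", ("chr17", 200): "C/T",
             ("chr17", 300): "T/T", ("chr13", 50): "G/A",
             ("chr13", 90): "C/C"},
}
result = pairwise_concordance(samples)
print(result)
```

Here the two replicates share four called sites and agree at three of them, giving a concordance of 0.75. Discordant sites such as `("chr17", 200)` are natural candidates for manual inspection of the read alignments.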

Keywords:
Two-stage mapping; Read-backmapping; Software performance; SNP discovery; Multiplexed targeted next-generation sequencing