Open Access Highly Accessed Methodology article

Estimating DNA polymorphism from next generation sequencing data with high error rate by dual sequencing applications

Ziwen He1, Xinnian Li1, Shaoping Ling2, Yun-Xin Fu3, Eric Hungate4, Suhua Shi1* and Chung-I Wu1,2,4*

Author Affiliations

1 State Key Laboratory of Biocontrol and Guangdong Key Laboratory of Plant Resources, Sun Yat-sen University, 135 Xingang West Road, Guangzhou 510275, China

2 CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, 1 Beichen West Road, Beijing 100101, China

3 Human Genetics Center, University of Texas School of Public Health, 1200 Herman Presser Drive, Houston, TX 77030, USA

4 Department of Ecology and Evolution, University of Chicago, 1101 East 57th Street, Chicago, IL 60637, USA

BMC Genomics 2013, 14:535  doi:10.1186/1471-2164-14-535

Published: 7 August 2013

Abstract

Background

Because next generation sequencing (NGS) data have a high error rate and the errors are distributed non-uniformly across sites, estimating DNA polymorphism (θ) accurately from NGS data has been a challenge.

Results

Using computer simulations, we compare two methods of data acquisition: sequencing each diploid individual separately and sequencing a pooled sample. At the current NGS error rate, sequencing each individual separately offers little advantage unless the coverage per individual is high (>20X). We therefore propose a new method for estimating θ from pooled samples that have been subjected to two separate rounds of DNA sequencing. Because errors from the two sequencing applications usually do not fall on the same sites, low frequency polymorphisms can be separated from sequencing errors. Simulation results show that the dual applications method is reliable even when the error rate is high and θ is low.
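The intuition behind the dual applications idea can be sketched in a few lines of Python. This is a toy illustration, not the authors' estimator: it assumes two alleles per site, a uniform per-read error rate, and a naive rule that keeps a site only when non-reference reads appear in both sequencing runs. The function names, the parameter values, and the use of the standard Watterson estimator (θ = S / (a_n · L)) as a stand-in are all assumptions made for the example.

```python
import random


def watterson_theta(num_segregating, num_chromosomes, num_sites):
    """Standard Watterson estimator per site: S / (a_n * L).

    Used here only as a stand-in; the article's own estimator is not shown.
    """
    a_n = sum(1.0 / i for i in range(1, num_chromosomes))
    return num_segregating / (a_n * num_sites)


def simulate_run(true_variants, num_sites, depth, error_rate, rng):
    """Simulate one sequencing run of a pooled sample.

    true_variants maps site index -> non-reference allele frequency in the pool.
    Returns {site: number of non-reference reads} for sites with at least one.
    Assumption: two alleles per site, so an error simply flips the base call.
    """
    counts = {}
    for site in range(num_sites):
        freq = true_variants.get(site, 0.0)
        alt_reads = 0
        for _ in range(depth):
            is_alt = rng.random() < freq
            if rng.random() < error_rate:
                is_alt = not is_alt
            alt_reads += is_alt
        if alt_reads:
            counts[site] = alt_reads
    return counts


def shared_variant_sites(run1, run2):
    """Keep only sites with non-reference reads in both runs."""
    return sorted(set(run1) & set(run2))


def demo(seed=1):
    """Compare single-run and dual-run candidate sites on toy data."""
    rng = random.Random(seed)
    num_sites, depth, error_rate = 5000, 20, 0.001  # hypothetical values
    num_chromosomes = 40                             # hypothetical pool size
    # Five hypothetical true polymorphic sites with assorted frequencies.
    true_variants = {100: 0.5, 900: 0.3, 2000: 0.2, 3100: 0.4, 4500: 0.25}

    run1 = simulate_run(true_variants, num_sites, depth, error_rate, rng)
    run2 = simulate_run(true_variants, num_sites, depth, error_rate, rng)
    shared = shared_variant_sites(run1, run2)

    print("true segregating sites:", len(true_variants))
    print("candidate sites, run 1 only:", len(run1))
    print("candidate sites, both runs:", len(shared))
    print("theta from run 1 only:",
          watterson_theta(len(run1), num_chromosomes, num_sites))
    print("theta from both runs:",
          watterson_theta(len(shared), num_chromosomes, num_sites))


if __name__ == "__main__":
    demo()
```

Under these assumptions, a site carrying only errors has to be hit independently in both runs to survive the intersection, so the number of false candidate sites should drop sharply compared with a single run, while a true variant is expected to appear in both. A real analysis would also have to handle non-uniform errors and variants missed in one run by chance, which this sketch ignores.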

Conclusions

In studies of natural populations where the sequencing coverage is usually modest (~2X per individual), the dual applications method on pooled samples should be a reasonable choice.

Keywords:
Next generation sequencing; DNA polymorphism; Sequencing error; Pooled sample; Dual sequencing applications