Software (Open Access)

ReQON: a Bioconductor package for recalibrating quality scores from next-generation sequencing data

Christopher R Cabanski (1), Keary Cavin (2), Chris Bizon (2), Matthew D Wilkerson (3), Joel S Parker (3,4), Kirk C Wilhelmsen (2,3,4), Charles M Perou (3,4), JS Marron (1,3) and D Neil Hayes (3,5)*

Author Affiliations

1 Department of Statistics and Operations Research, Chapel Hill, NC, USA

2 Renaissance Computing Center, Chapel Hill, NC, USA

3 Lineberger Comprehensive Cancer Center, Chapel Hill, NC, USA

4 Department of Genetics, Chapel Hill, NC, USA

5 Department of Internal Medicine, Division of Medical Oncology, Multidisciplinary Thoracic Oncology Program, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA

BMC Bioinformatics 2012, 13:221  doi:10.1186/1471-2105-13-221

Published: 4 September 2012

Abstract

Background

Next-generation sequencing technologies have become important tools for genome-wide studies. However, the quality scores that are assigned to each base have been shown to be inaccurate. If the quality scores are used in downstream analyses, these inaccuracies can have a significant impact on the results.
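Base quality scores are conventionally reported on the Phred scale, so each score Q encodes a claimed probability that the base call is wrong. A score is inaccurate in the sense used here when the error rate actually observed among bases assigned Q differs from the probability that Q claims:

```latex
Q = -10\,\log_{10} P(\text{error})
\quad\Longleftrightarrow\quad
P(\text{error}) = 10^{-Q/10},
\qquad \text{e.g. } Q = 20 \Rightarrow P = 0.01,\quad Q = 30 \Rightarrow P = 0.001.
```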

Results

Here we present ReQON, a tool that uses logistic regression to recalibrate the base quality scores in an input BAM file of aligned sequencing data. ReQON also generates diagnostic plots showing the effectiveness of the recalibration. We show that ReQON produces quality scores that are both more accurate than the original scores, in the sense that they correspond more closely to the probability of a sequencing error, and better at discriminating between sequencing errors and non-errors. We also compare ReQON with other available recalibration tools and show that ReQON is less biased and performs favorably in terms of quality score accuracy.
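The general idea of logistic-regression recalibration can be sketched in plain Python: fit a model that predicts whether each base is a sequencing error from per-base covariates, then convert the fitted error probability back to a Phred score. This is a minimal illustration under stated assumptions, not ReQON's implementation (ReQON itself is an R package). The covariates (reported quality and position in the read), the simulated error model, the Newton-Raphson fitting routine, the Q41 cap, and every function name here (`simulate_bases`, `fit_logistic`, `recalibrate`, and so on) are hypothetical choices made for the example.

```python
import math
import random

def phred_to_prob(q):
    """Error probability claimed by a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def prob_to_phred(p, max_q=41):
    """Phred score Q = -10*log10(p), rounded and capped at max_q (an assumed cap)."""
    p = max(p, 10 ** (-max_q / 10))  # floor p so log10 never sees 0
    return min(max_q, int(round(-10 * math.log10(p))))

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def features(q, pos):
    """Hypothetical covariates: intercept, reported quality, position in read (scaled)."""
    return [1.0, q / 10.0, pos / 100.0]

def solve(a, b):
    """Solve the small linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_logistic(X, y, iters=25, tol=1e-8):
    """Maximum-likelihood logistic regression fitted by Newton-Raphson."""
    k = len(X[0])
    ybar = sum(y) / len(y)
    beta = [math.log(ybar / (1 - ybar))] + [0.0] * (k - 1)  # start at the overall error rate
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            p = sigmoid(sum(b * x for b, x in zip(beta, xi)))
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    hess[a][c] += w * xi[a] * xi[c]
        step = solve(hess, grad)  # Newton step: (X^T W X)^-1 X^T (y - p)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

def simulate_bases(n=10000, seed=1):
    """Simulated (reported_q, position, is_error) triples under an assumed error model."""
    rng = random.Random(seed)
    bases = []
    for _ in range(n):
        q, pos = rng.randint(10, 40), rng.randint(0, 99)
        # Assumption: the true error rate grows with position, reaching ~10x the
        # reported rate at the end of the read, so late reported scores are overconfident.
        p_err = phred_to_prob(q) * 10 ** (pos / 100)
        bases.append((q, pos, 1 if rng.random() < p_err else 0))
    return bases

def recalibrate(beta, q, pos):
    """Recalibrated Phred score for one base under the fitted model."""
    return prob_to_phred(sigmoid(sum(b * x for b, x in zip(beta, features(q, pos)))))

if __name__ == "__main__":
    bases = simulate_bases()
    X = [features(q, pos) for q, pos, _ in bases]
    y = [err for _, _, err in bases]
    beta = fit_logistic(X, y)
    print("reported Q30 at position 5 ->", recalibrate(beta, 30, 5))
    print("reported Q30 at position 95 ->", recalibrate(beta, 30, 95))
```

Newton-Raphson is used here only because it converges in a handful of iterations on a three-parameter model; any standard logistic-regression fitter would serve the same purpose.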

Conclusion

ReQON is an open-source software package, written in R and available through Bioconductor, for recalibrating base quality scores from next-generation sequencing data. ReQON produces a new BAM file with more accurate quality scores, which can improve the results of downstream analyses, along with several diagnostic plots showing the effectiveness of the recalibration.
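One simple way to check whether quality scores are well calibrated, of the kind a reported-versus-empirical quality plot would display, is to group bases by reported score and compare each group's observed error rate with the rate its score claims. The sketch below does this in Python; the `(reported_q, is_error)` input format, the +1/+2 pseudocounts, and the function names are illustrative assumptions and not ReQON's diagnostics or API.

```python
import math
from collections import defaultdict

def empirical_quality_table(bases):
    """Map each reported quality score to (n_bases, empirical Phred score).

    `bases` is an iterable of (reported_q, is_error) pairs, with is_error 0 or 1.
    """
    counts = defaultdict(lambda: [0, 0])  # reported_q -> [n_bases, n_errors]
    for q, is_error in bases:
        counts[q][0] += 1
        counts[q][1] += is_error
    table = {}
    for q in sorted(counts):
        n, e = counts[q]
        # Assumed +1/+2 pseudocounts keep the estimate finite when a bin has no errors.
        p_hat = (e + 1) / (n + 2)
        table[q] = (n, -10 * math.log10(p_hat))
    return table

def mean_calibration_error(table):
    """Base-weighted mean |reported Q - empirical Q|; 0 when every bin matches its score."""
    total = sum(n for n, _ in table.values())
    return sum(n * abs(q - emp) for q, (n, emp) in table.items()) / total

if __name__ == "__main__":
    # Toy input: 1 error among 198 bases at Q20, no errors among 998 bases at Q30.
    bases = [(20, 1)] + [(20, 0)] * 197 + [(30, 0)] * 998
    for q, (n, emp) in empirical_quality_table(bases).items():
        print(f"reported Q{q}: {n} bases, empirical Q{emp:.1f}")
```

Plotting empirical against reported scores from such a table gives a calibration curve; well-calibrated scores fall on the diagonal.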

Keywords:
Next-generation sequencing; Quality score; Recalibration; Bioinformatics; Bioconductor