Open Access Highly Accessed Methodology article

Probabilistic alignment leads to improved accuracy and read coverage for bisulfite sequencing data

Changjin Hong1, Nathan L Clement2, Spencer Clement3, Saher Sue Hammoud4,5, Douglas T Carrell4, Bradley R Cairns5, Quinn Snell3, Mark J Clement3 and William Evan Johnson1*

* Corresponding author: William E Johnson (wej@bu.edu)

Author Affiliations

1 Division of Computational Biomedicine, Boston University School of Medicine, Boston, MA, USA

2 Department of Computer Science, University of Texas, Austin, TX, USA

3 Department of Computer Science, Brigham Young University, Provo, UT, USA

4 IVF and Andrology Laboratories, Departments of Surgery, Obstetrics and Gynecology, and Physiology, University of Utah School of Medicine, Salt Lake City, UT, USA

5 Department of Oncological Sciences, Huntsman Cancer Institute, Salt Lake City, UT, USA

BMC Bioinformatics 2013, 14:337  doi:10.1186/1471-2105-14-337

Published: 21 November 2013

Abstract

Background

DNA methylation has been linked to many important biological phenomena. Researchers have recently begun to sequence bisulfite-treated DNA to determine its pattern of methylation. However, sequencing reads from bisulfite-converted DNA can vary significantly from the reference genome because of incomplete bisulfite conversion, genome variation, sequencing errors, and poor quality bases. It is therefore often difficult to align reads to the correct locations in the reference genome. Bisulfite sequencing experiments also carry the additional complexity of estimating the DNA methylation levels within the sample.

Results

Here, we present GNUMAP-bs, a highly accurate probabilistic algorithm that extends the Genomic Next-generation Universal MAPper (GNUMAP) to accommodate bisulfite sequencing data and addresses the computational problems associated with aligning such data to a reference genome. GNUMAP-bs integrates uncertainty from read and mapping qualities to help resolve the difference between poor quality bases and the ambiguity inherent in bisulfite conversion. We tested GNUMAP-bs and other commonly used bisulfite alignment methods on both simulated and real bisulfite reads and found that GNUMAP-bs and the other dynamic programming methods were more accurate than the more heuristic methods.
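
To illustrate how base quality and bisulfite ambiguity can be combined in a single per-base score, the following is a minimal sketch and not the GNUMAP-bs implementation. It assumes a forward-strand read, spreads the Phred error probability evenly over the three other bases, and uses a hypothetical `meth_rate` parameter for the prior probability that a reference cytosine is methylated. The function names are illustrative only.

```python
def phred_to_error(q):
    """Convert a Phred quality score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def base_probs(called, q):
    """Distribution over the true read base, given the called base and its quality.

    Assumption: the error mass is split evenly across the three other bases.
    """
    e = phred_to_error(q)
    return {b: (1 - e) if b == called else e / 3 for b in "ACGT"}


def match_prob(ref_base, called, q, meth_rate=0.5):
    """Probability that the read base is consistent with the reference base.

    Assumption: forward-strand bisulfite read, so a reference C appears as C
    when methylated and as T when unmethylated and converted. meth_rate is a
    hypothetical prior, not a value taken from the paper.
    """
    p = base_probs(called, q)
    if ref_base == "C":
        return meth_rate * p["C"] + (1 - meth_rate) * p["T"]
    return p[ref_base]
```

Under these assumptions, a high-quality T aligned to a reference C scores far better than the same T aligned to a reference G, while a low-quality T contributes a weaker score in either case, which is the kind of distinction between poor quality bases and conversion ambiguity that the text describes.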

Conclusions

The GNUMAP-bs aligner is a highly accurate alignment approach for processing the data from bisulfite sequencing experiments. The GNUMAP-bs algorithm is freely available for download at http://dna.cs.byu.edu/gnumap. The software runs on multiple threads and multiple processors to increase the alignment speed.

Keywords:
DNA methylation; Bisulfite sequencing; Probabilistic alignment; Parallel processing