Email updates

Keep up to date with the latest news and content from BMC Systems Biology and BioMed Central.

This article is part of the supplement: Proceedings of the 23rd International Conference on Genome Informatics (GIW 2012)

Open Access Proceedings

A fast and accurate algorithm for single individual haplotyping

Minzhu Xie1*, Jianxin Wang2 and Tao Jiang3

Author Affiliations

1 College of Physics and Information Science, Hunan Normal University, Changsha 410081, P. R. China

2 School of Information Science and Engineering, Central South University, Changsha 410083, P. R. China

3 Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA

For all author emails, please log on.

BMC Systems Biology 2012, 6(Suppl 2):S8  doi:10.1186/1752-0509-6-S2-S8

Published: 12 December 2012

Abstract

Background

Due to the difficulty in separating two (paternal and maternal) copies of a chromosome, most published human genome sequences only provide genotype information, i.e., the mixed information of the underlying two haplotypes. However, phased haplotype information is needed to completely understand complex genetic polymorphisms and to increase the power of genome-wide association studies for complex diseases. With the rapid development of DNA sequencing technologies, reconstructing a pair of haplotypes from an individual's aligned DNA fragments by computer algorithms (i.e., Single Individual Haplotyping) has become a practical haplotyping approach.

Results

In the paper, we combine two measures "errors corrected" and "fragments cut" and propose a new optimization model, called Balanced Optimal Partition (BOP), for single individual haplotyping. The model generalizes two existing models, Minimum Error Correction (MEC) and Maximum Fragments Cut (MFC), and could be made either model by using some extreme parameter values. To solve the model, we design a heuristic dynamic programming algorithm H-BOP. By limiting the number of intermediate solutions at each iteration to an appropriately chosen small integer k, H-BOP is able to solve the model efficiently.

Conclusions

Extensive experimental results on simulated and real data show that when k = 8, H-BOP is generally faster and more accurate than a recent state-of-art algorithm ReFHap in haplotype reconstruction. The running time of H-BOP is linearly dependent on some of the key parameters controlling the input size and H-BOP scales well to large input data. The code of H-BOP is available to the public for free upon request to the corresponding author.