Open Access Highly Accessed Software

WinHAP2: an extremely fast haplotype phasing program for long genotype sequences

Weihua Pan1,2, Yanan Zhao1,2, Yun Xu1,2* and Fengfeng Zhou3*

Author Affiliations

1 School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, P.R. China

2 Anhui Province-MOST Co-Key Laboratory of High Performance Computing and Its Application, University of Science and Technology of China, Hefei, Anhui 230027, P.R. China

3 Shenzhen Institutes of Advanced Technology, and Key Lab for Health Informatics, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, P.R. China

BMC Bioinformatics 2014, 15:164  doi:10.1186/1471-2105-15-164

Published: 30 May 2014

Abstract

Background

The haplotype phasing problem aims to screen for phenotype-associated genomic variations among millions of candidates. Most current computer programs handle this problem at a high cost in computing power and memory. By replacing the computation-intensive step of constructing the maximum spanning tree with a heuristic based on an estimated initial haplotype, we previously released version 1.0 of the WinHAP algorithm, which outperforms the other algorithms in both running speed and overall accuracy.

Results

This work further speeds up the WinHAP algorithm, yielding version 2.0 (WinHAP2), by using a divide-and-conquer strategy and the OpenMP parallel computing mode. WinHAP2 can phase 500 genotypes with 1,000,000 SNPs using just 12.8 MB of memory and 2.5 hours on a personal computer, whereas the other programs require unacceptable memory or running times. The parallel running mode further improves WinHAP2's running speed by several orders of magnitude compared with the other programs, including Beagle, SHAPEIT2 and 2SNP.
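
The general shape of a divide-and-conquer scheme with parallel block processing can be sketched as follows. This is only an illustration under stated assumptions, not the WinHAP2 implementation: the function names, the genotype coding (0 and 1 for homozygous sites, 2 for heterozygous sites) and the per-block work (counting heterozygous sites) are hypothetical placeholders, and a Python thread pool stands in for OpenMP purely to show the structure, not to reproduce its performance.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(n_snps, block_size):
    """Return consecutive (start, end) index pairs that cover 0..n_snps."""
    return [(start, min(start + block_size, n_snps))
            for start in range(0, n_snps, block_size)]


def count_heterozygous(genotypes, start, end):
    """Placeholder per-block task (NOT a phasing step).

    Counts the ambiguous sites in one block, assuming the hypothetical
    coding 0/1 = homozygous and 2 = heterozygous.
    """
    return sum(1 for genotype in genotypes
               for site in genotype[start:end] if site == 2)


def process_in_parallel(genotypes, block_size, workers=4):
    """Divide the SNP axis into blocks and handle each block independently."""
    blocks = split_into_blocks(len(genotypes[0]), block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves block order, so results line up with blocks.
        results = list(pool.map(
            lambda block: count_heterozygous(genotypes, *block), blocks))
    return blocks, results


if __name__ == "__main__":
    toy_genotypes = [[0, 2, 1, 2, 0],
                     [2, 2, 0, 1, 2]]
    print(process_in_parallel(toy_genotypes, block_size=2))
```

Because each block touches only its own slice of the SNP axis, the blocks can be processed independently and their results combined afterwards; how WinHAP2 merges the phased blocks is not described in this abstract and is not modelled here.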

Conclusions

WinHAP2 is an extremely fast haplotype phasing program that can handle large-scale genotyping studies with any number of SNPs reported in the current literature, and likely those of the near future as well.

Keywords:
Haplotype phasing; Genotype; SNP; Long sequence; Parallel computing