WinHAP2: an extremely fast haplotype phasing program for long genotype sequences
1 School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, P.R. China
2 Anhui Province-MOST Co-Key Laboratory of High Performance Computing and Its Application, University of Science and Technology of China, Hefei, Anhui 230027, P.R. China
3 Shenzhen Institutes of Advanced Technology, and Key Lab for Health Informatics, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, P.R. China
BMC Bioinformatics 2014, 15:164 doi:10.1186/1471-2105-15-164
Published: 30 May 2014
The haplotype phasing problem arises when screening millions of candidate genomic variations for phenotype associations. Most current computer programs that handle this problem demand substantial computing power and memory. By replacing the computation-intensive step of constructing the maximum spanning tree with a heuristic based on an estimated initial haplotype, we previously released the WinHAP algorithm version 1.0, which outperformed the other algorithms in terms of both running speed and overall accuracy.
This work further speeds up the WinHAP algorithm to version 2.0 (WinHAP2) by utilizing a divide-and-conquer strategy and the OpenMP parallel computing mode. WinHAP2 can phase 500 genotypes with 1,000,000 SNPs using just 12.8 MB of memory in 2.5 hours on a personal computer, whereas the other programs require unacceptable amounts of memory or running time. The parallel running mode further improves WinHAP2's running speed by several orders of magnitude compared with the other programs, including Beagle, SHAPEIT2 and 2SNP.
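The general shape of a divide-and-conquer strategy with parallel workers can be illustrated with a minimal Python sketch. This is not the WinHAP2 implementation: the function names (`split_windows`, `phase_block`, `phase_long_genotype`), the 0/1/2 genotype coding, and the window size are assumptions made for illustration, and `phase_block` is a trivial placeholder that resolves each heterozygous site arbitrarily rather than inferring the true phase. A real phaser must also reconcile phase across window boundaries, which this sketch does not attempt. Python threads are used only to mirror the structure of a parallel loop; they do not reproduce OpenMP's performance.

```python
from concurrent.futures import ThreadPoolExecutor


def split_windows(n_snps, window):
    """Return (start, end) index pairs covering 0..n_snps in fixed-size blocks."""
    return [(s, min(s + window, n_snps)) for s in range(0, n_snps, window)]


def phase_block(genotype_block):
    """Placeholder phaser (hypothetical, NOT the WinHAP2 heuristic).

    Genotypes are assumed coded as 0 (homozygous reference),
    1 (heterozygous) and 2 (homozygous alternate). Each site is split
    into two alleles that sum to the genotype; heterozygous sites are
    resolved arbitrarily as (0, 1).
    """
    hap_a = [g // 2 for g in genotype_block]
    hap_b = [g - a for g, a in zip(genotype_block, hap_a)]
    return hap_a, hap_b


def phase_long_genotype(genotype, window=4, workers=2):
    """Divide the sequence into blocks, phase each block independently
    in a worker pool, then concatenate the results in original order."""
    blocks = [genotype[s:e] for s, e in split_windows(len(genotype), window)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so blocks can be joined directly.
        results = list(pool.map(phase_block, blocks))
    hap_a = [allele for a, _ in results for allele in a]
    hap_b = [allele for _, b in results for allele in b]
    return hap_a, hap_b


if __name__ == "__main__":
    genotype = [0, 1, 2, 1, 0, 2, 1, 1, 2, 0]
    hap_a, hap_b = phase_long_genotype(genotype, window=4, workers=2)
    print(hap_a)
    print(hap_b)
```

Because each block is processed independently, only one block per worker needs to be held in working memory at a time, which is the usual reason such a scheme keeps the memory footprint small for very long sequences.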
WinHAP2 is an extremely fast haplotype phasing program that can handle large-scale genotyping studies with any number of SNPs reported in the current literature, and at least those expected in the near future.