Software (Open Access)

Haplotype inference from unphased SNP data in heterozygous polyploids based on SAT

Jost Neigenfind1,2*, Gabor Gyetvai3, Rico Basekow1,2, Svenja Diehl2, Ute Achenbach3, Christiane Gebhardt3, Joachim Selbig4 and Birgit Kersten1,2*

Author Affiliations

1 Bioinformatics, GabiPD team, Max Planck Institute of Molecular Plant Physiology, 14424 Potsdam-Golm, Germany

2 Bioinformatics, Former RZPD German Resource Center for Genome Research GmbH, Heubnerweg 6, D-14059, Berlin, Germany

3 Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, 50829 Köln, Germany

4 Institute of Biochemistry and Biology, University of Potsdam, c/o MPI-MP, 14424 Potsdam, Germany


BMC Genomics 2008, 9:356 doi:10.1186/1471-2164-9-356

Published: 30 July 2008



Haplotype inference based on unphased SNP markers is an important task in population genetics. Although several approaches exist for inferring haplotypes in diploid species, existing software is not suitable for inferring haplotypes from unphased SNP data in polyploid species such as the cultivated potato (Solanum tuberosum), which is tetraploid and highly heterozygous.
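
To make the problem concrete, the Python sketch below shows one way to represent unphased tetraploid SNP data and to test whether a multiset of haplotypes explains a genotype. It assumes that the allele dosage is known at every SNP (for example AAAG); this representation and the names `explains` and `phasings` are illustrative assumptions, not SATlotyper's input format or API.

```python
from collections import Counter
from itertools import combinations_with_replacement
from typing import List, Sequence, Tuple

Haplotype = Tuple[str, ...]             # one allele per SNP position
Genotype = Tuple[Tuple[str, ...], ...]  # per SNP: the k alleles (order ignored)


def explains(haplotypes: Sequence[Haplotype], genotype: Genotype) -> bool:
    """True if the k haplotypes reproduce the allele dosage at every SNP."""
    ploidy = len(genotype[0])
    if len(haplotypes) != ploidy:
        return False
    return all(
        Counter(h[site] for h in haplotypes) == Counter(alleles)
        for site, alleles in enumerate(genotype)
    )


def phasings(candidates: Sequence[Haplotype], genotype: Genotype) -> List[Tuple[Haplotype, ...]]:
    """All multisets of `ploidy` candidate haplotypes that explain `genotype`."""
    ploidy = len(genotype[0])
    return [
        combo
        for combo in combinations_with_replacement(candidates, ploidy)
        if explains(combo, genotype)
    ]


if __name__ == "__main__":
    # Tetraploid genotype at two SNPs: dosage AAAG at SNP 1, CCTT at SNP 2
    g = (("A", "A", "A", "G"), ("C", "C", "T", "T"))
    print(explains([("A", "C"), ("A", "C"), ("A", "T"), ("G", "T")], g))  # True
    cands = [("A", "C"), ("A", "T"), ("G", "C"), ("G", "T")]
    print(len(phasings(cands, g)))  # 2
```

The two phasings found for `g` illustrate why unphased polyploid data are ambiguous and why a selection criterion such as parsimony is needed.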


Here we present the software SATlotyper, which can handle polyploid and polyallelic data. SATlotyper formulates Haplotype Inference by Pure Parsimony, i.e. the search for the smallest set of distinct haplotypes that explains all genotypes, as a Boolean satisfiability (SAT) problem. Once a haplotype inference has been found, the software excludes it from further search, which allows alternative inferences to be calculated. Because it is not known which of the multiple haplotype inferences is best supported by the given unphased data set, we use a bootstrapping procedure to score the alternative inferences. Finally, the bootstrapping scores can be used to optimise the phased genotypes belonging to a given haplotype inference. The program was evaluated with simulated and experimental SNP data generated for heterozygous tetraploid potato populations. We show that, rather than taking the first haplotype inference reported by the program, the quality of the final result can be significantly improved by applying additional methods that include scoring of the alternative haplotype inferences and genotype optimisation. For a sub-population of nineteen individuals, the predictions computed by SATlotyper were compared directly with haplotypes obtained experimentally by sequencing cloned amplicons. Prediction and experiment gave similar results for both the inferred haplotypes and the phased genotypes.
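
Pure parsimony asks for the smallest set of distinct haplotypes such that every genotype is explained by `ploidy` haplotypes drawn from that set. The sketch below solves this objective by exhaustive search over candidate haplotypes, which is feasible only for a handful of SNPs; it is a swapped-in brute-force illustration, not the SAT formulation used by SATlotyper. The candidate generation (every per-SNP allele combination) and all names are assumptions.

```python
from collections import Counter
from itertools import combinations, combinations_with_replacement, product
from typing import List, Optional, Set, Tuple

Haplotype = Tuple[str, ...]
Genotype = Tuple[Tuple[str, ...], ...]


def explains(haps, genotype) -> bool:
    """True if the haplotypes reproduce the allele dosage at every SNP."""
    return len(haps) == len(genotype[0]) and all(
        Counter(h[s] for h in haps) == Counter(alleles)
        for s, alleles in enumerate(genotype)
    )


def candidate_haplotypes(genotypes: List[Genotype]) -> List[Haplotype]:
    """Every combination of the alleles observed per SNP (grows exponentially)."""
    n_sites = len(genotypes[0])
    per_site = [sorted({a for g in genotypes for a in g[s]}) for s in range(n_sites)]
    return list(product(*per_site))


def pure_parsimony(genotypes: List[Genotype]) -> Optional[Set[Haplotype]]:
    """First smallest haplotype set that explains every genotype (exhaustive)."""
    ploidy = len(genotypes[0][0])
    candidates = candidate_haplotypes(genotypes)
    for size in range(1, len(candidates) + 1):  # try small sets first
        for subset in combinations(candidates, size):
            if all(
                any(explains(c, g) for c in combinations_with_replacement(subset, ploidy))
                for g in genotypes
            ):
                return set(subset)
    return None


if __name__ == "__main__":
    data = [
        (("A", "A", "A", "G"), ("C", "C", "T", "T")),
        (("A", "A", "A", "A"), ("C", "C", "C", "C")),
    ]
    print(sorted(pure_parsimony(data)))  # [('A', 'C'), ('A', 'T'), ('G', 'C')]
```

Because this search is exponential in both the number of SNPs and the number of candidates, a practical tool needs a more efficient formulation, such as the SAT encoding described above.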

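Excluding an inference that has already been found corresponds to the standard SAT technique of adding a blocking clause (a clause falsified only by the previous solution) and solving again. The minimal sketch below applies it to a CNF formula written with DIMACS-style integer literals, using exhaustive search in place of a real SAT solver. The optional `project` argument blocks only a chosen subset of variables (for HIPP, presumably the variables that select haplotypes), so that each new solution differs on those variables; whether SATlotyper blocks solutions exactly this way is an assumption.

```python
from itertools import product
from typing import Dict, Iterable, List, Optional

Clause = List[int]  # DIMACS-style literals: 3 means x3, -3 means NOT x3


def solve(clauses: List[Clause], n_vars: int) -> Optional[Dict[int, bool]]:
    """Exhaustive search standing in for a real SAT solver (tiny inputs only)."""
    for bits in product([False, True], repeat=n_vars):
        model = {v + 1: bits[v] for v in range(n_vars)}
        if all(any(model[abs(lit)] == (lit > 0) for lit in c) for c in clauses):
            return model
    return None


def all_solutions(clauses: List[Clause], n_vars: int,
                  project: Optional[Iterable[int]] = None) -> List[Dict[int, bool]]:
    """Enumerate models; after each one, add a clause that blocks it."""
    work = [list(c) for c in clauses]
    keys = sorted(project) if project is not None else list(range(1, n_vars + 1))
    found = []
    while (model := solve(work, n_vars)) is not None:
        found.append(model)
        # Falsified exactly when every projected variable repeats this model's value
        work.append([-v if model[v] else v for v in keys])
    return found


if __name__ == "__main__":
    cnf = [[1, 2]]                                  # x1 OR x2
    print(len(all_solutions(cnf, 2)))               # 3
    print(len(all_solutions(cnf, 2, project=[1])))  # 2
```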

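Bootstrap scoring and genotype optimisation could be sketched as follows. The scheme is an assumption for illustration: a haplotype's score is the fraction of bootstrap replicates (individuals resampled with replacement) whose inferred haplotype set contains it, an inference is scored by the mean score of its haplotypes, and each genotype is phased with the explaining multiset of inferred haplotypes whose scores sum highest. `infer` stands for any haplotype-inference routine and is hypothetical; the paper's exact scoring formula may differ.

```python
import random
from collections import Counter
from itertools import combinations_with_replacement
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Haplotype = Tuple[str, ...]
Genotype = Tuple[Tuple[str, ...], ...]


def explains(haps, genotype) -> bool:
    """True if the haplotypes reproduce the allele dosage at every SNP."""
    return len(haps) == len(genotype[0]) and all(
        Counter(h[s] for h in haps) == Counter(alleles)
        for s, alleles in enumerate(genotype)
    )


def bootstrap_scores(genotypes: List[Genotype],
                     infer: Callable[[List[Genotype]], Iterable[Haplotype]],
                     n_reps: int = 100, seed: int = 0) -> Dict[Haplotype, float]:
    """Fraction of bootstrap replicates whose inferred set contains each haplotype."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(n_reps):
        sample = [rng.choice(genotypes) for _ in genotypes]  # resample individuals
        counts.update(set(infer(sample)))
    return {h: c / n_reps for h, c in counts.items()}


def score_inference(haplotypes: Iterable[Haplotype], scores: Dict[Haplotype, float]) -> float:
    """Mean bootstrap support of the haplotypes in one inference."""
    values = [scores.get(h, 0.0) for h in haplotypes]
    return sum(values) / len(values)


def optimise_phasing(genotype: Genotype, haplotypes: Iterable[Haplotype],
                     scores: Dict[Haplotype, float]) -> Optional[Tuple[Haplotype, ...]]:
    """Explaining multiset of inferred haplotypes with the highest total score."""
    ploidy = len(genotype[0])
    options = [c for c in combinations_with_replacement(sorted(haplotypes), ploidy)
               if explains(c, genotype)]
    return max(options, key=lambda c: sum(scores.get(h, 0.0) for h in c), default=None)


if __name__ == "__main__":
    g = (("A", "A", "A", "G"), ("C", "C", "T", "T"))
    haps = [("A", "C"), ("A", "T"), ("G", "C"), ("G", "T")]
    support = {("A", "C"): 0.9, ("A", "T"): 0.8, ("G", "C"): 0.1, ("G", "T"): 0.7}
    print(optimise_phasing(g, haps, support))
```

Aggregating haplotype support with the mean is one simple design choice; another aggregate, such as the minimum support, would be an equally plausible alternative.
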
Our results suggest that Haplotype Inference by Pure Parsimony can be solved efficiently with the SAT approach, even for unphased SNP data from heterozygous polyploids. SATlotyper is freeware and is distributed as a Java JAR file. The software can be downloaded from the webpage of the GABI Primary Database (GabiPD). SATlotyper can provide haplotype information for use in haplotype association mapping studies of polyploid plants.