This article is part of the supplement: Proceedings of the 14th European workshop on QTL mapping and marker assisted selection (QTL-MAS)
Haplotype inference based on Hidden Markov Models in the QTL-MAS 2010 multi-generational dataset
Department of Information Technology, Uppsala University, Lägerhyddsvägen 2, Uppsala, Sweden
BMC Proceedings 2011, 5(Suppl 3):S10 doi:10.1186/1753-6561-5-S3-S10Published: 27 May 2011
We have previously demonstrated an approach for efficient computation of genotype probabilities, and more generally probabilities of allele inheritance in inbred as well as outbred populations. That work also included an extension for haplotype inference, or phasing, using Hidden Markov Models. Computational phasing of multi-thousand marker datasets has not become common as of yet. In this communication, we further investigate the method presented earlier for such problems, in a multi-generational dataset simulated for QTL detection.
When analyzing the dataset simulated for the 14th QTLMAS workshop, the phasing produced showed zero deviations compared to original simulated phase in the founder generation. In total, 99.93% of all markers were correctly phased. 97.68% of the individuals were correct in all markers over all 5 simulated chromosomes. Results were produced over a weekend on a small computational cluster. The specific algorithmic adaptations needed for the Markov model training approach in order to reach convergence are described.
Our method provides efficient, near-perfect haplotype inference allowing the determination of completely phased genomes in dense pedigrees. These developments are of special value for applications where marker alleles are not corresponding directly to QTL alleles, thus necessitating tracking of allele origin, and in complex multi-generational crosses. The cnF2freq codebase, which is in a current state of active development, is available under a BSD-style license.