Open Access Software

Genotype calling in tetraploid species from bi-allelic marker data using mixture models

Roeland E Voorrips1,3*, Gerrit Gort2 and Ben Vosman1,3

Author Affiliations

1 Plant Breeding Department, Wageningen University and Research Centre, Wageningen, The Netherlands

2 Biometris, Wageningen University and Research Centre, Wageningen, The Netherlands

3 Centre for BioSystems Genomics, P.O. Box 98, 6700 AB Wageningen, The Netherlands

BMC Bioinformatics 2011, 12:172  doi:10.1186/1471-2105-12-172

Published: 19 May 2011

Abstract

Background

Automated genotype calling in tetraploid species was until recently not possible, which hampered genetic analysis. Modern genotyping assays often produce two signals, one for each allele of a bi-allelic marker. While ample software is available to obtain genotypes (homozygous for either allele, or heterozygous) for diploid species from these signals, such software is not available for tetraploid species, in which each sample may be scored as one of five alternative genotypes (aaaa, baaa, bbaa, bbba and bbbb; nulliplex to quadruplex).

Results

We present a novel algorithm, implemented in the R package fitTetra, to assign genotypes for bi-allelic markers to tetraploid samples from genotyping assays that produce intensity signals for both alleles. The algorithm is based on fitting several mixture models with five components, one for each of the five possible genotypes. The models differ in the number of parameters that specify the relation between the five component means, and some of them constrain the mixing proportions to conform to Hardy-Weinberg equilibrium (HWE) ratios. The software rejects markers that do not allow reliable genotyping for the majority of the samples, and it assigns a missing score to samples that cannot be assigned to one of the five possible genotypes with sufficient confidence.
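
The two ideas in this paragraph, HWE-constrained mixing proportions and missing scores for low-confidence samples, can be illustrated with a minimal Python sketch. It is not the fitTetra implementation or its API. It assumes that each sample is summarized by a single signal ratio, that the five components are normal distributions, that HWE proportions follow a binomial distribution over allele dosage (no double reduction), and that a sample is called only when its highest posterior probability reaches a threshold. The function names and the 0.95 threshold are hypothetical.

```python
from math import comb, exp, pi, sqrt


def hwe_proportions(p, ploidy=4):
    """Expected dosage frequencies (0..ploidy copies of allele b) under HWE.

    Assumption: dosage follows Binomial(ploidy, p), where p is the
    population frequency of allele b.
    """
    return [comb(ploidy, k) * p**k * (1 - p) ** (ploidy - k)
            for k in range(ploidy + 1)]


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))


def call_genotype(x, means, sds, weights, min_posterior=0.95):
    """Return the dosage with the highest posterior probability,
    or None (missing score) if that posterior is below min_posterior.

    min_posterior is a hypothetical threshold chosen for illustration.
    """
    dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(dens)
    if total == 0.0:
        return None  # x is too far from every component to be scored
    post = [d / total for d in dens]
    best = max(range(len(post)), key=post.__getitem__)
    return best if post[best] >= min_posterior else None


# Illustrative, made-up component parameters for one marker.
weights = hwe_proportions(0.5)  # [0.0625, 0.25, 0.375, 0.25, 0.0625]
means = [0.0, 0.25, 0.5, 0.75, 1.0]
sds = [0.03] * 5

clear_call = call_genotype(0.5, means, sds, weights)      # near the duplex mean
ambiguous = call_genotype(0.375, means, sds, weights)     # between two means
```

In this sketch a ratio close to a component mean receives that dosage, while a ratio halfway between two means is returned as missing because neither genotype reaches the threshold.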

Conclusions

We have validated the software with data from a collection of 224 potato varieties assayed with an Illumina GoldenGate™ 384 SNP array and shown that all SNPs with informative ratio distributions are fitted. Almost all fitted models appear to be correct based on visual inspection and comparison with diploid samples. When the collection of potato varieties is analyzed as if it were a population, almost all markers seem to be in Hardy-Weinberg equilibrium. The R package fitTetra is freely available under the GNU Public License from http://www.plantbreeding.wur.nl/UK/software_fitTetra.html and as Additional files with this article.