This article is part of the supplement: Highlights from the Ninth International Society for Computational Biology (ISCB) Student Council Symposium 2013

Open Access Meeting abstract

alleHap: an efficient algorithm to reconstruct zero-recombinant haplotypes from parent-offspring pedigrees

Nathan Medina-Rodríguez1,2*, Angelo Santana1, Ana M Wägner3 and José M Quinteiro2

  • * Corresponding author: Nathan Medina-Rodríguez

Author Affiliations

1 Department of Mathematics, Universidad de Las Palmas de Gran Canaria, Campus de Tafira, 35017 Las Palmas, Spain

2 IUMA - Information and Communication Systems, Universidad de Las Palmas de Gran Canaria, Campus de Tafira, 35017 Las Palmas, Spain

3 Department of Medical and Surgical Sciences, Universidad de Las Palmas de Gran Canaria, Campus de Tafira, 35017 Las Palmas, Spain

BMC Bioinformatics 2014, 15(Suppl 3):A6  doi:10.1186/1471-2105-15-S3-A6

The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2105/15/S3/A6


Published: 11 February 2014

© 2014 Medina-Rodríguez et al; licensee BioMed Central Ltd.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Background

Haplotype inference is an essential stage in genetic linkage analysis, and estimation methods are also frequently used to reconstruct haplotypes in current genetic association studies. Most association studies focus on phasing haplotypes in recombinant DNA regions of unrelated individuals, using likelihood-based methods that infer the alleles present at several loci by means of very time-consuming probabilistic algorithms.

So far, the literature has not analyzed haplotypes using deterministic techniques, and there are hardly any alternative methods for constructing haplotypes from non-recombinant DNA regions, even though computational inference with probabilistic models may produce a large number of incorrect inferences.

Description and results

We have developed an algorithm called alleHap, which imputes alleles in parent-offspring pedigree databases with missing family members and then constructs the corresponding unambiguous haplotypes.

The alleHap algorithm is based on a preliminary analysis of all the combinations that may occur in the genotypes of a family, given that, as a result of meiosis, each member must carry exactly two alleles, one from each parent. The analysis distinguishes seven cases, as described in [1], some of which are divided into up to three variants, each representing a different combination of alleles among the family members (Table 1).

Table 1. Possible allelic combinations in a parent-offspring pedigree
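
The rule that each family member carries one allele from each parent can be illustrated for a single marker. The following Python sketch is not the alleHap implementation and does not reproduce the cases and variants of Table 1. The function name `alleles_from_missing_parent` and the representation of a genotype as a pair of allele labels are assumptions made purely for illustration.

```python
from typing import List, Set, Tuple

Genotype = Tuple[str, str]  # two allele labels at one marker (hypothetical encoding)


def alleles_from_missing_parent(known_parent: Genotype,
                                children: List[Genotype]) -> Set[str]:
    """Return the alleles that the ungenotyped parent must carry at one marker.

    Each child receives one allele from each parent, so an allele that the
    genotyped parent cannot have transmitted must come from the missing one.
    """
    known = set(known_parent)
    inferred: Set[str] = set()
    for a, b in children:
        a_ok, b_ok = a in known, b in known
        if not a_ok and not b_ok:
            # Neither allele can come from the genotyped parent.
            raise ValueError(f"Mendelian inconsistency in child {(a, b)}")
        if a == b:
            # Homozygous child: one copy came from each parent.
            inferred.add(a)
        elif a_ok and not b_ok:
            inferred.add(b)
        elif b_ok and not a_ok:
            inferred.add(a)
        # Otherwise both alleles occur in the genotyped parent and the
        # child is heterozygous: the origin is ambiguous, so nothing is imputed.
    if len(inferred) > 2:
        raise ValueError("More than two alleles inferred for one parent")
    return inferred


if __name__ == "__main__":
    father = ("A", "C")
    kids = [("A", "G"), ("C", "T")]
    print(sorted(alleles_from_missing_parent(father, kids)))
```

In this toy family the two children reveal both alleles of the missing parent. A single child who is heterozygous for the same two alleles as the genotyped parent would reveal none, which is the kind of ambiguity a deterministic method has to leave unresolved.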

This classification into cases and variants allows the algorithm to impute missing values in the loaded database efficiently and then to construct the corresponding unambiguous haplotypes. Furthermore, the algorithm places no limit on the number of SNPs per haplotype, so haplotypes spanning more than two SNPs can be constructed.
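
To make the idea of an unambiguous haplotype concrete, the sketch below phases one child against two genotyped parents, marker by marker, and leaves a position unresolved whenever more than one parental origin is possible. It is a simplified illustration under the zero-recombination assumption, not the procedure used by alleHap. The function name `phase_child` and the list-of-pairs genotype encoding are hypothetical.

```python
from typing import List, Optional, Tuple

Genotype = Tuple[str, str]          # two allele labels at one marker
Haplotype = List[Optional[str]]     # None marks a position that stays ambiguous


def phase_child(father: List[Genotype],
                mother: List[Genotype],
                child: List[Genotype]) -> Tuple[Haplotype, Haplotype]:
    """Split a child's genotypes into a paternal and a maternal haplotype.

    At each marker, every assignment (allele from father, allele from mother)
    compatible with the parental genotypes is listed. The position is phased
    only if exactly one assignment remains.
    """
    paternal: Haplotype = []
    maternal: Haplotype = []
    for f, m, (a, b) in zip(father, mother, child):
        options = set()
        if a in f and b in m:
            options.add((a, b))
        if b in f and a in m:
            options.add((b, a))
        if not options:
            raise ValueError(f"Mendelian inconsistency at genotype {(a, b)}")
        if len(options) == 1:
            p, q = next(iter(options))
        else:
            p = q = None  # both origins possible: leave unresolved
        paternal.append(p)
        maternal.append(q)
    return paternal, maternal


if __name__ == "__main__":
    father = [("A", "C"), ("G", "G"), ("T", "C")]
    mother = [("C", "C"), ("G", "T"), ("T", "C")]
    child = [("A", "C"), ("G", "T"), ("T", "C")]
    print(phase_child(father, mother, child))
```

The loop runs over however many markers are supplied, so nothing in this approach limits the haplotype length. The third marker in the example stays unresolved because the child and both parents share the same heterozygous genotype. Without recombination, a sibling who is informative at that marker could in principle settle it, although that step is not shown here.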

An analysis of all possible combinations in a parent-offspring pedigree in which parents may be missing, provided that at least one child has been genotyped, shows that in theory three parental haplotypes can be imputed unequivocally in 92.3% of cases, even when one parent is missing. When neither parent has been genotyped, at least two haplotypes can be constructed in 36.4% of cases. Regarding offspring allele imputation with both parents fully genotyped, at least one haplotype for each child can be reconstructed in 6.1% of possible cases.

Evaluation of the results (Figure 1) shows that the three computational tasks of alleHap, namely Simulation, Imputation and Reconstruction, perform well. Their execution times remain low even for a large number of families (≤ 2000) and SNPs (≤ 50).

Figure 1. Representation of computing times according to the number of families (left) and the number of SNPs (right).

Figure 2 shows that our algorithm achieves high allele imputation rates (about 65%) even when the probability of missing parents in each family is high (>50%). Haplotype reconstruction rates vary almost linearly with the number of missing individuals per family. This is because alleHap relies mainly on the information contained in the offspring, so the more children are missing, the more difficult it is to reconstruct the family haplotypes.

Figure 2. Representation of allele imputation rates (left) and haplotype reconstruction rates (right).

Conclusions

alleHap has been tested with simulations and with the Type 1 Diabetes Genetics Consortium [2] database. Our algorithm is very robust against inconsistencies in the genotypic data and runs quickly, even when handling large amounts of data. The imputation of missing data may improve results in numerous epidemiological and genetic linkage studies.

Our algorithm could be a useful instrument for information retrieval and knowledge discovery in genetics, since it would allow epidemiological specialists to discover new intergenic patterns by studying zero-recombinant haplotypes with a larger number of SNPs from family-based databases.

References

  1. Berger-Wolf TY, et al.: Reconstructing sibling relationships in wild populations.

    Bioinformatics 2007, 23:i49-i56.

  2. Rich SS, et al.: The Type 1 Diabetes Genetics Consortium.

    Ann N Y Acad Sci 2006, 1079:1-8.