Linked region detection using high-density SNP genotype data via the minimum recombinant model of pedigree haplotype inference
1 Department of Computer Science, City University of Hong Kong, Hong Kong, PR China
2 Department of Paediatrics & Adolescent Medicine, LKS Faculty of Medicine, The University of Hong Kong, 21 Sassoon Road, Hong Kong, PR China
BMC Bioinformatics 2009, 10:216 doi:10.1186/1471-2105-10-216Published: 15 July 2009
With the rapid development of high-throughput genotyping technologies, efficient methods for identifying linked regions using high-density SNP genotype data have become more and more important. Recently, a deterministic method that works very well on SNP genotyping data has been developed (Lin et al. Bioinformatics 2008, 24(1): 86–93). However, that program can only work on a limited number of family structures. In particular, the results (if any) will be poor when the genotype data for the whole chromosome of one of the parents in a nuclear family is missing.
We have developed a software package (LIden) for identifying linked regions using high-density SNP genotype data. We focus on handling the case where the genotype data for the whole chromosome of one of the parents in a nuclear family is missing. We use the minimum recombinant model for haplotype inference in pedigrees. Several local optimization algorithms are used to infer the haplotype of each individual and determine the linked regions based on the inferred haplotype data. We have developed a more flexible method to combine nuclear families to further refine (reduce the length of) the linked regions.
Our new package (LIden) is efficient software for linked region detection using high-density SNP genotype data. LIden can handle some important cases where the existing programs do not work well. In particular, the new package can handle many cases where the genotype data of one of the two parents is missing for the entire chromosome. The running time of the program is O(mn), where m is the number of members in the family and n is the number of SNP sites in the chromosome. LIden is specifically suitable for handling big sized families. This research also demonstrates another practical use of the minimum recombinant model for haplotype inference in pedigrees.