Computation of haplotypes on SNPs subsets: advantage of the "global method"
- Equal contributors
1 Equipe génomique, bioinformatique et pathologies du système immunitaire, INSERM U736, 15 rue de l'École de Médecine, 75006 Paris, France
2 Chaire de Bioinformatique, Conservatoire National des Arts et Métiers, 292 rue Saint-Martin, 75003 Paris, France
3 Children's Foundation Research Center and Center of Genomics and Bioinformatics, University of Tennessee, Memphis, TN, USA
BMC Genetics 2006, 7:50 doi:10.1186/1471-2156-7-50Published: 26 October 2006
Genetic association studies aim at finding correlations between a disease state and genetic variations such as SNPs or combinations of SNPs, termed haplotypes. Some haplotypes have a particular biological meaning such as the ones derived from SNPs located in the promoters, or the ones derived from non synonymous SNPs. All these haplotypes are "subhaplotypes" because they refer only to a part of the SNPs found in the gene. Until now, subhaplotypes were directly computed from the very SNPs chosen to constitute them, without taking into account the rest of the information corresponding to the other SNPs located in the gene. In the present work, we describe an alternative approach, called the "global method", which takes into account all the SNPs known in the region and compare the efficacy of the two "direct" and "global" methods.
We used empirical haplotypes data sets from the GH1 promoter and the APOE gene, and 10 simulated datasets, and randomly introduced in them missing information (from 0% up to 20%) to compare the 2 methods. For each method, we used the PHASE haplotyping software since it was described to be the best. We showed that the use of the "global method" for subhaplotyping leads always to a better error rate than the classical direct haplotyping. The advantage provided by this alternative method increases with the percentage of missing genotyping data (diminution of the average error rate from 25% to less than 10%). We applied the global method software on the GRIV cohort for AIDS genetic associations and some associations previously identified through direct subhaplotyping were found to be erroneous.
The global method for subhaplotyping can reduce, sometimes dramatically, the error rate on patient resolutions and haplotypes frequencies. One should thus use this method in order to minimise the risk of a false interpretation in genetic studies involving subhaplotypes. In practice the global method is always more efficient than the direct method, but a combination method taking into account the level of missing information in each subject appears to be even more interesting when the level of missing information becomes larger (>10%).