BMC Bioinformatics

Open Access Methodology article

Global haplotype partitioning for maximal associated SNP pairs

Ali Katanforoush1*, Mehdi Sadeghi2,3, Hamid Pezeshk4 and Elahe Elahi5

Author Affiliations

1 Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran

2 National Institute of Genetics Engineering and Biotechnology, Tehran, Iran

3 School of Computer Science, Institute for Studies in Theoretical Physics and Mathematics, Tehran, Iran

4 School of Mathematics, Statistics and Computer Science, and Center of Excellence in Biomathematics, College of Science, University of Tehran, Tehran, Iran

5 Department of Biology, College of Science, University of Tehran, Tehran, Iran

BMC Bioinformatics 2009, 10:269 doi:10.1186/1471-2105-10-269

Published: 27 August 2009

Abstract

Background

Global partitioning based on pairwise associations of SNPs has not previously been used to define haplotype blocks within genomes. Here, we define an association index based on linkage disequilibrium (LD) between SNP pairs. We use Fisher's exact test to assess the statistical significance of the LD estimator. By this test, each SNP pair is characterized as associated, independent, or not statistically significant. We set limits on the maximum acceptable proportion of independent pairs within all blocks and search for the partitioning with the maximal proportion of associated SNP pairs. Essentially, this model reduces to a constrained optimization problem, the solution of which is obtained by iterating a dynamic programming algorithm.
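As an illustration of the pairwise test described above, the following minimal sketch computes a two-sided Fisher's exact test p-value and the normalized LD coefficient D' from the four haplotype counts of a SNP pair. The function names, the use of D' as the LD estimator, and the 0.05 threshold are assumptions made for this example and are not taken from the article. The abstract does not state the rule that separates "independent" from "not statistically significant" pairs, so the sketch only distinguishes significant association from everything else.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables that are no more probable than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def d_prime(a, b, c, d):
    """Normalized LD coefficient D' from haplotype counts AB, Ab, aB, ab."""
    n = a + b + c + d
    p_a, p_b = (a + b) / n, (a + c) / n
    d_raw = a / n - p_a * p_b
    if d_raw >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d_raw / d_max if d_max > 0 else 0.0


def is_associated(a, b, c, d, alpha=0.05):
    """Hypothetical rule: call a pair associated when the test is significant."""
    return fisher_exact_two_sided(a, b, c, d) < alpha
```

For example, `is_associated(10, 0, 0, 10)` evaluates a pair in which only two of the four possible haplotypes are observed, the textbook case of complete LD.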

Results

In comparison with other methods, our algorithm reports blocks of larger average size. Nevertheless, the haplotype diversity within the blocks is captured by a small number of tagSNPs. Resampling HapMap haplotypes under a block-based model of recombination showed that our algorithm robustly reproduces the same partitioning for recombinant samples. In simulations of a case-control association study aimed at mapping a single-locus trait, evaluated with a block-based statistical test, our algorithm performed better than previously reported models. Compared with other haplotype block partitioning methods, it also performed best at detecting recombination hotspots.

Conclusion

Our proposed method divides chromosomes into regions within which allelic associations of SNP pairs are maximized. This approach presents a native design for dimension reduction in genome-wide association studies. Our results show that pairwise allelic association of SNPs can describe various features of genomic variation, in particular recombination hotspots.
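The kind of partitioning summarized here can be sketched as a dynamic program over contiguous blocks. This is a simplified illustration and not the authors' algorithm: it maximizes the count (rather than the proportion) of associated pairs falling inside blocks, it performs a single dynamic programming pass instead of the iterated scheme mentioned in the abstract, and the names `partition_blocks`, `labels`, and `max_indep` are hypothetical. Each SNP pair is assumed to have been labeled already as 'A' (associated), 'I' (independent), or 'N' (not statistically significant).

```python
def partition_blocks(labels, max_indep=0.1):
    """Partition SNPs 0..n-1 into contiguous blocks.

    labels[i][j] for i < j is 'A', 'I' or 'N'. A block is allowed only if
    at most a fraction max_indep of its pairs are labeled 'I'.
    Returns (score, blocks), where blocks are half-open (start, end) ranges.
    """
    n = len(labels)
    best = [0] * (n + 1)  # best[e]: max associated within-block pairs for SNPs 0..e-1
    cut = [0] * (n + 1)   # cut[e]: start index of the last block in that optimum
    for e in range(1, n + 1):
        assoc = indep = 0
        best[e], cut[e] = -1, e - 1
        for s in range(e - 1, -1, -1):
            # Extend the candidate block leftwards to include SNP s.
            for k in range(s + 1, e):
                assoc += labels[s][k] == "A"
                indep += labels[s][k] == "I"
            pairs = (e - s) * (e - s - 1) // 2
            if pairs and indep > max_indep * pairs:
                continue  # block [s, e) holds too many independent pairs
            if best[s] + assoc > best[e]:
                best[e], cut[e] = best[s] + assoc, s
    # Walk the stored cut points backwards to recover the blocks.
    blocks, e = [], n
    while e > 0:
        blocks.append((cut[e], e))
        e = cut[e]
    return best[n], blocks[::-1]
```

Because the constraint is a proportion, an infeasible block can become feasible again when extended, which is why the inner loop skips infeasible candidates rather than stopping at the first one.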