Identification of polymorphic inversions from genotypes
- Equal contributors
1 Center for Research in Environmental Epidemiology (CREAL), and Institut Municipal d'Investigació Mèdica (IMIM), Barcelona 08003, Spain
2 CIBER Epidemiología y Salud Pública (CIBERESP), Barcelona 08003, Spain
3 Department of Molecular Biology, Cellular Biology and Biochemistry, and Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA
4 Department of Computer Science, and Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA
5 Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, 08193 Bellaterra, and Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain
BMC Bioinformatics 2012, 13:28 doi:10.1186/1471-2105-13-28. Published: 9 February 2012
Polymorphic inversions are a source of genetic variability with a direct impact on recombination frequencies. Given the difficulty of studying them experimentally, computational methods have been developed to infer their existence in a large number of individuals using genome-wide data of nucleotide variation. Methods based on haplotype tagging of known inversions attempt to classify individuals as having a normal or inverted allele. Other methods, which measure differences in linkage disequilibrium, attempt to identify regions with inversions but are unable to classify subjects accurately, an essential requirement for association studies.
We present a novel method that both identifies polymorphic inversions from genome-wide genotype data and classifies individuals as carrying a normal or inverted allele. Our method, a generalization of a published method for haplotype data, uses linkage between groups of SNPs to partition a set of individuals into normal and inverted subpopulations. A sliding-window scan identifies regions likely to harbor an inversion, and evidence accumulated from neighboring SNPs is used to determine the inversion status of each subject accurately. Further, our approach detects inversions directly from genotype data, which increases its usability for current genome-wide association studies (GWAS).
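To make the idea of a sliding-window scan concrete, the following is a minimal illustrative sketch in Python. It is not the statistic or the implementation used by the method described here: as a simple stand-in score it uses the squared correlation between genotype vectors (coded 0/1/2) at the two SNPs that bound each candidate window. The function names, the window width expressed in SNP counts, the threshold and the toy data are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two genotype vectors.

    Returns 0.0 when either SNP is monomorphic (zero variance).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def scan_windows(
    genotypes: List[Sequence[int]], window: int, threshold: float
) -> List[Tuple[int, int, float]]:
    """Slide a window of `window` SNPs along the region.

    `genotypes[k]` holds the 0/1/2 genotype codes of all individuals at SNP k.
    Each window is scored by the stand-in statistic (r^2 between its two
    bounding SNPs); windows scoring at least `threshold` are reported as
    (left_index, right_index, score).
    """
    hits = []
    for left in range(len(genotypes) - window):
        right = left + window
        score = r_squared(genotypes[left], genotypes[right])
        if score >= threshold:
            hits.append((left, right, score))
    return hits


# Toy data (hypothetical): 6 SNPs genotyped in 6 individuals.
genotypes = [
    [0, 1, 2, 0, 1, 2],
    [0, 0, 2, 2, 1, 1],
    [0, 1, 0, 1, 0, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 2, 2, 1, 1],
    [0, 0, 1, 1, 2, 2],
]
hits = scan_windows(genotypes, window=3, threshold=0.8)
print(hits)
```

In this toy example only the window bounded by SNPs 1 and 4 is reported, because those two SNPs carry identical genotypes across individuals. A real scan would replace the stand-in score with a model-based statistic and would go on to classify each individual's inversion status.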
We demonstrate the accuracy of our method in detecting inversions and classifying individuals on genotypes simulated in a principled way, by evolving an inversion event within a coalescent model. We apply our method to real genotype data from HapMap Phase III to characterize the inversion status of two known inversions within the regions 17q21 and 8p23 across 1184 individuals. Finally, we scan the full genomes of the European origin (CEU) and Yoruba (YRI) HapMap samples. We find population-based evidence for 9 out of 15 well-established autosomal inversions, and for 52 regions previously predicted by independent experimental methods in ten (9+1) individuals [3,4]. We provide efficient implementations of both the genotype and haplotype methods as a unified R package, inveRsion.