Research article

Conversion events in gene clusters

Giltae Song1*, Chih-Hao Hsu2, Cathy Riemer1, Yu Zhang1, Hie Lim Kim1, Federico Hoffmann3, Louxin Zhang4, Ross C Hardison1, NISC Comparative Sequencing Program5, Eric D Green5 and Webb Miller1

Author affiliations

1 Center for Comparative Genomics and Bioinformatics, Pennsylvania State University, University Park, PA 16802, USA

2 Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health (NIH), Bethesda, MD 20892, USA

3 Department of Biochemistry and Molecular Biology, Mississippi State University, Mississippi State, MS 39760, USA

4 Department of Mathematics, National University of Singapore, 117543, Singapore

5 NIH Intramural Sequencing Center (NISC) and Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, 50 South Drive, Bethesda, MD 20892, USA

Citation and License

BMC Evolutionary Biology 2011, 11:226 doi:10.1186/1471-2148-11-226

Published: 28 July 2011



Gene clusters containing multiple similar genomic regions in close proximity are of great interest for biomedical studies because of their associations with inherited diseases. However, such regions are difficult to analyze due to their structural complexity and their complicated evolutionary histories, reflecting a variety of large-scale mutational events. In particular, conversion events can mislead inferences about the relationships among these regions, as traced by traditional methods such as construction of phylogenetic trees or multi-species alignments.


To correct the distorted information generated by such methods, we have developed an automated pipeline called CHAP (Cluster History Analysis Package) for detecting conversion events. We used this pipeline to analyze the conversion events that affected two well-studied gene clusters (α-globin and β-globin) and three gene clusters for which comparative sequence data were generated from seven primate species: CCL (chemokine ligand), IFN (interferon), and CYP2abf (part of cytochrome P450 family 2). CHAP is freely available.


These studies reveal the value of characterizing conversion events in the context of studying gene clusters in complex genomes.