Research article

Genome and gene alterations by insertions and deletions in the evolution of human and chimpanzee chromosome 22

Natalia Volfovsky (1), Taras K Oleksyk (2,3,4), Kristine C Cruz (2), Ann L Truelove (2,3), Robert M Stephens (1) and Michael W Smith (2,3)*

Author Affiliations

1 Advanced Biomedical Computing Center, Advanced Technology Program, SAIC-Frederick, National Cancer Institute at Frederick, Frederick, MD 21702, USA

2 Laboratory of Genomic Diversity, National Cancer Institute at Frederick, Frederick, MD 21702, USA

3 Basic Research Program, SAIC-Frederick, National Cancer Institute at Frederick, Frederick, MD 21702, USA

4 Department of Biology, University of Puerto Rico, Mayagüez, PR 00681, Puerto Rico

BMC Genomics 2009, 10:51  doi:10.1186/1471-2164-10-51

Published: 26 January 2009



Understanding the structure and function of the human genome requires knowledge of the genomes of our closest living relatives, the primates. Nucleotide insertions and deletions (indels) play a significant role in the differentiation that underlies phenotypic differences between humans and chimpanzees. In this study, we evaluated the distribution, evolutionary history, and function of indels found by comparing syntenic regions of the human and chimpanzee genomes.


Specifically, we identified 6,279 indels of 10 bp or greater in a ~33 Mb alignment between human and chimpanzee chromosome 22. After excluding those in repetitive DNA, 1,429 indels (23%) remained. This group was characterized according to local or genome-wide repetitive nature, size, location relative to genes, and other genomic features. Using local structure analysis, we defined three major classes of these indels: (i) those found uniquely, with no additional copies of the indel sequence in the surrounding 10 kb region, (ii) those with at least one exact copy found nearby, and (iii) those with similar but not identical copies found locally. Among these classes, we encountered a high number of exactly repeated indel sequences, most likely due to recent duplications. Nearly half of these indels (683 of 1,429) were in proximity to known human genes. Coding sequences and splice sites contained significantly fewer of these indels than expected by chance, suggesting that selection limits their persistence. A subset of indels from coding regions was experimentally validated, and their impacts were predicted based on direct sequencing in several human populations as well as chimpanzees, bonobos, gorillas, and two subspecies of orangutans.
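The three-class scheme described above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the ungapped sliding-window comparison, and the 0.8 identity threshold for a "similar" copy are all assumptions made for illustration. The flanking sequences stand in for the surrounding region and are assumed to exclude the indel itself.

```python
from typing import Literal

IndelClass = Literal["unique", "exact_copy", "similar_copy"]


def best_ungapped_identity(query: str, region: str) -> float:
    """Highest fraction of matching bases over all ungapped placements
    of `query` inside `region` (a simple Hamming-style scan)."""
    n = len(query)
    if n == 0 or len(region) < n:
        return 0.0
    best = 0
    for start in range(len(region) - n + 1):
        window = region[start:start + n]
        matches = sum(1 for a, b in zip(query, window) if a == b)
        best = max(best, matches)
    return best / n


def classify_indel(
    indel_seq: str,
    upstream: str,
    downstream: str,
    min_identity: float = 0.8,  # hypothetical threshold, not from the paper
) -> IndelClass:
    """Assign an indel to one of three classes from its local context.

    `upstream` and `downstream` are the flanking sequences on either side
    of the indel (hypothetical inputs, e.g. a few kb each).
    """
    indel_seq = indel_seq.upper()
    flanks = (upstream.upper(), downstream.upper())

    # Class (ii): at least one exact copy nearby.
    if any(indel_seq in flank for flank in flanks):
        return "exact_copy"

    # Class (iii): a similar but not identical copy nearby.
    if any(best_ungapped_identity(indel_seq, f) >= min_identity for f in flanks):
        return "similar_copy"

    # Class (i): no additional copy found in the surrounding region.
    return "unique"
```

This sketch ignores reverse-complement copies, gapped similarity, and copies that span the indel boundary, all of which a real analysis would likely need to handle with a proper alignment tool.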


Our analysis demonstrates that while indels are distributed essentially randomly in intergenic and intronic genomic regions, they are significantly under-represented in coding sequences. There are substantial differences in the representation of indel classes among genomic elements, most likely caused by differences in their evolutionary histories. Using local sequence context, we predicted the origins and phylogenetic relationships of gene-impacting indels in primate species. These results suggest that genome plasticity is a major force behind the speciation events separating the great ape lineages.