Open Access Research article

The tendency to recreate ancestral CG dinucleotides in the human genome

Mingkun Li (1,2)* and Su-Shing Chen (1)

Author Affiliations

1 CAS-MPG Partner Institute of Computational Biology, Shanghai Institutes of Biological Sciences, Chinese Academy of Sciences, 200000, Shanghai, PR China

2 Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, D04103, Leipzig, Germany

For all author emails, please log on.

BMC Evolutionary Biology 2011, 11:3  doi:10.1186/1471-2148-11-3

Published: 5 January 2011

Abstract

Background

CG dinucleotides are known to be deficient in the human genome because of the high rate of mutation from 5-methylated CG to TG and, on the complementary strand, to CA. At the same time, many cellular functions rely on CG dinucleotides, such as the control of gene expression by cytosine methylation status. CG dinucleotides that provide essential functional substrates should therefore be retained in genomes. How the conflict between these two forces (a high mutation rate that destroys CG dinucleotides, and functional processes that require their preservation) is resolved remains an open question.
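Why one deamination process yields two different products can be seen with a small strand-bookkeeping sketch (ours, not from the article): CG is its own reverse complement, so a C-to-T change on the bottom strand is read as a G-to-A change, i.e. CA, on the top strand. The function names are illustrative only.

```python
# Complement table for the four DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq: str) -> str:
    """Reverse complement: the opposite strand read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def deaminate_top(dinuc: str) -> str:
    """Methylated C on the top strand deaminates to T: CG -> TG."""
    assert dinuc == "CG"
    return "T" + dinuc[1]


def deaminate_bottom(dinuc: str) -> str:
    """The same C -> T event on the bottom strand, reported on the top strand."""
    assert dinuc == "CG"
    bottom = revcomp(dinuc)    # CG is palindromic: the bottom strand also reads CG
    mutated = "T" + bottom[1]  # C -> T on the bottom strand gives TG there
    return revcomp(mutated)    # which reads as CA on the top strand


if __name__ == "__main__":
    print(revcomp("CG"), deaminate_top("CG"), deaminate_bottom("CG"))
```

This is why TG and CA are usually pooled as the two footprints of a lost CG.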

Results

By analyzing the mutation spectrum and the frequency spectrum of newly derived alleles in the human genome, we observed a tendency towards generating more CGs, driven mainly by an excess of mutations from CA/TG to CG. We also found a fixation preference for CGs derived from TG/CA over CGs generated from other dinucleotides. These tendencies were observed in both intergenic and genic regions. An analysis of Integrated Extended Haplotype Homozygosity provided no evidence of selection acting on newly derived CGs.
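The abstract does not state the authors' exact settings. As a rough sketch of one common formulation (the one used for the iHS statistic), extended haplotype homozygosity (EHH) at a marker is the probability that two randomly chosen chromosomes carrying the core allele are identical over the whole stretch from the core to that marker, and integrated EHH (iHH) is the area under the EHH curve, here taken with the trapezoid rule until EHH drops below an assumed cutoff of 0.05. Only one side of the core is integrated; a full analysis would cover both sides and compare the derived and ancestral alleles. The 0/1 string encoding and function names are hypothetical.

```python
from collections import Counter
from math import comb
from typing import Sequence


def ehh(haplotypes: Sequence[str], core: int, end: int) -> float:
    """EHH from marker `core` to marker `end` (end >= core).

    `haplotypes` should contain only chromosomes carrying the core allele.
    EHH = sum over distinct haplotypes h of C(n_h, 2) / C(n, 2).
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(h[core:end + 1] for h in haplotypes)
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


def ihh_right(haplotypes: Sequence[str], positions: Sequence[float],
              core: int, cutoff: float = 0.05) -> float:
    """Area under the EHH curve to the right of the core (trapezoid rule)."""
    area = 0.0
    prev = ehh(haplotypes, core, core)  # 1.0 when all carriers share the core allele
    for j in range(core + 1, len(positions)):
        cur = ehh(haplotypes, core, j)
        area += (prev + cur) / 2 * (positions[j] - positions[j - 1])
        if cur < cutoff:  # stop once haplotype homozygosity has decayed
            break
        prev = cur
    return area


if __name__ == "__main__":
    haps = ["0000", "0000", "0011", "0011"]  # toy carriers of core allele "0"
    print(ihh_right(haps, [0, 100, 200, 300], core=0))
```

A new allele under positive selection tends to sit on long, unbroken haplotypes and so shows an unusually large iHH relative to the other allele; the reported absence of such a signal is what argues against selection on new CGs.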

Conclusions

Ancestral CG dinucleotides that were subsequently lost by mutation tend to be recreated in the human genome, as indicated by biased mutation and fixation patterns favoring new CGs derived from TG/CA.
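The abstract does not describe the pipeline itself. As a toy illustration of the kind of tally behind "an excess of mutations from CA/TG to CG" and "a fixation preference", the sketch below assumes each polymorphic site has already been polarized into ancestral and derived states (for example with an outgroup) and reduced to a dinucleotide context plus a derived allele frequency. A higher mean derived allele frequency is used here only as a crude proxy for a fixation tendency. The `Site` record, the function names and the numbers in the usage example are hypothetical, not data from the study.

```python
from collections import Counter
from statistics import mean
from typing import NamedTuple, Optional, Sequence


class Site(NamedTuple):
    ancestral: str  # ancestral dinucleotide on the reference strand, e.g. "TG"
    derived: str    # dinucleotide after the single-base change
    daf: float      # derived allele frequency in the sample, 0..1


def classify(site: Site) -> str:
    """Label a change as creating a CG ("gain"), destroying one ("loss"), or neither."""
    if sum(a != b for a, b in zip(site.ancestral, site.derived)) != 1:
        raise ValueError(f"not a single-base change: {site}")
    if site.derived == "CG":
        return "gain"
    if site.ancestral == "CG":
        return "loss"
    return "other"


def summarize(sites: Sequence[Site]) -> dict:
    labels = [classify(s) for s in sites]
    counts = Counter(labels)
    gains = [s for s, lab in zip(sites, labels) if lab == "gain"]
    # TG and CA are reverse complements; both are typical products of CG deamination.
    tg_ca = [s.daf for s in gains if s.ancestral in ("TG", "CA")]
    other = [s.daf for s in gains if s.ancestral not in ("TG", "CA")]

    def avg(xs: list) -> Optional[float]:
        return mean(xs) if xs else None

    return {
        "cg_gains": counts["gain"],
        "cg_losses": counts["loss"],
        "gains_from_tg_ca": len(tg_ca),
        "mean_daf_new_cg_from_tg_ca": avg(tg_ca),
        "mean_daf_new_cg_from_other": avg(other),
    }


if __name__ == "__main__":
    toy = [Site("TG", "CG", 0.40), Site("CA", "CG", 0.30), Site("AG", "CG", 0.10),
           Site("CT", "CG", 0.20), Site("CG", "TG", 0.25)]
    print(summarize(toy))
```

A real analysis would also need to compare gains against losses relative to the number of available source sites, and to control for ancestral-state misinference, which this sketch ignores.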