Two common profiles exist for genomic oligonucleotide frequencies
Key Laboratory of Gene Engineering of Ministry of Education, and Biotechnology Research Center, Sun Yat-sen University, Guangzhou, 510275, China
BMC Research Notes 2012, 5:639 doi:10.1186/1756-0500-5-639Published: 17 November 2012
It was reported that there is a majority profile for trinucleotide frequencies among genomes. And further study has revealed that two common profiles, rather than one majority profile, exist for genomic trinucleotide frequencies. However, the origins of the common/majority profile remain elusive. Moreover, it is not clear whether the features of common profile may be extended to oligonucleotides other than trinucleotides.
We analyzed 571 prokaryotic genomes (chromosomes) and some selected eukaryotic nuclear genomes as well as other genetic systems to study their compositional features. We found that there are also two common profiles for genomic oligonucleotide frequencies: one is from low-GC content genomes, and the other is from high-GC content genomes. Furthermore, each common profile is highly correlated to the average profile of random sequences with corresponding GC content and generated according to first-order symmetry.
The causes for the existence of two common profiles would mainly be GC content variations and strand symmetry of genomic sequences. Therefore, both GC content and strand symmetry would play important roles in genome evolution.