Natural selection retains overrepresented out-of-frame stop codons against frameshift peptides in prokaryotes
1 Carol Yu Centre for Infection, Department of Microbiology, The University of Hong Kong, Hong Kong, China
2 Research Centre of Infection and Immunity, The University of Hong Kong, Hong Kong, China
3 Department of Biology, Stanford University, Stanford, California, USA
4 Department of Veterinary Integrative Biosciences, College of Veterinary Medicine, Texas A&M University, Texas, USA
BMC Genomics 2010, 11:491 doi:10.1186/1471-2164-11-491Published: 9 September 2010
Out-of-frame stop codons (OSCs) occur naturally in coding sequences of all organisms, providing a mechanism of early termination of translation in incorrect reading frame so that the metabolic cost associated with frameshift events can be reduced. Given such a functional significance, we expect statistically overrepresented OSCs in coding sequences as a result of a widespread selection. Accordingly, we examined available prokaryotic genomes to look for evidence of this selection.
The complete genome sequences of 990 prokaryotes were obtained from NCBI GenBank. We found that low G+C content coding sequences contain significantly more OSCs and G+C content at specific codon positions were the principal determinants of OSC usage bias in the different reading frames. To investigate if there is overrepresentation of OSCs, we modeled the trinucleotide and hexanucleotide biases of the coding sequences using Markov models, and calculated the expected OSC frequencies for each organism using a Monte Carlo approach. More than 93% of 342 phylogenetically representative prokaryotic genomes contain excess OSCs. Interestingly the degree of OSC overrepresentation correlates positively with G+C content, which may represent a compensatory mechanism for the negative correlation of OSC frequency with G+C content. We extended the analysis using additional compositional bias models and showed that lower-order bias like codon usage and dipeptide bias could not explain the OSC overrepresentation. The degree of OSC overrepresentation was found to correlate negatively with the optimal growth temperature of the organism after correcting for the G+C% and AT skew of the coding sequence.
The present study uses approaches with statistical rigor to show that OSC overrepresentation is a widespread phenomenon among prokaryotes. Our results support the hypothesis that OSCs carry functional significance and have been selected in the course of genome evolution to act against unintended frameshift occurrences. Some results also hint that OSC overrepresentation being a compensatory mechanism to make up for the decrease in OSCs in high G+C organisms, thus revealing the interplay between two different determinants of OSC frequency.