Quantitative relationship between synonymous codon usage bias and GC composition across unicellular genomes
1 Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
2 Digital Biology Laboratory, Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
3 Department of Genetics and Cell Biology, Washington State University, Pullman, WA 99164, USA
BMC Evolutionary Biology 2004, 4:19 doi:10.1186/1471-2148-4-19
Published: 28 June 2004
Codon usage bias has been widely reported to correlate with GC composition. However, the quantitative relationship between codon usage bias and GC composition across species has not been reported.
Using SCUO, an informatics method we developed previously from Shannon information theory and maximum entropy theory, we investigated the quantitative relationship between codon usage bias and GC composition. Regression on 70 bacterial and 16 archaeal genomes showed that in bacteria SCUO = -2.06 * GC3 + 2.05 * (GC3)^2 + 0.65 (r = 0.91), and in archaea SCUO = -1.79 * GC3 + 1.85 * (GC3)^2 + 0.56 (r = 0.89), where GC3 is the GC content at the third codon position. We developed an analytical model, based on SCUO, that quantifies synonymous codon usage bias from GC composition. The parameters of this model were inferred by inspecting the relationship between codon usage bias and GC composition across the 70 bacterial and 16 archaeal genomes. We further simplified this relationship so that it uses only GC3. The simplified model was supported by computational simulation.
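The two fitted curves quoted above can be evaluated directly. The following is a minimal sketch, not the authors' software: the function name `scuo_regression` and the `domain` argument are hypothetical, and only the coefficients come from the text. GC3 is assumed to be given as a fraction between 0 and 1.

```python
# Coefficients (linear, quadratic, intercept) as reported in the abstract.
# The dictionary layout and names are illustrative assumptions.
REGRESSION_COEFFS = {
    "bacteria": (-2.06, 2.05, 0.65),
    "archaea": (-1.79, 1.85, 0.56),
}


def scuo_regression(gc3, domain="bacteria"):
    """Evaluate the reported quadratic fit SCUO = a*GC3 + b*GC3^2 + c."""
    if not 0.0 <= gc3 <= 1.0:
        raise ValueError("gc3 must be a fraction in [0, 1]")
    a, b, c = REGRESSION_COEFFS[domain]
    return a * gc3 + b * gc3 ** 2 + c


if __name__ == "__main__":
    for gc3 in (0.2, 0.5, 0.8):
        print(gc3,
              round(scuo_regression(gc3, "bacteria"), 4),
              round(scuo_regression(gc3, "archaea"), 4))
```

Both fits are upward-opening parabolas with their minimum near GC3 = 0.5, so the predicted bias is lowest for balanced third-position composition and rises toward either extreme.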
The synonymous codon usage bias can be expressed simply as 1 + (p/2)log2(p/2) + ((1-p)/2)log2((1-p)/2), where p = GC3. The software we developed for measuring SCUO (codonO) is available at http://digbio.missouri.edu/~wanx/cu/codonO.
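As a rough illustration of the closed-form expression, the sketch below evaluates 1 + (p/2)log2(p/2) + ((1-p)/2)log2((1-p)/2) for a given GC3. The function names are hypothetical and this is not the codonO implementation. One assumption is made beyond the text: at p = 0 or p = 1 the term x*log2(x) is taken as its limit, 0.

```python
import math


def _x_log2_x(x):
    """Return x * log2(x), using the limiting value 0 when x == 0 (assumption)."""
    return x * math.log2(x) if x > 0 else 0.0


def scuo_from_gc3(p):
    """Closed-form codon usage bias as a function of p = GC3 (a fraction in [0, 1])."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a fraction in [0, 1]")
    return 1.0 + _x_log2_x(p / 2.0) + _x_log2_x((1.0 - p) / 2.0)


if __name__ == "__main__":
    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(p, round(scuo_from_gc3(p), 4))
```

Under this expression the bias is 0 at p = 0.5 and reaches 0.5 at p = 0 or p = 1, and it is symmetric about p = 0.5. This matches the general shape of the fitted quadratic curves, which also have their minimum near GC3 = 0.5.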