Quantitative relationship between synonymous codon usage bias and GC composition across unicellular genomes

Xiu-Feng Wan1,2, Dong Xu2, Andris Kleinhofs3 and Jizhong Zhou1*

Author Affiliations

1 Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA

2 Digital Biology Laboratory, Department of Computer Science, University of Missouri, Columbia, MO 65211, USA

3 Department of Genetics and Cell Biology, Washington State University, Pullman, WA 99164, USA

BMC Evolutionary Biology 2004, 4:19 doi:10.1186/1471-2148-4-19

Published: 28 June 2004

Abstract

Background

Codon usage bias has been widely reported to correlate with GC composition. However, the quantitative relationship between codon usage bias and GC composition across species has not been reported.

Results

Using an informatics method (SCUO) that we developed previously from Shannon information theory and maximum entropy theory, we investigated the quantitative relationship between codon usage bias and GC composition. Regression over 70 bacterial and 16 archaeal genomes gave $\mathrm{SCUO} = -2.06\,\mathrm{GC3} + 2.05\,(\mathrm{GC3})^2 + 0.65$ ($r = 0.91$) in bacteria and $\mathrm{SCUO} = -1.79\,\mathrm{GC3} + 1.85\,(\mathrm{GC3})^2 + 0.56$ ($r = 0.89$) in archaea. We developed an analytical model, based on SCUO, that quantifies synonymous codon usage bias from GC composition. The parameters of this model were inferred by inspecting the relationship between codon usage bias and GC composition across these genomes. We further simplified this relationship so that it depends only on GC3. This simple model was supported by computational simulation.
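The two fitted regressions can be evaluated directly. The following is a minimal sketch, not the authors' codonO software: the function name `scuo_regression` and the `domain` keyword are hypothetical, and it assumes GC3 is given as a fraction between 0 and 1. The coefficients are the ones reported above.

```python
# Quadratic regressions of SCUO on GC3, using the coefficients reported in the abstract.
# Assumption: GC3 is expressed as a fraction in [0, 1], not as a percentage.
REGRESSION_COEFFS = {
    # domain: (linear coefficient, quadratic coefficient, intercept)
    "bacteria": (-2.06, 2.05, 0.65),
    "archaea": (-1.79, 1.85, 0.56),
}


def scuo_regression(gc3, domain="bacteria"):
    """Return the SCUO value predicted by the fitted quadratic for one domain."""
    if not 0.0 <= gc3 <= 1.0:
        raise ValueError("gc3 must be a fraction between 0 and 1")
    linear, quadratic, intercept = REGRESSION_COEFFS[domain]
    return linear * gc3 + quadratic * gc3 ** 2 + intercept


if __name__ == "__main__":
    for domain in REGRESSION_COEFFS:
        for gc3 in (0.2, 0.5, 0.8):
            print(domain, gc3, round(scuo_regression(gc3, domain), 4))
```

Both fitted curves are convex in GC3, so the predicted bias is lowest at intermediate GC3 and rises toward either extreme.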

Conclusions

The synonymous codon usage bias can be expressed simply as $1 + \frac{p}{2}\log_2\frac{p}{2} + \frac{1-p}{2}\log_2\frac{1-p}{2}$, where $p = \mathrm{GC3}$. The software we developed for measuring SCUO (codonO) is available at http://digbio.missouri.edu/~wanx/cu/codonO.
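The closed-form expression is easy to check numerically. The sketch below is illustrative only: the function name `scuo_from_gc3` is hypothetical, GC3 is assumed to be a fraction between 0 and 1, and the convention 0 · log2(0) = 0 is assumed so that the endpoints are defined.

```python
import math


def _xlog2x(x):
    """Return x * log2(x), using the assumed limit convention 0 * log2(0) = 0."""
    return 0.0 if x == 0 else x * math.log2(x)


def scuo_from_gc3(p):
    """Evaluate 1 + (p/2)*log2(p/2) + ((1-p)/2)*log2((1-p)/2), with p = GC3."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p (GC3) must be a fraction between 0 and 1")
    return 1.0 + _xlog2x(p / 2) + _xlog2x((1 - p) / 2)


if __name__ == "__main__":
    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(p, round(scuo_from_gc3(p), 4))
```

Under these assumptions the expression is symmetric about p = 0.5, where it equals 0, and it reaches 0.5 at p = 0 and p = 1.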