
Open Access methodology article

Codon Deviation Coefficient: a novel measure for estimating codon usage bias and its statistical significance

Zhang Zhang (1, 2), Jun Li (3), Peng Cui (1, 4), Feng Ding (1, 4), Ang Li (1, 2), Jeffrey P Townsend (5, 6) and Jun Yu (7)*

Author Affiliations

1 Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia

2 Current address: CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China

3 School of Biological Sciences, The University of Hong Kong, Hong Kong, China

4 Current address: Department of Pharmacology and Toxicology and the Cancer Center, Medical College of Wisconsin, Milwaukee, Wisconsin 53226, USA

5 Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut 06520, USA

6 Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA

7 CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China


BMC Bioinformatics 2012, 13:43  doi:10.1186/1471-2105-13-43

Published: 22 March 2012

Abstract

Background

Genetic mutation, selective pressure for translational efficiency and accuracy, level of gene expression, and protein function through natural selection are all believed to lead to codon usage bias (CUB). Therefore, informative measurement of CUB is of fundamental importance to making inferences regarding gene function and genome evolution. However, extant measures of CUB have not fully accounted for the quantitative effect of background nucleotide composition and have not statistically evaluated the significance of CUB in sequence analysis.

Results

Here we propose a novel measure, the Codon Deviation Coefficient (CDC), that provides an informative measurement of CUB and its statistical significance without requiring any prior knowledge. Unlike previous measures, CDC estimates CUB by accounting for background nucleotide composition specific to each codon position, and it uses bootstrapping to assess the statistical significance of CUB for any given sequence. We evaluate CDC on simulated sequences and empirical data and show that it outperforms extant measures, yielding a more informative estimate of CUB and its statistical significance.
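The abstract does not give the CDC formula, so the sketch below is only one plausible reading of "background composition tailored to codon positions plus bootstrapping", not the authors' definition. It assumes (hypothetically) that background codon usage is the product of the nucleotide frequencies observed at codon positions 1, 2 and 3, renormalised over the 61 sense codons of the standard code; that bias is scored as one minus the cosine similarity between observed and expected codon usage; and that significance comes from a parametric bootstrap drawing null sequences of the same length from that background. The keywords suggest the published measure is built from positional GC and purine content; this sketch uses all four positional nucleotide frequencies instead. All function names are illustrative.

```python
import itertools
import math
import random

BASES = "ACGT"
# Stop codons of the standard genetic code (assumption: standard code only).
STOP_CODONS = {"TAA", "TAG", "TGA"}
SENSE_CODONS = [
    "".join(p) for p in itertools.product(BASES, repeat=3)
    if "".join(p) not in STOP_CODONS
]
SENSE_SET = set(SENSE_CODONS)


def split_codons(seq):
    """In-frame sense codons of seq; stops and ambiguous codons are dropped."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    return [c for c in codons if c in SENSE_SET]


def positional_base_freqs(codons):
    """Nucleotide frequencies at codon positions 1, 2 and 3."""
    freqs = []
    for pos in range(3):
        counts = dict.fromkeys(BASES, 0)
        for c in codons:
            counts[c[pos]] += 1
        freqs.append({b: counts[b] / len(codons) for b in BASES})
    return freqs


def expected_codon_freqs(pos_freqs):
    """Background codon usage from position-specific composition.

    Hypothetical null model: the three positions are independent, and the
    product of their frequencies is renormalised over the 61 sense codons.
    """
    raw = {c: pos_freqs[0][c[0]] * pos_freqs[1][c[1]] * pos_freqs[2][c[2]]
           for c in SENSE_CODONS}
    total = sum(raw.values())
    return {c: v / total for c, v in raw.items()}


def deviation(codons):
    """1 - cosine similarity between observed and expected codon usage.

    Raw counts are used for the observed vector; cosine similarity is
    scale-invariant, so this equals using frequencies.
    """
    if not codons:
        raise ValueError("no sense codons to score")
    obs = dict.fromkeys(SENSE_CODONS, 0)
    for c in codons:
        obs[c] += 1
    exp = expected_codon_freqs(positional_base_freqs(codons))
    dot = sum(obs[c] * exp[c] for c in SENSE_CODONS)
    sq_obs = sum(v * v for v in obs.values())
    sq_exp = sum(v * v for v in exp.values())
    # Clamp at 0 to absorb floating-point rounding.
    return max(0.0, 1 - dot / math.sqrt(sq_obs * sq_exp))


def bootstrap_pvalue(codons, n_boot=1000, seed=0):
    """Parametric bootstrap: share of null sequences at least as biased."""
    rng = random.Random(seed)
    observed = deviation(codons)
    exp = expected_codon_freqs(positional_base_freqs(codons))
    weights = [exp[c] for c in SENSE_CODONS]
    hits = 0
    for _ in range(n_boot):
        # Null sequence of the same length drawn from the background model.
        sample = rng.choices(SENSE_CODONS, weights=weights, k=len(codons))
        if deviation(sample) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_boot + 1)
```

For example, `deviation(split_codons("AAA" * 10 + "CCC" * 10))` returns 0.5: every position is half A and half C, so the background model spreads usage evenly over the eight A/C codons, while only AAA and CCC actually occur. A sequence made of one repeated codon scores 0 under this model, because its own positional composition already predicts it exactly.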

Conclusions

As validated by both simulated and empirical data, CDC provides a highly informative quantification of CUB and its statistical significance, useful for determining comparative magnitudes and patterns of biased codon usage for genes or genomes with diverse sequence compositions.

Keywords:
Codon deviation coefficient; CDC; Codon usage bias; CUB; Statistical significance; Background nucleotide composition; GC content; Purine content; Bootstrapping