Open Access Methodology article

Development of an unbiased statistical method for the analysis of unigenic evolution

Colleen D Behrsin1, Chris J Brandl1, David W Litchfield1, Brian H Shilton1 and Lindi M Wahl2*

Author Affiliations

1 Department of Biochemistry, University of Western Ontario, London, Ontario, Canada

2 Department of Applied Mathematics, University of Western Ontario, London, Ontario, Canada

BMC Bioinformatics 2006, 7:150  doi:10.1186/1471-2105-7-150

Published: 17 March 2006

Abstract

Background

Unigenic evolution is a powerful genetic strategy involving random mutagenesis of a single gene product to delineate functionally important domains of a protein. This method involves selection of variants of the protein which retain function, followed by statistical analysis comparing expected and observed mutation frequencies of each residue. Resultant mutability indices for each residue are averaged across a specified window of codons to identify hypomutable regions of the protein. As originally described, the effect of changes to the length of this averaging window was not fully elucidated. In addition, it was unclear when sufficient functional variants had been examined to conclude that residues conserved in all variants have important functional roles.
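The window averaging described above can be illustrated with a minimal sketch. The abstract does not give the formula for the mutability index, so the definition used here, (observed - expected) / expected, is an assumption, as are the function names and the choice of one averaged value per full window position.

```python
def mutability_indices(observed, expected):
    """Per-codon mutability index.

    ASSUMPTION: index = (observed - expected) / expected. The abstract
    does not state the exact definition; this is one common form.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return [(o - e) / e for o, e in zip(observed, expected)]


def window_average(values, window):
    """Average values over every full sliding window of `window` codons.

    Returns len(values) - window + 1 averages, one per window position.
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        # Slide the window one codon: add the new value, drop the oldest.
        running += values[i] - values[i - window]
        averages.append(running / window)
    return averages
```

Running `window_average` on the same indices with several window lengths shows how apparent region boundaries shift as the window changes, which is the sensitivity the article sets out to examine.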

Results

We demonstrate that the length of the averaging window dramatically affects both the identification of individual hypomutable regions and the delineation of region boundaries. Accordingly, we devised a region-independent chi-square analysis that eliminates the loss of information incurred during window averaging and removes the arbitrary assignment of window length. We also present a method to estimate the probability that conserved residues have not been mutated simply by chance. In addition, we describe an improved estimation of the expected mutation frequency.
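To make the "conserved by chance" question concrete, here is a minimal sketch under a deliberately simple model. It is not the authors' method, which the abstract does not specify. It assumes each functional variant independently mutates a given codon with the same probability `p`, so that the chance a codon is untouched in all `n` variants is (1 - p)^n. The function names and the parameters `p`, `n` and `alpha` are hypothetical.

```python
import math


def prob_unmutated_by_chance(p, n):
    """Probability that a codon is unmutated in all n variants.

    ASSUMPTION: variants are independent and each mutates this codon
    with the same probability p (a simplification for illustration).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return (1.0 - p) ** n


def variants_needed(p, alpha):
    """Smallest n for which (1 - p) ** n <= alpha under the same model."""
    if not 0.0 < p < 1.0 or not 0.0 < alpha < 1.0:
        raise ValueError("p and alpha must be strictly between 0 and 1")
    return math.ceil(math.log(alpha) / math.log(1.0 - p))
```

Under this model, a codon that stays conserved once the number of variants reaches `variants_needed(p, alpha)` is unlikely (at level `alpha`) to have escaped mutation purely by chance. A real analysis would need codon-specific expected frequencies rather than a single `p`.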

Conclusion

Overall, these methods significantly extend the analysis of unigenic evolution data over existing methods to allow comprehensive, unbiased identification of domains and possibly even individual residues that are essential for protein function.