Open Access Correspondence

Exploring inconsistencies in genome-wide protein function annotations: a machine learning approach

Carson Andorf1,3, Drena Dobbs2,3,4 and Vasant Honavar1,3,4

1Artificial Intelligence Laboratory, Department of Computer Science, Iowa State University, Ames, Iowa, 50011, USA

2Department of Genetics, Development and Cell Biology, Iowa State University, Ames, Iowa, 50011, USA

3Bioinformatics and Computational Biology Graduate Program, Iowa State University, Ames, Iowa, 50011, USA

4Center for Computational Intelligence, Learning, and Discovery, Iowa State University, Ames, Iowa, 50011, USA


BMC Bioinformatics 2007, 8:284 doi:10.1186/1471-2105-8-284

Published: 3 August 2007

Abstract

Background

Incorrectly annotated sequence data are becoming more commonplace as databases increasingly rely on automated techniques for annotation. Hence, there is an urgent need for computational methods for checking consistency of such annotations against independent sources of evidence and detecting potential annotation errors. We show how a machine learning approach designed to automatically predict a protein's Gene Ontology (GO) functional class can be employed to identify potential gene annotation errors.

Results

In a set of 211 previously annotated mouse protein kinases, we found that 201 of the GO annotations returned by AmiGO appear to be inconsistent with the UniProt functions assigned to their human counterparts. In contrast, 97% of the predicted annotations generated using a machine learning approach were consistent with the UniProt annotations of the human counterparts, as well as with available annotations for these mouse protein kinases in the Mouse Kinome database.
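As a rough illustration of the kind of consistency check described above, the following minimal sketch flags proteins whose annotation from one source shares no functional label with an independent source, and works out the share of flagged annotations reported here (201 of 211). It is not the authors' method: the function name `find_inconsistent`, the protein IDs, and the class labels are hypothetical placeholders, and treating "no shared label" as "inconsistent" is a simplifying assumption.

```python
def find_inconsistent(primary, reference):
    """Return sorted IDs whose labels share nothing with the reference source.

    Both arguments map a protein ID to a set of functional class labels.
    Proteins missing from the reference are skipped, not flagged.
    """
    flagged = []
    for protein_id, labels in primary.items():
        ref_labels = reference.get(protein_id)
        if ref_labels is None:
            continue  # no independent evidence, so nothing to compare
        if not set(labels) & set(ref_labels):
            flagged.append(protein_id)
    return sorted(flagged)


# Hypothetical toy data; these IDs and labels are not from the paper.
primary_source = {"P1": {"class_A"}, "P2": {"class_B"}, "P3": {"class_A"}}
independent_source = {"P1": {"class_A"}, "P2": {"class_A"}}

flagged = find_inconsistent(primary_source, independent_source)
print(flagged)

# Share of annotations reported as apparently inconsistent: 201 of 211.
share = 201 / 211
print(f"{share:.1%}")
```

A flagged protein is only a candidate for manual review; disagreement between two sources does not by itself show which one is wrong.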

Conclusion

We conjecture that most of our predicted annotations are, therefore, correct and suggest that the machine learning approach developed here could be routinely used to detect potential errors in GO annotations generated by high-throughput gene annotation projects.

Editor's Note: Authors from the original publication (Okazaki et al.: Nature 2002, 420:563–73) have provided their response to Andorf et al., which directly follows the correspondence.

