Open Access Methodology article

Formalization of taxon-based constraints to detect inconsistencies in annotation and ontology development

Jennifer I Deegan (née Clark)1*, Emily C Dimmer1 and Christopher J Mungall2*

Author Affiliations

1 European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK

2 240C Building 64, Lawrence Berkeley National Lab, 1 Cyclotron Road, Berkeley CA 94720, USA


BMC Bioinformatics 2010, 11:530  doi:10.1186/1471-2105-11-530

Published: 25 October 2010

Abstract

Background

The Gene Ontology (GO) project supports categorization of gene products according to their location of action, the molecular functions that they carry out, and the processes that they are involved in. Although the ontologies are intentionally developed to be taxon-neutral and to cover all species, some branches are inherently taxon-specific. For example, the process 'lactation' is specific to mammals and the location 'mitochondrion' is specific to eukaryotes. Because these constraints have not been explicitly formalized, they can lead to errors and inconsistencies in both automated and manual annotation.

Results

We have formalized the taxonomic constraints implicit in some GO classes, and specified these at various levels in the ontology. We have also developed an inference system that can be used to check for violations of these constraints in annotations. Using the constraints in conjunction with the inference system, we have detected and removed errors in annotations and improved the structure of the ontology.
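To make the idea of checking annotations against taxon constraints concrete, here is a minimal Python sketch built on a toy taxonomy and a toy ontology. It is an illustration under stated assumptions and not the authors' inference system. The dictionaries `TAXON_PARENT`, `GO_PARENTS` and `ONLY_IN_TAXON`, the functions `taxon_lineage`, `class_ancestors` and `violations`, and the collapsing of all ontology relations into a single parent map are hypothetical simplifications. The sketch models only one kind of constraint: a class restricted to a single taxon. It does show why a constraint placed at one level of the ontology covers a whole branch. An annotation to a class is checked against every constraint on that class and on all of its ancestors, so a gene product annotated to a descendant of 'mitochondrion' must still come from a eukaryote.

```python
# Hypothetical, heavily simplified taxonomy: child taxon -> parent taxon.
TAXON_PARENT = {
    "Homo sapiens": "Mammalia",
    "Mus musculus": "Mammalia",
    "Mammalia": "Eukaryota",
    "Saccharomyces cerevisiae": "Eukaryota",
    "Escherichia coli": "Bacteria",
    "Eukaryota": "cellular organisms",
    "Bacteria": "cellular organisms",
}

# Hypothetical ontology fragment: class -> parent classes
# (different relation types are collapsed into one for simplicity).
GO_PARENTS = {
    "mitochondrial inner membrane": ["mitochondrion"],
    "mitochondrion": ["intracellular organelle"],
    "intracellular organelle": [],
    "lactation": ["multicellular organismal process"],
    "multicellular organismal process": [],
}

# Hypothetical constraints: class -> the only taxon it may be used in.
ONLY_IN_TAXON = {
    "mitochondrion": "Eukaryota",
    "lactation": "Mammalia",
}


def taxon_lineage(taxon):
    """Return the taxon followed by all of its ancestors up to the root."""
    lineage = []
    while taxon is not None:
        lineage.append(taxon)
        taxon = TAXON_PARENT.get(taxon)
    return lineage


def class_ancestors(go_class):
    """Return the class together with every ancestor reachable via GO_PARENTS."""
    seen, stack = set(), [go_class]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(GO_PARENTS.get(current, []))
    return seen


def violations(go_class, taxon):
    """List (constrained class, required taxon) pairs violated by annotating
    a gene product from `taxon` to `go_class`; constraints are inherited
    from every ancestor class."""
    lineage = set(taxon_lineage(taxon))
    found = []
    for cls in sorted(class_ancestors(go_class)):
        required = ONLY_IN_TAXON.get(cls)
        if required is not None and required not in lineage:
            found.append((cls, required))
    return found
```

For example, `violations("mitochondrial inner membrane", "Escherichia coli")` flags the inherited constraint `("mitochondrion", "Eukaryota")`, while `violations("lactation", "Homo sapiens")` returns an empty list. Walking up both hierarchies lets a single constraint near the top of a branch catch misannotations to any of its more specific descendants.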

Conclusions

Detection of inconsistencies in taxon-specificity enables gradual improvement of the ontologies, the annotations, and the formalized constraints. This is progressively improving the quality of our data. The full system is available for download, and new constraints or proposed changes to constraints can be submitted online at https://sourceforge.net/tracker/?atid=605890&group_id=36855.