This article is part of the supplement: The Second Automated Function Prediction Meeting

Open Access Proceedings

CORRIE: enzyme sequence annotation with confidence estimates

Benjamin Audit1*, Emmanuel D Levy2,3, Wally R Gilks4,5, Leon Goldovsky2,6 and Christos A Ouzounis2,6,7*

Author Affiliations

1 Laboratoire Joliot-Curie and Laboratoire de Physique, CNRS UMR5672, Ecole Normale Supérieure, 46 Allée d'Italie, F-69364 Lyon CEDEX 07, France

2 Computational Genomics Group, The European Bioinformatics Institute, EMBL Cambridge Outstation, Cambridge CB10 1SD, UK

3 Current address: Computational Genomics Group, MRC Laboratory of Molecular Biology, Hills Rd, Cambridge CB2 2QH, UK

4 Medical Research Council Biostatistics Unit, Institute of Public Health, Cambridge CB2 2SR, UK

5 Current address: Department of Statistics, School of Mathematics, University of Leeds, Leeds LS2 9JT, UK

6 Current address: Computational Genomics Unit, Center for Research & Technology Hellas, PO Box 361, GR-57001 Thessalonica, Greece

7 Current address: Institute of Agrobiotechnology, Center for Research & Technology Hellas, PO Box 361, GR-57001 Thessalonica, Greece

BMC Bioinformatics 2007, 8(Suppl 4):S3  doi:10.1186/1471-2105-8-S4-S3

Published: 22 May 2007

Abstract

Using a previously developed automated method for enzyme annotation, we report the re-annotation of the ENZYME database and the analysis of local error rates per class. In control experiments, we demonstrate that the method correctly re-annotates 91% of all Enzyme Classification (EC) classes with high coverage (755 out of 827). Only 44 enzyme classes are found to contain false positives, while the remaining 28 enzyme classes are not represented. We also show cases where the re-annotation procedure results in partial overlaps for those few enzyme classes where some inconsistency may appear between homologous proteins, mostly due to function specificity. Our results allow the interactive exploration of the EC hierarchy for known enzyme families as well as for putative enzyme sequences that may need to be classified within the EC hierarchy. These aspects of our framework have been incorporated into a web server called CORRIE, which stands for Correspondence Indicator Estimation. The server allows the interactive prediction of a functional class for putative enzymes from sequence alone, supported by probabilistic measures in the context of the pre-calculated Correspondence Indicators of known enzymes with the functional classes of the EC hierarchy. The CORRIE server is available at http://www.genomes.org/services/corrie/
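
The class counts quoted in the abstract can be cross-checked with a few lines of arithmetic. The following is only an illustrative sketch: the three counts and the total come from the abstract, while the variable names, the category labels, and the `share` helper are assumptions introduced here and are not part of the CORRIE method or server.

```python
# Counts as stated in the abstract; names and labels are illustrative only.
total_classes = 827
correct = 755               # classes re-annotated correctly
with_false_positives = 44   # classes found to contain false positives
not_represented = 28        # classes not represented

# The three categories should account for every EC class considered.
assert correct + with_false_positives + not_represented == total_classes


def share(count, total=total_classes):
    """Return count as a percentage of total."""
    return 100.0 * count / total


breakdown = {
    "correctly re-annotated": correct,
    "containing false positives": with_false_positives,
    "not represented": not_represented,
}

for label, count in breakdown.items():
    print(f"{label}: {count} of {total_classes} ({share(count):.1f}%)")
```

Read this way, the 91% figure is the fraction of EC classes (755 of 827) that are re-annotated correctly, and the remaining classes split between those containing false positives and those not represented.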