This article is part of the supplement: Selected proceedings from the Automated Function Prediction Meeting 2011

Open Access Proceedings

Homology-based inference sets the bar high for protein function prediction

Tobias Hamp1, Rebecca Kassner1, Stefan Seemayer1, Esmeralda Vicedo1, Christian Schaefer1, Dominik Achten1, Florian Auer1, Ariane Boehm1, Tatjana Braun1, Maximilian Hecht1, Mark Heron1, Peter Hönigschmid1, Thomas A Hopf1, Stefanie Kaufmann1, Michael Kiening1, Denis Krompass1, Cedric Landerer1, Yannick Mahlich1, Manfred Roos1 and Burkhard Rost1,2,3*

Author Affiliations

1 TUM, Department of Informatics, Bioinformatics & Computational Biology - I12, Boltzmannstr. 3, 85748 Garching/Munich, Germany

2 Institute of Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748 Garching/Munich, Germany

3 New York Consortium on Membrane Protein Structure (NYCOMPS) & Department of Biochemistry and Molecular Biophysics, Columbia University, 701 West 168th Street, New York, NY 10032, USA

BMC Bioinformatics 2013, 14(Suppl 3):S7  doi:10.1186/1471-2105-14-S3-S7

Published: 28 February 2013

Abstract

Background

Any method that predicts protein function de novo should do better than random. More challenging, it should also outperform simple homology-based inference.

Methods

Here, we describe a few methods that predict protein function exclusively through homology. Together, they set the bar, i.e. the lower limit, that future methods should exceed.

Results and conclusions

During the development of these methods, we faced two surprises. First, our most successful baseline implementation ranked very high at CAFA1. In fact, our best combination of homology-based methods fared only slightly worse than the top-of-the-line prediction method from the Jones group. Second, although the concept of homology-based inference is simple, this work revealed that the precise details of the implementation are crucial: not only did the methods range from top to bottom performers at CAFA, but the reasons for these differences were also unexpected. In this work, we also propose a new, rigorous measure to compare predicted and experimental annotations. It puts more emphasis on the details of protein function than the other measures employed by CAFA and may best reflect the expectations of users. Clearly, the definition of proper goals remains one major objective for CAFA.