This article is part of the supplement: Selected proceedings from the Automated Function Prediction Meeting 2011

Open Access Proceedings

Combining heterogeneous data sources for accurate functional annotation of proteins

Artem Sokolov1, Christopher Funk2, Kiley Graim1, Karin Verspoor2,3 and Asa Ben-Hur4

Author Affiliations

1 Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California 95064, USA

2 Computational Bioscience Program, University of Colorado School of Medicine, Aurora, Colorado 80045, USA

3 National ICT Australia, Victoria Research Lab, Melbourne 3010, Australia

4 Department of Computer Science, Colorado State University, Fort Collins, Colorado 80523, USA

BMC Bioinformatics 2013, 14(Suppl 3):S10 doi:10.1186/1471-2105-14-S3-S10

Published: 28 February 2013

Abstract

Combining heterogeneous sources of data is essential for accurate prediction of protein function. The task is complicated because sequence-based features can be readily compared across species, whereas most other data are species-specific. In this paper, we present a multi-view extension to GOstruct, a structured-output framework for functional annotation of proteins. The extended framework can learn from disparate data sources, each of which is provided to the framework in the form of a kernel. Our empirical results demonstrate that the multi-view framework is able to utilize all available information, yielding better performance than both sequence-based models trained across species and models trained on collections of data within a given species. This version of GOstruct participated in the recent Critical Assessment of Functional Annotations (CAFA) challenge; since then we have significantly improved the natural language processing component of the method, which now performs on par with sequence information. The GOstruct framework is available for download at http://strut.sourceforge.net.
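
To illustrate what supplying each data source "in the form of a kernel" can look like in practice, the following is a minimal sketch, not the GOstruct implementation. It assumes two hypothetical feature matrices for the same five proteins (`X_seq` and `X_text`, filled with random numbers here), builds a linear kernel from each, normalizes the kernels so that neither source dominates by scale, and combines them by unweighted summation. The function names, the data, and the choice of summation are illustrative assumptions; the abstract does not specify how the kernels are combined.

```python
import numpy as np

def normalize_kernel(K):
    """Cosine-normalize a kernel matrix so every diagonal entry equals 1."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

def combine_kernels(kernels):
    """Sum normalized kernels (an assumed, unweighted combination).

    A sum of positive semidefinite kernels is itself positive semidefinite,
    so the result is still a valid kernel.
    """
    return sum(normalize_kernel(K) for K in kernels)

rng = np.random.default_rng(0)

# Hypothetical features for the same 5 proteins from two data sources.
X_seq = rng.normal(size=(5, 8))   # stand-in for sequence-derived features
X_text = rng.normal(size=(5, 3))  # stand-in for literature-derived features

# One linear kernel per data source.
K_seq = X_seq @ X_seq.T
K_text = X_text @ X_text.T

# Combined kernel over all 5 proteins.
K = combine_kernels([K_seq, K_text])
```

The combined matrix `K` can be passed to any kernel method that accepts a precomputed kernel. Summation corresponds to concatenating the underlying feature spaces, which is why sources with very different representations can be merged without a shared feature format.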