Methodology article (Open Access)

Assessment of protein set coherence using functional annotations

Monica Chagoyen1,2*, Jose M Carazo1 and Alberto Pascual-Montano2

Author Affiliations

1 Centro Nacional de Biotecnología – CSIC, Madrid, Spain

2 Dpto. Arquitectura de Computadores y Automática, Universidad Complutense Madrid, Madrid, Spain

BMC Bioinformatics 2008, 9:444  doi:10.1186/1471-2105-9-444

Published: 20 October 2008

Abstract

Background

Analysis of large-scale experimental datasets frequently produces one or more sets of proteins that are subsequently mined for functional interpretation and validation. To this end, a number of computational methods have been devised that rely on the analysis of functional annotations. Although current methods provide valuable information (e.g. significantly enriched annotations, pairwise functional similarities), they do not specifically measure the degree of homogeneity of a protein set.

Results

In this work we present a method that scores the degree of functional homogeneity, or coherence, of a set of proteins on the basis of the global similarity of their functional annotations. The method uses statistical hypothesis testing to assess the significance of the set in the context of the functional space of a reference set. As such, it can be used as a first step in the validation of sets expected to be homogeneous prior to further functional interpretation.
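The general idea can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not specify the similarity measure or the test statistic. The sketch assumes that each protein is described by a set of annotation terms, that coherence is the mean pairwise cosine similarity between those sets, and that significance is estimated empirically by comparing against random sets of the same size drawn from the reference set. The function names, the toy annotations and the parameter `n_random` are all hypothetical.

```python
import random
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two annotation sets treated as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def coherence_score(proteins, annotations):
    """Mean pairwise annotation similarity of a protein set (assumed score)."""
    pairs = list(combinations(proteins, 2))
    if not pairs:
        return 0.0
    total = sum(cosine(annotations[p], annotations[q]) for p, q in pairs)
    return total / len(pairs)


def empirical_p_value(proteins, annotations, n_random=1000, seed=0):
    """Fraction of random same-size sets scoring at least as high as the query set."""
    rng = random.Random(seed)
    reference = sorted(annotations)  # the reference set: all annotated proteins
    observed = coherence_score(proteins, annotations)
    hits = 0
    for _ in range(n_random):
        sample = rng.sample(reference, len(proteins))
        if coherence_score(sample, annotations) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_random + 1)


# Toy reference set (hypothetical annotations, for illustration only).
annotations = {
    "P1": {"ribosome", "translation"},
    "P2": {"ribosome", "translation"},
    "P3": {"ribosome", "translation"},
    "P4": {"glycolysis"},
    "P5": {"dna repair"},
    "P6": {"transport"},
    "P7": {"glycolysis", "transport"},
    "P8": {"dna repair", "cell cycle"},
}

score, p_value = empirical_p_value(["P1", "P2", "P3"], annotations)
mixed_score = coherence_score(["P1", "P4", "P5"], annotations)
print(f"coherent set: score={score:.2f}, p={p_value:.3f}")
print(f"mixed set:    score={mixed_score:.2f}")
```

In this toy example the three identically annotated proteins form a set that scores higher than almost all random sets of the same size, whereas a set mixing unrelated annotations scores zero. A real analysis would need a similarity measure suited to the annotation vocabulary and a reference set matching the experimental context.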

Conclusion

We evaluate our method by analysing known biologically relevant sets as well as random ones. The known relevant sets comprise macromolecular complexes, cellular components and pathways described for Saccharomyces cerevisiae, most of which are significantly coherent. Finally, we illustrate the usefulness of our approach for validating 'functional modules' obtained from computational analysis of protein-protein interaction networks. Matlab code and supplementary data are available at http://www.cnb.csic.es/~monica/coherence/