Methodology article (Open Access)

A literature-based similarity metric for biological processes

Monica Chagoyen1,2, Pedro Carmona-Saez1, Concha Gil3,4, Jose M Carazo1 and Alberto Pascual-Montano2

1Biocomputing Unit. Centro Nacional de Biotecnologia – CSIC, Madrid, Spain

2Dpto. Arquitectura de Computadores y Automatica. Universidad Complutense de Madrid, Madrid, Spain

3Dpto. Microbiologia II. Facultad de Farmacia. Universidad Complutense de Madrid, Madrid, Spain

4Unidad de Proteomica UCM – Parque Cientifico de Madrid, Madrid, Spain


BMC Bioinformatics 2006, 7:363. doi:10.1186/1471-2105-7-363

Published: 26 July 2006

Abstract

Background

Recent analyses in systems biology pursue the discovery of functional modules within the cell. Recognizing such modules requires the integrative analysis of genome-wide experimental data together with available functional schemes. This calls for methods that bridge the gap between the abstract definitions of cellular processes in current schemes and the interlinked nature of biological networks.

Results

This work explores the use of the scientific literature to establish potential relationships among cellular processes. To this end, we have used a document-based similarity method to compute pairwise similarities between the biological processes described in the Gene Ontology (GO). The method has been applied to the biological processes annotated for the Saccharomyces cerevisiae genome. We compared our results with similarities obtained with two ontology-based metrics, as well as with gene product annotation relationships. We show that the literature-based metric conserves most direct ontological relationships while revealing biologically sound similarities that are not obtained using ontology-based metrics and/or genome annotation.
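The abstract does not say which document-based similarity method was used. As a rough illustration of the general idea only, the sketch below assumes each biological process is represented by a bag of words drawn from its associated literature, weights terms by TF-IDF, and compares processes by cosine similarity. The weighting scheme, the function names, and the toy word lists are all assumptions made for this example; none of them come from the article.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return a sparse TF-IDF vector (term -> weight) for each named text.

    Assumption: plain term counts times log(N / df); the article's actual
    weighting scheme is not given in the abstract.
    """
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: how many texts contain each term at least once.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))
    vectors = {}
    for name, tokens in tokenized.items():
        tf = Counter(tokens)
        vectors[name] = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise_similarities(docs):
    """Compute the similarity of every unordered pair of named texts."""
    vectors = tfidf_vectors(docs)
    names = sorted(vectors)
    return {
        (a, b): cosine(vectors[a], vectors[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Toy, made-up word lists standing in for literature linked to each process.
toy_docs = {
    "glycolysis": "glucose pyruvate atp enzyme",
    "gluconeogenesis": "glucose pyruvate synthesis enzyme",
    "dna repair": "dna damage repair nuclease",
}

similarities = pairwise_similarities(toy_docs)
for pair, score in sorted(similarities.items()):
    print(pair, round(score, 3))
```

In this toy input the two glucose-related processes share vocabulary and receive a positive score, while neither shares any term with the third process. A literature-based metric of this kind depends only on the text linked to each process, so it can relate processes that sit far apart in the ontology graph.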

Conclusion

The scientific literature is a valuable source of information from which to compute similarities among biological processes. The associations discovered by literature analysis are a valuable complement to those encoded in existing functional schemes and to those that arise from genome annotation. These similarities can be used to conveniently map the interlinked structure of cellular processes in a particular organism.

