
Open Access Highly Accessed Methodology article

A literature-based similarity metric for biological processes

Monica Chagoyen1,2*, Pedro Carmona-Saez1, Concha Gil3,4, Jose M Carazo1 and Alberto Pascual-Montano2

Author Affiliations

1 Biocomputing Unit. Centro Nacional de Biotecnologia – CSIC, Madrid, Spain

2 Dpto. Arquitectura de Computadores y Automatica. Universidad Complutense de Madrid, Madrid, Spain

3 Dpto. Microbiologia II. Facultad de Farmacia. Universidad Complutense de Madrid, Madrid, Spain

4 Unidad de Proteomica UCM – Parque Cientifico de Madrid, Madrid, Spain

BMC Bioinformatics 2006, 7:363  doi:10.1186/1471-2105-7-363

Published: 26 July 2006



Recent analyses in systems biology pursue the discovery of functional modules within the cell. Recognizing such modules requires the integrative analysis of genome-wide experimental data together with available functional schemes. In this context, methods are required to bridge the gap between the abstract definitions of cellular processes in current schemes and the interlinked nature of biological networks.


This work explores the use of the scientific literature to establish potential relationships among cellular processes. To this end we have used a document-based similarity method to compute pair-wise similarities of the biological processes described in the Gene Ontology (GO). The method has been applied to the biological processes annotated for the Saccharomyces cerevisiae genome. We compared our results with similarities obtained with two ontology-based metrics, as well as with gene product annotation relationships. We show that the literature-based metric conserves most direct ontological relationships, while revealing biologically sound similarities that are not obtained using ontology-based metrics and/or genome annotation.
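
As a rough illustration of what a document-based similarity between processes can look like, the sketch below pools the text linked to each process, weights terms by TF-IDF, and compares processes by cosine similarity. This is an assumption made for illustration only: the abstract does not specify the term weighting or similarity function actually used, and the function names (`tfidf_vectors`, `cosine`, `pairwise_similarities`) and the toy input are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs_by_process):
    """Build one TF-IDF vector per process from its pooled document text.

    Assumption: whitespace tokenization and a plain log(N/df) weighting;
    the article may use a different scheme.
    """
    counts = {
        process: Counter(" ".join(docs).lower().split())
        for process, docs in docs_by_process.items()
    }
    n_processes = len(counts)
    # Document frequency: number of processes whose text contains the term.
    df = Counter(term for c in counts.values() for term in c)
    return {
        process: {t: tf * math.log(n_processes / df[t]) for t, tf in c.items()}
        for process, c in counts.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pairwise_similarities(docs_by_process):
    """Return {(process_a, process_b): similarity} for every unordered pair."""
    vecs = tfidf_vectors(docs_by_process)
    names = sorted(vecs)
    return {
        (a, b): cosine(vecs[a], vecs[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Made-up toy text standing in for literature linked to each process;
# these strings are not real abstracts or real annotations.
toy_docs = {
    "glycolysis": ["glucose pyruvate enzyme", "glucose kinase"],
    "gluconeogenesis": ["pyruvate glucose synthesis"],
    "DNA repair": ["damage repair polymerase"],
}
toy_sims = pairwise_similarities(toy_docs)
```

In this toy input, processes that share vocabulary receive a positive score, while processes with no terms in common score zero. A real application would draw the documents from literature references associated with each GO term rather than from hand-written strings.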


The scientific literature is a valuable source of information from which to compute similarities among biological processes. The associations discovered by literature analysis complement those encoded in existing functional schemes and those that arise from genome annotation. These similarities can be used to map the interlinked structure of cellular processes in a particular organism.