Defining functional distances over Gene Ontology
1 Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Melchor Fernandez Almagro, 3. E-28029 Madrid, Spain
2 Computational Systems Biology Group, National Biotechnology Centre (CNB-CSIC). Darwin 3, 28049, Madrid, Spain
BMC Bioinformatics 2008, 9:50 doi:10.1186/1471-2105-9-50Published: 25 January 2008
A fundamental problem when trying to define the functional relationships between proteins is the difficulty in quantifying functional similarities, even when well-structured ontologies exist regarding the activity of proteins (i.e. 'gene ontology' -GO-). However, functional metrics can overcome the problems in the comparing and evaluating functional assignments and predictions. As a reference of proximity, previous approaches to compare GO terms considered linkage in terms of ontology weighted by a probability distribution that balances the non-uniform 'richness' of different parts of the Direct Acyclic Graph. Here, we have followed a different approach to quantify functional similarities between GO terms.
We propose a new method to derive 'functional distances' between GO terms that is based on the simultaneous occurrence of terms in the same set of Interpro entries, instead of relying on the structure of the GO. The coincidence of GO terms reveals natural biological links between the GO functions and defines a distance model Df which fulfils the properties of a Metric Space. The distances obtained in this way can be represented as a hierarchical 'Functional Tree'.
The method proposed provides a new definition of distance that enables the similarity between GO terms to be quantified. Additionally, the 'Functional Tree' defines groups with biological meaning enhancing its utility for protein function comparison and prediction. Finally, this approach could be for function-based protein searches in databases, and for analysing the gene clusters produced by DNA array experiments.