Open Access Research article

Selection of organisms for the co-evolution-based study of protein interactions

Dorota Herman1,3, David Ochoa1, David Juan2, Daniel Lopez1, Alfonso Valencia2 and Florencio Pazos1*

Author affiliations

1 Computational Systems Biology Group, National Centre for Biotechnology (CNB-CSIC), C/Darwin, 3, Cantoblanco, 28049 Madrid, Spain

2 Structural Bioinformatics Group, Spanish National Cancer Research Centre (CNIO), C/Melchor Fernández Almagro 3, 28029 Madrid, Spain

3 Centre for Systems Biology (CSB), School of Biosciences, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK

Citation and License

BMC Bioinformatics 2011, 12:363  doi:10.1186/1471-2105-12-363

Published: 12 September 2011



The prediction and study of protein interactions and functional relationships based on the similarity of phylogenetic trees, exemplified by mirrortree and related methodologies, is widely used. Although the performance of these methods was suspected to depend on the set of organisms used to build the trees, this dependence has not yet been assessed exhaustively, and previous works have generally used as many organisms as possible. In this work we assess the effect of using different sets of organisms (chosen according to various phylogenetic criteria) on the performance of this methodology in detecting protein interactions of different types.
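To make the idea of tree similarity concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the similarity between two protein families is measured as the Pearson correlation between their inter-organism distances, restricted to a chosen set of organisms. The function names, the organism codes, and the distance values are all hypothetical and serve only as an illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant distances")
    return cov / sqrt(var_x * var_y)


def tree_similarity(dist_a, dist_b, organisms):
    """Correlate the distances of two protein families over a set of organisms.

    dist_a and dist_b map frozenset({org1, org2}) to an evolutionary
    distance between the orthologs of that family in the two organisms.
    Only organism pairs present in both families are compared.
    """
    pairs = [frozenset(p) for p in combinations(sorted(organisms), 2)]
    shared = [p for p in pairs if p in dist_a and p in dist_b]
    if len(shared) < 3:
        raise ValueError("need at least three shared organism pairs")
    return pearson([dist_a[p] for p in shared], [dist_b[p] for p in shared])


def make_distances(entries):
    """Build a distance lookup from (org1, org2, distance) triples."""
    return {frozenset((a, b)): d for a, b, d in entries}


# Made-up distances for two hypothetical protein families (illustration only).
family_a = make_distances([
    ("eco", "bsu", 0.4), ("eco", "hsa", 1.2), ("eco", "sce", 1.1),
    ("bsu", "hsa", 1.3), ("bsu", "sce", 1.2), ("hsa", "sce", 0.5),
])
family_b = make_distances([
    ("eco", "bsu", 0.5), ("eco", "hsa", 1.0), ("eco", "sce", 1.2),
    ("bsu", "hsa", 1.1), ("bsu", "sce", 1.4), ("hsa", "sce", 0.3),
])

all_organisms = ["eco", "bsu", "hsa", "sce"]
score_all = tree_similarity(family_a, family_b, all_organisms)
score_subset = tree_similarity(family_a, family_b, ["eco", "bsu", "hsa"])
```

Passing a different list of organisms changes which distances enter the correlation, which is the kind of dependence on organism choice that this work sets out to assess.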


We show that the performance of three mirrortree-related methodologies depends on the set of organisms used for building the trees, and that it is not always related in a simple way to the number of organisms. Certain subsets of organisms seem to be more suitable for predicting certain types of interactions. This relationship between the type of interaction and the optimal set of organisms for detecting it makes sense in light of the phylogenetic distribution of the organisms and the nature of the interactions.


To obtain optimal performance when predicting protein interactions, it is recommended to use different sets of organisms depending on the available computational resources and data, as well as on the type of interactions of interest.