Open Access Highly Accessed Research article

Effects of protein interaction data integration, representation and reliability on the use of network properties for drug target prediction

Antonio Mora1,2 and Ian M Donaldson1,2*

Author Affiliations

1 Department for Molecular Biosciences, University of Oslo, P.O. Box 1041, Oslo, Blindern 0316, Norway

2 The Biotechnology Centre of Oslo, University of Oslo, P.O. Box 1125, Oslo, Blindern 0317, Norway

BMC Bioinformatics 2012, 13:294 doi:10.1186/1471-2105-13-294

Published: 12 November 2012



Previous studies have noted that drug targets appear to be associated with higher-degree or higher-centrality proteins in interaction networks. These studies explicitly or tacitly make choices of different source databases, data integration strategies, representation of proteins and complexes, and data reliability assumptions. Here we examined how the use of different data integration and representation techniques, or different notions of reliability, may affect the efficacy of degree and centrality as features in drug target prediction.


Fifty percent of drug targets have a degree of less than nine, and ninety-five percent have a degree of less than ninety. We found that drug targets are over-represented in higher-degree bins; this relationship is seen only for the consolidated interactome and does not depend on n-ary interaction data or its representation. Degree acts as a weak predictive feature for drug-target status, and using more reliable subsets of the data does not improve its performance. However, performance does increase if only cancer-related drug targets are considered. We also note that a protein’s membership in pathway records can act as a better predictive feature than degree, and that high centrality may indicate a drug that is more likely to be withdrawn.
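
As an illustration of what over-representation in higher-degree bins means, the following minimal Python sketch computes each protein's degree from a list of binary interactions and then the fraction of drug targets in each degree bin. It is not taken from the scripts provided with the article: the function names, the bin boundaries, and the toy interactions and target set are all hypothetical and chosen only to make the calculation concrete.

```python
from collections import defaultdict


def degrees(edges):
    """Count distinct interaction partners per protein in an undirected edge list."""
    partners = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        partners[a].add(b)
        partners[b].add(a)
    return {protein: len(neighbours) for protein, neighbours in partners.items()}


def target_fraction_by_bin(degree, targets, bin_edges):
    """Fraction of proteins that are drug targets in each half-open degree bin [lo, hi)."""
    result = {}
    for lo, hi in zip(bin_edges, bin_edges[1:]):
        members = [p for p, d in degree.items() if lo <= d < hi]
        hits = sum(1 for p in members if p in targets)
        # None marks an empty bin, where no fraction can be computed
        result[(lo, hi)] = hits / len(members) if members else None
    return result


# Toy data (hypothetical): six proteins, two of them labelled as drug targets.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E"), ("E", "F")]
targets = {"A", "E"}

deg = degrees(edges)
fractions = target_fraction_by_bin(deg, targets, [1, 2, 3, 4])
for (lo, hi), frac in fractions.items():
    print(f"degree in [{lo}, {hi}): target fraction = {frac}")
```

In this toy network the target fraction rises with degree, which is the kind of trend the bin analysis looks for; with real data the result would depend on which interactome is used and how it was consolidated.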


These results show that protein interaction data integration and cleaning are important considerations when incorporating network properties as predictive features for drug-target status. The provided scripts and data sets offer a starting point for further studies and cross-comparison of methods.