
This article is part of the supplement: Ninth International Conference on Bioinformatics (InCoB2010): Bioinformatics

Open Access Proceedings

Integrating diverse biological and computational sources for reliable protein-protein interactions

Min Wu1, Xiaoli Li2*, Hon Nian Chua3, Chee-Keong Kwoh1 and See-Kiong Ng2

Author Affiliations

1 School of Computer Engineering, Nanyang Technological University, Singapore

2 Institute for Infocomm Research, 1 Fusionopolis Way, Singapore

3 Harvard University, 250 Longwood Avenue, SGMB-322 Boston, USA


BMC Bioinformatics 2010, 11(Suppl 7):S8  doi:10.1186/1471-2105-11-S7-S8

Published: 15 October 2010

Abstract

Background

Protein-protein interactions (PPIs) play important roles in various cellular processes. However, the low quality of current PPI data detected by high-throughput screening techniques limits the usefulness of the data. A method is needed to address both the noise and the incompleteness of PPI data, namely, to filter out inaccurate protein interactions (false positives) and to predict putative protein interactions that the screens missed (false negatives).

Results

In this paper, we propose a novel two-step method that integrates diverse biological and computational sources of supporting evidence for reliable PPIs. The first step, interaction binning or InterBIN, groups PPIs together so that, for each biological or computational evidence source, the likelihood that the protein pairs interact (the Bin-Confidence score) can be estimated more accurately. The second step, interaction classification or InterCLASS, integrates the collected Bin-Confidence scores to build classifiers and identify reliable interactions.
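The two steps described above can be illustrated with a minimal sketch. The abstract does not specify how the bins are formed or which classifier is used, so everything here is an assumption made for illustration: equal-frequency binning on a raw evidence score, a Bin-Confidence score taken as the fraction of known interacting pairs in each bin, and a nearest-centroid classifier standing in for InterCLASS. The function names (`fit_bins`, `bin_confidence`, `train_centroids`, `predict`) and the toy data are hypothetical and are not taken from the paper.

```python
from bisect import bisect_left

def fit_bins(scores, labels, n_bins):
    """Step 1 (binning, assumed equal-frequency): sort pairs by raw score,
    split them into n_bins groups, and record each group's upper score
    edge and its fraction of known interacting pairs."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    edges, confs = [], []
    for b in range(n_bins):
        chunk = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # more bins requested than pairs available
        edges.append(scores[chunk[-1]])
        confs.append(sum(labels[i] for i in chunk) / len(chunk))
    return edges, confs

def bin_confidence(score, edges, confs):
    """Look up the confidence of the first bin whose upper edge is >= score;
    scores above every edge fall into the last bin."""
    idx = min(bisect_left(edges, score), len(confs) - 1)
    return confs[idx]

def train_centroids(features, labels):
    """Step 2 (classification, stand-in): mean feature vector per class."""
    centroids = {}
    for cls in (0, 1):
        rows = [f for f, y in zip(features, labels) if y == cls]
        centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(feature, centroids):
    """Assign the class whose centroid is closest (squared Euclidean)."""
    dist = {c: sum((a - b) ** 2 for a, b in zip(feature, v))
            for c, v in centroids.items()}
    return min(dist, key=dist.get)

# Toy data (hypothetical): two evidence sources scored on eight protein pairs,
# with label 1 for a known interaction and 0 otherwise.
source_a = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
source_b = [5, 1, 2, 6, 7, 3, 8, 9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

bins_a = fit_bins(source_a, labels, n_bins=2)
bins_b = fit_bins(source_b, labels, n_bins=2)

# One feature vector per pair: its confidence score from each source.
features = [[bin_confidence(a, *bins_a), bin_confidence(b, *bins_b)]
            for a, b in zip(source_a, source_b)]
centroids = train_centroids(features, labels)

# Score an unseen pair with raw evidence values (0.85, 8.5).
new_pair = [bin_confidence(0.85, *bins_a), bin_confidence(8.5, *bins_b)]
print(predict(new_pair, centroids))
```

Replacing each raw evidence value with a per-bin estimate puts heterogeneous sources on a common probability-like scale before they are combined, which is the motivation the abstract gives for binning first and classifying second.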

Conclusions

We performed comprehensive experiments on two benchmark yeast PPI datasets. The experimental results showed that our proposed method can effectively eliminate false positives in detected PPIs and identify false negatives by predicting novel yet reliable PPIs. Our proposed method also performed significantly better than using any individual evidence source alone, illustrating the importance of integrating various biological and computational sources of data and evidence.