Methodology article (Open Access)

InPrePPI: an integrated evaluation method based on genomic context for predicting protein-protein interactions in prokaryotic genomes

Jingchun Sun [1], Yan Sun [2,3], Guohui Ding [2,3], Qi Liu [4], Chuan Wang [2], Youyu He [2], Tieliu Shi [2], Yixue Li [2] and Zhongming Zhao [1,5,6]*

  • * Corresponding author: Zhongming Zhao zzhao@vcu.edu

  • † Equal contributors

Author Affiliations

1 Virginia Institute for Psychiatric and Behavioral Genetics and Department of Psychiatry, Virginia Commonwealth University, Richmond, VA 23298, USA

2 Bioinformation Center, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China

3 Graduate School, Chinese Academy of Sciences, Shanghai 200031, China

4 School of Life Sciences and Technology, Shanghai Jiaotong University, Shanghai 200240, China

5 Department of Human Genetics, Virginia Commonwealth University, Richmond, VA 23298, USA

6 Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, VA 23284, USA

BMC Bioinformatics 2007, 8:414  doi:10.1186/1471-2105-8-414

Published: 26 October 2007

Abstract

Background

Although many genomic features have been used to predict protein-protein interactions (PPIs), a given computational method frequently uses only one. Recognizing the limited predictive power of a single genomic feature, investigators are now moving toward integration. So far, there have been few integration studies for PPI prediction; one failed to yield an appreciable improvement in prediction, and the others did not compare performance. It remains unclear whether integrating multiple genomic features can improve PPI prediction and, if it can, how these features should be integrated.

Results

In this study, we first performed a systematic evaluation of PPI prediction in Escherichia coli (E. coli) by four genomic context-based methods: the phylogenetic profile method, the gene cluster method, the gene fusion method, and the gene neighbor method. The number of predicted PPIs and the average degree in the predicted PPI networks varied greatly among the four methods. Further, no method outperformed the others when we tested them using three well-defined positive datasets from the KEGG, EcoCyc, and DIP databases. Based on these comparisons, we developed a novel integrated method, named InPrePPI. InPrePPI first normalizes the AC value (an integrated value of the accuracy and coverage) of each method using the three positive datasets, then calculates a weight for each method, and finally uses the weights to calculate an integrated score for each protein pair predicted by the four genomic context-based methods. We demonstrate that InPrePPI outperforms each of the four individual methods and, in general, the two other existing integrated methods: the joint observation method and the integrated prediction method in STRING. The four individual methods and InPrePPI are implemented in a user-friendly web interface.
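The three-step integration described above (normalize each method's AC value, derive a weight per method, combine the weights into a score per protein pair) can be illustrated with a minimal Python sketch. The abstract does not state the formulas, so everything specific here is an assumption made for illustration: averaging AC over the three datasets, dividing by the total so the weights sum to 1, and scoring a pair as a weighted sum of per-method scores. The function names and the AC values are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: the abstract does not give the InPrePPI formulas.
# The normalization, weighting, and scoring rules below are assumptions.

def method_weights(ac_values):
    """Turn per-dataset AC values into one weight per method.

    ac_values maps a method name to its AC values on the positive datasets.
    Assumption: average the AC values per method, then divide by the sum
    over all methods so that the weights add up to 1.
    """
    mean_ac = {m: sum(v) / len(v) for m, v in ac_values.items()}
    total = sum(mean_ac.values())
    return {m: a / total for m, a in mean_ac.items()}


def integrated_score(pair_scores, weights):
    """Combine per-method scores for one protein pair.

    pair_scores maps a method name to that method's score for the pair.
    Assumption: a weighted sum, where a method that did not predict the
    pair contributes 0.
    """
    return sum(w * pair_scores.get(m, 0.0) for m, w in weights.items())


# Hypothetical AC values on three positive datasets (not from the paper).
ac = {
    "phylogenetic_profile": [0.2, 0.3, 0.1],
    "gene_cluster": [0.4, 0.4, 0.4],
    "gene_fusion": [0.1, 0.1, 0.1],
    "gene_neighbor": [0.3, 0.2, 0.4],
}
weights = method_weights(ac)

# A pair predicted by two of the four methods.
score = integrated_score({"gene_cluster": 1.0, "gene_neighbor": 0.5}, weights)
print(weights)
print(round(score, 3))
```

In this sketch, a pair supported by several well-performing methods scores higher than one supported by a single weak method, which is the general intent of a weighted integration. The actual InPrePPI definitions may differ.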

Conclusion

This study evaluated PPI prediction by four genomic context-based methods and presents an integrated evaluation method that performs better than each individual method in E. coli.