This article is part of the supplement: Selected articles from the IEEE International Conference on Bioinformatics and Biomedicine 2012: Bioinformatics
Protein function prediction by collective classification with explicit and implicit edges in protein-protein interaction networks
1 School of Computer Science, and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China
2 Research Lab of Information Management, Changzhou University, Jiangsu, China
3 Department of Computer Science & Technology, Tongji University, Shanghai, China
BMC Bioinformatics 2013, 14(Suppl 12):S4 doi:10.1186/1471-2105-14-S12-S4
Published: 24 September 2013
Protein function prediction is an important problem in the post-genomic era. Recent advances in experimental biology have enabled the production of vast amounts of protein-protein interaction (PPI) data. Thus, using PPI data to functionally annotate proteins has been extensively studied. However, most existing network-based approaches do not work well when annotation and interaction information is inadequate in the networks.
In this paper, we propose a new method that combines PPI information with protein sequence information to improve prediction performance through collective classification. Our method divides function prediction into two phases. First, the original PPI network is enriched with edges inferred from protein sequence information; we call these added edges implicit edges and the existing ones explicit edges. Second, a collective classification algorithm is applied to the enriched network to predict protein function.
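The two phases can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper: the similarity scores, the `threshold` value, the protein names and both function names are hypothetical, and the second phase uses a simple iterative neighbor-majority vote as a stand-in for a collective classification algorithm. For brevity each protein carries a single function label, whereas real annotations are usually multi-label.

```python
from collections import Counter


def add_implicit_edges(explicit_edges, similarity, threshold):
    """Phase 1 (sketch): enrich the PPI network with implicit edges.

    `similarity` maps a protein pair to a precomputed sequence-similarity
    score; the scoring scheme and the threshold are assumptions.
    """
    edges = {frozenset(e) for e in explicit_edges if e[0] != e[1]}
    for (a, b), score in similarity.items():
        if a != b and score >= threshold:
            edges.add(frozenset((a, b)))
    return edges


def collective_classify(edges, known_labels, max_iters=10):
    """Phase 2 (stand-in): iterative majority vote over neighbor labels.

    Each unlabeled protein repeatedly takes the most common label among
    its currently labeled neighbors until no assignment changes.
    """
    neighbors = {}
    for edge in edges:
        a, b = tuple(edge)
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    labels = dict(known_labels)
    unlabeled = sorted(n for n in neighbors if n not in known_labels)
    for _ in range(max_iters):
        changed = False
        for node in unlabeled:
            votes = Counter(labels[n] for n in neighbors[node] if n in labels)
            if not votes:
                continue  # no labeled neighbor yet
            # Highest vote count wins; ties are broken alphabetically.
            best = min(votes, key=lambda lab: (-votes[lab], lab))
            if labels.get(node) != best:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


# Toy data: P4 has no explicit interactions but is sequence-similar to P1.
explicit = [("P1", "P2"), ("P2", "P3")]
similarity = {("P1", "P4"): 0.9, ("P2", "P4"): 0.2}
known = {"P1": "transport", "P3": "transport"}

baseline = collective_classify(add_implicit_edges(explicit, {}, 0.5), known)
enriched = collective_classify(add_implicit_edges(explicit, similarity, 0.5), known)
```

In this toy network, `P4` is absent from `baseline` because it has no explicit edge, while in `enriched` the implicit edge to `P1` lets it receive a label. This is the kind of sparse situation that the added edges are meant to address.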
We conducted extensive experiments on two real, publicly available PPI datasets. Compared with four existing protein function prediction approaches, our method performs better in many situations, which shows that adding implicit edges can indeed improve prediction performance. The experimental results also indicate that our method is significantly better than the compared approaches in sparsely labeled networks, and that it is robust to changes in the proportion of annotated proteins.