This article is part of the supplement: Selected articles from the 7th International Symposium on Bioinformatics Research and Applications (ISBRA'11)
Efficient and accurate greedy search methods for mining functional modules in protein interaction networks
1 School of Computer Science and Engineering, Key Lab of Computer Network & Information Integration, MOE, Southeast University, Nanjing, 210018, China
2 National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210093, China
3 Division of Mathematics and Computer Science, University of South Carolina Upstate 800 University Way, Spartanburg, SC 29303, USA
BMC Bioinformatics 2012, 13(Suppl 10):S19 doi:10.1186/1471-2105-13-S10-S19Published: 25 June 2012
Most computational algorithms mainly focus on detecting highly connected subgraphs in PPI networks as protein complexes but ignore their inherent organization. Furthermore, many of these algorithms are computationally expensive. However, recent analysis indicates that experimentally detected protein complexes generally contain Core/attachment structures.
In this paper, a Greedy Search Method based on Core-Attachment structure (GSM-CA) is proposed. The GSM-CA method detects densely connected regions in large protein-protein interaction networks based on the edge weight and two criteria for determining core nodes and attachment nodes. The GSM-CA method improves the prediction accuracy compared to other similar module detection approaches, however it is computationally expensive. Many module detection approaches are based on the traditional hierarchical methods, which is also computationally inefficient because the hierarchical tree structure produced by these approaches cannot provide adequate information to identify whether a network belongs to a module structure or not. In order to speed up the computational process, the Greedy Search Method based on Fast Clustering (GSM-FC) is proposed in this work. The edge weight based GSM-FC method uses a greedy procedure to traverse all edges just once to separate the network into the suitable set of modules.
The proposed methods are applied to the protein interaction network of S. cerevisiae. Experimental results indicate that many significant functional modules are detected, most of which match the known complexes. Results also demonstrate that the GSM-FC algorithm is faster and more accurate as compared to other competing algorithms.
Based on the new edge weight definition, the proposed algorithm takes advantages of the greedy search procedure to separate the network into the suitable set of modules. Experimental analysis shows that the identified modules are statistically significant. The algorithm can reduce the computational time significantly while keeping high prediction accuracy.