We propose a novel method for automatic module extraction from protein-protein interaction networks. While most previous approaches to module discovery are based on graph partitioning, our algorithm efficiently enumerates all densely connected modules in the network. As currently available interaction data are incomplete, this is a meaningful generalization of clique search techniques. In comparison with partitioning methods, the approach has the following advantages: the user can specify a minimum density for the resulting modules and is guaranteed that all modules satisfying this criterion are discovered. Moreover, it provides a natural way to detect overlapping modules. Many proteins are not permanently present in the cell, but are expressed specifically depending on cell type, environmental conditions, and developmental state. We therefore introduce an additional constraint on modules that accounts for differential expression.
We analysed human interaction data from MINT, IntAct, HPRD, and DIP in the context of the human tissue-specific gene expression data provided by Su et al. We discretized the expression information into binary states (expressed versus not expressed) and searched for densely connected modules in which all proteins are expressed in at least 3 tissues and all proteins are not expressed in at least 10 tissues. Because protein interaction data contain a high number of false positives, we computed a reliability score for each experimental source. Similarly to the work by Jansen et al., we used a gold standard set of known interactions and a gold standard set of false interactions to calculate the likelihood ratio, which was then used to assign edge weights to the interaction graph. The density of a module is defined as the sum of the edge weights inside the module divided by the maximal possible weight sum for a module of that size.
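The two module criteria described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the data layout (edge weights keyed by unordered protein pairs, expression as a set of tissues per protein), and the toy data are hypothetical. Two assumptions are made explicitly: edge weights are scaled so that no edge exceeds `max_weight`, which makes the maximal possible weight sum for n proteins equal to `max_weight * n * (n - 1) / 2`, and the expression constraint is read as requiring tissues in which all module proteins are jointly expressed (or jointly not expressed).

```python
from itertools import combinations


def module_density(nodes, weights, max_weight=1.0):
    """Sum of edge weights inside the module divided by the maximal
    possible weight sum for a module of that size.

    Assumption: every edge weight lies in [0, max_weight]; missing
    edges count as weight 0.
    """
    nodes = sorted(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    inside = sum(weights.get(frozenset(pair), 0.0)
                 for pair in combinations(nodes, 2))
    return inside / (max_weight * n * (n - 1) / 2)


def is_differentially_expressed(nodes, expressed_in, all_tissues,
                                min_on=3, min_off=10):
    """Check the binary expression constraint for a module.

    Assumed reading: at least `min_on` tissues in which every protein
    of the module is expressed, and at least `min_off` tissues in
    which none of them is expressed.
    """
    on = sum(1 for t in all_tissues
             if all(t in expressed_in[p] for p in nodes))
    off = sum(1 for t in all_tissues
              if all(t not in expressed_in[p] for p in nodes))
    return on >= min_on and off >= min_off


# Hypothetical toy data: three proteins, fifteen tissues.
weights = {frozenset(("A", "B")): 1.0, frozenset(("A", "C")): 0.5}
tissues = [f"t{i}" for i in range(15)]
expressed_in = {
    "A": {"t0", "t1", "t2", "t3"},
    "B": {"t0", "t1", "t2"},
    "C": {"t0", "t1", "t2", "t4"},
}
module = {"A", "B", "C"}

density = module_density(module, weights)
keep = density >= 0.35 and is_differentially_expressed(
    module, expressed_in, tissues)
```

In this toy case the module has weight sum 1.5 out of a possible 3, is jointly expressed in three tissues, and is jointly absent from ten, so it would pass a 35% density threshold together with the expression constraint.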
Setting the minimum density threshold to 35% and removing modules that are entirely contained in other modules, we obtained a set of 949 differentially expressed modules. They were ranked in descending order of average weight per node (see ), so that larger and denser modules appear first. On the one hand, we discovered known complexes as well as modules that link strongly cooperating complexes such as MCM and ORC. On the other hand, we found extensions of known complexes that confirm hypothetical functional annotations in UniProt, as well as modules that are not contained in the manually curated set of known complexes but whose proteins share the same functional annotation. Finally, some modules contain proteins with unknown functional relationships and are therefore candidates for further biological investigation.
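The post-processing described above (discarding modules contained in other modules, then ranking by average weight per node) could be sketched as follows. This is an illustrative sketch only: the function names and toy data are hypothetical, and "average weight per node" is assumed to mean the sum of edge weights inside the module divided by its number of proteins.

```python
from itertools import combinations


def remove_contained(modules):
    """Keep only modules that are not proper subsets of another module."""
    # Process larger modules first so that any superset is seen
    # before its subsets; duplicates collapse via the set.
    candidates = sorted({frozenset(m) for m in modules},
                        key=len, reverse=True)
    maximal = []
    for m in candidates:
        if not any(m < kept for kept in maximal):
            maximal.append(m)
    return maximal


def average_weight_per_node(nodes, weights):
    """Sum of edge weights inside a non-empty module divided by its size
    (assumed definition; missing edges count as weight 0)."""
    inside = sum(weights.get(frozenset(pair), 0.0)
                 for pair in combinations(sorted(nodes), 2))
    return inside / len(nodes)


def rank_modules(modules, weights):
    """Maximal modules in descending order of average weight per node."""
    return sorted(remove_contained(modules),
                  key=lambda m: average_weight_per_node(m, weights),
                  reverse=True)


# Hypothetical toy data.
weights = {
    frozenset(("A", "B")): 1.0,
    frozenset(("A", "C")): 1.0,
    frozenset(("B", "C")): 1.0,
    frozenset(("D", "E")): 0.5,
}
modules = [{"A", "B"}, {"A", "B", "C"}, {"D", "E"}]
ranked = rank_modules(modules, weights)
```

Because the score divides the internal weight sum by the number of proteins rather than by the number of possible edges, it grows with both size and density, which matches the observation that larger and denser modules appear first.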
We developed a general method for exhaustive dense module extraction from networks. Remarkably, it allows exact P-values to be determined for the predicted modules without relying on any network model, and it can easily integrate information from heterogeneous data sources.
We are grateful to Andreas Rüpp for providing a curated set of known human complexes and to Gunnar Rätsch for his encouragement and support.
Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, Zhang J, Soden R, Hayakawa M, Kreiman G, Cooke MP, Walker JR, Hogenesch JB: A gene atlas of the mouse and human protein-encoding transcriptomes.