Large-scale prediction of protein-protein interactions from structures
1 Mines ParisTech, Centre for Computational Biology, 35 rue Saint-Honoré, F-77305 Fontainebleau, France
2 Institut Curie, F-75248, Paris, France
3 INSERM U900, F-75248, Paris, France
4 Department of Biochemistry, University of Washington, Seattle, WA, USA
5 Department of Genome Sciences, University of Washington, Seattle, WA, USA
6 Department of Computer Science and Engineering, University of Washington, Seattle, WA, USA
BMC Bioinformatics 2010, 11:144 doi:10.1186/1471-2105-11-144
Published: 18 March 2010
The prediction of protein-protein interactions is an important step toward elucidating protein functions and understanding the molecular mechanisms inside the cell. While experimental methods for identifying these interactions remain costly and often noisy, the growing number of solved 3D protein structures suggests that in silico methods for predicting interactions between two protein structures will play an increasingly important role in screening candidate interacting pairs. Approaches that exploit structural knowledge are presumably more accurate than those based on sequence alone. Docking-based approaches solve a variant of this problem, but they remain very computationally intensive and will not scale in the near future to detecting interactions at the level of an interactome, which involves millions of candidate protein pairs.
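For a sense of scale, the number of candidate pairs grows quadratically with proteome size: n proteins give n(n-1)/2 distinct unordered pairs, plus n self-pairs if homodimers are screened as well. The sketch below is plain arithmetic; the helper name candidate_pairs and the proteome size of 6,000 are illustrative assumptions, not figures from this study.

```python
def candidate_pairs(n_proteins: int, include_self: bool = True) -> int:
    """Count unordered protein pairs to screen (hypothetical helper).

    Distinct pairs: n * (n - 1) / 2; self-pairs (homodimers) add n more.
    """
    distinct = n_proteins * (n_proteins - 1) // 2
    return distinct + n_proteins if include_self else distinct


# Illustrative proteome size (assumption, not taken from the paper).
print(candidate_pairs(6000))                      # 18003000
print(candidate_pairs(6000, include_self=False))  # 17997000
```

Even a hypothetical one second of computation per pair would add up to roughly 208 days of serial work for 18 million pairs, which is why a fast yes/no classifier is attractive at this scale.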
Here, we describe a computational method to efficiently predict in silico whether two protein structures interact. This yes/no question is presumably easier to answer than the standard protein docking question, "How do these two protein structures interact?" Our approach is to discriminate between interacting and non-interacting protein pairs using a statistical pattern recognition method known as a support vector machine (SVM). We demonstrate that our structure-based method performs well on this task and scales to the size of an interactome.
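To make the yes/no formulation concrete: once an SVM has been trained on labeled pairs, it labels a new pair by the sign of a kernel expansion over the support pairs, f(q) = sum_i alpha_i y_i K(p_i, q) + b. The minimal sketch below assumes training has already produced the dual coefficients and bias. The functions svm_score, predict_interaction and same_pair, and all values, are hypothetical; same_pair is a deliberately trivial pair kernel standing in for the structure-based pair kernels used in the study.

```python
from typing import Callable, Hashable, Sequence, Tuple

# A candidate pair of protein identifiers (e.g. structure IDs).
Pair = Tuple[Hashable, Hashable]
PairKernel = Callable[[Pair, Pair], float]


def svm_score(query: Pair,
              support_pairs: Sequence[Pair],
              dual_coefs: Sequence[float],
              bias: float,
              pair_kernel: PairKernel) -> float:
    """Kernel SVM decision value f(q) = sum_i alpha_i * y_i * K(p_i, q) + b.

    dual_coefs holds the products alpha_i * y_i learned during training;
    here they are supplied by the caller (hypothetical values).
    """
    return sum(c * pair_kernel(p, query)
               for p, c in zip(support_pairs, dual_coefs)) + bias


def predict_interaction(query: Pair,
                        support_pairs: Sequence[Pair],
                        dual_coefs: Sequence[float],
                        bias: float,
                        pair_kernel: PairKernel) -> bool:
    """Yes/no answer: a positive decision value means 'interacts'."""
    return svm_score(query, support_pairs, dual_coefs, bias, pair_kernel) > 0


# Toy pair kernel (assumption): 1 if the two unordered pairs coincide, else 0.
def same_pair(p: Pair, q: Pair) -> float:
    return 1.0 if frozenset(p) == frozenset(q) else 0.0


support = [("A", "B"), ("C", "D")]   # hypothetical support pairs
coefs = [1.0, -1.0]                  # +: interacting, -: non-interacting
print(predict_interaction(("B", "A"), support, coefs, 0.0, same_pair))  # True
print(predict_interaction(("D", "C"), support, coefs, 0.0, same_pair))  # False
```

Because scoring a trained model only requires kernel evaluations against the support pairs, the cost of each candidate pair is fixed once training is done, so screening grows roughly linearly with the number of pairs.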
The use of structural information for the prediction of protein interactions yields significantly better performance than sequence-based methods. Among structure-based classifiers, the SVM algorithm, combined with the metric learning pairwise kernel and the MAMMOTH kernel, performs best in our experiments.
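The metric learning pairwise kernel turns any kernel K between individual proteins into a kernel between pairs; as usually defined, K_MLPK((a, b), (c, d)) = (K(a, c) - K(a, d) - K(b, c) + K(b, d))^2, which is unchanged if the two proteins within either pair are swapped. In the sketch below the base kernel is only a placeholder: it stands in for the MAMMOTH structural similarity, whose conversion into a valid kernel is not reproduced here, and the names mlpk and delta are hypothetical.

```python
from typing import Callable, Hashable, Tuple

Pair = Tuple[Hashable, Hashable]
BaseKernel = Callable[[Hashable, Hashable], float]


def mlpk(base: BaseKernel) -> Callable[[Pair, Pair], float]:
    """Build the metric learning pairwise kernel from a protein-level kernel."""
    def kernel(p: Pair, q: Pair) -> float:
        a, b = p
        c, d = q
        return (base(a, c) - base(a, d) - base(b, c) + base(b, d)) ** 2
    return kernel


# Placeholder base kernel (assumption): 1.0 for identical proteins, else 0.0.
# A structural similarity such as MAMMOTH-derived scores would go here instead.
def delta(x: Hashable, y: Hashable) -> float:
    return 1.0 if x == y else 0.0


k = mlpk(delta)
print(k(("A", "B"), ("A", "B")))  # 4.0
print(k(("A", "B"), ("B", "A")))  # 4.0  (order within a pair does not matter)
print(k(("A", "B"), ("A", "C")))  # 1.0
print(k(("A", "B"), ("C", "D")))  # 0.0
```

The resulting pair kernel can be passed to any kernel SVM that accepts a custom kernel; with a real similarity matrix, one would typically precompute the Gram matrix over all training pairs once.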