This article is part of the supplement: Proceedings of the 23rd International Conference on Genome Informatics (GIW 2012)

Open Access Proceedings

A semi-supervised boosting SVM for predicting hot spots at protein-protein Interfaces

Bin Xu1, Xiaoming Wei1, Lei Deng1, Jihong Guan1* and Shuigeng Zhou2*

Author affiliations

1 Department of Computer Science and Technology, Tongji University, Shanghai 201804, China

2 Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai 200433, China

Citation and License

BMC Systems Biology 2012, 6(Suppl 2):S6  doi:10.1186/1752-0509-6-S2-S6

Published: 12 December 2012

Abstract

Background

Hot spots are residues that contribute most of the binding free energy yet account for only a small portion of a protein interface. Experimental approaches to identifying hot spots, such as alanine scanning mutagenesis, are expensive and time-consuming, and computational methods are emerging as effective alternatives.

Results

In this study, we propose a semi-supervised boosting SVM, called sbSVM, to computationally predict hot spots at protein-protein interfaces by combining protein sequence and structure features. Feature selection is performed using random forests to avoid over-fitting. Because positive samples are scarce, our approach iteratively samples useful unlabeled data to boost hot spot prediction performance. We evaluate the method on a dataset generated from the ASEdb database for cross-validation and on a dataset from the BID database for independent testing. In addition, a balanced dataset with similar numbers of hot spots and non-hot spots (65 and 66, respectively), derived from the first training dataset, is used to further validate the method. All results show that our method yields good sensitivity, accuracy and F1 score compared with existing methods.
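For readers unfamiliar with the three reported measures, the following minimal sketch shows how sensitivity, accuracy and F1 score are conventionally computed from binary predictions. The function names and the toy labels are hypothetical and chosen only for illustration; they are not results or code from this study.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = hot spot, 0 = non-hot spot)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def evaluate(y_true, y_pred):
    """Return sensitivity (recall), precision, accuracy and F1 as a dict."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Toy labels, invented for illustration only (not data from the paper).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
scores = evaluate(y_true, y_pred)
print(scores)
```

Sensitivity matters particularly when hot spots are the minority class, because a classifier can reach high accuracy while missing most of them; F1 balances sensitivity against precision.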

Conclusion

Our method improves hot spot prediction by using unlabeled data to compensate for the scarcity of labeled training data. Experimental results show that our approach is more effective than traditional supervised algorithms and the major existing hot spot prediction methods.
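The general idea of drawing on unlabeled data can be illustrated with a generic self-training loop. This is only a sketch under stated assumptions and is not the sbSVM algorithm: the abstract does not specify the sampling rule, so the margin-based selection here is an assumption, and a nearest-centroid classifier stands in for the SVM so that the example needs only the standard library. All names and data points are hypothetical.

```python
import math


def fit(X, y):
    """Stand-in base learner (NOT an SVM): one centroid per class."""
    model = {}
    for label in set(y):
        pts = [x for x, lab in zip(X, y) if lab == label]
        model[label] = [sum(col) / len(pts) for col in zip(*pts)]
    return model


def predict_with_margin(model, x):
    """Return (label, margin). Assumes exactly two classes in the model.

    The margin is the gap between the distances to the two centroids;
    a larger gap is treated as a more confident prediction.
    """
    dists = sorted((math.dist(x, c), label) for label, c in model.items())
    return dists[0][1], dists[1][0] - dists[0][0]


def self_train(X_lab, y_lab, X_unlab, per_round=2, rounds=3):
    """Iteratively add the most confidently predicted unlabeled points."""
    X, y = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    for _ in range(rounds):
        if not pool:
            break
        model = fit(X, y)
        scored = []
        for i, x in enumerate(pool):
            label, margin = predict_with_margin(model, x)
            scored.append((margin, i, label))
        scored.sort(reverse=True)          # most confident first
        picked = scored[:per_round]
        for _, i, label in picked:
            X.append(pool[i])              # pseudo-labeled sample joins training set
            y.append(label)
        picked_idx = {i for _, i, _ in picked}
        pool = [x for i, x in enumerate(pool) if i not in picked_idx]
    return fit(X, y), X, y


# Hypothetical 2-D feature vectors; 1 = hot spot, 0 = non-hot spot.
X_lab = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0], [5.0, 5.0]]
y_lab = [0, 0, 1, 1]
X_unlab = [[0.5, 0.2], [4.6, 4.9], [2.0, 2.2], [0.9, 0.4]]

model, X_aug, y_aug = self_train(X_lab, y_lab, X_unlab)
print(len(X_aug), predict_with_margin(model, [4.8, 4.7])[0])
```

In practice the base learner would be replaced by the classifier of interest, and the confidence criterion would need to be chosen carefully, since wrongly pseudo-labeled samples can reinforce early mistakes.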