This article is part of the supplement: The Second Automated Function Prediction Meeting

Open Access Proceedings

Using structural motif descriptors for sequence-based binding site prediction

Andreas Henschel1*, Christof Winter1, Wan Kyu Kim1,2 and Michael Schroeder1

Author Affiliations

1 Biotechnological Center, TU Dresden, Tatzberg 47-51, 01307 Dresden, Germany

2 Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX 78712, USA

BMC Bioinformatics 2007, 8(Suppl 4):S5  doi:10.1186/1471-2105-8-S4-S5

Published: 22 May 2007

Abstract

Background

Many protein sequences are still poorly annotated. Functional characterization of a protein is often improved by identifying its interaction partners. Here, we aim to predict protein-protein interactions (PPI) and protein-ligand interactions (PLI) at the sequence level using 3D information. To this end, we use machine learning to compile the sequence segments that constitute the structural features of an interaction site into a single profile Hidden Markov Model descriptor. The resulting collection of descriptors can be used to screen sequence databases in order to predict functional sites.
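To illustrate the idea of compiling binding-site segments into a descriptor and screening sequences with it, the following is a minimal sketch and not the authors' implementation. It uses a position-specific scoring matrix, which is a simplification of a profile Hidden Markov Model without insert or delete states. The function names, the pseudocount, the uniform background, and the toy segments are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(segments, pseudocount=1.0):
    """Build per-column log-odds scores from equal-length aligned segments.

    Assumption: a uniform background distribution over the 20 amino acids.
    """
    length = len(segments[0])
    if any(len(s) != length for s in segments):
        raise ValueError("segments must all have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(segments) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = Counter(s[col] for s in segments)
        profile.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile

def best_hit(profile, sequence):
    """Slide the profile along the sequence; return (best score, start index)."""
    width = len(profile)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Unknown residue codes contribute a neutral score of 0.
        score = sum(col.get(aa, 0.0) for col, aa in zip(profile, window))
        best = max(best, (score, start))
    return best

# Toy, hypothetical segments loosely resembling a P-loop; not data from the paper.
toy_segments = ["GAGGVGKS", "GPPGTGKT", "GESGSGKS"]
toy_profile = build_profile(toy_segments)
score, start = best_hit(toy_profile, "MKLVGAGGVGKSALT")
```

A real profile HMM additionally models insertions and deletions and would normally be built and searched with a dedicated tool rather than hand-written code.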

Results

We generate descriptors for 740 classified types of protein-protein binding sites and for more than 3,000 protein-ligand binding sites. Cross-validation reveals that two thirds of the PPI descriptors are sufficiently conserved and significant to be used for binding site recognition. We further validate 230 PPIs extracted from the literature, for which we additionally identify the interface residues. Finally, we test ligand-binding descriptors for the case of ATP. For sequences with the Swiss-Prot annotation "ATP-binding", we achieve a recall of 25% with a precision of 89%, whereas Prosite's P-loop motif recognizes an equal number of hits at the expense of a much higher number of false positives (precision: 57%). Our method yields 771 hits with a precision of 96% that were not previously picked up by any Prosite pattern.
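For readers less familiar with the two measures reported above, the following sketch shows how precision and recall are computed from hit counts. The counts in the example are hypothetical, chosen only so that the arithmetic reproduces the reported percentages; they are not the counts from the study.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) for a set of predicted hits.

    precision = TP / (TP + FP): fraction of predicted hits that are correct.
    recall    = TP / (TP + FN): fraction of annotated sequences that are found.
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall

# Hypothetical counts (an assumption for illustration, not from the paper):
# 100 predicted hits, 89 of them correct, out of 356 annotated sequences.
precision, recall = precision_recall(
    true_positives=89, false_positives=11, false_negatives=267
)
```

With these illustrative counts, precision is 89/100 and recall is 89/356, matching the 89% and 25% quoted for the ATP-binding descriptors.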

Conclusion

The automatically generated descriptors are a useful complement to known Prosite/InterPro motifs. They serve to predict protein-protein as well as protein-ligand interactions, along with their binding site residues, for proteins where only sequence information is available.