Open Access Research article

A method for probabilistic mapping between protein structure and function taxonomies through cross training

Kshitiz Gupta1,2*, Vivek Sehgal2,3,4* and Andre Levchenko1

The Whitaker Institute for Biomedical Engineering, Johns Hopkins School of Medicine, Baltimore, MD, USA

Department of Computer Science & Engineering, Indian Institute of Technology, Bombay, Mumbai, India

Department of Computer Science, University of Maryland, College Park, MD, USA

Yahoo! Inc., 701 First Avenue, Sunnyvale, CA, USA

* Contributed equally

BMC Structural Biology 2008, 8:40 doi:10.1186/1472-6807-8-40

Published: 3 October 2008

Abstract

Background

Predicting protein function from structure, and vice versa, is only a partially solved problem, and one that has largely been the domain of biophysics and biochemistry. This underscores the need for computational and bioinformatics approaches to the problem. A large body of organized, latent knowledge on protein classification exists in the form of independently created protein classification databases. By creating probabilistic maps between the classes of structural classification databases (e.g. SCOP [1]) and the classes of functional classification databases (e.g. PROSITE [2]), the structure and function of proteins can be related probabilistically.
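As a rough illustration of what a probabilistic map between two taxonomies can look like, the sketch below estimates P(target class | source class) from simple co-occurrence counts. This is only a minimal counting sketch, not the method used in the paper; the function name `probabilistic_map`, the class names and the toy annotations are all hypothetical.

```python
from collections import Counter, defaultdict


def probabilistic_map(pairs):
    """Estimate P(target class | source class) from (source, target) label pairs."""
    joint = defaultdict(Counter)
    for source, target in pairs:
        joint[source][target] += 1
    return {
        source: {t: n / sum(counts.values()) for t, n in counts.items()}
        for source, counts in joint.items()
    }


# Hypothetical toy annotations: one (functional class, structural class) pair
# per protein. These labels are invented and are not real PROSITE/SCOP entries.
proteins = [
    ("kinase_motif", "alpha_beta"),
    ("kinase_motif", "alpha_beta"),
    ("kinase_motif", "all_alpha"),
    ("zinc_finger", "small_proteins"),
]

# Map in both directions; the two maps are generally not symmetric.
function_to_structure = probabilistic_map(proteins)
structure_to_function = probabilistic_map((s, f) for f, s in proteins)
```

In this toy data, a protein in the hypothetical `kinase_motif` class falls in `alpha_beta` two times out of three, while every `alpha_beta` protein carries `kinase_motif`, which shows how the two directions of such a map can differ.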

Results

We demonstrate that PROSITE and SCOP have significant semantic overlap despite their independent classification schemes. Training SCOP classifiers with PROSITE classes as attributes, and vice versa, improved the accuracy of Support Vector Machine classifiers for both SCOP and PROSITE. Novel attributes (2-D elastic profiles and Blocks) were used to improve time complexity and accuracy. Many relationships between classes of SCOP and PROSITE were extracted using decision trees.
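The cross-training idea described above, in which the classes of one taxonomy serve as attributes for a classifier of the other, can be sketched as follows. To stay dependency-free, the sketch swaps in a simple nearest-centroid classifier in place of the Support Vector Machines used in the paper. All function names, features and labels are hypothetical, and accuracy is measured on the training data purely for illustration.

```python
def one_hot(label, vocabulary):
    """Encode a class label as a 0/1 vector over a fixed vocabulary."""
    return [1.0 if label == v else 0.0 for v in vocabulary]


def augment(features, other_labels):
    """Append the other taxonomy's class label (one-hot) to each feature row."""
    vocabulary = sorted(set(other_labels))
    return [row + one_hot(lab, vocabulary) for row, lab in zip(features, other_labels)]


def fit_centroids(X, y):
    """Nearest-centroid stand-in classifier: compute the mean vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Return the class with the closest centroid (ties go to the first label)."""
    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[label]))
    return min(sorted(centroids), key=squared_distance)


def accuracy(X, y):
    """Training-set accuracy; a real evaluation would use held-out data."""
    centroids = fit_centroids(X, y)
    return sum(predict(centroids, row) == label for row, label in zip(X, y)) / len(y)


# Hypothetical toy data: the base attribute alone cannot separate the
# structural classes, but the functional class label can.
base_features = [[0.0], [1.0], [0.0], [1.0]]
structure_labels = ["A", "A", "B", "B"]
function_labels = ["f1", "f1", "f2", "f2"]

plain_accuracy = accuracy(base_features, structure_labels)
cross_trained_accuracy = accuracy(augment(base_features, function_labels), structure_labels)
```

The same `augment` step could be applied in the opposite direction, adding structural class labels as attributes when training a classifier of functional classes.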

Conclusion

We demonstrate that the presented approach can discover new probabilistic relationships between classes of different taxonomies and yield more accurate classification. Extensive mappings between existing protein classification databases can be created to link the large amount of organized data. Probabilistic maps were created between classes of SCOP and PROSITE, allowing prediction of structure from function and vice versa. In our experiments, we also found that function is more strongly related to structure than structure is to function.

