This article is part of the supplement: The ISIBM International Joint Conferences on Bioinformatics, Systems Biology and Intelligent Computing (IJCBS)
BindN+ for accurate prediction of DNA and RNA-binding residues from protein sequence features
1 Department of Genetics and Biochemistry, Clemson University, Clemson, SC 29634, USA
2 J.C. Self Research Institute of Human Genetics, Greenwood Genetic Center, Greenwood, SC 29646, USA
3 School of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana 47907 USA
4 Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indiana University Purdue University, Indianapolis, Indiana 46202 USA
5 Center for Research in Biological Systems, University of California at San Diego, La Jolla, California 92093-0043, USA
BMC Systems Biology 2010, 4(Suppl 1):S3 doi:10.1186/1752-0509-4-S1-S3
Published: 28 May 2010
Understanding how biomolecules interact is a major task of systems biology. To model protein-nucleic acid interactions, it is important to identify the DNA- or RNA-binding residues in proteins. Protein sequence features, including the biochemical properties of amino acids and evolutionary information in the form of a position-specific scoring matrix (PSSM), have been used for DNA- or RNA-binding site prediction. However, the PSSM was designed for PSI-BLAST searches, and it may not contain all the evolutionary information needed for modelling DNA- or RNA-binding sites in protein sequences.
In the present study, several new descriptors of evolutionary information have been developed and evaluated for sequence-based prediction of DNA and RNA-binding residues using support vector machines (SVMs). The new descriptors were shown to improve classifier performance. Interestingly, the best classifiers were obtained by combining the new descriptors and PSSM, suggesting that they captured different aspects of evolutionary information for DNA and RNA-binding site prediction. The SVM classifiers achieved 77.3% sensitivity and 79.3% specificity for prediction of DNA-binding residues, and 71.6% sensitivity and 78.7% specificity for RNA-binding site prediction.
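The sensitivity and specificity figures above are per-residue measures. As a minimal sketch, assuming the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)) and a label encoding of 1 for binding and 0 for non-binding residues, they could be computed as follows. The function name and the toy labels are hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary per-residue labels.

    Assumed encoding (not from the source): 1 = binding, 0 = non-binding.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # binding, predicted binding
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # binding, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-binding, predicted non-binding
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-binding, predicted binding

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")

    return tp / (tp + fn), tn / (tn + fp)


# Toy labels for illustration only; these are not data from the study.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
preds = [1, 0, 1, 0, 0, 1, 0, 1]
sens, spec = sensitivity_specificity(labels, preds)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Reporting the two values together is useful here because binding residues are typically a minority of all residues, so overall accuracy alone could look high even for a classifier that misses most binding sites.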
Predictions at this level of accuracy may provide useful information for modelling protein-nucleic acid interactions in systems biology studies. We have thus developed a web-based tool called BindN+ (http://bioinfo.ggc.org/bindn+/) to make the SVM classifiers accessible to the research community.