Open Access Research article

Predicting protein-ATP binding sites from primary sequence through fusing bi-profile sampling of multi-view features

Ya-Nan Zhang1, Dong-Jun Yu2, Shu-Sen Li1, Yong-Xian Fan1, Yan Huang3* and Hong-Bin Shen1*

Author affiliations

1 Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China

2 School of Computer Science, Nanjing University of Science and Technology, 200 Xiaolingwei Road, Nanjing 210094, China

3 National Laboratory for Infrared Physics, Shanghai Institute of Technical Physics, Chinese Academy of Science, Shanghai 200083, China

Citation and License

BMC Bioinformatics 2012, 13:118  doi:10.1186/1471-2105-13-118

Published: 31 May 2012

Abstract

Background

Adenosine-5′-triphosphate (ATP) is a multifunctional nucleotide that plays an important role in cell biology as a coenzyme interacting with proteins. Identifying the binding sites between proteins and ATP is important for understanding protein function and the mechanisms of protein-ATP complexes.

Results

In this paper, we propose a novel framework for predicting the functional residues through which proteins bind ATP molecules. The new prediction protocol combines sequence evolutionary information, bi-profile sampling of multi-view sequential features, and sequence-derived structural features. The hypothesis behind this strategy is that a single-view feature can represent only partial knowledge of the target, whereas multiple sources of descriptors can be complementary.

Conclusions

Prediction performance, evaluated by both 5-fold and leave-one-out jackknife cross-validation tests on two benchmark datasets consisting of 168 and 227 non-homologous ATP-binding proteins respectively, demonstrates the efficacy of the proposed protocol. Our experimental results also reveal that the structural characteristics of real protein-ATP binding residues differ significantly from those of non-binding residues: for example, binding residues do not show a propensity for high solvent accessibility, and binding tends to occur at the junctions between different secondary structure segments. Furthermore, experiments with multiple ratios of positive to negative samples show that performance is affected by imbalance in the training datasets. Increasing the dataset size is also shown to improve prediction performance.
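As an illustration of what testing multiple ratios between positive and negative samples can look like in practice, the following minimal sketch undersamples the negative (non-binding) residues to a chosen negative-to-positive ratio. It is not the authors' procedure: the function name `undersample`, the 0/1 label encoding, and the toy label vector are assumptions made only for this example.

```python
import random


def undersample(labels, ratio, seed=0):
    """Keep every positive index and at most ratio * n_pos negative indices.

    labels: list of 0/1 residue labels (1 = binding, 0 = non-binding); assumed encoding.
    ratio:  desired number of negatives per positive (hypothetical parameter).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Never request more negatives than are available.
    n_keep = min(len(neg), int(round(ratio * len(pos))))
    kept = rng.sample(neg, n_keep)
    return sorted(pos + kept)


if __name__ == "__main__":
    # Toy data: 5 binding residues among 100 (illustrative, not from the paper).
    toy_labels = [1] * 5 + [0] * 95
    for r in (1, 2, 5):
        idx = undersample(toy_labels, r)
        n_pos = sum(toy_labels[i] for i in idx)
        print(f"ratio 1:{r} -> {n_pos} positives, {len(idx) - n_pos} negatives")
```

Each resampled index set would then be used to train and cross-validate a classifier, so that performance can be compared across ratios.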

Keywords:
Protein-ATP binding site prediction; Position specific scoring matrix; Bi-profile sampling; Cross-validation