Open Access Research article

Efficacy of different protein descriptors in predicting protein functional families

Serene AK Ong1, Hong Huang Lin1, Yu Zong Chen1, Ze Rong Li2 and Zhiwei Cao3*

Author Affiliations

1 Department of Pharmacy, National University of Singapore, Blk S16, Level 8, 08-14, 3 Science Drive 2, Singapore 117543, Singapore

2 College of Chemistry, Sichuan University, Chengdu, 610064, P.R. China

3 Shanghai Center for Bioinformatics Technology, 100, Qinzhou Road, Shanghai 200235 P.R. China

BMC Bioinformatics 2007, 8:300  doi:10.1186/1471-2105-8-300

Published: 17 August 2007

Abstract

Background

Sequence-derived structural and physicochemical descriptors have frequently been used in machine learning prediction of protein functional families. There is therefore a need to compare the effectiveness of these descriptor-sets using the same method and parameter optimization algorithm, and to examine whether combining descriptor-sets helps to improve predictive performance. Six individual descriptor-sets and four combination-sets were evaluated in support vector machine (SVM) prediction of six protein functional families.

Results

The descriptor-sets were ranked by Matthews correlation coefficient (MCC) and categorized into two groups based on their performance. While no descriptor-set is an overwhelmingly favourable choice, certain trends were found. The combination-sets tend to give slightly but consistently higher MCC values, and thus the best overall performance: three of the four combination-sets show slightly better performance, compared with only one of the six individual descriptor-sets.
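For readers unfamiliar with the ranking metric, the following is a minimal sketch of how MCC is computed from a binary confusion matrix and then used to rank descriptor-sets. The function name `mcc`, the set labels, and all counts are hypothetical and chosen only for illustration; they are not results from this study, and returning 0.0 when the denominator vanishes is a common convention assumed here rather than something stated in the abstract.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Returns 0.0 when the denominator is zero (assumed convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical (tp, tn, fp, fn) counts for two illustrative descriptor-sets.
results = {
    "descriptor_set_A": (90, 85, 15, 10),
    "combination_set_B": (93, 88, 12, 7),
}

# Rank descriptor-sets from highest to lowest MCC.
ranked = sorted(results, key=lambda name: mcc(*results[name]), reverse=True)
for name in ranked:
    print(name, round(mcc(*results[name]), 3))
```

Unlike plain accuracy, MCC uses all four cells of the confusion matrix, so it remains informative when the positive and negative classes differ greatly in size.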

Conclusion

Our study suggests that currently used descriptor-sets are generally useful for classifying proteins, and that prediction performance may be enhanced by exploring combinations of descriptors.