Table 1

Prediction accuracy for different protein sequence representations, estimated with 10-fold cross-validation.

                                            Classifier¹
Feature representation  Feature selection²  FlexRP (Logistic Regression)  SVM     C4.5    IB1     Naïve Bayes
Composition vector      N/A                 67.37%                        68.74%  57.70%  57.33%  65.20%
PSI-BLAST profile       N/A                 66.38%                        67.35%  62.47%  61.62%  66.24%
Binary encoding         No selection        66.38%                        66.06%  58.82%  59.92%  61.84%
Binary encoding         Linear coefficient  69.58%                        68.74%  62.82%  57.05%  69.10%
Binary encoding         Entropy based       69.19%                        68.74%  63.24%  58.21%  69.00%
K-spaced AA pairs       Linear coefficient  74.37%                        74.60%  66.04%  68.74%  72.97%
K-spaced AA pairs       Entropy based       79.51%³                       78.46%  66.25%  66.93%  76.01%


¹ The tested classifiers include the proposed FlexRP method (logistic regression), Support Vector Machine (SVM), decision tree (C4.5), instance-based learner (IB1), and Naïve Bayes.

² The representations based on binary encoding and on the frequencies of k-spaced amino acid pairs were processed with two feature selection methods (linear coefficient and entropy based); binary encoding was also tested without feature selection.

³ The best result overall.
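To make the comparison in Table 1 easy to re-check, the reported accuracies can be loaded into a small data structure and scanned for the best cell per classifier and overall. The sketch below only transcribes the table values; it does not rerun any cross-validation. The names `TABLE_1`, `best_overall` and `best_per_classifier` are hypothetical helpers introduced here for illustration.

```python
"""Re-read the Table 1 accuracies and locate the best cells.

Values are transcribed from Table 1 (10-fold cross-validation accuracy, %).
Helper names are hypothetical, chosen for this illustration only.
"""

CLASSIFIERS = ["FlexRP", "SVM", "C4.5", "IB1", "Naïve Bayes"]

# (feature representation, feature selection) -> accuracies in CLASSIFIERS order
TABLE_1 = {
    ("Composition vector", "N/A"): [67.37, 68.74, 57.70, 57.33, 65.20],
    ("PSI-BLAST profile", "N/A"): [66.38, 67.35, 62.47, 61.62, 66.24],
    ("Binary encoding", "No selection"): [66.38, 66.06, 58.82, 59.92, 61.84],
    ("Binary encoding", "Linear coefficient"): [69.58, 68.74, 62.82, 57.05, 69.10],
    ("Binary encoding", "Entropy based"): [69.19, 68.74, 63.24, 58.21, 69.00],
    ("K-spaced AA pairs", "Linear coefficient"): [74.37, 74.60, 66.04, 68.74, 72.97],
    ("K-spaced AA pairs", "Entropy based"): [79.51, 78.46, 66.25, 66.93, 76.01],
}


def best_overall(table):
    """Return (representation, selection, classifier, accuracy) of the top cell."""
    cells = [
        (rep, sel, clf, acc)
        for (rep, sel), row in table.items()
        for clf, acc in zip(CLASSIFIERS, row)
    ]
    return max(cells, key=lambda cell: cell[3])


def best_per_classifier(table):
    """Map each classifier to its best (representation, selection, accuracy)."""
    result = {}
    for i, clf in enumerate(CLASSIFIERS):
        # max() is evaluated immediately, so the lambda sees the current i
        (rep, sel), row = max(table.items(), key=lambda item: item[1][i])
        result[clf] = (rep, sel, row[i])
    return result


if __name__ == "__main__":
    print(best_overall(TABLE_1))
    for clf, (rep, sel, acc) in best_per_classifier(TABLE_1).items():
        print(f"{clf}: {acc:.2f}% with {rep} / {sel}")
```

On the transcribed values, `best_overall` returns the FlexRP cell for K-spaced AA pairs with entropy-based selection (79.51%). The next-best cell is SVM on the same representation (78.46%), a gap of 1.05 percentage points. Per classifier, every method peaks on a K-spaced AA pairs representation, with IB1 preferring linear-coefficient selection (68.74%).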

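Footnote 2 refers to features built from the frequencies of k-spaced amino acid pairs, that is, ordered residue pairs separated by k intervening residues. Below is a minimal sketch of such an encoding, assuming the 20 standard amino acids, gaps k = 0..k_max, and normalization by the number of pair positions for each gap. The sequence window the authors applied it to, their range of k, and their exact normalization are not given in this table, so these choices are assumptions. `kspaced_pair_frequencies` is a hypothetical name.

```python
from itertools import product

# Assumption: only the 20 standard amino acids are counted.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def kspaced_pair_frequencies(seq, k_max):
    """Hypothetical k-spaced amino acid pair encoding.

    Returns 400 * (k_max + 1) values. For each gap k in 0..k_max, the ordered
    pairs (seq[i], seq[i + k + 1]) are counted and divided by the number of
    such positions, len(seq) - k - 1 (assumed normalization).
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_positions = len(seq) - k - 1
        for i in range(n_positions):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:  # skip pairs containing non-standard residues
                counts[pair] += 1
        denom = max(n_positions, 1)  # avoid division by zero for short inputs
        features.extend(counts[p] / denom for p in PAIRS)
    return features
```

For example, `kspaced_pair_frequencies("ACDA", 1)` yields 800 values. For k = 0, the pairs AC, CD and DA each get 1/3. For k = 1, the pairs AD and CA each get 1/2.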
Chen et al. BMC Structural Biology 2007 7:25   doi:10.1186/1472-6807-7-25