Table 7

Prediction performance for molecular function classes over the CAFA evaluation dataset. The number of proteins in each class is shown in parentheses after the class name.

| Function | Text-KNN (confidence = 0.95) | | | CAFA-Prior (confidence = 0.01) | | | CAFA-Seq (confidence = 0.95) | | | GOtcha (confidence = 0.95) | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | P | R | S | P | R | S | P | R | S | P | R | S |
| binding (212 proteins) | 0.643 | 0.17 | 0.87 | 0.579 | 1 | 0.00 | 0.9 | 0.085 | 0.987 | 0.723 | 0.16 | 0.916 |
| transporter activity (28 proteins) | 0.00 | 0.00 | 0.97 | 0.077 | 1 | 0.00 | 0.5 | 0.036 | 0.997 | 0.714 | 0.179 | 0.994 |
| catalytic activity (165 proteins) | 0.312 | 0.03 | 0.95 | 0.451 | 1 | 0.00 | 0.714 | 0.03 | 0.990 | 0.917 | 0.067 | 0.995 |

The text-based classifier, Text-KNN, is compared with baseline results provided by the CAFA challenge: CAFA-Prior, CAFA-Seq, and GOtcha. The confidence threshold used for each classifier is shown with its name in the column header. A confidence threshold of 0.01 is used for CAFA-Prior because, at higher thresholds, this classifier makes no predictions for the 'transporter activity' class.

The columns P, R, and S refer, respectively, to the Precision, Recall, and Specificity of the classifiers over individual classes. Precision and recall values of 0 for a class indicate that all the proteins belonging to that class are misclassified (when the confidence threshold is 0.95). CAFA-Prior always has a specificity of 0 because it assigns every protein to every class, so the number of true negatives is always 0.

A specificity value that is close to 1, for a class whose precision and recall are both 0, indicates that most proteins in the dataset are not in the class (true negatives) and are indeed not assigned to the class. A few proteins from other classes are misclassified into the class (false positives), hence the specificity is slightly less than 1.
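
The two cases described in these notes can be illustrated with a short sketch. It assumes the standard definitions of precision, recall, and specificity in terms of true/false positives and negatives. The function name `class_metrics`, the convention of reporting 0 when a denominator is empty, and all counts below are hypothetical and chosen only for illustration; they are not taken from the CAFA evaluation dataset.

```python
def class_metrics(tp, fp, fn, tn):
    """Return (precision, recall, specificity) for one class.

    Assumption: a ratio with an empty denominator is reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return precision, recall, specificity


# Hypothetical dataset: 100 proteins, 30 of which belong to the class.

# Case 1: a classifier that assigns every protein to the class.
# There are no negatives predicted, so fn = 0 and tn = 0.
assign_all = class_metrics(tp=30, fp=70, fn=0, tn=0)

# Case 2: every protein in the class is missed (tp = 0), and two
# proteins from other classes are wrongly assigned to it (fp = 2).
all_missed = class_metrics(tp=0, fp=2, fn=30, tn=68)

print(assign_all)
print(all_missed)
```

In the first case recall is 1 and specificity is 0, which matches the pattern in the CAFA-Prior columns. In the second case precision and recall are both 0 while specificity stays just below 1, because nearly all proteins outside the class are correctly left unassigned.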

Wong and Shatkay. BMC Bioinformatics 2013, 14(Suppl 3):S14. doi:10.1186/1471-2105-14-S3-S14
