Table 8

Prediction performance for biological process classes, over the CAFA evaluation dataset. (The number of proteins in each class is shown in parentheses after each function name.)

Confidence thresholds: Text-KNN = 0.75; CAFA-Prior = 0.01; CAFA-Seq = 0.95; GOtcha = 0.14.

| Function | Text-KNN P | Text-KNN R | Text-KNN S | CAFA-Prior P | CAFA-Prior R | CAFA-Prior S | CAFA-Seq P | CAFA-Seq R | CAFA-Seq S | GOtcha P | GOtcha R | GOtcha S |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| biological regulation (114 proteins) | 0.5 | 0.009 | 0.997 | 0.261 | 1 | 0 | 0.632 | 0.105 | 0.978 | 0.404 | 0.351 | 0.817 |
| multi-organism process (29 proteins) | 0.00 | 0.00 | 0.939 | 0.067 | 1 | 0 | 0.00 | 0.00 | 0.99 | 0.286 | 0.069 | 0.988 |
| localization (60 proteins) | 0.2 | 0.017 | 0.989 | 0.138 | 1 | 0 | 0.44 | 0.067 | 0.976 | 0.297 | 0.317 | 0.88 |
| establishment of localization (38 proteins) | 0.25 | 0.026 | 0.992 | 0.087 | 1 | 0 | 0.5 | 0.105 | 0.99 | 0.263 | 0.395 | 0.894 |
| response to stimulus (106 proteins) | 0.125 | 0.009 | 0.979 | 0.243 | 1 | 0 | 0.5 | 0.047 | 0.985 | 0.39 | 0.302 | 0.848 |
| developmental process (83 proteins) | 0.00 | 0.00 | 0.997 | 0.19 | 1 | 0 | 0.556 | 0.06 | 0.989 | 0.263 | 0.181 | 0.881 |
| multicellular organismal process (87 proteins) | 0.069 | 0.023 | 0.923 | 0.2 | 1 | 0 | 0.625 | 0.115 | 0.983 | 0.343 | 0.264 | 0.874 |
| signalling (33 proteins) | 0.5 | 0.03 | 0.998 | 0.076 | 1 | 0 | 0.25 | 0.061 | 0.985 | 0.077 | 0.061 | 0.94 |
| biological adhesion (52 proteins) | 0.00 | 0.00 | 0.971 | 0.06 | 1 | 0 | 0.00 | 0.00 | 0.998 | 0.00 | 0.00 | 0.993 |
| cellular component organization (64 proteins) | 0.00 | 0.00 | 0.997 | 0.147 | 1 | 0 | 0.286 | 0.031 | 0.987 | 0.192 | 0.156 | 0.887 |
| cellular process (368 proteins) | 0.857 | 0.016 | 0.985 | 0.844 | 1 | 0 | 0.867 | 0.071 | 0.941 | 0.866 | 0.829 | 0.309 |
| metabolic process (213 proteins) | 0.00 | 0.00 | 0.991 | 0.489 | 1 | 0 | 0.588 | 0.047 | 0.969 | 0.633 | 0.559 | 0.691 |
| reproduction (25 proteins) | 0.083 | 0.08 | 0.946 | 0.057 | 1 | 0 | 0.00 | 0.00 | 0.995 | 0.214 | 0.12 | 0.973 |
| reproductive process (25 proteins) | 0.083 | 0.08 | 0.946 | 0.057 | 1 | 0 | 0.00 | 0.00 | 0.995 | 0.273 | 0.12 | 0.981 |

The text-based classifier, Text-KNN, is compared with the baseline classifiers provided by the CAFA challenge: CAFA-Prior, CAFA-Seq, and GOtcha. The confidence threshold used for each classifier is given with its name. The confidence thresholds for Text-KNN, GOtcha, and CAFA-Prior are set at 0.75, 0.14, and 0.01, respectively, since these classifiers make no predictions for over 75% of the classes at higher confidence thresholds.

The columns P, R, and S give the precision, recall, and specificity of each classifier for individual classes. Precision and recall values of 0 for a class indicate that all the proteins belonging to that class are misclassified (at the respective confidence level). CAFA-Prior always has a recall of 1 and a specificity of 0 because it assigns every protein to every class, so the number of false negatives and the number of true negatives are both always 0.
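As a minimal sketch of how the three per-class measures follow from confusion counts, the Python snippet below computes them and reproduces the behaviour of a prior-style baseline that assigns every protein to a class. The function name and the dataset size (200 proteins, 50 in the class) are hypothetical and chosen only for illustration; returning None for an undefined ratio is a choice made in this sketch, not something stated in the source.

```python
from typing import Optional, Tuple


def precision_recall_specificity(
    tp: int, fp: int, fn: int, tn: int
) -> Tuple[Optional[float], Optional[float], Optional[float]]:
    """Return (precision, recall, specificity) for one class.

    A metric whose denominator is zero is returned as None
    (an assumption of this sketch).
    """
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return precision, recall, specificity


# Hypothetical dataset: 200 proteins, 50 of which belong to the class.
total, in_class = 200, 50

# A baseline that assigns every protein to the class: all members are
# true positives, all non-members are false positives, nothing is rejected.
p, r, s = precision_recall_specificity(
    tp=in_class, fp=total - in_class, fn=0, tn=0
)
print(p, r, s)  # 0.25 1.0 0.0
```

Under these assumptions the precision equals the fraction of proteins that belong to the class, while recall is 1 and specificity is 0, which is the pattern the CAFA-Prior columns show.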

A specificity value that is close to 1, for a class whose precision and recall are both 0, indicates that most proteins in the dataset are not in the class (true negatives) and are indeed not assigned to the class. A few proteins from other classes are misclassified into the class (false positives), hence the specificity is slightly less than 1.
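The situation described above can be checked with a small worked example. The counts below are hypothetical (400 proteins, 20 in the class, 3 non-members wrongly assigned to it) and are not taken from the table; they only illustrate how precision and recall of 0 can coexist with a specificity just under 1.

```python
# Hypothetical counts for one class (illustrative, not from the table).
total = 400          # proteins in the dataset
members = 20         # proteins that truly belong to the class

tp = 0               # no member is assigned to the class
fn = members - tp    # so every member is missed
fp = 3               # a few non-members are wrongly assigned
tn = total - members - fp  # remaining non-members, correctly left out

precision = tp / (tp + fp)    # 0 / 3
recall = tp / (tp + fn)       # 0 / 20
specificity = tn / (tn + fp)  # 377 / 380

print(precision, recall, round(specificity, 3))  # 0.0 0.0 0.992
```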

Wong and Shatkay BMC Bioinformatics 2013 14(Suppl 3):S14   doi:10.1186/1471-2105-14-S3-S14
