Table 3

Prediction performance on molecular function classes, over the cross-validation dataset.

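| Function | # Training Proteins | # Test Proteins | Text-KNN P | Text-KNN R | Text-KNN F | Base-Prior P | Base-Prior R | Base-Prior F | Base-Seq P | Base-Seq R | Base-Seq F |
|---|---|---|---|---|---|---|---|---|---|---|---|
| GO:0005488 | 10720 | 2680 | 0.65 | 0.88 | 0.75 | 0.63 | 0.64 | 0.63 | 0.67 | 0.75 | 0.71 |
| GO:0003824 | 2943 | 736 | 0.52 | 0.23 | 0.32 | 0.16 | 0.15 | 0.15 | 0.38 | 0.29 | 0.33 |
| GO:0030528 | 1276 | 319 | 0.44 | 0.24 | 0.31 | 0.07 | 0.07 | 0.07 | 0.49 | 0.37 | 0.42 |
| GO:0005215 | 782 | 196 | 0.59 | 0.38 | 0.46 | 0.04 | 0.04 | 0.04 | 0.50 | 0.43 | 0.46 |
| GO:0060089 | 738 | 184 | 0.39 | 0.16 | 0.22 | 0.04 | 0.04 | 0.04 | 0.26 | 0.27 | 0.27 |
| GO:0030234 | 485 | 121 | 0.43 | 0.05 | 0.08 | 0.03 | 0.03 | 0.03 | 0.16 | 0.09 | 0.12 |
| GO:0005198 | 334 | 84 | 0.04 | 0.01 | 0.01 | 0.02 | 0.02 | 0.02 | 0.11 | 0.11 | 0.11 |
| GO:0016247 | 58 | 14 | 0.60 | 0.24 | 0.35 | 0.01 | 0.01 | 0.01 | 0.00 | 0.00 | 0.00 |
| GO:0009055 | 54 | 14 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| GO:0045182 | 21 | 5 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |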

The text-based classifier, Text-KNN, is compared with two baselines, Base-Prior and Base-Seq. The columns P, R, and F refer, respectively, to the Precision, Recall, and F-measure of the classifier over individual GO categories. Precision and recall values of 0 on a class indicate that all the proteins belonging to that class are misclassified into another class.
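The relation between the P, R, and F columns can be illustrated with a minimal sketch. It assumes that the F-measure is the balanced form (the harmonic mean of precision and recall), which the table does not state explicitly. Most rows agree with this form to within rounding of the reported values. The function name `f_measure` is hypothetical and used only for illustration.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumption: the table reports the balanced (F1) form; this is
    not stated in the table itself.
    """
    # When both values are 0 the harmonic mean is undefined;
    # return 0.0 by convention, as in the all-zero rows of the table.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Text-KNN precision and recall for GO:0005488, taken from the table.
print(round(f_measure(0.65, 0.88), 2))  # prints 0.75
```

Because the inputs are themselves rounded to two decimals, recomputing F from the printed P and R can differ from the reported F in the last digit for some rows.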

Wong and Shatkay BMC Bioinformatics 2013 14(Suppl 3):S14   doi:10.1186/1471-2105-14-S3-S14
