Table 7

Dictionary lookup performance. Speed and accuracy of dictionary lookup using the human gene/protein dictionary and gene/protein name snippets. F-score is the harmonic mean of precision and recall. Values in parentheses are the similarity thresholds used in soft string matching.

| Method | Precision | Recall | F-score | Average lookup time (microseconds) |
|---|---|---|---|---|
| Bigram similarity (0.97) | 0.758 | 0.587 | 0.661 | 6.7 × 10^5 |
| Bigram similarity (0.95) | 0.691 | 0.592 | 0.638 | 6.8 × 10^5 |
| Bigram similarity (0.93) | 0.612 | 0.610 | 0.611 | 6.8 × 10^5 |
| No normalization | 0.809 | 0.502 | 0.619 | 7 |
| Case normalization | 0.782 | 0.582 | 0.666 | 8 |
| Heuristic normalization [18] | 0.730 | 0.657 | 0.692 | 8 |
| Automatic normalization | 0.767 | 0.633 | 0.694 | 29 |
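The F-score column can be checked against the precision and recall columns. The following is a minimal sketch, not code from the paper: the names `f_score`, `ROWS`, and `max_deviation` are hypothetical, and the numbers are copied from the table above.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (method, precision, recall, reported F-score), copied from the table.
ROWS = [
    ("Bigram similarity (0.97)", 0.758, 0.587, 0.661),
    ("Bigram similarity (0.95)", 0.691, 0.592, 0.638),
    ("Bigram similarity (0.93)", 0.612, 0.610, 0.611),
    ("No normalization", 0.809, 0.502, 0.619),
    ("Case normalization", 0.782, 0.582, 0.666),
    ("Heuristic normalization", 0.730, 0.657, 0.692),
    ("Automatic normalization", 0.767, 0.633, 0.694),
]


def max_deviation(rows=ROWS) -> float:
    """Largest gap between the recomputed and the reported F-score."""
    return max(abs(f_score(p, r) - reported) for _, p, r, reported in rows)


if __name__ == "__main__":
    for name, p, r, reported in ROWS:
        print(f"{name:26s} computed={f_score(p, r):.3f} reported={reported:.3f}")
```

Recomputed this way, every row matches the reported F-score to within one unit in the third decimal place. The small residual differences are presumably because the published precision and recall values are themselves rounded.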


Tsuruoka et al. BMC Bioinformatics 2008 9(Suppl 3):S2   doi:10.1186/1471-2105-9-S3-S2
