Table 6

Evaluation using gene/protein name snippets from MEDLINE abstracts


       Dictionary                                                        Lookup performance
Iter.  Ambiguity  Variability  Rule                                      Precision  Recall
0      5.797      12.479       (convert capital letters to lower case)   0.782      0.582
1      5.807      12.161       ‘-’ → ‘’                                  0.766      0.603
2      5.811      12.025       ‘ precursor’ → ‘’                         0.767      0.611
3      5.812      11.941       ‘,’ → ‘’                                  0.767      0.611
4      5.812      11.907       ‘inc finger protein’ → ‘nf’               0.767      0.611
5      5.812      11.868       ‘ isoform 1’ → ‘’                         0.767      0.611
6      5.813      11.832       ‘ isoform 2’ → ‘’                         0.766      0.611
7      5.813      11.806       ‘ isoform a’ → ‘’                         0.766      0.611
8      5.813      11.781       ‘ isoform b’ → ‘’                         0.766      0.611
9      5.813      11.748       ‘ containing protein’ → ‘containing’      0.766      0.611
10     5.813      11.730       ‘ variant’ → ‘’                           0.766      0.611
:      :          :            :                                         :          :
21     5.815      11.597       ‘nterleukin’ → ‘l’                        0.767      0.613
:      :          :            :                                         :          :
24     5.816      11.566       ‘specific’ → ‘’                           0.767      0.615
:      :          :            :                                         :          :
33     5.816      11.450       ‘protein’ → ‘gene’                        0.765      0.616
34     5.828      11.056       ‘ gene’ → ‘’                              0.765      0.619
:      :          :            :                                         :          :
38     5.829      11.016       ‘ recepto’ → ‘’                           0.767      0.623
:      :          :            :                                         :          :
44     5.830      10.970       ‘ alph’ → ‘’                              0.765      0.625
:      :          :            :                                         :          :
75     5.831      10.838       ‘ i’ → ‘1’                                0.766      0.626
:      :          :            :                                         :          :
84     5.831      10.790       ‘ lpha’ → ‘’                              0.766      0.627
:      :          :            :                                         :          :
86     5.831      10.782       ‘ beta’ → ‘b’                             0.767      0.630
:      :          :            :                                         :          :
100    5.832      10.732       ‘ type’ → ‘’                              0.767      0.633

Tsuruoka et al. BMC Bioinformatics 2008 9(Suppl 3):S2   doi:10.1186/1471-2105-9-S3-S2
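Each rule in Table 6 rewrites a string and is applied on top of lower-casing (iteration 0). The Python sketch below shows how such a rule sequence might normalize names before a dictionary lookup. It makes three assumptions. First, each rule is a plain substring replacement applied to every occurrence, in the order learned. Second, it includes only the rules visible in the table: the iterations elided with ":" are left out, so this is not the full learned rule set. Third, the names `normalize`, `build_index` and `lookup` and the identifier in the usage example are hypothetical and used for illustration only.

```python
from collections import defaultdict

# Rules visible in Table 6, in iteration order (source, target).
# Assumption: iterations elided in the table are omitted, so this is a partial set.
RULES = [
    ("-", ""),
    (" precursor", ""),
    (",", ""),
    ("inc finger protein", "nf"),
    (" isoform 1", ""),
    (" isoform 2", ""),
    (" isoform a", ""),
    (" isoform b", ""),
    (" containing protein", "containing"),
    (" variant", ""),
    ("nterleukin", "l"),
    ("specific", ""),
    ("protein", "gene"),
    (" gene", ""),
    (" recepto", ""),
    (" alph", ""),
    (" i", "1"),
    (" lpha", ""),
    (" beta", "b"),
    (" type", ""),
]


def normalize(name: str) -> str:
    """Lower-case a name, then apply each rewrite rule in order.

    Assumption: every rule is a plain substring replacement applied to all
    occurrences; the paper's exact matching semantics may differ.
    """
    text = name.lower()  # iteration 0: convert capital letters to lower case
    for source, target in RULES:
        text = text.replace(source, target)
    return text


def build_index(entries):
    """Map each normalized dictionary name to the set of IDs it may denote."""
    index = defaultdict(set)
    for name, entry_id in entries:
        index[normalize(name)].add(entry_id)
    return index


def lookup(index, mention: str) -> set:
    """Return the IDs whose normalized dictionary name equals the normalized mention."""
    return set(index.get(normalize(mention), set()))


if __name__ == "__main__":
    # Hypothetical dictionary entry and identifier, for illustration only.
    index = build_index([("Interleukin-2 receptor alpha", "HYPOTHETICAL:0001")])
    print(normalize("IL-2 receptor alpha"))      # il2ra
    print(lookup(index, "IL-2 receptor alpha"))  # {'HYPOTHETICAL:0001'}
```

Under these assumptions, "Interleukin-2 receptor alpha" and "IL-2 receptor alpha" both normalize to "il2ra", so the variant mention finds the dictionary entry. This is the kind of variability reduction the decreasing Variability column reflects. Naive substring replacement also has a downside. A rule such as ‘ i’ → ‘1’ fires inside unrelated words (for example, "kinase inhibitor" becomes "kinase1nhibitor"), which is one reason precision can drift as rules accumulate.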