Table 4. Error analysis

False positives

| No. | Cause | Correct extraction | Identified term |
|---|---|---|---|
| 1 | lexicon | - | protein, binding sites |
| 2 | prefix word | trans-acting factor | common trans-acting factor |
| 3 | unknown word | - | ATTTGCAT |
| 4 | sequential labelling error | - | additional proteins |
| 5 | test set error | - | Estradiol receptors |

False negatives

| No. | Cause | Correct extraction | Identified term |
|---|---|---|---|
| 1 | anaphoric | (the) receptor, (the) binding sites | - |
| 2 | coordination (and, or) | transcription factors NF-kappa B and AP-1 | transcription factors NF-kappa B |
| 3 | prefix word | activation protein-1 | protein-1 |
|   |   | catfish STAT | STAT |
| 4 | postfix word | nuclear factor kappa B complex | nuclear factor kappa B |
| 5 | plural | protein tyrosine kinase(s) | protein tyrosine kinase |
| 6 | family name, binding site, and domain | T3 binding sites | - |
|   |   | residues 639–656 | - |
| 7 | sequential labelling error | PCNA | - |
|   |   | Chloramphenicol acetyltransferase | - |
| 8 | test set error | superfamily member | - |

Error analysis of the results of the dictionary-based statistical approach.

Sasaki et al. BMC Bioinformatics 2008 9(Suppl 11):S5 doi:10.1186/1471-2105-9-S11-S5