Table 4

Error analysis

False positives

No.  Cause                       Correct extraction    Identified term
1    lexicon                     -                     protein, binding sites
2    prefix word                 trans-acting factor   common trans-acting factor
3    unknown word                -                     ATTTGCAT
4    sequential labelling error  -                     additional proteins
5    test set error              -                     Estradiol receptors


False negatives

No.  Cause                                  Correct extraction                         Identified term
1    anaphoric                              (the) receptor, (the) binding sites        -
2    coordination (and, or)                 transcription factors NF-kappa B and AP-1  transcription factors NF-kappa B
3    prefix word                            activation protein-1                       protein-1
                                            catfish STAT                               STAT
4    postfix word                           nuclear factor kappa B complex             nuclear factor kappa B
5    plural                                 protein tyrosine kinase(s)                 protein tyrosine kinase
6    family name, binding site, and domain  T3 binding sites                           -
                                            residues 639–656                           -
7    sequential labelling error             PCNA                                       -
                                            Chloramphenicol acetyltransferase          -
8    test set error                         superfamily member                         -


Error analysis of the results of the dictionary-based statistical approach.
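Several of the causes in the table (prefix word, postfix word) are boundary mismatches between the correct extraction and the identified term. As a rough illustration only, and not the authors' analysis procedure, the sketch below labels such mismatches by comparing whitespace-separated tokens. The function name `classify_mismatch` and its label strings are hypothetical. Causes that need linguistic or lexical knowledge (anaphora, coordination, plurals, lexicon or test set errors) are not distinguished here.

```python
from typing import Optional


def classify_mismatch(gold: Optional[str], predicted: Optional[str]) -> str:
    """Label how an identified term differs from the correct extraction.

    Hypothetical helper: the labels only loosely follow the causes in the
    table and are based purely on token-level boundary comparison.
    """
    if not gold and not predicted:
        raise ValueError("need at least one of gold or predicted")
    if not gold:
        return "false positive: no correct extraction"
    if not predicted:
        return "false negative: nothing identified"

    g = gold.split()
    p = predicted.split()
    if g == p:
        return "exact match"
    # Identified term is shorter: it lost tokens at the start or the end.
    if len(p) < len(g) and g[-len(p):] == p:
        return "prefix word missing"
    if len(p) < len(g) and g[:len(p)] == p:
        return "postfix word missing"
    # Identified term is longer: it picked up extra tokens.
    if len(p) > len(g) and p[-len(g):] == g:
        return "extra prefix word"
    if len(p) > len(g) and p[:len(g)] == g:
        return "extra postfix word"
    return "other mismatch"


if __name__ == "__main__":
    # Term pairs taken from the table rows.
    pairs = [
        ("activation protein-1", "protein-1"),
        ("nuclear factor kappa B complex", "nuclear factor kappa B"),
        ("trans-acting factor", "common trans-acting factor"),
        (None, "ATTTGCAT"),
        ("PCNA", None),
    ]
    for gold, predicted in pairs:
        print(gold, "|", predicted, "->", classify_mismatch(gold, predicted))
```

Because the comparison is purely positional, a coordination case such as "transcription factors NF-kappa B and AP-1" versus "transcription factors NF-kappa B" would be reported as a missing postfix; separating these cases would need additional rules.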

Sasaki et al. BMC Bioinformatics 2008 9(Suppl 11):S5   doi:10.1186/1471-2105-9-S11-S5
