Table 1

BioCreAtIvE Data Sets

| Set | Number of Sentences | Number of Entities | 1 word | 2 words | 3 words | 4 words | > 4 words |
|---|---|---|---|---|---|---|---|
| training | 7500 | 8876 | 46.1% | 25.7% | 14.9% | 6.6% | 6.6% |
| devtest | 2500 | 2975 | 46.6% | 23.9% | 15.1% | 6.7% | 7.7% |
| official test | 5000 | 5949 | 46.1% | 26.7% | 14.3% | 6.2% | 6.7% |

This table summarizes the BioCreAtIvE data sets, including the proportion of entities of each length in words, which follows the same tendency across the sets.

Kinoshita et al. BMC Bioinformatics 2005 6(Suppl 1):S4   doi:10.1186/1471-2105-6-S1-S4
