Table 1

BioCreAtIvE Data Sets

Set            Number of Sentences  Number of Entities  1 word  2 words  3 words  4 words  > 4 words
training       7500                 8876                46.1%   25.7%    14.9%    6.6%     6.6%
devtest        2500                 2975                46.6%   23.9%    15.1%    6.7%     7.7%
official test  5000                 5949                46.1%   26.7%    14.3%    6.2%     6.7%

This table shows the BioCreAtIvE data sets, including the proportion of entities by length in words; the distribution is similar across the three sets.

Kinoshita et al. BMC Bioinformatics 2005 6(Suppl 1):S4   doi:10.1186/1471-2105-6-S1-S4