Table 1

Counts of annotations
Terminology      Total annotations   Average per article   Median per article   Minimum per article   Maximum per article
ChEBI            8,137               121                   94                   11                    486
CL               5,760               86                    58                   0                     435
Entrez Gene      12,277              183                   155                  3                     543
GO BP (a)        16,184              241                   194                  14                    738
GO CC            8,354/4,707 (b)     125/70                97/51                9/0                   499/322
GO MF            4,062               61                    42                   2                     403
NCBITaxon (c)    7,449               111                   91                   12                    378
PRO              15,594              233                   207                  4                     704
SO (d)           22,090              330                   328                  72                    935
all              99,907              1,491 (e)

(a) We are still reviewing and editing the GO BP and GO MF annotations for the official 1.0 release, so the statistics for these will likely change. We will update the annotation statistics on the project Web site as needed.

(b) We have calculated statistics for the GO CC project both with and without the annotations of cell (GO:0005623), shown in the table as with/without, as these account for over half of the annotations of this project. Besides skewing these statistics, cell is a trivial concept that is also annotated in the CL project, so users may wish to exclude these annotations when training and evaluating systems.

(c) In addition to its hundreds of thousands of organism entries, the NCBI Taxonomy has a small taxonomy of types of biological taxa (e.g., phylum, genus, subgenus). The NCBI Taxonomy pass also includes a small number of annotations of mentions of these taxonomic concepts in the articles; we have excluded them from these statistics.

(d) For the SO statistics, the independent_continuant annotations (as described in the Methodology) were excluded from the analysis.

(e) The averages of the total number of annotations per article and of unique concepts per article were calculated simply by summing the averages for each terminological annotation pass.
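The summation described in the last footnote can be checked directly against the rows of the table. The following is a minimal sketch, assuming the per-terminology figures are transcribed by hand from Table 1; the names `TABLE_1`, `GO_CC_WITHOUT_CELL`, `column_sums`, and `without_cell` are hypothetical and are not part of the CRAFT distribution. The second call illustrates the optional exclusion of the cell (GO:0005623) annotations; its result is derived here from the table and is not a figure reported by the authors.

```python
# Figures transcribed from Table 1: (total annotations, average per article).
# GO CC uses the first value of each pair, i.e. including annotations of cell.
TABLE_1 = {
    "ChEBI": (8137, 121),
    "CL": (5760, 86),
    "Entrez Gene": (12277, 183),
    "GO BP": (16184, 241),
    "GO CC": (8354, 125),
    "GO MF": (4062, 61),
    "NCBITaxon": (7449, 111),
    "PRO": (15594, 233),
    "SO": (22090, 330),
}

# GO CC figures without the annotations of cell (second value of each pair).
GO_CC_WITHOUT_CELL = (4707, 70)


def column_sums(rows):
    """Return (sum of totals, sum of per-article averages) over all passes."""
    total = sum(t for t, _ in rows.values())
    average = sum(a for _, a in rows.values())
    return total, average


def without_cell(rows):
    """Hypothetical helper: copy of rows with GO CC swapped for its
    cell-excluded figures. The input mapping is left unmodified."""
    adjusted = dict(rows)
    adjusted["GO CC"] = GO_CC_WITHOUT_CELL
    return adjusted


if __name__ == "__main__":
    # Reproduces the "all" row of Table 1.
    print(column_sums(TABLE_1))  # (99907, 1491)
    # Derived variant with the cell annotations excluded from GO CC.
    print(column_sums(without_cell(TABLE_1)))
```

Summing the averages in this way yields the "all" row as printed, which uses the GO CC values that include the cell annotations.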

Total counts of annotations, and average, median, minimum, and maximum counts of annotations per article, for the 67 articles constituting the initial public release of the CRAFT Corpus.

Bada et al. BMC Bioinformatics 2012 13:161   doi:10.1186/1471-2105-13-161
