Table 2

Main features used by the participating teams. The table lists the features and strategies adopted by the participants and the number of teams that used each.

Characteristics (C), resources (R) and methods (M)


(C) Sentence level (retrieval unit)


(C) Paragraph level (retrieval unit)


(C) Full article processed


(C) Full article processed except methods section


(C) Only abstract processed


(C) GO term – Protein distance


(M) Stemming


(M) POS tagging


(M) Shallow parsing


(M) Finite state automata


(M) Edit distance ranking


(M) Vector space model


(M) Machine learning technique


(M) Support Vector Machines


(M) Naïve Bayes models


(M) N-gram models


(M) External resource – tool: GATE NLP tool


(M) External resource – tool: Morphological normalizer BioMorpher


(M) External resource – tool: qtile query-based ranking tool


(M) External resource – tool: Grok POS tagger


(M) Heuristic rules


(M) Regular expressions/pattern matching


(M) Literal string matching


(R) Protein name aliases (link to external databases)


(R) GO terms used


(R) GOA data used


(R) GO term forming words/tokens


(R) GO term variants


(R) External resource – data: Dictionary of suffixes


(R) External resource – data: UMLS/MeSH dictionary


(R) External resource – data: HUGO database


(R) External resource – data: SGD database


(R) External resource – data: MGI database


(R) External resource – data: RGD database


(R) External resource – data: TAIR database


(R) External resource – data: Procter and Gamble protein synonyms


Blaschke et al. BMC Bioinformatics 2005 6(Suppl 1):S16   doi:10.1186/1471-2105-6-S1-S16
