Table 2

Relevance of inter-dictionary ambiguities for mining MEDLINE (amb.: ambiguous). The column 'nb. found abstracts' contains the number of MEDLINE abstracts (from within a set of approx. 7 million abstracts) that contain at least one gene/protein name of the respective organisms. The values in the other columns are percentages of the values in the column 'nb. found abstracts'.

| | nb. found abstracts | % amb. abstracts | % amb.+ unique synonym | % amb.+ unique organism | % amb.+ unique synonym or organism |
|---|---|---|---|---|---|
| human-mouse | 2 761 987 | 60.5 | 23.1 | 37.8 | 46.5 |
| human-rat | 2 238 212 | 64.5 | 27.2 | 43.5 | 52.1 |
| mouse-rat | 2 532 682 | 58.2 | 24.2 | 17.1 | 33.7 |

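Because the caption states that every percentage is relative to the 'nb. found abstracts' column, approximate absolute abstract counts can be recovered by simple multiplication. The following is a minimal sketch of that arithmetic, not part of the original study. The names `TABLE_2` and `approx_count` are hypothetical, the values are transcribed from Table 2, and the results are approximate because the published percentages are rounded to one decimal place.

```python
# Values transcribed from Table 2. Each tuple holds, in column order:
# (nb. found abstracts, % amb. abstracts, % amb.+ unique synonym,
#  % amb.+ unique organism, % amb.+ unique synonym or organism)
TABLE_2 = {
    "human-mouse": (2_761_987, 60.5, 23.1, 37.8, 46.5),
    "human-rat": (2_238_212, 64.5, 27.2, 43.5, 52.1),
    "mouse-rat": (2_532_682, 58.2, 24.2, 17.1, 33.7),
}


def approx_count(total, percent):
    """Convert a percentage of `total` into an approximate abstract count.

    Hypothetical helper for illustration only; the result is approximate
    because the table reports percentages rounded to one decimal place.
    """
    return round(total * percent / 100)


if __name__ == "__main__":
    for pair, (total, amb, syn, org, syn_or_org) in TABLE_2.items():
        print(
            pair,
            "amb.:", approx_count(total, amb),
            "amb.+ unique synonym or organism:", approx_count(total, syn_or_org),
        )
```

For the human-mouse pair, for instance, 60.5% of 2 761 987 found abstracts corresponds to roughly 1.67 million abstracts.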

Fundel and Zimmer BMC Bioinformatics 2006 7:372   doi:10.1186/1471-2105-7-372
