Table 2

Relevance of inter-dictionary ambiguities for mining MEDLINE (amb.: ambiguous). The column 'nb. found abstracts' contains the number of MEDLINE abstracts (from within a set of approx. 7 million abstracts) that contain at least one gene/protein name of the respective organisms. The values in the other columns are percentages of the values in the column 'nb. found abstracts'.


| Organism pair | nb. found abstracts | % amb. abstracts | % amb. + unique synonym | % amb. + unique organism | % amb. + unique synonym or organism |
|---|---|---|---|---|---|
| human-mouse | 2 761 987 | 60.5 | 23.1 | 37.8 | 46.5 |
| human-rat | 2 238 212 | 64.5 | 27.2 | 43.5 | 52.1 |
| mouse-rat | 2 532 682 | 58.2 | 24.2 | 17.1 | 33.7 |
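Because every value except 'nb. found abstracts' is a percentage of that column, approximate absolute counts can be recovered by multiplication. The sketch below does this and also derives an implied overlap by inclusion-exclusion. That overlap rests on an assumption not stated in the table: that the last column is the union of the two "amb. +" columns before it. The names `TABLE_2`, `approx_counts` and `implied_overlap` are hypothetical helpers for illustration only.

```python
# Values transcribed from Table 2. Percentages are relative to "nb. found abstracts".
# Tuple layout: (found, % amb., % amb.+syn, % amb.+org, % amb.+syn or org)
TABLE_2 = {
    "human-mouse": (2_761_987, 60.5, 23.1, 37.8, 46.5),
    "human-rat": (2_238_212, 64.5, 27.2, 43.5, 52.1),
    "mouse-rat": (2_532_682, 58.2, 24.2, 17.1, 33.7),
}


def approx_counts(found, pcts):
    """Convert percentages of `found` back to approximate abstract counts.

    Results are rounded to whole abstracts, so they are only approximate:
    the published percentages are themselves rounded to one decimal place.
    """
    return [round(found * p / 100) for p in pcts]


def implied_overlap(syn, org, either):
    """Inclusion-exclusion estimate of the share resolvable by BOTH cues.

    ASSUMPTION: `either` is the union of the `syn` and `org` sets, so
    |syn AND org| = |syn| + |org| - |syn OR org|.
    """
    return round(syn + org - either, 1)


if __name__ == "__main__":
    for pair, (found, amb, syn, org, either) in TABLE_2.items():
        counts = approx_counts(found, [amb, syn, org, either])
        print(pair, counts, "implied overlap %:", implied_overlap(syn, org, either))
```

Under the union assumption, the overlap percentages must be non-negative and no larger than either single-cue column. This gives a quick sanity check on the transcribed values.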

Fundel and Zimmer, BMC Bioinformatics 2006, 7:372. doi:10.1186/1471-2105-7-372