BMC Bioinformatics


Open Access Research article

Exploiting MeSH indexing in MEDLINE to generate a data set for word sense disambiguation

Antonio J Jimeno-Yepes1*, Bridget T McInnes2 and Alan R Aronson1

Author Affiliations

1 National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA

2 Department of Pharmacology, University of Minnesota Twin Cities, Minneapolis, MN 55155, USA


BMC Bioinformatics 2011, 12:223 doi:10.1186/1471-2105-12-223

Published: 2 June 2011

Additional files

Additional file 1:

Accuracy per ambiguous word. Medline Freq. is the frequency of the term in MEDLINE up to 23 July 2010. NB stands for Naïve Bayes, AEC for Automatic Extracted Corpus, MRD for Machine Readable Dictionary, 2-MRD for 2nd Order Co-occurrence, and JDI for Journal Descriptor Indexing. The possible values for Type are A (abbreviation), T (term) and AT (abbreviation/term).

Format: CSV Size: 10KB

Additional file 2:

Semantic Type frequency in the MSH WSD set and Metathesaurus concept count.

Format: CSV Size: 3KB

Additional file 3:

Sense frequency and MeSH heading.

Format: CSV Size: 17KB

Additional file 4:

Inter-semantic-type results.

Format: CSV Size: 17KB

Additional file 5:

Inter-semantic-group results.

Format: CSV Size: 4KB