This article is part of the supplement: Proceedings of the BioNLP 08 ACL Workshop: Themes in biomedical language processing

Open Access Research

Disambiguation of biomedical text using diverse sources of information

Mark Stevenson1*, Yikun Guo1, Robert Gaizauskas1 and David Martinez2

Author Affiliations

1 Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello, Sheffield, S1 4DP, UK

2 NICTA Victoria and Department of Computer Science, Software Engineering, University of Melbourne, Victoria 3010, Australia

BMC Bioinformatics 2008, 9(Suppl 11):S7  doi:10.1186/1471-2105-9-S11-S7

Published: 19 November 2008

Abstract

Background

Like text in other domains, biomedical documents contain a range of terms with more than one possible meaning. These ambiguities form a significant obstacle to the automatic processing of biomedical texts. Previous approaches to resolving this problem have used various sources of information, including linguistic features of the context in which the ambiguous term is used and domain-specific resources such as the Unified Medical Language System (UMLS).

Materials and methods

We compare several sources of information, including ones that have been used previously and a novel one: Medical Subject Headings (MeSH) terms. Evaluation is carried out using a standard test set, the NLM-WSD corpus.

Results

The best performance is obtained using a combination of linguistic features and MeSH terms. The performance of our system exceeds previously published results for systems evaluated on the same data set.

Conclusion

Disambiguation of biomedical terms benefits from the use of information from a variety of sources. In particular, MeSH terms have proved to be useful and should be used if available.