This article is part of the supplement: Proceedings of the BioNLP 08 ACL Workshop: Themes in biomedical language processing

Open Access Research

Distinguishing the species of biomedical named entities for term identification

Xinglong Wang 1,3,* and Michael Matthews 2

Author Affiliations

1 National Centre for Text Mining, University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK

2 School of Informatics, University of Edinburgh, Informatics Forum, 10 Crichton Street, Edinburgh, EH8 9AB, UK

3 The work described in this paper was carried out at School of Informatics, University of Edinburgh, UK

BMC Bioinformatics 2008, 9(Suppl 11):S6  doi:10.1186/1471-2105-9-S11-S6

Published: 19 November 2008

Abstract

Background

Term identification is the task of grounding ambiguous mentions of biomedical named entities in text to unique database identifiers. Previous work on term identification has focused on species-specific documents. However, full-length articles often describe entities from several species, in which case determining which model organism each entity mention belongs to is critical to achieving accurate term identification.
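As a rough illustration of why species matters for grounding, the sketch below uses a toy lexicon in which one entity name maps to a different identifier per species. The lexicon, the identifier strings and the function names are all hypothetical placeholders chosen for this example; they are not taken from the paper or from any real database.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon: (entity name, species) -> database identifier.
# The identifiers are made-up placeholders, not real database entries.
LEXICON: Dict[Tuple[str, str], str] = {
    ("p53", "human"): "ID_HUMAN_P53",
    ("p53", "mouse"): "ID_MOUSE_P53",
}


def candidate_ids(mention: str) -> List[str]:
    """Return every identifier whose name matches the mention, ignoring species."""
    key = mention.lower()
    return sorted(ident for (name, _), ident in LEXICON.items() if name == key)


def ground(mention: str, species: Optional[str]) -> Optional[str]:
    """Return a unique identifier, or None if the mention stays ambiguous or unknown."""
    if species is None:
        # Without species information, only an unambiguous name can be grounded.
        ids = candidate_ids(mention)
        return ids[0] if len(ids) == 1 else None
    return LEXICON.get((mention.lower(), species))
```

Under these assumptions, `ground("p53", None)` cannot pick an identifier because two candidates remain, whereas supplying a predicted species narrows the choice to one.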

Results

We developed and compared a number of rule-based and machine-learning-based approaches to resolving species ambiguity in mentions of biomedical named entities, and demonstrated that a hybrid method achieved the best overall accuracy of 71.7%, as tested on the gold-standard ITI-TXM corpora. Using the species information predicted by the hybrid tagger significantly improved our rule-based term identification system, by up to 11.6%.

Conclusion

This paper shows that, in the context of identifying terms involving multiple model organisms, integration of an accurate species disambiguation system can significantly improve the performance of term identification systems.