This article is part of the supplement: Proceedings of the BioNLP 08 ACL Workshop: Themes in biomedical language processing
Distinguishing the species of biomedical named entities for term identification
1 National Centre for Text Mining, University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK
2 School of Informatics, University of Edinburgh, Informatics Forum, 10 Crichton Street, Edinburgh, EH8 9AB, UK
3 The work described in this paper was carried out at the School of Informatics, University of Edinburgh, UK
BMC Bioinformatics 2008, 9(Suppl 11):S6 doi:10.1186/1471-2105-9-S11-S6
Published: 19 November 2008
Term identification is the task of grounding ambiguous mentions of biomedical named entities in text to unique database identifiers. Previous work on term identification has focused on species-specific documents. However, full-length articles often describe entities from several species, in which case determining which model organism each entity mention refers to is critical to achieving accurate term identification.
We developed and compared a number of rule-based and machine-learning-based approaches to resolving species ambiguity in mentions of biomedical named entities, and demonstrated that a hybrid method achieved the best overall accuracy, 71.7%, as tested on the gold-standard ITI-TXM corpora. When our rule-based term identification system used the species information predicted by the hybrid tagger, its performance improved significantly, by up to 11.6%.
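To illustrate why a species tag matters for grounding, the following is a minimal sketch and not the system evaluated here. It assumes a toy lexicon keyed by (entity name, species), a small list of species cue words, a nearest-cue rule, and a fixed default standing in for a learned classifier. All names, cue words, and identifiers (`LEXICON`, `SPECIES_CUES`, `EX:0001`, and so on) are hypothetical placeholders, and the way the rule and the fallback are combined is an assumption made for illustration only.

```python
from typing import List, Optional

# Hypothetical toy lexicon: (entity name, species) -> identifier.
# The identifiers are placeholders, not real database accessions.
LEXICON = {
    ("p53", "human"): "EX:0001",
    ("p53", "mouse"): "EX:0002",
}

# Hypothetical species cue words mapped to a species label.
SPECIES_CUES = {
    "human": "human",
    "patient": "human",
    "mouse": "mouse",
    "murine": "mouse",
}


def rule_based_species(tokens: List[str], mention_index: int) -> Optional[str]:
    """Return the species of the cue word nearest to the mention, or None."""
    best_species = None
    best_distance = None
    for i, token in enumerate(tokens):
        species = SPECIES_CUES.get(token.lower())
        if species is None:
            continue
        distance = abs(i - mention_index)
        # Strict comparison: on a tie, the earlier cue word is kept.
        if best_distance is None or distance < best_distance:
            best_species, best_distance = species, distance
    return best_species


def fallback_species(tokens: List[str]) -> str:
    """Stand-in for a learned classifier (assumption): a fixed default."""
    return "human"


def hybrid_species(tokens: List[str], mention_index: int) -> str:
    """Use the rule when a cue word is found, otherwise the fallback."""
    return rule_based_species(tokens, mention_index) or fallback_species(tokens)


def ground(tokens: List[str], mention_index: int) -> Optional[str]:
    """Look up the mention under its predicted species."""
    name = tokens[mention_index]
    species = hybrid_species(tokens, mention_index)
    return LEXICON.get((name, species))
```

In this sketch, the same surface form `p53` resolves to a different placeholder identifier depending on whether a cue such as `murine` appears nearby, which is the kind of ambiguity a species tagger is meant to resolve before term identification.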
This paper shows that, in the context of identifying terms involving multiple model organisms, integration of an accurate species disambiguation system can significantly improve the performance of term identification systems.