This article is part of the supplement: Second International Symposium on Semantic Mining in Biomedicine (SMBM)

Open Access Proceedings

Lexical adaptation of link grammar to the biomedical sublanguage: a comparative evaluation of three approaches

Sampo Pyysalo1*, Tapio Salakoski1, Sophie Aubin2* and Adeline Nazarenko2

Author Affiliations

1 Turku Centre for Computer Science (TUCS) and University of Turku, Lemminkäisenkatu 14 A, 20520 Turku, Finland

2 LIPN, Université Paris 13 & CNRS UMR 7030, 99, av. J.-B. Clément, F-93430 Villetaneuse, France

BMC Bioinformatics 2006, 7(Suppl 3):S2  doi:10.1186/1471-2105-7-S3-S2

Published: 24 November 2006

Abstract

Background

We study the adaptation of the Link Grammar Parser to the biomedical sublanguage, focusing on domain terms not found in a general parser lexicon. Using two biomedical corpora, we implement and evaluate three approaches to handling unknown words: automatic lexicon expansion, the use of morphological clues, and disambiguation using a part-of-speech tagger. We evaluate each approach separately for its effect on parsing performance and also consider combinations of the approaches.
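To make the idea of morphological clues concrete, the following is a minimal sketch of suffix-based guessing for words missing from a lexicon. It is not the rule set used in the study: the suffix table, the class labels, and the function name `guess_class` are all hypothetical and chosen only for illustration.

```python
# Hypothetical suffix table for illustration only; these are NOT the
# rules from the study. Each entry maps a word ending to a coarse class.
SUFFIX_CLASSES = [
    ("ase", "noun"),                # e.g. enzyme names such as "kinase"
    ("ated", "verb-or-adjective"),  # e.g. "phosphorylated"
    ("ic", "adjective"),            # e.g. "cytoplasmic"
    ("ly", "adverb"),               # e.g. "transiently"
]


def guess_class(word, lexicon, default="unknown"):
    """Return the lexicon class of a word, or a suffix-based guess."""
    key = word.lower()
    # Known words keep the class already recorded in the lexicon.
    if key in lexicon:
        return lexicon[key]
    # Try longer suffixes first so the most specific clue wins.
    for suffix, word_class in sorted(SUFFIX_CLASSES, key=lambda s: -len(s[0])):
        if key.endswith(suffix):
            return word_class
    # No clue matched: fall back to a generic unknown-word class.
    return default


if __name__ == "__main__":
    toy_lexicon = {"the": "determiner", "binds": "verb"}
    for token in ["the", "phosphatase", "glycosylated", "xyz"]:
        print(token, "->", guess_class(token, toy_lexicon))
```

A table like this only pays off when its endings are chosen from domain text, which is the sense in which surface clues need to be tuned to the domain.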

Results

In addition to a 45% increase in parsing efficiency, we find that the best approach, incorporating information from a domain part-of-speech tagger, offers a statistically significant 10% relative decrease in error.
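A relative decrease in error is measured against the baseline error rate, not in absolute percentage points. The short sketch below shows the arithmetic; the function name and the error rates in the usage example are hypothetical values chosen for illustration and are not figures from the study.

```python
def relative_error_reduction(baseline_error, adapted_error):
    """Fraction of the baseline error removed by the adapted system."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - adapted_error) / baseline_error


if __name__ == "__main__":
    # Hypothetical rates: a drop from 20% to 18% error is 2 points
    # in absolute terms, which is about 10% of the baseline error.
    reduction = relative_error_reduction(0.20, 0.18)
    print(f"relative error reduction: {reduction:.1%}")
```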

Conclusion

When available, a high-quality domain part-of-speech tagger is the best solution to unknown word issues in the domain adaptation of a general parser. In the absence of such a resource, surface clues can provide remarkably good coverage and performance when tuned to the domain. The adapted parser is available under an open-source license.