Research article (Open Access)

Large-scale biomedical concept recognition: an evaluation of current automatic annotators and their parameters

Christopher Funk1*, William Baumgartner1, Benjamin Garcia1,2, Christophe Roeder1, Michael Bada1, K Bretonnel Cohen1, Lawrence E Hunter1 and Karin Verspoor3,4*

Author Affiliations

1 Computational Bioscience Program, University of Colorado School of Medicine, Aurora, CO 80045, USA

2 Center for Genes, Environment, and Health, National Jewish Health, Denver, CO 80206, USA

3 Victoria Research Lab, National ICT Australia, Melbourne 3010, Australia

4 Computing and Information Systems Department, University of Melbourne, Melbourne 3010, Australia

BMC Bioinformatics 2014, 15:59  doi:10.1186/1471-2105-15-59

Published: 26 February 2014

Abstract

Background

Ontological concepts are useful for many different biomedical tasks. Concepts are difficult to recognize in text because of a disconnect between what is captured in an ontology and how the concepts are expressed in text. There are many recognizers for specific ontologies, but a general approach to concept recognition remains an open problem.

Results

Three dictionary-based systems (MetaMap, NCBO Annotator, and ConceptMapper) are evaluated on eight biomedical ontologies in the Colorado Richly Annotated Full-Text (CRAFT) Corpus. Over 1,000 parameter combinations are examined, and best-performing parameters for each system-ontology pair are presented.

Conclusions

Baselines for concept recognition by three systems on eight biomedical ontologies are established (F-measures range from 0.14 to 0.83). Of the three systems tested, ConceptMapper generally performs best; it produces the highest F-measure on seven of the eight ontologies. Default parameters are not ideal for most systems on most ontologies; changing parameters can increase F-measure by up to 0.4. In addition to the best-performing parameters, suggestions are presented for choosing parameters based on ontology characteristics.
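
To make the evaluation idea concrete, the following is a minimal sketch of how an F-measure can be computed for one system-ontology pair and how a grid of parameter combinations can be searched for the best-scoring setting. It is an illustration only, not the evaluation pipeline used in the article. The annotation format (document ID, start offset, end offset, concept ID), the exact-match criterion, and the names `f_measure`, `best_parameters`, `toy_annotator`, and the `stem` parameter are all assumptions introduced here.

```python
from itertools import product


def f_measure(predicted, gold):
    """Return (precision, recall, F1) for two sets of annotations.

    Assumption: an annotation counts as correct only if it matches a gold
    annotation exactly (same document, span, and concept ID).
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def best_parameters(run_annotator, gold, grid):
    """Score every parameter combination in `grid`; return (best_f1, params)."""
    names = sorted(grid)
    best_f1, best_params = -1.0, None
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        _, _, f1 = f_measure(run_annotator(params), gold)
        if f1 > best_f1:
            best_f1, best_params = f1, params
    return best_f1, best_params


# Hypothetical gold standard: (doc_id, start, end, concept_id) tuples.
GOLD = {("d1", 0, 4, "GO:1"), ("d1", 10, 15, "GO:2")}


def toy_annotator(params):
    """Stand-in for a real annotator; behaviour is invented for the demo."""
    if params["stem"]:
        # Stemming finds both gold concepts plus one false positive.
        return GOLD | {("d1", 20, 25, "GO:3")}
    # Without stemming, only one gold concept is found.
    return {("d1", 0, 4, "GO:1")}


best_f1, best_params = best_parameters(toy_annotator, GOLD, {"stem": [True, False]})
```

In this toy setting the stemming configuration trades some precision for full recall and ends up with the higher F-measure, which mirrors the general observation that non-default parameters can change the score substantially.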