Large-scale biomedical concept recognition: an evaluation of current automatic annotators and their parameters
1 Computational Bioscience Program, U. of Colorado School of Medicine, Aurora, CO 80045, USA
2 Center for Genes, Environment, and Health, National Jewish Health, Denver, CO 80206, USA
3 Victoria Research Lab, National ICT Australia, Melbourne 3010, Australia
4 Computing and Information Systems Department, University of Melbourne, Melbourne 3010, Australia
BMC Bioinformatics 2014, 15:59. doi:10.1186/1471-2105-15-59. Published: 26 February 2014
Ontological concepts are useful for many different biomedical tasks. Concepts are difficult to recognize in text because of a disconnect between what is captured in an ontology and how the concepts are expressed in text. There are many recognizers for specific ontologies, but a general approach to concept recognition remains an open problem.
Three dictionary-based systems (MetaMap, NCBO Annotator, and ConceptMapper) are evaluated on eight biomedical ontologies in the Colorado Richly Annotated Full-Text (CRAFT) Corpus. Over 1,000 parameter combinations are examined, and best-performing parameters for each system-ontology pair are presented.
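An exhaustive evaluation over parameter combinations of this kind can be sketched as a simple grid sweep. The snippet below is a minimal illustration, not the authors' evaluation pipeline: the parameter names (`case_sensitive`, `stemmer`, `match`), their values, and the scoring function `toy_evaluate` are hypothetical stand-ins. In a real run, the scoring function would invoke one annotator on the corpus with the given parameters and return its F-measure for one ontology.

```python
from itertools import product


def sweep(grid, evaluate):
    """Score every parameter combination in `grid`.

    grid:     dict mapping parameter name -> list of candidate values
    evaluate: callable taking a dict of parameters and returning a score
    Returns (best_score, best_params, number_of_combinations_tried).
    """
    names = sorted(grid)
    best_score, best_params, tried = float("-inf"), None, 0
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        tried += 1
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params, tried


# Hypothetical parameter grid; real systems expose their own options.
toy_grid = {
    "case_sensitive": [True, False],
    "stemmer": ["none", "porter"],
    "match": ["longest", "all"],
}


def toy_evaluate(params):
    """Stand-in score; a real version would run an annotator and score it."""
    score = 0.5
    if not params["case_sensitive"]:
        score += 0.1
    if params["stemmer"] == "porter":
        score += 0.05
    if params["match"] == "all":
        score -= 0.2
    return score


best_score, best_params, tried = sweep(toy_grid, toy_evaluate)
print(tried, best_params, round(best_score, 2))
```

Running the sweep once per system-ontology pair yields one best-performing configuration for each pair. The number of combinations grows multiplicatively with each added parameter, which is how a handful of options can produce over a thousand runs.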
Baselines for concept recognition by three systems on eight biomedical ontologies are established (F-measures range from 0.14–0.83). Of the three systems tested, ConceptMapper generally performs best; it achieves the highest F-measure on seven of the eight ontologies. Default parameters are not ideal for most systems on most ontologies; changing parameters can increase F-measure by up to 0.4. In addition to the best-performing parameters, suggestions for choosing parameters based on ontology characteristics are presented.
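For readers unfamiliar with the metric, the following sketch shows how an F-measure can be computed from system and gold-standard annotations. It assumes the balanced F-measure (the harmonic mean of precision and recall) and assumes that an annotation counts as correct only when document, span, and concept identifier all match exactly; the matching criteria actually used in the evaluation are not stated in this section. The document names and concept identifiers (`X:0001` and so on) are invented for illustration.

```python
def precision_recall_f(predicted, gold):
    """Return (precision, recall, balanced F-measure) for two annotation sets.

    Each annotation is a (doc_id, start, end, concept_id) tuple.
    Assumption: only exact matches on all four fields count as correct.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Invented toy data: four gold annotations, three predictions, two correct.
gold = {
    ("doc1", 0, 4, "X:0001"),
    ("doc1", 10, 18, "X:0002"),
    ("doc1", 25, 31, "X:0003"),
    ("doc2", 5, 12, "X:0001"),
}
predicted = {
    ("doc1", 0, 4, "X:0001"),      # correct
    ("doc1", 10, 18, "X:0002"),    # correct
    ("doc2", 40, 47, "X:0004"),    # spurious
}

p, r, f = precision_recall_f(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Because the F-measure is a harmonic mean, it is pulled toward the lower of precision and recall, so a parameter change that trades a little of one for a large gain in the other can shift it substantially.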