This article is part of the supplement: A critical assessment of text mining methods in molecular biology

Open Access Report

Recognition of protein/gene names from text using an ensemble of classifiers

GuoDong Zhou1, Dan Shen1,2, Jie Zhang1,2, Jian Su1 and SoonHeng Tan1

1Institute for Infocomm Research, 21 Heng Mui Keng Terrace, 119613, Singapore

2School of Computing, National University of Singapore, 119610, Singapore

BMC Bioinformatics 2005, 6(Suppl 1):S7 doi:10.1186/1471-2105-6-S1-S7

Published: 24 May 2005

Abstract

This paper proposes an ensemble of classifiers for biomedical name recognition in which three classifiers, one Support Vector Machine and two discriminative Hidden Markov Models, are combined using a simple majority voting strategy. In addition, we incorporate three post-processing modules into the system to further improve performance: an abbreviation resolution module, a protein/gene name refinement module and a simple dictionary matching module. Evaluation shows that our system achieves the best performance among 10 systems, with a balanced F-measure of 82.58 on the closed evaluation of the BioCreative protein/gene name recognition task (Task 1A).
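
To illustrate the simple majority voting strategy described above, the following is a minimal Python sketch, not the authors' implementation. It assumes each of the three classifiers outputs one tag per token; the B/I/O tag set, the function name `majority_vote`, and the `fallback` parameter are hypothetical and introduced only for this example. The abstract does not say how a three-way disagreement is resolved, so falling back to one designated classifier is an assumption.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]], fallback: int = 0) -> List[str]:
    """Combine per-token tag sequences from several classifiers by simple majority.

    predictions: one tag sequence per classifier, all of the same length.
    fallback: index of the classifier to trust when no tag has a strict
              majority (an assumption; the source does not specify tie-breaking).
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all classifiers must tag the same number of tokens")

    combined = []
    for tags in zip(*predictions):  # tags proposed for one token
        tag, count = Counter(tags).most_common(1)[0]
        # Accept the most frequent tag only if more than half the classifiers agree.
        combined.append(tag if count > len(tags) / 2 else tags[fallback])
    return combined


# Hypothetical outputs of one SVM and two HMM taggers on a four-token sentence.
svm = ["B", "I", "O", "O"]
hmm_a = ["B", "O", "O", "B"]
hmm_b = ["B", "I", "O", "I"]

print(majority_vote([svm, hmm_a, hmm_b]))  # ['B', 'I', 'O', 'O']
```

Voting token by token can yield a tag sequence that none of the individual classifiers produced, which is one plausible reason a system like this would follow the vote with post-processing steps such as name refinement.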

