This article is part of the supplement: Selected articles from the BioNLP Shared Task 2011

Open Access Proceedings

The Genia Event and Protein Coreference tasks of the BioNLP Shared Task 2011

Jin-Dong Kim1*, Ngan Nguyen2, Yue Wang1, Jun'ichi Tsujii3, Toshihisa Takagi4 and Akinori Yonezawa1

Author Affiliations

1 Database Center for Life Science, Research Organization of Information and Science, 2-11-16 Yayoi, Bunkyo-ku, Tokyo, Japan

2 Department of Information science, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan

3 Microsoft Research Asia, 5 Dan Ling Street, Haidian District, Beijing, China

4 Department of Computational Biology, University of Tokyo, 5-1-5 Kashiwa-no-ha, Kashiwa, Chiba, Japan

BMC Bioinformatics 2012, 13(Suppl 11):S1  doi:10.1186/1471-2105-13-S11-S1

Published: 26 June 2012

Abstract

Background

The Genia task, when it was introduced in 2009, was the first community-wide effort to address fine-grained, structural information extraction from biomedical literature. Arranged for the second time as one of the main tasks of the BioNLP Shared Task 2011, it aimed to measure the progress of the community since 2009 and to evaluate how well the technology generalizes to full-text papers. The Protein Coreference task was arranged as one of the supporting tasks, motivated by one of the lessons of the 2009 task: the abundance of coreference structures in natural language text hinders further improvement on the Genia task.

Results

The Genia task received final submissions from 15 teams. The results show that the community has made significant progress: the best F-score reached 74% in extracting bio-molecular events of simple structure, e.g., gene expression, and 45% to 48% in extracting those of complex structure, e.g., regulation. The Protein Coreference task received 6 final submissions. The results show that coreference resolution performance in the biomedical domain lags behind that in the newswire domain (50% vs. 66% in MUC score). In particular, for protein coreference resolution, the best system achieved an F-score of 34%.

Conclusions

Detailed analysis of the results improves our insight into the problem and suggests directions for further improvement.