This article is part of the supplement: Proceedings of the Second International Symposium on Languages in Biology and Medicine (LBM) 2007

Open Access Proceedings

New challenges for text mining: mapping between text and manually curated pathways

Kanae Oda1, Jin-Dong Kim1, Tomoko Ohta1, Daisuke Okanohara1, Takuya Matsuzaki1, Yuka Tateisi2 and Jun'ichi Tsujii1,3,4*

Author Affiliations

1 Department of Computer Science, Graduate School of Information Science and Technology, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan

2 Faculty of Informatics, Kogakuin University, 1-24-2 Nishi-shinjuku, Shinjuku-ku, Tokyo, Japan

3 School of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9PL, UK

4 National Centre for Text Mining, 131 Princess Street, Manchester, M1 7DN, UK


BMC Bioinformatics 2008, 9(Suppl 3):S5  doi:10.1186/1471-2105-9-S3-S5

Published: 11 April 2008



Associating literature with pathways poses new challenges to the Text Mining (TM) community. The task involves three main challenges: (1) identifying the position in a given pathway to which a specific entity or reaction maps, (2) recognizing the causal relationships among multiple reactions, and (3) formulating and implementing the required inferences based on biological domain knowledge.


To address these challenges, we constructed new resources that link text with a model pathway: the GENIA pathway corpus with event annotation, and an NF-kB pathway. Through detailed analysis of these resources, we address an untapped resource, ‘bio-inference,’ as well as the differences between text and pathway representations. Here, we present precise comparisons of the two representations and the nine classes of ‘bio-inference’ schemes observed in the pathway corpus.


We believe that the creation of such rich resources and their detailed analysis is a significant first step toward accelerating research on the automatic construction of pathways from text.