
This article is part of the supplement: Workshop on Advances in Bio Text Mining

Open Access Poster presentation

Medie and Info-pubmed: 2010 update

Tomoko Ohta1*, Takuya Matsuzaki1, Naoaki Okazaki1, Makoto Miwa1, Rune Sætre1, Sampo Pyysalo1 and Jun’ichi Tsujii1,2,3

Author Affiliations

1 Department of Computer Science, University of Tokyo, Tokyo, Japan

2 School of Computer Science, University of Manchester, Manchester, UK

3 National Centre for Text Mining, University of Manchester, Manchester, UK


BMC Bioinformatics 2010, 11(Suppl 5):P7  doi:10.1186/1471-2105-11-S5-P7


Published: 6 October 2010

© 2010 Ohta et al; licensee BioMed Central Ltd.


In recent decades, the establishment of high-throughput screening methods has led to major breakthroughs in molecular biology and biomedicine. Because researchers in these fields must interpret enormous quantities of data, and because the publication rate of scientific articles is exploding, the demand for text mining technology grows with each passing year.

Medie and Info-pubmed were developed in response to these information needs. Medie is a general-purpose, integrated PubMed search engine, and Info-pubmed is a targeted system for finding information about interactions between key biomedical entities.

This work presents the first update of these systems since their introduction, describing several extensions based on recent advances in biomedical text mining.

Extensions of Medie and Info-pubmed

Medie and Info-pubmed are based on deep syntactic analysis of sentence structure. To let users benefit from the latest parsing technology, the current release integrates an improved HPSG parser [1].

To extend its semantic search capabilities, the updated Medie system supports ontology-based search, in which the query verb can be replaced by any term from the GENIA event ontology. Such a query is expanded to the set of verbs annotated as expressing the given event type in the GENIA corpus [2]: for example, a search for Positive regulation will now match activate, induce, and similar verbs.
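The expansion step can be pictured as a lookup from event type to verb set, applied before matching against an index of parsed predicate-argument relations. This is a minimal sketch, not Medie's implementation: the `Relation` record, the `expand_verb` and `search` functions, and the verb table are hypothetical, and the table lists only the two verbs named above rather than the full set derived from GENIA annotation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

# Hypothetical excerpt of an event-type -> verb table. In a real system this
# set would be derived from GENIA corpus annotation; only the two verbs
# mentioned in the text are listed here.
EVENT_VERBS: Dict[str, Set[str]] = {
    "Positive_regulation": {"activate", "induce"},
}


@dataclass(frozen=True)
class Relation:
    """One predicate-argument relation extracted from a parsed sentence."""
    subject: str
    verb: str  # lemma of the predicate, e.g. "activate"
    obj: str
    sentence_id: str


def expand_verb(query_verb: str) -> Set[str]:
    """Map an event-ontology term to its verbs; a plain verb maps to itself."""
    key = query_verb.replace(" ", "_")
    return EVENT_VERBS.get(key, {query_verb})


def search(index: List[Relation], verb: str,
           subject: Optional[str] = None,
           obj: Optional[str] = None) -> List[str]:
    """Return IDs of sentences whose relation matches the (expanded) query."""
    verbs = expand_verb(verb)
    return [
        r.sentence_id
        for r in index
        if r.verb in verbs
        and (subject is None or r.subject == subject)
        and (obj is None or r.obj == obj)
    ]


if __name__ == "__main__":
    index = [
        Relation("A", "activate", "B", "s1"),
        Relation("C", "induce", "D", "s2"),
        Relation("E", "bind", "F", "s3"),
    ]
    print(search(index, "Positive regulation"))  # ['s1', 's2']
    print(search(index, "bind"))                 # ['s3']
```

Because expansion happens on the query side, the index itself stores only the verbs that actually occur in the text, and adding a new event type means adding one table entry.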

To allow more focused searches, we incorporated the section labeling method of Hirohata et al. [3] and added search options that restrict queries to specific sentence types, such as those describing methods, results, or conclusions. The index and search options were further augmented with PubMed metadata, allowing searches to be limited by MeSH term, author, or journal.
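Once each sentence carries a predicted section label and each article carries its PubMed metadata, these search options reduce to filters over the index. The sketch below assumes the labels have already been assigned (it does not implement the CRF-based labeler of [3]); the `Sentence` record, its field names, the `filter_sentences` function, and the label strings are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Sentence:
    pmid: str
    text: str
    section: str  # predicted label, e.g. "methods", "results", "conclusions"
    journal: str
    authors: FrozenSet[str] = frozenset()
    mesh_terms: FrozenSet[str] = frozenset()


def filter_sentences(sentences: Iterable[Sentence],
                     sections: Optional[Set[str]] = None,
                     mesh_term: Optional[str] = None,
                     author: Optional[str] = None,
                     journal: Optional[str] = None) -> List[Sentence]:
    """Keep sentences that satisfy every filter that is set (None = no filter)."""
    result = []
    for s in sentences:
        if sections is not None and s.section not in sections:
            continue
        if mesh_term is not None and mesh_term not in s.mesh_terms:
            continue
        if author is not None and author not in s.authors:
            continue
        if journal is not None and s.journal != journal:
            continue
        result.append(s)
    return result
```

Each filter is optional and they combine conjunctively, so a query such as "results sentences from journal J" is simply two filters set at once.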

The initial release of Info-pubmed supported search for automatically detected protein-protein interactions. We have extended this capability to gene-disease associations [4], so that the system can also be used to study the links between biomolecules and diseases.
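Conceptually, adding gene-disease associations means storing a second relation type alongside protein-protein interactions and letting the user browse from an entity to its partners. A minimal sketch, assuming the relations have already been extracted (for example by the method of [4]); the `RelationGraph` class, the relation-type strings, and the placeholder entity names in the usage are hypothetical, not Info-pubmed's data model.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical relation-type labels.
PPI = "protein-protein"
GENE_DISEASE = "gene-disease"


class RelationGraph:
    """Undirected multi-relation graph: entity pairs plus supporting PMIDs."""

    def __init__(self) -> None:
        # (relation type, entity) -> partner -> set of supporting PMIDs
        self._edges: Dict[Tuple[str, str], Dict[str, Set[str]]] = defaultdict(
            lambda: defaultdict(set))

    def add(self, rel_type: str, a: str, b: str, pmid: str) -> None:
        # Store both directions so lookups work from either entity.
        self._edges[(rel_type, a)][b].add(pmid)
        self._edges[(rel_type, b)][a].add(pmid)

    def partners(self, rel_type: str, entity: str) -> List[Tuple[str, int]]:
        """Partners of `entity`, ranked by number of supporting articles."""
        found = self._edges.get((rel_type, entity), {})
        return sorted(((p, len(ids)) for p, ids in found.items()),
                      key=lambda item: (-item[1], item[0]))
```

Ranking partners by the number of distinct supporting articles is one simple way to surface well-attested relations first; repeated mentions in the same article are counted once because PMIDs are kept in a set.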

Finally, we have extended the coverage of both systems to all of PubMed and added scheduled update modules that refresh the system database daily, fully automating data retrieval, analysis, and indexing.
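A daily update of this kind can be pictured as an incremental job that fetches only citations added since the last successful run, analyzes and indexes them, and then advances a checkpoint. This is a hypothetical sketch: `fetch_since`, `analyze`, and `index` stand in for the real retrieval, parsing, and indexing stages, which the text does not specify, and the checkpoint is held in memory rather than persisted.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, Tuple

# Hypothetical citation record: (PMID, date added, abstract text).
Citation = Tuple[str, date, str]


@dataclass
class UpdateState:
    last_run: date  # citations added on or before this date are processed


def run_daily_update(state: UpdateState,
                     today: date,
                     fetch_since: Callable[[date, date], Iterable[Citation]],
                     analyze: Callable[[str], object],
                     index: Callable[[str, object], None]) -> int:
    """Process citations added after state.last_run, up to and including today.

    The checkpoint advances only after every citation has been indexed, so a
    failed run is retried from the same point next time (this assumes `index`
    overwrites existing entries, i.e. re-indexing is idempotent).
    """
    count = 0
    for pmid, _added, text in fetch_since(state.last_run, today):
        index(pmid, analyze(text))
        count += 1
    state.last_run = today
    return count
```

Keeping the checkpoint separate from the stages means each stage can be replaced independently, and the same function handles both the initial bulk load (an early `last_run`) and routine daily runs.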

Figure 1 shows an example Medie search result that illustrates several of the newly introduced functions.

Figure 1. Snapshot of the updated Medie: “What disease does dystrophin cause?”


We have introduced extended and updated functionality for Medie and Info-pubmed, two search systems that integrate state-of-the-art text mining technology. The updates enable advanced semantic search over the latest publications across all of PubMed.


  1. Ninomiya T, Matsuzaki T, Miyao Y, Tsujii J: A log-linear model with an n-gram reference distribution for accurate HPSG parsing.

    Proceedings of IWPT 2007, Prague, Czech Republic 2007.


  2. Kim JD, Ohta T, Tsujii J: Corpus annotation for mining biomedical events from literature.

    BMC Bioinformatics 2008, 9:10.

  3. Hirohata K, Okazaki N, Ananiadou S, Ishizuka M: Identifying Sections in Scientific Abstracts using Conditional Random Fields.

    Proceedings of the Third International Joint Conference on Natural Language Processing (IJCNLP 2008), Hyderabad, India 2008, 381-388.

  4. Chun HW, Tsuruoka Y, Kim JD, Shiba R, Nagata N, Hishiki T, Tsujii J: Extraction of Gene-Disease Relations from MedLine using Domain Dictionaries and Machine Learning.

    Proceedings of the Pacific Symposium on Biocomputing (PSB), Maui, Hawaii, USA 2006, 4-15.