This article is part of the supplement: Second International Symposium on Semantic Mining in Biomedicine (SMBM)

Open Access Proceedings

An environment for relation mining over richly annotated corpora: the case of GENIA

Fabio Rinaldi1*, Gerold Schneider1, Kaarel Kaljurand1, Michael Hess1 and Martin Romacker2

Author Affiliations

1 Institute of Computational Linguistics, IFI, University of Zurich, Switzerland

2 Novartis Pharma AG, Basel, Switzerland

BMC Bioinformatics 2006, 7(Suppl 3):S3  doi:10.1186/1471-2105-7-S3-S3

Published: 24 November 2006

Abstract

Background

The biomedical domain is witnessing rapid growth in the volume of published scientific results, which makes it increasingly difficult to identify the core information. There is a real need for support tools that 'digest' the published results and extract the most important information.

Results

We describe and evaluate an environment that supports the extraction of domain-specific relations, such as protein-protein interactions, from a richly annotated corpus. We use full, deep-linguistic parsing together with manually created, versatile patterns that express a large set of syntactic alternations and draw on semantic information from an ontology.

Conclusion

The experiments show that the approach described is capable of delivering high-precision results while maintaining sufficient levels of recall. The high level of abstraction of the rules used by the system, which are considerably more powerful and versatile than finite-state approaches, allows speedy interactive development and validation.