This article is part of the supplement: Second International Symposium on Semantic Mining in Biomedicine (SMBM)
An environment for relation mining over richly annotated corpora: the case of GENIA
1 Institute of Computational Linguistics, IFI, University of Zurich, Switzerland
2 Novartis Pharma AG, Basel, Switzerland
BMC Bioinformatics 2006, 7(Suppl 3):S3 doi:10.1186/1471-2105-7-S3-S3. Published: 24 November 2006
The biomedical domain is witnessing rapid growth in the amount of published scientific results, which makes it increasingly difficult to pick out the core information. There is a real need for support tools that 'digest' the published results and extract the most important information.
We describe and evaluate an environment that supports the extraction of domain-specific relations, such as protein-protein interactions, from a richly annotated corpus. We combine full, deep-linguistic parsing with manually created, versatile patterns that express a large set of syntactic alternations, together with semantic information from an ontology.
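To make the idea of one pattern covering several syntactic alternations more concrete, here is a minimal sketch in Python. It is not the authors' system: the dependency labels (`subj`, `obj`, `pass_subj`, `by_agent`, `of`, `by`), the toy `ONTOLOGY`, and the predicate lists are all hypothetical, chosen only for illustration. The sketch matches an agent and a target attached to the same head in a dependency parse, accepts active, passive, and nominalized forms, and keeps a match only if the ontology types both arguments as proteins.

```python
from typing import NamedTuple


class Dep(NamedTuple):
    rel: str   # dependency label (hypothetical label set)
    head: str  # lemma of the governing word
    dep: str   # lemma of the dependent word


# Hypothetical toy ontology: maps terms to semantic types.
ONTOLOGY = {"IL-2": "protein", "NF-kappaB": "protein", "T cell": "cell_type"}

# Hypothetical interaction predicates and their nominalizations.
VERBS = {"activate", "inhibit", "bind"}
NOMINALS = {"activation": "activate", "inhibition": "inhibit", "binding": "bind"}

# Each alternation pairs an agent label with a target label on the same head.
ALTERNATIONS = [
    ("subj", "obj"),            # active:  A activates B
    ("by_agent", "pass_subj"),  # passive: B is activated by A
    ("by", "of"),               # nominal: activation of B by A
]


def is_protein(term: str) -> bool:
    """Semantic filter: accept only arguments the ontology types as proteins."""
    return ONTOLOGY.get(term) == "protein"


def extract_relations(deps: list[Dep]) -> set[tuple[str, str, str]]:
    """Return (agent, predicate, target) triples matched by any alternation."""
    found = set()
    for agent_rel, target_rel in ALTERNATIONS:
        for a in deps:
            if a.rel != agent_rel:
                continue
            for t in deps:
                if t.rel != target_rel or t.head != a.head:
                    continue
                # Normalize nominalizations to their verb lemma.
                pred = NOMINALS.get(a.head, a.head)
                if pred in VERBS and is_protein(a.dep) and is_protein(t.dep):
                    found.add((a.dep, pred, t.dep))
    return found


passive = [Dep("pass_subj", "activate", "IL-2"), Dep("by_agent", "activate", "NF-kappaB")]
print(extract_relations(passive))  # {('NF-kappaB', 'activate', 'IL-2')}
```

Because the match is stated over dependency relations rather than over token sequences, intervening modifiers do not break it, and the active, passive, and nominal variants all normalize to the same triple. This is one plausible illustration of why rules over a deep parse can be more general than surface finite-state patterns.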
The experiments show that the described approach can deliver high-precision results while maintaining sufficient recall. The rules used by the system operate at a high level of abstraction and are considerably more powerful and versatile than those of finite-state approaches, which allows rapid interactive development and validation.