This article is part of the supplement: Ninth International Conference on Bioinformatics (InCoB2010): Computational Biology

Open Access Proceedings

Algorithms and semantic infrastructure for mutation impact extraction and grounding

Jonas B Laurila1, Nona Naderi2, René Witte2, Alexandre Riazanov1, Alexandre Kouznetsov1 and Christopher JO Baker1*

Author Affiliations

1 Department of Computer Science & Applied Statistics, University of New Brunswick, Saint John, New Brunswick, E2L 4L5, Canada

2 Department of Computer Science & Software Engineering, Concordia University, Montréal, Québec, H3G 1M8, Canada

BMC Genomics 2010, 11(Suppl 4):S24  doi:10.1186/1471-2164-11-S4-S24

Published: 2 December 2010

Abstract

Background

Mutation impact extraction has not yet been accomplished by state-of-the-art mutation extraction systems. Protein mutations and their impacts on protein properties are buried in the scientific literature, which makes them poorly accessible to protein engineers and inaccessible to phenotype-prediction systems that currently depend on manually curated genomic variation databases.

Results

We present the first rule-based approach for the extraction of mutation impacts on protein properties, categorizing their directionality as positive, negative or neutral. Furthermore, protein and mutation mentions are grounded to their respective UniProtKB IDs, and selected protein properties, namely protein functions, are grounded to concepts found in the Gene Ontology. The extracted entities are used to populate an OWL-DL Mutation Impact ontology, which facilitates complex querying for mutation impacts using SPARQL. We illustrate retrieval of proteins and mutant sequences for a given direction of impact on specific protein properties. Moreover, we provide programmatic access to the data through semantic web services using the SADI (Semantic Automated Discovery and Integration) framework.
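To make the idea of rule-based impact extraction concrete, the following is a minimal sketch in Python. It is not the authors' rule set: the cue-word lists, the one-letter point-mutation pattern, and the function names `classify_direction` and `extract_impacts` are all assumptions chosen for illustration. It only shows how a sentence containing a mutation mention might be assigned a positive, negative or neutral direction, and it does not attempt grounding to UniProtKB or the Gene Ontology.

```python
import re

# Hypothetical cue phrases for each direction of impact.
# These are illustrative assumptions, not the rules used in the paper.
NEUTRAL_CUES = ("unchanged", "no effect", "did not affect", "unaffected")
NEGATIVE_CUES = ("decreased", "reduced", "abolished", "lower", "impaired")
POSITIVE_CUES = ("increased", "enhanced", "improved", "higher")

# Point mutations in one-letter notation, e.g. "L95V":
# wild-type residue, sequence position, mutant residue.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MUTATION_RE = re.compile(rf"\b([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])\b")


def classify_direction(sentence):
    """Return 'neutral', 'negative', 'positive', or None if no cue is found."""
    text = sentence.lower()
    # Neutral cues are checked first so that phrases such as
    # "did not affect" are not overridden by a later directional word.
    for label, cues in (
        ("neutral", NEUTRAL_CUES),
        ("negative", NEGATIVE_CUES),
        ("positive", POSITIVE_CUES),
    ):
        if any(cue in text for cue in cues):
            return label
    return None


def extract_impacts(sentence):
    """Pair every mutation mention in the sentence with the sentence-level direction."""
    direction = classify_direction(sentence)
    if direction is None:
        return []
    return [(match.group(0), direction) for match in MUTATION_RE.finditer(sentence)]


if __name__ == "__main__":
    for s in (
        "The L95V mutant showed increased activity towards 1,2-dibromoethane.",
        "Substitution W125F reduced the dehalogenase activity.",
        "D108A did not affect substrate binding.",
    ):
        print(extract_impacts(s))
```

A sentence-level keyword lookup like this will over-generate: for example, any token shaped like a residue, a number and a residue is treated as a mutation. A real system would need linguistic analysis to link each mutation to the specific property it affects.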

Conclusion

We address the problem of access to legacy mutation data in unstructured form by creating novel mutation impact extraction methods, which are evaluated on a corpus of full-text articles on haloalkane dehalogenases, tagged by domain experts. Our approaches show state-of-the-art levels of precision and recall for Mutation Grounding, and a respectable level of precision but lower recall for the task of Mutant-Impact relation extraction. The system is deployed using text mining and semantic web technologies with the goal of publishing to a broad spectrum of consumers.