This article is part of the supplement: Proceedings of the ACM Fifth International Workshop on Data and Text Mining in Biomedical Informatics (DTMBio 2011)

Open Access Proceedings

Semantic text mining support for lignocellulose research

Marie-Jean Meurs1,2*, Caitlin Murphy2,3, Ingo Morgenstern2,3, Greg Butler1,2, Justin Powlowski2,4, Adrian Tsang2,3 and René Witte1

Author Affiliations

1 Department of Computer Science and Software Engineering, Concordia University, Montréal, QC, Canada

2 Centre for Structural and Functional Genomics, Concordia University, Montréal, QC, Canada

3 Department of Biology, Concordia University, Montréal, QC, Canada

4 Department of Chemistry and Biochemistry, Concordia University, Montréal, QC, Canada

BMC Medical Informatics and Decision Making 2012, 12(Suppl 1):S5  doi:10.1186/1472-6947-12-S1-S5

Published: 30 April 2012



Biofuels produced from biomass are considered promising sustainable alternatives to fossil fuels. The conversion of lignocellulose into fermentable sugars for biofuel production requires enzyme cocktails that can efficiently and economically hydrolyze lignocellulosic biomass. As many fungi naturally break down lignocellulose, the identification and characterization of the enzymes involved is a key challenge in the research and development of biomass-derived products and fuels. One approach to meeting this challenge is to mine the rapidly expanding repertoire of microbial genomes for enzymes with the appropriate catalytic properties.


Semantic technologies, including natural language processing, ontologies, semantic Web services and Web-based collaboration tools, promise to support users in handling complex data, thereby facilitating knowledge-intensive tasks. An ongoing challenge is to select the appropriate technologies and combine them in a coherent system that brings measurable improvements to the users. We present our ongoing development of a semantic infrastructure in support of genomics-based lignocellulose research. Part of this effort is the automated curation of knowledge from information on fungal enzymes that is available in the literature and genome resources.


Working closely with fungal biology researchers who manually curate the existing literature, we developed ontological natural language processing pipelines, integrated in a Web-based interface, to assist them in two main tasks: mining the literature for relevant knowledge and, at the same time, providing rich, semantically linked information.