Open Access | Highly Accessed | Methodology article

Automatic reconstruction of a bacterial regulatory network using Natural Language Processing

Carlos Rodríguez-Penagos, Heladia Salgado, Irma Martínez-Flores and Julio Collado-Vides

Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Apdo. Postal 565-A, Avenida Universidad, Cuernavaca, Morelos, 62100, Mexico

BMC Bioinformatics 2007, 8:293. doi:10.1186/1471-2105-8-293

Published: 7 August 2007

Abstract

Background

Manual curation of biological databases, an expensive and labor-intensive process, is essential for high-quality integrated data. In this paper we report the implementation of a state-of-the-art Natural Language Processing system that creates computer-readable networks of regulatory interactions directly from different collections of abstracts and full-text papers. Our major aim is to understand how automatic annotation using text-mining techniques can complement manual curation of biological databases. To that end, we implemented a rule-based system that generates networks from different sets of documents dealing with regulation in Escherichia coli K-12.
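
To make the idea of rule-based network generation concrete, the following is a minimal sketch only. The abstract does not describe the actual rules, so the verb lexicon, the naming patterns for regulators and targets, and the function names `extract_interactions` and `build_network` are all assumptions made for illustration, not the system reported here.

```python
import re
from collections import defaultdict
from typing import NamedTuple


class Interaction(NamedTuple):
    regulator: str
    effect: str
    target: str


# Hypothetical verb lexicon mapping a trigger verb to a regulatory effect.
EFFECTS = {
    "activates": "activator",
    "induces": "activator",
    "represses": "repressor",
    "inhibits": "repressor",
}

# Assumed naming conventions (illustrative only):
#   regulator: capitalized protein name such as "AraC" or "LexA"
#   target:    gene/operon name such as "recA" or "araBAD"
PATTERN = re.compile(
    r"\b([A-Z][A-Za-z0-9]{2,4})\s+("
    + "|".join(EFFECTS)
    + r")\s+(?:the\s+)?([a-z]{3}[A-Z0-9]*)\b"
)


def extract_interactions(text: str) -> list[Interaction]:
    """Apply the single toy rule 'REGULATOR verb [the] TARGET' to free text."""
    return [
        Interaction(m.group(1), EFFECTS[m.group(2)], m.group(3))
        for m in PATTERN.finditer(text)
    ]


def build_network(interactions: list[Interaction]) -> dict[str, list[tuple[str, str]]]:
    """Group extracted interactions into a regulator -> [(target, effect)] map."""
    network: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for it in interactions:
        network[it.regulator].append((it.target, it.effect))
    return dict(network)


sample = "AraC activates the araBAD operon. LexA represses recA."
network = build_network(extract_interactions(sample))
```

A real system would need far richer linguistic rules (passives, negation, coordination, synonym dictionaries); the single regular expression above only shows how textual patterns can be turned into network edges.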

Results

Performance evaluation is based on the most comprehensive transcriptional regulation database for any organism, the manually-curated RegulonDB, 45% of which we were able to recreate automatically. From our automated analysis we were also able to find some new interactions from papers not already curated, or that were missed in the manual filtering and review of the literature. We also put forward a novel Regulatory Interaction Markup Language better suited than SBML for simultaneously representing data of interest for biologists and text miners.

Conclusion

Manual curation of the output of automatic processing of text is a good way to complement a more detailed review of the literature, either for validating the results of what has been already annotated, or for discovering facts and information that might have been overlooked at the triage or curation stages.
