This article is part of the supplement: Selected articles from The Second Workshop on Data Mining of Next-Generation Sequencing in conjunction with the 2012 IEEE International Conference on Bioinformatics and Biomedicine

Open Access Research article

Provenance in bioinformatics workflows

Renato de Paula1, Maristela Holanda1*, Luciana SA Gomes2, Sergio Lifschitz2 and Maria Emilia MT Walter1

Author Affiliations

1 Department of Computer Science, University of Brasilia - UnB, Brasilia, Brazil

2 Department of Informatics, Pontifical Catholic University - PUC/RJ, Rio de Janeiro, Brazil

BMC Bioinformatics 2013, 14(Suppl 11):S6  doi:10.1186/1471-2105-14-S11-S6

Published: 4 November 2013

Abstract

In this work, we used the PROV-DM model to manage data provenance in workflows of genome projects. This provenance model allows the storage of details of a workflow execution, e.g., raw and produced data, and the computational tools used together with their versions and parameters. Using this model, biologists can access the details of a particular workflow execution, compare results produced by different executions, and plan new experiments more efficiently. In addition, we created a provenance simulator, which facilitates the inclusion of provenance data from a genome project workflow execution. Finally, we discuss a case study that aims to identify genes involved in specific metabolic pathways of Bacillus cereus, and to compare this isolate with other phylogenetically related bacteria from the Bacillus group. B. cereus is an extremophilic bacterium, collected from warm water in the Midwestern Region of Brazil; its DNA samples were sequenced with an NGS machine.
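
As a rough illustration of the kind of record described above, the following minimal Python sketch models the three core PROV-DM types (entity, activity, agent) and compares the parameters of two executions of the same workflow step. It is not the authors' implementation or their simulator: the class layout, the `diff_parameters` helper, and all file, tool, and parameter names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """A PROV-DM entity: here, a raw or produced data file."""
    id: str


@dataclass(frozen=True)
class Agent:
    """A PROV-DM agent: here, a computational tool at a given version."""
    id: str
    version: str


@dataclass
class Activity:
    """A PROV-DM activity: one step of one workflow execution."""
    id: str
    used: list        # input entities (PROV-DM "used" relation)
    generated: list   # output entities (PROV-DM "wasGeneratedBy" relation)
    agent: Agent      # tool that ran the step (PROV-DM "wasAssociatedWith")
    parameters: dict = field(default_factory=dict)


def diff_parameters(a, b):
    """Return {name: (value_in_a, value_in_b)} for parameters that differ."""
    names = set(a.parameters) | set(b.parameters)
    return {
        n: (a.parameters.get(n), b.parameters.get(n))
        for n in sorted(names)
        if a.parameters.get(n) != b.parameters.get(n)
    }


# Hypothetical data: two executions of the same filtering step on the same reads.
reads = Entity("raw_reads.fastq")
tool = Agent("quality_filter", version="1.0")
run1 = Activity("filter_run1", [reads], [Entity("filtered_run1.fastq")],
                tool, {"min_quality": 20})
run2 = Activity("filter_run2", [reads], [Entity("filtered_run2.fastq")],
                tool, {"min_quality": 30})

print(diff_parameters(run1, run2))  # {'min_quality': (20, 30)}
```

Keeping the tool version and parameters attached to each activity is what makes two executions directly comparable; a real deployment would persist these records in a database rather than in memory.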