This article is part of the supplement: Selected articles from The Second Workshop on Data Mining of Next-Generation Sequencing in conjunction with the 2012 IEEE International Conference on Bioinformatics and Biomedicine
Provenance in bioinformatics workflows
1 Department of Computer Science, University of Brasilia - UnB, Brasilia, Brazil
2 Department of Informatics, Pontifical Catholic University - PUC/RJ, Rio de Janeiro, Brazil
BMC Bioinformatics 2013, 14(Suppl 11):S6 doi:10.1186/1471-2105-14-S11-S6
Published: 4 November 2013
In this work, we used the PROV-DM model to manage data provenance in genome project workflows. This provenance model stores the details of a workflow execution, e.g., the raw input data and the data produced, as well as the computational tools used, with their versions and parameters. Using this model, biologists can access the details of a particular workflow execution, compare results produced by different executions, and plan new experiments more efficiently. In addition, we created a provenance simulator that facilitates entering the provenance data of a genome project workflow execution. Finally, we discuss a case study that aims to identify genes involved in specific metabolic pathways of Bacillus cereus and to compare this isolate with other phylogenetically related bacteria from the Bacillus group. This B. cereus isolate is an extremophilic bacterium collected from warm water in the Midwestern Region of Brazil, and its DNA samples were sequenced with an NGS machine.
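To make the provenance idea concrete, the sketch below records one workflow step using a few PROV-DM core concepts: entities (data files), activities (executions of a step, carrying their parameters), agents (tools with versions), and the relations `used`, `wasGeneratedBy` and `wasAssociatedWith`. It is a minimal in-memory illustration under stated assumptions, not the authors' implementation and not the API of any PROV library; modeling the tool as an agent and attaching parameters to the activity are our own choices, and all file names, the tool name `assembler_X` and its parameters are hypothetical. It shows the two uses mentioned above: tracing how a result was produced and comparing the parameters of two executions.

```python
from dataclasses import dataclass, field

# Hypothetical, minimal in-memory model of a few PROV-DM core types.
# Not a real PROV library API; names and parameters are illustrative only.

@dataclass(frozen=True)
class Entity:
    """PROV-DM entity: here, a data file (raw reads, contigs, ...)."""
    id: str

@dataclass(frozen=True)
class Agent:
    """PROV-DM agent: here, a computational tool and its version (assumption)."""
    id: str
    version: str

@dataclass
class Activity:
    """PROV-DM activity: one execution of a workflow step with its parameters."""
    id: str
    parameters: dict = field(default_factory=dict)

@dataclass
class ProvDocument:
    used: list = field(default_factory=list)                 # (activity, entity)
    was_generated_by: list = field(default_factory=list)     # (entity, activity)
    was_associated_with: list = field(default_factory=list)  # (activity, agent)

    def record_step(self, activity, agent, inputs, outputs):
        """Record which tool ran, what it consumed and what it produced."""
        self.was_associated_with.append((activity, agent))
        for entity in inputs:
            self.used.append((activity, entity))
        for entity in outputs:
            self.was_generated_by.append((entity, activity))

    def lineage(self, entity):
        """Return (activity, tool, inputs) that produced `entity`, or None."""
        for produced, act in self.was_generated_by:
            if produced == entity:
                tool = next(ag for a, ag in self.was_associated_with if a.id == act.id)
                inputs = [e for a, e in self.used if a.id == act.id]
                return act, tool, inputs
        return None

def diff_parameters(run_a, run_b):
    """Parameters whose values differ between two executions."""
    keys = set(run_a.parameters) | set(run_b.parameters)
    return {k: (run_a.parameters.get(k), run_b.parameters.get(k))
            for k in sorted(keys)
            if run_a.parameters.get(k) != run_b.parameters.get(k)}

# Two hypothetical executions of the same assembly step with different k-mer sizes.
reads = Entity("raw_reads.fastq")
contigs_1 = Entity("contigs_run1.fasta")
contigs_2 = Entity("contigs_run2.fasta")
assembler = Agent("assembler_X", version="1.0")
run1 = Activity("assembly_run_1", {"k": 31, "min_contig": 200})
run2 = Activity("assembly_run_2", {"k": 47, "min_contig": 200})

doc = ProvDocument()
doc.record_step(run1, assembler, [reads], [contigs_1])
doc.record_step(run2, assembler, [reads], [contigs_2])

act, tool, inputs = doc.lineage(contigs_2)
print(act.id, tool.id, tool.version, [e.id for e in inputs])
# prints: assembly_run_2 assembler_X 1.0 ['raw_reads.fastq']
print(diff_parameters(run1, run2))
# prints: {'k': (31, 47)}
```

Lists of relation tuples keep the sketch short; a real provenance store would index these relations (for example, in a database) so that lineage queries over many executions stay fast.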