Open Access Methodology article

SIGffRid: A tool to search for sigma factor binding sites in bacterial genomes using comparative approach and biologically driven statistics

Fabrice Touzain1, Sophie Schbath2, Isabelle Debled-Rennesson1, Bertrand Aigle3, Gregory Kucherov4 and Pierre Leblond3

1 Laboratoire Lorrain de Recherche en Informatique et ses Applications, Campus Scientifique, B.P. 239, UMR CNRS-INPL-INRIA-Nancy 2-UHP 7503, 54506 Vandœuvre-lès-Nancy, France

2 Unité Mathématique, Informatique et Génome, INRA, 78350 Jouy-en-Josas, France

3 Laboratoire de Génétique et Microbiologie, UMR INRA 1128, IFR 110, Université Henri Poincaré, B.P. 239, 54506 Vandœuvre-lès-Nancy, France

4 Laboratoire d'Informatique Fondamentale de Lille, UMR USTL-CNRS 8022, 59655 Villeneuve d'Ascq, France


BMC Bioinformatics 2008, 9:73 doi:10.1186/1471-2105-9-73

Published: 31 January 2008

Abstract

Background

Many programs have been developed to identify transcription factor binding sites. However, most of them cannot infer two-word motifs with variable spacer lengths. This is the case for RNA polymerase sigma (σ) factor binding sites (SFBSs), which are usually composed of two boxes, called -35 and -10 in reference to the transcription initiation point. Our goal is to design an algorithm that detects SFBSs by using combinatorial and statistical constraints deduced from biological observations.

Results

We describe a new approach to identify SFBSs by comparing two related bacterial genomes. The method, named SIGffRid (SIGma Factor binding sites Finder using R'MES to select Input Data), performs a simultaneous analysis of pairs of promoter regions of orthologous genes. SIGffRid uses a prior identification of over-represented patterns in whole genomes as a selection criterion for potential -35 and -10 boxes. These patterns are then grouped using pairs of short seeds (one of which may be gapped), allowing a variable-length spacer between them. Next, the motifs are extended under the guidance of statistical considerations, which ensures that the selected motifs have statistically relevant properties. We applied our method to the pair of related bacterial genomes of Streptomyces coelicolor and Streptomyces avermitilis. A cross-check against the well-defined SFBSs of the SigR regulon in S. coelicolor is detailed, validating the algorithm. SFBSs for HrdB and BldN were also found, and the results suggested some new targets for these σ factors. In addition, consensus motifs for BldD and new SFBSs were defined, overlapping previously proposed consensuses. Relevant tests were also carried out on bacteria with moderate GC content (i.e. the Escherichia coli/Salmonella typhimurium and Bacillus subtilis/Bacillus licheniformis pairs). Motifs of house-keeping σ factors were found, as well as other SFBSs such as that of SigW in Bacillus strains.
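The two-box search with a variable-length spacer, applied to a pair of orthologous promoter regions, can be illustrated with a minimal Python sketch. It is not SIGffRid's implementation: it uses exact string matching only and omits the R'MES over-representation statistics, gapped seeds and motif extension. The function names, the `Hit` type and the default spacer range of 15 to 20 bases are assumptions made for illustration; the example boxes are the well-known E. coli σ70 consensus hexamers TTGACA (-35) and TATAAT (-10).

```python
from typing import Iterator, NamedTuple


class Hit(NamedTuple):
    start: int   # 0-based position of the first (-35) box
    spacer: int  # number of bases between the two boxes
    site: str    # sequence from the first box through the second box


def find_two_box_sites(seq: str, box35: str, box10: str,
                       min_spacer: int = 15,
                       max_spacer: int = 20) -> Iterator[Hit]:
    """Yield exact occurrences of box35, a spacer of allowed length, then box10.

    The spacer bounds are illustrative defaults, not values from the article.
    """
    seq = seq.upper()
    for i in range(len(seq) - len(box35) + 1):
        if not seq.startswith(box35, i):
            continue
        after_box35 = i + len(box35)
        # Try every allowed spacer length after the first box.
        for spacer in range(min_spacer, max_spacer + 1):
            j = after_box35 + spacer
            if seq.startswith(box10, j):
                yield Hit(i, spacer, seq[i:j + len(box10)])


def shared_in_orthologs(upstream_a: str, upstream_b: str,
                        box35: str, box10: str, **spacer_bounds: int) -> bool:
    """Return True if both orthologous upstream regions contain the box pair.

    The spacer length may differ between the two regions.
    """
    return (any(find_two_box_sites(upstream_a, box35, box10, **spacer_bounds))
            and any(find_two_box_sites(upstream_b, box35, box10, **spacer_bounds)))


if __name__ == "__main__":
    # Synthetic upstream regions built for the example (not real genome data).
    region_a = "GG" + "TTGACA" + "C" * 17 + "TATAAT" + "GG"
    region_b = "A" + "TTGACA" + "G" * 16 + "TATAAT"
    for hit in find_two_box_sites(region_a, "TTGACA", "TATAAT"):
        print(hit.start, hit.spacer, hit.site)
    print(shared_in_orthologs(region_a, region_b, "TTGACA", "TATAAT"))
```

Requiring the same box pair in both orthologous regions, while letting the spacer length vary independently in each, mirrors the comparative idea described above; a realistic tool would additionally tolerate mismatches and score candidate motifs statistically.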

Conclusion

We demonstrate that our approach, which combines statistical and biological criteria, successfully predicts SFBSs. The versatility of the method allows the recognition of other kinds of two-box regulatory sites.

