SIGffRid: A tool to search for sigma factor binding sites in bacterial genomes using comparative approach and biologically driven statistics
1 Laboratoire Lorrain de Recherche en Informatique et ses Applications, Campus Scientifique, B.P. 239, UMR CNRS-INPL-INRIA-Nancy 2-UHP 7503, 54506 Vandœuvre-lès-Nancy, France
2 Unité Mathématique, Informatique et Génome, INRA, 78350 Jouy-en-Josas, France
3 Laboratoire de Génétique et Microbiologie, UMR INRA 1128, IFR 110, Université Henri Poincaré, B.P. 239, 54506 Vandœuvre-lès-Nancy, France
4 Laboratoire d'Informatique Fondamentale de Lille, UMR USTL-CNRS 8022, 59655 Villeneuve d'Ascq, France
BMC Bioinformatics 2008, 9:73 doi:10.1186/1471-2105-9-73
Published: 31 January 2008
Many programs have been developed to identify transcription factor binding sites. However, most of them cannot infer two-word motifs with variable spacer lengths. This case arises for RNA polymerase Sigma (σ) Factor Binding Sites (SFBSs), which are usually composed of two boxes, called -35 and -10 in reference to the transcription initiation point. Our goal is to design an algorithm that detects SFBSs using combinatorial and statistical constraints deduced from biological observations.
We describe a new approach to identify SFBSs by comparing two related bacterial genomes. The method, named SIGffRid (SIGma Factor binding sites Finder using R'MES to select Input Data), performs a simultaneous analysis of pairs of promoter regions of orthologous genes. SIGffRid uses a prior identification of over-represented patterns in whole genomes as a selection criterion for potential -35 and -10 boxes. These patterns are then grouped using pairs of short seeds (one of which may be gapped), allowing a variable-length spacer between them. Next, the motifs are extended under the guidance of statistical considerations, which ensures that the selected motifs have statistically relevant properties. We applied our method to the pair of related bacterial genomes of Streptomyces coelicolor and Streptomyces avermitilis. A cross-check against the well-defined SFBSs of the SigR regulon in S. coelicolor is detailed, validating the algorithm. SFBSs for HrdB and BldN were also found, and the results suggested some new targets for these σ factors. In addition, consensus motifs for BldD and for new SFBSs were defined, overlapping previously proposed consensuses. Tests were also carried out on bacteria with moderate GC content (i.e. the Escherichia coli/Salmonella typhimurium and Bacillus subtilis/Bacillus licheniformis pairs). Motifs of house-keeping σ factors were found, as well as other SFBSs such as that of SigW in Bacillus strains.
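To make the notion of a two-box motif with a variable-length spacer concrete, the following is a minimal sketch in Python. It is not the SIGffRid implementation: it only performs exact matching of two given boxes, and it omits the over-representation statistics, the gapped seeds, and the motif extension step. The function names `find_two_box_sites` and `conserved_in_both`, the default spacer range, and the toy sequences are all assumptions introduced for illustration.

```python
def find_two_box_sites(seq, box35, box10, min_spacer=15, max_spacer=19):
    """Return (start, spacer_length) for each exact occurrence of
    box35 + spacer + box10 in seq, for every allowed spacer length.
    Hypothetical helper; the spacer bounds are illustrative defaults."""
    hits = []
    start = seq.find(box35)
    while start != -1:
        after_box35 = start + len(box35)
        # Try every allowed spacer length behind this -35 candidate.
        for spacer in range(min_spacer, max_spacer + 1):
            pos10 = after_box35 + spacer
            if seq[pos10:pos10 + len(box10)] == box10:
                hits.append((start, spacer))
        # Continue from the next position so overlapping hits are kept.
        start = seq.find(box35, start + 1)
    return hits


def conserved_in_both(promoter_a, promoter_b, box35, box10,
                      min_spacer=15, max_spacer=19):
    """True if the two-box motif occurs in both promoters of an
    orthologous gene pair; the spacer lengths may differ between them."""
    args = (box35, box10, min_spacer, max_spacer)
    return bool(find_two_box_sites(promoter_a, *args)) and \
        bool(find_two_box_sites(promoter_b, *args))


# Toy promoter regions (invented sequences, not genomic data).
promoter_a = "AA" + "TTGACA" + "C" * 17 + "TATAAT" + "CC"
promoter_b = "G" + "TTGACA" + "C" * 16 + "TATAAT" + "G"

sites_a = find_two_box_sites(promoter_a, "TTGACA", "TATAAT")
sites_b = find_two_box_sites(promoter_b, "TTGACA", "TATAAT")
shared = conserved_in_both(promoter_a, promoter_b, "TTGACA", "TATAAT")
```

Here `TTGACA` and `TATAAT` are the textbook E. coli σ70 consensus boxes, used only as a familiar example. The two toy promoters carry the same boxes with spacers of 17 and 16 nucleotides, which is the situation a fixed-spacing motif finder would miss.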
We demonstrate that our approach, which combines statistical and biological criteria, successfully predicts SFBSs. The method's versatility allows the recognition of other kinds of two-box regulatory sites.