Open Access Highly Accessed Methodology article

Simultaneous prediction of transcription factor binding sites in a group of prokaryotic genomes

Shaoqiang Zhang1,2, Shan Li1, Phuc T Pham1 and Zhengchang Su1*

Author Affiliations

1 Department of Bioinformatics and Genomics, Center for Bioinformatics Research, the University of North Carolina at Charlotte, Charlotte, NC 28223, USA

2 College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China

BMC Bioinformatics 2010, 11:397  doi:10.1186/1471-2105-11-397

Published: 23 July 2010

Background

Our current understanding of transcription factor binding sites (TFBSs) in sequenced prokaryotic genomes is very limited because no accurate and efficient computational method exists for predicting TFBSs at a genome scale. To change this situation, we recently developed a comparative genomics-based algorithm called GLECLUBS for de novo genome-wide prediction of TFBSs in a target genome. Although GLECLUBS predicts TFBSs in a target genome with rather high accuracy, it is still not efficient enough to be applied to all sequenced prokaryotic genomes.
Results

Here, we designed a new algorithm based on GLECLUBS, called extended GLECLUBS (eGLECLUBS), for simultaneous prediction of TFBSs in a group of related prokaryotic genomes. When tested with the same parameter settings on a group of γ-proteobacterial genomes including E. coli K12, a group of firmicutes genomes including B. subtilis, and a group of cyanobacterial genomes, eGLECLUBS predicts more than 82% of known TFBSs in extracted inter-operonic sequences in both E. coli K12 and B. subtilis. Because each genome in a group is treated equally, it is highly likely that similar prediction accuracy has been achieved for each genome in the group.
Conclusions

We have developed a new algorithm for genome-wide de novo prediction of TFBSs in a group of related prokaryotic genomes. The algorithm achieves the same level of accuracy and robustness as its predecessor GLECLUBS, but can process dozens of genomes at the same time.