Benchmark of BioWord and MEME motif discovery against E. coli transcription factor binding sites downloaded from the Prodoric database. To generate the motif discovery input data, each binding site was extended by 50 bp on each side using the adjacent E. coli genome sequence. BioWord results are from its greedy search algorithm. MEME searches were run on the San Diego Supercomputer Center (SDSC) MEME web service. Parameters for MEME and BioWord were made as similar as possible: motif length equal to the Prodoric site length, one site per sequence, search on the given strand only, and three reported motifs. In BioWord, the number of iterations was set to 100. For both methods, the motif shown is the one that best fits the Prodoric motif. The transcription factor (TF) and the length of its binding motif are given in the leftmost columns. Each block shows the number of sites (available in the database or reported by the method), the consensus logo and the information content (IC) of the motif. The rank of the best-fitting motif among the three reported motifs (based on E-value for MEME, information content for BioWord) is also indicated. All logos are on the same scale, with cell height corresponding to 2 bits of information. Input sequences for motif discovery and site sequences for all reported motifs can be found in Additional file 1.
Anzaldi et al. BMC Bioinformatics 2012 13:124 doi:10.1186/1471-2105-13-124
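The two computations described in the caption, extending each site by 50 bp of flanking sequence and scoring a motif by its information content, can be illustrated with a minimal Python sketch. The function names `expand_site` and `information_content` are hypothetical and are not taken from BioWord or MEME. The sketch assumes 0-based, end-exclusive coordinates, clips flanks at the ends of the sequence rather than wrapping around a circular genome, and computes IC as the sum over columns of 2 minus the Shannon entropy, with a uniform background and no small-sample correction. The exact formula used for the table is not stated in the caption, so values from this sketch may differ from those reported.

```python
import math
from collections import Counter


def expand_site(genome, start, end, flank=50):
    """Return the site genome[start:end] plus up to `flank` bp on each side.

    Assumption: 0-based, end-exclusive coordinates on a linear sequence;
    flanks are clipped at the sequence ends (no circular wrap-around).
    """
    left = max(0, start - flank)
    right = min(len(genome), end + flank)
    return genome[left:right]


def information_content(sites):
    """Total information content (bits) of equal-length aligned DNA sites.

    Assumption: uniform background, so each column contributes
    2 - H(column), where H is the Shannon entropy of the observed
    base frequencies. No small-sample correction is applied.
    """
    if not sites or len({len(s) for s in sites}) != 1:
        raise ValueError("sites must be a non-empty list of equal-length sequences")
    n = len(sites)
    total = 0.0
    for column in zip(*sites):
        counts = Counter(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += 2.0 - entropy
    return total


if __name__ == "__main__":
    # Toy sequence: a 6 bp site embedded between two 100 bp flanks.
    genome = "A" * 100 + "TTGACA" + "C" * 100
    window = expand_site(genome, 100, 106)
    print(len(window))
    print(information_content(["TTGACA", "TTGACA", "TTTACA"]))
```

Under these assumptions a fully conserved column contributes the maximum of 2 bits, which matches the cell height used for the logos, and a column with all four bases at equal frequency contributes 0 bits.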