Scoring functions for transcription factor binding site prediction
Institute of Computational Science, ETH, 8092 Zurich, Switzerland
BMC Bioinformatics 2005, 6:84. doi:10.1186/1471-2105-6-84. Published: 4 April 2005
Transcription factor binding site (TFBS) prediction is a difficult problem that requires a good scoring function to discriminate between real binding sites and background noise. Many scoring functions have been proposed in the literature, but their relative performance is hard to assess because they are implemented in different software tools that use different search methods and different TFBS representations.
Here we compare the performance of several scoring functions on both real and semi-simulated data sets in a common test environment. We have also developed two new scoring functions and included them in the comparison. All data sets are from the yeast (S. cerevisiae) genome.
Our new scoring function LLBG (least likely under the background model) performs best in this study: it achieves the best average rank for the correct motifs. Scoring functions based on positional bias perform quite poorly.
LLBG may provide an interesting alternative to current scoring functions for TFBS prediction.