Open Access, Highly Accessed, Research article

New scoring schema for finding motifs in DNA Sequences

Fatemeh Zare-Mirakabad1, Hayedeh Ahrabian2, Mehdei Sadeghi3,4, Abbas Nowzari-Dalini2 and Bahram Goliaei1

1 Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran

2 Center of Excellence in Biomathematics, School of Mathematics, Statistics, and Computer Science, University of Tehran, Tehran, Iran

3 National Institute of Genetic Engineering and Biotechnology, Tehran, Iran

4 School of Computer Science, Institute for Studies in Theoretical Physics and Mathematics (IPM), Tehran, Iran


BMC Bioinformatics 2009, 10:93 doi:10.1186/1471-2105-10-93

Published: 20 March 2009

Abstract

Background

Pattern discovery in DNA sequences is one of the most fundamental problems in molecular biology, with important applications in finding regulatory signals and transcription factor binding sites. An important task within this problem is to search for (or predict) known binding sites in a new DNA sequence. To do this, every subsequence of the given DNA sequence is scored with a scoring function, and the prediction is made by selecting the best-scoring subsequence. Most available tools for known binding site prediction assume that the base positions of a binding site are independent of one another. Recently, Tomovic and Oakeley investigated the statistical basis for claims of either dependence or independence, to determine whether such a claim is generally true, and presented a scoring function for binding site prediction based on the dependency between binding site base positions. Our primary objective is to investigate the scoring functions that can be used in known binding site prediction under the assumption of either dependence or independence between binding site base positions.
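
As an illustration of the scan-and-select procedure under the independence assumption, the following is a minimal sketch and not the scoring function of any particular tool. It builds a column-wise log-odds matrix from aligned known sites and returns the best-scoring window. The function names, the pseudocount of 1, and the uniform background frequency of 0.25 are assumptions made for this example.

```python
import math

BASES = "ACGT"

def build_log_odds(sites, pseudocount=1.0, background=0.25):
    """Build a per-column log-odds matrix from equal-length aligned sites.

    Each column is estimated on its own, i.e. positions are treated
    as independent (assumption of this sketch).
    """
    width = len(sites[0])
    matrix = []
    for i in range(width):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        matrix.append(
            {b: math.log2((counts[b] / total) / background) for b in BASES}
        )
    return matrix

def best_window(sequence, matrix):
    """Score every window of the sequence and return (start, score) of the best."""
    width = len(matrix)
    best_start, best_score = None, float("-inf")
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Independence: the window score is the sum of per-position scores.
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score

if __name__ == "__main__":
    known_sites = ["TATA", "TATA", "TATT", "TACA"]  # toy data, not from the paper
    start, score = best_window("GGCGTATAGGC", build_log_odds(known_sites))
    print(start, round(score, 3))
```

Because the window score is a plain sum over positions, this kind of model cannot reward or penalize particular combinations of bases at different positions, which is the limitation that dependency-based scoring functions aim to address.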

Results

We propose a new scoring function based on the dependency between all base positions in a binding site. This scoring function uses joint information content and mutual information as measures of dependency between positions in a transcription factor binding site (TFBS). Our method for modeling dependencies is a simple extension of position-independence methods. We evaluate the new scoring function on real data sets extracted from the JASPAR and TRANSFAC databases, and compare the results with those of two other well-known scoring functions.
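
To make the notion of mutual information between two positions concrete, here is a minimal sketch that computes the standard empirical (plug-in) estimate from a set of aligned sites. It illustrates the dependency measure only and is not the scoring function proposed in this article; the function name and the toy sites are assumptions made for this example.

```python
import math
from collections import Counter

def mutual_information(sites, i, j):
    """Empirical mutual information (in bits) between columns i and j.

    Uses observed frequencies with no pseudocounts or small-sample
    correction (assumption of this sketch).
    """
    n = len(sites)
    count_i = Counter(site[i] for site in sites)
    count_j = Counter(site[j] for site in sites)
    count_ij = Counter((site[i], site[j]) for site in sites)
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi

if __name__ == "__main__":
    toy_sites = ["AAT", "AAT", "CCT", "CCT"]  # toy data, not from the paper
    print(mutual_information(toy_sites, 0, 1))  # columns 0 and 1 always co-vary
    print(mutual_information(toy_sites, 0, 2))  # column 2 is constant
```

In the toy alignment, columns 0 and 1 determine each other completely, so their mutual information equals the entropy of either column, while a constant column carries no information about any other column.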

Conclusion

The results demonstrate that the new approach improves known binding site discovery and show that joint information content and mutual information provide a better and more general criterion for investigating the relationships between positions in the TFBS. Our scoring function is formulated using simple mathematical calculations. From applying our method to several biological data sets, it can be inferred that this method performs better than methods that do not consider dependencies.

