Open Access Methodology article

Some statistical properties of regulatory DNA sequences, and their use in predicting regulatory regions in the Drosophila genome: the fluffy-tail test

Irina Abnizova1*, Rene te Boekhorst2, Klaudia Walter1 and Walter R Gilks1

Author Affiliations

1 MRC Biostatistics Unit, Institute of Public Health, Robinson Way, Cambridge CB2 2SR, UK

2 Computer Science Department, University of Hertfordshire, College Lane, AL10 92BA, Hatfield Campus, UK

BMC Bioinformatics 2005, 6:109 doi:10.1186/1471-2105-6-109

Published: 27 April 2005

Abstract

Background

This paper addresses the problem of recognising DNA cis-regulatory modules that are located far from genes. Experimental procedures for this are slow and costly, and computational methods are difficult because such modules lack positional information relative to genes.

Results

We present a novel statistical method, the "fluffy-tail test", to recognise regulatory DNA. We exploit one of the basic informational properties of regulatory DNA, the abundance of over-represented transcription factor binding site (TFBS) motifs, although we do not look for specific TFBS motifs per se. Although over-representation of TFBS motifs in regulatory DNA has been exploited intensively by many algorithms, distinguishing regulatory DNA from other genomic DNA remains a difficult problem.
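
To make "over-represented" concrete, the sketch below compares each observed k-word count with the count expected if bases were drawn independently using the sequence's own base composition. This is an illustrative assumption only: the helper name `overrepresentation` and the i.i.d. background model are ours and are not the authors' method.

```python
from collections import Counter
from math import prod


def overrepresentation(seq: str, k: int) -> dict[str, float]:
    """Return observed/expected counts for every k-word present in seq.

    Hypothetical helper: the expected count assumes bases are drawn
    independently (i.i.d.) with the sequence's own base frequencies.
    """
    seq = seq.upper()
    n_words = len(seq) - k + 1
    if n_words <= 0:
        return {}
    base_freq = {b: c / len(seq) for b, c in Counter(seq).items()}
    observed = Counter(seq[i:i + k] for i in range(n_words))
    ratios = {}
    for word, obs in observed.items():
        # Linearity of expectation holds even though words overlap.
        expected = n_words * prod(base_freq[b] for b in word)
        ratios[word] = obs / expected
    return ratios


ratios = overrepresentation("GATAAGATAAGATAA", k=5)
print(round(ratios["GATAA"], 1))  # prints 31.6
```

A ratio well above 1 flags a word that occurs more often than base composition alone predicts; real analyses usually use richer background models (for example Markov chains) than this i.i.d. one.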

Conclusion

We show that, on the data used, our method is able to distinguish cis-regulatory modules by exploiting statistical differences between the probability distributions of similar words in regulatory and other DNA. Potential applications of our method include annotation of new genomic sequences and motif discovery.
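
The abstract does not define the fluffy-tail statistic itself, so the following is only a sketch under stated assumptions, not the authors' test: "similar words" are taken to be k-words within a fixed Hamming distance, the upper tail of the similar-word count distribution is summarised by its maximum, and the background is a mononucleotide shuffle of the same sequence. All function names and defaults (k=6, one mismatch, 30 shuffles) are hypothetical.

```python
import math
import random
from statistics import mean, stdev


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length words."""
    return sum(x != y for x, y in zip(a, b))


def similar_word_counts(seq: str, k: int, max_mismatch: int = 1) -> list[int]:
    """For each k-word in seq, count k-words within max_mismatch of it (itself included)."""
    words = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    return [sum(hamming(w, v) <= max_mismatch for v in words) for w in words]


def tail_zscore(seq: str, k: int = 6, max_mismatch: int = 1,
                n_shuffles: int = 30, seed: int = 0) -> float:
    """z-score of the largest similar-word count against shuffled copies of seq.

    Hypothetical statistic: the tail is summarised by the maximum count,
    and the null model shuffles the bases of the same sequence.
    """
    rng = random.Random(seed)
    observed = max(similar_word_counts(seq, k, max_mismatch))
    letters = list(seq)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        null.append(max(similar_word_counts("".join(letters), k, max_mismatch)))
    mu, sd = mean(null), stdev(null)
    if sd == 0:
        # Degenerate null distribution: report only the direction of the difference.
        return 0.0 if observed == mu else math.copysign(math.inf, observed - mu)
    return (observed - mu) / sd


rng = random.Random(1)
background = "".join(rng.choice("ACGT") for _ in range(80))
# Plant ten tandem copies of a 9-bp unit to mimic a cluster of similar words.
planted = background[:40] + "TGACTCAGG" * 10 + background[40:]
print(tail_zscore(planted))
```

The all-pairs comparison costs O(n²k) per sequence, which is adequate for short windows but would need indexing (for example hashing words and their one-mismatch neighbours) on genome-scale data.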