Open Access Research article

Statistical extraction of Drosophila cis-regulatory modules using exhaustive assessment of local word frequency

Anna G Nazina and Dmitri A Papatsenko

Department of Biology, New York University, New York, USA

BMC Bioinformatics 2003, 4:65 doi:10.1186/1471-2105-4-65

Published: 22 December 2003

Abstract

Background

Transcription regulatory regions in higher eukaryotes are often represented by cis-regulatory modules (CRMs), which are responsible for the formation of specific spatial and temporal gene expression patterns. These extended (~1 kb) regions are found far from coding sequences and therefore cannot be extracted from the genome on the basis of their position relative to the coding regions.

Results

To explore the feasibility of CRM extraction from a genome, we generated an original training set, containing annotated sequence data for most of the known developmental CRMs from Drosophila. Based on this set of experimental data, we developed a strategy for statistical extraction of cis-regulatory modules from the genome, using exhaustive analysis of local word frequency (LWF). To assess the performance of our analysis, we measured the correlation between predictions generated by the LWF algorithm and the distribution of conserved non-coding regions in a number of Drosophila developmental genes.
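To make the idea of local word frequency concrete, the following is a minimal sketch in Python, not the authors' LWF algorithm. It slides a window along a sequence, counts overlapping k-letter words inside each window, and records the count of the most frequent word as a crude per-window score. The function names, the word length `k`, the window size, and the scoring rule are all assumptions chosen for illustration; the abstract does not specify them.

```python
from collections import Counter


def local_word_counts(seq, k, start, window):
    """Count overlapping k-letter words inside seq[start:start + window]."""
    region = seq[start:start + window]
    return Counter(region[i:i + k] for i in range(len(region) - k + 1))


def repetitiveness_profile(seq, k=3, window=12, step=1):
    """Score each window by the count of its most frequent k-letter word.

    This scoring rule is a placeholder assumption, not the published
    LWF statistic.
    """
    if k < 1 or window < k:
        raise ValueError("window must be at least as long as the word length k")
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        counts = local_word_counts(seq, k, start, window)
        profile.append(max(counts.values()))
    return profile


if __name__ == "__main__":
    # Toy sequence: a mixed region followed by a highly repetitive one.
    toy = "ACGTACGTACGT" + "TTTTTTTTTTTT"
    print(repetitiveness_profile(toy, k=3, window=12, step=12))
```

In this toy input the second window, a homopolymer run, scores higher than the first because a single word dominates it. A real analysis would replace the placeholder score with a statistic calibrated on the training set.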

Conclusions

In most of the cases tested, we observed high correlation (up to 0.6–0.8, measured on the entire gene locus) between the two independent techniques. We discuss computational strategies available for extraction of Drosophila CRMs and possible extensions of these methods.
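As an illustration of how two per-position profiles along a gene locus might be compared, the sketch below computes a Pearson correlation coefficient between two equal-length score lists. The choice of Pearson correlation is an assumption, since the abstract does not name the correlation measure used. The function name `pearson` and the example values are hypothetical and are not data from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-window scores, for illustration only.
    prediction_scores = [1.0, 2.0, 3.0, 2.0, 1.0]
    conservation_scores = [0.5, 1.5, 2.5, 2.0, 0.0]
    print(round(pearson(prediction_scores, conservation_scores), 3))
```

Both profiles would need to be sampled on the same windows across the locus before they can be compared this way.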

