c-REDUCE: Incorporating sequence conservation to detect motifs that correlate with expression
* Corresponding author: Katerina Kechris katerina.kechris@ucdenver.edu
1 Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Denver, 4200 East Ninth Avenue, B-119, Denver, CO 80262, USA
2 Department of Biochemistry and Biophysics, UCSF, 1700 4th Street, San Francisco, CA 94143, USA
3 Center for Theoretical Biology, Peking University, Beijing 100871, PR China
BMC Bioinformatics 2008, 9:506 doi:10.1186/1471-2105-9-506
Published: 28 November 2008
Abstract
Background
Computational methods for characterizing novel transcription factor binding sites search for sequence patterns or "motifs" that appear repeatedly in genomic regions of interest. Correlation-based motif finding strategies identify motifs whose occurrence correlates with expression data, and they do not rely on promoter sequences from a pre-determined set of genes.
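To make the correlation-based idea concrete, the following is a minimal sketch, not the method of this paper: it counts occurrences of a candidate motif in each promoter and computes the Pearson correlation between those counts and per-gene expression values. The function names, the toy promoter sequences, and the expression values are all hypothetical and chosen only for illustration.

```python
from math import sqrt

def count_occurrences(motif, sequence):
    """Count (possibly overlapping) exact occurrences of motif in sequence."""
    k = len(motif)
    return sum(
        1 for i in range(len(sequence) - k + 1)
        if sequence[i:i + k] == motif
    )

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def motif_expression_correlation(motif, promoters, expression):
    """Correlate per-gene motif counts with per-gene expression values."""
    genes = list(expression)
    counts = [count_occurrences(motif, promoters[g]) for g in genes]
    values = [expression[g] for g in genes]
    return pearson(counts, values)

# Toy data (hypothetical): every gene contributes, not just a pre-selected set.
promoters = {
    "geneA": "TTACGCGTAACGCGTT",
    "geneB": "GGACGCGTCCATATAT",
    "geneC": "ATATATATGGCCTTAA",
    "geneD": "CCGGTTAACCGGTTAA",
}
expression = {"geneA": 2.1, "geneB": 1.0, "geneC": -0.2, "geneD": 0.1}

score = motif_expression_correlation("ACGCGT", promoters, expression)
```

In this toy example the motif ACGCGT occurs most often in the promoters of the most highly expressed genes, so its score is strongly positive. A real analysis would score many candidate motifs genome-wide and assess significance.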
Results
In this work, we describe a method for predicting motifs that combines the correlation-based strategy with phylogenetic footprinting, where motifs are identified by evaluating orthologous sequence regions from multiple species. Our method, c-REDUCE, can account for variability at a motif position inferred from evolutionary information. c-REDUCE has been tested on ChIP-chip data for yeast transcription factors and on gene expression data in Drosophila.
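As a rough illustration of how conservation can inform motif matching, the sketch below keeps a motif site in a reference species only if the aligned orthologous sequences differ from the motif at a small number of positions, and reports which positions vary. This is a simplified assumption-laden example and not the c-REDUCE algorithm itself: the function name `conserved_sites`, the `max_variable_positions` threshold, the gapless alignment, and the toy sequences are all hypothetical.

```python
def conserved_sites(motif, aligned_seqs, max_variable_positions=1):
    """Find motif sites in the reference (first) sequence that are conserved.

    aligned_seqs: equal-length, gapless aligned sequences; the first entry
    is the reference species (a simplifying assumption for this sketch).
    Returns a list of (start, variable_offsets) tuples, where variable_offsets
    are the motif positions at which any ortholog differs from the motif.
    """
    reference = aligned_seqs[0]
    if any(len(seq) != len(reference) for seq in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")
    k = len(motif)
    sites = []
    for start in range(len(reference) - k + 1):
        if reference[start:start + k] != motif:
            continue
        # Collect motif positions that vary in at least one ortholog.
        variable = {
            offset
            for seq in aligned_seqs[1:]
            for offset in range(k)
            if seq[start + offset] != motif[offset]
        }
        if len(variable) <= max_variable_positions:
            sites.append((start, sorted(variable)))
    return sites

# Toy alignment (hypothetical): reference first, then two orthologs.
alignment = [
    "TTACGCGTAAGGACGCGTCC",
    "TTACGAGTAAGGTTTTTTCC",
    "TAACGCGTATGGACGCGACC",
]

sites = conserved_sites("ACGCGT", alignment)
```

Here the first occurrence of ACGCGT is retained with one variable position, while the second is discarded because it is poorly conserved in one ortholog. Counts of such conserved sites could then replace raw motif counts in a correlation with expression.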
Conclusion
Our results indicate that using sequence conservation information together with correlation-based methods improves the identification of known motifs.