Flanking sequence context-dependent transcription factor binding in early Drosophila development
- Equal contributors
1 Computer Science Department, Harvey Mudd College, 301 Platt Boulevard, Claremont, CA 91711, USA
2 Biology Department, Harvey Mudd College, 301 Platt Boulevard, Claremont, CA 91711, USA
3 Department of Biological Sciences, Mount Holyoke College, South Hadley, MA 01075, USA
4 Department of Biology, Amherst College, Amherst, MA 01002, USA
5 Mathematics Department, Harvey Mudd College, 301 Platt Boulevard, Claremont, CA 91711, USA
6 Department of Mathematics, Amherst College, Amherst, MA 01002, USA
BMC Bioinformatics 2013, 14:298 doi:10.1186/1471-2105-14-298 Published: 4 October 2013
Gene expression in the Drosophila embryo is controlled by functional interactions between a large network of protein transcription factors (TFs) and specific sequences in DNA cis-regulatory modules (CRMs). The binding site sequences for any TF can be experimentally determined and represented in a position weight matrix (PWM). PWMs can then be used to predict the locations of TF binding sites in other regions of the genome, although this approach, as currently implemented, has limitations.
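As a minimal sketch of how a PWM is built from aligned binding sites and then used to predict sites in new sequence, the Python below converts a few toy aligned sites into a log2-odds matrix and scans a sequence for high-scoring windows. The site list, the pseudocount (0.25 per base), the uniform background, the forward-strand-only scan, and the score threshold are all illustrative assumptions, not values or methods taken from this study.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=0.25, background=None):
    """Build a log2-odds PWM (one dict per position) from equal-length aligned sites."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumption: uniform background
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for i in range(length):
        counts = {b: pseudocount for b in BASES}  # pseudocounts avoid log(0)
        for s in sites:
            counts[s[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background[b]) for b in BASES})
    return pwm

def score(pwm, kmer):
    """Sum the per-position log-odds for one window."""
    return sum(col[base] for col, base in zip(pwm, kmer))

def scan(pwm, sequence, threshold):
    """Return (start, score) for every forward-strand window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        s = score(pwm, sequence[start:start + width])
        if s >= threshold:
            hits.append((start, s))
    return hits

# Hypothetical toy aligned sites and query sequence.
sites = ["TAATCC", "TAATCC", "TAATCT", "TAATCC"]
pwm = build_pwm(sites)
hits = scan(pwm, "GGTAATCCGGTAATCTGG", threshold=5.0)
for start, s in hits:
    print(start, round(s, 2))
```

Both embedded sites are recovered, and the window matching the most common base at every position scores highest. Note that the matrix scores each position independently and looks only at the core width, so sequence just outside the window has no influence on the prediction.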
In this proof-of-principle study, we analyze 127 CRMs and focus on four TFs that control transcription of target genes along the anterior-posterior axis of the embryo early in development. For all four TFs, some degree of conserved flanking sequence extends beyond the predicted binding regions. These conserved flanking sequences may enhance the specificity of TF binding: their abundance is greatly diminished when we examine only predicted high-affinity binding sites.
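One common way to quantify conservation in flanking sequence is per-column information content across windows that extend past the predicted core. The sketch below is an illustration of that idea under stated assumptions, not the procedure used in this study: the `(sequence, site_start)` record format, the 4-base core, the 2-base flanks, and the toy sequences are hypothetical, and information content is computed against a uniform background with no small-sample correction (so it is inflated when there are few sites).

```python
import math
from collections import Counter

def extract_windows(records, core_width, flank):
    """Cut core + flanking windows from hypothetical (sequence, site_start) pairs.

    Sites too close to either end of their sequence are skipped.
    """
    windows = []
    for sequence, start in records:
        lo, hi = start - flank, start + core_width + flank
        if lo >= 0 and hi <= len(sequence):
            windows.append(sequence[lo:hi])
    return windows

def column_information(windows):
    """Per-column information content in bits vs a uniform background (max 2 bits).

    No small-sample correction is applied.
    """
    ic = []
    for column in zip(*windows):
        n = len(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in Counter(column).values())
        ic.append(2.0 - entropy)
    return ic

# Hypothetical predicted sites: each sequence carries a TAAT core starting at index 4.
records = [("AAGCTAATATAA", 4), ("CCGCTAATATCC", 4), ("TTACTAATCGTT", 4)]
flank, core_width = 2, 4
windows = extract_windows(records, core_width, flank)
ic = column_information(windows)
flank_ic = ic[:flank] + ic[flank + core_width:]  # columns outside the core only
print([round(x, 2) for x in ic])
print("mean flank IC:", round(sum(flank_ic) / len(flank_ic), 2))
```

Running the same calculation separately on all predicted sites and on a high-affinity subset, then comparing the flank columns, is one way to express the comparison described in the text; with real data a small-sample correction or a genomic background model would be advisable.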
Expanding PWMs to include sequence context-dependence will increase their information content and facilitate more efficient functional identification and dissection of CRMs.
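A simple way to picture a context-aware PWM is to append flanking columns to the core matrix, so that two candidate sites with identical cores can still receive different scores. The sketch below assumes hypothetical training windows (2 flanking bases, a 4-base core, 2 flanking bases), a uniform background, and a 0.25 pseudocount; treating the middle columns of the extended matrix as the "core-only" PWM is a simplification for illustration, not the authors' model.

```python
import math

BASES = "ACGT"

def log_odds_matrix(windows, pseudocount=0.25):
    """Log2-odds matrix vs a uniform background from equal-length aligned windows."""
    matrix = []
    for column in zip(*windows):
        total = len(column) + 4 * pseudocount
        matrix.append({b: math.log2(((column.count(b) + pseudocount) / total) / 0.25)
                       for b in BASES})
    return matrix

def score(matrix, kmer):
    """Sum per-position log-odds for a window of the matrix's width."""
    return sum(col[base] for col, base in zip(matrix, kmer))

# Hypothetical training windows: flank (2) + core (4) + flank (2).
windows = ["GCTAATAT", "GCTAATAT", "GCTAATCT", "ACTAATAT"]
extended = log_odds_matrix(windows)
core_only = extended[2:6]  # drop the flanking columns

site_a = "GCTAATAT"  # TAAT core with the conserved flanks
site_b = "TTTAATGG"  # same TAAT core, unconserved flanks

core_a, core_b = score(core_only, site_a[2:6]), score(core_only, site_b[2:6])
full_a, full_b = score(extended, site_a), score(extended, site_b)
print("core-only:", round(core_a, 2), round(core_b, 2))
print("with flanks:", round(full_a, 2), round(full_b, 2))
```

The core-only matrix cannot tell the two candidates apart, while the extended matrix ranks the site with conserved flanks higher. A richer model might also capture dependencies between positions, which an additive matrix like this one does not.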