Erroneous attribution of relevant transcription factor binding sites despite successful prediction of cis-regulatory modules
1 Department of Biochemistry, State University of New York at Buffalo, Buffalo, NY 14214 USA
2 Department of Biological Sciences, State University of New York at Buffalo, Buffalo, NY 14260 USA
3 New York State Center of Excellence in Bioinformatics and the Life Sciences, Buffalo, NY 14203 USA
4 Department of Molecular and Cellular Biology, Roswell Park Cancer Institute, Buffalo, NY 14263 USA
5 Center for Human Genome Variation, Duke University, Durham, NC 27708, USA
BMC Genomics 2011, 12:578 doi:10.1186/1471-2164-12-578
Published: 25 November 2011
Cis-regulatory modules are bound by transcription factors to regulate gene expression. Characterizing these DNA sequences is central to understanding gene regulatory networks and gaining insight into mechanisms of transcriptional regulation, but genome-scale regulatory module discovery remains a challenge. One popular approach is to scan the genome for clusters of transcription factor binding sites, especially those conserved in related species. When such approaches are successful, it is typically assumed that the activity of the modules is mediated by the identified binding sites and their cognate transcription factors. However, the validity of this assumption is often not assessed.
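To make the cluster-scanning idea concrete, the following is a minimal sketch, not the method used in this study. It assumes binding sites have already been predicted and reduced to integer genomic coordinates, and it reports regions where at least a chosen number of sites fall within a fixed window. The function name `find_site_clusters` and the parameters `window` and `min_sites` are hypothetical, and filtering by conservation in related species is not modeled.

```python
from typing import List, Tuple


def find_site_clusters(
    positions: List[int], window: int, min_sites: int
) -> List[Tuple[int, int, int]]:
    """Return merged (start, end, n_sites) regions in which at least
    `min_sites` predicted binding sites lie within `window` bp.

    Illustrative only: names and thresholds are assumptions, not taken
    from the study.
    """
    sites = sorted(positions)
    clusters: List[Tuple[int, int]] = []  # (first_index, last_index) into sites
    right = 0
    for left in range(len(sites)):
        if right < left:
            right = left
        # Extend the right edge while the next site is still inside the window.
        while right + 1 < len(sites) and sites[right + 1] - sites[left] <= window:
            right += 1
        if right - left + 1 >= min_sites:
            if clusters and left <= clusters[-1][1]:
                # This window overlaps the previous cluster, so merge them.
                clusters[-1] = (clusters[-1][0], max(clusters[-1][1], right))
            else:
                clusters.append((left, right))
    return [(sites[a], sites[b], b - a + 1) for a, b in clusters]


# Hypothetical site coordinates on one chromosome arm.
example_sites = [100, 150, 220, 5000, 9000, 9100, 9150, 9400]
candidate_modules = find_site_clusters(example_sites, window=500, min_sites=3)
```

A scan of this kind only nominates candidate regions. It says nothing about whether the clustered sites actually mediate the activity of a module, which is the assumption examined here.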
We successfully predicted five new cis-regulatory modules by combining binding site identification with sequence conservation and compared these to unsuccessful predictions from a related approach that did not use sequence conservation. Despite the greatly improved predictive success, the positive set showed degrees of sequence and binding site conservation similar to those of the negative set. We explored the reasons for this by mutagenizing putative binding sites in three cis-regulatory modules. A large proportion of the tested sites had little or no demonstrable role in mediating regulatory element activity. Examination of loss-of-function mutants also showed that some transcription factors presumed to bind the modules are not required for their function.
Our results raise important questions about interpreting regulatory module predictions obtained by finding clusters of conserved binding sites. Attribution of function to these sites and their cognate transcription factors may be incorrect even when modules are successfully identified. Our study underscores the importance of empirical validation of computational results even when these results are in line with expectation.