Open Access Research article

Integrating bioinformatic resources to predict transcription factors interacting with cis-sequences conserved in co-regulated genes

Christian Dubos1,2,3, Zsolt Kelemen1,2, Alvaro Sebastian4, Lorenz Bülow5, Gunnar Huep6, Wenjia Xu1, Damaris Grain1, Fabien Salsac1, Cecile Brousse1, Loïc Lepiniec1, Bernd Weisshaar6, Bruno Contreras-Moreira4,7 and Reinhard Hehl5*

  • * Corresponding author: Reinhard Hehl r.hehl@tu-bs.de

  • † Equal contributors

Author Affiliations

1 INRA, Institut Jean-Pierre Bourgin, Saclay Plant Sciences, UMR1318, RD10, F-78026, Versailles, France

2 AgroParisTech, Institut Jean-Pierre Bourgin, Saclay Plant Sciences, UMR1318, RD10, F-78026, Versailles, France

3 Current address: Biochimie et Physiologie Moleculaire des Plantes, UMR 5004, INRA/CNRS/SupAgro-M/UM2, 34060 Montpellier Cedex 1, France

4 Estación Experimental de Aula Dei/CSIC, Av. Montañana 1.005, 50059 Zaragoza, Spain

5 Institut für Genetik, Technische Universität Braunschweig, Spielmannstr. 7, 38106 Braunschweig, Germany

6 Department of Biology, Bielefeld University, Universitaetsstrasse 25, 33615 Bielefeld, Germany

7 Fundación ARAID, calle María de Luna 11, 50018 Zaragoza, Spain


BMC Genomics 2014, 15:317  doi:10.1186/1471-2164-15-317

Published: 28 April 2014

Abstract

Background

Using motif detection programs, it is fairly straightforward to identify conserved cis-sequences in promoters of co-regulated genes. In contrast, identifying the transcription factors (TFs) that interact with these cis-sequences is much more involved. To facilitate this, we explore the possibility of combining several bioinformatic and experimental approaches for TF identification. The approach starts with the selection of co-regulated gene sets and leads first to the prediction and then to the experimental validation of TFs interacting with cis-sequences conserved in the promoters of these co-regulated genes.

Results

Using the PathoPlant database, 32 groups of up-regulated genes were identified from microarray data on drought-responsive gene expression in Arabidopsis thaliana. Applying the binding site estimation suite of tools (BEST) to the corresponding promoters revealed 179 conserved sequence motifs. Using the STAMP web server, 49 sequence motifs were classified into 7 motif families for which similarities with known cis-regulatory sequences were identified. All motifs were subjected to a footprintDB analysis to predict interacting DNA binding domains from plant TF families. Predictions were confirmed by using a yeast one-hybrid approach to select interacting TFs belonging to the predicted TF families. TF-DNA interactions were further validated experimentally in yeast and with a Physcomitrella patens transient expression system, leading to the discovery of several novel TF-DNA interactions.

Conclusions

The present work demonstrates the successful integration of several bioinformatic resources with experimental approaches to predict and validate TFs interacting with conserved sequence motifs in co-regulated genes.

Keywords:
Databases; Arabidopsis thaliana; Physcomitrella patens; Yeast one-hybrid; Microarray; Transcription factor; cis-element