Open Access Highly Accessed Research article

Computational identification of transcription factor binding sites by functional analysis of sets of genes sharing overrepresented upstream motifs

Davide Corà1, Ferdinando Di Cunto2, Paolo Provero3, Lorenzo Silengo2 and Michele Caselle1*

Author Affiliations

1 Dipartimento di Fisica Teorica, Università di Torino, and INFN, sezione di Torino, Italy

2 Dipartimento di Genetica, Biologia e Biochimica, Università di Torino, Torino, Italy

3 Fondazione per le Biotecnologie, Torino, Italy

BMC Bioinformatics 2004, 5:57  doi:10.1186/1471-2105-5-57

Published: 11 May 2004

Abstract

Background

Transcriptional regulation is a key mechanism in the functioning of the cell, and is mostly effected through transcription factors binding to specific recognition motifs located upstream of the coding region of the regulated gene. The computational identification of such motifs is made easier by the fact that they often appear several times in the upstream region of the regulated genes, so that the number of occurrences of relevant motifs is often significantly larger than expected by pure chance.
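As an illustration of what "larger than expected by pure chance" can mean in practice, the sketch below counts the occurrences of a motif in an upstream sequence and computes the probability of seeing at least that many under a simple binomial null model. The binomial model, the per-position background probability, and the function names are assumptions made for this example; the abstract does not specify the statistical model used in the paper.

```python
from math import exp, lgamma, log, log1p


def count_occurrences(sequence: str, motif: str) -> int:
    """Count occurrences of `motif` in `sequence`, allowing overlaps."""
    k = len(motif)
    return sum(
        1 for i in range(len(sequence) - k + 1) if sequence[i:i + k] == motif
    )


def binomial_upper_tail(n_obs: int, n_positions: int, p: float) -> float:
    """P(X >= n_obs) for X ~ Binomial(n_positions, p), with 0 < p < 1.

    Terms are evaluated in log space so that long upstream regions
    do not overflow the binomial coefficients.
    """
    total = 0.0
    for j in range(n_obs, n_positions + 1):
        log_term = (
            lgamma(n_positions + 1)
            - lgamma(j + 1)
            - lgamma(n_positions - j + 1)
            + j * log(p)
            + (n_positions - j) * log1p(-p)
        )
        total += exp(log_term)
    return min(total, 1.0)


# Hypothetical upstream sequence and motif, for illustration only.
upstream = "TTACGCGTAAGGACGCGTTTCCAACGCGTAT"
motif = "ACGCGT"
n_obs = count_occurrences(upstream, motif)
n_positions = len(upstream) - len(motif) + 1
# Assumed background: all four bases equally likely at each position.
p_background = 0.25 ** len(motif)
p_value = binomial_upper_tail(n_obs, n_positions, p_background)
```

A small `p_value` flags the motif as overrepresented in that upstream region under the assumed background; a realistic analysis would estimate the background frequency from the genome rather than assume uniform base composition.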

Results

To exploit this fact, we construct sets of genes characterized by the statistical overrepresentation of a certain motif in their upstream regions. Then we study the functional characterization of these sets by analyzing their annotation to Gene Ontology terms. For the sets showing a statistically significant specific functional characterization, we conjecture that the upstream motif characterizing the set is a binding site for a transcription factor involved in the regulation of the genes in the set.
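The functional characterization step can be sketched as an enrichment test: for each Gene Ontology term, ask how likely it is that a random gene set of the same size would contain at least as many annotated genes as the motif-defined set does. The example below uses a hypergeometric upper tail for this. The choice of test, the function names, and the toy gene and term labels are assumptions for illustration; the abstract does not state which test the authors use, and no multiple-testing correction is applied here.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n genes are drawn without replacement from N genes,
    of which K are annotated to the term of interest."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def go_enrichment(gene_set, annotations, universe):
    """Return {term: p-value} for a gene set.

    `annotations` maps each term to the set of genes annotated to it;
    `universe` is the set of all genes considered. Both are hypothetical
    inputs that a real pipeline would build from annotation files.
    """
    universe = set(universe)
    genes = set(gene_set) & universe
    p_values = {}
    for term, annotated in annotations.items():
        annotated = set(annotated) & universe
        overlap = len(genes & annotated)
        p_values[term] = hypergeom_upper_tail(
            overlap, len(genes), len(annotated), len(universe)
        )
    return p_values


# Toy data, for illustration only.
universe = {"g1", "g2", "g3", "g4"}
motif_gene_set = {"g1", "g2"}  # genes sharing an overrepresented motif
annotations = {"term_A": {"g1", "g2"}, "term_B": {"g3"}}
p_values = go_enrichment(motif_gene_set, annotations, universe)
```

Terms with small p-values suggest that the motif-defined set has a specific functional characterization. Because many motifs and many terms are tested, a real analysis would need to correct for multiple testing before drawing conclusions.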

Conclusions

The method we propose is able to identify many known binding sites in S. cerevisiae, as well as new candidate targets of regulation by known transcription factors. Its application to less well-studied organisms is likely to be valuable in the exploration of their regulatory interaction networks.