This article is part of the supplement: Problems and tools in the systems biology of the neuronal cell

Open Access Review

Computational framework for the prediction of transcription factor binding sites by multiple data integration

Alberto Ambesi-Impiombato1,2, Mukesh Bansal1,3, Pietro Liò4 and Diego di Bernardo1,3*

Author Affiliations

1 TIGEM, Telethon Institute of Genetics and Medicine, Naples, Italy

2 Department of Neuroscience, University of Medicine "Federico II", Naples, Italy

3 SEMM, European School of Molecular Medicine, Naples, Italy

4 Computer Laboratory, Cambridge University, Cambridge, UK

BMC Neuroscience 2006, 7(Suppl 1):S8  doi:10.1186/1471-2202-7-S1-S8

Published: 30 October 2006


Control of gene expression is essential to the establishment and maintenance of all cell types, and its dysregulation is involved in the pathogenesis of several diseases. Accurate computational predictions of transcription factor regulation may thus help in understanding complex diseases, including mental disorders in which dysregulation of neural gene expression is thought to play a key role. However, the biological mechanisms underlying the regulation of gene expression are not completely understood, and predictions made with bioinformatics tools typically have poor specificity.

We developed a bioinformatics workflow for the prediction of transcription factor binding sites from several independent datasets. We show the advantages of integrating information based on evolutionary conservation and gene expression when tackling the problem of binding site prediction. Consistent results were obtained on a large simulated dataset consisting of 13,050 in silico promoter sequences, on a set of 161 human gene promoters for which binding sites are known, and on a smaller set of promoters of Myc target genes.

Our computational framework for binding site prediction can integrate multiple sources of data, and its performance was tested on different datasets. Our results show that integrating information from multiple data sources, such as the genomic sequence of gene promoters, conservation across multiple species, and gene expression data, improves the accuracy of computational predictions.