Identifying targets of multiple co-regulating transcription factors from expression time-series by Bayesian model comparison
- Equal contributors
1 The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
2 Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland
3 Department of Information and Computer Science, Aalto University, Helsinki, Finland
4 Department of Computer Science and Sheffield Institute for Translational Neuroscience, University of Sheffield, Sheffield, UK
BMC Systems Biology 2012, 6:53 doi:10.1186/1752-0509-6-53Published: 30 May 2012
Complete transcriptional regulatory network inference is a huge challenge because of the complexity of the network and sparsity of available data. One approach to make it more manageable is to focus on the inference of context-specific networks involving a few interacting transcription factors (TFs) and all of their target genes.
We present a computational framework for Bayesian statistical inference of target genes of multiple interacting TFs from high-throughput gene expression time-series data. We use ordinary differential equation models that describe transcription of target genes taking into account combinatorial regulation. The method consists of a training and a prediction phase. During the training phase we infer the unobserved TF protein concentrations on a subnetwork of approximately known regulatory structure. During the prediction phase we apply Bayesian model selection on a genome-wide scale and score all alternative regulatory structures for each target gene. We use our methodology to identify targets of five TFs regulating Drosophila melanogaster mesoderm development. We find that confident predicted links between TFs and targets are significantly enriched for supporting ChIP-chip binding events and annotated TF-gene interations. Our method statistically significantly outperforms existing alternatives.
Our results show that it is possible to infer regulatory links between multiple interacting TFs and their target genes even from a single relatively short time series and in presence of unmodelled confounders and unreliable prior knowledge on training network connectivity. Introducing data from several different experimental perturbations significantly increases the accuracy.