From gene expression to gene regulatory networks in Arabidopsis thaliana
1 School of Computing, University of Leeds, Leeds, LS2 9JT, UK
2 Institute of Integrative and Comparative Biology, University of Leeds, Leeds, LS2 9JT, UK
3 Institute of Molecular and Cellular Biology, University of Leeds, Leeds, LS2 9JT, UK
4 Current address: School of Biological and Biomedical Sciences, Durham University, Durham, UK
BMC Systems Biology 2009, 3:85 doi:10.1186/1752-0509-3-85
Published: 3 September 2009
Elucidating networks from a compendium of gene expression data is one of the goals of systems biology and can be a valuable source of new hypotheses for experimental researchers. For Arabidopsis, several thousand microarrays are available, forming a rich resource from which to learn.
A novel Bayesian network-based algorithm for inferring gene regulatory networks from gene expression data is introduced and applied to learn parts of the transcriptomic network in Arabidopsis thaliana from a large number (thousands) of separate microarray experiments. Starting from an initial set of genes of interest, the network is grown iteratively: at each step, the gene that gives the 'best' learned network structure is selected from a second, defined set of genes and added to the model. The gene set used for iterative growth can be as large as the entire genome. A number of networks are inferred and analysed; these show (i) agreement with the current literature on the circadian clock network, (ii) the ability to model other networks, and (iii) that the learned network hypotheses can suggest new roles for poorly characterized genes, through the addition of relevant genes from an unconstrained list of over 15,000 possible genes. To demonstrate the last point, the method is used to suggest that particular GATA transcription factors are regulators of photosynthetic genes. Additionally, the performance of the method in recovering a known network from different amounts of synthetically generated data is evaluated.
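The iterative growth step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the scoring function is a stand-in (the BIC gain from giving a gene a single linear-Gaussian parent), and the names `bic_gain` and `grow_network`, the column-index representation of genes, and the `n_add` parameter are all hypothetical. The abstract does not specify how the 'best' network structure is scored, so any Bayesian network structure score could be substituted here.

```python
import numpy as np


def bic_gain(child, parent):
    """Stand-in score (an assumption, not the paper's score): BIC improvement
    from giving `child` one linear-Gaussian parent instead of none."""
    n = len(child)
    r = np.corrcoef(child, parent)[0, 1]
    # The residual variance shrinks by a factor (1 - r^2), which raises the
    # log-likelihood by -(n/2) * log(1 - r^2); the extra regression
    # coefficient costs (1/2) * log(n) under BIC.
    return -0.5 * n * np.log(max(1.0 - r * r, 1e-12)) - 0.5 * np.log(n)


def grow_network(data, seed, candidates, n_add):
    """Greedily grow a network from `seed` by adding, at each step, the
    candidate gene with the highest stand-in score.

    data       : array of shape (n_samples, n_genes), one column per gene
    seed       : column indices of the initial genes of interest
    candidates : column indices of the genes that may be added
    n_add      : number of growth steps to perform
    """
    network = list(seed)
    pool = [g for g in candidates if g not in network]
    links = []  # (gene already in network, gene added) pairs
    for _ in range(min(n_add, len(pool))):
        best = None
        for g in pool:
            for p in network:
                gain = bic_gain(data[:, g], data[:, p])
                if best is None or gain > best[0]:
                    best = (gain, p, g)
        _, p, g = best
        network.append(g)
        pool.remove(g)
        links.append((p, g))
    return network, links
```

Called as `grow_network(data, seed=[0], candidates=range(1, data.shape[1]), n_add=5)`, the function returns the grown gene list and the link chosen at each step. Because this stand-in score is symmetric in its two arguments, the returned pairs record an association rather than a direction of regulation; a full Bayesian network learner would re-learn the structure over all genes in the model after each addition.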
Our results show that plausible regulatory networks can be learned from such gene expression data alone. This work demonstrates that network hypotheses can be generated from existing gene expression data for use by experimental biologists.