Research article (Open Access)

Predicting gene regulatory networks of soybean nodulation from RNA-Seq transcriptome data

Mingzhu Zhu (1,5), Jeremy L Dahmen (3,4), Gary Stacey (3,4) and Jianlin Cheng (1,2,3)*

Author Affiliations

1 Department of Computer Science, University of Missouri, Columbia, MO 65211, USA

2 Informatics Institute, University of Missouri, Columbia, MO, USA

3 C.S. Bond Life Science Center, University of Missouri, Columbia, MO, USA

4 Divisions of Plant Science and Biochemistry, Columbia, MO, USA

5 Current address: Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA

BMC Bioinformatics 2013, 14:278  doi:10.1186/1471-2105-14-278

Published: 22 September 2013



High-throughput RNA sequencing (RNA-Seq) is a revolutionary technique for studying the transcriptome of a cell under various conditions at a systems level. Although RNA-Seq has been widely used to generate experimental data in the last few years, few computational methods are available to analyze this huge amount of transcription data. In particular, computational methods for constructing gene regulatory networks from RNA-Seq expression data of hundreds or even thousands of genes are lacking and urgently needed.


We developed an automated bioinformatics method to predict gene regulatory networks from the quantitative expression values of differentially expressed genes, based on RNA-Seq transcriptome data of a cell in different stages and conditions and integrating transcriptional, genomic and gene function data. We applied the method to RNA-Seq transcriptome data generated for soybean root hair cells at three different developmental stages of nodulation after rhizobium infection. The method predicted a soybean nodulation-related gene regulatory network consisting of 10 regulatory modules common to all three stages, and 24, 49 and 70 modules for the first, second and third stages, respectively. Each module contains both a group of co-expressed genes and several transcription factors that collaboratively control their expression under different conditions. Eight of the 10 common regulatory modules were supported by at least two kinds of validation, such as independent DNA binding motif analysis, gene function enrichment tests, and previous experimental data in the literature.
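To make the idea of a regulatory module concrete, the following is a minimal sketch that groups genes with similar expression profiles and then ranks candidate transcription factors by how closely they track each group. It is not the authors' method: the greedy correlation grouping, the 0.9 threshold, the function names, and the toy expression values are all assumptions chosen for illustration, and the sketch ignores the genomic and gene function data that the method integrates.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def coexpression_modules(expr, threshold=0.9):
    """Greedy grouping (illustrative only): a gene joins the first module
    whose seed gene it correlates with at or above the threshold."""
    modules = []  # each module is a list of gene names; the first is the seed
    for gene, profile in expr.items():
        for module in modules:
            if pearson(profile, expr[module[0]]) >= threshold:
                module.append(gene)
                break
        else:
            modules.append([gene])
    return modules


def mean_profile(genes, expr):
    """Average expression of a set of genes in each condition."""
    columns = zip(*(expr[g] for g in genes))
    return [sum(col) / len(genes) for col in columns]


def rank_regulators(module, expr, tf_expr):
    """Order transcription factors by |correlation| with the module's mean profile."""
    centre = mean_profile(module, expr)
    scores = {tf: pearson(profile, centre) for tf, profile in tf_expr.items()}
    return sorted(scores, key=lambda tf: abs(scores[tf]), reverse=True)


# Hypothetical expression values across four conditions (not data from the study).
expr = {
    "geneA": [1.0, 2.0, 4.0, 8.0],
    "geneB": [2.0, 4.1, 8.2, 15.9],
    "geneC": [9.0, 6.0, 3.0, 1.0],
    "geneD": [8.5, 6.2, 2.9, 1.2],
}
tf_expr = {
    "TF1": [0.5, 1.0, 2.0, 4.0],
    "TF2": [3.0, 3.1, 2.9, 3.0],
}

modules = coexpression_modules(expr)
for module in modules:
    print(module, "candidate regulators:", rank_regulators(module, expr, tf_expr))
```

Ranking by absolute correlation lets a transcription factor score highly whether its profile rises or falls with the module, so the sketch does not distinguish activators from repressors.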


We developed a computational method to reliably reconstruct gene regulatory networks from RNA-Seq transcriptome data. The method can generate valuable hypotheses for interpreting biological data and for designing biological experiments such as ChIP-Seq, RNA interference, and yeast two-hybrid experiments.