This article is part of the supplement: The ISIBM International Joint Conferences on Bioinformatics, Systems Biology and Intelligent Computing (IJCBS)
A novel gene network inference algorithm using predictive minimum description length approach
- Equal contributors
1 School of Computing, University of Southern Mississippi, MS 39402, USA
2 Environmental Laboratory, U.S. Army Engineer Research and Development Center, 3909 Halls Ferry Road, Vicksburg, MS 39180, USA
3 SpecPro Inc., 3909 Halls Ferry Road, Vicksburg, MS 39180, USA
BMC Systems Biology 2010, 4(Suppl 1):S7 doi:10.1186/1752-0509-4-S1-S7Published: 28 May 2010
Reverse engineering of gene regulatory networks using information theory models has received much attention due to its simplicity, low computational cost, and capability of inferring large networks. One of the major problems with information theory models is to determine the threshold which defines the regulatory relationships between genes. The minimum description length (MDL) principle has been implemented to overcome this problem. The description length of the MDL principle is the sum of model length and data encoding length. A user-specified fine tuning parameter is used as control mechanism between model and data encoding, but it is difficult to find the optimal parameter. In this work, we proposed a new inference algorithm which incorporated mutual information (MI), conditional mutual information (CMI) and predictive minimum description length (PMDL) principle to infer gene regulatory networks from DNA microarray data. In this algorithm, the information theoretic quantities MI and CMI determine the regulatory relationships between genes and the PMDL principle method attempts to determine the best MI threshold without the need of a user-specified fine tuning parameter.
The performance of the proposed algorithm was evaluated using both synthetic time series data sets and a biological time series data set for the yeast Saccharomyces cerevisiae. The benchmark quantities precision and recall were used as performance measures. The results show that the proposed algorithm produced less false edges and significantly improved the precision, as compared to the existing algorithm. For further analysis the performance of the algorithms was observed over different sizes of data.
We have proposed a new algorithm that implements the PMDL principle for inferring gene regulatory networks from time series DNA microarray data that eliminates the need of a fine tuning parameter. The evaluation results obtained from both synthetic and actual biological data sets show that the PMDL principle is effective in determining the MI threshold and the developed algorithm improves precision of gene regulatory network inference. Based on the sensitivity analysis of all tested cases, an optimal CMI threshold value has been identified. Finally it was observed that the performance of the algorithms saturates at a certain threshold of data size.