School of Computing, University of Southern Mississippi, Hattiesburg, MS 39406, USA

Environmental Services, SpecPro Inc, San Antonio, TX 78216, USA

Environmental Laboratory, U.S. Army Engineer Research and Development Center, Vicksburg, MS 39180, USA

Abstract

Background

The Dynamic Bayesian Network (DBN) is an approach widely used for reconstructing gene regulatory networks (GRNs) from time-series microarray data. Its performance in network reconstruction depends on the structure learning algorithm. REVEAL (REVerse Engineering ALgorithm) is one of the algorithms implemented for learning DBN structure and has been used to reconstruct GRNs. However, the two-stage temporal Bayes network (2TBN) structure of a DBN, which specifies the correlation between time slices, cannot be obtained with the score metrics used in REVEAL.

Methods

In this paper, we study a more sophisticated score function for DBNs, first proposed by Nir Friedman for learning the structure of both the initial and transition networks of stationary DBNs, but not yet used for reconstruction of GRNs. We implemented Friedman's Bayesian Information Criterion (BIC) score function, modified the K2 algorithm to learn DBN structure with this score function, and tested the algorithm's performance for GRN reconstruction on synthetic time-series gene expression data generated by GeneNetWeaver and on real yeast benchmark experiment data.

Results

We implemented an algorithm for DBN structure learning with Friedman's score function, tested it on reconstruction of both synthetic networks and real yeast networks, and compared it with REVEAL in the absence or presence of the preprocessed network generated by Zou and Conzen's algorithm. By introducing a stationary correlation between two consecutive time slices, Friedman's score function achieved higher precision and recall than the naive REVEAL algorithm.

Conclusions

Friedman's score metrics for DBNs can be used to reconstruct transition networks and have great potential to improve the accuracy of gene regulatory network structure prediction with time-series gene expression datasets.

Background

High-content technologies such as DNA microarrays can provide a system-scale overview of how genes interact with each other in a network context. This network is called a gene regulatory network (GRN) and can be defined as a mixed graph over a set of nodes (corresponding to genes or gene activities) with directed or undirected edges (representing causal interactions or associations between gene activities).

Dynamic Bayesian networks (DBNs) are belief networks that represent the stochastic process of a set of random variables over time. The hidden Markov model (HMM) and the Kalman filter can be considered the simplest DBNs. However, Kalman filters can only handle unimodal posterior distributions and linear models, whereas the parameterization of an HMM grows exponentially with the number of state variables.

A commonly used structure learning algorithm is based on REVEAL (REVerse Engineering ALgorithm).

In the following sections, we first provide an introduction to DBNs and existing DBN algorithms for reconstruction of GRNs. We then present an implementation of Friedman's DBN algorithm. Finally, we apply the algorithm to synthetic datasets and a real yeast benchmark dataset, and compare its performance to the commonly used Murphy's DBN algorithm.

Methods

Dynamic Bayesian networks

A DBN is a probabilistic network defined as a pair (B_0, B_→) representing the joint probability distribution over all possible time series of the variables X_1, X_2, ..., X_n. It consists of a prior Bayesian network B_0 = (G_0, Θ_0) and a transition Bayesian network B_→ = (G_→, Θ_→). In time slice 0, the parents of X_i[0] are those specified in the prior network B_0; the parents of X_i[t+1] are those specified in the transition network B_→. The structure of a two-stage temporal Bayes network (2TBN) is shown in the accompanying figure. One assumption is the first-order Markov property, P(X[t+1] | X[0], ..., X[t]) = P(X[t+1] | X[t]). The other assumption is that the process is stationary, i.e. the transition probability P(X[t+1] | X[t]) is independent of t.

The basic building block of DBN

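To make the 2TBN structure concrete, the following minimal sketch (our own illustration; the function and variable names are not from the paper) shows how a 2TBN's intra-slice and inter-slice edge sets unroll into a full DBN over T time slices:

```python
# Sketch: unrolling a 2TBN into a DBN over T time slices.
# Nodes are encoded as (variable index, time slice); names are illustrative.
def unroll_2tbn(intra_edges, inter_edges, n_vars, T):
    """Return the edge list of the DBN unrolled over T slices.

    intra_edges: edges (i, j) within a single time slice
    inter_edges: edges (i, j) meaning X_i[t] -> X_j[t+1]
    """
    edges = []
    # Replicate within-slice structure in every slice (stationarity).
    for t in range(T):
        edges += [((i, t), (j, t)) for i, j in intra_edges]
    # Replicate transition structure between every pair of consecutive slices.
    for t in range(T - 1):
        edges += [((i, t), (j, t + 1)) for i, j in inter_edges]
    return edges
```

Because the process is assumed stationary, the same transition edges are copied between every pair of consecutive slices.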

Bayesian information criterion for DBN

Given a Bayesian network with structure G and a dataset D, the posterior probability of the structure is

P(G|D) = P(D|G)P(G) / P(D)

where the denominator is simply a normalizing factor. Thus, we define the Bayesian score as:

score(G : D) = log P(D|G) + log P(G)

where

P(D|G) = ∫ P(D|θ_G, G) P(θ_G|G) dθ_G

where P(D|θ_G, G) is the likelihood of the data given the network 〈G, θ_G〉 and P(θ_G|G) is our prior over its parameters.

Under a Dirichlet distribution prior for all parameters in the network, when the sample size N → ∞,

log P(D|G) = l(θ̂_G : D) − (log N / 2) Dim(G) + O(1)

where l(θ̂_G : D) is the log-likelihood of D at the maximum-likelihood parameters θ̂_G, and Dim(G) is the number of independent parameters in G.

This approximation is called the Bayesian information criterion (BIC). Friedman et al. derived the BIC for dynamic Bayesian networks, which is briefly described below.
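As a concrete illustration (our own sketch, not the paper's Matlab implementation), the BIC contribution of a single node family in a discrete Bayesian network is its maximized log-likelihood minus the (log N / 2) · Dim penalty:

```python
import math
from collections import Counter

def family_bic(data, child, parents, arity):
    """BIC contribution of one node family: maximized log-likelihood
    minus (log N / 2) times the number of free parameters.

    data:   list of complete observations (dict: variable -> state)
    arity:  dict mapping each variable to its number of states
    """
    N = len(data)
    joint = Counter()          # counts of (parent configuration, child value)
    parent_counts = Counter()  # counts of each parent configuration
    for row in data:
        cfg = tuple(row[p] for p in parents)
        joint[(cfg, row[child])] += 1
        parent_counts[cfg] += 1
    # Maximum-likelihood parameters are the empirical conditional frequencies.
    loglik = sum(n * math.log(n / parent_counts[cfg])
                 for (cfg, _), n in joint.items())
    # (arity - 1) free parameters per parent configuration.
    n_params = (arity[child] - 1) * math.prod(arity[p] for p in parents)
    return loglik - 0.5 * math.log(N) * n_params
```

The full network score is the sum of these family scores, which is what makes greedy per-node search feasible.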

It is assumed that the dataset D consists of N_seq complete observation sequences, where the l-th sequence has length N_l and specifies values for the variables x_i[0], ..., x_i[N_l]. The prior network B_0 is evaluated on the first slice of each of the N_seq sequences, and the transition network B_→ on the N_→ = Σ_l N_l transitions between consecutive slices.

We use the following notation: x^l_i[t] denotes the value of variable X_i at time t in the l-th sequence.

The likelihood function decomposes as:

L(B_0, B_→ : D) = L(B_0 : D) · L(B_→ : D)

and the log-likelihood is given by

log L(B_0, B_→ : D) = log L(B_0 : D) + log L(B_→ : D)

This decomposition implies that B_0 is independent of B_→, so the two networks can be learned separately, and the BIC score decomposes as BIC = BIC_0 + BIC_→,

where

BIC_0 = log L(B_0 : D) − (log N_seq / 2) Dim(B_0)

BIC_→ = log L(B_→ : D) − (log N_→ / 2) Dim(B_→)

Learning network structure

Under Friedman's score metrics, the maximized score can be exploited by any Bayesian structure learning procedure, such as hill-climbing search. In this paper, we modify the K2 algorithm and adapt it to learn DBN structure, as described in the accompanying figure. The structure of B_0 is learned independently of that of B_→. We find the maximum score function and add a correlation between factors in consecutive time slices, or in the same time slice, if the relationship increases the score function. We stop adding parents to a node when no single additional parent can increase the score.

Modified K2 algorithm for use in Friedman's algorithm on structure learning for dynamic Bayesian network (DBN)

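The greedy parent-addition loop described above can be sketched as follows (a minimal illustration under our own naming, not the paper's code; `score` stands in for the BIC family score):

```python
def greedy_parent_search(nodes, candidates, score, max_parents=3):
    """K2-style greedy structure search (illustrative sketch).

    For each node, start from an empty parent set and repeatedly add the
    candidate parent that most increases score(node, parents); stop when no
    single addition improves the score or max_parents is reached.
    """
    structure = {}
    for node in nodes:
        parents = []
        best = score(node, parents)
        while len(parents) < max_parents:
            gains = [(score(node, parents + [c]), c)
                     for c in candidates[node] if c not in parents]
            if not gains:
                break
            new_score, new_parent = max(gains)
            if new_score <= best:   # no single parent improves the score
                break
            parents.append(new_parent)
            best = new_score
        structure[node] = parents
    return structure
```

Because the BIC score decomposes over node families, each node's parent set can be optimized independently in this way.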

Existing approaches for comparison

For convenience of the performance analysis in the next section, we briefly describe Murphy's and Zou's previous work here. The widely used DBN implementation developed by Murphy and Mian (called Murphy's DBN hereafter) is based on REVEAL. For each node with n candidate regulators there are 2^n possible parent sets, which can be arranged in a lattice. The problem is to find the parent set with the highest score in the lattice. The approach taken by REVEAL starts from the bottom of the lattice and evaluates the score at all points in successive levels until a point is found with a score of 1.0. Zou and Conzen [17] proposed a method to generate a preprocessed network of potential regulators by biological interpretation of time-course microarray data. It assumes that a gene with earlier initial up-regulation is a potential regulator of genes with later initial up-regulation. This preprocessed network is used to narrow down the search space for Murphy's DBN algorithm, which otherwise requires excessive time to find a parent combination for each node, even when imposing a maximum number of parents, if the network dimension is large.
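The up-regulation-onset heuristic behind the preprocessed network can be illustrated as follows (our own sketch of the idea, with an assumed threshold parameter; not Zou and Conzen's actual implementation):

```python
def potential_regulators(expr, threshold=1.0):
    """Sketch of the Zou&Conzen-style preprocessing idea: a gene whose
    expression first rises above `threshold` earlier is treated as a
    potential regulator of genes whose initial up-regulation comes later.

    expr: dict mapping gene name -> list of expression values over time.
    Returns a dict mapping each gene to its candidate regulators.
    """
    def first_up(series):
        for t, v in enumerate(series):
            if v >= threshold:
                return t
        return len(series)  # never up-regulated

    onset = {g: first_up(s) for g, s in expr.items()}
    return {g: [r for r in expr if r != g and onset[r] < onset[g]]
            for g in expr}
```

The resulting candidate sets restrict the parent search space, which is what makes the downstream DBN search tractable for larger networks.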

Results and discussion

Friedman's algorithm described in the Methods section was implemented based on Murphy's BNT toolbox (Bayes Net Toolbox for Matlab). We tested four DBN algorithms on reconstruction of synthetic networks: (1) Zou's preprocessed networks consisting of potential regulators derived by biological interpretation of time-course microarray data (Zou&Conzen), (2) Murphy's DBN, implemented in conjunction with the preprocessed networks (Kevin Murphy + Zou&Conzen), (3) Friedman's algorithm (Nir Friedman), and (4) Friedman's algorithm combined with the preprocessed networks (Friedman + Zou&Conzen).

Precision (P) and recall (R) were used as the metrics for performance comparison. Here, recall is defined as R = C_e / T_e and precision as P = C_e / (C_e + F_e), where C_e is the number of correctly inferred edges, T_e the total number of edges in the true network, and F_e the number of falsely inferred edges.
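These metrics are straightforward to compute from edge sets; a minimal sketch (our own helper, not part of the paper's toolbox):

```python
def precision_recall(true_edges, inferred_edges):
    """Precision and recall of an inferred edge set against the true network.

    C_e = correctly inferred edges, P = C_e / |inferred|, R = C_e / |true|.
    Edges are (regulator, target) pairs, so direction matters.
    """
    true_edges, inferred_edges = set(true_edges), set(inferred_edges)
    c_e = len(true_edges & inferred_edges)
    p = c_e / len(inferred_edges) if inferred_edges else 0.0
    r = c_e / len(true_edges) if true_edges else 0.0
    return c_e, p, r
```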

Synthetic data

The synthetic datasets and networks were generated using GeneNetWeaver from the DREAM (Dialogue for Reverse Engineering Assessments and Methods) projects.

An example of the 10-gene transition network reconstructed using Friedman's algorithm is shown in Figure

(a) A transition network of 10 genes learned by Friedman score metrics

**(a) A transition network of 10 genes learned by Friedman score metrics**. The left column shows the genes at time t, and the right column the corresponding gene at the next time slice. (b) The gene regulatory network converted from (a).

The second example is the GRNs with 50 genes as shown in Figure

The 50-gene network reconstructed by different algorithms with dashed lines indicating false positive edges, and solid lines true positive edges

**The 50-gene network reconstructed by different algorithms with dashed lines indicating false positive edges, and solid lines true positive edges**. (a) The true network, (b)

The GRN reconstructed by the modified Friedman method (Method 3) without a preprocessed network is a dense network, as given in Figure

Comparison of performance between different structure learning algorithms using synthetic dataset


A complete performance comparison of the four algorithms in terms of precision and recall is given in Figure

Comparison of performance between different structure learning algorithms using the synthetic dataset (C_e: correctly inferred edges; P: precision; R: recall)

| Network Size | Method | C_e | P | R |
|---|---|---|---|---|
| 10 | Nir Friedman | 5 | 0.50 | 0.29 |
| 10 | Nir Friedman + Zou&Conzen | 3 | 0.60 | 0.27 |
| 10 | Kevin Murphy + Zou&Conzen | 3 | 0.30 | 0.18 |
| 10 | Zou&Conzen | 6 | 0.38 | 0.04 |
| 20 | Nir Friedman | 7 | 0.15 | 0.17 |
| 20 | Nir Friedman + Zou&Conzen | 3 | 0.12 | 0.08 |
| 20 | Kevin Murphy + Zou&Conzen | 3 | 0.10 | 0.08 |
| 20 | Zou&Conzen | 9 | 0.09 | 0.23 |
| 50 | Nir Friedman | 38 | 0.23 | 0.36 |
| 50 | Nir Friedman + Zou&Conzen | 6 | 0.09 | 0.06 |
| 50 | Kevin Murphy + Zou&Conzen | 8 | 0.12 | 0.07 |
| 50 | Zou&Conzen | 14 | 0.04 | 0.14 |
| 100 | Nir Friedman | 38 | 0.10 | 0.22 |
| 100 | Nir Friedman + Zou&Conzen | 25 | 0.14 | 0.14 |
| 100 | Kevin Murphy + Zou&Conzen | 8 | 0.07 | 0.05 |
| 100 | Zou&Conzen | 48 | 0.02 | 0.26 |

Real yeast benchmark dataset

We also investigated the performance of Friedman's DBN algorithm in reconstructing GRNs from real biological datasets. We tested it on the benchmark yeast time-series dataset from Spellman's experiment.

The real yeast network reconstructed by different algorithms (dashed lines indicating false positive edges, and solid lines true positive edges)

**The real yeast network reconstructed by different algorithms (dashed lines indicating false positive edges, and solid lines true positive edges)**. (a) Murphy + Zou algorithm (b) Probabilistic Boolean Network (c) Friedman's score metrics.

Comparison of performance between different structure learning algorithms using the yeast benchmark dataset (C_e: correctly inferred edges; P: precision; R: recall)

| Network Size | Method | C_e | P | R |
|---|---|---|---|---|
| 13 | Nir Friedman | 19 | 0.76 | 0.19 |
| 13 | Kevin Murphy + Zou&Conzen | 11 | 0.69 | 0.11 |
| 13 | Probabilistic Boolean Network | 20 | 0.71 | 0.20 |

Conclusions

In this study, we implemented Friedman's score metrics for DBNs in our algorithm and applied it to reconstructing GRNs using both synthetic time-series gene expression data and a real yeast benchmark dataset. The algorithm captures the correlation between consecutive time slices in both the score function and the learning procedure; as a result, Friedman's score metrics gives higher precision and recall than the naive REVEAL algorithm, in both the absence and presence of the preprocessed network generated by Zou and Conzen's algorithm. This also suggests that in real biological processes, time-lagged regulation may better describe the true regulation between genes. Based on the testing results, the Friedman's score metrics we implemented has great potential to improve the accuracy of structure prediction for GRN reconstruction with complete synthetic time-series data.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

HL implemented the algorithms, conducted network inference and performance comparison. HL and CZ drafted the paper. CZ, PG and EJP supervised this work and revised the paper. NW participated in algorithm development and network inference. All authors read and approved the final manuscript.

Acknowledgements

This work was supported by the US Army Corps of Engineers Environmental Quality Program under contract # W912HZ-08-2-0011 and the NSF EPSCoR project "Modeling and Simulation of Complex Systems" (NSF #EPS-0903787). Permission was granted by the Chief of Engineers to publish this information.