Gippsland School of Information Technology, Monash University, Melbourne, Australia

Department of Microbiology, Monash University, Melbourne, Australia

Chemical Engineering Department, Indian Institute of Technology, Bombay, India

Abstract

Background

The dynamic Bayesian network (DBN) is among the mainstream approaches for modeling various biological networks, including the gene regulatory network (GRN). Most current methods for learning DBNs employ either a local search such as hill-climbing, or a stochastic global optimization meta-framework such as a genetic algorithm or simulated annealing, which are only able to locate sub-optimal solutions. Further, current DBN applications have essentially been limited to small networks.

Results

To overcome the above difficulties, we introduce here a deterministic global optimization based DBN approach for reverse engineering genetic networks from time course gene expression data. For such DBN models that consist only of inter time slice arcs, we show that there exists a polynomial time algorithm for learning the globally optimal network structure. The proposed approach, named GlobalMIT^{+}, employs the recently proposed information theoretic scoring metric named mutual information test (MIT). GlobalMIT^{+} is able to learn high-order time delayed genetic interactions, which are common to most biological systems. Evaluation of the approach using both synthetic and real data sets, including a 733 cyanobacterial gene expression data set, shows significantly improved performance over other techniques.

Conclusions

Our studies demonstrate that deterministic global optimization approaches can infer large scale genetic networks.

Background

Gene regulatory network (GRN) reverse-engineering has been a subject of intensive study within the systems biology community during the last decade. Of the dozens of methods available currently, most can be broadly classified into three main-stream categories, namely

In this paper, we focus on the BN paradigm, which is indeed among the first approaches for reverse engineering GRNs, through the seminal work of Friedman et al.

In the DBN framework, the task of GRN reverse engineering amounts to learning the optimal DBN structure from gene expression data. After the structure has been reconstructed, a set of conditional probability tables can be easily learned, using methods such as maximum likelihood, to describe the system dynamics. In this paper, we focus on the more challenging problem of structure learning. Most recent works have employed either

This paper extends the first-order Markov DBN model that we considered earlier to the high-order setting, yielding GlobalMIT^{+}. Our contribution is three-fold: (i) we prove the polynomial time complexity of GlobalMIT^{+} for higher order DBNs; (ii) we give a complete characterization of the time complexity of GlobalMIT^{+}, and propose a variant GlobalMIT^{*} for large scale networks that balances optimality, order coverage and computational tractability; (iii) we evaluate the high-order GlobalMIT^{+/*} on several real and synthetic datasets, and for the first time apply a DBN-based GRN reverse engineering algorithm on a large scale network of 733 cyanobacterial genes, in a very reasonable run-time on a regular desktop PC. We show that the learned networks exhibit a scale-free structure, the common topology of many known biochemical networks, with hubs whose significantly enriched functions correspond to major cellular processes.

Methods

Preliminaries

We first briefly review DBN models. Let X = {X_1,…,X_n} be a set of random variables, where X_i represents the expression level of gene i, with X_i[t] denoting its value at time point t. A time course data set consists of observations X[1],…,X[N] of these variables at N consecutive time points.

DBN models consist of two parts: the prior network, which specifies the distribution over the initial slice X[1], and the transition network, which specifies the dependencies between consecutive time slices.

(a) prior network; (b) First-order Markov transition network; (c) 2nd-order Markov transition network with only inter time slice edges.


A critical limitation of the first-order DBN for modeling GRNs is that it assumes every genetic interaction has a uniform time lag of 1 time unit, i.e., all edges go from slice t to slice t+1. In a d-th order Markov DBN, a variable X[t] may instead depend on any of X[t−1],…,X[t−d], allowing time lags of up to d units. For example, in Figure 1(c), X_2 is regulated by two parents, namely X_3 with a one-time-unit lag, and X_1 with a two-time-unit lag.

The MIT scoring metric

In this section, we first review the MIT scoring metric for learning BNs and then show how it can be adapted to the DBN case. The most popular scores for learning DBNs are essentially those adapted from the static BN literature, namely BIC/MDL, BDe and, more recently, MIT. Let {r_1,…,r_n} be the numbers of discrete states corresponding to our set of RVs X = {X_1,…,X_n}, let Pa_i = {X_{i1},…,X_{is_i}} be the set of parents of X_i in the network, with {r_{i1},…,r_{is_i}} discrete states, and let s_i = |Pa_i|. The MIT score is defined as:

$$S_{MIT}(G:D) = \sum_{i=1,\ \mathbf{Pa}_i \neq \emptyset}^{n} \Big( 2N \cdot I(X_i, \mathbf{Pa}_i) - \sum_{j=1}^{s_i} \chi_{\alpha, l_{ij}} \Big)$$

where I(X_i, Pa_i) is the mutual information between X_i and its parents as estimated from the data D, and χ_{α,l_ij} is the chi-square quantile at confidence level α with l_ij degrees of freedom:

$$l_{ij} = \begin{cases} (r_i - 1)(r_{ij} - 1)\prod_{k=1}^{j-1} r_{ik}, & j = 2,\ldots,s_i \\ (r_i - 1)(r_{i1} - 1), & j = 1 \end{cases}$$

where the indices {i1,…,is_i} enumerate the variables of Pa_i in decreasing order of their numbers of states, with the first variable having the greatest number of states, the second variable having the second largest number of states, and so on.

To make sense of this criterion, first note that maximizing the first term in the MIT score, 2N·I(X_i, Pa_i), favours parent sets that are maximally informative about X_i. The second term penalizes over-complex parent sets: if the hypothesis that X_i and X_j are conditionally independent given the remaining parents in Pa_i is true, the statistic 2N·I(X_i, X_j | Pa_i) approximates a χ² distribution with (r_i−1)(r_j−1)q_i degrees of freedom, where q_i is the total number of joint states of the conditioning set (q_i = 1 if the conditioning set is empty). An edge is thus worth retaining only when its contribution to the mutual information of X_i given all the other variables in Pa_i is statistically significant at confidence level α.

We next show how MIT can be adapted to the case of high-order DBN learning, by carefully addressing the issue of data alignment. The mutual information is now calculated between a parent set and its child at different time lags. At any time t, let Pa_i = {X_{i1}[t−δ_{i1}],…,X_{is_i}[t−δ_{is_i}]} be the parent set of X_i[t], where {δ_{i1},…,δ_{is_i}} are the actual regulation orders (time lags) corresponding to each parent. In this work, since we only consider DBNs with inter time slice edges, 1 ≤ δ_ij ≤ d. We define I_s as a time-delayed mutual information operator, which automatically shifts the target variable as well as all of its parents to the correct alignment.

The number of effective observations N_e is therefore N_e = Σ_{i=1}^{c}(N_i − d), where c is the number of time series and the N_i's are the lengths of the time series. The MIT score for DBN is calculated as:

$$S_{MIT}(G:D) = \sum_{i=1,\ \mathbf{Pa}_i \neq \emptyset}^{n} \Big( 2N_e \cdot I_s(X_i, \mathbf{Pa}_i) - \sum_{j=1}^{s_i} \chi_{\alpha, l_{ij}} \Big)$$

To make this clear, we demonstrate the process of data alignment through the simple DBN example given in Figure 1(c). For node X_2, when I_s(·) operates, it shifts the target node X_2 forward by two units in time, while the parent X_1 is shifted zero units and the parent X_3 is shifted one unit, as shown in Figure 2. Here N_e = N_1 − 2 if only the first sequence is used, or N_e = N_1 + N_2 + N_3 − 2×3 if all 3 sequences are used for learning.

**Data alignment for node X_2 in the DBN in Figure 1(c).** Shaded cells denote unused observations for the calculation of I_s(X_2, Pa_2).
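As an illustration of this alignment step, the following Python sketch (our own, not part of the GlobalMIT^{+} toolbox; the function names and the simple plug-in maximum-likelihood MI estimator are assumptions of the sketch) computes a time-delayed mutual information I_s for discrete sequences:

```python
import numpy as np
from collections import Counter

def mutual_info(x, y):
    """Plug-in estimate of the mutual information (in nats) between two
    discrete sequences of equal length."""
    n = len(x)
    cx, cy, cxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), nab in cxy.items():
        pab = nab / n
        mi += pab * np.log(pab / ((cx[a] / n) * (cy[b] / n)))
    return mi

def delayed_mi(child, parents, lags, d):
    """Time-delayed MI I_s: align the child at time t with each parent at
    time t - lag. `child` and each entry of `parents` are discrete sequences
    of length N; `lags` holds the regulation order delta_ij of each parent
    (1 <= lag <= d). The first d observations of the child are discarded,
    so the effective sample size is N_e = N - d."""
    n = len(child)
    target = tuple(child[d:])                         # child shifted forward by d
    aligned = [tuple(p[d - lag: n - lag]) for p, lag in zip(parents, lags)]
    parent_states = list(zip(*aligned))               # joint parent state per time step
    return mutual_info(parent_states, target)
```

For a parent that regulates the child with a one-unit lag, scoring the correct lag recovers (almost) the full entropy of the parent, while a wrong lag yields near-zero MI.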

Shared and exchanged information in time-delayed MI

The proposed algorithm uses the time-delayed mutual information to give a directional sense in dynamical systems. As a measure for capturing system dynamics, the time-delayed MI contains both the exchanged information, which is useful, and the shared information, which is not. Schreiber proposed the transfer entropy precisely to exclude this shared information.

However, we note that the transfer entropy requires the estimation of very high-dimensional joint distributions, which is impractical for the short time series typical of gene expression data.

Proposed approaches

This section presents our GlobalMIT^{+} algorithm for learning the globally optimal structure of a d-th order DBN under the MIT scoring metric; the first-order algorithm GlobalMIT corresponds to GlobalMIT^{+} with d=1. Our development of GlobalMIT^{+} makes use of the same set of assumptions as proposed by Dojer:

Assumption 1

(acyclicity) Examination of the graph acyclicity is not required.

This assumption is valid for DBNs with no intra time slice edges. For this class of DBN, as the edges are only directed forward in time, acyclicity is automatically satisfied. The biological implication of this assumption is that we may not be able to detect instantaneous interactions. As stated previously, the majority of genetic interactions are time-delayed; however, if the sampling gap is large, some quick interactions may appear instantaneous. The effect of this constraint is that, if gene X_1 regulates gene X_2 almost instantly, their mutual information I(X_1, X_2) will likely be maximized when their expression profiles are in synchrony, i.e., when neither sequence is shifted. With Assumption 1 in place, we instead have to consider the two time-delayed mutual information values I_s(X_1, X_2) and I_s(X_2, X_1) (since I_s is asymmetric). If these values are significantly weaker than I(X_1, X_2), then the interaction between genes X_1 and X_2 may go undetected. However, when the signal is smooth and sampled at short time steps, we found that shifting an expression profile by just one time unit does not often cause a large reduction in the MI value. This is because smooth time series have high auto-correlation at short lags, and thus instantaneous interactions may still be captured by DBN models with only inter time slice edges. The algorithmic implication of Assumption 1 becomes clear when we consider Assumption 2 below:

Assumption 2

(additivity) s(G:D) = Σ_{i=1}^{n} s(X_i, Pa_i : D), i.e., the network score decomposes into a sum of local scores, one for each variable and its parent set.

To simplify notation, we write s(Pa_i) for s(X_i, Pa_i : D). In a d-th order DBN with only inter time slice edges, Pa_i can be an arbitrary subset of the dn candidate parents {X_1[t−1],…,X_n[t−d]}, giving a search space of size 2^{dn} per variable. In order to further reduce the search space, we rely on the special structure of the scoring metric, as follows:

Assumption 3

(splitting) s(Pa_i) = v(Pa_i) + u(Pa_i) for some non-negative functions v and u.

Assumption 4

(uniformity) u(Pa_i) depends only on the cardinality of the parent set, i.e., |Pa_i| = |Pa_i′| implies u(Pa_i) = u(Pa_i′).

Assumption 3 requires the scoring function to decompose into two components: an "error" term v and a complexity term u, the latter depending only on the cardinality of Pa_i (Assumption 4), i.e., the network gets more complex as more variables are added to the parent sets. However, in its original form the MIT scoring metric, having higher scores for better networks, does not abide by these assumptions. We overcome this by casting the problem as a minimization problem (similar to Dojer), where lower-scored networks are better. We consider a variant of MIT as follows:

$$s_{MIT}(\mathbf{Pa}_i) = v_{MIT}(\mathbf{Pa}_i) + u_{MIT}(\mathbf{Pa}_i) \qquad (1)$$

where

$$v_{MIT}(\mathbf{Pa}_i) = 2N_e \cdot I_s(X_i, \mathbf{X}^d) - 2N_e \cdot I_s(X_i, \mathbf{Pa}_i), \qquad u_{MIT}(\mathbf{Pa}_i) = \sum_{j=1}^{s_i} \chi_{\alpha, l_{ij}}$$

and X^d = {X_1[t−1],…,X_n[t−d]} denotes the full set of candidate parents of X_i.

Roughly speaking, v_MIT measures the "error" of representing the joint distribution of X_i and its parents, while u_MIT measures the complexity of this representation. We make the following propositions:

Proposition 1

S_MIT maximization is equivalent to s_MIT minimization.

Proof

This is obvious, since for each variable X_i, s_MIT differs from the negated local S_MIT score only by the term 2N_e·I_s(X_i, X^d), which is a constant. □

Proposition 2

v_MIT and u_MIT satisfy Assumption 3.

Proof

v_MIT ≥ 0 since, of all possible parent sets Pa_i, the full set X^d has the maximum (shifted) mutual information with X_i. And since the support of the chi-square distribution is ℝ^{+}, i.e., χ_{α,·} ≥ 0, u_MIT ≥ 0 as well. □

While we note that u_MIT does not satisfy Assumption 4 in general, for applications where all the variables have the same number of states it can be shown to satisfy this assumption. Within the context of GRN modeling from microarray data, this generally holds true, since it is popular practice to discretize the expression data of all genes to, e.g., 3 states corresponding to high, low and baseline expression values.
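For instance, a quantile-based 3-state discretization, one common convention, can be sketched as follows (the function name and the tercile cut-offs are our illustrative choices, not prescribed by the paper):

```python
import numpy as np

def discretize_3state(expr, low_q=1/3, high_q=2/3):
    """Quantile-discretize one gene's expression profile into 3 states:
    0 = low, 1 = baseline, 2 = high. Cut-offs are the empirical terciles."""
    lo, hi = np.quantile(expr, [low_q, high_q])
    # np.digitize maps x < lo -> 0, lo <= x < hi -> 1, x >= hi -> 2
    return np.digitize(expr, [lo, hi])
```

Applying the same per-gene discretization to every profile yields the variable-uniformity setting (k = 3) assumed below.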

Assumption 5

(variable uniformity) All variables in **X** have the same number of discrete states k.

Proposition 3

Under the assumption of variable uniformity, u_MIT satisfies Assumption 4.

Proof

It can be seen that if |Pa_i| = |Pa_i′| = s_i then, since all variables have the same number of states k, the degrees of freedom l_ij = (k−1)²k^{j−1} coincide for both sets, and hence u_MIT(Pa_i) = u_MIT(Pa_i′). □

Since u_MIT(Pa_i) is the same for all parent sets of the same cardinality, we can write u_MIT(|Pa_i|) in place of u_MIT(Pa_i). With Assumptions 1-5 satisfied, we can employ the following Algorithm 1, named GlobalMIT^{+}, to find the globally optimal DBN under MIT, i.e., the one with the minimal s_MIT score.

Theorem 1

Under Assumptions 1-5, GlobalMIT^{+} applied to each variable in **X** finds a globally optimal d-th order DBN under the MIT scoring metric.

Algorithm 1 GlobalMIT^{+}: Optimal d-th order DBN with MIT

· Pa_i := ∅

· **for** p = 1, 2, 3,… **do**

· If u_MIT(p) ≥ s_MIT(Pa_i) then return Pa_i; Stop.

· P := argmin_{|P′|=p} s_MIT(P′)

· If s_MIT(P) < s_MIT(Pa_i) then Pa_i := P.

· **end for**

Proof

The key point here is that once a parent set grows to a certain extent, its complexity alone surpasses the total score of a previously found sub-optimal parent set. In fact, all the remaining potential parent sets P omitted by the algorithm have a total score higher than the current best score, i.e., s_MIT(P) ≥ u_MIT(|P|) ≥ s_MIT(Pa_i), where Pa_i is the last sub-optimal parent set found. □
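The search-and-stop logic of Algorithm 1 can be sketched as follows. This is a schematic re-implementation, not the toolbox code: `s_mit` and `u_mit` are assumed to be user-supplied callables returning, respectively, the variant MIT score of a parent set and the complexity penalty for a given cardinality.

```python
from itertools import combinations

def global_mit_parent_search(candidates, s_mit, u_mit):
    """Exhaustively score parent sets of growing cardinality p, stopping as
    soon as the complexity term u_mit(p) alone meets or exceeds the best
    total score found so far (so larger sets can never win)."""
    best, best_score = frozenset(), s_mit(frozenset())
    p = 1
    while p <= len(candidates) and u_mit(p) < best_score:
        for combo in combinations(candidates, p):
            score = s_mit(frozenset(combo))
            if score < best_score:
                best, best_score = frozenset(combo), score
        p += 1
    return best, best_score
```

With a toy score in which parent 'a' reduces the error by 6, 'b' by 3, 'c' not at all, and each added parent costs 2, the search returns {'a','b'} and never enumerates the 3-element sets, since u_mit(3)=6 already exceeds the best score of 5.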

We note that the terms 2N_e·I_s(X_i, X^d) in the s_MIT score in (1) are all constant and do not affect the outcome of our optimization problem. Knowing their exact value is, however, necessary for the stopping criterion in Algorithm 1, and also for determining its complexity bound, as will be shown in Section “Complexity analysis”. Calculating I_s(X_i, X^d) exactly requires estimating a joint distribution over nd+1 variables, which takes time of order O(k^{nd+1}). However, for our purpose, since the only requirement on v_MIT is that it be non-negative, it is sufficient to use an upper bound of I_s(X_i, X^d):

namely I_s(X_i, X^d) ≤ H_s(X_i), where H_s(X_i) is the entropy of X_i estimated from the aligned data. A looser, data-independent alternative is the maximum value of H_s(X_i), that is, log k.

Using these bounds, we obtain the following more practical versions of s_MIT:

$$s'_{MIT}(\mathbf{Pa}_i) = 2N_e H_s(X_i) - 2N_e I_s(X_i, \mathbf{Pa}_i) + u_{MIT}(\mathbf{Pa}_i)$$

$$s''_{MIT}(\mathbf{Pa}_i) = 2N_e \log k - 2N_e I_s(X_i, \mathbf{Pa}_i) + u_{MIT}(\mathbf{Pa}_i)$$

It is straightforward to show that Algorithm 1 and Theorem 1 remain valid when s′_MIT or s″_MIT is used in place of s_MIT.

Complexity analysis

Theorem 2

GlobalMIT^{+} admits a worst-case time complexity that is polynomial in the number of variables n.

Proof

Our aim is to find a number p^* such that the algorithm never needs to consider parent sets of cardinality p^* and over, i.e., such that u_MIT(p^*) alone exceeds the score of the empty parent set. In the worst case, our algorithm will have to examine all the possible parent sets of cardinality from 1 to p^*−1. We have:

As discussed above, since calculating s_MIT is not convenient, we use s′_MIT and s″_MIT instead. With s′_MIT, p^* can be found as:

$$p^* = \min\{p : u_{MIT}(p) \ge 2N_e H_s(X_i)\}$$

while for s″_MIT:

$$p^* = \min\{p : u_{MIT}(p) \ge 2N_e \log k\}$$

It can be seen that p^* depends only on α, k and N_e. Since there are O((dn)^{p^*}) subsets of X^d with at most p^*−1 parents, and each set of parents can be scored in polynomial time, GlobalMIT^{+} admits an overall polynomial worst-case time complexity in the number of variables n. Unfortunately, p^* does not admit a closed-form solution (since the chi-square quantile function does not have one). However, an estimate of p^* can be provided as follows: note that under variable uniformity l_ij = (k−1)²k^{j−1}, so the penalty terms χ_{α,l_ij} can be accumulated numerically until the bound is exceeded, using the smallest value (k−1)² as an under-estimate for l_ij if a simpler bound is preferred.

Assuming typical settings for discretized gene expression data (e.g., k = 3 states and a high confidence level α), this estimate shows that p^* remains small in practice.

Let us now compare this bound with those of the algorithms for learning the globally optimal DBN under the BIC/MDL and BDe scoring metrics, as proposed by Dojer. For BDe, the bound on the maximal parent-set cardinality grows with the number of observations N_e, making its value less of practical interest, even for small data sets; moreover, this bound becomes meaningless when N_e is large.
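As a rough numerical illustration, p^* can be estimated by accumulating chi-square quantiles until they exceed the 2N_e·log k bound of s″_MIT. The sketch below uses the Wilson-Hilferty normal approximation to the chi-square quantile so as to stay within the standard library; the function names, the approximation, and the example parameter values are our choices, not the paper's.

```python
import math
from statistics import NormalDist

def chi2_quantile(alpha, dof):
    """Wilson-Hilferty approximation to the chi-square quantile at
    confidence level alpha with dof degrees of freedom (an approximation,
    adequate for this coarse p* estimate)."""
    z = NormalDist().inv_cdf(alpha)
    return dof * (1 - 2 / (9 * dof) + z * math.sqrt(2 / (9 * dof))) ** 3

def max_parent_cardinality(n_e, k, alpha=0.999, p_max=50):
    """Estimate p*: the smallest cardinality p at which the accumulated
    penalty u_MIT(p) alone meets the empty-set bound 2*N_e*log(k), under
    variable uniformity where l_ij = (k-1)^2 * k^(j-1)."""
    bound = 2.0 * n_e * math.log(k)
    penalty = 0.0
    for p in range(1, p_max + 1):
        penalty += chi2_quantile(alpha, (k - 1) ** 2 * k ** (p - 1))
        if penalty >= bound:
            return p
    return p_max
```

Because the degrees of freedom, and hence the penalty, grow geometrically in p while the bound is fixed, p^* stays small and grows only slowly with N_e.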

The GlobalMIT^{*} algorithm

It is noted that the search space has been expanded from the n candidate parents in X[t−1] in the first-order model to the dn candidate parents in X[t−1],…,X[t−d] in the d-th order DBN. Roughly speaking, the number of variables has been multiplied by d, with a corresponding increase in the run-time of GlobalMIT^{+}. For very large networks, it may be useful to consider the following additional assumption:

Assumption 6

(non-redundant, optimal-lag interaction) No multiple edges with different time lags exist between a parent X_i and its child X_j. Furthermore, the only edge allowed, if it exists, must take place at the optimal lag, i.e., the lag that maximizes the pairwise time-delayed mutual information between parent and child.

This assumption restricts each node X_i to at most one single link to any node X_j, placed at the optimal lag, so that the number of candidate parents of X_j reduces from dn back to n. We refer to the version of GlobalMIT^{+} that employs Assumption 6 as GlobalMIT^{*}. It can be easily seen that, for any high order d, GlobalMIT^{*} still admits the same complexity as the first-order GlobalMIT.
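The per-pair lag selection underlying Assumption 6 can be sketched as follows (a minimal sketch of our own; the function name and the plug-in MI estimator are assumptions):

```python
import numpy as np
from collections import Counter

def best_lag(child, parent, d):
    """Pick the single lag in 1..d that maximizes the time-delayed MI
    between a candidate parent and the child (the non-redundant,
    optimal-lag rule of Assumption 6)."""
    def mi(x, y):
        # Plug-in mutual information (in nats) of two discrete sequences.
        n = len(x)
        cx, cy, cxy = Counter(x), Counter(y), Counter(zip(x, y))
        return sum(nab / n * np.log(nab / n / (cx[a] / n * cy[b] / n))
                   for (a, b), nab in cxy.items())
    n = len(child)
    target = tuple(child[d:])  # child aligned once, at maximal shift d
    scores = {lag: mi(tuple(parent[d - lag: n - lag]), target)
              for lag in range(1, d + 1)}
    return max(scores, key=scores.get)
```

Only the winning lag's edge is then offered to the parent-set search, shrinking the candidate pool from dn to n.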

Results and discussion

This section presents the experimental evaluation of GlobalMIT^{+/*}. Our proposed algorithms are implemented in the Matlab/C++ GlobalMIT^{+} toolbox, freely available as online supplementary material (see the Additional files).

GlobalMIT+.zip — The GlobalMIT^{+} toolbox: implementation of the proposed algorithms in Matlab and C++, together with the user's guide.


It is noted that the GlobalMIT^{+} toolbox supports multi-threading to maximally exploit currently popular multi-core PC systems. We conducted our experiments on a quad-core i7 desktop PC with 8 GB of main memory, running Windows 7 64-bit, a typical off-the-shelf PC configuration at the time this paper was written. Intel Core i7 processors contain 4 separate cores, each of which can handle 2 independent threads concurrently. We executed GlobalMIT^{+} with 6 threads in parallel (the remaining two being reserved for system and interface processes). BANJO also supports multi-threading, whereas BNFinder does not. While we could have run all algorithms with only a single thread for a "fair" comparison in terms of run-time, our objective in carrying out the experiments this way is to highlight the capability and benefit of parallelization in GlobalMIT^{+}; in our observation, the 1-thread execution time would be roughly three to five times longer. As for parameter settings, BNFinder was run with its defaults, while BANJO was run with 6 threads for the same amount of time as GlobalMIT^{+}, or at least 10 minutes, whichever is longer. GlobalMIT^{+} has two parameters, namely the significance level α, which we adjust depending on whether N_e < 100, and the order d.

Small scale

We study the *E. coli* network.

For this small network, GlobalMIT^{+} and BNFinder require only a few seconds, while BANJO was executed for 10 minutes with 6 threads in parallel. The experimental results are reported in the figure below: GlobalMIT^{+} (d=1) and BNFinder (BDe & MDL) all returned the same network, while at higher orders GlobalMIT^{+} discovered a similar hub structure, with the most complete network recovered at d=6.

Supplementary Material for Gene Regulatory Network Modeling via Global Optimization of High-Order Dynamic Bayesian Network


**Experimental results on the *E. coli* network.**

Medium scale synthetic network for glucose homeostasis

We study a glucose homeostasis network of 35 genes and 52 interactions, first proposed by Le et al.

The hepatic glucose homeostasis network: black, blue, red colors for 1st-, 2nd- and 3rd-order interactions respectively.


Column groups, left to right: **GlobalMIT** (d=1), **GlobalMIT**^{*} (d=3), **GlobalMIT**^{+} (d=3), **BANJO**, **BNFinder+MDL**; each group reports Pr, Se and Time.

| **N** | **Pr** | **Se** | **Time** | **Pr** | **Se** | **Time** | **Pr** | **Se** | **Time** | **Pr** | **Se** | **Time** | **Pr** | **Se** | **Time** |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 25 | 75±17 | 9±2 | 0±0 | 67±8 | 18±5 | 0±0 | 64±12 | 18±5 | 0±0 | 12±2 | 22±3 | 10±0 | 64±17 | 9±2 | 0±0 |
| 50 | 82±14 | 19±3 | 0±0 | 80±10 | 35±3 | 0±0 | 77±12 | 35±4 | 0±0 | 25±5 | 27±6 | 10±0 | 88±12 | 18±4 | 1±0 |
| 75 | 85±12 | 24±3 | 0±0 | 85±6 | 45±4 | 0±0 | 81±8 | 46±4 | 9±0 | 34±4 | 28±2 | 10±0 | 85±11 | 23±3 | 7±0 |
| 100 | 94±7 | 24±2 | 2±0 | 98±4 | 46±4 | 0±0 | 98±4 | 46±4 | 11±0 | 41±5 | 29±3 | 11±0 | 85±8 | 25±3 | 14±0 |
| 125 | 91±8 | 25±2 | 2±0 | 97±4 | 50±3 | 2±0 | 97±4 | 50±4 | 482±39 | 43±4 | 30±3 | 482±39 | 82±8 | 27±2 | 20±0 |

Se: percent sensitivity; Pr: percent precision; Time: in minutes.

It is noted that we have omitted BNFinder+BDe in this experiment, as this algorithm becomes too expensive even for this medium-sized network. For example, at N=25 BNFinder+BDe requires around 1 minute, but the execution time quickly increases to 1206±167 mins at N=50. At N=75, we could not even complete the analysis of the first of the 10 data sets: the execution was abandoned after 3 days, with BNFinder+BDe having learnt the parents of only 2 nodes. Of the algorithms reported in the table above, the deterministic global optimization based approaches, GlobalMIT^{+} and BNFinder, deliver the most consistent results. This highlights their major advantage over a stochastic global optimization based method such as BANJO: wherever applicable, these methods never get stuck in local minima, and are able to deliver consistent, high quality results. Of course, BANJO on the other hand is the choice for very large data sets, where deterministic methods are computationally infeasible.

As for higher-order DBN learning algorithms, both GlobalMIT^{+} and GlobalMIT^{*} (with d=3) achieve significantly better sensitivity than the first-order DBN learning algorithms (GlobalMIT, BNFinder, BANJO). The improved sensitivity is mainly credited to the ability of these algorithms to cover all the possible time-delayed interactions between the genes. More specifically, at N=125, GlobalMIT^{*} discovers on average 16.9 high-order interactions, i.e., 43% of the total high-order interactions, whereas BANJO and BNFinder+MDL recover on average only 5.5 (14%) and 4.6 (12%) high-order interactions respectively. It is also noticeable from this experiment that GlobalMIT^{*} delivered results almost identical to GlobalMIT^{+} in a much shorter time, comparable to that of the first-order GlobalMIT.

Large scale cyanobacterial genetic networks

This section presents our analysis of a large scale cyanobacterial network. Cyanobacteria are the only prokaryotes capable of oxygenic photosynthesis, and in recent years they have received increasing interest.

We used transcriptomic data covering 733 cyanobacterial genes and applied the GlobalMIT^{*} version with order d=3 (which indeed covers one time point of lag on the original data set). GlobalMIT^{*} inferred the network shown in the figure below.

**The sp. 51142 reconstructed genetic networks, visualized with Cytoscape.** Node size is proportional to the node connectivity.

To formalize this observation, we fit the node degrees (counting both in- and out-degrees) in the GlobalMIT^{*} inferred network to a power-law distribution using the method of maximum likelihood (ML). The ML estimate of the scaling parameter gives the x^{−2.24} curve. In order to verify that the scale-free structure is not merely an artefact of the inference algorithm, we ran GlobalMIT^{*} with the same parameters on the same microarray data set, but with every gene expression profile randomly shuffled. The resulting network is shown in the figure below.

Node degree distribution.

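A discrete ML fit of the scaling parameter, in the spirit of the analysis above, can be sketched as follows (the function name, the k_min handling, and the use of the standard continuous-approximation estimator γ ≈ 1 + n / Σ ln(k_i / (k_min − 0.5)) are our illustrative choices):

```python
import numpy as np

def powerlaw_mle(degrees, kmin=1):
    """ML estimate of the scaling exponent gamma for a power-law degree
    distribution P(k) ~ k^(-gamma), k >= kmin, using the continuous
    approximation for discrete data (offset of 0.5 on kmin)."""
    k = np.asarray([deg for deg in degrees if deg >= kmin], dtype=float)
    return 1.0 + len(k) / np.sum(np.log(k / (kmin - 0.5)))
```

Applied to the degree sequence of an inferred network, this returns the exponent of the fitted x^{−γ} curve; shuffled-data controls, as in the text, should fail to produce a comparable fit.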

We next tested BNFinder and BANJO on this data set. BNFinder+BDe was abandoned after 3 days of execution without finishing. BNFinder+MDL, on the other hand, is relatively fast, requiring only 4 mins; its reconstructed network is shown alongside the GlobalMIT^{*} network. BANJO was run with 6 threads for 1 h, and its resulting network is likewise shown.

We next performed functional enrichment analysis for the top hubs in each network, for which we gathered the available gene annotation data. Of the top 20 hubs in the GlobalMIT^{*} network, 10 were found to be significantly enriched in major cellular processes. Upon inspecting the BNFinder+MDL network, 6 out of the top 20 hubs were found to be significantly enriched, also in major relevant cellular processes. It is noted that while GlobalMIT^{*} shows the most enriched hubs, BNFinder+MDL manages to recover several hubs with significantly better corrected p-values: in particular, 3 hubs for nitrogen fixation, proton transport and ribosome were recovered with significantly smaller corrected p-values; however, compared with GlobalMIT^{*}, other important functional hubs for photosynthesis and photosystems I & II were missing. BANJO, on the other hand, produced a relatively poor result, with only 1 of its top 20 hubs significantly enriched, and that one not related to any major cellular pathway. The overall results suggest that both GlobalMIT^{*} and BNFinder+MDL successfully reconstructed biologically plausible network structures, i.e., scale-free with a reasonable scaling parameter value, and with functionally enriched modules relevant to the wet-lab experimental condition under study. GlobalMIT^{*} managed to produce more enriched hubs, as a result of the higher order DBN model employed and the improved MIT scoring metric. BANJO generally failed to produce a plausible network structure. This experimental result thus highlights the advantage of the deterministic global optimization approach, as employed by GlobalMIT^{*} and BNFinder+MDL, versus a stochastic global optimization approach as employed by BANJO.

**GlobalMIT**^{*} **network**

| **Hub** | **Degree** | **Enriched function** | **Corrected p-value** |
|---|---|---|---|
| cce_4432 | 16 | Nitrogen fixation | 4.5E-5 |
| cce_3394 | 16 | Nitrogen fixation | 1.7E-5 |
| cce_3974 | 14 | Photosynthesis, dark reaction | 1.4E-2 |
| cce_0997 | 13 | Photosystem I | 1.3E-5 |
| cce_0103 | 12 | Plasma membrane proton-transporting | 1.7E-5 |
| cce_0589 | 11 | Signal transducer | 9.4E-3 |
| cce_1620 | 10 | Photosystem II reaction center | 2E-2 |
| cce_1578 | 10 | Structural constituent of ribosome | 1E-2 |
| cce_2038 | 10 | Response to chemical stimulus | 4.5E-2 |
| cce_4486 | 9 | Photosynthetic membrane | 3.1E-2 |

**BNFinder+MDL network**

| **Hub** | **Degree** | **Enriched function** | **Corrected p-value** |
|---|---|---|---|
| cce_3394 | 20 | Nitrogen fixation | 3.7E-8 |
| cce_3377 | 17 | Proton-transporting ATPase activity | 2.1E-7 |
| cce_3898 | 15 | Structural constituent of ribosome | 2.5E-11 |
| cce_1943 | 11 | Peptidoglycan biosynthetic process | 3.4E-2 |
| cce_2639 | 9 | Thiamine-phosphate kinase activity | 2.1E-2 |
| cce_1620 | 8 | Photosystem II reaction center | 1E-2 |

**BANJO network**

| **Hub** | **Degree** | **Enriched function** | **Corrected p-value** |
|---|---|---|---|
| cce_4663 | 10 | Calcium ion binding | 3.4E-2 |

Conclusion

In this paper, we have introduced GlobalMIT^{+} and GlobalMIT^{*}, two DBN-based algorithms for reconstructing gene regulatory networks. The GlobalMIT suite makes use of the recently introduced MIT scoring metric, which is built upon solid principles of information theory and has competitive performance against other traditional scoring metrics such as BIC/MDL and BDe. In this work, we have further shown that MIT possesses another very useful characteristic: when placed into a deterministic global optimization framework, its complexity is very reasonable. As theoretically shown and experimentally verified, GlobalMIT exhibits a much lower complexity than the BDe-based algorithm, i.e., BNFinder+BDe, and is comparable with the MDL-based algorithm, i.e., BNFinder+MDL. GlobalMIT^{+/*} are also designed to learn high-order, variable time-delayed genetic interactions that are common to biological systems. Furthermore, the GlobalMIT^{*} variant has the capability of reconstructing relatively large-scale networks. As shown in our experiments, GlobalMIT^{+/*} are able to reconstruct genetic networks with biologically plausible structure and enriched submodules significantly better than the alternative DBN-based approaches. Our current and future work on GlobalMIT^{+/*} mainly focuses on applying these newly developed algorithms to elucidate cyanobacterial gene regulatory networks.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

NXV developed the algorithms and carried out the experiments. MC provided overall supervision and leadership to the research. NXV and MC drafted the manuscript. RC and PPW suggested the biological data and provided biological insights. All authors read and approved the final manuscript.

Acknowledgement

This project is supported by the Australia-India Strategic Research Fund (AISRF).