Bioinformatics Research Center, School of Computer Engineering, Nanyang Technological University, Singapore, 639798

Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA

Singapore-MIT Alliance, Singapore, 117543

Abstract

Background

Time delays are common in gene regulation, yet most techniques for building gene regulatory networks cannot capture them. Here we look at the delays in the DNA repair system of

Results

We evaluated our method on time-course gene expression data after DNA damage with Mitomycin C. Several time-delayed interactions were observed in our analysis. The presence of hubs in the networks indicates that a small number of transcription factors regulate the rest of the system. We demonstrate the use of priors to overcome the over-fitting problem in network generation. We compare our results with gene networks derived with dynamic Bayesian networks (DBN).

Conclusion

Different transcription networks are active at different stages, and constant feedback and regulation are maintained throughout the activities of a biological pathway. Skip-chain models are capable of capturing long-distance and time-delayed regulations. Using a Dirichlet prior over parameters and a Gibbs prior over structure can greatly reduce over-fitting in the new model.

Background

Cellular activities of genes and gene products, represented in gene regulatory networks (GRN), provide a basis for signal transduction pathways. Since signal transduction is transient, studying its dynamics is essential. Further, the

In this work, we use Bayesian networks (BN) in a stochastic framework to represent GRN. Pathways have a natural representation as BNs, where genes are nodes in the network and edges are causal interactions among them. The causal dependencies are given as conditional probabilities, which encode 'cause and effect' relationships among genes in the network. Because a BN is acyclic, it cannot model feedback or self-regulation. The dynamic Bayesian network (DBN) is defined by a pair of structures $(S_t, S_{t+1})$, each corresponding to time instances $t$ and $t+1$.

However, several time-delayed interactions are known to exist in biological systems. The DBN has been extended to higher orders, where mutual information (MI) is used to determine the best time delay of an interaction.
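As a rough illustration of lag selection by MI, the Python sketch below scans candidate delays between a putative regulator and its target and keeps the delay with the largest mutual information. It is a minimal sketch, not the cited method: it assumes expression values have already been discretized into a few levels, it uses a simple plug-in MI estimate, and the function names `mutual_information` and `best_delay` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) p(y))), with probabilities as relative frequencies
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def best_delay(parent, child, max_delay):
    """Return (delay, mi) maximizing I(parent[t]; child[t + delay]) over delay = 1..max_delay."""
    best_d, best_mi = None, -1.0
    for d in range(1, min(max_delay, len(parent) - 1) + 1):
        mi = mutual_information(parent[:-d], child[d:])
        if mi > best_mi:  # strict '>' keeps the shortest delay on ties
            best_d, best_mi = d, mi
    return best_d, best_mi


# Toy series: the child copies the parent two time points later.
parent = [0, 1, 0, 1, 1, 0, 0, 1]
child = [0, 0] + parent[:-2]
delay, mi = best_delay(parent, child, max_delay=4)
print(delay, round(mi, 3))  # 2 0.693
```

With only a handful of time points, plug-in MI estimates are noisy and biased upward, so a threshold or significance test on the MI would normally be applied before accepting an edge.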

The linear feature models interactions that occur instantly or with little delay. The skip features model interactions occurring much later in the pathway; for example, a gene $g_i$ inhibits a gene $g_j$ to start a process, and later $g_i$ regulates another gene $g_k$ towards the end of the process. The skip-feature probability is decomposed into a sum of terms for consecutive pairs of genes in the time course, and the most likely interactions are found using the Viterbi algorithm. The Viterbi skip feature can automatically determine the best time delay in a higher-order Markov chain representing the instantaneous network of the DBN.
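The Viterbi algorithm mentioned above is a standard dynamic program over a hidden Markov model. The sketch below is a generic log-space implementation; how the HMM linking two genes is constructed is not specified in this section, so the two-state 'off'/'on' model and all of its probabilities are hypothetical toy values.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Most likely hidden-state path of an HMM, computed in log space.

    log_start[s], log_trans[s][s2] and log_emit[s][o] are natural-log probabilities.
    Returns (best_log_prob, path).
    """
    # delta[s]: best log-probability of any state path that ends in s at the current step
    delta = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    backpointers = []
    for o in obs[1:]:
        ptr, new_delta = {}, {}
        for s2 in states:
            prev = max(states, key=lambda s: delta[s] + log_trans[s][s2])
            ptr[s2] = prev
            new_delta[s2] = delta[prev] + log_trans[prev][s2] + log_emit[s2][o]
        backpointers.append(ptr)
        delta = new_delta
    # Trace the best final state back through the stored pointers
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return delta[last], path


log = math.log
states = ("off", "on")
log_start = {"off": log(0.5), "on": log(0.5)}
log_trans = {"off": {"off": log(0.9), "on": log(0.1)},
             "on": {"off": log(0.1), "on": log(0.9)}}
log_emit = {"off": {0: log(0.8), 1: log(0.2)},
            "on": {0: log(0.2), 1: log(0.8)}}
best_lp, path = viterbi([0, 0, 1, 1], states, log_start, log_trans, log_emit)
print(path)  # ['off', 'off', 'on', 'on']
```

Working with log-probabilities turns the product along a path into a sum, which avoids numerical underflow on long time courses.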

Our approach consists of three stages. First, we identify time-delayed interaction features and predict the optimal GRN using a genetic algorithm (GA). The fitness function of the GA is modified to include the Viterbi scores of time-delayed interactions obtained with the skip-chain model. Next, an application to the DNA repair system of

Methods

A BN decomposes the joint probability of genes into a product of conditional probabilities by using the chain rule and the independence of non-descendant genes given their parents:

$$P(\mathbf{x}) = \prod_{i=1}^{n} p(x_i \mid \mathbf{a}_i, \theta_i)$$

where $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, the conditional probability of gene expression $x_i$ given its parents $\mathbf{a}_i$ is $p(x_i \mid \mathbf{a}_i, \theta_i)$, and $\theta_i$ denotes the parameters of the conditional probabilities.
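The factorization can be evaluated directly for a discrete network. In this sketch the conditional probability tables, the gene names `g1`/`g2` and their values are hypothetical; each gene contributes one factor, its probability given the states of its parents, and working in log space turns the product into a sum.

```python
import math


def log_joint(assignment, parents, cpts):
    """log P(x) = sum_i log p(x_i | a_i) for a discrete Bayesian network.

    assignment: dict gene -> state
    parents:    dict gene -> tuple of parent genes (empty tuple for a root)
    cpts:       dict gene -> {(parent_states, state): probability}
    """
    total = 0.0
    for gene, state in assignment.items():
        parent_states = tuple(assignment[p] for p in parents[gene])
        total += math.log(cpts[gene][(parent_states, state)])
    return total


# Hypothetical two-gene network g1 -> g2 with made-up probabilities.
parents = {"g1": (), "g2": ("g1",)}
cpts = {
    "g1": {((), 0): 0.4, ((), 1): 0.6},
    "g2": {((0,), 0): 0.2, ((0,), 1): 0.8, ((1,), 0): 0.9, ((1,), 1): 0.1},
}
lp = log_joint({"g1": 1, "g2": 0}, parents, cpts)
print(round(math.exp(lp), 4))  # 0.54, i.e. p(g1=1) * p(g2=0 | g1=1) = 0.6 * 0.9
```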

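The acyclic condition in a BN does not allow self-regulation and feedback, which are characteristic of GRN. To overcome this limitation, dynamic Bayesian networks (DBN) are used, in which a transition network from one time point to the next characterizes the GRN. The first-order DBN is defined by a transition network of interactions between a pair of structures $(S_t, S_{t+1})$ corresponding to time instances $t$ and $t+1$:

$$P(\mathbf{x}_{t+1} \mid \mathbf{x}_t) = \prod_{i=1}^{n} P(x_{i,t+1} \mid \mathbf{a}_{i,t})$$

where each conditional distribution is parameterized by $\theta_{ijk} = P(x_{i,t+1} = k \mid \mathbf{a}_{i,t} = j)$, the probability that gene $g_i$ takes state $k$ at time $t+1$ given that its parents take configuration $j$ at time $t$.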
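For a first-order DBN with discretized expression levels, the maximum-likelihood estimate of each transition probability $P(x_{i,t+1} = k \mid \mathbf{a}_{i,t} = j)$ is the fraction of observed transitions in which parent configuration $j$ at time $t$ is followed by child state $k$ at time $t+1$. The sketch below counts those transitions from a single time course; the toy series and the function name `estimate_transition_cpt` are hypothetical, and configurations that are never observed are simply absent from the result.

```python
from collections import Counter, defaultdict


def estimate_transition_cpt(series, child, parents):
    """MLE of P(x_{child,t+1} = k | a_t = j) from one discretized time course.

    series:  dict gene -> list of discrete states (all lists of equal length)
    parents: tuple of genes whose states at time t form the configuration j
    Returns {j: {k: probability}} for every configuration j that was observed.
    """
    counts = defaultdict(Counter)
    for t in range(len(series[child]) - 1):
        j = tuple(series[p][t] for p in parents)
        k = series[child][t + 1]
        counts[j][k] += 1
    cpt = {}
    for j, c in counts.items():
        total = sum(c.values())
        cpt[j] = {k: n / total for k, n in c.items()}
    return cpt


series = {"g1": [0, 1, 1, 0, 1, 1], "g2": [0, 0, 1, 1, 0, 0]}
theta = estimate_transition_cpt(series, child="g2", parents=("g1",))
# theta[(0,)] == {0: 1.0}; theta[(1,)] gives P(g2=1 | g1=1) = 2/3 and P(g2=0 | g1=1) = 1/3
```

With short time courses many configurations receive few or no counts, which is one reason a prior over the parameters helps against over-fitting.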

The classical DBN is unable to capture complex time-dependencies and is extended to an

Linear-chain feature functions _{i}, _{i(t-o:t)}, _{i}, _{i}, _{t}, _{t }and

**A skip-chain model**. A skip-chain model has overlapping skip-edges, which model long-distance dependencies.

We can interpolate the two types of features, so that the score of gene $g_i$ is a weighted sum of its linear and skip-edge scores.

For interactions, we look for the causal effects of regulated genes as features. We can use the Viterbi algorithm to find a maximum likelihood (ML) path between two genes at distant time points in a hidden Markov model (HMM). For genes $g_i$ and $g_j$, we choose the highest Viterbi score among all the possible interaction features.
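The selection rule above (keep the highest-scoring interaction feature) and the weighted sum of linear and skip-edge scores can be combined in a few lines. This is a sketch under stated assumptions: the weight `lam` is a hypothetical constant (the section does not give the interpolation weights), candidate features are keyed by hypothetical `(parent, delay)` tuples, and all scores are assumed to be log-scores such as Viterbi log-probabilities.

```python
def gene_score(linear_score, skip_scores, lam=0.5):
    """Interpolate a gene's linear-edge score with its best skip-edge score.

    linear_score: log-score from the linear (short-delay) feature
    skip_scores:  dict mapping a candidate skip feature, e.g. (parent, delay),
                  to its Viterbi log-score
    lam:          hypothetical constant weight on the linear feature, in [0, 1]
    Returns (score, chosen_feature); chosen_feature is None when there is no candidate.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if not skip_scores:
        # Assumption: with no skip-edge candidate, only the linear score is used
        return linear_score, None
    # Keep only the highest-scoring skip feature for this gene
    chosen = max(skip_scores, key=skip_scores.get)
    return lam * linear_score + (1.0 - lam) * skip_scores[chosen], chosen


score, feature = gene_score(-2.0, {("g2", 3): -4.0, ("g5", 5): -1.0}, lam=0.5)
print(score, feature)  # -1.5 ('g5', 5)
```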

A genetic algorithm is used to find the optimal network structure. Here an individual is defined by an $n \times n$ matrix $\{s_{i,j}\}$, where $n$ is the number of genes and $s_{i,j} = 1$ if gene $g_j$ is a parent of gene $g_i$. Each individual is randomly initialized with interactions selected by their MI at a time lag. The GA then finds the structure with the highest posterior probability (Eq. 3). The GA provides an optimal structure maximizing the likelihood asymptotically. We also explored the use of two priors over the network.
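A compact version of such a structure search is sketched below. The operators (truncation selection, row-wise uniform crossover, bit-flip mutation), the default population settings and the toy fitness are all assumptions made for illustration; in the method described here the fitness would be the posterior score of a structure including the Viterbi skip-edge terms, and the similarity threshold used in the experiments is not implemented.

```python
import random


def random_individual(n, candidates, p_edge, rng):
    """n x n 0/1 matrix; s[i][j] = 1 means gene j is a parent of gene i.

    Only pairs listed in `candidates` (e.g. pairs pre-screened by lagged MI) can be 1.
    """
    s = [[0] * n for _ in range(n)]
    for i, j in candidates:
        if rng.random() < p_edge:
            s[i][j] = 1
    return s


def crossover(a, b, rng):
    # Row-wise uniform crossover: each gene inherits its whole parent set from a or b
    return [list(a[i]) if rng.random() < 0.5 else list(b[i]) for i in range(len(a))]


def mutate(s, candidates, rate, rng):
    # Bit-flip mutation restricted to candidate pairs
    for i, j in candidates:
        if rng.random() < rate:
            s[i][j] ^= 1
    return s


def genetic_search(n, candidates, fitness, pop_size=50, generations=100,
                   patience=20, seed=0):
    """Return (best_matrix, best_score) found by a simple elitist GA."""
    rng = random.Random(seed)
    pop = [random_individual(n, candidates, 0.3, rng) for _ in range(pop_size)]
    best, best_score, stale = None, float("-inf"), 0
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        top = fitness(ranked[0])
        if top > best_score:
            best, best_score, stale = [row[:] for row in ranked[0]], top, 0
        else:
            stale += 1
            if stale >= patience:  # best score unchanged for `patience` generations
                break
        elite = ranked[: pop_size // 2]  # truncation selection; elites survive unchanged
        children = [mutate(crossover(rng.choice(elite), rng.choice(elite), rng),
                           candidates, 0.05, rng)
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return best, best_score


# Toy problem: recover a known 3-gene structure (g0 -> g1, g1 -> g2).
n = 3
candidates = [(i, j) for i in range(n) for j in range(n) if i != j]
target = {(1, 0), (2, 1)}  # (child, parent) pairs


def toy_fitness(s):
    # Placeholder for the posterior score: negative Hamming distance to the toy target
    return -sum(abs(s[i][j] - ((i, j) in target)) for i, j in candidates)


best, score = genetic_search(n, candidates, toy_fitness)
print(score, best)
```

Restricting both initialization and mutation to pre-screened candidate pairs is what keeps the search space small; with a fixed seed the run is reproducible.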

Dirichlet prior over parameters

Most higher-order Markov models are far from optimal. They are extremely sensitive to changes in pathways and the associated data, because most of the data are general rather than specific to the features of a given interaction. The goal of adaptation is to make good use of the available feature data and to reduce over-fitting in the model. Our adaptation model combines the reliable general DBN with a volatile, feature-specific HMM for long delays. We further extend the maximum likelihood estimate (MLE) to Bayesian learning, in which a Dirichlet conjugate prior is imposed on each of the parameters.
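Imposing a Dirichlet prior on multinomial parameters gives closed-form estimates. As one concrete case, the MAP estimate for $K$ states with counts $N_k$ (total $N$) and hyperparameters $\alpha_k$ is $\hat{\theta}_k = (N_k + \alpha_k - 1)/(N + \sum_{k'} \alpha_{k'} - K)$, valid when every $\alpha_k \ge 1$. The sketch below implements that formula. Turning linear-feature probabilities into hyperparameters via $\alpha_k = 1 + m\,p_{\text{lin}}(k)$ with a prior strength $m$ is a hypothetical choice for illustration only; the exact construction used with the linear feature is not recoverable from this section.

```python
def dirichlet_map(counts, alpha):
    """MAP estimate of multinomial parameters under a Dirichlet(alpha) prior.

    theta_k = (N_k + alpha_k - 1) / (N + sum(alpha) - K); this mode formula
    requires every alpha_k >= 1.
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    if any(a < 1 for a in alpha):
        raise ValueError("this closed form needs every alpha_k >= 1")
    denom = sum(counts) + sum(alpha) - len(alpha)
    if denom <= 0:
        raise ValueError("no data and a flat prior: the mode is not unique")
    return [(n + a - 1) / denom for n, a in zip(counts, alpha)]


counts = [3, 0, 1]  # toy counts of child states for one parent configuration
print(dirichlet_map(counts, [1, 1, 1]))  # [0.75, 0.0, 0.25]: a flat prior gives the MLE

# Hypothetical: pseudo-counts from linear-feature probabilities with prior strength m = 10
p_lin = [0.2, 0.5, 0.3]
alpha = [1 + 10 * p for p in p_lin]
theta = dirichlet_map(counts, alpha)  # approximately [5/14, 5/14, 4/14]
```

The unobserved second state receives a non-zero probability under the informative prior, which is the smoothing effect that counters over-fitting on short time courses.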

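Given the set of conditional distributions with parameters $\theta_i$, the marginal likelihood of the data $D$ integrates over the parameters:

$$P(D \mid S) = \int P(D \mid S, \theta)\, P(\theta \mid S)\, d\theta$$

The integral can be written in closed form owing to the conjugacy between the Dirichlet and multinomial distributions. Alternatively, we can take the maximum a posteriori (MAP) estimate of the parameters:

$$\hat{\theta} = \arg\max_{\theta} P(D \mid S, \theta)\, P(\theta \mid S)$$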

Using the linear feature as a Dirichlet conjugate prior

where _{i}) is total probability of the skip-path,

Next, we specify the interpolated probability of gene $g_i$ based on linear and skip-edges.

here, instead of using a constant,

Gibbs prior over graph

We can use a Gibbs Markov network (MN) to model the prior $P(S) \propto e^{-E(S)}$, where the energy of the graph $S$ is built from the energies $U_{ij}$ of the edges between genes $g_i$ and $g_j$. If an interaction exists in the target network, we set $U_{ij} = \epsilon_1$; otherwise $U_{ij} = \epsilon_2$. The total energy of the graph over existing edges is $E(S) = \sum_{\{i, j\} \in S} U_{ij}$. The posterior probability of the graph is then given by

$$P(S \mid D) \propto P(D \mid S)\, e^{-E(S)}$$

A small $\epsilon_1$ and a large $\epsilon_2$ make the predicted GRN reflect the prior target network more closely, and vice versa.
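In log space the Gibbs prior subtracts the energy of the graph from the log-likelihood, so the posterior score of a candidate graph is, up to a constant that does not depend on the graph, the log-likelihood minus the summed edge energies. The sketch below assumes edges are stored as `(i, j)` tuples; the names `eps_in` and `eps_out` (the energies of target and non-target edges) and the toy numbers are hypothetical.

```python
def log_gibbs_prior(edges, target, eps_in, eps_out):
    """Unnormalized log P(S) = -E(S), where E(S) sums one energy per edge of S.

    An edge present in the target network costs eps_in; any other edge costs eps_out.
    """
    return -sum(eps_in if e in target else eps_out for e in edges)


def log_posterior_score(log_likelihood, edges, target, eps_in, eps_out):
    # log P(S | D) up to a constant that is the same for every graph S
    return log_likelihood + log_gibbs_prior(edges, target, eps_in, eps_out)


target = {(0, 1)}
edges_a = {(0, 1), (1, 2)}  # one target edge and one non-target edge
edges_b = {(1, 0), (1, 2)}  # no target edges
same_ll = -10.0             # pretend both graphs explain the data equally well
score_a = log_posterior_score(same_ll, edges_a, target, eps_in=0.1, eps_out=2.0)
score_b = log_posterior_score(same_ll, edges_b, target, eps_in=0.1, eps_out=2.0)
# score_a > score_b: with a small eps_in and a large eps_out, target edges are favored
```

Dropping the normalizing constant is safe for structure search because it is shared by all candidate graphs and so does not change their ranking.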

Experiments and results

We evaluated our method on a DNA repair system of

The corresponding skip probabilities were calculated as described in Methods. Delays of up to seven time points were allowed. First, we used 9 genes previously specified

This gave us a second dataset of 32 genes. A GA was used to find the optimal structure. Only linear interactions determined by mutual information (MI) up to a time lag of four were allowed. The GA chooses the network with the best combination of skip and linear edges. Simulations were run with different numbers of individuals (N = 200, 300, 400) and generations (G = 300, 400, 500) for both the HDBN and the skip-chain model. The GA stops when the maximum number of generations is reached or when the score does not change for 20 consecutive generations. A similarity threshold of 0.7 in each generation helps the GA avoid local maxima. The best prediction among all five runs was considered. Table

Time-delayed interactions in predicted network

| # Genes | Model: order o | ML | Higher-order edges |
|---|---|---|---|
| 9 | DBN: 1 | -14.7 | 9 |
| 9 | HDBN: 3 | -8.69 | 8, 2, 7 |
| 9 | SKIP-CHAIN: 1 | -6.05 | 13, (3) |
| 32 | DBN: 1 | -48.9 | 36 |
| 32 | HDBN: 4 | -39.4 | 20, 6, 14, 20 |
| 32 | SKIP-CHAIN: 2 | -37.2 | 54, 18, (41), (4) |

Time-delayed interactions in predicted DBN, HDBN, and skip-chain:

We also examined the use of a Gibbs prior over structures, a Dirichlet prior over parameters, and the combination of the two priors (Table

Time-delayed interactions in predicted network using priors

| # Genes | Model: order o | ML | Higher-order edges |
|---|---|---|---|
| 9 | SKIP-CHAIN: 1 | -6.05 | 13, (3) |
| 9 | SKIP-CHAIN (Gibbs): 1 | -5.8 | 11, (2) |
| 9 | SKIP-CHAIN (Dirichlet): 2 | -5.2 | 7, 13, (11), (1) |
| 9 | SKIP-CHAIN (Gibbs and Dirichlet): 3 | -3.27 | 2, 7, (4), (5) |
| 32 | SKIP-CHAIN: 2 | -37.2 | 54, 18, (41), (4) |
| 32 | SKIP-CHAIN (Gibbs): 3 | -35.7 | 37, 16, 24, (40), (3) |
| 32 | SKIP-CHAIN (Dirichlet): 2 | -35.05 | 54, 16, (37), (4) |
| 32 | SKIP-CHAIN (Gibbs and Dirichlet): 2 | -34.54 | 50, 15, (41), (4) |

Time-delayed interactions in predicted skip-chain without prior, with Dirichlet prior, with Gibbs prior, and with the combination of both priors:

The earlier network of 16 interactions predicted using correlations is shown in Fig.

**Target network and color code**. (a) Network determined by correlation and (b) color code.

**Time-delayed interactions in predicted network of 9 genes**. (a) DBN network, (b) HDBN network, (c) Skip-chain network, (d) Skip-chain network with Gibbs prior, (e) Skip-chain network with Dirichlet prior, (f) Skip-chain network with Gibbs and Dirichlet prior.

**Time-delayed interactions in predicted network of 32 genes**. (a) DBN network, (b) HDBN network, (c) Skip-chain network, (d) Skip-chain network with Gibbs prior, (e) Skip-chain network with Dirichlet prior, (f) Skip-chain network with Gibbs and Dirichlet prior.

Hubs, that is, single genes regulating several other genes, are also seen in the network. Such networks can buffer environmental variations. A small number of transcription factors (TF) regulate the rest of the repair system. At the same time the in-degree is low, as each gene is regulated by just one TF. RecA causes the inactivation of lexA, which represses DNA repair genes. We observe binding of recA (DNA repair) to the dnaB (DNA replication) helicase. RecA also activates linB, which carries out the dehalogenation needed for transformation events in DNA repair. The Fadd genes initiate apoptosis and are also required for cell-wall formation.

The second dataset of 32 genes indicated that our method is effective at identifying core genes (Fig.

Discussion and conclusion

An organism responds to changes in its environment by altering the level of expression of critical genes. The virulence of

To include time delays, we used a skip-chain model. The Viterbi path allowed us to choose among time-delayed interactions between two genes with the same or different time delays. This lets us identify the most informative interactions in the dataset. By using a single-parent Viterbi path to model the upregulated events, we were able to focus on special cases in the DBN. This significantly reduces the search space for the GA. However, our search is constrained by various parameters, such as the MI and the number of parents.

Skip-chain models address the difficulties of a DBN by easily incorporating overlapping input features. We also see that using approximate inference leads to lower total training time without loss in accuracy. The skip-chain BN is not an HDBN, because it typically captures different long-distance dependencies by skipping the intermediate time points. We proposed a method that can extract long-distance regulations and demonstrated it on the DNA repair system of Mycobacterium tuberculosis. Our approach may be useful for understanding complex gene regulation mechanisms.

Lastly, using priors gave us a higher likelihood and reduced over-fitting in building the regulatory networks. The Dirichlet prior gave fewer hubs than the Gibbs prior and a higher likelihood. The combination of the two priors gave the best regulatory networks, with the highest likelihood on both datasets (-3.27 for 9 genes and -34.54 for 32 genes). The predictions with priors also allow higher-order linear models.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

I. Chaturvedi implemented the algorithm and wrote the initial draft. J. C. Rajapakse guided the project and revised later drafts of the manuscript. All authors read and approved the final manuscript.

Note

Other papers from the meeting have been published as part of

Acknowledgements

The authors wish to acknowledge the partial support from Interdisciplinary Research Group on Infectious Diseases at the Singapore-MIT Alliance Research and Technology (SMART) Center.

This article has been published as part of