Email updates

Keep up to date with the latest news and content from BMC Genomics and BioMed Central.

This article is part of the supplement: Selected articles from the Tenth Asia Pacific Bioinformatics Conference (APBC 2012)

Open Access Proceedings

Identifying regulational alterations in gene regulatory networks by state space representation of vector autoregressive models and variational annealing

Kaname Kojima1, Seiya Imoto1*, Rui Yamaguchi1, André Fujita2, Mai Yamauchi1, Noriko Gotoh1 and Satoru Miyano1

Author affiliations

1 Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan

2 Institute of Mathematics and Statistics, University of São Paulo, Rua do Matão, 1010, São Paulo 05508-090, Brazil

For all author emails, please log on.

Citation and License

BMC Genomics 2012, 13(Suppl 1):S6  doi:10.1186/1471-2164-13-S1-S6


The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2164/13/S1/S6


Published:17 January 2012

© 2012 Kojima et al.; licensee BioMed Central Ltd.

This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

In the analysis of effects by cell treatment such as drug dosing, identifying changes on gene network structures between normal and treated cells is a key task. A possible way for identifying the changes is to compare structures of networks estimated from data on normal and treated cells separately. However, this approach usually fails to estimate accurate gene networks due to the limited length of time series data and measurement noise. Thus, approaches that identify changes on regulations by using time series data on both conditions in an efficient manner are demanded.

Methods

We propose a new statistical approach that is based on the state space representation of the vector autoregressive model and estimates gene networks on two different conditions in order to identify changes on regulations between the conditions. In the mathematical model of our approach, hidden binary variables are newly introduced to indicate the presence of regulations on each condition. The use of the hidden binary variables enables an efficient data usage; data on both conditions are used for commonly existing regulations, while for condition specific regulations corresponding data are only applied. Also, the similarity of networks on two conditions is automatically considered from the design of the potential function for the hidden binary variables. For the estimation of the hidden binary variables, we derive a new variational annealing method that searches the configuration of the binary variables maximizing the marginal likelihood.

Results

For the performance evaluation, we use time series data from two topologically similar synthetic networks, and confirm that our proposed approach estimates commonly existing regulations as well as changes on regulations with higher coverage and precision than other existing approaches in almost all the experimental settings. For a real data application, our proposed approach is applied to time series data from normal Human lung cells and Human lung cells treated by stimulating EGF-receptors and dosing an anticancer drug termed Gefitinib. In the treated lung cells, a cancer cell condition is simulated by the stimulation of EGF-receptors, but the effect would be counteracted due to the selective inhibition of EGF-receptors by Gefitinib. However, gene expression profiles are actually different between the conditions, and the genes related to the identified changes are considered as possible off-targets of Gefitinib.

Conclusions

From the synthetically generated time series data, our proposed approach can identify changes on regulations more accurately than existing methods. By applying the proposed approach to the time series data on normal and treated Human lung cells, candidates of off-target genes of Gefitinib are found. According to the published clinical information, one of the genes can be related to a factor of interstitial pneumonia, which is known as a side effect of Gefitinib.

Background

Gene network estimation from time series gene expression data is a key task for elucidating cellular systems. Thus far, wide variety of approaches have been proposed based on the vector autoregressive (VAR) model [1,3], the state space model [4-6], and the dynamic Bayesian network [7,8]. Recently, time series gene expression data on multiple conditions aiming at analyzing effects of cell treatment such as drug dosing and heat shock are available. We here assume that some gene regulations are disrupted but many of the gene regulations do not change due to some treatment of interest, and try to find a small number of changes on regulations as keys for elucidating effects by the treatment.

A possible way for finding changes on regulations is to estimate networks from two data sets separately and then compare their structures. However, due to the limited length of time series data (usually less than 10 time points) and unignorable measurement noise, networks are estimated with high error rates and the estimation errors cause the serious failure on identifying changes on regulations. Thus, approaches using two time series data in an efficient manner are strongly demanded. Also, widely used statistical methods such as the VAR model and dynamic Bayesian network assume equally spaced time points in time series data. However, observed time points on usually available time series data are not equally spaced [5,6,9], and approaches that can handle unequally spaced time series data in a theoretically correct way should be considered.

We propose a new statistical model that estimates gene networks on two different conditions in order to identify changes on regulations between the conditions. As the basis of the proposed model, we employ the state space representation for VAR model (VAR-SSM), in which observation noise is considered between the measured or observed gene expressions and the true gene expressions in observation model and gene regulations between true gene expressions are considered in the system model [10]. The VAR-SSM can handle unequally spaced time series data by ignoring observation model on the non-observed time points. For considering the changes on regulations, we introduce hidden variables to the VAR-SSM in order to indicate the presence of regulations in each condition. If hidden binary variables on two conditions indicating the presence of a regulation are both estimated as one, the regulation is considered as a commonly existing regulation. On the other hand, if only one of the hidden binary variables for the regulation is estimated as one, the regulation is considered as a condition specific regulation. We also introduce a potential function between the hidden binary variables that is designed to take high probability if the hidden binary variables on two conditions take the same value. From the design of the potential function, the similarity of networks on two conditions is automatically considered. Since the time series data on both conditions are used for estimating commonly existing regulation due to the use of the hidden binary variables, an efficient data assignment is achieved. In addition, from the more accurate estimation of commonly existing regulations by the efficient data assignment, accurate identification of changes on regulations is induced.

The hidden binary variables are estimated by searching the configuration of binary variables that maximizes the marginal likelihood of the model. However, searching the optimal configuration is computationally intractable. Thus, as an alternative approach, we derive a new variational annealing method based on [11] in order to estimate the hidden binary variables. We also give a proof for the effectiveness of the variational annealing compared to other candidate alternatives, the variational annealing and the EM algorithm, in order to show the validity of using the variational annealing.

For the performance evaluation, we generate two regulatory networks in such a way that most of the regulations commonly exist and some exist only on one of the networks. We then apply our proposed approach and existing var model based and dynamic Bayesian network based approaches to two equally spaced time series data drawn separately from the generated networks. From the comparisons of true positive rates and false positive rates of these approaches, we confirm the effectiveness of our approach. We also generate unequally spaced time series data from these networks, and show that our approach works correctly on unequally spaced time series data while the performance of the existing approaches assuming equally spaced time points is drastically worsened.

Our proposed approach is used to analyze changes on regulations in gene networks between normal Human lung cells and Human lung cells treated by stimulating EGF-receptors and dosing an anticancer drug termed Gefitinib. A lung cancer condition is simulated by the stimulation of EGF-receptors in the treated cells. Since Gefitinib is known as a selective inhibitor of EGF-receptors, the stimulation of EGF-receptors would be counteracted by Gefitinib, and hence the treated cells are expected to be the same condition as normal cells. However, gene expression profiles from normal and treated cells are actually different, and off-targets of Gefitinib causing unexpected positive or negative effects are implied. We focus on genes with changes on regulations between the networks estimated by our approach and find possible off-target genes of Gefitinib. According to the published clinical information, one of the possible off-target genes is suggested as one of factors of interstitial pneumonia, which is known as a side effect of Gefitinib.

Methods

Vector autoregressive model and its state space representation

Vector autoregressive model

Given gene expression profile vectors of p genes during T time points {y1, ..., yT}, the first order vector autoregressive (VAR(1)) model at time point t is given by

y t = A y t - 1 + ε 1 ,

where A is a p × p autoregressive coefficient matrix, and εt is observation noise at time t and follows N ( 0 , diag [ σ 1 2 , . . . , σ p 2 ] ) , a normal distribution with mean 0 and variance diag [ σ 1 2 , . . . , σ p 2 ] . The (i, j)th element of A, Aij, indicates a temporal regulation from the jth gene to the ith gene, and if Aij ≠ 0, regulation from the jth gene to the ith gene is considered. By examining whether Aij is zero or not for all i and j, a gene network is constructed. Since equally space time points are assumed in the VAR model, it has difficulty on handling unequally spaced time series data.

State space representation of VAR model (VAR-SSM)

Let T be the set of equally spaced entire T time points and T o b s the set of time points where gene expressions are observed. Note that T o b s T holds. VAR-SSM is comprised of two models: system model and observation model. Let xt be hidden variable vector representing true gene expression at time t. The system model is given as the VAR model of xt:

x t = A x t - 1 + η t , t T ,

where ηt is the system noise normally distributed with mean 0 and variance H = diag[h1,...,hp]. The observation model represents measurement error of observed gene expression yt and true gene expression xt at observed time point t T o b s :

y t = x t + ρ t , t T o b s ,

where ρt is the observation noise normally distributed with mean 0 and variance R = diag[r1,..., rp]. Unequally spaced time series data are handled by ignoring observation model at non-observed time points.

Joint model of VAR-SSM for two time series data

Let { y t ( c ) } t T o b s ( c ) be time series gene expression data on cell condition c, where T o b s ( c ) is the set of observed time points on cell condition c. We also let T ( c ) be the set of time points from 1 to T(c), where T ( c ) = max { t T o b s ( c ) } . Given time series data on two types of cell conditions c = 1 and 2, we propose a new VAR-SSM model to estimate gene networks in the two conditions as well as identify changes on regulations between them. The model is comprised of the following two equations:

x t ( c ) = A E ( c ) x t - 1 ( c ) + η t ( c ) , t T ( c ) , y t ( c ) = x t ( c ) + ρ t ( c ) , t T o b s ( c ) ,

where ∘ denotes the Hadamard product, E(c) is a p × p binary matrix, and η t ( c ) and ρ t ( c ) are respectively system and observation noises from N ( 0 , H ) and N ( 0 , R ) . In this model, the (i, j)th element E i j ( c ) takes one if regulation from gene j to gene i exists on condition c and zero otherwise, i.e., the presence of regulations is controlled by E(c) and the AR coefficient matrix A is commonly used in conditions 1 and 2. Changes on regulations are identified when regulations exist only in a condition.

The complete likelihood of our model, P(Y, X, Θ, E), where Y, X, Θ, and E are respectively the sets of y t ( c ) , x t ( c ) , parameters, and E(c), is given by the following equation:

P ( Y , X , Θ , E ) = c = 1 2 t T ( c ) | H | - 1 2 2 π p exp - 1 2 ( x t ( c ) - A E ( c ) x t - 1 ( c ) ) H - 1 ( x t ( c ) - A E ( c ) x t - 1 ( c ) ) (1) × t T o b s ( c ) | R | - 1 2 2 π p exp - 1 2 ( y t ( c ) - x t ( c ) ) R - 1 ( y t ( c ) - x t ( c ) ) P ( Θ , E ) , (2) (3) 

where the prior distribution P(Θ, E) is assumed to be factorized as

P ( Θ , E ) = i P ( h i ) P ( r i ) j P ( A i j ) P ( E i j ( 1 ) , E i j ( 2 ) , z i j ) .

Here, zij is a parameter for a potential function of E i j ( 1 ) and E i j ( 2 ) defined later. The prior distributions of Aij is given by

P ( A i j | h i , E i j ( 1 ) , E i j ( 2 ) ) = N ( A i j ; 0 , h i α 1 ) F i j N ( A i j ; 0 , h i α 0 ) 1 - F i j ,

where α0 and α1 are parameters controlling the shrinkage of coefficients A and Fij is a binary variable that takes 1 if E i j ( 1 ) or E i j ( 2 ) takes 1 and 0 otherwise, i.e., Fij is given by 1 - c = 1 2 ( 1 - E i j c ) . α1 is set to a large value, while α0 is set to smaller than α1. From the design of the prior for Aij, if E i j ( 1 ) or E i j ( 2 ) takes one, i.e., there exists regulation from gene j to i in condition 1 or 2, weaker prior N ( A i j ; 0 , h i α 1 ) is selected and the shrinkage of the coefficients are avoided. Otherwise stronger prior N ( A i j ; 0 , h i α 0 ) is selected and the sparsity of the network structures is promoted. The prior distributions of hi , and ri are given by

P ( h i ) = I G ( h i ; u 0 , k 0 ) , P ( r i ) = I G ( r i ; v 0 , l 0 ) ,

where I G represents the density function of inverse gamma distribution, u0 and v0 are the shape parameters, and k0 and l0 are the inverse scaling parameters. Under the assumption that a small number of regulations change between two conditions, we design the prior distribution for E i j ( 1 ) and E i j ( 2 ) , P ( E i j ( 1 ) , E i j ( 2 ) , z i j ) by using the following potential function between E i j ( c ) for c = 1 or 2:

1 2 ϕ ( E i j ( 1 ) , E i j ( 2 ) ; z i j ) = z i j if E i j ( 1 ) E i j ( 2 ) 1 - z i j otherwise .

In this setting, if zij is small, E i j ( 1 ) and E i j ( 2 ) tend to take the same value and thus most of the regulations exist in both two conditions. We also introduce a prior distribution for zij by beta distribution with parameters ζi0 and ζi1:

B ( z i j ; ζ i 0 , ζ i 1 ) = 1 B ( ζ i 0 , ζ i 1 ) z i j ζ i 0 - 1 ( 1 - z i j ) ζ i 1 - 1 ,

where B(·) is the beta function. Thus, the prior distribution of E i j ( c ) is given by

P ( E i j ( 1 ) , E i j ( 2 ) , z i j ) = ϕ ( E i j ( 1 ) , E i j ( 2 ) ) B ( z i j ; ζ i 0 , ζ i 1 ) .

Figure 1 shows a graphical representation of the proposed model, where dependency of the parameters and variables are indicated. The hyperparameters are omitted and the observed data yt is represented by gray nodes. In the observed data y t ( 1 ) and y t ( 2 ) are propagated mainly via hidden variables E i j ( 1 ) and E i j ( 2 ) . Due to the data propagation, more accurate estimation is expected in the proposed model than the approaches considering data on two conditions independently.

thumbnailFigure 1. A graphical representation of the proposed model. Hyperparameters are omitted from this representation. The nodes in gray denote observed data.

For the parameter estimation, we search the configuration of E maximizing the following marginal likelihood:

Ê = arg max E d X d Θ P ( Y , X , Θ , E ) . (1)

Finding the optimal configuration of E is computationally intractable, and heuristics approaches such as the EM algorithm and the variational method are used in practice. Here, we use the variational annealing, an extension of the deterministic annealing for discrete variables [11]. In the next section, we give a small explanation of the variational annealing and show its effectiveness compared to the EM algorithm and the variational method.

Parameter estimation by variational annealing

In the deterministic annealing, optimization problem is solved while gradually changing temperature in a some schedule, and maximum likelihood estimator is obtained like the EM algorithm [12-14]. Yoshida and West proposed to use the deterministic annealing to find the configuration of the binary variables that maximizes the likelihood of factor models with sparseness priors [11]. We derive a new variational annealing method by extending Yoshida and West's approach to find the configuration of the binary variables on marginal likelihood function, which can be applied for searching E that maximizes Equation (1).

Let E, X, and Θ be p dimensional binary variables, unobserved variables, and parameters, respectively, and consider to search E maximizing the following marginal likelihood:

max E { 0 , 1 } p log d X d Θ P ( X , Θ , E ) . (2)

The maximum of the marginal likelihood on E is bounded by the following formula:

max E { 0 , 1 } p log d X d Θ P ( X , Θ , E ) τ log ( 0 , 1 ) p d E d X d Θ P ( X , Θ , E ) 1 τ , (3)

where τ is called temperature and the equality holds for τ → +0. Hereafter, the integral range of E is omitted if no confusion occurs. Let Q(E) be a normalized non-negative function, i.e., Q(E) ≥ 0 and ∫ Q(E)dE = 1.

From the Gibbs inequality, the right side of Equation (3) is also bounded:

τ log d E d X d Θ P ( X , Θ , E ) 1 τ d E Q ( E ) log d X d Θ P ( x , Θ , E ) Q ( E ) τ .

Here, Q(E) is considered as an approximation function of P ( E ) 1 τ P ( E ) 1 τ d E . Under the assumption that P(X,Θ,E) ∝ Q(X)Q(Θ), where Q(X) and Q(Θ) are normalized non-negative functions, we have the following inequality

d E Q ( E ) log d X d Θ P ( X , Θ , E ) Q ( E ) τ d E d X d Θ Q ( X ) Q ( Θ ) Q ( E ) log P ( X , Θ , E ) Q ( X ) Q ( Θ ) Q ( E ) τ . (4)

Thus, as an approximation of E maximizing Equation (2), we try to find Ê = arg max E Q ( E ) , where Q(E) is the function maximizing the lower bound in Equation (4). Since higher values on P(E) are weighed more in the approximation by Q(E) for τ < 1, the better approximation is expected for the higher values. This property on the limited case is shown in Proposition 1. As is in the variational method and the EM algorithm, the maximum of the lower bound in Equation (4) is searched by a hill climbing from high temperature τ > 1.

In the hill climbing, Q(X), Q(Θ), and Q(E) are alternately updated from the following equations:

Q ( E ) exp 1 τ d X d Θ Q ( X ) Q ( Θ ) log P ( E | X , Θ ) , Q ( X ) exp d E d Θ Q ( E ) Q ( Θ ) log P ( X | Θ , E ) , Q ( Θ ) exp d E d X Q ( E ) Q ( X ) log P ( Θ | X , E ) .

Gradually converging τ to 0, local optimum of the lower bound and corresponding Q(E), Q(X), and Q(Θ) are obtained.

Effectiveness of variational annealing

As alternatives of the variational annealing, we may consider the variational method and the EM algorithm where X is the set of hidden variables and E is handled as the set of parameters to be maximized. We show the effectiveness of variational annealing compared to the variational method and the EM algorithm under the following conditions: P(X, Θ, E) is factorized into P(X, E)P(Θ, E), and P(Ei|X, Θ, E\{Ei}) is given as a binomial distribution, where Ei is the ith element of E. If factorization of P(X, Θ, E) = Q(X)Q(Θ)Q(E) is assumed in the calculation of the variational method, arg maxE Q(E) is not the optimal solution of Equation (2) in general. In the EM algorithm, by allowing E to move around a p-dimensional continuous space ( 0 , 1 ) p , Ê EM = arg max E ( 0 , 1 ) m P ( E ) can be calculated. Let Ê EM , i be the ith element of Ê EM . Usually, Ê EM , i is mapped to 1 if Ê EM, i > 0 . 5 and 0 otherwise for discretizing Ê EM to the p-dimensional binary space {0,1}p, but such a mapping is not guaranteed to provide arg max E { 0 , 1 } m P ( E ) . Although other mappings can be considered, to the best of our knowledge, no mapping is guaranteed to provide the optimal solution in polynomial time of p.

In the following, we prove a proposition in order to show that the variational annealing possibly give the optimal solution of Equation (2) even if the factorization of Q ( E ) = i = 1 p Q ( E i ) is additionally considered.

Proposition 1. P(X,Θ,E) is factorized into P(Θ,E)P(X,E), and P(Ei|X,Θ,E\{Ei}) is given as a binomial distribution. Let Q ^ ( E i ) be Q(Ei) maximizing the lower bound of the variational annealing for τ → +0 given by

d E 1 . . . d E p d X d Θ Q ( X ) Q ( Θ ) i Q ( E i ) log P ( X , Θ , E ) Q ( X ) Q ( Θ ) i Q ( E i ) τ . (5)

Then, the set of Ei ∈ {0,1} maximizing Ê EM is arg max E { 0 , 1 } p P ( E ) .

For the proof of the proposition, see Section 1 in Additional file 1. From Proposition 1, if the factorization P(X, Θ, E) = P(Θ, E)P(X, E) is satisfied and optimal Q functions are found, the variational annealing is guaranteed to provide the optimal solution of Equation (2) while the variational method and the EM algorithm are not. Although the factorization is not a generally satisfied property, the factorization is often assumed in approaches based on the variational method, and the assumption usually works as good approximations. Thus, the variational annealing is expected to provide the better performance than the variational method and the EM algorithm even if the factorization is not satisfied exactly.

Additional file 1. Proof of Proposition 1 and more details on the procedures of variational annealing. A proof of Proposition 1 and more details on the procedures of variational annealing on the proposed model are described.

Format: PDF Size: 96KB Download file

This file can be viewed with: Adobe Acrobat ReaderOpen Data

Procedures of variational annealing on proposed model

In the variational annealing on the proposed model, we calculate Q functions for hidden variables X, parameters Θ, and binary variables E iteratively while cooling temperature τ to zero gradually at each iteration cycle. In the following, we show the calculation procedures of Q(X), Q(Θ), and Q(E) on the proposed model as variational E-step, variational M-step, and variational A-step, respectively. More details of the procedures are given in Additional file 1. For the notational brevity, we denote the expectation of a value x with a probability distribution Q(y) as 〈xQ(y).

Variational E-step

Parameters of Q(X) are mean of xt, variance of xt, and cross time variance of xt-1 and xt. These parameters can be calculated via variational Kalman filter by using following terms expected with Q(Θ)Q(E): 〈E(c)Q(Θ)Q(E), 〈AQ(Θ)Q(E), 〈H-1 A E(c)Q(Θ)Q(E), and 〈(A E(c))'H-1A E(c)Q(Θ)Q(E). For the details of variational Kalman filter, see [4]. From the parameters of Q(X), expectations of the following terms with Q(X) required in other steps are calculated: x t ( c ) Q ( X ) , x t ( c ) x t ( c ) Q ( X ) , and x t + 1 ( c ) x t ( c ) Q ( X ) .

Variational M-step

Q(Θ) is factorized into ∏iQ(Ai|hi) Q(hi)Q(ri) ∏jQ(zij), where Ai is a vector given by (Ai1, ..., Aip)'. From the design of the proposed model, Q(Ai|hi), Q(hi), Q(ri), and Q(zij) are given in the following form:

Q ( A i | h i ) = N ( A i ; μ A i , h i T A i - 1 ) , Q ( h i ) = I G ( h i ; u i , k i ) , Q ( r i ) = I G ( r i ; v i , l i ) , Q ( z i j ) = B ( ζ i j , 0 ; ζ i j , 1 ) .

Parameters for the above functions are calculated by using x t ( c ) Q ( X ) , x t ( c ) x t ( c ) Q ( X ) , x t + 1 ( c ) x t ( c ) Q ( X ) , , and E i j ( c ) Q ( E ) .

Variational A-step

For the calculation of Q(E), we assume the factorization of Q(E) to c i j Q ( E i j ( c ) ) in order to make the computation tractable. Q ( E i j ( c ) ) follows a binomial distribution that takes one with probability e i j ( c ) and zero with probability 1 - e i j ( c ) , and thus the expectation of E i j ( c ) with Q(E) is given by e i j ( c ) . For the preparation, we calculate A Q ( Θ ) , H - 1 A Q ( Θ ) , A H - 1 A Q ( Θ ) , x t ( c ) Q ( X ) , x t ( c ) x t ( c ) Q ( X ) , and x t + 1 ( c ) x t ( c ) Q ( X ) . Q ( E i j ( c ) ) is then iteratively calculated by using these expected terms as well as E i k ( c ) Q ( E ) for k j. A few iterations are enough for the convergence.

Update and selection of hyperparameters

The proposed model contains u0, k0, v0, l0, ζi0, ζi 1, α0, and α1 as hyperparameters. α0 and α1 should be α0 <α1 as in the model setting, but this condition can be violated in the update step of the variational method. Thus, we select α0 and α1 by cross validation, and update other hyperparameters as in the variational method.

We first consider update of hyperparameters u0, k0, v0, l0, ζi0, and ζi1 to increase the lower bound of marginal probability. u0 and k0 are updated by maximizing the following equation:

( û 0 , k ^ 0 ) = arg max ( u 0 , k 0 ) i Q ( h i ) log I G ( h i ; u 0 , k 0 ) d h i = arg max ( u 0 , k 0 ) ( u 0 - 1 ) i log h i Q ( h i ) p + u 0 log k 0 - k 0 i 1 h i Q ( h i ) p - log Γ ( u 0 ) .

û 0 and k ^ 0 are obtained by numerical optimization methods such as the Newton-Raphson method. v0 and l0 are also updated in a similar manner to u0 and k0. ζi0 and ζi1 are updated by solving the following equation:

( ζ ^ i 0 , ζ ^ i 1 ) = arg max ζ i 0 , ζ i 1 j Q ( z i j ) log B ( z i j ; ζ i 0 , ζ i 1 ) d z i j = arg max ζ i 0 , ζ i 1 ( ζ i 1 - 1 ) j log ( 1 - z i j ) Q ( z i j ) p + ( ζ i 0 - 1 ) j log z i j Q ( z i j ) p - log B ( ζ i 0 , ζ i 1 ) .

ζ ^ i 0 and ζ ^ i 1 can also be obtained by the Newton-Raphson method.

For the selection of α0 and α1, we set α1 to some large value and select α0 by a leave one out cross validation procedure. For condition c ∈ {1, 2} and time point t T o b s ( c ) , we remove y t ( c ) from data set y and use the data set to train the model. We calculate square sum of residues r t , c 2 between y t ( c ) and the prediction of x t ( c ) estimated from the variational Kalman filter on the trained model given by r t , c 2 = ( y t ( c ) - x t ( c ) ) ( y t ( c ) - x t ( c ) ) . By grid search on parameter space of α0, we select α0 that minimizes c = 1 2 t T o b s ( c ) r t , c 2 .

Summary of procedures

The procedures for estimating parameters in the proposed model are summarized as follows:

1. Set τ to some large value. Also set α0 to a small value and α1 to a large value satisfying that α0 <α1.

2. Initialize other hyperparameters and hidden variables.

3. Perform the following procedures:

(a) Calculate variational M-step.

(b) Update hyperparameters.

(c) Calculate variational E-step.

(d) Calculate variational A-step.

(e) Go back to step (a) until some convergence criterion is satisfied.

4. Divide τ by some value > 1 such as 1.05.

5. Go back to step 3 if τ is larger than some very small value > 0.

In our setting, α1 is set to 1,000. For the initialization of τ and other hyperparameters, we use the following settings: τ = 2 . 5 , E i j ( c ) = 0 . 5 , u i = 1 , k i = 1 , v i = 1 , l i = 1 , ζ i 0 = 10 ,  and  ζ i 1 = 10 . If y t ( c ) is observed, we initialize x t ( c ) with y t ( c ) . Otherwise, we use the linearly interpolated one.

Results and discussion

Performance evaluation by Monte Carlo experiments

For the evaluation of the proposed approach, we generate two linear regulatory network models with similar topological structures G1 and G2 based on a linear regulatory network model G0. G0 is prepared in the following manner: (i) a scale free network of 100 nodes and 150 edges is generated; (ii) edge directions are assigned randomly; (iii) autoloop edges are added to root nodes of the directed network; and (iv) AR coefficients for the directed edges are chosen randomly from {-0.9, -0.8, -0.7, -0.6, -0.5, 0.5, 0.6, 0.7, 0.8, 0.9}. We then generate G1 and G2 from G0 as follows: (i) autoloop edges and 70% of non-autoloop edges in G0 are used for commonly existing edges in G1 and G2; and (ii) the other 30% of non-autoloop edges are randomly assigned as either G1 or G2 specific edges. Note that AR coefficients on edges of G1 and G2 are preserved, i.e., if a regulation from gene j to gene i exists in G1 or G2, then its coefficient is the same as that of the regulation from gene j to gene i in G0.

Figure 2 gives graph structure of G1 and G2, where commonly regulations, G1 specific regulations, and G2 specific regulations are represented with black, red, and green arrows, respectively.

thumbnailFigure 2. The graph structures of G1 and G2. Commonly regulations, G1 specific regulations, and G2 specific regulations are represented with black, red, and green arrows, respectively.

From each of G1 and G2, we obtain two equally spaced time series data of 25 time points and 50 time points. For system noise and observation noise, normally distributed values with mean 0 and standard deviation 1 and mean 0 and standard deviation 0.1 and 1 are used, respectively. The signal-noise ratio for system noise with standard deviation 1 and observation noise with standard deviation 0.1 is 0.03dB, and the signal is bit stronger than the noise. On the other hand, for observation noise with standard deviation 1, the signal-noise ratio is -0.26dB. In the condition, the noise is stronger than the signal, and the noise level is quite high.

Comparison between variational annealing and EM algorithm

We first compare the performances of the proposed approach and the approach that is based on the proposed approach but uses the EM algorithm instead of the variation annealing using the equally spaced time series data of 50 and 25 time points on the system noise with standard deviation 1 and observation noise with standard deviation 0.1. From the comparison, we verify the effectiveness of the variational annealing, compared to the EM algorithm. Table 1 summarizes the results of the proposed approach and the EM algorithm based approach. For the EM algorithm, the regulation from j to i on condition c is considered to exist if the estimated E i j ( c ) is more than 0.5.

Table 1. Comparison of the variation annealing (Proposed) and EM algorithm (EM) based on the proposed model

From the comparison, the results of the proposed approach contain more true positives than those of the EM algorithm based approach except for identifying changes on regulations for time points 50. For identifying changes on regulations, the EM algorithm based approach estimates bit more true positives than the proposed approach, but the difference is so small that it can be ignored. On the other hand, the results of the EM algorithm based approach contain more false positives than those of the proposed approach, and hence the precision of the results by the EM algorithm is worse than that of the proposed approach. Therefore, the effectiveness of the variational annealing is confirmed in the computational experiment as well.

Comparison between proposed approach and existing approaches

We employ the elastic net based VAR model approach [2] and the dynamic Bayesian network based approach termed G1DBN [8] as existing approaches for estimating networks from time series data. For the experiments, two versions of these approaches are considered: ENet1 and ENet2 from the elastic net based VAR model approach, and G1DBN1 and G1DBN2 from G1DBN. These approaches are different in the following point: ENet1 and G1DBN1 estimate G1 and G2 independently, i.e., for the estimation of G1, only time series data from G1 is used, while ENet2 and G1DBN2 assume that G1 and G2 have the same network structure and estimate a network by using two time series data. Thus, ENet2 and G1DBN2 are considered to more use data sample than ENet1 and G1DBN1 for network estimation, but changes on regulations between G1 and G2 are not considered. For selection of hyperparameters in ENet1 and ENet2, AICc is used [2], and for hyperparameters α1 and α2 of G1DBN1 and G1DBN2, a setting of α1 = 0.1 and α2 = 0.0059 considered in [8] is used.

For the comparison of these approaches, we focus on the following two points: the number of correctly estimated regulations and the number of correctly estimated changes on regulations. The former is usually considered for evaluating the performance of gene network estimation methods. The numbers of true positives and false negatives of the estimated regulations are summarized in Table 2(a). The precisions of the results given by

Table 2. A summary of results for system noise with standard deviation 1 and observation noise with standard deviation 0.1

The number of true positives The number of true positives + The number of false positives

are also provided. The results are averaged on ten data sets. The number of regulations in the true network models of G1 and G2 are in total 305. For the latter point, we consider the estimated regulations existing only in one of two estimated networks as changed regulations, and check if they correctly exist only in the corresponding true network. The numbers of true positives and false negatives of the estimated changes on regulations and the precisions are summarized in Table 2(b). The results are also averaged on ten data sets. The number of true changes on regulations between in G1 and G2 are 47, i.e., the number of true positives on this case is at most 47.

For the estimation of the regulations in Table 2(a), the proposed approach outperforms other approaches in terms of true positives. The proposed approach contains more false positives than ENet1, G1DBN1, and G1DBN2 on the data of 25 time points. We further consider these cases in terms of the precision. The precisions of the proposed approach, ENet1, G1DBN1, and G1DBN2 are given by 0.77, 0.62, 0.68, and 0.74, respectively. From this analysis, the proposed approach shows better performance than ENet2, G1DBN1, and G1DBN2 on the whole. More true positives are estimated by ENet2 than ENet1, and the precisions of ENet2 tend to be better than those of ENet1. This type of relationship is also observed between G1DBN1 and G1DBN2. Also, the elastic net based VAR model approaches estimate more true positives than the approaches from G1DBN. However, in the precision, the approaches from G1DBN are better than the elastic net based VAR model approaches. From the results in Table 2(b), we see that the proposed approach estimates more true changes on regulations than ENet1 and G1DBN1 on both data of 25 and 50 time points. In addition, the results of the proposed approach contain less false positives than those of ENet1 and G1DBN1. No changes on regulations are detected in ENet2 and G1DBN2 as topological differences are ignored in these approaches. Since the proposed approach considers differences on network structures as well as uses two time series data efficiently, it can provide better results than other approaches on both estimating regulations and identifying changes on regulations.

One may think it is strange that false positives in ENet1 and G1DBN1 in Table 2(b) is more than those in Table 2(a), but this case can occur from the following reason. If a regulation exists in both of the true network models of G1 and G2, but is estimated only for G1, then the case is not counted as a false positive in Table 2(a) while it is counted as a false positive in Table 2(b). Thus, the number of false positives in Table 2(b) can be greater than those in Table 2(a).

In order to show the performance in unequally spaced time series data, we generate unequally spaced time series data of 25 and 50 observed time points. For time series data of 25 observed time points, we first generate equally spaced time series data of 40 time points and divide it into three blocks: 15 time points, 10 time points, and 15 time points. We then remove time points in the following manner: no time point is removed in the first block; one of every two time points are removed in the second block; and two of every three time points are removed in the third block. Figure 3 shows the time point schedule of time series data obtained in this process. For time series data of 50 observed time points, we first generate equally space time series data of 80 time points, divide it into three blocks: 30 time points, 20 time points, and 30 time points. Then, some time points are removed in a similar manner. We apply the proposed approach, ENet1, ENet2, G1DBN1, and G1DBN2 to the unequally spaced time series data. Results for the dataset are also summarized in Tables 2(a) and 2(b). From the comparison of results on equally and spaced time series data, the results of the proposed approach from unequally spaced time series data are worse than those from equally spaced one even with the same number of observed time points. However, results of ENet1, ENet2, G1DBN1, and G1DBN2 are worsened more than those of the proposed approach. This is probably because unequally spaced time points break their assumption, and their estimation process is misled.

thumbnailFigure 3. A time point schedule on unequally spaced time series data in the Monte Carlo experiment. Observed points in the time schedule are indicated by arrows. 15 time points are equally spaced in first block, every second point is observed in second block comprised of 5 observed time points, and every third point is observed in third block comprised of 5 observed time points.

We also consider the time series data with the high level noise: system noise with standard deviation 1 and observation noise with standard deviation 1. The results for the case are summarized in Tables 3(a) and 3(b). The proposed approach shows the better performance on the number of true positives and precisions than other approaches except for the identification of the changes on regulations from the equally spaced time series data of 50 time points. For equally spaced time series data of 50 time points, the number of true positives on the changes on regulations estimated by the proposed approach is more than that of G1DBN1, but less than that of ENet1. However, the precisions on the estimated changes by both ENet1 and G1DBN are much worse than the proposed approach. Thus, overall, the proposed approach is more effective than other methods. Although the proposed approach provides the better performance than other methods, the results of all the approaches are worsened due to the high level noise, and the differences on the performance among the approaches get smaller, compared to the case of observation noise with standard deviation 0.1.

Table 3. A summary of results for system noise with standard deviation 1 and observation noise with standard deviation 1

Analysis of time series microarray data from Human small airway epithelial cells

We apply the proposed approach to two time series microarray gene expression data from normal Human small airway epithelial cells (SAECs) and SAECs treated by stimulating EGF-receptors and dosing an anticancer drug termed Gefitinib. EGF-receptors are often overexpressed in lung cancer cells such as tumoral SAECs, and a lung cancer condition is simulated in the treated SAECs by stimulating EGF-receptors. Since Gefitinib is known as a selective inhibiter of EGF-receptors, the stimulation of EGF-receptors would be counteracted by Gefitinib, and the condition of treated SACEs should be the same as that of normal SAECs in theory. However, since some gene expression patterns are different between the two conditions in practice, some unknown effects by Gefitinib may be involved in the phenomenon. Thus, we focus on changed regulations between the gene networks estimated from gene expression data in these two conditions in order to find some insights on the unknown effects of Gefitinib.

For gene set selection, we first screen 500 genes from the ranking of the gene list sorted by coefficient variation [5]. We then select 100 genes with highly varied expression profiles between normal SAECs and treated SAECs. The time series gene expression data from both types of cells are comprised of spaced 14 time points in 48 hours. The time schedule of 14 time points are {0 h, 6 h, 9 h, 12 h, 15 h, 18 h, 21 h, 24 h, 27 h, 30 h, 33 h, 36 h, 39 h, 48 h}. For the analysis on the proposed approach, we set interval on system model to three hours.

The estimated networks from time series gene expression data in normal SAECs and treated SAECs are summarized and given in Figure 4. Black arrows, red arrows, and green arrows indicate regulations in both conditions, only in normal SAECs, and only in treated SAECs, respectively. Table 4 gives a list of the estimated regulations only in normal SAECs or in treated SAEC.

thumbnailFigure 4. An estimated gene network by the proposed approach from time series gene expression data on normal SAECs and treated SAECs. In the estimated network, regulations in both conditions are in black, and regulations only in normal SAECs and treated SAECs are in red and green, respectively.

Table 4. Changes on regulations between normal and treated SAECs

From Table 4, we see that Prss22 is involved in most of the regulations only in treated SAECs. Prss22 is a tryptase, one of serine proteases, and its relationship with the airways is suggested by a report about its expression in the airways in a developmentally regulated manner. Tryptase is a potent mitogen of fibroblast [15], and it is reported that the increase and activation of fibroblast are promoted in lung cells under the condition of interstitial pneumonia. Interstitial pneumonia is known as a side effect of Gefitinib, and these findings suggest that Prss22 is an off-target of Gefitinib and is possibly related to the side effect of Gefitinib. Also, FILIP1L, a gene estimated as one of targets of Prss22 in treated SAECs, is reported as an inhibitor of cell proliferation of fibroblast [16]. This observation supports the relation between Prss22 and fibroblast as well.

We also focus on other several genes related to changes on regulations in normal and treated SAECs. LIF, leukemia inhibitory factor, is known to affect cell growth and development. Gefitinib is also known to be effective for acute myelogenous leukemia via Sky, which is an off-target gene of Gefitinib [17]. In addition, Wang et al. reported that LIF prolongs the cell cycle of stem cells on acute myelogenous leukemia lines [18]. Although, to the best of our knowledge, no direct influence from Gefitinib to LIF is reported, the above facts suggest some relation between Gefitinib and LIF, and support changes on regulations of LIF between normal and treated SAECs.

Heikema et al. reported that Human Siglecs, Siglecs-14, -15, and -16 interact with transmembrain adaptor proteins containing the immunoreceptor tyrosine-based activation motif such as DAP12 [19], and therefore they potentially mediate the activation of intracellular signaling. Gefitinib is a selective inhibitor of tyrosine kinase, and the inhibition of tyrosine kinase is considered to affect the regulations around Siglec-15.

Although the stimulation of EGF-receptors in the treated SAECs is considered to be counteracted by Gefitinib, the expressions of some genes may be affected by the stimulation in practical conditions. HAS3 is related to synthesis of the unbranched glycosaminoglycan hyaluronic acid and is reported to be up-regulated by EGF [20]. The stimulation of EGF-receptors affects the amount of EGF taken into cells, and hence the stimulation is expected to cause the changes on the regulations around HAS3. Foxn2 is a member of family of Fox proteins. Fox proteins are known to play important roles on controlling the expressions of genes related to cell growth, proliferation, and differentiation. Some members of Fox family are related to EFG-receptors, e.g., the expression of Foxn1 is suppressed by EGF-receptor signaling [21] although no direct relation between EGF-receptors and Foxn2 is found. FOXA2 is also a member of family of Fox proteins. EGF-receptor signaling is known to decrease the expression of FOXA, which prevents the mucus production [22]. [23] reported the relation between EGF-receptors and FOXA2 in the airways of asthmatic patients. Thus, it appears that the change on the regulation related to FOXA2 is caused by the stimulation of EGF-receptors.

Conclusions

We proposed the new computational model that is based on VAR-SSM and estimates gene networks from time series data on normal and treated conditions as well as identifies changes regulations by the treatment. Unlike many of existing gene network estimation approaches assuming equally spaced time points, our approach can handle unequally spaced time series data. The efficient use of time series data is achieved by representing the presence of regulations on each condition with hidden binary variables. Since finding the optimal configuration of the hidden binary variables on the proposed model is computationally in tractable, we derive the extended variational annealing method in order to address the problem as the alternative method.

In the Monte Carlo experiments, we use equally and spaced time series data from synthetically generated two regulatory networks whose structures are different in several regulations, and verified the effectiveness of the proposed model in both estimation of regulations and changes on regulations between the two conditions, compared to existing methods.

As the real data application, we use the proposed approach to analyze two time series data from normal SAECs and SAECs treated by stimulating EGF-receptors and dosing Gefitinib. From genes related to changes on regulations by the treatment, we find possible off-target genes of Gefitinib, and one of these genes is suggested to be related to a factor of interstitial pneumonia, which is known as a side effect of Gefitinib. In this study, we consider changes on regulations in two conditions, but the proposed approach can be extended to identifying changes among more than two conditions.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

KK, SI, RY, and SM designed the approach to identify the changes on regulations by the cell treatment to SAECs. KK, SI, and AF contributed to the statistical modeling for the approach, and devised the details of methodologies for estimating the proposed model. MY and NG carried out the microarray experiment for measuring time series the gene expression data on normal and treated SAECs.

Acknowledgements

We thank the anonymous reviewers for their constructive comments and suggestions, which improved the quality of this publication.

This article has been published as part of BMC Genomics Volume 13 Supplement 1, 2012: Selected articles from the Tenth Asia Pacific Bioinformatics Conference (APBC 2012). The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2164/13?issue=S1.

References

  1. Opgen-Rhein R, Strimmer K: From correlation to causation networks: a simple approximate learning algorithm and its application to high-dimensional plant gene expression data.

    BMC Syst Biol 2007, 1:37. PubMed Abstract | BioMed Central Full Text | PubMed Central Full Text OpenURL

  2. Shimamura T, Imoto S, Yamaguchi R, Fujita A, Nagasaki M, Miyano S: Recursive regularization for inferring gene networks from time-course gene expression profiles.

    BMC Syst Biol 2009, 3:41. PubMed Abstract | BioMed Central Full Text | PubMed Central Full Text OpenURL

  3. Fujita A, Sato JR, Kojima K, Gomes LR, Nagasaki M, Sogayar MC, Miyano S: Identification of Granger causality between gene sets.

    Journal of Bioinformatics and Computational Biology 2010, 8:679-701. Publisher Full Text OpenURL

  4. Beal MJ: Variational algorithms for approximate Bayesian inference. PhD thesis. University College London, Gatsby Computational Neuroscience Unit; 2003. OpenURL

  5. Yamaguchi R, Imoto S, Yamauchi M, Nagasaki M, Shimamura T, Hatanaka Y, Ueno K, Higuchi T, Gotoh N, Miyano S: Predicting differences in gene regulatory systems by state space models.

    Genome Inform 2008, 21:101-113. PubMed Abstract OpenURL

  6. Shiraishi Y, Kimura S, Okada M: Inferring cluster-based networks from differently stimulated multiple time-course gene expression data.

    Bioinformatics 2010, 26(8):1073-1081. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  7. Perrin BE, Ralaivola L, Mazurie A, Bottani S, Mallet J, d'Alché Buc F: Gene networks inference using dynamic Bayesian networks.

    Bioinformatics 2003, 19(Suppl 2):ii138-ii148. PubMed Abstract | Publisher Full Text OpenURL

  8. Lèbre S: Inferring dynamic genetic networks with low order independencies.

    Stat Appl Genet Mol Biol 2009, 8:Article 9. PubMed Abstract | Publisher Full Text OpenURL

  9. Bar-Joseph Z, Gerber G, Simon I, Gifford DK, Jaakkola TS: Comparing the continuous representation of time-series expression profiles to identify differentially expressed genes.

    Proc Natl Acad Sci U S A 2003, 100(18):10146-10151. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  10. Kojima K, Yamaguchi R, Imoto S, Yamauchi M, Nagasaki M, Shimamura T, Ueno K, Higuchi T, Gotoh N, Miyano S: A state space representation of VAR models with sparse learning for dynamic gene networks.

    Genome Inform 2010, 22:56-68. PubMed Abstract OpenURL

  11. Yoshida R, West M: Bayesian learning in sparse graphical factor models via variational mean-field annealing.

    J Mach Learn Res 2010, 11:1771-1798. PubMed Abstract | PubMed Central Full Text OpenURL

  12. Rose K: Deterministic annealing for clustering, compression, classication, regression, and related optimization problems.

    Proceedings of the IEEE 1998, 86(11):2210-2239. Publisher Full Text OpenURL

  13. Ueda N: Deterministic annealing EM algorithm.

    Neural Networks 1998, 11(2):271-282. PubMed Abstract | Publisher Full Text OpenURL

  14. Itaya Y, Zen H, Nankaku Y, Miyajima C, Tokuda K, Kitamura T: Deterministic annealing EM algorithm in parameter estimation for acoustic model.

    The 8th International Conference on Spoken Language Processing 2004, 433-436. OpenURL

  15. Ruoss SJ, Hartmann T, Caughey GH: Mast cell tryptase is a mitogen for cultured fibroblasts.

    J Clin Invest 1991, 88(2):493-499. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  16. Kwon M, Hanna E, Lorang D, Quick JS, He M, Adem A, Stevenson C, Chung J, Hewitt SM, Zudaire E, Esposito D, Cuttitta F, Libutti SK: Functional characterization of filamin A interacting protein 1-like, a novel candidate for antivascular cancer therapy.

    Cancer Res 2008, 68(18):7332-7341. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  17. Downing JR: Can treating the SYK cell cure leukemia?

    Cancer Cell 2009, 16(4):270-271. PubMed Abstract | Publisher Full Text OpenURL

  18. Wang C, Lishner M, Minden MD, McCulloch EA: The effects of leukemia inhibitory factor (LIF) on the blast stem cells of acute myeloblastic leukemia.

    Leukemia 1990, 4(8):548-552. PubMed Abstract OpenURL

  19. Heikema AP, Bergman MP, Crocker PR, Gilbert M, Samsom JN, van Wamel WJ, Endtz HP, van Belkum A: Characterization of the specific interaction between sialoadhesin and sialylated Campylobacter jejuni lipooligosaccharides.

    Infect Immun 2010, 78(7):3237-3246. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  20. Pasonen-Seppänen S, Karvinen S, Törrönen K, Hyttinen JM, Jokela T, Lammi MJ, Tammi MI, Tammi R: EGF upregulates, whereas TGF-beta downregulates, the hyaluronan synthases Has2 and Has3 in organotypic keratinocyte cultures: correlations with epidermal proliferation and differentiation.

    J Invest Dermatol 2003, 120(6):1038-1044. PubMed Abstract | Publisher Full Text OpenURL

  21. Mandinova A, Kolev V, Neel V, Hu B, Stonely W, Lieb J, Wu X, Colli C, Han R, Pazin MJ, Ostano P, Dummer R, Brissette JL, Dotto GP: A positive FGFR3/FOXN1 feedback loop underlies benign skin keratosis versus squamous cell carcinoma formation in humans.

    J Clin Invest 2009, 119(10):3127-3137. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  22. Zhen G, Park SW, Nguyenvu LT, Rodriguez MW, Barbeau R, Paquet AC, Erle DJ: IL-13 and epidermal growth factor receptor have critical but distinct roles in epithelial cell mucin production.

    Am J Respir Cell Mol Biol 2007, 36(2):244-253. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  23. Lai HY, Rogers DF: Mucus hypersecretion in asthma: intracellular signaling pathways as targets for pharmacotherapy.

    Curr Opin Allergy Clin Immunol 2010, 10:67-76. PubMed Abstract | Publisher Full Text OpenURL