Email updates

Keep up to date with the latest news and content from BMC Systems Biology and BioMed Central.

This article is part of the supplement: The 2010 International Conference on Bioinformatics and Computational Biology (BIOCOMP 2010): Systems Biology

Open Access Highly Accessed Research article

Bayesian parameter estimation for nonlinear modelling of biological pathways

Omid Ghasemi1, Merry L Lindsey2, Tianyi Yang1, Nguyen Nguyen1, Yufei Huang1* and Yu-Fang Jin1*

Author Affiliations

1 Department of Electrical and Computer Engineering, University of Texas at San Antonio, San Antonio, TX, USA

2 Barshop Institute of Longevity and Aging Studies and the Division of Geriatrics, Gerontology and Palliative Medicine, Department of Medicine, University of Texas Health Science Center at San Antonio, San Antonio, TX, USA

For all author emails, please log on.

BMC Systems Biology 2011, 5(Suppl 3):S9  doi:10.1186/1752-0509-5-S3-S9

The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1752-0509/5/S3/S9


Published:23 December 2011

© 2011 Ghasemi et al.

This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

The availability of temporal measurements on biological experiments has significantly promoted research areas in systems biology. To gain insight into the interaction and regulation of biological systems, mathematical frameworks such as ordinary differential equations have been widely applied to model biological pathways and interpret the temporal data. Hill equations are the preferred formats to represent the reaction rate in differential equation frameworks, due to their simple structures and their capabilities for easy fitting to saturated experimental measurements. However, Hill equations are highly nonlinearly parameterized functions, and parameters in these functions cannot be measured easily. Additionally, because of its high nonlinearity, adaptive parameter estimation algorithms developed for linear parameterized differential equations cannot be applied. Therefore, parameter estimation in nonlinearly parameterized differential equation models for biological pathways is both challenging and rewarding. In this study, we propose a Bayesian parameter estimation algorithm to estimate parameters in nonlinear mathematical models for biological pathways using time series data.

Results

We used the Runge-Kutta method to transform differential equations to difference equations assuming a known structure of the differential equations. This transformation allowed us to generate predictions dependent on previous states and to apply a Bayesian approach, namely, the Markov chain Monte Carlo (MCMC) method. We applied this approach to the biological pathways involved in the left ventricle (LV) response to myocardial infarction (MI) and verified our algorithm by estimating two parameters in a Hill equation embedded in the nonlinear model. We further evaluated our estimation performance with different parameter settings and signal to noise ratios. Our results demonstrated the effectiveness of the algorithm for both linearly and nonlinearly parameterized dynamic systems.

Conclusions

Our proposed Bayesian algorithm successfully estimated parameters in nonlinear mathematical models for biological pathways. This method can be further extended to high order systems and thus provides a useful tool to analyze biological dynamics and extract information using temporal data.

Background

In the past decade, there has been a rapid development in systems biology approaches driven by high-throughput data characterizing regulations of genetic networks, interactions among proteins, and reactions in metabolic pathways. These data usually provide a specific scenario of a biological system which may be compared with an alternative system, for instance, expression levels of biomarkers associated with a disease pattern versus healthy controls. Extending the snapshot-type data to condensed data in a time sequence, which is more suitable for profiling the temporal dynamics, provides insights into the functions and underlying regulating mechanisms of the biological system. To gain these insights, mathematical representations of biological systems established with temporal data are highly desirable.

Establishment of proper mathematical representations requires identification of a suitable model with an adequate framework and structure to determine the parameters in the framework. For the structure identification of a model, extensive research has been carried out and mathematical models have been developed to represent the instantaneous rate of a process as an explicit function of all the state variables x Rn that have a direct influence on the process. In these representations, the rate of change in each variable xi, is determined by the difference between influx and efflux of the variable (equation 1) and each flux vi is approximated by a product of power-law functions as shown in equation 2:

<a onClick="popup('http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M1','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M1">View MathML</a>

(1)

<a onClick="popup('http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M2','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M2">View MathML</a>

(2)

where γi represents the rate constant and ρij represents the kinetic order of vi with respect to xj [1-3]. These approximations have been widely applied to modeling and analysis of biochemical systems for allowing computational simulation of dynamics and parameter estimation for unknown γi and ρij. However, the representations have a low range of accuracy when saturation and cooperativity are represented. To address the reaction rate of enzyme-catalyzed reactions with cooperativity and saturation, a preferred mathematical function is the Hill equation which is described as:

<a onClick="popup('http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M3','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M3">View MathML</a>

(3)

where V denotes the maximal rate, ui denotes the reaction factor, K represents the saturation constant, and H denotes the Hill coefficient. The Hill coefficient H corresponds to the number of binding sites in the molecule that catalyzes the process [4]. Though there is some disagreement regarding how accurate it is to determine the Hill coefficient H by the number of binding sites [5], H is generally assumed to be a known constant that can be estimated from experimental data. However, it remains problematic to determine the parameters of V and K.

For linearly parameterized systems, the least squares method generally gives the optimal estimate of parameters. In addition to the least square approach, an adaptive estimation algorithm serves as a powerful tool to estimate the unknown parameters in ordinary differential equations (ODE) [6-9][10]. For nonlinearly parameterized dynamics, Cao and colleagues have studied the conditions for parameter convergence if the nonlinearly parameterized function satisfies the Lipschitz condition [11]. Qu and colleagues have proposed an adaptive control algorithm for a nonlinearly parameterized system with specific input/control in lieu of requiring the Lipschitz condition with respect to parameters [12]. However, a Hill equation in an ODE does not satisfy Lipschitz condition with respect to the parameter K and, generally, it is difficult to apply a continuous control determined by estimated parameters and states of biological systems due to the lack of real time measurements. While estimating the parameters of the Hill equation in ODEs provides accurate prediction of the reactions, it is very difficult to incorporate the continuous evaluation of the states that is needed to better understand the regulation of the biological system. Additionally, it is even more challenging when there is sparse experimental data in discrete time sequences, as is often the case.

Bayesian approaches have been widely used for machine learning, adaptive filters, information theory, and pattern recognition [13-16][17][18]. Specifically, Markov chain Monte Carlo (MCMC) methods have demonstrated to be a powerful inference tool for complex systems raised in behavioral science and computational biology [19,20][21]. MCMC gains its popularity due to its easy implementation, ability to generate statistically samples from a target high dimensional distribution, good inference performance, and convenience for statistical analysis of results. Therefore, it is very promising to apply MCMC methods to estimate parameters in nonlinearly parameterized dynamics.

The aim of this study is to estimate the unknown parameters θ using a Bayesian approach in nonlinear ODEs representing a biological system as equation (4):

<a onClick="popup('http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M4','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M4">View MathML</a>

(4)

In this representation, x Rn denotes the system's state variables, for instance, the concentrations of biochemical factors, and x0 is the initial state, f(·) is a set of nonlinear functions describing the dynamical property of the biological systems, u(t) ∈ Rl is the systems input denoting concentrations of stimuli, and θ Rp are parameters that characterize dynamic reactions, y Rn represents the observed data subject to a Gaussian white noise ε(t) ~ N(0,σ2), g(·) represents a measurement function and atypical format will be an identical matrix. We assume we have discrete time series of y(t), and u(t) and all parameters in θ are positive.

We applied our Bayesian algorithm to estimate unknown parameters in the biological pathways involved in the left ventricle (LV) response to myocardial infarction, which involves inflammatory and fibrotic components typical of a wound healing response. Macrophages begin to infiltrate the LV at day 3 post-MI and are stimulated by interleukin-10 (IL-10) to release transforming growth factor β (TGF-β). In turn, TGF-β stimulates fibroblasts to secrete extracellular matrix components that are necessary for an adequate scar to form. Estimates of the parameters were close to their true value with considerably small estimatiom errors, particularly with regards to small noise variances.

Methods

The mathematical models represented as ODEs generally lead to continuous solutions, while real observed data are typically discrete in the time domain. To bridge the gap between our mathematical model and the real experimental data, and to predict future samples with available observational data, we first transformed the ODE presentation to difference equations.

Transformation of differential equations to difference equations

With known parameters θ, solutions of equation (4) can be approximated with the fourth-order Runge-Kutta method as follows:

<a onClick="popup('http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M5','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M5">View MathML</a>

(5)

where ti, i = 0,1,,2,⋯ denotes different sampling time points and h denotes a constant interval between ti and ti+1. Thus, the next step of xi+1 is determined by present value xi and the weighted average of 4 incremental. Without loss of generality, the measurement function g(·) is an identical matrix, i.e. yi = xi if there is no measurement noise. The predicted output ys at ti+1 can be obtained with all available yi, ui, and replacement of θ with estimated parameters <a onClick="popup('http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M6','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M6">View MathML</a> as:

<a onClick="popup('http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M7','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M7">View MathML</a>

(6)

Estimations of parameter θ can be obtained by applying a Bayesian approach as follows.

Estimation of parameters using a Bayesian approach

The goal of estimating θ using a Bayesian method is to obtain the posterior distribution p(θ|y), which represents our knowledge about the unknown parameters based on the experimental data y, and it can be expressed as:

<a onClick="popup('http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M8','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M8">View MathML</a>

(7)

where p(θ) is the prior distribution representing our knowledge about the parameter θ prior to observing the experimental data y, p(y|θ) is the likelihood function denoting how likely it is to observe the experimental data set given an estimated θ. Based on the posterior distribution, the unknown parameters θ can be estimated by the minimum mean square error (MMSE) or the maximum a posteriori (MAP) criterions, which estimate θ by the mean or the mode of the posterior distribution p(y|θ), respectively.

However, since the function f(θ, xi, ui, ti) is highly nonlinear, the close form expression of p(y|θ) cannot be obtained analytically, hence neither the Bayesian estimates. We resort to the numerical solutions using MCMC and specifically, the Metropolis-Hasting (M-H) algorithm. The MH algorithm provides a scheme for generating random samples from the desired posterior distribution, even though its close form is not available. These random samples can be used with ease to approximate the posterior distribution and calculate various estimates of the unknowns.

The MH algorithm is an iterative algorithm and the steps of the proposed algorithm for model (5) at the (i+1)th iteration can be summarized as the following:

1) Given the parameter sample θi obtained in the ith iteration;

2) Draw θfrom the proposal distribution q(θ|θi) as a proposed sample;

3) Calculate the ratio:

<a onClick="popup('http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M9','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M9">View MathML</a>

(8)

4) Draw a random sample U[0,1] and assign the (i+1)th sample as:

<a onClick="popup('http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M10','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M10">View MathML</a>

(9)

where λ = min{1,α}.

With the assumption that all parameters are positive, the proposal distribution to generate θis chosen as a Gamma distribution expressed as:

<a onClick="popup('http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M11','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M11">View MathML</a>

(10)

Accordingly, the proposal distributions q(θ|θi) and q(θi|θ) can be written as:

<a onClick="popup('http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M12','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M12">View MathML</a>

(11)

and:

<a onClick="popup('http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M13','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M13">View MathML</a>

(12)

The second fraction in equation (8) becomes:

<a onClick="popup('http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M14','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M14">View MathML</a>

(13)

In a real application, there are unavoidable statistical and model noise, which is modeled in this case by the i.i.d. Gaussian distribution with the unknown noise variance σ2. Therefore, in equation (8), p(y|θ) is the marginal likelihood by integrating the noise variance σ2 from the complete likelihood function p(y|θ,σ2), i.e.,

<a onClick="popup('http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M15','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M15">View MathML</a>

(14)

where p(σ2) is the prior distribution taken to be the conjugate Inverse Gamma (IG) distribution (IG(η2, β2)) as:

<a onClick="popup('http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M16','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M16">View MathML</a>

(15)

and p(y|θ, σ) is the product of a series of independent Gaussian distributions. Both of them have the form <a onClick="popup('http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M17','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M17">View MathML</a>, where y is the experimental data and ys can be computed using the classical Runge-Kutta method as shown in equation (6). In this relation, given the total number of the observations being m, the integration (14) can be expressed as:

<a onClick="popup('http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M18','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M18">View MathML</a>

(16)

With the definition of model error as <a onClick="popup('http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M19','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M19">View MathML</a>, we can have

<a onClick="popup('http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M20','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M20">View MathML</a>

(17)

Now, define new shape and scale factors of the IG function by <a onClick="popup('http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M21','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M21">View MathML</a> &<a onClick="popup('http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M22','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M22">View MathML</a> and substitute them in equation (16), we obtain the following equation

<a onClick="popup('http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M23','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M23">View MathML</a>

(18)

Define <a onClick="popup('http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M24','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M24">View MathML</a>, <a onClick="popup('http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M25','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M25">View MathML</a> &σ2 = τ then the integral in equation can be rewritten as:

<a onClick="popup('http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M26','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M26">View MathML</a>

(19)

which is the integral of an IG-type function. Therefore, the expression of the marginal likelihood function can be expressed as:

<a onClick="popup('http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M27','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M27">View MathML</a>

(20)

Substituting equations (13) and (20) into equation (8) results in:

<a onClick="popup('http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M28','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M28">View MathML</a>

i.e.

<a onClick="popup('http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M29','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M29">View MathML</a>

(21)

The above proposed MH algorithm will be run for many iterations until convergence and the samples obtained after convergence are considered samples from the posterior distribution p(θ|y). The span of iterations until convergence is referred to as the burn-in period. Suppose that N converged samples are collected after the burn-in. Then the Bayesian MMSE estimate can be calculated as the mean of the N samples. The confidence of the estimation can be also evaluated with these samples.

Results

A first order ODE equation was employed in this study to estimate the unknown parameters in a nonlinear mathematical model. This ODE was originally established to describe temporal profiles of TGF-β post-MI. After MI, the major sources of TGF-β include activated macrophages and fibroblasts. IL-10 stimulates macrophages to secrete TGF-β and its stimulation effect can be approximated as a Hill equation. Since we are initially interested in the macrophage related function at the early stage and will incorporate the effect of fibroblasts at the later stage, we established the mathematical model as follows:

<a onClick="popup('http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M30','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1752-0509/5/S3/S9/mathml/M30">View MathML</a>

(22)

Where x denotes the concentration of TGF-β, Mϕ denotes the macrophage cell density, u(t) denotes the concentration of IL-10 post-MI, parameter a denotes the degradation rate of TGF-β, the maximum activation rate of macrophages by IL-10 and the secretion rate of macrophages for TGF-β are combined together and denoted as the maximum reaction rate as V, and the saturation rate for macrophage activation is denoted as K. The temporal profiles of IL-10 and macrophage infiltration post-MI are determined by the published experimental results in C57 mice [21,21,21,21]. In our computational simulation, parameter a = 15 is calculated by the half-life data [21], x0 = 0.21 is the concentration of TGF-β measured in healthy adult mice before MI. Both V and K are the unknown parameters to be estimated.

A stationary Markov chain was generated by following the proposed MH algorithm. Only samples after the burn-in are retained. Since no prior information about the parameters is available, a flat gamma distribution is chosen for the prior distribution of θ. The values of scale parameter for both linear and nonlinear parameter (V and K respectively) is 2 (η = 2); for the shape parameter (β), two different values were chosen for linear and nonlinear parameters. This is due to the different range of values that each of them is covering. So for the linear parameter as V ∈ [0.01, 10], a shape parameter of β = 4 is selected. On the other hand, when K ∈ [1 1000], a proper parameter factor would be β = 2000. The proposed distribution for p(θ|θi) and q(θi;|θ) follows the Gamma distribution defined by the same parameters as mentioned earlier (equations 11& 12). The variance of the additive noises follows an Inverse-Gamma distribution defined by parameters η2 = 2, and β2 = 10, which are chosen to model the non-informative prior knowledge about the variance. All simulations were run for 2000 iterations and the first 1500 were considered burn-in and removed. The 500 samples after the burn-in of each run were averaged as an MMSE estimate.

We have simulated three situations: 1) Estimate parameter V with known parameter K; this allows us to evaluate the performance of linear parameterized system using the Bayesian approach. 2) Estimate parameter K given a known V; this allows us to evaluate the performance of estimating a single parameter in nonlinearly parameterized dynamics. 3) Estimate both V and K using the proposed Bayesian approach. To mimic the real experimental data, we sampled our computed state at 500 time points. The temporal profiles of macrophage cell density, IL-10 concentrations, TGF-β concentrations, and sampled TGF-β (500 samples over 20 days) in the time sequence were shown in Figure 1.

thumbnailFigure 1. Temporal profiles of macrophage numbers, IL-10 concentrations, and TGF-β concentrations post myocardial infarction establish the nonlinear dynamic patterns of the MI response. A: Temporal profile of macrophage infiltration (cell numbers/mm2) post myocardial infarction reconstructed from experimental data in C57 mice [21]. B: Temporal profile of IL-10 concentrations (pg/ml) post myocardial infarction reconstructed from experimental data in C57 mice [19,20]. C: Temporal profile of TGF-β computed based on the nonlinear dynamics equation (24). D: Sampled TGF-β profile in part C to represent the sparse experimental data on TGF-β concentrations.

Estimate parameters in linear parameterized system

In a Hill-equation, the parameter V is linearly parameterized. In the first set of our simulations, we set K = 2 and estimate the parameter V with the temporal data. The nominal value of V is 5, and the estimated V using MCMC ranges from 4.9247 to 5.0045. The performance of the estimation algorithm can also be evaluated by examining the mean squared error (RMSE) of V with respect to the variance σ2 of the noises as shown in Figure 2. RMSE of V increases monotonically as variance σ2 increases but remains in a very small region. RMSE of V was calculated as 0.017 while the variance σ was increased to 1, suggesting that the estimate of V remains accurate when signal to noise ratio gets lower. This performance demonstrated that MCMC worked very well for linearly parameterized dynamics.

thumbnailFigure 2. Performance evaluation for linear parameter estimation. A: Root mean square error of parameter V given parameter K = 2 was plotted while variance of the noise increased from 0.01 to 100 with both MCMC and least square. The true value of V was set as 5. Root mean square error monotonically increased as variance increased. The root mean square error was calculated as 0.0331 and 0.0256 as variance of noise increased to 100 for MCMC and least square, respectively. This demonstrated that the linear parameter estimation performed within the recommended range. B: the boxplot for the estimation of V were shown for both least square and MCMC methods as the noise variances increased from 0 to 100. The number of outliers is significantly higher for MCMC comparing to least square.

At the same time, the performance of MCMC was compared with least square algorithm. As parameter V to be estimated is a linear parameter, this comparison will give us a good idea about the performance of MCMC. It is expected that the least square gives the best results in estimating the linear parameter with the presence of noise which is shown in Figure 2. There exist large differences between the least square and MCMC algorithms when the variance of noise (σ2) is small. As the variance of noise increases, the error difference decreases. However it should be mentioned that the outliers in MCMC estimation is significantly more than least square. Same as MCMC estimation, the nominal value of V is 5, and the estimated V using least square ranges from 4.9308 to 5.0561.

Estimate parameters in nonlinearly parameterized dynamics

To verify our algorithm for nonlinearly parameterized dynamics, we estimated parameter K assuming V available and parameters V and K when both are unknown.

When a nominal value of K is set as 5000, we ran 6 groups of simulations according to 6 different parameter settings for V (V = 0.01,0.1, 0.5, 1, 5 and 10). Output of each group was subject to white noises with different variances ranging from σ2 = 0 to σ2 = 10. We repeated such simulation at K = 1, 10, 50, 100, 500, 1000, 5000 & 10000, respectively, and showed our RMSE error of estimated values of V subject to different variances while V = 1 and V = 10 in Table 1. To illustrate the accuracy of the estimation, we also depicted the RMSE of estimated K with different setting of V and variances in Figure 3. Our results demonstrated that the estimated parameter error with the proposed algorithm decreases by increasing the values of V. It was also shown that by increasing the value of K, the parameter can be estimated with less error.

Table 1. Estimated values of parameter K subject to different noise variances.

thumbnailFigure 3. Performance evaluation for nonlinearly parameterized situation with known V. Root mean square error of parameter K (A: K = 5000, and B: K = 10000) was plotted with respect to different noise variances ranging from 0.01 to 10 and different values of V. Colors of the curves denote different parameter settings of parameter V. (Blue: V = 0.01, Red: V = 0.1, Green V = 0.5, Cyan: V = 1, Magenta: V = 5, and Black: V = 10).

In case that both V and K are unknown, we also ran twenty different settings of the parameters and verified the error of the estimation. We verified the cases while the true value of K was 5000 and 10000, and true value of V was 0.01, 0.1, 0.5, 1, 5, and 10, and while the true value of V was set as 1 and 10, and true value of K was set as 1, 10, 50, 100, 500, 1000, 5000 and 10000. Again, our algorithm generated estimates close to the nominal values and the RMSE of different parameter V when K = 5000 was shown in Figure 4(A), RMSE of different parameter V while K = 10000 were shown in Figure 4(B), RMSE of different parameter K while V = 1 was shown in Figure 4(C), and RMSE of different parameter K while V = 10 was shown in Figure 4(D), respectively. It can be seen from our results in Figure (4A and 4B) that the error of estimated parameter decreases when the values of V increased. It was also demonstrated that when K increases, a better estimation of this parameter can be achieved. It can be seen in Figure (4C and 4D) that when parameter K increases, K can be estimated with less error for any particular value of linear parameter V. There is not a huge difference between the RMSE of the estimated parameter when increasing V from 1 to 10. In general, RMSE of parameters increased as variances of the noises increased, and RMSE of parameter K was greater than RMSE of parameter V.

thumbnailFigure 4. Performance evaluation for nonlinearly parameterized situation with unknown V and K. Root mean square error of parameter K was plotted with respect to different noise variances ranging from 0.01 to 1 and different values of V in A and B. (A: true value of K = 5000, and B: True value of K = 10000). In subfigures A and B, colors of the curves denote different parameter settings of parameter V (Blue: V = 0.01, Red: V = 0.1, Green V = 0.5, Cyan: V = 1, Magenta: V = 5, Black: V = 10). Root mean square error of parameter V was plotted with respect to different noise variances ranging from 0.01 to 1 and different values of K in C and D (C: true value of V = 1, and B: True value of V = 10). In subfigures C and D, colors of the curves denote different parameter settings of parameter K (Blue: K = 1, Red: K = 10, Green K = 50, Cyan: K = 100, Magenta K = 500, Black K = 1000, Dark Green K = 5000, Brown K = 10000).

Discussion

This study is the first investigation to estimate unknown parameters in nonlinearly parameterized biological dynamics using MCMC algorithm. We have applied a Bayesian approach to estimate two unknown parameters in an ODE model describing the temporal profiles of TGF-β in the post-MI setting. Our computational results have demonstrated the effectiveness of the Bayesian approach for parameter estimation in a nonlinear model for biological pathways. As such, this study provides a valid estimation approach for nonlinear dynamics of biological pathways. The most important contributions of this study are listed as follows: 1) The new proposed method bridges the gap between the sparse observational data and the need for continuous signals embedded in mathematical models. Therefore, parameters estimated on the basis of experimental data have clear biological meaning in the mathematical models. 2) The introduction of additive noises and measurement functions reflect real scenarios in biological experiments, therefore, giving more confidence to the parameter estimation real world in applications. 3) A new MCMC algorithm is proposed to estimate parameters in general nonlinearly-parameterized dynamics. Our results demonstrated good performance in estimating two parameters of an ODE with a Hill equation. Together, this new method will have widespread applicability to many biological systems, not limited to investigations on cardiovascular disease.

In this study, our key task is defined as parameter estimation for a nonlinearly parameterized mathematical model for biological pathways. As there exist different representations of mathematical models such as linearized models and power law functions, it is possible to approximate nonlinearly parameterized dynamics by linearly parameterized dynamics [21], which would significantly reduce the difficulty for parameter estimation. However, it is worth noting that transformations to a linearized model and power law modelonly guarantee 1) the transformed models are identical to the original Hill representation at the operating point u0; and 2) the transformed model have the identical first-order derivatives at the operating point u0 as the original Hill representations. Therefore, both linear and power low approximations hold locally in a small vicinity of the operating point u0. When the variable, u, in Hill representation has large variations, in reality, these transformations may lead to huge modeling errors. Though these transformations will greatly reduce the complexity of parameter estimation, they cannot provide accurate estimation. This emphasizes the necessity of parameter estimation in nonlinearly parameterized dynamics directly and our proposed MCMC algorithm is a response to this need.

While we illustrated the effectiveness of the algorithm with a first order ODE model, the algorithm can be expanded to estimate more parameters with higher order ODE models for more complicated systems. In that case, convergence of the estimates and convergence speed of the algorithm should be further studied. Additionally, the measurement function we used in this study is an identical matrix, this identical matrix can be relaxed by an observable function where all states x can be reconstructed by the output y.

In this study, we proposed flat Gamma distributions as the proposal distributions in the MH algorithm. Although they lead to implementations with relatively slow convergence of Markov chains, the algorithm can still produce very robust estimation results. Selection of better proposal distributions that will lead to faster convergence, thus more efficient implementation of the algorithm will be a focus of our future study. In addition, we employed real experimental data in this study to estimate the effects of IL-10 on TGF-β concentrations in left ventricle post-MI and our measurement equation includes additive noises to simulate the real biological systems. However, we are well aware that the structure of the model is simplified and there exist modeling errors embedded in the structure of the mathematical model. These modeling errors will likely lead to estimation errors of the parameters. We can minimize the modeling error with the accumulation of more biological knowledge. Though it is beyond the scope of the current paper, further investigation on modeling structure using real in vivo experimental results has been planned for our future research.

Conclusions

In conclusion, we have proposed an algorithm which combines the transformation from ODEs to difference expressions and a Bayesian algorithm to estimate multiple parameters in a nonlinear mathematical model for biological systems using discrete observational experimental data. Estimates of the parameters were close to their true values with considerably small estimation errors, particularly with regard to small noise variances. This proposed estimation algorithm provides a powerful tool to analyze time series data and better understand the interactions among biological pathways.

Authors' contributions

YFJ, and YH designed the research; OG, TY and NN performed computational simulation. OG, TY, MLL, YH., and YFJ analyzed the results and wrote the manuscript. All authors have read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Acknowledgements

The authors acknowledge grant and contract support from NHLBIHHSN268201000036C (N01-HV-00244), NIH R01 HL75360, Veteran's Administration Merit Award, and the Max and Minnie Tomerlin Voelker Fund (to MLL), NSF 0546345 and Qatar National Research Fund (09-874-3-235) to YH, and N.I.H 1R03EB009496, NIHSC2HL101430, NSF 0649172, and AT&T foundation (to YFJ).

This article has been published as part of BMC Systems Biology Volume 5 Supplement 3, 2011: BIOCOMP 2010 - The 2010 International Conference on Bioinformatics & Computational Biology: Systems Biology. The full contents of the supplement are available online at http://www.biomedcentral.com/1752-0509/5?issue=S3.

References

  1. Ignotz RA, Massagué J: Transforming growth factor-beta stimulates the expression of fibronectin and collagen and their incorporation into the extracellular matrix.

    J Biol Chem 1986, 261:4337-4345. PubMed Abstract | Publisher Full Text OpenURL

  2. Leroy EC: Increased collagen synthesis by scleroderma skin fibroblasts in vitro: a possible defect in the regulation or activation of the scleroderma fibroblast.

    J Clin Invest 1974, 54:880-889. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  3. Wu CH, Donovan CB, Wu GY: Evidence for pretranslational regulation of collagen synthesis by procollagen propeptides.

    J Biol Chem 1986, 261:10482-10484. PubMed Abstract | Publisher Full Text OpenURL

  4. Hill AV: The possible effects of the aggregation of the molecules of hamoglobin on its dissociation curves.

    J Physiol 1910, 40:IV-VII. OpenURL

  5. Muñoz-Alicea R, Negrón-Marrero PV, Marcano-Velázquez M: A mathematical model for macrophage, T-cell, and mycobacterium tuberculosis interactions. In A Mathematical Model for Macrophage, T-cell, and Mycobacterium Tuberculosis Interactions. University of Puerto Rico; 1999. OpenURL

  6. Borg TK, Markwald R: Periostin: more than just an adhesion molecule.

    Circ Res 2007, 101:230-231. PubMed Abstract | Publisher Full Text OpenURL

  7. Jin Y, Lindsey M: Stability analysis of genetic regulatory network with additive noises.

    BMC Genomics 2008, 9(Suppl 1):S21. PubMed Abstract | BioMed Central Full Text | PubMed Central Full Text OpenURL

  8. Ioannou PA, Sun J: Robust Adaptive Control. Upper Saddle River, NJ: Prentice-Hall; 1996. OpenURL

  9. Sastry SS, Bodson M: Adaptive Control: Stability, Convergence, and Robustness. Upper Saddle, NJ: Prentice Hall; 1989. OpenURL

  10. Naugle JEOE, Zhang X, Mase SE, Pilati CF, Maron MB, Folkesson HG, Horne WI, Doane KJ, Meszaros JG: Type VI collagen induces cardiac myofibroblast differentiation: implications for postinfarction remodeling.

    Am J Physiol Heart Circ Physiol 2006, 290:H323-H330. PubMed Abstract | Publisher Full Text OpenURL

  11. Zamilpa R, Lopez EF, Chiao YA, Dai Q, Escobar GP, Hakala K, Weintraub ST, Lindsey ML: Proteomic analysis identifies in vivo candidate matrix metalloproteinase-9 substrates in the left ventricle post-myocardial infarction.

    Proteomics 2010, 10:2214-2223. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  12. Gao X-M, Xu Q, Kiriazis H, Dart AM, Du X-J: Mouse model of post-infarct ventricular rupture: time course, strain- and gender-dependency, tensile strength, and histopathology.

    Cardiovasc Res 2005, 65:469-477. PubMed Abstract | Publisher Full Text OpenURL

  13. Rice JA: Mathrmatical Statistics and Data Analysis. Cengage Learning; 2007. OpenURL

  14. Xiao-Ming G, Ziqiu M, Yidan S, Lu F, Helen K, Qi X, Anthony MD, Xiao-Jun D: Infarct size and post-infarct inflammation determine the risk of cardiac rupture in mice.

    Int J Cardiol 2010, 143:20-28. PubMed Abstract | Publisher Full Text OpenURL

  15. Sumitra M, Manikandan P, Nayeem M, Manohar BM, Lokanadam B, Vairamuthu S, Subramaniam S, Puvanakrishnan R: Time course studies on the initiation of complement activation in acute myocardial infarction induced by coronary artery ligation in rats.

    Mol Cell Biochem 2005, 268:149-158. PubMed Abstract | Publisher Full Text OpenURL

  16. Kleinbaum DG, Kupper LL, Nizam A, Muller KE: Applied Regression Analysis and Other Multivariable Methods. 4th edition. Thomson Higher Education; 2008. OpenURL

  17. Cody R: Learning SAS by Examples: A Programmer's Guide. SAS Publishing; 2007. OpenURL

  18. Sun Y, Zhang JQ, Zhang J, Lamparter S: Cardiac remodeling by fibrous tissue after infarction in rats.

    J Lab Clin Med 2000, 135:316-323. PubMed Abstract | Publisher Full Text OpenURL

  19. Vandervelde S, van Luyn MJA, Rozenbaum MH, Petersen AH, Tio RA, Harmsen MC: Stem cell-related cardiac gene expression early after murine myocardial infarction.

    Cardiovasc Res 2007, 73:783-793. PubMed Abstract | Publisher Full Text OpenURL

  20. Krishnamurthy P, Rajasingh J, Lambers E, Qin G, Losordo DW, Kishore R: IL-10 inhibits inflammation and attenuates left ventricular remodeling after myocardial infarction via activation of STAT3 and suppression of HuR.

    Circ Res 2009, 104:e9-18. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  21. Yang F, Liu YH, Yang XP, Xu J, Kapke A, Carretero OA: Myocardial infarction and cardiac remodelling in mice.

    Exp Physiol 2002, 87:547-555. PubMed Abstract | Publisher Full Text OpenURL

  22. Zhang H, Ahmad M, Gronowicz G: Effects of transforming growth factor-beta 1 (TGF-[beta]1) on in vitro mineralization of human osteoblasts on implant materials.

    Biomaterials 2003, 24:2013-2020. PubMed Abstract | Publisher Full Text OpenURL

  23. Loftis MJ, Sexton D, Carver W: Effects of collagen density on cardiac fibroblast behavior and gene expression.

    J Cell Physiol 2003, 196:504-511. PubMed Abstract | Publisher Full Text OpenURL