Department of Electrical and Computer Engineering, University of Texas at San Antonio, San Antonio, TX, USA

Barshop Institute of Longevity and Aging Studies and the Division of Geriatrics, Gerontology and Palliative Medicine, Department of Medicine, University of Texas Health Science Center at San Antonio, San Antonio, TX, USA

Abstract

Background

The availability of temporal measurements from biological experiments has significantly advanced research in systems biology. To gain insight into the interaction and regulation of biological systems, mathematical frameworks such as ordinary differential equations have been widely applied to model biological pathways and interpret temporal data. Hill equations are a preferred format for representing reaction rates in differential equation frameworks because of their simple structure and their ability to fit saturating experimental measurements easily. However, Hill equations are highly nonlinearly parameterized functions, and their parameters cannot be measured easily. Additionally, because of this high nonlinearity, adaptive parameter estimation algorithms developed for linearly parameterized differential equations cannot be applied. Therefore, parameter estimation in nonlinearly parameterized differential equation models of biological pathways is both challenging and rewarding. In this study, we propose a Bayesian algorithm to estimate parameters in nonlinear mathematical models of biological pathways using time series data.

Results

We used the Runge-Kutta method to transform differential equations into difference equations, assuming the structure of the differential equations is known. This transformation allowed us to generate predictions that depend on previous states and to apply a Bayesian approach, namely the Markov chain Monte Carlo (MCMC) method. We applied this approach to the biological pathways involved in the left ventricle (LV) response to myocardial infarction (MI) and verified our algorithm by estimating two parameters in a Hill equation embedded in the nonlinear model. We further evaluated the estimation performance under different parameter settings and signal-to-noise ratios. Our results demonstrated the effectiveness of the algorithm for both linearly and nonlinearly parameterized dynamic systems.

Conclusions

Our proposed Bayesian algorithm successfully estimated parameters in nonlinear mathematical models of biological pathways. The method can be extended to higher-order systems and thus provides a useful tool for analyzing biological dynamics and extracting information from temporal data.

Background

In the past decade, systems biology approaches have developed rapidly, driven by high-throughput data characterizing the regulation of genetic networks, interactions among proteins, and reactions in metabolic pathways. These data usually capture a specific state of a biological system that can be compared with an alternative state, for instance, expression levels of biomarkers associated with a disease versus healthy controls. Extending such snapshot data to data collected as a time sequence, which is better suited to profiling temporal dynamics, provides insight into the functions and underlying regulatory mechanisms of the biological system. To gain this insight, mathematical representations of biological systems built from temporal data are highly desirable.

Establishing a proper mathematical representation requires identifying a suitable model, with an adequate framework and structure, and determining the parameters in that framework. For structure identification, extensive research has been carried out, and mathematical models have been developed to represent the instantaneous rate of a process as an explicit function of all the state variables $\mathbf{x} \in \mathbb{R}^{n}$ …

where …

where …

For linearly parameterized systems, the least-squares method generally gives the optimal parameter estimate. In addition to least squares, adaptive estimation algorithms serve as powerful tools for estimating unknown parameters in ordinary differential equations (ODEs).

Bayesian approaches have been widely used for machine learning, adaptive filters, information theory, and pattern recognition

The aim of this study is to use a Bayesian approach to estimate the unknown parameters $\boldsymbol{\theta}$ in nonlinear ODEs representing a biological system, as in equation (4):

In this representation, $\mathbf{x} \in \mathbb{R}^{n}$ …, $\mathbf{x}_{0}$ is the initial state, $\mathbf{y} \in \mathbb{R}^{l}$ …, and $\boldsymbol{\theta} \in$ …

We applied our Bayesian algorithm to estimate unknown parameters in the biological pathways involved in the left ventricle (LV) response to myocardial infarction (MI), which involves inflammatory and fibrotic components typical of a wound healing response. Macrophages begin to infiltrate the LV at day 3 post-MI and are stimulated by interleukin-10 (IL-10) to release transforming growth factor β (TGF-β). In turn, TGF-β stimulates fibroblasts to secrete the extracellular matrix components necessary for an adequate scar to form. The parameter estimates were close to their true values, with small estimation errors, particularly when the noise variance was small.

Methods

Mathematical models represented as ODEs generally yield continuous solutions, whereas real observed data are typically discrete in time. To bridge the gap between the mathematical model and the experimental data, and to predict future samples from available observations, we first transformed the ODE representation into difference equations.

Transformation of differential equations to difference equations

With known parameters $\boldsymbol{\theta}$, solutions of equation (4) can be approximated with the fourth-order Runge-Kutta method as follows:

where … . Thus, the next step $\mathbf{x}_{i+1}$ is determined by the present value $\mathbf{x}_{i}$ …; $\mathbf{x}_{i+1}$ can be obtained with all available $\mathbf{x}_{i}$ … $\boldsymbol{\theta}$ with estimated parameters …
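The transformation can be illustrated with a minimal Python sketch. It assumes a right-hand side with the signature `f(t, x, theta)`, which is our own hypothetical choice. Time is included so that time-varying inputs, such as macrophage or IL-10 profiles, could enter the model.

- `rk4_step` maps $\mathbf{x}_i$ to $\mathbf{x}_{i+1}$.
- `one_step_predictions` applies that map to each observed sample. This is one way to generate predictions that depend on previous states when the measurement function is the identity.

This is not the authors' code.

```python
import numpy as np


def rk4_step(f, t, x, theta, h):
    """Map x_i (at time t) to x_{i+1} (at time t + h) with one classical RK4 step."""
    k1 = f(t, x, theta)
    k2 = f(t + 0.5 * h, x + 0.5 * h * k1, theta)
    k3 = f(t + 0.5 * h, x + 0.5 * h * k2, theta)
    k4 = f(t + h, x + h * k3, theta)
    return x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def simulate(f, x0, theta, t0, h, n_steps):
    """Iterate the difference equation from x0; returns n_steps + 1 states."""
    xs = [np.asarray(x0, dtype=float)]
    for i in range(n_steps):
        xs.append(rk4_step(f, t0 + i * h, xs[-1], theta, h))
    return np.array(xs)


def one_step_predictions(f, y, theta, t0, h):
    """Predict sample i+1 from observed sample i (identity measurement assumed)."""
    y = np.asarray(y, dtype=float)
    return np.array([rk4_step(f, t0 + i * h, y[i], theta, h)
                     for i in range(len(y) - 1)])
```

Under these assumptions, comparing `one_step_predictions(f, y, theta, t0, h)` with `y[1:]` gives the residuals that enter the likelihood.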

Estimates of the parameters $\boldsymbol{\theta}$ can then be obtained with a Bayesian approach, as follows.

Estimation of parameters using a Bayesian approach

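The goal of estimating $\boldsymbol{\theta}$ with a Bayesian method is to obtain the posterior distribution

$$p(\boldsymbol{\theta} \mid \mathbf{y}) \propto p(\mathbf{y} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta}),$$

where $p(\boldsymbol{\theta})$ is the …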

However, since the function … $p(\mathbf{y} \mid \boldsymbol{\theta})$ …

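The Metropolis-Hastings (MH) algorithm is iterative. The steps of the proposed algorithm for model (5) at the $(i+1)$-th iteration are as follows:

1) Given the parameter sample $\boldsymbol{\theta}^{i}$ obtained in the $i$-th iteration;

2) Draw $\boldsymbol{\theta}^{\star}$ from the proposal distribution $q(\boldsymbol{\theta}^{\star} \mid \boldsymbol{\theta}^{i})$ as a proposed sample;

3) Calculate the ratio:

$$\alpha(\boldsymbol{\theta}^{i}, \boldsymbol{\theta}^{\star}) = \min\left\{1,\ \frac{p(\mathbf{y} \mid \boldsymbol{\theta}^{\star})\, p(\boldsymbol{\theta}^{\star})}{p(\mathbf{y} \mid \boldsymbol{\theta}^{i})\, p(\boldsymbol{\theta}^{i})} \cdot \frac{q(\boldsymbol{\theta}^{i} \mid \boldsymbol{\theta}^{\star})}{q(\boldsymbol{\theta}^{\star} \mid \boldsymbol{\theta}^{i})}\right\} \tag{8}$$

4) Draw a random sample $u$ and set the $(i+1)$-th sample as

$$\boldsymbol{\theta}^{i+1} = \begin{cases} \boldsymbol{\theta}^{\star}, & u \le \alpha(\boldsymbol{\theta}^{i}, \boldsymbol{\theta}^{\star}) \\ \boldsymbol{\theta}^{i}, & \text{otherwise} \end{cases}$$

where $u$ is drawn from the uniform distribution $\mathcal{U}(0, 1)$.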
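Steps 1) to 4) can be sketched as a single Python function. This is an illustrative sketch, not the authors' implementation:

- The names `mh_step`, `log_target`, `propose`, and `log_q` are hypothetical.
- The ratio in equation (8) is evaluated in log space. This avoids numerical underflow when the likelihood involves hundreds of samples.

```python
import math
import random


def mh_step(theta_i, log_target, propose, log_q, rng=random):
    """One Metropolis-Hastings iteration (steps 1 to 4).

    log_target(theta): log p(y | theta) + log p(theta), up to an additive constant.
    propose(theta, rng): draws theta_star ~ q(. | theta).
    log_q(a, b): log q(a | b), the log proposal density of a given b.
    Returns (next_sample, accepted_flag).
    """
    theta_star = propose(theta_i, rng)                     # step 2: proposed sample
    log_r = (log_target(theta_star) - log_target(theta_i)  # log of first fraction in (8)
             + log_q(theta_i, theta_star)                  # log of second fraction in (8)
             - log_q(theta_star, theta_i))
    alpha = 1.0 if log_r >= 0.0 else math.exp(log_r)       # alpha = min(1, r)
    u = rng.random()                                       # step 4: u ~ U(0, 1)
    if u < alpha:
        return theta_star, True
    return theta_i, False
```

Repeated calls produce the Markov chain, and the returned flags can be used to monitor the acceptance rate.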

Assuming that all parameters are positive, we choose a Gamma distribution as the proposal distribution for generating $\boldsymbol{\theta}^{\star}$:

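Accordingly, the proposal distributions $q(\boldsymbol{\theta}^{\star} \mid \boldsymbol{\theta}^{i})$ and $q(\boldsymbol{\theta}^{i} \mid \boldsymbol{\theta}^{\star})$ can be written as: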

and:

The second fraction in equation (8) becomes:
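The exact Gamma parameterization is not recoverable here, so the sketch below assumes one common choice. Each component of $\boldsymbol{\theta}^{\star}$ is drawn independently from a Gamma distribution with a fixed shape `a` and scale $\theta^{i}/a$, so that the proposal mean equals the current sample.

Because this proposal is not symmetric, the second fraction of equation (8) does not cancel. The hypothetical helper `log_hastings_correction` returns its logarithm.

```python
import math
import random


def log_gamma_pdf(x, shape, scale):
    """Log density of Gamma(shape, scale) evaluated at x > 0."""
    return ((shape - 1.0) * math.log(x) - x / scale
            - shape * math.log(scale) - math.lgamma(shape))


def propose_gamma(theta_i, a, rng=random):
    """Draw each component from Gamma(shape=a, scale=theta/a); its mean is theta."""
    return [rng.gammavariate(a, t / a) for t in theta_i]


def log_q(theta_to, theta_from, a):
    """log q(theta_to | theta_from) for independent Gamma components."""
    return sum(log_gamma_pdf(x, a, c / a) for x, c in zip(theta_to, theta_from))


def log_hastings_correction(theta_i, theta_star, a):
    """Log of the second fraction in (8): log q(theta_i | theta*) - log q(theta* | theta_i)."""
    return log_q(theta_i, theta_star, a) - log_q(theta_star, theta_i, a)
```

Under this choice, a larger `a` gives smaller relative steps (coefficient of variation $1/\sqrt{a}$), trading a higher acceptance rate against slower exploration.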

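In real applications, statistical and modeling noise are unavoidable; here they are modeled as i.i.d. Gaussian noise with unknown variance $\sigma^{2}$. Therefore, in equation (8), $p(\mathbf{y} \mid \boldsymbol{\theta})$ is obtained by integrating out the noise variance:

$$p(\mathbf{y} \mid \boldsymbol{\theta}) = \int_{0}^{\infty} p(\mathbf{y} \mid \boldsymbol{\theta}, \sigma^{2})\, p(\sigma^{2})\, d\sigma^{2}$$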

where $p(\sigma^{2})$ is the prior distribution, taken to be the conjugate Inverse Gamma (IG) distribution, as:

and $p(\mathbf{y} \mid \boldsymbol{\theta}, \sigma^{2})$ …

With the model error defined as

Now, define new shape and scale factors of the IG function by

Define ^{2 }=

which is the integral of an IG-type function. Therefore, the marginal likelihood can be expressed as:
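The integral can be evaluated in closed form under the following assumptions. The symbols $a_0$, $b_0$, $N$, and $\mathbf{e}$ are our own.

- The residuals $\mathbf{e} = \mathbf{y} - \hat{\mathbf{y}}(\boldsymbol{\theta})$ over $N$ samples are i.i.d. $\mathcal{N}(0, \sigma^{2})$.
- $\sigma^{2} \sim \mathcal{IG}(a_0, b_0)$ in the shape/scale convention.

The closed form is then

$$p(\mathbf{y}\mid\boldsymbol{\theta}) = \frac{b_0^{a_0}\,\Gamma(a_0 + N/2)}{\Gamma(a_0)\,(2\pi)^{N/2}\,\bigl(b_0 + \tfrac{1}{2}\mathbf{e}^{\top}\mathbf{e}\bigr)^{a_0 + N/2}},$$

where $a_0 + N/2$ and $b_0 + \mathbf{e}^{\top}\mathbf{e}/2$ act as the updated shape and scale. A minimal Python sketch:

```python
import math
import numpy as np


def log_marginal_likelihood(residuals, a0, b0):
    """log p(y | theta) for i.i.d. N(0, sigma^2) residuals with sigma^2 ~ IG(a0, b0) integrated out.

    residuals: model errors e = y - y_hat(theta) for the current theta.
    a0, b0: assumed IG shape and scale hyperparameters.
    """
    e = np.asarray(residuals, dtype=float).ravel()
    n = e.size
    a_n = a0 + 0.5 * n               # updated IG shape
    b_n = b0 + 0.5 * float(e @ e)    # updated IG scale
    return (a0 * math.log(b0) - math.lgamma(a0)
            + math.lgamma(a_n) - a_n * math.log(b_n)
            - 0.5 * n * math.log(2.0 * math.pi))
```

Only the residual-dependent term changes with $\boldsymbol{\theta}$. Under these assumptions, the likelihood ratio in the MH step therefore reduces to $\bigl[(b_0 + \tfrac12\mathbf{e}_i^{\top}\mathbf{e}_i)/(b_0 + \tfrac12\mathbf{e}_\star^{\top}\mathbf{e}_\star)\bigr]^{a_0 + N/2}$.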

Substituting equations (13) and (20) into equation (8) results in:

i.e.

The proposed MH algorithm is run for many iterations until convergence, and the samples obtained after convergence are treated as samples from the posterior distribution $p(\boldsymbol{\theta} \mid \mathbf{y})$.
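After the chain has run, the retained draws can be summarized. The sketch below uses a hypothetical helper, `summarize_chain`, which is not from the paper. It discards an assumed number of burn-in draws, then reports:

- the posterior mean,
- a 95% percentile interval,
- the acceptance rate.

Using the posterior mean as the point estimate is our assumption; the text does not state which summary was used.

```python
import numpy as np


def summarize_chain(samples, accepted, burn_in):
    """Summarize MH output after discarding the first `burn_in` draws.

    samples: array of shape (n_iter, n_params) or (n_iter,).
    accepted: boolean flags, one per iteration, from the accept/reject step.
    """
    kept = np.asarray(samples, dtype=float)[burn_in:]
    flags = np.asarray(accepted, dtype=bool)[burn_in:]
    if kept.shape[0] == 0:
        raise ValueError("burn_in removes every sample")
    return {
        "mean": kept.mean(axis=0),                         # posterior-mean point estimate
        "ci95": np.percentile(kept, [2.5, 97.5], axis=0),  # equal-tailed 95% interval
        "acceptance_rate": float(flags.mean()) if flags.size else float("nan"),
        "n_kept": int(kept.shape[0]),
    }
```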

Results

A first-order ODE was employed in this study to estimate the unknown parameters in a nonlinear mathematical model. This ODE was originally established to describe the temporal profile of TGF-β post-MI. After MI, the major sources of TGF-β are activated macrophages and fibroblasts. IL-10 stimulates macrophages to secrete TGF-β, and this stimulation can be approximated by a Hill equation. Since we are initially interested in the macrophage-related function at the early stage and will incorporate the effect of fibroblasts at a later stage, we established the mathematical model as follows:

where …, and the baseline value of 0.21 is the concentration of TGF-β measured in healthy adult mice before MI. Both V and K are unknown parameters to be estimated.

A stationary Markov chain was generated with the proposed MH algorithm, and only samples after the burn-in period were retained. Since no prior information about the parameters is available, a flat Gamma distribution was chosen as the prior distribution of $\boldsymbol{\theta}$. The scale parameter was set to 2 for both the linear parameter (V) and the nonlinear parameter (K).

We simulated three scenarios:

1) Estimating V with K known, which evaluates the Bayesian approach on a linearly parameterized system.
2) Estimating K with V known, which evaluates the estimation of a single parameter in nonlinearly parameterized dynamics.
3) Estimating both V and K with the proposed Bayesian approach.

To mimic real experimental data, we sampled the computed state at 500 time points. The temporal profiles of macrophage cell density, IL-10 concentration, TGF-β concentration, and sampled TGF-β (500 samples over 20 days) are shown in Figure


**Temporal profiles of macrophage numbers, IL-10 concentrations, and TGF-β concentrations post myocardial infarction establish the nonlinear dynamic patterns of the MI response**. A: Temporal profile of macrophage infiltration (cell numbers/mm²) post myocardial infarction reconstructed from experimental data in C57 mice

Estimate parameters in linearly parameterized systems

In a Hill equation, the parameter V enters linearly. In the first set of simulations, we set K = 2 and estimated V from the temporal data. The nominal value of V is 5, and the MCMC estimates of V range from 4.9247 to 5.0045. The performance of the estimation algorithm can also be evaluated by examining the root mean square error (RMSE) of V with respect to the noise variance $\sigma^{2}$, as shown in Figure …; the RMSE increases as $\sigma^{2}$ increases but remains small. The RMSE of V was calculated as 0.017 while the variance


**Performance evaluation for linear parameter estimation**.

A: Root mean square error of parameter V given K = 2, plotted as the noise variance increased from 0.01 to 100, for both MCMC and least squares. The true value of V was set to 5. The root mean square error increased monotonically with the variance, reaching 0.0331 for MCMC and 0.0256 for least squares at a noise variance of 100. This demonstrates that the linear parameter estimation performed within the recommended range.

B: Boxplots of the estimates of V for both least squares and MCMC as the noise variance increased from 0 to 100. MCMC produced considerably more outliers than least squares.

We also compared the performance of MCMC with that of the least-squares algorithm. Because V is a linear parameter, this comparison provides a useful benchmark for MCMC. As expected, least squares gives the best results for estimating the linear parameter in the presence of noise, as shown in Figure …, particularly when the noise variance ($\sigma^{2}$) is small. As the noise variance increases, the difference in error decreases. However, MCMC produces considerably more outliers than least squares. As in the MCMC case, the nominal value of V is 5, and the least-squares estimates of V range from 4.9308 to 5.0561.
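Because V enters the Hill term linearly, its least-squares estimate for a fixed K has a closed form. The sketch below illustrates this, together with a Monte Carlo RMSE, on a simplified static observation model $y_j = V\,x_j/(K + x_j) + \varepsilon_j$. This model, the Hill coefficient of 1, and the helper names are our assumptions for illustration; they are not the TGF-β ODE used in the paper.

```python
import numpy as np


def hill(x, V, K):
    """Hill term with an assumed Hill coefficient of 1: V * x / (K + x)."""
    return V * x / (K + x)


def ls_estimate_V(x, y, K):
    """Closed-form least squares for y ~ V * g(x; K); valid because V enters linearly."""
    g = x / (K + x)
    return float(g @ y / (g @ g))


def rmse(estimates, true_value):
    """Root mean square error of a set of estimates around the true value."""
    est = np.asarray(estimates, dtype=float)
    return float(np.sqrt(np.mean((est - true_value) ** 2)))


def monte_carlo_rmse(x, V_true, K, noise_var, n_runs, seed=0):
    """RMSE of the least-squares V over n_runs independent white-noise realizations."""
    rng = np.random.default_rng(seed)
    clean = hill(x, V_true, K)
    est = [ls_estimate_V(x, clean + rng.normal(0.0, np.sqrt(noise_var), x.size), K)
           for _ in range(n_runs)]
    return rmse(est, V_true)
```

No comparable closed form exists for K, which enters the denominator; this is why the nonlinear case needs an iterative estimator such as the MH algorithm.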

Estimate parameters in nonlinearly parameterized dynamics

To verify our algorithm for nonlinearly parameterized dynamics, we estimated K with V known, and then estimated both V and K when both are unknown.

With the nominal value of K set to 5000, we ran 6 groups of simulations corresponding to 6 settings of V (V = 0.01, 0.1, 0.5, 1, 5, and 10). The output of each group was corrupted by white noise with variances ranging from $\sigma^{2} = 0$ to $\sigma^{2} = 10$. We repeated these simulations for K = 1, 10, 50, 100, 500, 1000, 5000, and 10000. The estimated values of K under different noise variances for V = 1 and V = 10 are reported in Table

Estimated values of parameter K subject to different noise variances.

| Parameter V | True value of K | Estimated K (σ² = 0) | Estimated K (σ² = 0.1) | Estimated K (σ² = 0.5) | Estimated K (σ² = 1) | Estimated K (σ² = 10) |
|---|---|---|---|---|---|---|
| 1 | 1 | 0.895860924 | 1.198924271 | 2.060871616 | 2.392641064 | 2.2417669 |
| 1 | 10 | 9.942624719 | 9.417099524 | 5.18899955 | 3.041348517 | 2.3984733 |
| 1 | 50 | 49.84538377 | 50.3653591 | 49.79822957 | 48.21724047 | 17.612567 |
| 1 | 100 | 99.57735594 | 100.5475019 | 98.86122817 | 95.19722428 | 35.651171 |
| 1 | 500 | 499.7813772 | 498.9833568 | 495.3510638 | 501.8771493 | 450.50892 |
| 1 | 1000 | 999.2728579 | 997.8776983 | 995.7044492 | 988.870943 | 891.65597 |
| 10 | 1 | 0.970450637 | 1.006078611 | 1.094535769 | 1.282321201 | 2.1026316 |
| 10 | 10 | 9.975237184 | 9.997279446 | 9.847326837 | 9.531280999 | 5.3013845 |
| 10 | 50 | 50.05571987 | 49.87267536 | 49.91859144 | 50.0966008 | 49.055078 |
| 10 | 100 | 99.87386748 | 99.92057687 | 100.6109818 | 99.56079265 | 97.374093 |
| 10 | 500 | 500.0809047 | 499.7714225 | 499.802065 | 500.2063382 | 497.87452 |
| 10 | 1000 | 999.5597507 | 999.5670495 | 1000.211577 | 1001.028307 | 996.15873 |

Parameter V was set to 1 and 10, respectively.


**Performance evaluation for nonlinearly parameterized situation with known V**. Root mean square error of parameter K (A: K = 5000; B: K = 10000), plotted against noise variances ranging from 0.01 to 10 for different values of V. Curve colors denote the settings of V (Blue: V = 0.01, Red: V = 0.1, Green: V = 0.5, Cyan: V = 1, Magenta: V = 5, Black: V = 10).

When both V and K are unknown, we ran twenty different parameter settings and evaluated the estimation error. We considered true values of K = 5000 and 10000 with true values of V = 0.01, 0.1, 0.5, 1, 5, and 10. We also considered true values of V = 1 and 10 with true values of K = 1, 10, 50, 100, 500, 1000, 5000, and 10000. Again, our algorithm produced estimates close to the nominal values; the RMSE for different values of V when K = 5000 is shown in Figure


**Performance evaluation for nonlinearly parameterized situation with unknown V and K**.

A, B: Root mean square error of parameter K, plotted against noise variances ranging from 0.01 to 1 for different values of V (A: true K = 5000; B: true K = 10000). In subfigures A and B, curve colors denote the settings of V (Blue: V = 0.01, Red: V = 0.1, Green: V = 0.5, Cyan: V = 1, Magenta: V = 5, Black: V = 10).

C, D: Root mean square error of parameter V, plotted against noise variances ranging from 0.01 to 1 for different values of K (C: true V = 1; D: true V = 10). In subfigures C and D, curve colors denote the settings of K (Blue: K = 1, Red: K = 10, Green: K = 50, Cyan: K = 100, Magenta: K = 500, Black: K = 1000, Dark Green: K = 5000, Brown: K = 10000).

Discussion

This study is the first investigation to estimate unknown parameters in nonlinearly parameterized biological dynamics using an MCMC algorithm. We applied a Bayesian approach to estimate two unknown parameters in an ODE model describing the temporal profile of TGF-β post-MI. Our computational results demonstrate the effectiveness of the Bayesian approach for parameter estimation in a nonlinear model of biological pathways. As such, this study provides a valid estimation approach for the nonlinear dynamics of biological pathways. The most important contributions of this study are as follows:

1) The proposed method bridges the gap between sparse observational data and the continuous signals required by mathematical models. Therefore, parameters estimated from experimental data have a clear biological meaning in the mathematical models.
2) The introduction of additive noise and measurement functions reflects real scenarios in biological experiments, giving more confidence in parameter estimation for real-world applications.
3) A new MCMC algorithm is proposed to estimate parameters in general nonlinearly parameterized dynamics. Our results demonstrated good performance in estimating two parameters of an ODE with a Hill equation.

Together, these features should make the method applicable to many biological systems, not limited to investigations of cardiovascular disease.

In this study, our key task is parameter estimation for a nonlinearly parameterized mathematical model of biological pathways. Because other representations of mathematical models exist, such as linearized models and power-law functions, it is possible to approximate nonlinearly parameterized dynamics by linearly parameterized dynamics … at the operating point; and 2) the transformed model has the same first-order derivatives at the operating point as the original Hill representation. Therefore, both the linear and power-law approximations hold only locally, in a small neighborhood of the operating point. When the variable,
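To illustrate this local behavior, the sketch below builds both approximations of a Hill term $f(x) = Vx/(K + x)$ around an operating point `x0`. It assumes a Hill coefficient of 1. Both approximations match the value and the first derivative of $f$ at `x0`. For the power-law form $\alpha x^{g}$, this matching gives:

- the kinetic order $g = x_0 f'(x_0)/f(x_0) = K/(K + x_0)$,
- the rate constant $\alpha = f(x_0)/x_0^{g}$.

The function names are hypothetical.

```python
def hill(x, V, K):
    """Hill term with an assumed Hill coefficient of 1."""
    return V * x / (K + x)


def hill_derivative(x, V, K):
    """Analytic first derivative of V * x / (K + x) with respect to x."""
    return V * K / (K + x) ** 2


def linear_approx(x0, V, K):
    """First-order Taylor expansion of the Hill term around x0."""
    f0, d0 = hill(x0, V, K), hill_derivative(x0, V, K)
    return lambda x: f0 + d0 * (x - x0)


def power_law_approx(x0, V, K):
    """Power law alpha * x**g matching the Hill value and slope at x0 > 0."""
    g = K / (K + x0)                  # kinetic order: x0 * f'(x0) / f(x0)
    alpha = hill(x0, V, K) / x0 ** g  # rate constant
    return lambda x: alpha * x ** g
```

Take `V = 5`, `K = 2`, `x0 = 1` and evaluate both approximations far from `x0`, for example at `x = 20`. The errors exceed the Hill value itself, consistent with the approximations holding only near the operating point.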

While we illustrated the effectiveness of the algorithm with a first-order ODE model, the algorithm can be extended to estimate more parameters in higher-order ODE models of more complicated systems. In that case, the convergence of the estimates and the convergence speed of the algorithm should be studied further. Additionally, the measurement function used in this study is an identity matrix; this assumption can be relaxed to an observable function where all states

In this study, we used flat Gamma distributions as the proposal distributions in the MH algorithm. Although they lead to relatively slow convergence of the Markov chains, the algorithm still produces robust estimates. Selecting better proposal distributions that lead to faster convergence, and thus a more efficient implementation of the algorithm, will be a focus of our future work. In addition, we employed real experimental data in this study to estimate the effects of IL-10 on TGF-β concentrations in the left ventricle post-MI, and our measurement equation includes additive noise to mimic real biological systems. However, we are aware that the model structure is simplified and that modeling errors are embedded in it. These modeling errors will likely lead to errors in the parameter estimates; they can be reduced as more biological knowledge accumulates. Though it is beyond the scope of the current paper, further investigation of the model structure using real

Conclusions

In conclusion, we have proposed an algorithm that combines the transformation of ODEs into difference equations with a Bayesian algorithm to estimate multiple parameters in a nonlinear mathematical model of a biological system from discrete experimental observations. The parameter estimates were close to their true values, with small estimation errors, particularly when the noise variance was small. The proposed estimation algorithm provides a powerful tool for analyzing time series data and better understanding the interactions among biological pathways.

Authors' contributions

YFJ and YH designed the research; OG, TY, and NN performed the computational simulations. OG, TY, MLL, YH, and YFJ analyzed the results and wrote the manuscript. All authors have read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Acknowledgements

The authors acknowledge grant and contract support from:

- NHLBI HHSN268201000036C (N01-HV-00244), NIH R01 HL75360, a Veterans Administration Merit Award, and the Max and Minnie Tomerlin Voelker Fund (to MLL);
- NSF 0546345 and the Qatar National Research Fund (09-874-3-235) (to YH);
- NIH 1R03EB009496, NIH SC2HL101430, NSF 0649172, and the AT&T Foundation (to YFJ).

This article has been published as part of