Department of Computer Science, University of California Santa Barbara, Santa Barbara, California 93106, USA

Department of Statistics, Iowa State University, Ames, Iowa 50011, USA

Abstract

Background

A prerequisite for the mechanistic simulation of a biochemical system is detailed knowledge of its kinetic parameters. Despite recent experimental advances, the estimation of unknown parameter values from observed data is still a bottleneck for obtaining accurate simulation results. Many methods exist for parameter estimation in deterministic biochemical systems; methods for discrete stochastic systems are less well developed. Given the probabilistic nature of stochastic biochemical models, a natural approach is to choose parameter values that maximize the probability of the observed data with respect to the unknown parameters, that is, the maximum likelihood parameter estimates (MLEs). MLE computation for all but the simplest models requires the simulation of many system trajectories that are consistent with experimental data. For models with unknown parameters, this presents a computational challenge, as the generation of consistent trajectories can be an extremely rare occurrence.

Results

We have developed Monte Carlo Expectation-Maximization with Modified Cross-Entropy Method (MCEM^{2}): an accelerated method for calculating MLEs that combines advances in rare event simulation with a computationally efficient version of the Monte Carlo expectation-maximization (MCEM) algorithm. Our method requires no prior knowledge regarding parameter values, and it automatically provides a multivariate parameter uncertainty estimate. We applied the method to five stochastic systems of increasing complexity, progressing from an analytically tractable pure-birth model to a computationally demanding model of yeast-polarization. Our results demonstrate that MCEM^{2} substantially accelerates MLE computation on all tested models when compared to a stand-alone version of MCEM. Additionally, we show how our method identifies parameter values for certain classes of models more accurately than two recently proposed computationally efficient methods.

Conclusions

This work provides a novel, accelerated version of a likelihood-based parameter estimation method that can be readily applied to stochastic biochemical systems. In addition, our results suggest opportunities for added efficiency improvements that will further enhance our ability to mechanistically simulate biological processes.

Background

Conducting accurate mechanistic simulations of biochemical systems is a central task in computational systems biology. For systems where a detailed model is available, simulation results can be applied to a wide variety of tasks including sensitivity analysis,

Despite recent advances in experimental methodology, the estimation of unknown kinetic parameters from data is a bottleneck for performing accurate simulations

Given the probabilistic nature of stochastic biochemical models, a natural approach for parameter estimation is to choose values that maximize the probability of the observed data with respect to the unknown parameters (maximum likelihood estimates or MLEs). In the case of fully observed data, where the number of molecules of each system species is known at all time points, MLEs can be calculated analytically. However, since realistic biochemical systems are discretely and partially observed, computational MLE methods are necessary. One of the earliest examples presented, simulated maximum likelihood (SML), combines a non-parametric density function estimator with Monte Carlo simulation to approximate the likelihood function

Although not strictly an MLE method, Boys

All of the above MLE approaches essentially iterate between two steps: (A) approximating a parameter likelihood using Monte Carlo sampling and (B) maximizing that approximation with respect to the unknown parameters using an optimization algorithm. We note that the Bayesian method of Boys

In this work, we develop Monte Carlo Expectation-Maximization with Modified Cross-Entropy Method (MCEM^{2}), a novel, accelerated approach for computing MLEs along with uncertainty estimates. MCEM^{2} combines advances in rare event simulation
^{2} generates probabilistically coherent system trajectories using the SSA. The remainder of the paper is structured as follows: We first provide derivation and implementation details of MCEM^{2} (Methods). Next, we apply our method to five stochastic biochemical models of increasing complexity and realism: a pure-birth process, a birth-death process, a decay-dimerization, a prokaryotic auto-regulatory gene network, and a model of yeast-polarization (Results). Through these examples, we demonstrate the superior performance of MCEM^{2} to an existing implementation of MCEM and the SGD and Poisson approximation methods. Finally, we discuss the distinguishing features of our method and motivate several promising future areas of research (Discussion).

Methods

Discrete-state stochastic chemical kinetic system

We focus on stochastic biochemical models that assume a well-stirred chemical system with _{1}, …, _{N}}, whose discrete-valued molecular population numbers evolve through the firing of _{1}, …, _{M}}. We represent the state of the system at time **X**(_{1}(_{N}(_{i}(_{i }at time _{j}(**x**)(_{j} fires in the interval [**X**(**x**. The sum of all **x** is denoted _{0}(**x**). We restrict our attention to reactions that obey mass action kinetics—i.e. where _{j}(**x**) ≡ _{j}_{j}(**x**) with _{j} a positive real kinetic constant and _{j}(**x**) a function that quantifies the number of possible ways reaction _{j} can occur given system state **x**. Examples of _{j}(**x**) include: 1, _{1},
_{1}_{2} for zeroth-order, unimolecular, homo-bimolecular, and hetero-bimolecular reactions, respectively. Further details on mass action propensity functions can be found in
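The combinatorial functions for the four reaction orders listed above can be sketched as follows. This is a minimal illustration that assumes the standard mass action counts; the function names, and the explicit pair-counting form used for the homo-bimolecular case, are illustrative choices rather than notation taken from the text.

```python
from math import comb

# Hypothetical helper names; each returns the number of distinct ways
# a reaction of the given order can occur in the current state.

def h_zeroth_order():
    return 1                      # e.g. constant production

def h_unimolecular(x1):
    return x1                     # one molecule of species 1

def h_homo_bimolecular(x1):
    return comb(x1, 2)            # unordered pairs: x1 * (x1 - 1) / 2

def h_hetero_bimolecular(x1, x2):
    return x1 * x2                # one molecule of each species

def propensity(kinetic_constant, h_value):
    # Mass action: propensity = kinetic constant * combinatorial count
    return kinetic_constant * h_value
```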

The "direct method" implementation of Gillespie’s stochastic simulation algorithm (SSA) provides a simple numerical procedure for generating exact system trajectories from their underlying (intractable) probability distribution
_{0}(**x**) and the index of the next reaction (_{j}(**x**) / _{0}(**x**)(**X**(0) = **x**_{0}, application of the direct method yields a reaction trajectory
**z** is only of length **x**_{0 }allows us to identify the complete system state at any time in the interval [0,**x**_{0}, **z**) as the following function of the kinetic parameters ** θ** ≡ (

where _{r + 1} is the time interval between the firing of the final reaction and **x**_{i−1} is the easily computable system state at the time immediately after the (^{st} firing event (i.e. when
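A minimal sketch of the direct method may help make the procedure concrete. The function name, argument layout, and the use of Python's `random` module are assumptions made for illustration; the example system is the birth-death process with the parameter values (1, .06) and initial state 17 that appear later in Results.

```python
import random

def ssa_direct(x0, theta, h_funcs, stoich, t_end, rng):
    """Illustrative direct-method SSA; returns a list of (time, state) pairs."""
    t = 0.0
    x = list(x0)
    path = [(t, tuple(x))]
    while True:
        # Mass action propensities: kinetic constant times combinatorial count
        a = [th * h(x) for th, h in zip(theta, h_funcs)]
        a0 = sum(a)
        if a0 <= 0.0:
            break                              # no reaction can fire
        tau = rng.expovariate(a0)              # exponential waiting time, rate a0
        if t + tau > t_end:
            break                              # next firing falls after the final time
        t += tau
        # Next reaction index chosen with probability a_j / a0
        j = rng.choices(range(len(a)), weights=a)[0]
        x = [xi + v for xi, v in zip(x, stoich[j])]
        path.append((t, tuple(x)))
    return path

# Birth-death process: birth at constant rate, degradation proportional to count
h_funcs = [lambda x: 1, lambda x: x[0]]
stoich = [(+1,), (-1,)]
path = ssa_direct([17], [1.0, 0.06], h_funcs, stoich, t_end=10.0, rng=random.Random(0))
```

The returned path records only the state after each firing, which together with the initial state is enough to recover the system state at any time in the simulated interval.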

Maximum likelihood parameter estimation

If the true values of the kinetic parameters ^{∗} are unknown and we are given a complete system trajectory (**x**_{0}, **z**), a natural approach for generating parameter estimates
** θ **that maximize the likelihood with respect to the trajectory (Equation (1)). These maximum likelihood parameter estimates (MLEs) can be analytically computed for each reaction as follows (see

where _{j} is the total number of times reaction _{j }fires in **z**. Although simple, Equation (2) is only useful in the presence of a complete system trajectory. Experimentally observed data are typically much less informative, consisting of the initial system state plus numbers of molecules for a subset of the system species at _{i}. Knowledge of any **y **of finite size is insufficient for reconstructing the complete system trajectory (**x**_{0}, **z**) and the corresponding likelihood (Equation (1)); thus, Equation (2) is not a feasible approach for computing MLEs. Instead, we require a method that can accommodate "unobserved data"—i.e., the states of all system species at all times not included in the observed data.
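To illustrate the complete-data case, the sketch below assumes the standard closed form for mass action systems: the estimate for each reaction is its firing count divided by the time integral of its combinatorial function along the trajectory. The function name and the toy trajectory are hypothetical and serve only to show the arithmetic.

```python
def complete_data_mle(x0, events, t_end, h_funcs, stoich):
    """events: list of (firing_time, reaction_index), sorted by time."""
    n_reactions = len(h_funcs)
    counts = [0] * n_reactions          # firings per reaction
    integrals = [0.0] * n_reactions     # time integral of each combinatorial function
    x = list(x0)
    t_prev = 0.0
    # A sentinel event at t_end accounts for the interval after the final firing
    for t, j in list(events) + [(t_end, None)]:
        dt = t - t_prev
        for k, h in enumerate(h_funcs):
            integrals[k] += h(x) * dt   # state is constant between firings
        if j is not None:
            counts[j] += 1
            x = [xi + v for xi, v in zip(x, stoich[j])]
        t_prev = t
    return [c / i if i > 0 else float("nan") for c, i in zip(counts, integrals)]

# Toy birth-death trajectory: start at 2, birth at t=1, death at t=2, end at t=4
estimates = complete_data_mle(
    x0=[2],
    events=[(1.0, 0), (2.0, 1)],
    t_end=4.0,
    h_funcs=[lambda x: 1, lambda x: x[0]],
    stoich=[(+1,), (-1,)],
)
```

For this toy trajectory the birth estimate is one firing over four time units, and the death estimate is one firing over the integrated molecule count 2·1 + 3·1 + 2·2 = 9.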

In this work we use the expectation-maximization (EM) algorithm

where
**z** given **y** and
**y **(i.e. trajectories that pass through all observed data points exactly), and
**z**. The theory behind the EM algorithm guarantees that Equation (3) will converge to estimates that locally maximize the observed data likelihood, given

where
^{th} SSA trajectory simulated using the parameter vector
**y** (and 0 otherwise), and ^{′} indexes only the ^{′} simulated trajectories that are consistent with the observed data. In practice, we set ^{′}. We note that Equations (4a) and (4b) describe a rejection sampling approach to generating reaction trajectories conditional on the observed data, in which only those simulated trajectories consistent with data are retained and all others are rejected. In practice, we simulate trajectories incrementally between two data points at a time, further propagating only those trajectories that pass through the second data point exactly. Although this incremental approach is much more efficient than performing rejection sampling across full-length trajectories, as we describe below, it can still be computationally prohibitive.
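The cost of this rejection step can be illustrated on a single interval of a pure-birth process. The sketch below is a simplified illustration, not the implementation used in this work: the function names and numerical settings are hypothetical, and it only counts how often a simulated sub-trajectory lands exactly on the next data point.

```python
import random

def births_in_interval(theta, dt, rng):
    """Simulate a pure-birth process for time dt; return the number of firings."""
    n = 0
    t = rng.expovariate(theta)          # time of the first birth
    while t <= dt:
        n += 1
        t += rng.expovariate(theta)     # waiting time to the next birth
    return n

def acceptance_rate(theta, dt, observed_increment, n_sims, rng):
    """Fraction of simulated sub-trajectories that hit the next data point exactly."""
    hits = sum(
        births_in_interval(theta, dt, rng) == observed_increment
        for _ in range(n_sims)
    )
    return hits / n_sims

rng = random.Random(1)
# Data: 5 new molecules observed over 5 time units
rate_near_truth = acceptance_rate(1.0, 5.0, 5, 20000, rng)   # parameter near the data
rate_far_away = acceptance_rate(10.0, 5.0, 5, 2000, rng)     # parameter far from the data
```

With a parameter value close to the one that generated the data, a sizeable fraction of simulations is retained; with a poor parameter value, essentially none are, which is the rare-event problem addressed in the following sections.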

By simplifying Equation (4b) with the same procedure used to derive Equation (2)

Equation (5) is analogous to Equation (2), with trajectory features having an added subscript ^{′} and superscript (

An open question in the use of MCEM involves efficient selection of the numbers of consistent trajectories ^{′} and iterations ^{′} at each iteration according to an estimate of the current Monte Carlo error and terminating the algorithm when the estimated change in conditional log-likelihood
^{′} to 10 and the sample size increment parameters

Accelerating MLE computation

Equation (5) requires the generation of ^{′ }trajectories that are consistent with observed data. For datasets with closely spaced time points and reasonably accurate initial parameter estimates

The cross-entropy method

The cross-entropy (CE) method was first developed by Rubinstein
^{4} and

When applied to the task of stochastic parameter estimation, the CE method proposes an iterative optimization very similar to Equation (4a):

where
^{(m)} is the (^{th }quantile of distances achieved by the _{1 }distance evaluated at each observed time point for each observed species (i.e. we divide each absolute deviation by the quantity [1 + the value of the corresponding data point]). Upon simplification of Equation (6), we obtain the following expression for each CE reaction parameter:

Once ^{(m)} = 0, Equation (7) is used a final time to obtain
^{′} / ^{′ }consistent trajectories. Generally speaking, the algorithm is guaranteed to terminate provided ^{′} are sufficiently small and sufficiently large, respectively (see
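The normalized distance and the quantile-based selection described above can be sketched as follows. The names `normalized_l1`, `select_elite`, and `rho` are hypothetical, and the sketch covers only the selection step, not the parameter update of Equation (7).

```python
import math

def normalized_l1(simulated, observed):
    """Sum of absolute deviations, each divided by (1 + observed value)."""
    return sum(abs(s - o) / (1 + o) for s, o in zip(simulated, observed))

def select_elite(distances, rho):
    """Return indices of the ceil(rho * K) closest trajectories and the distance threshold."""
    k = math.ceil(rho * len(distances))
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    elite = order[:k]
    threshold = distances[elite[-1]]    # largest distance among the retained trajectories
    return elite, threshold

# One simulated trajectory compared with two observed values
d = normalized_l1(simulated=[3, 10], observed=[1, 9])    # 2/2 + 1/10

# Keep the closest half of four simulated trajectories
elite, threshold = select_elite([0.5, 0.1, 0.9, 0.3], rho=0.5)
```

Dividing by one plus the data value keeps the distance finite when an observed count is zero while giving species of different abundance roughly comparable weight.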

Multilevel splitting

If the observed data consist of many time points, simulating a trajectory that passes through all of the data will be extremely unlikely, even when using the true parameter values. Consequently, our CE method will require a very small

A natural definition of a sub-trajectory in the context of observed data is the portion of a trajectory from time 0 to a recorded time point _{i} ≤ _{d}. Starting from _{i}]. We then compute the distance
^{′} × 100)^{th} quantile of distances (where we typically choose ^{′} = ^{′} = 1 results in a nearly unmodified CE method as described above, and the amount of trajectory splitting can be easily tuned to the desired level by changing ^{′} accordingly.

**Multilevel splitting applied to CE phase of MCEM**^{2}**.** Using
_{1 }(red traces). The ending states of the ⌈_{1 }until reaching _{2}. Here, we select the ⌈_{2}] that are closest to the first and second data points (black circles at times _{1 }and _{2}) and use them to initiate the third simulation ensemble. We repeat this process until reaching _{4}, at which time we compute the first set of parameter estimates

Parameter perturbation

Both the CE method and its MS modification rely on the system’s intrinsic variability to refine parameter estimates. If a system exhibits a low level of variability, each selected subset of ⌈

where

Computing MLE uncertainty estimates

An advantage of using MCEM to identify MLEs is the simplicity with which uncertainty estimates can be computed. In general, MLEs exhibit asymptotic normality; consequently, their covariance matrix can also be estimated using Monte Carlo simulation

where {·} delimits a matrix, **a**^{T} represents the transpose of vector **a**, _{ω}(·) is equivalent to Equation (1) with exp(** ω**) substituted for

where

Upon solving Equation (10) for
** ω** using the properties of the multivariate normal distribution. We then transform these coordinates by exponentiation to yield (strictly positive) confidence bounds for
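The exponentiation step can be illustrated in one dimension. The sketch below is a univariate simplification of the multivariate procedure described above; it assumes a standard deviation for the log-parameter is already available, and the function name and example values are hypothetical.

```python
import math
from statistics import NormalDist

def lognormal_ci(theta_hat, sd_log, level=0.95):
    """Confidence bounds for a positive parameter, built on the log scale."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)    # two-sided normal quantile
    omega_hat = math.log(theta_hat)                # estimate in log coordinates
    lower = math.exp(omega_hat - z * sd_log)
    upper = math.exp(omega_hat + z * sd_log)
    return lower, upper

lower, upper = lognormal_ci(theta_hat=2.0, sd_log=0.1)
```

The resulting interval is strictly positive and asymmetric about the estimate on the original scale, which is the one-dimensional analogue of the warped confidence ellipses reported in Results.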

To summarize, our proposed method for accelerating MLE identification in stochastic biochemical systems works in three steps: first, it identifies an initial parameter estimate
^{2}: Monte Carlo Expectation-Maximization with Modified Cross-Entropy Method.

Results

We now illustrate the utility of MCEM^{2} for estimating unknown parameters by applying it to data from five stochastic biochemical model systems: a pure-birth process, a birth-death process, a decay-dimerization, an auto-regulatory gene network, and a model of yeast-polarization. For each model, we first simulate a single system trajectory (with known parameters) using the SSA for a given final time ^{2} on the dataset and a version of the model where all information about model parameters has been withheld. Unless otherwise noted, we set the initial parameter vector for each system

Pure-birth process

A system for which MLEs can be computed analytically from discretely observed data is the pure-birth process, also known as a homogeneous Poisson process. The model is given by the single reaction

with initial conditions **x**_{0} = 0. The MLE for a given dataset from this model can be easily computed by dividing the number of molecules of ^{2} and standard ascent-based MCEM will also return this MLE (albeit at a greater computational expense), as any version of EM applied to this model ultimately reduces to the exact computation

Thus, the only potential difference between MCEM^{2} and MCEM for this system is the required computing time. To quantify this difference, we generated data for 100 pure-birth models, with ^{∗}, the true value, ranging from .01 to 10. For each model, we used ^{2}, both with
^{∗}. We see that the time required for MCEM increases dramatically as values of ^{∗} depart from
^{∗} increases. As shown in Figure
^{∗}. In contrast, the computing time for MCEM^{2} stays approximately constant for values of ^{∗} less than 1 and increases relatively slowly for values greater than 1. This cost increase is due to the simulation cost of firing more birth reactions required for larger ^{∗}. MCEM^{2} does not appear to suffer from a cost associated with the discrepancy between
^{∗}.
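For reference, the analytical pure-birth MLE mentioned above can be derived from the fact that increments of a homogeneous Poisson process are independent and Poisson distributed. The notation in this short derivation (observed counts y_i at times t_i, with t_0 = 0, y_0 = 0, and final time t_d) is assumed for illustration.

```latex
% Assumed notation: y_i is the observed count at time t_i, with t_0 = 0, y_0 = 0,
% \Delta t_i = t_i - t_{i-1}, \Delta y_i = y_i - y_{i-1}, and t_d the final time.
\begin{align}
L(\theta) &= \prod_{i=1}^{d}
  \frac{(\theta\,\Delta t_i)^{\Delta y_i}}{\Delta y_i!}\, e^{-\theta\,\Delta t_i}, \\
\frac{\partial \log L}{\partial \theta}
  &= \frac{1}{\theta}\sum_{i=1}^{d}\Delta y_i - \sum_{i=1}^{d}\Delta t_i
   = \frac{y_d}{\theta} - t_d = 0
  \;\Longrightarrow\; \hat{\theta} = \frac{y_d}{t_d}.
\end{align}
```

The intermediate data points cancel out of the estimate, so only the final count and the final time matter for this model.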

**Computing time of MCEM versus MCEM**^{2} **for pure-birth process.** Red circles and curve fit depict computing time required for MCEM^{2} to return MLEs for the pure-birth model with
^{∗} values. Blue circles and curve fit depict identical quantities for ascent-based MCEM. Performance of MCEM^{2} is robust to the discrepancy between initial and true parameter values, while ascent-based MCEM quickly becomes computationally intractable as the discrepancy increases.

We next investigated the accuracy of MCEM^{2} uncertainty estimates. Figure
^{2} MLEs with 95% confidence intervals (CIs) for all models. Out of 100 CIs, only eight (denoted by blue circles) do not overlap the true values. This figure matches well with the expected number of missed overlaps (100 × (1 − .95) = 5) and suggests that our asymptotic normality assumption for deriving MLE confidence bounds is valid. We note that the relative magnitudes of the CIs decrease with increasing ^{∗}; this is due to the diminishing effect of noise on the system as the average number of reaction firings per unit time increases.

**Pure-birth process MCEM**^{2} **MLEs and confidence intervals.** Colored circles depict MCEM^{2} MLEs normalized by true parameter values for the pure-birth model with
^{∗}. Error bars denote 95% confidence intervals (CIs) for each model. Out of 100 models tested, only eight (centered at blue circles) do not overlap the true parameter values (green line) whereas the remaining 92 (centered at red circles) enclose the truth. This agrees well with the expected 95/100.

Birth-death process

The second model doubles the number of reactions of the pure-birth process by adding a degradation reaction. The birth-death process takes the form:

The presence of a single first order reaction (degradation) renders the analytical calculation of MLEs infeasible. Furthermore, computational parameter identification for the birth-death process is significantly more challenging than for the pure-birth process. This challenge stems from the degeneracy present in a discretely observed dataset: the net increase of a single molecule of _{1} and _{2} reaction firings (where ^{2} on this system, we first generated single trajectory data for a model with ^{∗} = (1, .06) and **x**_{0} = 17, where the system starts in stochastic equilibrium. We used ^{2} iteration. The modified cross-entropy phase of the algorithm required only three iterations (labeled -2, -1, 0), transforming

**Birth-death process MCEM**^{2} **parameter estimate progression.** Green and blue bold lines denote MCEM^{2} parameter estimates

Next, we investigated the effect of appending data at additional time points to the original data set. Figure
^{2} MLEs along with 68%, 95%, and 99% confidence ellipses (warped due to exponentiation—see Methods) that represent parameter uncertainty as a function of both parameters. We see that as ^{∗} until at ^{2} (1 × Intel 3 GHz processor) on each of the four datasets was approximately the same: one hour.

**Effects of birth-death dataset size on parameter estimates and MCEM**^{2} **uncertainty.** Each panel displays MCEM^{2} and SGD birth-death MLEs (red and blue circles, respectively) as well as Poisson method point estimates (orange circles) versus the true parameter values (green circles), along with MCEM^{2} 68%, 95%, and 99% confidence ellipses (black curves ranging in size from smallest to largest, respectively). **A**, **B**, **C**, and **D** display results for datasets of 40, 60, 80, and 100 data points, respectively. The three methods tested identified parameters with comparable accuracy across all datasets. As the number of data points increases, the MCEM^{2} MLEs get closer to the truth and the confidence ellipses shrink in size. The green sloped line plots the ratio

We also compared MCEM^{2} performance to that of two recent methods: an MLE method utilizing reversible jump Markov chain Monte Carlo coupled with stochastic gradient descent ("SGD")
^{7} (with 10^{5} burn-in iterations and 10^{4} thinning interval). These options were chosen to yield sufficient mixing and convergence properties as evidenced by the diagnostic plots from the R package. We then computed the mean value of each parameter to arrive at point estimates. As with MCEM^{2}, we set
^{2}, all three methods identified parameters with comparable accuracy, with SGD and Poisson methods performing better when ^{2} performing better when ^{2}, conveying the same information regarding the ratios of the two parameters (not shown). As noted above, the SGD method did not provide parameter uncertainty estimates. Regarding run time, the Poisson method required between 20 and 60 minutes to identify parameters for the four datasets, while the SGD method needed between 30 minutes and several days (the latter time due to a lack of convergence when using the

We next modified the birth-death process such that the equilibrium value of species **x**_{0} to each of the following values (listed in order): (5, 1, 1, 1, 1). We then generated 20 independent datasets for each of the five models, using ^{2} and the Poisson method to each of these datasets. Although both methods perform equally well for the first three models (when the equilibrium value of ^{2} clearly identifies parameters more accurately than the Poisson method for the last two datasets (when the equilibrium values of ^{2}, which generates exact system trajectories using the SSA, experiences no such loss of accuracy. Unfortunately, we were unable to evaluate SGD on these modified birth-death process datasets, as the MATLAB package consistently terminated with an error related to the zero molecule count of S.

**Effects of decreasing birth-death equilibrium on MCEM**^{2} **and Poisson method performance.** Boxplots (displaying median, first and third quartiles, and most extreme data point within 1.5 × the interquartile range from the box) summarize mean relative errors of MCEM^{2} and the Poisson method applied to 20 birth-death datasets for each of five models (true parameter values listed on x-axis). Models are sorted in decreasing order of the equilibrium value of ^{2} performance does not vary appreciably across the different models, while the Poisson method exhibits increasing error with decreasing equilibrium value.

Decay-dimerization model

The next system contains reactions involving species decay and dimerization. We begin with the following three reactions, where the dimerization step is reversible:

with **x**_{0} = (40, 0). We generated ten single-trajectory datasets for a model where ^{∗} = (.2, .04, .5), using

with all other properties unchanged. We again generated ten single-trajectory datasets for this model. Finally, we evaluated MCEM^{2}, the Poisson approximation method, and SGD on each of the 20 datasets. Figure
^{2} and the Poisson method perform very similarly in terms of accuracy (as well as run time: between 3 and 10 minutes for both models), with a slightly higher error for the irreversible model. In contrast, use of SGD results in higher errors for both models, with the irreversible model consistently yielding estimates with infinite error. This latter error is due to the estimate of _{1} quickly tending to infinity, regardless of how small we set the initial gradient descent step size. These results highlight a significant limitation of the SGD method: in order to generate a diversity of consistent trajectories, there must exist combinations of reactions that do not alter species counts. The reversible decay-dimerization model contains such a combination (reactions 2 and 3), while the irreversible model does not, leading to a divergent gradient descent.

**Effects of decay-dimerization model structure on MCEM**^{2}**, Poisson method, and SGD performance.** Boxplots summarize mean relative errors of the three methods applied to 10 decay-dimerization datasets for each of two three-reaction models. The two models differ only in their third reaction (listed on x-axis); the first model contains a reversible dimerization, while the second model does not. MCEM^{2} and the Poisson method perform similarly across both models, while SGD consistently incurs an infinite mean relative error (due to the estimate of _{1} quickly tending to infinity) when applied to the second (irreversible) model.

To further explore the ability of MCEM^{2} to estimate parameters for a decay-dimerization, we introduced a third model which adds a conversion reaction to the reversible model above. Previously analyzed in

with **x**_{0} = (1000, 10, 10). We generated single trajectory data for a model where ^{∗} = (.2, .04, .5, 1), using _{2} is nearly 4000 times larger than the propensity of its backwards counterpart _{3}; consequently, we expect observed data to reflect relatively few _{3} firings (and thus contain relatively little information about

**Decay-dimerization dataset.** Red, green, and blue circles depict initial system states and five data points for species _{1}, _{2}, and _{3}, respectively. This dataset is sparsely observed, as species _{1} changes substantially between

To investigate the impact of parameter perturbation on the performance of MCEM^{2}, we estimated parameters from this decay-dimerization dataset using both ^{2} appears to navigate the parameter space more efficiently and hence require much less computational time. We note that three of the four parameters reach approximately the same values at the end of the CE phase in the perturbed and non-perturbed cases, with

**Effects of parameter perturbation on decay-dimerization cross-entropy phase.** Red, blue, green, and orange lines represent MCEM^{2} parameter estimates

Figure
^{2} when applied to this decay-dimerization dataset. Specifically, MCEM^{2} returned
_{3} is much larger than for the other reactions, confirming our hypothesis that the dataset contains substantially less information about the backwards rate of the dimerization.

**Parameter estimation results for decay-dimerization model.** Each panel displays MCEM^{2} MLEs (red circles) versus the true parameter values ^{∗} = (.2, .04, .5, 1) (green circles), along with 68%, 95%, and 99% confidence ellipses. All six pairwise parameter comparisons are shown. The mean relative error for MCEM^{2} was 22.8%. All MCEM^{2} confidence ellipses enclose the true parameter values, and uncertainty is relatively low for all estimates except

Auto-regulatory gene network

To further compare MCEM^{2} to the Poisson method and SGD, we tested all methods on a system for which SGD was previously shown to perform well: a prokaryotic auto-regulatory gene network

where _{2}, and **x**_{0} ≡ (_{2}, _{2}) = (7, 3, 10, 10, 10) and generated single trajectory data using ^{∗} = (.1, .7, .35, .3, .1, .9, .2, .1) with ^{2} and SGD to this dataset using
^{8}, with 10^{6} burn-in iterations and 10^{5} thinning interval (these values were increased from before to preserve adequate mixing and convergence). As in previous examples, we initially used ^{2}. However, this proportion was not small enough to enable the generation of ⌈^{2} using ^{5}. This time, the CE phase completed easily in five iterations.

Figure
^{2} pairwise confidence ellipses for the four reversible reaction pairs. We see that all methods estimate most parameters with approximately equal accuracy, although MCEM^{2} and SGD more accurately determine
^{2}, SGD, and the Poisson method were 52%, 20%, and 30%, respectively. The MCEM^{2} 95% confidence ellipses enclose all true parameters except
^{2} required 2.3 and 8.7 days on a single processor, respectively, to complete.

**Parameter estimation results for auto-regulatory gene network.** Each panel displays MCEM^{2} and SGD MLEs and Poisson method point estimates computed using
^{2} 68%, 95%, and 99% confidence ellipses. **A**, **B**, **C**, and **D** compare the four reversible pairs of reactions in the system. Mean relative errors for MCEM^{2}, SGD, and the Poisson method were 52%, 20%, and 30%, respectively. MCEM^{2} 95% confidence ellipses enclosed all true parameter values except

In
_{2} at all time points except ^{2}. Upon convergence, we obtained

Yeast-polarization model

The final system we used to evaluate MCEM^{2} models the pheromone-induced G-protein cycle in

where _{a}, _{bg}, and _{d}. We used **x**_{0} ≡ (_{a}, _{bg}, _{d}) = (500, 4, 110, 300, 2, 20, 90) and generated single trajectory data for ^{∗} = (.38, .04, .082, .12, .021, .1, .005, 13.21) using _{a}, and _{bg} at early time points.

**Yeast-polarization dataset.** Colored circles depict initial system states and 15 data points for all seven species. Like the decay-dimerization dataset, these data are sparsely observed, particularly with respect to species _{a}, and _{bg} between

We first tested MCEM^{2} on this dataset with
^{(m)} ≈ .033 (see Methods). Although we expected premature entry into MCEM to increase the time required to simulate consistent trajectories in the first few iterations, we did not notice an appreciable trend and MCEM converged (defined here as when the change in conditional log-likelihood was less than .005 for at least one iteration) in 55 iterations. The resulting MLEs and available 68% confidence intervals (CIs) are displayed in Table
^{2} achieved a 34.7% mean relative error, and all determined CIs enclosed the corresponding true parameter values.

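| **Method** | **Type** | θ_{1} | θ_{2} | θ_{3} | θ_{4} | θ_{5} | θ_{6} | θ_{7} | θ_{8} | **% Error** |
|---|---|---|---|---|---|---|---|---|---|---|
| **True** | | **.38** | **.04** | **.082** | **.12** | **.021** | **.1** | **.005** | **13.21** | |
| MCEM^{2} | Lower | n/a | .014 | .076 | n/a | .021 | .089 | .005 | 7.386 | |
| MCEM^{2} | MLE | .0005 | .026 | .081 | .0009 | .022 | .104 | .006 | 11.479 | 34.7 |
| MCEM^{2} | Upper | n/a | .048 | .087 | n/a | .024 | .122 | .006 | 17.839 | |
| Poisson | Lower | .002 | .003 | .080 | .0001 | .018 | .069 | .004 | .0005 | |
| Poisson | Mean | 2.233 | .020 | .086 | .016 | .019 | .083 | .005 | 1.719 | 93.3 |
| Poisson | Upper | 4.749 | .033 | .092 | .027 | .021 | .095 | .005 | 3.972 | |
| SGD_{1} | MLE | 1.000 | .798 | .334 | 1.425 | Inf | .591 | .039 | 1.024 | Inf |
| SGD_{2} | MLE | .439 | .043 | .042 | 3.241 | Inf | .029 | .003 | 2.649 | Inf |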
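The mean relative errors reported in the table can be checked approximately from the tabulated point estimates. The sketch below assumes that mean relative error is the average over parameters of |estimate − truth| / truth, expressed as a percentage; this definition and the function name are assumptions made for illustration.

```python
def mean_relative_error(estimates, truth):
    """Average of |estimate - truth| / truth over all parameters, in percent."""
    errors = [abs(e - t) / t for e, t in zip(estimates, truth)]
    return 100.0 * sum(errors) / len(errors)

# Values copied from the table (rounded as printed there)
truth = [.38, .04, .082, .12, .021, .1, .005, 13.21]
mcem2 = [.0005, .026, .081, .0009, .022, .104, .006, 11.479]
poisson = [2.233, .020, .086, .016, .019, .083, .005, 1.719]

mcem2_error = mean_relative_error(mcem2, truth)
poisson_error = mean_relative_error(poisson, truth)
```

Because the tabulated estimates are rounded, the computed values should agree with the reported 34.7% and 93.3% only to within about a percentage point.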

We next tested the Poisson method on the yeast-polarization dataset, using
^{2}, the Poisson method incurred a 2.7-fold higher mean relative error, and only half of the CIs enclosed the true parameter values. Although less accurate for this example, the Poisson method required substantially less run time than MCEM^{2}: three hours versus ∼30 days on a single processor. This difference reflects the significant cost of simulating trajectories with the SSA rather than using a Poisson approximation.

Finally, we tested SGD on the yeast-polarization dataset using the same options as in previous examples ("SGD_{1}"). As in the decay-dimerization model, the SGD estimate for one of the parameters (_{5}) tended to infinity within nine steps of the algorithm (and thus resulted in an infinite mean relative error), even when using an initial gradient descent step size as small as 10^{−6} (see Table
_{2}"). This is in contrast to MCEM^{2} and SGD_{1}, which were run with initial parameter values set to a vector of ones. As before, the same parameter estimate tended to infinity, although this time 46 steps were required to do so. Although the yeast-polarization system contains combinations of reactions that leave species numbers unchanged, they are evidently not sufficient to allow adequate trajectory generation for a non-divergent gradient descent. Table

Discussion

This work presents MCEM^{2}, a novel enhancement of MCEM that accurately estimates unknown parameters of stochastic biochemical systems from observed data. MCEM^{2} combines a state of the art, adaptive implementation of MCEM (ascent-based MCEM) with algorithms from rare event simulation (the CE method and multilevel splitting) to substantially accelerate parameter estimation. Unlike a previous application of the EM algorithm to stochastic parameter estimation
^{2} concludes by executing an unmodified MCEM iteration. This places MCEM^{2} on solid theoretical foundations, with the CE phase of the algorithm serving only to accelerate the eventual MCEM phase. We note that this acceleration is essential for the method to be useful, as the use of unmodified MCEM is computationally tractable only when initial parameter estimates are close to the true values (see Figure
^{2} even further, without noticeable effects on the resulting parameter estimates. This was true even when using values of

MCEM^{2} requires selection of three additional user-defined quantities to achieve good performance: **z**, **y**), an observed data distance function; _{1} distance, intended to provide approximately equal weight to each of the system species. Although this distance function yielded excellent performance, other functions are certainly possible (e.g. sum of squared deviations). However, we note that work performed using the related approximate Bayesian computation (ABC) methods suggests that the resulting parameter estimates are not sensitive to the choice of the distance metric
^{4} and ^{5} and lowered ^{4} and ^{2} with early termination resembles the ABC method of Toni ^{2} automatically chooses this threshold using the parameter ^{2} needs no prior parameter information. This latter difference also sets our method apart from the SML and histogram-based approaches for identifying MLEs

Another important advantage of MCEM^{2} over existing MLE methods is the ease with which it can estimate parameter uncertainty. Existing MLE methods return parameter point estimates, but these estimates carry no measures of confidence or interdependency. In contrast, MCEM^{2} returns a multivariate parameter uncertainty estimate. This estimate indicates correlations between particular parameter estimates (see Figures
^{2} assumes that MLEs are multivariate log-normally distributed, which can be shown to be true as the number of data points increases asymptotically. However, 30 data points appear to be sufficient to satisfy this assumption (Figure
_{5} and _{6} of auto-regulatory gene network: Figure
^{2} was still able to correctly identify the ratio of the parameters. We note that Bayesian approaches like the Poisson approximation method also generate multivariate parameter uncertainty estimates which provide similar information to that given by MCEM^{2}.

We compared MCEM^{2} to the recently proposed Poisson approximation and SGD approaches by applying all three methods to four examples: the birth-death process, decay-dimerization, the auto-regulatory gene network, and the yeast-polarization model. Overall, the results demonstrate that MCEM^{2} performs relatively well for all examples. The first example illustrated that predictions made by the Poisson approximation method increasingly lose accuracy as species molecule counts tend to zero. MCEM^{2} avoids any such accuracy loss due to its exact simulation of consistent trajectories. The second example illustrated a limitation of the SGD method: to function properly, it requires systems to contain combinations of reactions that do not alter species counts. MCEM^{2} (as well as the Poisson method) imposes no such requirement. The divergence of the gradient descent in the yeast-polarization model also suggests that the mere presence of these combinations of reactions is not sufficient to lead to good SGD performance.

When functioning correctly on larger systems, an advantage of both SGD and the Poisson approximation method over MCEM^{2} is their lower required computational time. In particular, SGD ran 3.78-fold faster than MCEM^{2} for the auto-regulatory gene network, and the Poisson method ran an additional 36.8-fold faster than SGD. On the yeast-polarization model, the Poisson method ran 240-fold faster than MCEM^{2}. These speed-ups are due to both methods’ "simulation free" approaches for generating consistent trajectories, which is advantageous for computationally expensive models. Although the CE phase of MCEM^{2} typically completes in only a few iterations, the MCEM phase can require ≥ 100 iterations, with each iteration modifying the parameter estimates only slightly. Thus, a modified version of MCEM that takes larger steps in parameter space would further accelerate convergence. Such modifications have previously been described in the literature
^{2}. We note that one simple way to reduce the computational time required by MCEM^{2} is to simulate trajectories in parallel, using either clusters of CPUs (central processing units) or GPUs (graphics processing units). Since each consistent trajectory can be simulated independently of all others, the computation time of each MCEM^{2} iteration can in principle be reduced to the longest time required to simulate a single consistent trajectory.
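The independence of trajectories noted above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy constant-rate birth process, the function names `simulate_birth` and `simulate_batch`, and the per-trajectory seeding scheme are all assumptions made for illustration. A thread pool is used only so the sketch runs anywhere; CPU-bound Python code would need separate processes, a cluster, or GPUs to obtain a real speed-up.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate_birth(rate, x0, t_end, seed):
    """Simulate one trajectory of a toy birth process (constant propensity
    `rate`) up to time `t_end` and return the final molecule count."""
    rng = random.Random(seed)  # private generator: trajectories stay independent
    t, x = 0.0, x0
    while True:
        t += rng.expovariate(rate)  # exponential waiting time to the next birth
        if t > t_end:
            return x
        x += 1


def simulate_batch(n_traj, rate, x0, t_end, base_seed=0, workers=4):
    """Run `n_traj` independent trajectories concurrently.
    Results are returned in seed order, so the batch is reproducible."""
    seeds = [base_seed + k for k in range(n_traj)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: simulate_birth(rate, x0, t_end, s), seeds))


if __name__ == "__main__":
    print(simulate_batch(8, rate=1.0, x0=0, t_end=5.0))
```

Because every trajectory owns its random number generator, the concurrent batch returns the same values as a serial loop over the same seeds, which makes the parallel version easy to validate.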

One final enhancement that would broaden the applicability of MCEM^{2} involves accommodating measurement error in the observed data. Implementing this enhancement would be relatively straightforward given probabilistic error with a known distribution. In this case, we could simply replace the indicator function in Equation 4b with the corresponding density function of the error, given a simulated trajectory. This modification would substantially improve the efficiency of MCEM^{2}, as any simulated trajectory could now have a nonzero likelihood of generating the observed data (and thus all trajectories could be consistent with observed data). Future work will focus on incorporating this enhancement into MCEM^{2}.
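To make the proposed replacement concrete, the following Python sketch contrasts an exact-match indicator weight with an error-density weight. It is an illustration under stated assumptions, not the method's implementation: Equation 4b is not reproduced in this section, so the indicator is assumed here to test exact agreement between simulated and observed species counts, and the independent Gaussian error model with standard deviation `sigma` is a hypothetical choice of "known distribution". The function names are also hypothetical.

```python
import math


def indicator_weight(simulated, observed):
    """Weight 1 if the simulated state matches the observation exactly, else 0
    (assumed form of the indicator function)."""
    return 1.0 if all(s == o for s, o in zip(simulated, observed)) else 0.0


def gaussian_error_weight(simulated, observed, sigma):
    """Density of the observation given the simulated state, assuming
    independent N(0, sigma^2) measurement error on each species."""
    log_w = 0.0
    for s, o in zip(simulated, observed):
        log_w += -0.5 * ((o - s) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))
    return math.exp(log_w)


if __name__ == "__main__":
    sim, obs = [10, 4], [11, 4]
    print(indicator_weight(sim, obs))            # near miss gets zero weight
    print(gaussian_error_weight(sim, obs, 1.0))  # near miss gets positive weight
```

Under the density weight, a trajectory that narrowly misses the observation still contributes, which is why every simulated trajectory can be used rather than only those that reproduce the data exactly.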

Conclusions

In this work, we developed Monte Carlo Expectation-Maximization with Modified Cross-Entropy Method (MCEM^{2}), a novel method for maximum likelihood parameter estimation of stochastic biochemical systems. By applying MCEM^{2} to five example systems, we demonstrated its accurate performance and distinct advantages over existing methods. We expect these advantages to permit analysis of larger and more realistic biochemical models, ultimately providing an improved mechanistic understanding of important biological processes.

Algorithm 1: Pseudo-code for CE phase of MCEM^{2}

1:

2: **while **^{(m) }> 0 **do**

3:

4: _{0} ← 0

5:

6: **for ****do**

7: **for ****do**

8: generate

9: _{i−1}

10: **if ****then**

11: **x** ← **x**_{0}

12: **else**

13: **x** ← final state of

14: **end if**

15: **while **_{i }**do**

16: compute all _{j}(**x**)

17: generate ^{′ }using the SSA with

18: **x** to reflect the firing of reaction

19: **end while**

20: **end for**

21:

22: **if ****then**

23: replace
^{(m)})

24: **end if**

25: **end for**

26: compute

27: **end while**

28: **return**

Algorithm 2: Pseudo-code for MCEM phase of MCEM^{2}

1:

2: **while** (upper bound of the change in conditional log-likelihood > 0.005) **do**

3:

4: **if ****then**

5: increment ^{′ }as described in

6: **end if**

7: _{0} ← 0

8:

9: **for ****do**

10: **for **^{′} = 1 to ^{′ }**do**

11: _{i−1}

12: **if ****then**

13: **x** ← **x**_{0}

14: **else**

15:

16: **end if**

17: **while **_{i }**do**

18: compute all _{j}(**x**)

19: generate ^{′ }using the SSA with

20: **x** to reflect the firing of reaction

21: **end while**

22: **if ****then**

23: reset

24: **go to** step 11

25: **end if**

26: **end for**

27: **end for**

28: compute

29: **end while**

30: **return**

Algorithm 3: Pseudo-code for computing MCEM^{2} uncertainty estimates

1: _{0} ← 0

2:

3: **for ****do**

4: **for **^{′} = 1 to ^{′ }**do**

5: _{i−1}

6: **if ****then**

7: **x** ← **x**_{0}

8: **else**

9:

10: **end if**

11: **while **_{i }**do**

12: compute all _{j}(**x**)

13: generate ^{′ }using the SSA with

14: **x** to reflect the firing of reaction

15: **end while**

16: **if ****then**

17: reset

18: **go to** step 5

19: **end if**

20: **end for**

21: **end for**

22: compute

23: **return**
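The inner **while** loops of the three algorithms above share the same core: compute all reaction propensities at the current state **x**, draw the next reaction with the SSA, and update **x**. A minimal Python sketch of that core, using Gillespie's direct method, is given below. The function names, the list-based state representation, and the birth-death example with its rate constants are assumptions for illustration; the multilevel splitting, distance thresholds, and consistency checks of the algorithms are not shown.

```python
import random


def ssa_step(x, t, propensities, stoich, rng):
    """One direct-method SSA step.
    Returns (new_state, new_time, reaction_index), or None if no reaction can fire."""
    a = [f(x) for f in propensities]       # propensity of each reaction at state x
    a0 = sum(a)
    if a0 <= 0.0:
        return None
    tau = rng.expovariate(a0)              # waiting time ~ Exponential(a0)
    threshold = rng.random() * a0          # choose reaction j with probability a[j] / a0
    cumulative = 0.0
    j = len(a) - 1                         # defensive fallback against rounding
    for idx, aj in enumerate(a):
        cumulative += aj
        if threshold < cumulative:
            j = idx
            break
    new_x = [xi + d for xi, d in zip(x, stoich[j])]  # apply stoichiometry of reaction j
    return new_x, t + tau, j


def simulate_until(x0, t_end, propensities, stoich, seed=0):
    """Fire reactions until the next firing time would exceed t_end; return the state."""
    rng = random.Random(seed)
    x, t = list(x0), 0.0
    while True:
        step = ssa_step(x, t, propensities, stoich, rng)
        if step is None or step[1] > t_end:
            return x
        x, t, _ = step


if __name__ == "__main__":
    # Hypothetical birth-death example: birth at rate 2.0, death at rate 0.1 * x.
    props = [lambda x: 2.0, lambda x: 0.1 * x[0]]
    stoich = [[1], [-1]]
    print(simulate_until([0], 10.0, props, stoich, seed=1))
```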

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

Conceived and designed the experiments: BJDJ MKR LRP JN. Performed the experiments: BJDJ MKR. Wrote the paper: BJDJ LRP JN. All authors read and approved the final manuscript.

Acknowledgements

We thank Matthew Wheeler for useful suggestions and comments on this work. We also acknowledge the following financial support: BJDJ and LRP were supported by the Institute for Collaborative Biotechnologies through grant W911NF-09-0001 from the U.S. Army Research Office. The content of the information does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred. MKR and LRP were supported by NIH Grant No. 5R01EB007511-03 and DOE Grant No. DE-FG02-04ER25621. LRP was also supported by NSF Grant No. DMS-1001012.