Electrical Engineering and Computer Science Department, University of Central Florida, Orlando FL 32816 USA

Computer Science Department, Carnegie Mellon University, Pittsburgh PA 15213 USA

Lane Center for Computational Biology, Carnegie Mellon University, Pittsburgh PA 15213 USA

Abstract

Stochastic Differential Equations (SDEs) are often used to model the stochastic dynamics of biological systems. Unfortunately, rare but biologically interesting behaviors (e.g., oncogenesis) can be difficult to observe in stochastic models. Consequently, the analysis of behaviors of SDE models using numerical simulations can be challenging. We introduce a method for solving the following problem: given an SDE model and a high-level behavioral specification about the dynamics of the model, algorithmically decide whether the model satisfies the specification. While there are a number of techniques for addressing this problem for discrete-state stochastic models, the analysis of SDEs and other continuous-state models has received less attention. Our proposed solution uses a combination of Bayesian sequential hypothesis testing,

Background

The dynamics of biological systems are largely driven by stochastic processes and subject to random external perturbations. The consequences of such random processes are often investigated through the development and analysis of stochastic models (e.g.,

Existing methods for validating and analyzing stochastic models often require extensive Monte Carlo sampling of independent trajectories to verify that the model is consistent with known data, and to characterize the model's expected behavior under various initial conditions. Sampling strategies are either unbiased or biased. Unbiased sampling strategies draw trajectories according to the probability distribution implied by the model, and are thus not well-suited to investigating rare behaviors. For example, if the actual probability that the model will exhibit a given behavior is 10^{-10}, then the expected number of samples needed to observe that behavior is about 10^{10} (see Figure


**Observing rare behaviors in i.i.d. sampling is challenging**. A toy model with one low-probability state. An unbiased sampling algorithm may require billions of samples in order to observe the 'bad' state. Statistical algorithms based on

Our method uses a combination of biased sampling and sequential hypothesis testing

Related work

Our method performs statistical model checking using hypothesis testing

Methods

Our method draws on concepts from several different fields. We begin by briefly surveying the semantics of stochastic differential equations, a language for formally specifying dynamic behaviors, Girsanov's theorem on change of measures, and results on consistency and concentration of Bayesian posteriors.

Stochastic differential equation models

A stochastic differential equation (SDE)

where

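1. W_{0} = 0;

2. W_{t} is almost surely continuous in t;

3. W_{t} has independent, normally distributed increments:

• W_{t} - W_{s} and W_{t'} - W_{s'} are independent if 0 ≤ s < t ≤ s' < t';

• W_{t} - W_{s} ~ N(0, t - s), the normal distribution with mean 0 and variance t - s.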
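As a rough illustration of the defining properties of Brownian motion (zero start, independent normally distributed increments), the following Python sketch samples discretized paths and checks the mean and variance of the endpoint empirically. The function name `brownian_path`, the number of steps, and the random seed are illustrative assumptions, not part of the paper.

```python
import math
import random

def brownian_path(t_end, m, rng):
    """Sample Brownian motion at m equally spaced times in (0, t_end]."""
    dt = t_end / m
    path = [0.0]  # the process starts at zero
    for _ in range(m):
        # each increment is an independent draw from N(0, dt)
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path

rng = random.Random(0)  # fixed seed, chosen only for reproducibility
endpoints = [brownian_path(2.0, 50, rng)[-1] for _ in range(4000)]
mean = sum(endpoints) / len(endpoints)
var = sum((w - mean) ** 2 for w in endpoints) / len(endpoints)
print(f"empirical mean at time 2: {mean:.3f}, empirical variance: {var:.3f}")
```

Because the increments are independent with variance equal to the elapsed time, the empirical variance of the endpoint should be close to the final time (here 2) and the empirical mean close to 0.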

Consider the time between 0 and _{1}, _{2 }... _{m }

In what follows,

Girsanov's theorem for perturbing stochastic differential equation models

Given a process {_{t }_{t }

Here, _{t }
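The practical use of a change of measure can be sketched on a deliberately simple case. The example below assumes a standard Brownian motion, a constant drift perturbation `theta`, and a behavior that depends only on the endpoint (exceeding a level `a` at time `t_end`); all of these names and values are hypothetical and chosen so that the exact answer is known. Samples are drawn under the perturbed measure and reweighted by the Radon-Nikodym derivative, which for a constant drift reduces to exp(-theta * X_T + theta^2 * T / 2).

```python
import math
import random

def rare_event_estimate(a, t_end, theta, n, rng):
    """Estimate P(W_T > a) for standard Brownian motion W by sampling
    under a measure Q in which the process has constant drift theta."""
    total = 0.0
    for _ in range(n):
        # Under Q the endpoint is X_T = theta*T + sqrt(T)*Z with Z ~ N(0, 1).
        x = theta * t_end + math.sqrt(t_end) * rng.gauss(0.0, 1.0)
        if x > a:
            # Weight by dP/dQ evaluated on this sample (constant-drift case).
            total += math.exp(-theta * x + 0.5 * theta * theta * t_end)
    return total / n

def exact_tail(a, t_end):
    """P(W_T > a) computed from the normal distribution, for comparison."""
    return 0.5 * math.erfc(a / math.sqrt(2.0 * t_end))

rng = random.Random(1)  # fixed seed, chosen only for reproducibility
estimate = rare_event_estimate(a=4.0, t_end=1.0, theta=4.0, n=20000, rng=rng)
exact = exact_tail(4.0, 1.0)
print(f"weighted estimate: {estimate:.3e}, exact value: {exact:.3e}")
```

With unbiased sampling, an event of this probability (about 3 in 100,000) would be seen less than once in 20,000 draws on average; under the perturbed measure roughly half of the draws hit the event, and the weights correct for the bias.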

Specifying dynamic behaviors

Next, we define a formalism for encoding

**Definition 1** (Adapted Finitely Monitorable). Let

Certain AFM specifications can be expressed as formulas in

**Definition 2** (Probabilistic Adapted Finitely Monitorable)

Some common examples of PAFM specifications include Probabilistic Bounded Linear Temporal Logic (PBLTL) (e.g., see

Semantics of bounded linear temporal logic (BLTL)

We define the semantics of BLTL with respect to the paths of the model. Let σ = (s_{0}, Δ_{0}), (s_{1}, Δ_{1}), ... be a sampled execution of the model along states s_{0}, s_{1}, ... with durations Δ_{0}, Δ_{1}, .... We denote the path starting at state i by σ^{i}; in particular, σ^{0} denotes the original execution

1. ^{k }

2. σ^{k} ⊨ ϕ_{1} ∨ ϕ_{2} if and only if σ^{k} ⊨ ϕ_{1} or σ^{k} ⊨ ϕ_{2};

3. σ^{k} ⊨ ϕ_{1} ∧ ϕ_{2} if and only if σ^{k} ⊨ ϕ_{1} and σ^{k} ⊨ ϕ_{2};

4. σ^{k} ⊨ ¬ϕ_{1} if and only if σ^{k} ⊨ ϕ_{1} does not hold;

5. σ^{k} ⊨ ϕ_{1} **U**^{t} ϕ_{2} if and only if there exists i ∈ ℕ such that (a) Σ_{0 ≤ l < i} Δ_{k+l} ≤ t; (b) σ^{k+i} ⊨ ϕ_{2}; and (c) for each 0 ≤ j < i, σ^{k+j} ⊨ ϕ_{1}.
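The semantics above can be checked mechanically on a finite sampled trace. The sketch below is one possible monitor, not the authors' implementation: it assumes that atomic propositions are predicates on the current state, that durations are non-negative, and that a trace which ends before the until condition is resolved counts as a violation. The names `atom`, `until`, and the variable `x` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
Trace = List[Tuple[State, float]]        # pairs (state s_i, duration Delta_i)
Formula = Callable[[Trace, int], bool]   # does the suffix starting at index k satisfy it?

def atom(pred: Callable[[State], bool]) -> Formula:
    """Atomic proposition, assumed here to be a predicate on the current state."""
    return lambda trace, k: pred(trace[k][0])

def neg(f: Formula) -> Formula:
    return lambda trace, k: not f(trace, k)

def disj(f: Formula, g: Formula) -> Formula:
    return lambda trace, k: f(trace, k) or g(trace, k)

def conj(f: Formula, g: Formula) -> Formula:
    return lambda trace, k: f(trace, k) and g(trace, k)

def until(f: Formula, g: Formula, bound: float) -> Formula:
    """Bounded until: g must hold at some step k+i reached within `bound`
    time units, with f holding at every earlier step k+j, j < i."""
    def check(trace: Trace, k: int) -> bool:
        elapsed = 0.0  # sum of Delta_{k+l} for 0 <= l < i
        for i in range(len(trace) - k):
            if elapsed > bound:        # condition (a) fails for this and all later i
                return False
            if g(trace, k + i):        # condition (b); (c) was checked on earlier steps
                return True
            if not f(trace, k + i):    # condition (c) would fail for any larger i
                return False
            elapsed += trace[k + i][1]
        return False                   # assumption: an unresolved finite trace is a violation
    return check

TRUE = atom(lambda s: True)
big = atom(lambda s: s["x"] >= 5.0)
trace = [({"x": 1.0}, 1.0), ({"x": 2.0}, 1.0), ({"x": 5.0}, 1.0), ({"x": 9.0}, 1.0)]
within_2 = until(TRUE, big, 2.0)(trace, 0)      # x reaches 5 after 2 time units
within_1_5 = until(TRUE, big, 1.5)(trace, 0)    # the bound of 1.5 is too short
print(within_2, within_1_5)
```

Writing "eventually within t" as `until(TRUE, g, t)` follows the usual derivation of the bounded eventually operator from bounded until.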

Statistical model validation

Our algorithm performs statistical model checking using Bayesian sequential hypothesis testing

Sequential hypothesis testing

Let _{0 }: _{1 }: _{0 }indicates that _{1 }denotes that

**Definition 3** (Type I and II errors). A Type I error is an error where the hypothesis test asserts that the null hypothesis

The basic idea behind _{i }_{0 }or _{1}.

Bayesian sequential hypothesis testing

Recall that for any finite trace _{i }_{i }_{i }_{i }_{i }_{i }_{i }_{q }ϕ_{i }

Bayesian statistics requires that

Suppose we have a sequence of _{1},..., _{n }_{1},..., _{n}

**Definition 4**. The Bayes factor

The Bayes factor may be used as a measure of relative confidence in _{0 }vs. _{1}, as proposed by Jeffreys

We note that the Bayes factor depends on both the data _{0 }vs. _{1 }provided by the data _{1},..., _{n}
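To make the role of the Bayes factor concrete, here is a minimal sketch of a Bayesian sequential test on i.i.d. Bernoulli outcomes (each outcome records whether one sampled trace satisfied the specification). It rests on several assumptions that are not stated in the text above: the hypotheses are taken to be H_0: p ≥ θ and H_1: p < θ, the prior on p is uniform, the integrals are approximated with a midpoint rule, and sampling stops once the Bayes factor exceeds a threshold T or falls below 1/T. The function names are hypothetical.

```python
import math
import random

def _integrate(f, lo, hi, steps=4000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

def bayes_factor(successes, n, theta):
    """P(data | H0) / P(data | H1) for H0: p >= theta, H1: p < theta,
    assuming a uniform prior on p (an illustrative choice)."""
    failures = n - successes
    p_hat = min(max(successes / n, 1e-12), 1.0 - 1e-12) if n else 0.5

    def loglik(p):
        return successes * math.log(p) + failures * math.log(1.0 - p)

    peak = loglik(p_hat)  # rescale by the peak likelihood to avoid underflow

    def scaled(p):
        return math.exp(loglik(p) - peak)

    # Each evidence term averages the likelihood over the prior restricted
    # to that hypothesis, hence the division by the prior mass.
    evidence_h0 = _integrate(scaled, theta, 1.0) / (1.0 - theta)
    evidence_h1 = _integrate(scaled, 0.0, theta) / theta
    return math.inf if evidence_h1 == 0.0 else evidence_h0 / evidence_h1

def sequential_test(sample, theta, threshold, max_samples=10000):
    """Draw outcomes one at a time until the Bayes factor is decisive."""
    successes = 0
    for n in range(1, max_samples + 1):
        successes += 1 if sample() else 0
        b = bayes_factor(successes, n, theta)
        if b > threshold:
            return "H0", n
        if b < 1.0 / threshold:
            return "H1", n
    return "undecided", max_samples

rng = random.Random(2)  # fixed seed, chosen only for reproducibility
verdict, used = sequential_test(lambda: rng.random() < 0.3, theta=0.01, threshold=1000.0)
print(verdict, used)
```

In this toy run the true success probability (0.3) is far above θ = 0.01, so a handful of samples is enough. The closer the true probability lies to θ, the more samples the test needs before the Bayes factor becomes decisive.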

Non-i.i.d. Bayesian sequential hypothesis testing

Traditional methods for hypothesis testing, including those outlined in the previous two subsections, assume that the samples are drawn

We begin by reviewing some fundamental concepts from Bayesian statistics including KL divergence, KL support, affinity, and

**Definition 5** (Kullback-Leibler (KL) Divergence). Given a parameterized family of probability distributions {

**Definition 6** (KL Neighborhood). Given a parameterized family of probability distributions {

**Definition 7** (KL Support). A point

**Definition 8** (Affinity). The affinity Aff(

**Definition 9** (Strong

Given these definitions, it can be shown that the Bayesian posterior concentrates exponentially under certain technical conditions
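For intuition about the quantities named in these definitions, the sketch below evaluates them for the simplest parameterized family, Bernoulli distributions indexed by a success probability. It assumes the standard forms of the KL divergence (the expected log-likelihood ratio) and of the affinity (the integral of the square root of the product of the two densities); the function names and the choice of family are illustrative, not taken from the paper.

```python
import math

def kl_bernoulli(p, q):
    """KL divergence from Bernoulli(p) to Bernoulli(q), for 0 < p, q < 1."""
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

def affinity_bernoulli(p, q):
    """Affinity: sum over both outcomes of sqrt(f_p(x) * f_q(x))."""
    return math.sqrt(p * q) + math.sqrt((1.0 - p) * (1.0 - q))

def in_kl_neighborhood(p, q, eps):
    """True when q lies in the KL neighborhood of radius eps around p."""
    return kl_bernoulli(p, q) < eps

print(kl_bernoulli(0.5, 0.25), affinity_bernoulli(0.5, 0.25))
print(in_kl_neighborhood(0.5, 0.45, 0.01), in_kl_neighborhood(0.5, 0.25, 0.01))
```

The KL divergence is zero and the affinity is one exactly when the two distributions coincide; the divergence grows and the affinity shrinks as the parameters move apart.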

Bounding errors under a change of measure

Next, we develop the machinery needed to compute bounds on the Type-I/Type-II errors for a testing strategy based on non-

A stochastic differential equation model _{1}, _{2},... to _{i }_{i }_{i }

We use the following result regarding change of measures. Suppose a given behavior, say

Here, _{i }^{th }_{i }

Note that the term _{i }_{i }_{i }_{1}, _{2},... _{n}

A sampling algorithm can compute

Consider the following expression that is computable without knowing the implied Radon-Nikodym derivative or change of measure explicitly.

Now, we can rewrite the above expression as:

Our result will exploit the fact that we do not allow our testing or sampling procedures to have arbitrary implied Radon-Nikodym derivatives. This is reasonable as no statistical guarantees should be available for an intelligently designed but adversarial test procedure that (say) tries to avoid sampling from the given behavior. Suppose that the implied Radon-Nikodym derivative always lies between a constant

Furthermore,

Thus, by allowing the sampling algorithm to change measures by at most ^{2}.

**Example:** Suppose the testing strategy has made _{1}, _{2},... _{n}

Similarly,

Termination conditions for non-i.i.d. sampling

Traditional (i.e.,

To consider the conditions under which our algorithm will terminate after observing ^{2n }can outweigh the gain made by the concentration of the probability measure ^{-nb}. This is not surprising because our construction thus far does not force the test

**Definition 10**. A testing strategy is

Note that a fair test strategy does

**Definition 11**. An

The notion of a

Algorithm

Finally, we present our Statistical Verification algorithm (See Figure ^{2n }if the Bayes Factor is larger than one. If the Bayes Factor is less than one, the algorithm multiplies the Bayes Factor by the factor ^{2n}.


**Non-i.i.d. Statistical Verification Algorithm**. The figure illustrates the

Results and discussion

We applied our algorithm to two SDE models of tumor dynamics from the literature. The first model is a single dimensional stochastic differential equation for the influence of chemotherapy on cancer cells, and the second model is a pair of SDEs that describe an immunogenic tumor.

Lefever and Garay model

Lefever and Garay

Here, _{0}_{0 }is the linear per capita birth rate of cancer cells, _{t }

We demonstrate our algorithm on a simple property of the model: starting with a tumor consisting of a billion cells, is there at least a 1% chance that the tumor could grow to one hundred billion cells under immune surveillance and chemotherapy? The following BLTL specification captures this property:

Figure


**Comparison of i.i.d. and non-i.i.d. sampling**. Non-i.i.d. vs i.i.d. sampling based verification for the Lefever and Garay model.

We note that there are circumstances when our algorithm may require

Nonlinear immunogenic tumor model

The second model we analyze studies immunogenic tumor growth

The parameters _{1 }and _{1 }denote the stochastic equilibrium point of the model. Briefly, the model assumes that the amount of noise increases with the distance to the equilibrium point.

For this model, we considered the following property: starting from 0.1 units each of tumor and immune cells, is there at least a 1% chance that the number of tumor cells could increase to 3.3 units? The property can be encoded into the following BLTL specification:

Default model parameters were those used in


**Comparison of i.i.d. and non-i.i.d. sampling**. Non-i.i.d. vs i.i.d. sampling based verification for the nonlinear immunogenic tumor model.

We also considered the property that the number of tumor cells increases to 4.0 units. We evaluated whether this property is true with probability at least 0.000005 under a Bayes Factor of 100,000. The

Discussion

Our results confirm that non-

Conclusions

We have introduced the first algorithm for verifying properties of stochastic differential equations using sequential hypothesis testing. Our technique combines Bayesian statistical model checking and non-

The present paper only considers SDEs with independent Brownian noise. We believe that these results can be extended to handle SDEs with certain kinds of correlated noise. Another interesting direction for future work is the extension of these methods to stochastic partial differential equations, which are used to model spatially inhomogeneous processes. Such analysis methods could be used, for example, to investigate spatial properties of tumors, the propagation of electrical waves in cardiac tissue, or, more generally, diffusion processes observed in nature.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

SKJ and CJL contributed equally to all parts of the paper.

Acknowledgements

The authors acknowledge the feedback received from the anonymous reviewers for the first IEEE Conference on Computational Advances in Bio and Medical Sciences (ICCABS) 2011.

This article has been published as part of