Open Access Highly Accessed Methodology article

A continuous-time adaptive particle filter for estimations under measurement time uncertainties with an application to a plasma-leucine mixed effects model

Annette Krengel1*, Jan Hauth1, Marja-Riitta Taskinen2, Martin Adiels34 and Mats Jirstrand5

Author Affiliations

1 Fraunhofer-Institut für Techno- und Wirtschaftsmathematik (ITWM; Fraunhofer Institute for Industrial Mathematics), Kaiserslautern, Germany

2 Department of Medicine, University of Helsinki, Helsinki, Finland

3 Department of Mathematical Sciences, University of Gothenburg, Gothenburg, Sweden

4 Wallenberg laboratory, Sahlgrenska Center for Cardiovascular Research, Department of Medicine, University of Gothenburg, Gothenburg, Sweden

5 Fraunhofer Chalmers Centre (FCC), Gothenburg, Sweden

For all author emails, please log on.

BMC Systems Biology 2013, 7:8  doi:10.1186/1752-0509-7-8

The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1752-0509/7/8


Received:31 October 2011
Accepted:7 January 2013
Published:19 January 2013

© 2013 Krengel et al.; licensee BioMed Central Ltd.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

When mathematical modelling is applied to many different application areas, a common task is the estimation of states and parameters based on measurements. With this kind of inference making, uncertainties in the time when the measurements have been taken are often neglected, but especially in applications taken from the life sciences, this kind of errors can considerably influence the estimation results. As an example in the context of personalized medicine, the model-based assessment of the effectiveness of drugs is becoming to play an important role. Systems biology may help here by providing good pharmacokinetic and pharmacodynamic (PK/PD) models. Inference on these systems based on data gained from clinical studies with several patient groups becomes a major challenge. Particle filters are a promising approach to tackle these difficulties but are by itself not ready to handle uncertainties in measurement times.

Results

In this article, we describe a variant of the standard particle filter (PF) algorithm which allows state and parameter estimation with the inclusion of measurement time uncertainties (MTU). The modified particle filter, which we call MTU-PF, also allows the application of an adaptive stepsize choice in the time-continuous case to avoid degeneracy problems. The modification is based on the model assumption of uncertain measurement times. While the assumption of randomness in the measurements themselves is common, the corresponding measurement times are generally taken as deterministic and exactly known. Especially in cases where the data are gained from measurements on blood or tissue samples, a relatively high uncertainty in the true measurement time seems to be a natural assumption. Our method is appropriate in cases where relatively few data are used from a relatively large number of groups or individuals, which introduce mixed effects in the model. This is a typical setting of clinical studies. We demonstrate the method on a small artificial example and apply it to a mixed effects model of plasma-leucine kinetics with data from a clinical study which included 34 patients.

Conclusions

Comparisons of our MTU-PF with the standard PF and with an alternative Maximum Likelihood estimation method on the small artificial example clearly show that the MTU-PF obtains better estimations. Considering the application to the data from the clinical study, the MTU-PF shows a similar performance with respect to the quality of estimated parameters compared with the standard particle filter, but besides that, the MTU algorithm shows to be less prone to degeneration than the standard particle filter.

Keywords:
Particle filter; Sequential Monte Carlo methods; Nonlinear filtering; Parameter estimation; Measurement time uncertainties; PK/PD; Mixed effects; Leucine kinetics

Background

Measurement time uncertainties

Uncertainty in the time at which a measurement is taken is an often neglected source of random error. While in many application areas, this kind of error is generally small and indeed neglectable (due to automated measurements and precise timings), in others it may be of real influence, especially in the life sciences. As a prominent example, one may consider pharmacokinetic and pharmacodynamic (PK/PD) models which are used to describe the metabolic interactions and the effects of a chemical agent (like a drug or a labelled substance) over time inside an organism, respectively.

A typical population experiment in the PK/PD context consists in the analysis of the contents of the blood plasma of several individuals with respect to concentrations of certain molecules of interest. For this purpose, blood probes have to be taken from each individual at certain (fixed) time points after a certain event has occurred (e.g. a drug or a labelled substance has been applied). It is clear from the setting of the experiments that there is some variation in the real point in time when the blood probe has been taken: the true time when the measurement value has been obtained might be shortly before or after the intended time, and this true measurement time is not known to us. Since the inclusion of those time uncertainties in the model usually makes the analysis more difficult, it is standard to lump the time uncertainties with the measurement error. But especially at early times when concentrations change quickly, this may easily lead to wrong estimations, even if one assumes very high variances of the measurement error (we will demonstrate this later on a simple example). On the other hand, the inclusion of measurement time uncertainties (MTU) in algorithms aiming at inference making in complex models is not straightforward. In this article, we will present a modification of the Particle Filter (PF) algorithm (which we call MTU-PF) which is able to fully include a statistical model of the time uncertainties.

Inference in complex systems

The assessment of the effectiveness of a drug in a clinical study has been done in the past by the direct computation of relatively simple statistical values. The enormous increase in complexity of the underlying models, due to present developments in medicine and biology, for instance in the areas of personalized medicine or systems biology, increases also the need for more sophisticated model-based inference methods.

The estimation of unobservable internal variables or model parameters from data which have been obtained from blood or tissue samples at several time points can reveal information on the concentrations and effectiveness of the substance under question. If these data come from individuals which belong to two different (or even more) groups, e.g. test and control group, mixed effects are introduced in the underlying models. The inherent non-linearity and high variability of biological processes adds considerably to the difficulties one faces during the inference step. Inference in connection with dynamic models plays a major role in many other application areas. State and parameter estimation as well as model discrimination and validation are most common, but also optimal control problems should be mentioned.

It is often not enough to consider (independent) measurement noise [1]. Correlations between residuals are not uncommon, and the violation of this statistical assumption may lead to wrong estimates. A natural way to include correlated noise is to model two different types of noise: the dynamic (process or system) noise which is present in the dynamics of the system states and originates either from true random fluctuations in the system or from unmodelled dynamics in the system, and the measurement noise which is introduced by the measurement procedure or equipment and modelled by independent residuals. One possible approach is to use state space models which consist of a time-continuous model for the system states, e.g. based on Stochastic Differential Equations (SDEs), and a separate model for the time-discrete measurements.

Parameter estimation with Maximum Likelihood approach

Parameter estimation in state space systems is a difficult problem. In a context where the system dynamics are modelled by Ordinary Differential Equations (ODEs) without correlated noise, the problem is most often considered as a (deterministic) optimization problem based on a Maximum Likelihood (ML) formulation. An overview of these approaches can be found in [2] and [3]; see also [4], which consider also other aspects like identifiability. A generalization of the ML approach including more flexible cost functions is given by the prediction error estimation method ([5]). The introduction of system noise in the state variables leads to optimization problems with SDE constraints. In this case, internal system states which cannot be directly observed need to be estimated jointly with the parameters, given the data. For this purpose, the parameter estimation methods must be augmented by appropriate state filtering methods. An overview of ML parameter estimation in these types of models is given in [6]. If the SDEs are non-linear, linearizations to the Kalman Filter, like the Extended Kalman Filter (EKF) or the Unscented Kalman Filter (UKF), are used to establish approximations to the means and covariances of the filter distributions over time. All those approximations suffer from the fact that they approximate the filtering distributions of the states by a Gaussian distribution at all time points and cannot adequately approximate skewed or multimodal distributions. Better approximations are provided by simulation based methods like Sequential Monte Carlo (SMC) algorithms where good convergence results have been established ([7]). Nevertheless, they suffer from several drawbacks when applied to the joint estimation of dynamic states and fixed parameters ([8-10], see also [11]).

Parameter estimation in a Bayesian context

In a Bayesian context, in contrast to the “classical” ML approach, a prior distribution is assigned to the parameter vector, hence the parameters can be treated as random variables. In this sense, parameter estimation is done by evaluating the so-called posterior distribution which can be computed (at least theoretically) by Bayes’ theorem given the observations (measurements) and the prior distribution. In the context of high-dimensional spaces, this requires the computation of high-dimensional integrals which is not possible to do analytically. For this purpose, Markov Chain Monte Carlo (MCMC) methods provide powerful tools for the computation of simulation-based approximations to the posterior distribution. Again, in the context of the joint estimation of dynamic states and fixed parameters, the design of good proposal densities is a very difficult problem which renders the use of standard MCMC methods like the Metropolis-Hastings sampler impractical for the purposes of parameter estimation in state space systems.

It has long been a wish to combine both (dynamic) SMC and (static) MCMC methods to provide a general tool for the joint estimation of dynamic states and static parameters. Only recently, Andrieu et al. [11] proposed a very promising combination of both types of Monte Carlo approaches called Particle Markov Chain Monte Carlo (PMCMC) which is generally applicable and where also convergence has been proved.

In the present article, even though the PMCMC approach might be the preferred method for parameter estimation in state space systems, we will concentrate solely on the SMC methods, since our modification affects only this part. However, to be able to do parameter estimation in a pure SMC context, we rely on an approach that is very often used to avoid problems with the estimation of constant parameters. This approach consists in the introduction of artificial dynamics in the parameters, that means the parameters are allowed to slightly change their values over time. In this way, and in a Bayesian context, the parameters can be treated exactly in the same way as the system states. After building an augmented system state by concatenating the parameter vector and the state vector, the joint estimation of states and parameters reduces to filtering of the augmented state vector which makes SMC methods directly applicable to the problem.

Particle filters for state and parameter estimation

Particle filters ([12-14]) belong to the class of SMC methods for state filtering in state space models. Using the state augmentation approach, the method is also capable of estimating system parameters. The standard particle filter is designed for discrete, non-linear, and non-Gaussian models and can routinely be adapted to the continuous case with measurements at discrete times. The idea of the particle filter is that, at each time point, there is a sample based representation (the weighted particles) of the current estimate of the inner states and parameters which is based on the measurements that have been obtained up to the current time point. The particle cloud is then propagated through time, and the particles and weights are updated accordingly at each time point where measurements are available.

Non-Linear Mixed Effects models

Estimation in a Non-linear Mixed Effects model (NLME) involves the estimation of both global and individual parameters. With classical maximum likelihood estimation, the individual parameters are random variables equipped with a distribution while the global parameters remain constants with a “true” but unknown value. If the underlying model equations are non-linear, this leads to likelihood functions which are not analytically accessible and one has to rely on approximations. In the context where the system dynamics are modelled by ODEs, the most popular algorithm for NLME parameter estimation in the PK/PD context is the tool NONMEM ([15]). In [1] an estimation algorithm for NLME models based on Stochastic Differential Equations (SDEs) was proposed that uses the First-Order Conditional Estimation (FOCE) method to approximate the likelihood in combination with the EKF estimation in the SDEs. This has been added to NONMEM ([16]). In [17], a comparison between ODE and SDE based parameter estimation has been performed which showed that the interindividual variabilities were in general estimated to be smaller for the SDE model. Donnet and Samson ([18]) proposed a stochastic version of the Expectation-Maximization (SAEM) algorithm (for the estimation of the global parameters) in combination with MCMC methods (for the estimation of states and individual parameters). However, since MCMC exhibits slow mixing properties in the context of the estimation of states and parameters in state space models, in [19] MCMC has been replaced by the more promising PMCMC approach of Andrieu et al. ([11]).

On the other hand, in a Bayesian context, also the global parameters are equipped with a (prior) probability distribution, and the conceptual difference between global and individual parameters vanishes. The mixed effects model can then be considered as a hierarchical model with dependent parameters ([20,21], see also [22] for a more recent population-based Bayesian approach to PK/PD modelling). Simulation-based (Monte Carlo) methods can easily be adapted to this case. Nevertheless, the above mentioned challenges to both SMC and MCMC methods are even higher due to the increased number of states and parameters in NLME models (the number of states and individual parameters has to be multiplied by the number of individuals).

Aim of the article

Our goal is two-fold: Firstly, we want to show that the particle filter algorithm is applicable (with our modifications) also to more complex models when time uncertainties are formulated explicitly. Secondly, we want to show that the modification may even provide the possibility for further enhancement of the performance of the algorithm by presenting an adaptive time-stepping scheme which is only possible in the context of the new algorithm.

We do not claim that our MTU algorithm generally performs better or worse than the standard filter, nor that it should be the preferred method for estimation in non-linear mixed effects models. Rather, we provide a method which is usable for models where time uncertainties may play a major role. In these cases, it may indeed lead to better estimations. On the other hand, our method transfers the time-discrete particle filter approach, where updates based on the measurements very strictly depend on the measurement times, to a truly time-continuous approach, where updates to the filtering distributions can be performed at every point on the time-scale. Since we want to focus on the time uncertainties, we neglect discussing further issues like identifiability, model evaluation and model discrimination. In our application to the model of plasma-leucine kinetics, we try to avoid these issues by providing ad-hoc values to some of the parameters (especially to the variances of the system states).

Motivating example

Let us have a look at an example for illustrating the benefits of a separate modelling of measurement time uncertainties. Let us consider a state space system given by the ODE

d q ( t ) = ( - αq ( t ) + β ) d t

with parameters α , β R . Here q ( t ) R denotes the state of the system at time t. We call the state trajectories obtained by this deterministic system the nominal evolutions of the states. We add noise to the system in a standard way by introducing an additional term σ d W t , with a standard Wiener process ( W t ) t R 0 and a diffusion parameter σ R . This leads to the following SDE describing the evolution of the state q over time:

d q ( t ) = ( - α q ( t ) + β ) d t + σ d W t .

Furthermore, let the initial state q(0) be given by a log-normal distribution with parameters μ q 0 and σ q 0 2 (mean and variance of the logarithm of q(0), respectively). The parameters chosen in our implementation of this example are shown in Table 1.

Table 1. Parameters for the motivating example

We assume that M measurements of the state q(t) will be taken at times Tj and that each measurement j, j=1,…,M is disturbed by normal noise with mean q(tj) and with a fixed variance σ y 2 , i.e. measurement j is distributed according to y j N ( q ( t j ) , σ y 2 ) . Usually, the times Tj are assumed to be known. In contrast, we will assume that in addition to the measurement value error, there is some uncertainty about the exact times where the measurements have been taken. If we attempt to take the jth measurement at the intended (or nominal) time t ̂ j , the measurement in fact takes place at time Tj which may be shortly before or after the intended time t ̂ j . A natural way to model these uncertainties is to assume that the measurement time Tj is given as a realization of a random variable Tj. In our example, we assume that Tj follows a truncated normal distribution given by the density

γ j ( t j ) : = 1 Γ j exp ( t j - t ̂ j ) 2 0 . 3 2 if max { 0 , t ̂ j - 1 } t j t ̂ j + 1 , 0 otherwise

with normalizing constant

Γ j : = max { 0 , t ̂ j - 1 } t ̂ j + 1 exp ( t j - t ̂ j ) 2 0 . 3 2

and given intended measurement times t ̂ j . Figure 1 shows the different distributions for one measurement j for all possible intended times t ̂ j . In each of the subfigures (a)-(d), the shaded green area gives an impression of the “density” of the distribution of the measurement value yj in dependence of the intended measurement time t ̂ j on the x-axis, while the dark-green dashed line depicts the nominal evolution of the state q over time. Subfigure (a) shows the distribution of the measurement values with time uncertainties, while (b)-(d) depict the distribution of the measurement values with known measurement times (in this case t ̂ j = t j ), for several standard deviations σy: in (b), the original standard deviation is used, while in (c) and (d), higher standard deviations are used which correspond to the cases with lumped value and time variations.

thumbnailFigure 1. Assumed measurement distributions for the motivating example. Measurement distribution resulting from (a) separate modelling of measurement time uncertainties and measurement value uncertainties, with σy=0.005, and (b-d) lumped time-and-value uncertainties with several different assumed lumped measurement variances σy. The dashed dark-green line depicts the nominal evolution of the state q over time. The green shaded area depicts the region where the measurements are expected.

Comparing Figures 1(a) and 1(b-d), we observe that the distributions of the measurements exhibit clearly different shapes. For the “true” model depicted in Figure 1(a), if we consider a single point in time that lies in a time segment where the state values change quickly, the distribution of the measurement at this certain point in time is quite broad. The variance in the measured value is very high, whereas it is small in time segments where the state values change slowly. In contrast, for the standard particle filter, the measurement variance is constant and hence the assumed measurement distributions differ remarkably from the “true” distributions, howsoever the value of σ y 2 is chosen. It must be expected that this leads to difficulties when inference on the states and parameters needs to be done based on these models. We will resume our example after having presented the MTU particle filter and will show that this is indeed the case.

Methods

We divide this section into three subsections. In the first subsection, we fix the state and observation model we want to consider. In the second subsection entitled “Standard case” we outline the standard particle filter algorithm in the context of time-continuous states with time-discrete measurements, and the various probability distributions involved. Although nothing is new in this subsection, it serves several purposes. Firstly, the time-continuous case is relatively rarely considered in the literature; secondly, the derivation of our modification needs a slightly more general formulation than it is standard for the discrete-time filter; and lastly, the comparison of our modified version with the standard case might more clearly reveal the differences between the two approaches. In the third subsection entitled “MTU particle filter”, we present our new modification of the particle filter. In the following section “Results and Discussion”, we compare the new MTU particle filter to the standard particle filter and to an alternative Maximum Likelihood estimation method on a simple artificial example. We also present an application of our MTU-PF method to a PK/PD study in a non-linear mixed-effects setting in direct comparison with the standard particle filter.

Note: a list of all used symbols with a short explanation can be found at the end of this paper.

The model

State process

Let ( Ω , A , P ) be a probability space and for each t∈ [t0,) with t 0 R let ( X t , B X t ) be an arbitrary measurable space. For each t∈ [t0,) let further X t : Ω X t be an A - B X t measurable random variable such that X [ t 0 , ) : = ( X t ) t [ t 0 , ) is a continuous-time Markov process with general state space

X [ t 0 , ) : = t 0 s X s .

For each t∈ [t0,), denote by L X t the pushforward measure of P under Xt, i.e. L X t ( B ) : = P ( X t - 1 ( B ) ) for all B B X t . Further, denote by L X [ t 0 , ) the pushforward measure of P under X [ t 0 , ) : = ( X s ) s [ t 0 , ) (with the corresponding product algebra). Analogously, denote by

X [ t 0 , t ] : = t 0 s t X s for each t t 0

the state space restricted to the interval [t0,t], and denote by L X [ t 0 , t ] the corresponding pushforward measure. For each s and t with t>st0, let Ks,t(xs, dxt) be the Markov kernel of the process X [ t 0 , ) from time s to time t.

An important special case for X [ t 0 , ) is given by a multidimensional Itô process on X t = R n (equipped with the corresponding Borel σ-algebra) defined through a stochastic differential equation (SDE)

d X t = a ( X t , t ) d t + B ( X t , t ) d W t

with drift a(x,t), diffusion matrix B(x,t), multidimensional standard Wiener process W t , and initial variable X t 0 . In this case, it is possible to sample directly (at least approximately) from the kernels Ks,t when a suitable discretization method is applied, for instance the Euler-Maruyama method.

Observations / measurements

Let the process X [ t 0 , ) be observed via M random variables Y1:M with values in measurable spaces ( Y j , B Y j ) . Each single observation (measurement) yj depends on the state variable x t j at some time Tj and on the observation time (measurement time) Tj itself. We assume that, given the observation time Tj and the state X t j = x t j , the variable yj is independent of all other variables, and the conditional measure can be expressed via some conditional probability density g j ( y j | x t j , t j ) with respect to a reference measure μ Y j on ( Y j , B Y j ) . We do not require any further restrictions on g such as linear dependence on the states or Gaussianity.

Observation / measurement times

The observation times (measurement times) Tj for j=1,…,M are usually assumed to be deterministically given and known. Our variant of the particle filter will be based on the assumption that the observation times Tj are themselves realizations of random variables Tj. These variables model the uncertainty about exact observation times. In contrast to the observation variables yj, the observation times Tj are never observed (measured). We assume that all information available to us is their probability distribution on the half axis [t0,), while in the case of the observations yj, we know both the densities g j ( y j | x t j , t j ) and the observed values yj.

In this article, we will only consider the simplest case where each variable Tj is independent of all others. Dependencies between the Tj’s, especially concerning the order of the observation times, may be considered natural but would lead to more complicated algorithms. However, order dependencies can easily be introduced via restrictions on the support of the variables. In general, the probability distribution of every single variable Tj shall be given by a density γj(tj) with respect to the Lebesgue measure λ [ t 0 , ) on the interval [t0,).

In the following, we will consider the two cases mentioned, where either all Tj are deterministic and known or all Tj are random and unknown. Note that the first case formally coincides with the case that Tj is random but observed. We will therefore stick to the notation g j ( y j | x t j , t j ) for the observation densities in both cases.

Standard case: measurement times deterministic and known

We will first consider the standard case, where the observation times Tj are known. For simplicity, we assume here that the observation times t1:M are strictly ordered increasingly, i.e. t0<t1⋯<tM.

The standard case of the particle filter is usually formulated for discrete-time Markov processes X t 0 : M : = ( X t j ) j { 0 , , M } with general state space where the state variables are only defined at the initial time t0 and at the times t1,…,tM when measurements occur. Nevertheless, this case is included in our more general framework where Xt is defined for all tt0. One just focuses on the state variables for those times only. In view of the later generalization to random observation times, we will consider the fixed values Tj as realizations of random variables Tj and condition all occurring densities on them. As mentioned above this assumption leads to the same results as if we assumed the values Tj to be given deterministically.

Full model and filter model

The full model is given by the joint density of the variables X t 0 : M and Y1:M (conditioned on the observation times T1:M=t1:M) with respect to the product measure L X t 0 : M j = 1 M μ Y j :

f X t 0 : M , Y 1 : M | T 1 : M ( x t 0 : M , y 1 : M | t 1 : M ) : = j = 1 M g j ( y j | x t j , t j ) . (1)

The filter at a given time tk is based on a reduced model. This model is given by the joint density of the variables X t 0 : k and Y1:k (conditioned on T1:M=t1:M) with respect to the product measure L X t 0 : k j = 1 k μ Y j :

f X t 0 : k , Y 1 : k | T 1 : M ( x t 0 : k , y 1 : k | t 1 : M ) : = j = 1 k g j ( y j | x t j , t j ) . (2)

This density is based on the state sequence X t 0 : k . In contrast, we can focus on the single state X t k by considering the joint density of the variables X t k and Y1:k (given T1:M=t1:M) with respect to L X t k j = 1 k μ Y j . It can be computed by marginalization as follows:

f X t k , Y 1 : k | T 1 : M ( x t k , y 1 : k | t 1 : M ) : = x ~ t 0 : k X t 0 : k : x ~ t k = x t k f X t 0 : k , Y 1 : k | T 1 : M x ~ t 0 : k , y 1 : k | t 1 : M × d L X t 0 : k x ~ t 0 : k (3)

and the filter density at time tk with respect to L X t k can then be computed with Bayes’ theorem:

f X t k | Y 1 : k , T 1 : M ( x t k | y 1 : k , t 1 : M ) : = f X t k , Y 1 : k | T 1 : M ( x t k , y 1 : k | t 1 : M ) f Y 1 : k | T 1 : M ( y 1 : k | t 1 : M ) (4)

with

f Y 1 : k | T 1 : M ( y 1 : k | t 1 : M ) : = X t 0 : k f X t 0 : k , Y 1 : k | T 1 : M x t 0 : k , y 1 : k | t 1 : M d L X t 0 : k x t 0 : k . (5)

For general (non-linear) models, the practical computation of the filter density is very difficult. Nevertheless, the particle filter computes a Monte Carlo approximation using the fact that the filter densities f X t k | Y 1 : k , T 1 : M can be computed recursively. This is done in two steps. We consider the filter distribution at time tk-1 given by the probabilities

P X t k - 1 B | Y 1 : k - 1 = y 1 : k - 1 , T 1 : M = t 1 : M = B f X t k - 1 | Y 1 : k - 1 , T 1 : M x t k - 1 | y 1 : k - 1 , t 1 : M d L X t k - 1 x t k - 1 (6)

for each set B B X t k - 1 . We then get first the prediction distribution, i.e. the distribution of X t k given the data Y1:k-1 (and t1:M), by use of the kernel K t k - 1 , t k :

P ( X t k B | Y 1 : k - 1 = y 1 : k - 1 , T 1 : M = t 1 : M ) = B X t k - 1 f X t k - 1 | Y 1 : k - 1 , T 1 : M x t k - 1 | y 1 : k - 1 , t 1 : M × d L X t k - 1 x t k - 1 K t k - 1 , t k x t k - 1 , d x t k (7)

for each set B B X t k . Then we use Bayes’ theorem to get the filter distribution at time tk:

P ( X t k B | Y 1 : k = y 1 : k , T 1 : M = t 1 : M ) = B g k ( y k | x t k , t k ) f Y k | Y 1 : k - 1 , T 1 : M ( y k | y 1 : k - 1 , t 1 : M ) · X t k - 1 f X t k - 1 | Y 1 : k - 1 , T 1 : M ( x t k - 1 | y 1 : k - 1 , t 1 : M ) × d L X t k - 1 x t k - 1 K t k - 1 , t k x t k - 1 , d x t k (8)

for each set B B X t k , with normalizing constant

f Y k | Y 1 : k - 1 , T 1 : M ( y k | y 1 : k - 1 , t 1 : M ) : = X t k g k ( y k | x t k , t k ) · X t k - 1 f X t k - 1 | Y 1 : k - 1 , T 1 : M x t k - 1 | y 1 : k - 1 , t 1 : M × d L X t k - 1 x t k - 1 K t k - 1 , t k x t k - 1 , d x t k . (9)

Importance sampling

Another ingredient for the particle filter is sequential importance sampling. We assume that a second Markov chain X ~ t 0 : M on the same state space is given with pushforward measures L X ~ t j and kernels K ~ t j - 1 , t j x t j - 1 , d x t j for j=1,…,M. We assume that for each x t j - 1 X t j - 1 , the measure K t j - 1 , t j x t j - 1 , · is absolutely continuous with respect to the measure K ~ t j - 1 , t j x t j - 1 , · . It follows that the Radon-Nikodym derivative (written as conditional density)

ϱ t j | t j - 1 ( x t j | x t j - 1 ) : = K t j - 1 , t j ( x t j - 1 , d x t j ) K ~ t j - 1 , t j ( x t j - 1 , d x t j )

exists. We further assume that the pushforward measure L X t 0 under X t 0 is absolutely continuous with respect to the corresponding pushforward measure L X ~ t 0 under X ~ t 0 with Radon-Nikodym derivative

ϱ t 0 ( x t 0 ) : = d L X t 0 ( x t 0 ) d L X ~ t 0 ( x t 0 ) .

For sequential importance sampling, we need to be able to sample from the initial measure L X ~ t 0 and from the kernels

K ~ t j - 1 , t j x t j - 1 , ·

for each x t j - 1 X t j - 1 , and to compute ϱ t 0 ( x t 0 ) as well as ϱ t j | t j - 1 ( x t j | x t j - 1 ) pointwise.

Using

K t k - 1 , t k x t k - 1 , d x t k = ϱ t k | t k - 1 x t k | x t k - 1 K ~ t k - 1 , t k x t k - 1 , d x t k ,

we can then write the recursive formula (8) for the filter distribution at time tk as

P ( X t k B | Y 1 : k = y 1 : k , T 1 : M = t 1 : M ) = B g k ( y k | x t k , t k ) f Y k | Y 1 : k - 1 , T 1 : M ( y k | y 1 : k - 1 , t 1 : M ) · X t k - 1 f X t k - 1 | Y 1 : k - 1 , T 1 : M x t k - 1 | y 1 : k - 1 , t 1 : M ϱ t k | t k - 1 x t k | x t k - 1 × d L X t k - 1 x t k - 1 K ~ t k - 1 , t k ( x t k - 1 , d x t k ) (10)

for each B B X t k . The direct computation of the normalizing constants f Y k | Y 1 : k - 1 , T 1 : M ( y k | y 1 : k - 1 , t 1 : M ) (while Y1:M is assumed to be fixed) is not necessary. Sequential importance sampling is performed as follows. Draw a number N of realizations x t 0 i from L X ~ t 0 and compute the corresponding unnormalized weights

w t 0 i : = ϱ t 0 x t 0 i for all i = 1 , , N .

Then, for all k=1,…,M, sample realizations x t k i from the kernel K ~ t k - 1 , t k ( x t k - 1 i , d x t k ) for each i=1,…,N and compute the unnormalized weights

w t k i : = ϱ t k | t k - 1 x t k i | x t k - 1 i g k y k | x t k i , t k w t k - 1 i for all i = 1 , , N .

For suitable integrable functions h (e.g. fulfilling some mild restrictions on how fast h may increase with x, see [23] for details), the expectation of h with respect to the filter density conditioned on the observations Y1:k=y1:k, given by

E h X t k | Y 1 : k = y 1 : k , T 1 : M = t 1 : M : = E f X t k | Y 1 : k = y 1 : k , T 1 : M = t 1 : M ( · | y 1 : k , t 1 : M ) h ( X t k ) = f X t k | Y 1 : k , T 1 : M ( x t k | y 1 : k , t 1 : M ) h ( x t k ) d L X t k x t k , (11)

can then be approximated by

E t k , N h X t k | Y 1 : k = y 1 : k , T 1 : M = t 1 : M : = i = 1 N w t k i h x t k i i = 1 N w t k i (12)

where N is the number of particles. In fact, it can be shown that as N approaches infinity, these empirical expectations converge to the filter expectations:

lim N E t k , N h ( X t k ) | Y 1 : k = y 1 : k , T 1 : M = t 1 : M = E h X t k | Y 1 : k = y 1 : k , T 1 : M = t 1 : M . (13)

Note that if we can sample from the Markov kernels of x t j , we can choose X ~ t j = X t j (at least in law), whence ϱ t 0 ( x t 0 ) 1 and ϱ t j | t j - 1 ( x t j | x t j - 1 ) 1 . This is a standard choice, but in terms of efficiency of the particle filter algorithm not always the best one. On the other hand, finding a suitable Markov chain X ~ t 0 : M different from X t 0 : M is not an easy task.

Resampling

If the number N of samples through time is fixed, the samples obtained by sequential importance sampling quickly degenerate since most of the normalized weights decrease rapidly towards 0. The degree of degeneracy is often measured by an estimate of the so-called effective sample size (ESS). This estimate at time t is given by

n ESS : = 1 i = 1 N w ~ t i 2 (14)

where

w ~ t i : = w t i ν = 1 N w t ν (15)

are the normalized weights. It obtains its maximal value N if all weights are equal, and it approaches 1 if the variance of the weights and thus the degree of degeneracy increases. To avoid this degeneration of the samples, a resampling step needs to be done when the ESS drops below a threshold NThreshold (which is usually chosen to be N/2).

Resampling at some time s is based on given non-negative (unnormalized) selection weights v s i for each particle index i: One repeatedly selects particles with probabilities p i given by the normalized selection weights

p i : = v s i ν = 1 N v s ν . (16)

This is multinomial resampling. There exist procedures where each single particle is still selected with probability p i , but with reduced overall variance, for instance stratified resampling or systematic resampling which should be preferred [24,25]. In any case, resampling defines a selection function ι:II on the index set I: ={1,…,N}. Resampling is then done by replacing the state samples x s i i = 1 , , N by the selected state samples x s ι ( i ) i = 1 , , N . Since before selection the probability that the particle i will be chosen is p i for each draw, the expected number of times that particle i has been chosen after N draws is N p ι ( i ) . To correct for the introduced bias, the normalized weight w ~ s i for each selected particle i needs then to be corrected by replacing it by the weight

w ~ s ι ( i ) N p ι ( i ) / ν = 1 N w ~ s ι ( ν ) N p ι ( ν ) = w s ι ( i ) v s ι ( i ) / ν = 1 N w s ι ( ν ) v s ι ( ν ) (17)

(using (16)). The necessary correction is therefore achieved if the unnormalized weights ( w s i ) i = 1 , , N are replaced by the corrected unnormalized weights ( w s ι ( i ) / v s ι ( i ) ) i = 1 , , N .

Note that in the original particle filter, the selection weights v s i at time s are chosen to be the particle weights (before the replacement), i.e.

v s i = w s i for i = 1 , , N ,

such that after the resampling step the unnormalized weights are all equal to 1. Nevertheless, in general their choice is free and may be based on the observations (which is used in the so-called auxiliary particle filter [26]).

Particle filter algorithm

The particle filter computes the state realizations and weights recursively through time. In its standard form, the particle filter can be stated as in algorithm 1.

Note that if one chooses X ~ [ t 0 , ) = X [ t 0 , ) (in law), then ϱ t k | t k - 1 ( x t k i | x t k - 1 i ) 1 and the update of the weights simplifies to

w t k i = g k y k | x t k i , t k w t k - 1 i .

Data Likelihood

Model validation or discrimination is generally based on the data likelihood

Z t k ( t 1 : M ) : = f Y 1 : k | T 1 : M ( y 1 : k | t 1 : M ) = X t 0 : k f X t 0 : k , Y 1 : k | T 1 : M x t 0 : k , y 1 : k | t 1 : M d L X t 0 : k x t 0 : k = E f X t 0 : k , Y 1 : k | T 1 : M ( · , y 1 : k | t 1 : M ) (18)

Algorithm 1 Standard particle filter

for given observations Y1:k. Without resampling, the data likelihood could be approximated by the empirical mean of the unnormalized weights, i.e. by

Z ̂ t k ( t 1 : M ) : = 1 N i = 1 N w t k i (19)

because this is the empirical estimate for the above expectation. After a resampling step, this is not valid any longer. Nevertheless, in any case, the data likelihood can be computed recursively by the following estimate of the ratio Z t k ( t 1 : M ) / Z t k - 1 ( t 1 : M ) :

Z t k ( t 1 : M ) ̂ Z t k - 1 ( t 1 : M ) : = i = 1 N ϱ t k | t k - 1 x t k i | x t k - 1 i g k y k | x t k i , t k w t k - 1 i i = 1 N w t k - 1 i , (20)

with initial estimate Z ̂ t 0 ( t 1 : M ) = 1 (see e.g. [27]).

MTU particle filter: Uncertain measurement times

We now assume that each observation time Tj is a realization of a random variable Tj. Its distribution is expressed via densities γj with respect to the Lebesgue measure λ [ t 0 , ) . The observation times Tj themselves are not observed.

Full model

The full model in this case will include complete continuous state paths, since the observation times are now distributed over the complete time axis [t0,), and the observations may potentially depend on every state x t j for tj∈[t0,). Consider therefore the joint density of the variables X [ t 0 , ) , Y1:M and t1:M, with respect to the product measure L X [ t 0 , ) j = 1 M μ Y j j = 1 M λ [ t 0 , ) :

f X [ t 0 , ) , Y 1 : M , T 1 : M x [ t 0 , ) , y 1 : M , t 1 : M = j = 1 M g j ( y j | x t j , t j ) j = 1 M γ j ( t j ) . (21)

Filter model

The filter at a given time tt0 is again based on a reduced model. This model is given by the joint density of the following variables: x [ t 0 , t ] , denoting the state paths until time t; further only those variables yj for which Tjt; and finally t1:M. This density is given with respect to the product measure L X [ t 0 , t ] j = 1 M μ Y j j = 1 M λ [ t 0 , ) by:

f ̂ t X [ t 0 , t ] , Y 1 : M , T 1 : M ( x [ t 0 , t ] , y 1 : M , t 1 : M ) = j = 1 t j t M g j ( y j | x t j , t j ) j = 1 M γ j ( t j ) . (22)

Note that we cannot use the simple notation of the standard case where for filtering only the first k observations are taken into consideration at time tk, since neither the observations are ordered in time nor the times Tj are fixed in advance. For this reason we have to include all measurements Y1:M also into the filter model. Note that even though we use the complete data Y1:M=y1:M in the notation, only those yj have to be known at time t for which Tjt holds. To avoid confusion, we mark all densities connected to the filter model at time t by a hat superscript (and by the index t).

We will now derive formulas for the filter density. Since we assume that the observation times t1:M are not observed, we use marginalization to get the joint density for x [ t 0 , t ] and Y1:M only, which is

f ̂ t X [ t 0 , t ] , Y 1 : M ( x [ t 0 , t ] , y 1 : M ) = t 0 t 0 f ̂ t X [ t 0 , t ] , Y 1 : M , T 1 : M ( x [ t 0 , t ] , y 1 : M , t 1 : M ) d t 1 d t M = t 0 t 0 j = 1 t j t M g j ( y j | x t j , t j ) j = 1 M γ j ( t j ) d t 1 d t M (23)

with respect to the product measure L X [ t 0 , t ] j = 1 M μ Y j . We will further simplify this density. If we define

g ̄ j , t y j | x t j , t j : = g j ( y j | x t j , t j ) if t j t 1 otherwise ,

then

j = 1 t j t M g j y j | x t j , t j = j = 1 M g ̄ j , t y j | x t j , t j

and further

f ̂ t X [ t 0 , t ] , Y 1 : M ( x [ t 0 , t ] , y 1 : M ) = t 0 t 0 j = 1 t j t M g j y j | x t j , t j j = 1 M γ j ( t j ) d t 1 d t M = t 0 t 0 j = 1 M g ̄ j , t y j | x t j , t j γ j ( t j ) d t 1 d t M = j = 1 M t 0 g ̄ j , t ( y j | x t j , t j ) γ j ( t j ) d t j , (24)

where the last step is possible because the factor indexed by j does not depend on t j for jj. For each j, we can split the integration by Tj at the time point t into two parts and get

t 0 g ̄ j , t y j | x t j , t j γ j ( t j ) = t 0 t g j y j | x t j , t j γ j ( t j ) d t j + t γ j ( t j ) d t j = 1 + t 0 t g j ( y j | x t j , t j ) - 1 γ j ( t j ) d t j

where the last step follows from the fact that γj is a probability density and therefore

t 0 t γ j ( t j ) d t j + t γ j ( t j ) d t j = t 0 γ j ( t j ) d t j = 1

holds. Inserting this into (24), we get

f ̂ t X [ t 0 , t ] , Y 1 : M x [ t 0 , t ] , y 1 : M = j = 1 M 1 + t 0 t g j ( y j | x t j , t j ) - 1 γ j ( t j ) d t j .

With a further marginalization, we get the joint density of Xt and Y1:M for the filter model,

f ̂ t X t , Y 1 : M ( x t , y 1 : M ) : = x ~ [ t 0 , t ] X [ t 0 , t ] : x ~ t = x t f ̂ t X [ t 0 , t ] , Y 1 : M ( x ~ [ t 0 , t ] , y 1 : M ) d L X [ t 0 , t ] ( x ~ [ t 0 , t ] ) = x ~ [ t 0 , t ] X [ t 0 , t ] : x ~ t = x t j = 1 M 1 + t 0 t g j ( y j | x ~ t j , t j ) - 1 γ j ( t j ) d t j d L X [ t 0 , t ] ( x ~ [ t 0 , t ] ) (25)

which is with respect to the product measure L X t j = 1 M μ Y j . From this density, we finally can compute the filter density with respect to L X t by applying Bayes’ theorem:

f ̂ t X t | Y 1 : M ( x t | y 1 : M ) = f ̂ t X t , Y 1 : M ( x t , y 1 : M ) f ̂ t Y 1 : M ( y 1 : M ) (26)

where

f ̂ t Y 1 : M ( y 1 : M ) = X t f ̂ t X t , Y 1 : M ( x ~ t , y 1 : M ) d L X t ( x ~ t ) (27)

is the data likelihood with respect to the measure j = 1 M μ Y j .

Effective computation of the filter distributions

In the following paragraph, we will show how the densities of the filter distributions given by (26) can be effectively computed. This is the basis for the formulation of our MTU particle filter method.

Let the observations y 1 : M Y 1 : M with f Y 1 : M ( y 1 : M ) > 0 be given. For each time t∈ [t0,) and for each j∈{1,…,M}, we define random variables

W j , t : Ω R 0 and W t : Ω R 0

by the following system of ODEs

d W 1 , t ( ω ) = g 1 ( y 1 | X t ( ω ) , t ) - 1 γ 1 ( t ) d t d W M , t ( ω ) = g M ( y M | X t ( ω ) , t ) - 1 γ M ( t ) d t (28)

for each ωΩ with initial values

W 1 , t 0 ( ω ) = = W M , t 0 ( ω ) = 1 , (29)

and by

W t = j = 1 M W j , t . (30)

We will show that for each set A in the σ-algebra generated by the variable Xt, it holds that

X t ( A ) f ̂ t X t | Y 1 : M ( x t | y 1 : M ) d L X t ( x t ) = A W t ( ω ) d P ( ω ) Ω W t ( ω ) d P ( ω ) (31)

where f ̂ t X t | Y 1 : M is the filter density. That means we can use the processes Wj,t and Wt to compute the filter distributions through time. From this, it follows immediately that we can also compute filter expectations. Indeed, for any real-valued measurable function h on X such that E[|h(Xt)|]<, it holds that the expectation of h(Xt) given Y1:M=y1:M with respect to the filtered state Xt defined by

E ̂ t [ h ( X t ) | Y 1 : M = y 1 : M ] : = E f ̂ t X t | Y 1 : M ( · | y 1 : M ) [ h ( X t ) ] = X t f ̂ t X t | Y 1 : M ( x t | y 1 : M ) h ( x t ) d L X t ( x t ) , (32)

is given by the following equation:

E ̂ t h ( X t ) | Y 1 : M = y 1 : M = Ω W t ( ω ) h ( X t ( ω ) ) d P ( ω ) Ω W t ( ω ) d P ( ω ) . (33)

To show our assertion, we consider the processes Wj,t for j=1,…,M. According to (28) and (29), each Wj,t is defined as

W j , t = 1 + t 0 t g j ( y j | X t j , t j ) - 1 γ j ( t j ) d t j , (34)

so, with (30),

W t = j = 1 M W j , t = j = 1 M 1 + t 0 t g j ( y j | X t j , t j ) - 1 γ j ( t j ) d t j (35)

holds. Thus, to prove (31), we have to show that for each set A from the σ-algebra generated by Xt,

A W t ( ω ) d P ( ω ) Ω W t ( ω ) d P ( ω ) = X t ( A ) x ~ [ t 0 , t ] X [ t 0 , t ] : x ~ t = x t j = 1 M 1 + t 0 t g j ( y j | x ~ t j , t j ) - 1 γ j ( t j ) d t j × d L X [ t 0 , t ] ( x ~ [ t 0 , t ] ) d L X t ( x t ) ) / f ̂ t Y 1 : M ( y 1 : M )

holds (see (25) and (26)). It is enough to show the equality for the numerator, i.e.

A W t ( ω ) d P ( ω ) = X t ( A ) x ~ [ t 0 , t ] X [ t 0 , t ] : x ~ t = x t j = 1 M 1 + t 0 t g j ( y j | x ~ t j , t j ) - 1 γ j ( t j ) d t j × d L X [ t 0 , t ] ( x ~ [ t 0 , t ] ) d L X t ( x t )

since the equality of the denominator follows then immediately from the special case A=Ω and from the fact that

f ̂ t Y 1 : M ( y 1 : M ) = X t x ~ [ t 0 , t ] X [ t 0 , t ] : x ~ t = x t j = 1 M 1 + t 0 t g j ( y j | x ~ t j , t j ) - 1 γ j ( t j ) d t j × d L X [ t 0 , t ] x ~ [ t 0 , t ] d L X t ( x t ) .

Using the variable transformation ( X [ t 0 , t ] ( ω ) , X t ( ω ) ) = ( x ~ [ t 0 , t ] , x t ) , we get

A W t ( ω ) d P ( ω ) = A j = 1 M 1 + t 0 t g j ( y j | X t j ( ω ) , t j ) - 1 γ j ( t j ) d t j d P ( ω ) = X t ( A ) x ~ [ t 0 , t ] X [ t 0 , t ] : x ~ t = x t j = 1 M 1 + t 0 t g j ( y j | x ~ t j , t j ) - 1 γ j ( t j ) d t j × d L X [ t 0 , t ] ( x ~ [ t 0 , t ] ) d L X t ( x t ) .

This is what we wanted to show.

Weights

Since for each t and for each j=1,…,M the random variables Wj,t and Wt depend only on the process x [ t 0 , t ] until time t, we can define functions w j , t : X [ t 0 , t ] ( Ω ) R 0 and w t : X [ t 0 , t ] ( Ω ) R 0 by setting for each x [ t 0 , t ] X [ t 0 , t ] ( Ω ) X [ t 0 , t ] :

w j , t ( x [ t 0 , t ] ) : = W j , t ( ω ) and w t ( x [ t 0 , t ] ) : = W t ( ω ) for some ω Ω with X [ t 0 , t ] ( ω ) = x [ t 0 , t ] .

It follows from (34) that

w j , t ( x [ t 0 , t ] ) = 1 + t 0 t g j ( y j | x t j , t j ) - 1 γ j ( t j ) d t j (36)

for each j and from (35) that

w t ( x [ t 0 , t ] ) = j = 1 M w j , t ( x [ t 0 , t ] ) . (37)

The values of Wt will serve as weights in the MTU particle filter. We will call Wj,t the partial weights. Since in each discretization scheme which is applied to solve the integral in the formula (36) for Wj,t the integrand has to be evaluated, we may run into practical problems if we use it as it is written in the formula. If the density g j ( y j | x t j , t j ) evaluates to a very small value and if we work with fixed-precision numbers, the subtraction of 1 will result in a value which may be practically equal to -1. If this error accumulates over time, we may end up with wrong values for w j , t ( x [ t 0 , t ] ) . To reduce the computational error, the integral could be split up in the following way:

w j , t ( x [ t 0 , t ] ) = 1 - t 0 t γ j ( t j ) d t j + t 0 t g j ( y j | x t j , t j ) γ j ( t j ) d t j = 1 - γ ̄ j , t + w ̄ j , t ( x [ t 0 , t ] ) (38)

where the cumulative distribution function

γ ̄ j , t : = t 0 t γ j ( t j ) d t j (39)

is independent of x [ t 0 , t ] and where the part

w ̄ j , t ( x [ t 0 , t ] ) : = t 0 t g j ( y j | x t j , t j ) γ j ( t j ) d t j (40)

depends on the path x [ t 0 , t ] . It is even more convenient to compute γ ̄ j , t by evaluating the antiderivative of γj, if it is computationally available.

Note that the definition of the filter distribution is dependent on the reference measure μ Y j . A suitable change of this measure may help to further increase the efficiency of the algorithm. This issue still has to be explored.

Resampling

Special attention is needed for the computation of the weights after resampling steps have been applied. As mentioned earlier, resampling at time s is done by randomly generating a selection function ι:II (with index set I={1,…,N}) based on given non-negative (unnormalized) selection weights v s i for each particle index i. The state samples x s i i = 1 , , N have then to be replaced by the selected state samples x s ι ( i ) i = 1 , , N , and the unnormalized weights w s i i = 1 , , N by the corrected weights w s ι ( i ) / v s ι ( i ) i = 1 , , N . We assume that resampling steps at times s1,…,s with t0s1⋯<st have occurred, states and weights have been replaced at these times, and within this paragraph, we denote them by x t s 1 , , s ; i and w t s 1 , , s ( x [ t 0 : t ] s 1 , , s ; i ) , respectively, for each particle i. By definition, we have at t=s

x s s 1 , , s ; i = x s s 1 , , s - 1 ; ι ( i ) ,

and

w s s 1 , , s x [ t 0 : s ] s 1 , , s ; i = w s s 1 , , s - 1 x [ t 0 : s ] s 1 , , s ; i v s ι ( i ) .

For each time ts and for each particle i, the corrected weights are then recursively given by

w t s 1 , , s x [ t 0 : t ] s 1 , , s ; i = w t s 1 , , s - 1 x [ t 0 : t ] s 1 , , s ; i v s ι ( i ) = w t x [ t 0 : t ] s 1 , , s ; i λ = 1 v s λ ι λ : ( i ) (41)

with

ι λ : ( i ) = ι λ ι ( i ) .

Since the process Wt computes the uncorrected weights w t ( x [ t 0 : t ] s 1 , , s ; i ) , we have to correct the weights explicitly in the algorithm by dividing them by the cumulative product

v ̄ t s 1 , , s ; i : = λ = 1 v