Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC 27695, USA

Abstract

Background

Experimental design approaches for biological systems are needed to help conserve the limited resources that are allocated for performing experiments. The assumptions used when assigning probability density functions to characterize uncertainty in biological systems are unwarranted when only a small number of measurements can be obtained. In these situations, the uncertainty in biological systems is more appropriately characterized in a bounded-error context. Additionally, effort must be made to improve the connection between modelers and experimentalists by relating design metrics to biologically relevant information. Bounded-error experimental design approaches that can assess the impact of additional measurements on model uncertainty are needed to identify the most appropriate balance between the collection of data and the availability of resources.

Results

In this work we develop a bounded-error experimental design framework for nonlinear continuous-time systems when few data measurements are available. This approach leverages many of the recent advances in bounded-error parameter and state estimation methods that use interval analysis to generate parameter sets and state bounds consistent with uncertain data measurements. We devise a novel approach using set-based uncertainty propagation to estimate measurement ranges at candidate time points. We then use these estimated measurements at the candidate time points to evaluate which candidate measurements reduce model uncertainty the most. A method for quickly combining multiple candidate time points is presented, which makes it possible to determine the effect of adding multiple measurements. Biologically relevant metrics are developed and used to predict when new data measurements should be acquired, which system components should be measured and how many additional measurements should be obtained.

Conclusions

The practicability of our approach is illustrated with a case study. This study shows that our approach is able to 1) identify candidate measurement time points that maximize information corresponding to biologically relevant metrics and 2) determine the number at which additional measurements begin to provide insignificant information. This framework can be used to balance the availability of resources with the addition of one or more measurement time points to improve the predictability of resulting models.

Background

Costly materials, limited resources, and lengthy experiments are constraints that hinder our ability to acquire quantifiable measurements from biological systems. Experimental design approaches are computational techniques for extracting the most useful information from experiments yet to be performed.

The development and application of experimental design has a rich history spread across a wide range of fields. An excellent review article by Pronzato has condensed the underlying concepts behind the most widely used techniques of experimental design for nonparametric and parametric models.

Typically, parameter estimation problems begin by claiming that observations differ from the model output **g**(**x**, **θ***) by an error, where **x**_{i} are the model states at **k** different times or experimental conditions.

Experimental design aims to maximize information, or minimize uncertainty, about unknown model parameters by exploring experimental configurations such as the sampling times where new measurements should be acquired, the desired number of measurements to add, and which system components should be measured. The criteria used to evaluate the information of a design are derived from scalar functions of the Fisher Information Matrix (FIM). A-optimality minimizes trace(FIM^{-1}), or equivalently minimizes the sum of squared lengths of the axes of asymptotic confidence ellipsoids for **θ**. E-optimality refers to designs where the longest axis of asymptotic confidence ellipsoids for **θ** is minimized.

Although there is a large body of work dedicated to experimental design using statistical methods

A key aspect of experimental design for bounded-error models is how to characterize the set of parameter values that are consistent with all data measurements. Initial methods for constructing this set use conservative bounding approaches based on ellipsoids to characterize the parameter sets. More precise parameter set estimations can be obtained using interval analysis.

In this work, we develop an experimental design framework that utilizes interval analysis to generate the set of parameters and state bounds consistent with all data measurements. This approach leverages many of the recent advances in bounded-error parameter and state estimation methods.

Methods

In this section, we define a specific experimental design problem and outline how our framework is used to determine the number of additional measurements that are warranted and at what time points these measurements should be taken. The relevant interval arithmetic algorithms for parameter and state estimation used throughout this process are briefly presented. We show how to select a set of candidate time points based on the estimated state bounds of a proposed model given initial data measurements and provide a method to estimate the corresponding candidate measurement bounds. Techniques for determining the effect of adding multiple candidate time points on parameter and state estimations are discussed. We define several biologically relevant metrics, which are scalar functions of the parameter and state estimations after incorporating estimated candidate time point measurements. These metrics can convey information such as the activity of specific enzyme kinetic parameters or bounding values for the estimation of unmeasured component concentrations.

Problem statement

Consider the following ordinary differential equation (ODE) model of a biological system:

where **x** ∈ ℝ^{n} is the state vector, **y** ∈ ℝ^{m} is the measurement vector, and **θ** is the vector of model parameters.

We use the method outlined in the left half of Figure 1.

**Experimental design method**. This figure outlines a block diagram of the experimental design approach. The process is outlined in four major steps (shown on the left). A novel approach for estimating measurement bounds at candidate time points is implemented.

Bounded estimation

Bounded estimation methods use interval analysis to computationally guarantee a valid bounded-error solution to the system of ODEs by employing interval box enclosures that bound the states during integration steps. Methods have been introduced in the literature to address overestimation due to the wrapping effect.

Uncertainty propagation

Interval analysis is a form of guaranteed computing and can be used to generate solutions to ODEs through the use of interval boxes and inclusion functions. An inclusion function such as **g** maps a state interval box [**x**] to an enclosure of the corresponding image in the data space, **g**([**x**]). Here the interval box [**x**] represents the Cartesian product of n intervals, [**x**] = [x_{1}] × [x_{2}] × ⋯ × [x_{n}], where each [x_{i}] is a closed interval.
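
As a minimal sketch of what an inclusion function does, the Python below evaluates a toy function over an interval box. The `Interval` class, the function g(x_{1}, x_{2}) = x_{1} + x_{1}x_{2} and the box values are all hypothetical and chosen only for illustration; the arithmetic uses plain floats without outward rounding, so it is not a guaranteed implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] (illustrative; no outward rounding)."""
    lo: float
    hi: float

    def __add__(self, other: "Interval") -> "Interval":
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other: "Interval") -> "Interval":
        # The product range is spanned by the four endpoint products.
        p = (self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi)
        return Interval(min(p), max(p))

    def contains(self, value: float) -> bool:
        return self.lo <= value <= self.hi


def g_point(x1: float, x2: float) -> float:
    """Hypothetical point-valued function g(x1, x2) = x1 + x1*x2."""
    return x1 + x1 * x2


def g_incl(box):
    """Natural inclusion function: the same expression on intervals."""
    x1, x2 = box
    return x1 + x1 * x2


# Interval box [x] = [x1] x [x2] (hypothetical values).
box = (Interval(1.0, 2.0), Interval(-1.0, 3.0))
image = g_incl(box)
print(image)  # Interval(lo=-1.0, hi=8.0)
```

The exact range of g over this box is [0, 8], so the computed enclosure [-1, 8] is valid but pessimistic. The excess comes from x_{1} appearing twice in the expression, which is one source of the overestimation that interval methods have to manage.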

Computing the solution of ODEs for _{0 }≤ _{N }
^{th}-order Taylor expansion of the model ODEs are inflated by 1 ± α. Evaluation of the Taylor expansion is performed using the Extended Mean Value (EMV) algorithm proposed by Rihm **x**], at a time where data measurements,

Set inversion

SIVIA (Set Inversion Via Interval Analysis) is able to determine solution sets for unknown quantities **u** from a functional relationship **q**(**u**) = [**y**]. An initial search box for **u** is recursively explored using SIVIA to determine a guaranteed enclosure of the solution space. The resulting solution space is composed of feasible and indeterminate boxes. These boxes, [**u**], are determined from the following relations: if **q**([**u**]) ⊆ [**y**] then [**u**] is feasible; if **q**([**u**]) ∩ [**y**] = ∅ then [**u**] is infeasible; otherwise [**u**] is indeterminate.
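
The classification and bisection loop can be sketched in a few lines of Python. This is an illustrative toy, not the implementation used in this work: the relationship q(u) = u_{1}^{2} + u_{2}^{2}, the target interval [1, 4], the search box and the tolerance `eps` are hypothetical, and intervals are plain `(lo, hi)` tuples without outward rounding.

```python
def sqr(iv):
    """Interval square of (lo, hi)."""
    lo, hi = iv
    low = 0.0 if lo <= 0.0 <= hi else min(lo * lo, hi * hi)
    return (low, max(lo * lo, hi * hi))


def q_incl(box):
    """Inclusion function for the hypothetical q(u) = u1^2 + u2^2."""
    (a_lo, a_hi), (b_lo, b_hi) = sqr(box[0]), sqr(box[1])
    return (a_lo + b_lo, a_hi + b_hi)


def sivia(q, box, y, eps):
    """Return (feasible, indeterminate) boxes for q(u) in [y]."""
    feasible, indeterminate = [], []
    stack = [box]
    while stack:
        u = stack.pop()
        img_lo, img_hi = q(u)
        if y[0] <= img_lo and img_hi <= y[1]:
            feasible.append(u)            # q([u]) is inside [y]
        elif img_hi < y[0] or img_lo > y[1]:
            continue                      # empty intersection: infeasible
        else:
            widths = [hi - lo for lo, hi in u]
            k = widths.index(max(widths))
            if widths[k] <= eps:
                indeterminate.append(u)   # too small to bisect further
            else:
                lo, hi = u[k]
                mid = 0.5 * (lo + hi)
                stack.append(u[:k] + ((lo, mid),) + u[k + 1:])
                stack.append(u[:k] + ((mid, hi),) + u[k + 1:])
    return feasible, indeterminate


feasible, indeterminate = sivia(
    q_incl, ((-3.0, 3.0), (-3.0, 3.0)), (1.0, 4.0), eps=0.1)
print(len(feasible), len(indeterminate))
```

The feasible boxes form an inner approximation of the solution set and the union of feasible and indeterminate boxes forms an outer one; decreasing `eps` tightens both at the cost of more boxes.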

Parameter and state estimation

The methods presented in this paper leverage the works of Jaulin for state estimation

Estimating candidate measurements

The measurements at a given time t_{j} are characterized by an upper and lower bound, expressed through a center point **C**_{j} and a range **R**_{j} such that **C**_{j} − **R**_{j}/2 ≤ **y**(t_{j}) ≤ **C**_{j} + **R**_{j}/2. We estimate the center points and ranges of candidate measurements using the bounds of adjacent data measurements and the estimated bounds on component concentration trajectories generated by the EMV algorithm. Once estimated, each candidate measurement (t_{j}, **C**_{j}, **R**_{j}) is added to the original measurement set.

To simplify notation in this subsection, we assume that one or more of the states can be directly measured (**y **= **x**). This will allow for direct comparison between estimated state bounds and measurement values. This is a common assumption made for biological systems **y **via SIVIA.

Time point and range estimation

For a given state, time points for candidate measurements are chosen by first identifying all times between adjacent measurement times t_{i} and t_{i+1} whose estimated range (generated by the EMV algorithm) is greater than or equal to both of the measurement uncertainties at times t_{i} and t_{i+1}. This presents a worst case scenario because we are selecting candidate time points with the greatest possible uncertainty. Alternative time points can be selected based on practical experimental limitations or first principles knowledge. The set of time intervals,

where _{p }

where _{j }
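
One possible reading of the selection rule is sketched below in Python. The function name `candidate_times`, the `(t, lower, upper)` tuple layout and all numerical values are hypothetical; the estimated bounds would come from the uncertainty propagation step rather than being typed in by hand.

```python
def candidate_times(measurements, estimated_bounds):
    """Select grid times whose estimated state range is at least as wide
    as the measurement uncertainty at both neighbouring measurements.

    measurements:     list of (t, lower, upper), sorted by t
    estimated_bounds: list of (t, lower, upper) on a candidate time grid
    """
    selected = []
    for (t_a, lo_a, hi_a), (t_b, lo_b, hi_b) in zip(measurements,
                                                    measurements[1:]):
        # "Greater than or equal to both" is the same as >= the larger one.
        threshold = max(hi_a - lo_a, hi_b - lo_b)
        for t, lo, hi in estimated_bounds:
            if t_a < t < t_b and (hi - lo) >= threshold:
                selected.append(t)
    return selected


# Hypothetical data: two measurements and three grid points between them.
meas = [(0.0, 49.0, 51.0), (2.0, 40.0, 44.0)]
grid = [(0.5, 48.0, 51.0), (1.0, 44.0, 49.0), (1.5, 41.0, 45.5)]
print(candidate_times(meas, grid))  # [1.0, 1.5]
```

Here the measurement uncertainties are 2 and 4, so only grid times with an estimated range of at least 4 are kept.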

Center point selection

Center point estimation is conservatively implemented to reduce the chance of erroneously eliminating valid kinetic parameters and component concentrations. We introduce a novel approach for estimating the corresponding center point of each candidate time point. This approach estimates the position of the center point _{j }
_{j }
_{j}
_{j }
_{4 }is bounded between the range _{4 }= 1.5, the resulting shifted candidate measurements at time _{4 }would have bounds [3,4.5], [3.75,5.25] and [4.5,6]. Second, bounded parameter estimation is performed for each of the _{p }
_{j}
_{j}
_{j }
_{j }
_{j}
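
The shifting step in the worked example can be sketched in Python as follows. The helper names, the even spacing of the shifts and the state bound [3, 6] at the candidate time (inferred from the three intervals quoted in the example) are assumptions made for illustration, not the authors' implementation.

```python
def bounds_from_center(center, meas_range):
    """Measurement bounds [C - R/2, C + R/2] for center C and range R."""
    return (center - meas_range / 2.0, center + meas_range / 2.0)


def shifted_measurements(state_lo, state_hi, meas_range, n_shifts):
    """Slide a measurement of width meas_range across the estimated state
    bound [state_lo, state_hi] in n_shifts evenly spaced positions."""
    span = state_hi - state_lo
    if meas_range > span:
        raise ValueError("measurement range exceeds the state bound")
    if n_shifts == 1:
        centers = [state_lo + span / 2.0]
    else:
        step = (span - meas_range) / (n_shifts - 1)
        centers = [state_lo + meas_range / 2.0 + k * step
                   for k in range(n_shifts)]
    return [bounds_from_center(c, meas_range) for c in centers]


# Assumed state bound [3, 6], range 1.5, three shifted positions.
print(shifted_measurements(3.0, 6.0, 1.5, 3))
# [(3.0, 4.5), (3.75, 5.25), (4.5, 6.0)]
```

Each shifted interval is then treated as a trial measurement for a separate bounded parameter estimation.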

Combining measurements

The ability to investigate the effects of adding multiple measurements is often desirable when designing biological experiments. Employing a brute-force method for assessing the impact of all combinations of candidate measurements at _{c }

The estimated state bounds for a combination of candidate time points, **x**_{c}, are then determined using the resulting intersected parameter boxes.
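
A minimal sketch of intersecting two sets of parameter boxes is given below. The box representation (a tuple of `(lo, hi)` pairs per parameter), the function names and the example values are hypothetical; overlaps of zero volume are discarded, which is a design choice of this sketch.

```python
def intersect_boxes(box_a, box_b):
    """Intersection of two axis-aligned boxes, or None if it has no volume."""
    out = []
    for (lo_a, hi_a), (lo_b, hi_b) in zip(box_a, box_b):
        lo, hi = max(lo_a, lo_b), min(hi_a, hi_b)
        if lo >= hi:
            return None
        out.append((lo, hi))
    return tuple(out)


def intersect_sets(set_a, set_b):
    """Pairwise intersection of two lists of boxes."""
    result = []
    for a in set_a:
        for b in set_b:
            c = intersect_boxes(a, b)
            if c is not None:
                result.append(c)
    return result


# Hypothetical parameter sets obtained from two different candidate times.
set_a = [((0.0, 2.0), (0.0, 1.0))]
set_b = [((1.0, 3.0), (0.5, 2.0)), ((5.0, 6.0), (0.0, 1.0))]
print(intersect_sets(set_a, set_b))  # [((1.0, 2.0), (0.5, 1.0))]
```

The pairwise loop is quadratic in the number of boxes, which is still far cheaper than rerunning a full bounded parameter estimation for every combination of candidate time points.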

Metrics

Scalar functions of the estimated parameter set and state bounds are used as metrics to predict the impact of adding measurements at candidate times t_{j}.

Parameter volume

We will evaluate the parameter volume as a means to compare our new metrics to traditional V- and D-optimality design criteria.

where ^{
th
}dimension of the ^{th }

Parameter bounds

This metric can be customized for predicting candidate time points based on the uncertainty of a single parameter or a subset of parameters. Single parameter values are compared using the width of the uncertainty for the parameter of interest, e.g.

State bounds

This metric utilizes estimated state bound information and allows the experimenter to see how estimated ranges of unmeasured states are affected by additional measurements. This may be of interest when constraining the range of state values is more important than parameter information. Also, the information provided by this metric is biologically meaningful because it provides a predicted limit on state values such as component concentrations. This metric is computed similarly to the parameter bounds metric but with the parameter uncertainties replaced by the maximum ranges of estimated states. Other custom metrics are also possible; for example, designing a metric to select the time points that minimize the maximum value of a specific state.
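
The three metrics described in this section can be sketched in Python as below. The function names, the box layout (a tuple of `(lo, hi)` pairs per parameter) and the example numbers are hypothetical; the volume function assumes the boxes do not overlap, as is the case for boxes produced by bisection.

```python
from math import prod


def parameter_volume(boxes):
    """Total volume: sum over boxes of the product of their widths."""
    return sum(prod(hi - lo for lo, hi in box) for box in boxes)


def parameter_width(boxes, index):
    """Width of the uncertainty of one parameter across all boxes."""
    return (max(box[index][1] for box in boxes)
            - min(box[index][0] for box in boxes))


def max_state_range(bounds):
    """Largest (upper - lower) range of one state over time."""
    return max(hi - lo for lo, hi in bounds)


# Hypothetical two-parameter box set and bounds on an unmeasured state.
boxes = [((0.0, 1.0), (0.0, 2.0)), ((1.0, 1.5), (0.0, 1.0))]
x2_bounds = [(50.0, 50.0), (40.0, 70.0), (30.0, 45.0)]

print(parameter_volume(boxes))       # 2.5
print(parameter_width(boxes, 0))     # 1.5
print(max_state_range(x2_bounds))    # 30.0
```

Each function returns a single scalar, so candidate time points can be ranked by evaluating the chosen metric on the parameter set or state bounds obtained for each candidate.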

Results and discussion

In this section, the proposed experimental design method is applied to an example problem. We evaluate our set-based experimental design approach by performing a proof of concept on a model that has been used in the literature to evaluate several other set-based approaches.

Problem setup

The model under examination is the Lotka-Volterra predator-prey model, which is a canonical biological ODE model.
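
A standard way to write the Lotka-Volterra model, consistent with the state and parameter roles described in this section, is the following. The symbol names x_{1}, x_{2} and θ_{1} to θ_{4} are assumed here for illustration.

```latex
\begin{aligned}
\frac{dx_1}{dt} &= \theta_1 x_1 - \theta_2 x_1 x_2, \\
\frac{dx_2}{dt} &= -\theta_3 x_2 + \theta_4 x_1 x_2
\end{aligned}
```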

where x_{1} is the prey population, x_{2} is the predator population, θ_{1} is the prey birth rate, θ_{2} is the decrease in prey population due to encounters with predators, θ_{3} is the predator death rate, and θ_{4} is the increase in predator population due to encounters with prey. This model was used by Raïssi et al. to demonstrate their bounded parameter estimation algorithm when data measurements of the prey population are available for all N = 1,400 time points between t_{0} = 0 and t_{N}

Initial data measurements were simulated by first generating model state values using exact inputs to the EMV algorithm and then adding uncertainty. The underlying state values, **x***, were generated using the same initial state values, model parameters and EMV algorithm settings as those used by Raïssi et al.: x_{1}(t_{0}) = 50, x_{2}(t_{0}) = 50, θ_{1} = 1, θ_{2} = 0.01, θ_{3} = 1, θ_{4} = 0.02, α = 0.005. The second state, x_{2}, was assumed to be unmeasurable while for the first state, x_{1}, measurements were generated by adding error intervals to the underlying state values.
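
For readers who want to reproduce an underlying trajectory of this kind, the sketch below integrates the Lotka-Volterra model with the stated initial values and parameters. It swaps in a classical fourth-order Runge-Kutta scheme for the EMV algorithm, so it produces a point trajectory rather than guaranteed bounds. The end time of 7 and the use of 1,400 equal steps are assumptions, as are all function names.

```python
import math

THETA = (1.0, 0.01, 1.0, 0.02)   # theta_1 .. theta_4 as stated in the text
X0 = (50.0, 50.0)                # x_1(t_0), x_2(t_0) as stated in the text


def lotka_volterra(x, theta):
    """Right-hand side of the predator-prey ODEs."""
    x1, x2 = x
    th1, th2, th3, th4 = theta
    return (th1 * x1 - th2 * x1 * x2, -th3 * x2 + th4 * x1 * x2)


def rk4(f, x0, theta, t_end, n_steps):
    """Classical Runge-Kutta integration; returns the list of states."""
    h = t_end / n_steps
    x = tuple(x0)
    traj = [x]
    for _ in range(n_steps):
        k1 = f(x, theta)
        k2 = f(tuple(xi + 0.5 * h * k for xi, k in zip(x, k1)), theta)
        k3 = f(tuple(xi + 0.5 * h * k for xi, k in zip(x, k2)), theta)
        k4 = f(tuple(xi + h * k for xi, k in zip(x, k3)), theta)
        x = tuple(xi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                  for xi, a, b, c, d in zip(x, k1, k2, k3, k4))
        traj.append(x)
    return traj


def invariant(x, theta):
    """Quantity conserved along exact Lotka-Volterra trajectories."""
    x1, x2 = x
    th1, th2, th3, th4 = theta
    return th4 * x1 - th3 * math.log(x1) + th2 * x2 - th1 * math.log(x2)


# Assumed horizon: t_end = 7.0 with 1,400 steps (step size 0.005).
trajectory = rk4(lotka_volterra, X0, THETA, t_end=7.0, n_steps=1400)
drift = max(abs(invariant(x, THETA) - invariant(X0, THETA))
            for x in trajectory)
print(trajectory[-1], drift)
```

The conserved quantity gives a cheap sanity check on the integration: its drift along the computed trajectory should stay close to zero.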

The assigned task is to determine at what times additional measurements would provide useful information with regard to the previously defined metrics and how many measurements would be beneficial. It was assumed that the initial conditions of both populations and parameters θ_{1} and θ_{3} were exactly known. We first wish to estimate the set of parameters θ_{2} and θ_{4}, along with the range of the unmeasured state x_{2} over the simulation time.

Initial parameter and state estimation

Bounded estimates of parameters _{2 }and _{4 }and states _{1 }and _{2 }were calculated using the initial measurements _{2 }and _{4 }and indeterminate boxes were bisected until a minimum box width of ε = 10^{-5 }was obtained. This resulted in the generation of ~20 k indeterminate and feasible boxes shown in Figure **x**
_{
est
}, shown in Figure **x*** are the grey waveforms, **x**
_{
est
}are the black dashed waveforms.

**Initial parameter estimate**. This figure shows the feasible and indeterminate boxes in the parameter space that result from the SIVIA algorithm. No distinction between feasible and indeterminate boxes is shown.

**Initial estimated state bounds**. The true state values resulting from _{1}(_{0}) = _{2 }(_{0}) = 50 and **x***). The initial measurement set is shown as uncertainty bounds in x_{1 }at _{1}. The dashed lines show the results of the uncertainty propagation of the estimated parameter boxes in Figure 2 (**x**_{est}). The dotted lines show the positions of candidate times points (_{j}

Estimating candidate measurements

The initial data measurements were compared to the estimated state bounds for _{1 }to generate the interval set _{p }
_{j }
_{j }
_{j }
_{j }
_{j }
^{2 }values. Bounded parameter estimations were performed for the _{p }
^{2 }values greater than 0.99. We were then able to identify an estimate of the center point that maximized this curve.

Combining time points

We were able to establish independence between candidate time points by showing that the brute-force estimates using all possible permutations and the intersected parameter sets cover identical parameter regions. The brute-force combinations and the intersections of parameter sets for all combinations of two candidate time points were compared and found to produce both the same parameter volumes and parameter bounds with a tolerance of 10^{-12}. Parameter intersections were then computed for combinations of up to _{c }
_{2 }
_{6 }
_{2}, light grey for _{6}, and black for the brute-force combination which is used to depict the intersected parameter space.

**Parameter space intersection**. This figure shows the estimated parameter uncertainty assuming a candidate measurement at _{2 }was added (_{6 }was added (

Estimates of state bounds were computed from the intersected parameter sets. An example estimate of state bounds is shown in Figure _{2 }and _{6}. The underlying state values **x*** are the solid grey waveforms, the combined estimated state bounds **x**
_{c }are the solid black waveforms and the estimated state bounds **x**
_{2 }and **x**
_{6}, corresponding to the results obtained from adding candidate measurements at _{2 }and _{6}, respectively, are the dashed black and dashed grey waveforms, respectively. The decrease in uncertainty for state _{2 }during 1 ≤

**Combination of estimated state bounds**. This figure shows the estimated state bounds assuming a candidate measurement at t_{2} was added (**x**_{2}, dashed black lines) and the estimated state bounds assuming a candidate measurement at t_{6} was added (**x**_{6}, dashed grey lines). The estimated state bounds for the combined candidate measurements, **x**_{c}, are the black lines, while the underlying true state values, **x***, are the solid grey lines.

Applying metrics

We tested whether the estimated candidate measurements generated by our algorithm could effectively be used to predict where the most appropriate measurements should be placed to reduce model uncertainty. With this in mind, we generated a set of true measurements at each candidate time point using the underlying state values, **x***, as the true center points, _{j }

Parameter information

The prediction of the best time point locations, given the set of candidate measurements, for several parameter metrics are shown in Figure _{j }
_{j }
_{2 }(Figure _{1 }= 1.25. However, to minimize the uncertainty of parameter _{4 }a measurement at time _{6 }= 2.25 would be more beneficial. If there are resources available for three additional measurements they would best be placed at times _{2 }
_{6 }= 2.75, and _{8 }= 3.25 to obtain additional information on both unknown parameters. We emphasize the established consistency between the best candidate time points selected based on _{j}
_{j }
_{c }

**Best candidate measurements for parameter metrics**. This figure illustrates the location of the best candidate measurement (x-axis) given the number of potential measurements that can be added (y-axis) for a given metric. The index value of predicted time points are represented by solid squares for _{j}_{2 }and _{4 }

The point at which additional measurements will not provide any additional information about the system can be predicted by observing the metric values for combinations of time points. This is especially beneficial for conserving resources that would otherwise be spent on experiments that yield no new information. The values of the four parameter metrics are shown in Figure _{4}. Estimating the impact of adding multiple measurements leads to the clear conclusion that a single additional measurement is all that is required. Similarly, reducing the uncertainty of the consistent parameter set volume may require 2 or 3 additional measurements. These metric value curves can be combined with cost functions to determine a design that efficiently utilizes experimental resources.
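
One simple way to locate the point of diminishing returns on such a curve is to stop once the relative improvement per added measurement falls below a tolerance. The sketch below is an assumption-laden illustration: the function name, the 5% tolerance and the metric values are hypothetical and are not taken from the case study.

```python
def measurements_needed(metric_values, rel_tol=0.05):
    """Number of added measurements after which further gains are small.

    metric_values[k] is the metric after adding k measurements
    (k = 0 is the initial data set); smaller values mean less uncertainty.
    """
    base = metric_values[0]
    if base <= 0.0:
        raise ValueError("initial metric value must be positive")
    for k in range(1, len(metric_values)):
        gain = (metric_values[k - 1] - metric_values[k]) / base
        if gain < rel_tol:
            return k - 1
    return len(metric_values) - 1


# Hypothetical curves: one saturates after a single measurement,
# the other keeps improving until the third.
print(measurements_needed([1.0, 0.40, 0.39, 0.385]))      # 1
print(measurements_needed([1.0, 0.6, 0.35, 0.2, 0.19]))   # 3
```

The tolerance plays the role of a cost threshold; in practice it could be replaced by an explicit cost function that weighs the metric improvement against the expense of each additional measurement.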

**Parameter metric values**. Plots of parameter metric values vs number of additional measurements. These plots demonstrate the decrease in parameter uncertainty with additional measurements. The point of diminishing return is indicated by the elbow of the curve for the respective metric. This shows that additional measurements will no longer decrease uncertainty associated with that metric.

State information

Two metrics were applied to the unmeasured state, _{2}, to determine how its uncertainty is impacted when candidate measurements are applied to state _{1 }using center points _{j}
_{2}. The second metric, _{2 }over the simulation time 0 ≤

**Best candidate measurements and metric values for state metrics**. Best candidate measurements for state metrics and corresponding metric values. Candidate time point locations are indicated by open circles for center points _{j}_{1 }on the estimated state bounds of _{2}.

Comparison with FIM D-optimality

Scalar metrics of the Fisher Information Matrix (FIM) are often used to perform experimental design for many conventional problems _{i}
_{1}, {74, 80, 89, 95}_{2 }and {74, 89, 94, 95}_{3 }show three likely data sets containing four data points from experimental replicates for sample time _{i}
_{1}, _{2}, and _{3 }corresponding to 81, 84.5, and 88, respectively. Given that the use of the FIM inherently assumes the use of Gaussian distributions

We looked at three possible Gaussian distributions for each of the original measurement times, _{i }
_{i }

**Comparison with D-optimal design**. This figure compares our set-based _{1 }= 2, _{2 }= 4 and _{3 }= 6). The three distributions for each sample time are characterized by left shifted, center shifted, and right shifted means. (b) Time index of predicted time points given the number of additional measurements that can be made. The figure shows a comparison of time point selection for the following: solid squares--set-based method, circle ** θ***, solid black line--set-based method, dashed black line

We calculated the Maximum Likelihood (ML) estimate of the parameters _{i }

where _{i}
_{1 }= 2, the right shifted distribution at time _{2 }= 4, and the center distribution at time _{3 }= 6. We computed the sensitivity matrix,

in combination with (6). Here, the (i, j)^{th }element of these variables are _{
i,j
}= ∂_{i}
_{j}, J_{i,j }
_{i }
_{j }
_{
i,j
}= ∂_{i }
_{j}

where _{j}
_{j }

We computed D-optimal designs for the 9 distribution combinations and compared the selected candidate time points with our set-based method. The prediction of the best time point locations, given the set of candidate measurements, for our method (solid squares) and several D-optimal designs
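
For comparison, a D-optimal selection step for a scalar output with two unknown parameters can be sketched as follows. It assumes independent Gaussian noise, so the FIM is the sum of s s^{T}/σ^{2} over measurement times, where s is the output sensitivity to the parameters. The function names and all sensitivity values are hypothetical; in practice the sensitivities would be obtained from the model ODEs.

```python
def fim(sensitivities, sigmas):
    """2x2 Fisher Information Matrix for a scalar output.

    sensitivities: list of (dy/dtheta_a, dy/dtheta_b), one per time point
    sigmas:        list of noise standard deviations, one per time point
    """
    a = b = d = 0.0
    for (s1, s2), sigma in zip(sensitivities, sigmas):
        w = 1.0 / sigma ** 2
        a += w * s1 * s1
        b += w * s1 * s2
        d += w * s2 * s2
    return ((a, b), (b, d))


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def best_candidate(base_sens, base_sigmas, cand_sens, cand_sigma):
    """Index of the candidate that maximizes det(FIM), plus all scores."""
    scores = [det2(fim(base_sens + [s], base_sigmas + [cand_sigma]))
              for s in cand_sens]
    return max(range(len(scores)), key=scores.__getitem__), scores


# Hypothetical sensitivities at existing and candidate time points.
base_sens = [(1.0, 0.0), (1.0, 0.1)]
base_sigmas = [1.0, 1.0]
cand_sens = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]

best, scores = best_candidate(base_sens, base_sigmas, cand_sens, 1.0)
print(best, scores)
```

In this toy example the existing data carry almost no information about the second parameter, so the candidate whose sensitivity points along that parameter gives the largest determinant.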

Conclusions

Developing accurate models is crucial for understanding, predicting and ultimately controlling biological processes. The limitation of costly resources and lengthy experiments associated with the study of biological systems promotes an experimental design approach for model development. Stochastic experimental design methods rely on correctly characterizing the distribution of uncertainty in the model, often requiring a large number of data measurements. This requirement is difficult to fulfill for many biological systems and alternative set-based experimental design approaches are more appropriate in these situations. In addition to the method used to characterize uncertainty, biological interpretations of experimental design metrics are important because they provide a logical link between physical resources and mathematical constructs.

We have developed a novel experimental design framework using bounded-error methods and biologically relevant design metrics to select desirable time point locations where additional measurements will be collected for the purpose of improving resource allocation for biological experiments. Our method propagates the uncertainty resulting from a small collection of data measurements, which may contain information for only a subset of the model states, through time to estimate parameter and state bounds for a given system model. We used these bounded-error results to estimate candidate measurement time points, center points and ranges. We proposed a method for combining candidate time points and presented several biologically meaningful design metrics.

Measurement estimation is an important component of this method. We used a set-based approach to estimate measurements at time points where no information was available. We were able to estimate measurement bounds at candidate time points by combining information from the initial data measurement bounds with the estimated state bounds generated by the EMV algorithm. Our method resulted in a good estimate when compared to true measurements for the purpose of identifying where additional measurements should take place. The granularity of candidate time points can be made as fine as desired at the cost of additional computation time. The computational expense of searching all possible time points may make identifying globally optimal time point locations impractical using this method. However, the timing accuracy of measurements collected during biological experiments is often on the order of minutes, hours or days, so locally optimal time points from an experimentally feasible set of time points are often sufficient.

The ability to estimate the effects of adding measurements at multiple time points is often desirable. A brute force method to explore all combinations of time points is computationally expensive. However, we found that the parameter estimation for a combination of time points can be directly obtained by intersecting the individual estimated parameter spaces. Estimated state bounds can then be determined using the intersected parameter space. The experimenter can determine when additional measurements will provide little or no additional information by exploring the effects of adding multiple measurements and will not needlessly spend limited resources on experiments that yield no additional information.

The framework presented here can be used to predict at what time additional measurements should be made to maximize information based on biologically relevant metrics and to determine the number at which additional measurements begin to provide insignificant information. Problems of this sort are often faced by biologists when modeling biological processes. Selecting an appropriate metric is made more straightforward by associating it with biologically relevant information. For example, the uncertainty of a parameter may be associated with specific characteristics of an engineered enzyme, while the limitations on the uncertainty of estimated state bounds can provide critical bounds on unmeasured component concentrations, allowing systems to maintain chemical and physiological phenotypes.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

SM designed the study and prepared the manuscript. CW participated in the design and in revising the draft. Both authors read and approved the final manuscript.

Acknowledgements

The authors acknowledge financial support from NCSU startup funds.