Bioprocess Engineering Group, Spanish National Research Council, IIM-CSIC, 36208, Vigo-Spain

Abstract

Background

Mathematical models provide abstract representations of the information gained from experimental observations on the structure and function of a particular biological system. Conferring a predictive character on a given mathematical formulation often relies on determining a number of non-measurable parameters that largely condition the model's response. These parameters can be identified by fitting the model to experimental data. However, this fit can only be accomplished when identifiability can be guaranteed.

Results

We propose a novel iterative identification procedure for detecting and dealing with the lack of identifiability. The procedure involves the following steps: 1) performing a structural identifiability analysis to detect identifiable parameters; 2) globally ranking the parameters to assist in the selection of the most relevant parameters; 3) calibrating the model using global optimization methods; 4) conducting a practical identifiability analysis consisting of two (

Conclusions

The presented procedure was used to iteratively identify a mathematical model that describes the NF-

Background

Biological systems are mainly composed of genes that encode the molecular machines that execute the functions of life and networks of regulatory interactions specifying how genes are expressed, with both operating on multiple, hierarchical levels of organization

The modeling and simulation of biochemical networks (e.g. metabolic or signaling pathways) has recently received a great deal of attention

Currently, the most typical approach to representing biochemical networks is through a set of coupled deterministic ordinary differential equations intended to describe the network and the production and consumption rates for the individual species involved in the network

Unfortunately, with model details come parameters, and most parameters are generally unknown, thereby hampering the possibility for obtaining quantitative predictions. Modern experimental techniques, such as time-resolved fluorescence spectroscopy or mass-spectrometry-based techniques, can be used to obtain time-series data for the biological system under consideration. The goal of model identification is then to estimate the non-measurable parameters so as to reproduce, insofar as is possible, the experimental data. Although apparently simple, non-linear model identification is usually a very challenging task, due to the usual lack of identifiability, either practical or, in the worst case, structural. In fact, several authors have reported difficulties in assessing unique and meaningful values for the parameters from given sets of experimental data since broad ranges of parameter values result in similar model predictions (see for example,

This problem has motivated the development of iterative procedures for model identification, such as those proposed by Feng and Rabitz

It is important to note, however, that, in most cases, only a limited number of components in the network can be measured, usually far fewer components than incorporated in the model; only specific stimuli are available, and the system may only be stimulated in very specific ways (for example, via sustained or pulse-wise stimulation); the number of sampling times is usually rather limited, and finally, the experimental data are subject to substantial experimental noise. These constraints, together with the dynamic and typically non-linear character of the models under consideration result in identifiability problems, i.e. in the impossibility of providing a unique solution for the parameters.

Our research describes a novel general iterative identification procedure, extending the one originally outlined in Balsa-Canto et al.

With this aim in mind, the iterative identification procedure presented here involves the following steps:

• Analysis of structural identifiability. This step, which is often disregarded, evaluates whether the parameters can be assigned unique values for a given pair of model and observables under ideal experimental conditions and, where possible, guides the reformulation of the model or the implementation of an iterative procedure for model calibration.

• Global ranking of parameters. This step helps decide which parameters are the most relevant to model output. When structural identifiability is lacking, the global ranking may be used to decide whether to reformulate the model or which parameters to estimate.

• Model calibration using global optimization methods. The model calibration problem can be formulated as a non-linear optimization problem. Unfortunately, since it is usually the case that several sub-optimal solutions are possible, the use of global optimization methods is necessary to somehow guarantee that the best possible solution is located.

• Practical identifiability analysis. Complementary to the structural identifiability test, the practical identifiability analysis enables an evaluation of the possibility of assigning unique values to the parameters from a given set of experimental data or experimental scheme, subject to experimental noise. In this paper we distinguish between two types of practical identifiability analyses: firstly, the expected quality of a given experimental scheme is analyzed

• Optimal experimental design via dynamic optimization. The purpose of this step is to design dynamic experiments with the aim of maximizing data quality and quantity (as measured by the Fisher information matrix) for the purpose of model calibration.

To illustrate the difficulties that may be faced when identifying a nonlinear dynamic biological model and the performance of the proposed identification procedure we consider the mathematical model that describes the NF-

Methods

Model building

A mathematical model has three important functions: first, it helps to better understand the biological phenomenon studied; secondly, it enables experiments to be specifically designed to make predictions of certain characteristics of the biological system that can then be experimentally verified; and finally, it summarizes the current body of knowledge in a format that can be easily communicated. Devising such a model involves a number of steps (Figure

**Model building loop**.

The purpose of the model will condition the selection of the modeling framework and the information that should be included in the model. Only elements that might have an impact on the questions to be addressed by the model should be included. In this regard, account should be taken of the fact that reaction models can only include a small subset of all reactions taking place within a cell. Thus, assumptions must be made about the extent to which the species included in the model evolve independently of the species excluded from the model, and also about the species that are crucial for the purpose of the model. At this stage it is possible to define the network architecture and decide which type of modeling framework may be the most appropriate (deterministic generalized mass action based models, power-law models, stochastic models, partial differential equations, etc.)

In the next step, an initial mathematical model structure (or battery of model structures) is proposed. New experimental information must then be used to verify hypotheses, and to discriminate, if possible, among different model alternatives. The candidates will often depend on a number of unknown non-measurable parameters that can be computed by means of experimental data fitting (identification).

This crucial step provides the mathematical structure with the capacity to reproduce a given data set, make predictions and discriminate among different model candidates.

The last step is validation, which essentially means reconciling model predictions with any new data observed. This process is likely to reveal defects, in which case a new model structure and/or new (optimal) experiment is planned and implemented. This process is repeated iteratively until validation is considered to be complete and satisfactory.

Note that the success of this model-building loop relies on being able to perform experiments under a sufficient number of conditions to extract a rich ensemble of dynamic responses, to accurately measure such responses and to iterate in order to improve the predictive capabilities of the model without a significant cost.

Since model identification is a task that consumes large amounts of experimental data, an iterative identification procedure is proposed which is intended to accurately compute model unknowns while reducing experimental cost.

Optimal identification procedure

The proposed iterative identification procedure is depicted in Figure

**Model building procedure incorporating the proposed model identification scheme**.

If there are several model candidates two extra steps should be included in the loop, one to analyze structural distinguishability among candidates and the other to design experiments for model discrimination

Mathematical model formulation

We will assume a biological system described by the vector of state variables **x**(

where **θ** ∈ Θ ⊂

Moreover, given an experimental scheme, with n_{e} experiments, **y**^{e, o} ∈

where ^{th} sampling time for observable

Structural identifiability analysis

Once the structure of the state-space representation, Eqns. (1)-(3), has been established, the structural identifiability problem is concerned with the possibility of calculating a unique solution for the parameters while assuming perfect data (noise-free and continuous in time and space). Structural identifiability is thus related to the model structure and possibly to the type of stimulation and independent of the parameter values.

There are at least two obvious reasons to assess structural identifiability: first, the model parameters have a biological meaning, and we are interested in knowing whether it is at all possible to determine their values from experimental data; second, a numerical optimization approach may run into problems when trying to solve an unidentifiable model.

There are a few methods for testing the structural identifiability of nonlinear models

Details of the **Taylor series approach **can be found in **generating series approach **

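The observables can be expanded in series with respect to time and stimuli in such a way that the coefficients of this series are **g**(**x**, **θ**, t_{0}) and its successive Lie derivatives, where L_{f}**g** is the Lie derivative of **g** along the vector field **f**, given by:

$$L_{f}\mathbf{g}(\mathbf{x}, \boldsymbol{\theta}, t) = \sum_{j} \frac{\partial \mathbf{g}(\mathbf{x}, \boldsymbol{\theta}, t)}{\partial x_{j}} \, f_{j}(\mathbf{x}, \boldsymbol{\theta}, t)$$

with f_{j} the jth component of **f**.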

If **s**(**θ**) denotes the vector of all the coefficients of the series, a sufficient condition for the model to be identifiable is that there exists a unique solution for **θ** from **s**(**θ**).
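As a minimal worked example of the series coefficients, consider a hypothetical one-state model (chosen only for illustration, it is not part of the case study) with no stimulus, a known non-zero initial condition x_{0} and two unknown parameters. In this autonomous setting the successive Lie derivatives evaluated at the initial condition play the role of the series coefficients:

```latex
\begin{aligned}
\dot{x} &= f(x,\theta) = -\theta_1 x, \qquad x(t_0) = x_0, \qquad y = g(x,\theta) = \theta_2 x \\
L_f g &= \frac{\partial g}{\partial x}\, f = -\theta_1 \theta_2\, x \\
L_f L_f g &= \frac{\partial (L_f g)}{\partial x}\, f = \theta_1^2 \theta_2\, x \\
s(\theta) &= \left( \theta_2 x_0,\; -\theta_1 \theta_2 x_0,\; \theta_1^2 \theta_2 x_0,\; \dots \right)
\end{aligned}
```

The first coefficient gives θ_{2} = s_{1}/x_{0} and the second then gives θ_{1} = -s_{2}/s_{1}, so both parameters have a unique solution. If x_{0} were also unknown, only the product θ_{2}x_{0} and θ_{1} could be recovered from these coefficients.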

The tableau represents the non-zero elements of the Jacobian of the series coefficients with respect to the parameters. It consists of a table with as many columns as parameters and with as many rows as non-zero series coefficients, in principle, infinite, as shown in Figure

**Minimum identifiability tableau for the generating series method**. A cross in the coordinates (_{j}. Green crosses represent those parameters that can be computed from a single equation of the system. Green circles correspond to those parameters that may be uniquely identified, i.e. only one solution exists. Red crosses represent possible identifiability problems, i.e. sets of parameters that require more than 2 equations to be identified, if identification is possible at all. Red boxes and arrows represent sets of equations that result in a unique solution for the parameters. Numbers represent the order in which the equations were solved.

If the Jacobian is rank deficient, i.e. the tableau presents empty columns, the corresponding parameters may be unidentifiable. Note that since the number of series coefficients may be infinite, unidentifiability cannot be fully guaranteed unless higher order series coefficients are demonstrated to be zero.

If the rank of the Jacobian coincides with the number of parameters, then it will be possible to, at least, locally identify the parameters. In this situation a careful inspection of the tableau will help to decide on an iterative procedure for solving the system of equations, as follows:

• The number of non-zero coefficients is usually much larger than the number of parameters. In practice this means that we should select the first n_{θ} rows that guarantee the Jacobian rank condition. The tableau helps to easily detect the necessary coefficients and to generate a "minimum" tableau.

• A unique non-zero element in a given row of the minimum tableau means that the corresponding parameter is structurally identifiable. If any, the parameters in this situation can be computed as functions of the power series coefficients and can be then eliminated from the "minimum" tableau to generate a "reduced" tableau. Subsequent reductions may lead to the appearance of new unique non-zero elements and so on. Thus all possible "reduced" tableaus should be built first.

• Once no more reductions are possible, one should try to solve the remaining equations. Since it is often the case that not all remaining power series coefficients depend on all parameters, the tableau will help to decide on how to select the equations to solve for particular parameters.

• If several meaningful solutions exist for a given set of parameters, then the model is said to be locally identifiable.
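The reduction of the tableau described in the steps above can be sketched as follows. This is only an illustrative sketch: the function name `reduce_tableau`, the representation of each row as a set of parameter indices, and the example tableau are assumptions made here, and the remaining coupled equations would still have to be solved symbolically.

```python
def reduce_tableau(tableau, n_params):
    """Iteratively resolve parameters that appear alone in some row.

    tableau: list of sets; each set holds the indices of the parameters
             a series coefficient depends on (non-zero entries of a row).
    Returns (resolved, remaining): the order in which parameters were
    resolved and the parameters still coupled when no reduction is left.
    """
    rows = [set(r) for r in tableau]
    resolved = []
    progress = True
    while progress:
        progress = False
        for row in rows:
            if len(row) == 1:
                (p,) = row
                resolved.append(p)
                # Eliminate the resolved parameter from every row,
                # which yields the next "reduced" tableau.
                for r in rows:
                    r.discard(p)
                progress = True
                break
    remaining = set(range(n_params)) - set(resolved)
    return resolved, remaining


# Hypothetical 4-parameter tableau: row i lists the parameters that
# appear in series coefficient i.
example = [{0}, {0, 1}, {1, 2, 3}, {2, 3}]
order, remaining = reduce_tableau(example, 4)
print(order, remaining)
```

In this example parameter 0 is resolved from the first row, which uncovers parameter 1 in the second row; parameters 2 and 3 remain coupled and would require solving the last two equations together.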

If the model turns out not to be completely identifiable, identifiability may be recovered by extending the set of observables; however, this may not be feasible in practice. Alternatively one may consider fixing some parameters

Global ranking of parameters

Observables will depend differently on different parameters and this may be used to rank the parameters in order of their relative influence on model predictions. Such influence may be quantified by the use of parametric sensitivities.

Local parametric sensitivities for a given experiment

They may be numerically computed by using the direct decoupled method within a backward differentiation formulae (BDF) based approach, as implemented in e.g. ODESSA

The corresponding relative sensitivities,

Of course, the values of the parameters are not known

The simplest approach is to apply a Monte Carlo sampling. By sampling repeatedly from the assumed joint-probability density function of the parameters and by evaluating the sensitivities for each sample, the distribution of sensitivity values, along with the mean and other characteristics, can be estimated. This approach yields reasonable results if the number of samples is quite large, requiring a great computational effort.

An alternative that can yield more precise estimates is Latin hypercube sampling (LHS). This method selects n_{lhs} different values for each of the parameters, which it does by dividing the range of each parameter into n_{lhs} non-overlapping intervals on the basis of equal probability. Next, from each interval one value for the parameter is selected at random with respect to the probability density in the interval.

The n_{lhs} values thus obtained for the first parameter are then paired in a random manner (equally likely combinations) with the n_{lhs} values for the second and successive parameters. This method allows the overall parameter space to be explored without requiring an excessively large number of samples. The importance factors will then read:

where n_{D} = n_{lhs} n_{e} n_{o} n_{s}, ^{msqr} and ^{mabs} quantify how sensitive a model is to a given parameter considering ^{mabs} interactions between parameters. ^{max} and ^{min} indicate the presence of outliers and provide information about the sign. ^{mean} provides information about the sign of the averaged effect a change in a parameter has on the model output.
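The Latin hypercube sampling scheme described above can be sketched in a few lines. The sketch assumes a uniform probability density within the bounds of each parameter (so that equal-probability intervals have equal width); the function name `latin_hypercube` and the example bounds are hypothetical.

```python
import random


def latin_hypercube(bounds, n_lhs, rng=None):
    """Draw n_lhs parameter vectors, assuming uniform densities.

    bounds: list of (low, high) pairs, one per parameter.
    """
    rng = rng or random.Random()
    columns = []
    for low, high in bounds:
        width = (high - low) / n_lhs
        # One random value inside each of the n_lhs equal-width intervals.
        values = [low + (k + rng.random()) * width for k in range(n_lhs)]
        # Shuffling each column pairs the values at random across parameters.
        rng.shuffle(values)
        columns.append(values)
    # Transpose: one parameter vector per sample.
    return [list(sample) for sample in zip(*columns)]


samples = latin_hypercube([(0.0, 1.0), (10.0, 20.0)], n_lhs=5,
                          rng=random.Random(0))
for sample in samples:
    print(sample)
```

Each parameter range is visited exactly once per interval, which is why far fewer samples are needed than with plain Monte Carlo sampling to cover the parameter space.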

Ordering the parameters according to these criteria, preferably in decreasing order, results in a parameter importance ranking. This information may be useful to decide on reformulating the model or to fix the less relevant parameters to improve either structural or practical identifiability.

Note that the summations will, in general, hide the different effects from the different experiments and observables unless they are in the same order of magnitude. Similar analyses may be performed for experiments and observables, thus providing information on the parameters that are more relevant to a particular observable in a particular type of experiment.
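To make the ranking idea concrete, the following sketch ranks the parameters of a hypothetical toy observable by a root-mean-square measure of the relative sensitivities over randomly sampled parameter vectors. The model, the bounds, the finite-difference sensitivities and the exact form of the measure are all assumptions made for illustration; they are not the definitions used in the case study.

```python
import math
import random


def model(theta, t):
    # Hypothetical observable; theta[2] has no influence by construction.
    return theta[0] * math.exp(-theta[1] * t)


def relative_sensitivity(theta, j, t, h=1e-6):
    # Central finite difference, scaled by parameter and output values.
    up, down = list(theta), list(theta)
    step = h * theta[j]
    up[j] += step
    down[j] -= step
    dy = (model(up, t) - model(down, t)) / (2 * step)
    return dy * theta[j] / model(theta, t)


def rank_parameters(bounds, times, n_samples=200, seed=0):
    rng = random.Random(seed)
    sums = [0.0] * len(bounds)
    for _ in range(n_samples):
        theta = [rng.uniform(lo, hi) for lo, hi in bounds]
        for j in range(len(bounds)):
            for t in times:
                sums[j] += relative_sensitivity(theta, j, t) ** 2
    n_d = n_samples * len(times)
    msqr = [math.sqrt(s / n_d) for s in sums]
    # Decreasing order of importance.
    order = sorted(range(len(bounds)), key=lambda j: msqr[j], reverse=True)
    return order, msqr


order, msqr = rank_parameters([(0.5, 2.0), (0.1, 1.0), (0.5, 2.0)],
                              times=[1.0, 2.0, 5.0])
print(order, msqr)
```

The third parameter ends up last with a zero measure, which is the kind of signal that would suggest fixing it rather than trying to estimate it.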

Model calibration

Given the measurements, the aim of model calibration or parameter identification is to estimate some or all of the parameters **θ** in order to minimize the distance between data and model predictions. The maximum-likelihood principle yields an appropriate cost function to quantify such distance, which, for the case of Gaussian noise with known or constant variance, reads as the widely used weighted least-squares function:

where

Parameter identification is then formulated as a non-linear optimization problem, where the decision variables are the parameters and the objective is to minimize the cost function with respect to **θ**, subject to the system dynamics in Eqns. (1)-(3) and also, possibly, to some algebraic constraints that define the feasible region Θ.
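A weighted least-squares cost of the kind described above can be sketched as follows. The sketch assumes the standard form (sum of squared residuals, each divided by the noise standard deviation) and a hypothetical one-state model with an analytical solution; the names `simulate` and `weighted_least_squares` are illustrative only.

```python
import math


def simulate(theta, times, x0=1.0):
    # Hypothetical model dx/dt = -theta[0]*x, observed as y = theta[1]*x.
    return [theta[1] * x0 * math.exp(-theta[0] * t) for t in times]


def weighted_least_squares(theta, times, data, sigma):
    # Sum over sampling times of squared residuals weighted by 1/sigma^2.
    predictions = simulate(theta, times)
    return sum(((y - m) / s) ** 2
               for y, m, s in zip(data, predictions, sigma))


times = [0.0, 1.0, 2.0, 4.0]
true_theta = [0.5, 2.0]
data = simulate(true_theta, times)       # noise-free pseudo-data
sigma = [0.1] * len(times)

cost_at_truth = weighted_least_squares(true_theta, times, data, sigma)
cost_elsewhere = weighted_least_squares([0.8, 2.0], times, data, sigma)
print(cost_at_truth, cost_elsewhere)
```

With several experiments and observables the same sum would simply run over all of them, each with its own weights.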

This problem has recently received a great deal of attention in the literature. Jaqaman and Danuser presented a guide for model calibration in the context of biological systems

To deal with the first difficulty, several authors have proposed the use of global optimization methods
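The need for a global strategy can be illustrated with a deliberately naive multistart search on a hypothetical one-parameter cost with two minima. This sketch is not one of the global optimization methods referred to in the text; it only shows how a single local search can be trapped in a sub-optimal solution that several random starts avoid.

```python
import random


def cost(theta):
    # Hypothetical multimodal cost: local minimum near +1, global near -1.
    return (theta ** 2 - 1.0) ** 2 + 0.3 * theta


def local_search(theta, step=0.1, tol=1e-8):
    # Derivative-free descent: try both directions, halve the step on failure.
    best = cost(theta)
    while step > tol:
        for candidate in (theta + step, theta - step):
            value = cost(candidate)
            if value < best:
                theta, best = candidate, value
                break
        else:
            step *= 0.5
    return theta, best


def multistart(n_starts, low, high, seed=0):
    rng = random.Random(seed)
    runs = [local_search(rng.uniform(low, high)) for _ in range(n_starts)]
    return min(runs, key=lambda run: run[1])


single_theta, single_cost = local_search(1.5)      # trapped near +1
best_theta, best_cost = multistart(20, -2.0, 2.0)  # reaches the basin near -1
print(single_theta, single_cost, best_theta, best_cost)
```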

Practical identifiability analysis

As already mentioned in the introduction, practical identifiability analysis enables an evaluation of the possibility of assigning unique values to parameters from a given set of experimental data or experimental scheme subject to experimental noise. We distinguish between practical identifiability a priori, which anticipates the quality of the selected experimental scheme in terms of what we will call the expected uncertainty of the parameters, and practical identifiability

It is important to note that the major difference between the two analyses is that,

Possibly the simplest approach to perform such analyses given a set of simulated (**θ**) by pairs of parameters. This will help detect typical practical identifiability problems, such as strong correlation between parameters, the lack of identifiability for some parameters when the contours extend to infinity, or the presence of sub-optimal solutions.

To quantify the expected uncertainty of the parameters and/or the confidence region, we rely on a Monte Carlo-based sampling method **μ**) and either maximum expected uncertainty (

The obtained expected uncertainty of the parameters will allow the different experimental designs to be compared

The confidence intervals obtained for the parameters will enable a decision to be made on the need to perform further experiments to improve the quality of the parameter estimates and, thus, the predictive capabilities of the model.
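The Monte Carlo-based analysis can be sketched on a hypothetical one-parameter model that admits a closed-form least-squares fit. Pseudo-data are generated repeatedly with Gaussian noise, the parameter is re-estimated each time, and the cloud of estimates is summarized. The summary measures below (relative distance of the mean to the nominal value, and 1.96 standard deviations relative to the mean, both in %) are assumed forms chosen for illustration.

```python
import random
import statistics


def fit_slope(times, data):
    # Closed-form least-squares estimate for the toy model y = theta * t.
    return sum(t * y for t, y in zip(times, data)) / sum(t * t for t in times)


def monte_carlo_uncertainty(theta_nominal, times, noise_sd,
                            n_runs=500, seed=0):
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_runs):
        data = [theta_nominal * t + rng.gauss(0.0, noise_sd) for t in times]
        estimates.append(fit_slope(times, data))
    mean = statistics.mean(estimates)
    # Assumed summary measures, both expressed in %.
    distance = 100.0 * abs(mean - theta_nominal) / theta_nominal
    uncertainty = 100.0 * 1.96 * statistics.stdev(estimates) / mean
    return mean, distance, uncertainty


# Two hypothetical experimental schemes that differ in sampling times.
sparse = monte_carlo_uncertainty(2.0, [1.0, 2.0], noise_sd=0.5)
dense = monte_carlo_uncertainty(2.0, [1.0, 2.0, 3.0, 4.0, 5.0], noise_sd=0.5)
print(sparse, dense)
```

Comparing the third entry of each result shows how the expected uncertainty can be used to compare experimental schemes before any experiment is performed.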

Optimal experimental design

A crucial aspect of experimental data is data quantity and quality. As mentioned in the previous section, a given set of data may result in practical identifiability problems. This is why data generation and modeling have to be implemented as parallel and interactive processes, thereby avoiding the generation of data that may eventually turn out to be unsuited for modeling.

In addition, the use of model-based (

The model identification loop is complemented with an optimal experimental design step. The aim is to calculate the best scheme of measurements in order to maximize the richness (quantity and quality) of the information provided by the experiments while minimizing, or at least, reducing, the experimental burden

The richness of the experimental information may be quantified by the use of the Fisher Information Matrix (ℱ)

where **μ** presumably close to the optimal solution

The optimal experimental design is then formulated and solved as a general dynamic optimization problem, see details in

Regarding the selection of the scalar measure of the ℱ, several alternatives exist all of them related to the eigenvalues of the ℱ and thus related to the shape and size of the associated hyper-ellipsoid. The most popular are probably the D-optimality and E-optimality criteria, the former corresponding to the maximization of the determinant of the ℱ and the latter corresponding to the maximization of the minimum eigenvalue. From previous studies
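The D- and E-optimality criteria can be illustrated on a hypothetical two-parameter observable with analytical sensitivities. The sketch assumes the usual form of the Fisher information matrix for Gaussian noise with constant standard deviation (sum over sampling times of the outer product of the sensitivities divided by the variance); the model, reference parameter values and sampling schemes are made up for illustration.

```python
import math


def fisher_information(theta, times, sigma):
    # Toy observable y = theta[0] * exp(-theta[1] * t).
    a = b = d = 0.0
    for t in times:
        decay = math.exp(-theta[1] * t)
        s1 = decay                     # dy/dtheta[0]
        s2 = -theta[0] * t * decay     # dy/dtheta[1]
        a += s1 * s1 / sigma ** 2
        b += s1 * s2 / sigma ** 2
        d += s2 * s2 / sigma ** 2
    return [[a, b], [b, d]]


def d_criterion(fim):
    # Determinant of the 2x2 matrix.
    return fim[0][0] * fim[1][1] - fim[0][1] * fim[1][0]


def e_criterion(fim):
    # Smallest eigenvalue of a symmetric 2x2 matrix in closed form.
    trace = fim[0][0] + fim[1][1]
    disc = math.sqrt(max(trace ** 2 - 4.0 * d_criterion(fim), 0.0))
    return 0.5 * (trace - disc)


theta_ref = [1.0, 0.5]
early = fisher_information(theta_ref, [0.1, 0.2, 0.3], sigma=0.1)
spread = fisher_information(theta_ref, [0.1, 2.0, 4.0], sigma=0.1)
print(d_criterion(early), d_criterion(spread))
print(e_criterion(early), e_criterion(spread))
```

In this toy case, spreading the three sampling times over the decay improves both criteria, which is the kind of comparison an optimal experimental design automates over sampling times and stimulation profiles.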

Results and Discussion

The NF-

NF-

The model considered in this work was proposed by Lipniacki et al.

The model involves two compartment kinetics of the activators IKK and NF-

The scheme of the pathway is illustrated in Figure

where IKKn represents the cytoplasmic concentration of the neutral form of IKK kinase; IKKa, the cytoplasmic concentration of the active form of IKK; IKKi, the cytoplasmic concentration of inactive IKK; I_{n}, the nuclear concentration of I_{t}, the concentration of I_{R} is a logical variable representing the presence or absence of signal; _{v} is the ratio of cytoplasmic to nuclear volumes.

**The NF- κB module**. Network model as in

Results/Discussion

In their paper, Lipniacki et al. (2004) fixed some of the model parameters by using values from the literature. To fit the unknown parameters, they used experimental data from previous works by Lee et al.

Lipniacki et al. concluded that several different sets of parameters are capable of reproducing the data. This lack of identifiability may originate either in the structure of the model and observables selected (lack of structural identifiability) or in the type of experiments performed and the experimental noise (lack of practical identifiability). Our aim was to determine the origin of the problem and to use the model identification loop presented here to improve the quality of the parameter estimates.

Structural identifiability analysis

To perform the analysis we take into account that Lee et al. _{t}), total IKK (IKKn+IKKa+IKKi), activated IKK (IKKa), total cytoplasmic I_{t}) and free nuclear NF-_{n}), and also that Hoffmann et al. _{n}) and the cytoplasmic I

The following is assumed:

• Only the concentrations measured by Lee et al.

• Initial conditions correspond to those for wild type cells after resting.

• The TNF stimulus is activated.

• Only the set **θ** in Eqn. is considered; all the other parameters are assumed to be fixed, see details in Table

Nominal value for the parameters in the NF-

| Parameter | Nominal value (θ*) | Comments |
|---|---|---|
| _{1} | 0.5 | Fixed |
| _{2} | 0.2 | Fixed |
| _{1} | 0.1 | To be identified |
| _{3} | 1 | Fixed |
| _{2} | 0.1 | To be identified |
| _{1a} | 5 × 10^{-7} | Fixed |
| _{2a} | 0.0 | Fixed |
| _{3a} | 4 × 10^{-4} | To be identified |
| _{4a} | 0.5 | To be identified |
| _{5a} | 1 × 10^{-4} | Fixed |
| _{6a} | 2 × 10^{-5} | Fixed |
| _{1} | 5 × 10^{-7} | Fixed |
| _{2} | 0.0 | Fixed |
| _{3} | 4 × 10^{-4} | Fixed |
| _{4} | 0.5 | Fixed |
| _{5} | 3 × 10^{-4} | To be identified |
| _{1} | 2.5 × 10^{-3} | To be identified |
| _{2} | 0.1 | To be identified |
| _{3} | 1.5 × 10^{-3} | To be identified |
| _{prod} | 2.5 × 10^{-5} | To be identified |
| _{deg} | 1.25 × 10^{-4} | To be identified |
| _{F} | 0.06 | Fixed |
| _{v} | 5 | Fixed |
| _{1} | 2.5 × 10^{-3} | To be identified |
| _{2a} | 0.01 | To be identified |
| _{1a} | 1 × 10^{-3} | To be identified |
| _{1a} | 5 × 10^{-4} | Fixed |
| _{1c} | 5 × 10^{-7} | Fixed |
| _{2c} | 0.0 | Fixed |
| _{3c} | 4 × 10^{-4} | Fixed |

The size of the model under consideration, the number of observables and the number of parameters make the application of the similarity transformation and the differential algebra approaches rather complex, thus the power series expansions will be used here.

In a first approximation to the structural identifiability problem the Taylor series approach was applied. From the analysis of the resultant tableau it is possible to conclude that _{1}, _{1}, _{3a} and _{1a} are structurally identifiable. Unfortunately, the complexity of the remaining equations prevents clear conclusions from being drawn for the rest of the parameters.

The application of the generating series approach resulted, as expected, in a simpler system of equations. In fact, it was possible to obtain as many coefficients as necessary to guarantee a full-rank Jacobian; the corresponding (full) tableau is presented in the Additional file

**Further details on the application of the identification procedure to the mathematical model of the NF- κB regulatory module**. Additional file

**Identifiability tableau for the NF- κB model**.

Ranking of parameters

The parameters were ranked globally considering three different experimental schemes for wild-type cells. The first experiment corresponded to a persistent TNF stimulation and the second and third experiments corresponded to 1

However, deciding the range of parameters is often a quite difficult task. In practice large bounds are defined so as to somehow guarantee that the real solution will lie within. Unfortunately, this approach often results in very large flat areas in the search space that make calibration extremely difficult. In addition, global analyses may lead to wrong conclusions, since the probability of considering sets of parameters that are far from the real sets increases rapidly. Whenever possible, one should use knowledge about the system to define reasonable bounds.

For this particular example we selected a reference parameter vector

The reference was then used to select different bounds for the parameters. Three different tests were performed: i) within the range (**θ**; ii) within the range (

Results obtained for all cases for the criterion _{msqr} are presented in Figure _{3a}, _{4a}, _{prod} and _{deg} and almost insensitive to _{2a}, _{2} and _{1}, indicating possible practical identifiability problems.

**Ranking of parameters for the NF- κB example**. Parameters are ordered by decreasing

In general, different ranking criteria may lead to different conclusions. In this example all criteria lead to the same results regarding the lack of influence of _{2a}, _{2} and _{1} (see Additional file

As already mentioned, the summations over experiments and observables may hide some relevant information. For example, from Figure

**Sensitivity analysis in the range **(^{msqr }measures for the different combinations of parameters and observables for the three different experiments.

From the figures it may be concluded that certain observables become more sensitive to certain parameters under short pulse-wise stimulation (Experiment 2). This is the case, for example, when looking at the sensitivities with respect to _{3a}, _{4a} or _{1}. Note that only the measurements of total cytoplasmic I_{1} and _{1a} and also the fact that we obtain almost no information about _{2}, _{1} and _{2a}.

It is important to underline that for the case of _{1}, experiments under sustained stimulation appear not to be relevant whereas the model becomes more sensitive to _{5 }or _{2 }under sustained stimulation. It can thus be expected that using an experimental scheme combining a sustained stimulation experiment with (optimally designed) pulse-wise stimulation experiments will increase overall sensitivity and thus improve identifiability properties.

Taking into account these results, the vector of parameters **θ** is partitioned into two new vectors

The components of **θ**

Practical identifiability analysis

To establish a basis for comparison we first consider the problem as addressed by Lipniacki et al., i.e. with all parameters in set **θ** and the experimental scheme available from Lee et al.

For this purpose we can perform a battery of hundreds of

To perform the quantitative analysis according to the Monte Carlo approach the model calibration problem was solved for all sets of data by using the recently developed global optimization method based on Scatter Search (SSm,

Table _{1}, _{2} and _{2a} but also for _{2}, _{3}, _{prod}, _{deg} for which the relative distance is over 20%. If we take a look at the illustrative examples of the confidence intervals in Figure _{1} the objective function seems to be noisy, and therefore the solution is hard to find even for global optimization methods; for _{2a} the objective function seems to be flat, so the optimization method may reach any solution in the allowed range, with a significant tendency to get trapped at the bounds. For the case of _{2} and all other parameters with influence on the observables, there is one unique solution and the solver is able to find it in all runs.

**Practical identifiability analysis for the full set θ**. Illustrative examples of the histograms of the solutions achieved with the Monte-Carlo based approach for

Practical identifiability analysis for the experimental scheme ^{REF }is the parameter mean value computed by the Monte-Carlo based approach; ^{REF }is the relative distance between the mean and the nominal computed as ^{REF }in %.

| Parameter | Nominal value | **μ**^{REF} | Relative distance ^{REF} (in %) | Expected uncertainty | Expected uncertainty (in %) |
|---|---|---|---|---|---|
| _{1} | 0.10 | 1.77 | 1680 | 1.79 | 100.7 |
| _{2} | 0.10 | 6.16 | 6060 | 3.03 | 49.1 |
| _{3a} | 4.00 × 10^{-4} | 4.00 × 10^{-5} | 3.09 | 2.80 × 10^{-5} | 6.90 |
| _{4a} | 0.50 | 0.50 | 0.60 | 0.08 | 15.9 |
| _{5} | 3.00 × 10^{-4} | 3.07 × 10^{-4} | 2.49 | 1.02 × 10^{-4} | 33.1 |
| _{1} | 2.50 × 10^{-3} | 2.45 × 10^{-3} | 2.04 | 5.34 × 10^{-4} | 21.7 |
| _{2} | 0.10 | 0.13 | 33.3 | 0.08 | 60.2 |
| _{3} | 1.50 × 10^{-3} | 1.18 × 10^{-3} | 21.1 | 8.08 × 10^{-4} | 68.3 |
| _{prod} | 2.50 × 10^{-5} | 3.25 × 10^{-5} | 29.9 | 3.19 × 10^{-5} | 98.3 |
| _{deg} | 1.25 × 10^{-4} | 1.63 × 10^{-4} | 33.4 | 1.62 × 10^{-4} | 99.9 |
| _{1} | 2.50 × 10^{-3} | 2.40 × 10^{-3} | 3.85 | 6.38 × 10^{-4} | 26.5 |
| _{2a} | 0.01 | 4.74 × 10^{-3} | 374 | 5.30 × 10^{-3} | 110.9 |
| _{1a} | 1.00 × 10^{-3} | 9.74 × 10^{-4} | 0.75 | 2.42 × 10^{-4} | 24.3 |

The results obtained explain the observation made by Lipniacki et al. (2004): the origin of the multiple equivalent solutions is poor practical identifiability, caused by the lack of influence of some parameters on the available observables.

If we compare the results with the ones obtained considering only the set **θ**

Practical identifiability analysis for the experimental scheme ^{ES1 }is the parameter mean value computed by the Monte-Carlo based approach; ^{ES1 }is the relative distance between the mean and the nominal computed as ^{ES1 }in %.

| Parameter | Nominal value | **μ**^{ES1} | Relative distance ^{ES1} (in %) | Expected uncertainty | Expected uncertainty (in %) |
|---|---|---|---|---|---|
| _{3a} | 4.00 × 10^{-4} | 4.00 × 10^{-5} | 0.02 | 2.20 × 10^{-5} | 5.40 |
| _{4a} | 0.50 | 0.50 | 0.66 | 0.046 | 9.07 |
| _{5} | 3.00 × 10^{-4} | 3.01 × 10^{-4} | 0.26 | 1.23 × 10^{-4} | 40.8 |
| _{1} | 2.50 × 10^{-3} | 2.49 × 10^{-3} | 0.46 | 5.01 × 10^{-4} | 20.1 |
| _{2} | 0.10 | 0.10 | 1.97 | 0.04 | 44.0 |
| _{3} | 1.50 × 10^{-3} | 1.49 × 10^{-3} | 0.95 | 5.00 × 10^{-4} | 33.7 |
| _{prod} | 2.50 × 10^{-5} | 2.60 × 10^{-5} | 2.90 | 1.40 × 10^{-5} | 53.7 |
| _{deg} | 1.25 × 10^{-4} | 1.29 × 10^{-4} | 3.41 | 7.80 × 10^{-5} | 60.8 |
| _{1} | 2.50 × 10^{-3} | 2.49 × 10^{-3} | 0.26 | 4.22 × 10^{-4} | 16.9 |
| _{1a} | 1.00 × 10^{-3} | 1.00 × 10^{-3} | 0.27 | 1.82 × 10^{-4} | 18.1 |

_{3a} and _{4a} can already be appropriately estimated. The **μ** value is less than a 1% relative distance from the nominal ("real") value. In addition, the expected uncertainties are less than 10%, which is in the order of the experimental error. As a consequence

Optimal experimental design

In order to improve the identifiability properties of

• Initial conditions correspond to those for wild type cells after resting.

• The TNF stimulus is activated and may be applied pulse-wise. To keep the experiments easy to implement in practice, a maximum of two pulses is allowed.

• The maximum number of sampling times will be 15 and they may be optimally located.

• The experimental noise corresponds to a maximum variance of 10%.

• The reference value for the parameters in the ℱ (Eqn. 14) corresponds to the **
μ
**

Among the ℱ-based criteria for optimal experimental design, the D- and E-optimality criteria are usually preferred. For this particular example, and considering the eccentricity values corresponding to
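To make the D- and E-optimality criteria concrete, the following Python sketch evaluates them for two candidate sampling designs. It assumes that ℱ denotes the Fisher information matrix and that the measurement noise is independent and Gaussian with constant standard deviation. The two-parameter toy model y(t) = a·exp(−b·t) is purely illustrative and is not the NF-κB model; the function names are hypothetical, and the eccentricity is taken here as the square root of the ratio between the largest and smallest eigenvalues of ℱ, which is one common definition and may differ from the one used in the procedure.

```python
import numpy as np


def sensitivities(theta, t):
    """Analytic sensitivities dy/dtheta of the toy model y = a*exp(-b*t)."""
    a, b = theta
    decay = np.exp(-b * t)
    # Columns: dy/da and dy/db; shape (n_times, n_params).
    return np.column_stack([decay, -a * t * decay])


def fisher_information(theta, t, sigma):
    """FIM for independent Gaussian noise with standard deviation sigma."""
    S = sensitivities(theta, t)
    return S.T @ S / sigma**2


def design_criteria(F):
    """Scalar measures of the information content of a design."""
    eigenvalues = np.linalg.eigvalsh(F)  # ascending order, F is symmetric
    return {
        "D": np.linalg.det(F),           # D-optimality: maximise det(F)
        "E": eigenvalues[0],             # E-optimality: maximise min eigenvalue
        # ASSUMPTION: eccentricity = sqrt(lambda_max / lambda_min).
        "eccentricity": np.sqrt(eigenvalues[-1] / eigenvalues[0]),
    }


theta_ref = (1.0, 1.0)  # hypothetical reference parameter values
sigma = 0.1             # hypothetical noise standard deviation

# Two candidate designs with 15 sampling times each.
t_early = np.linspace(0.0, 0.1, 15)   # all samples close to t = 0
t_spread = np.linspace(0.0, 2.0, 15)  # samples spread over the decay

crit_early = design_criteria(fisher_information(theta_ref, t_early, sigma))
crit_spread = design_criteria(fisher_information(theta_ref, t_spread, sigma))
print("early :", crit_early)
print("spread:", crit_spread)
```

In this toy case the spread design gives a larger determinant, a larger minimum eigenvalue and a smaller eccentricity, that is, less elongated confidence regions. In an actual optimal experimental design, the sampling times and stimulus profile would be decision variables of an optimization problem rather than two hand-picked alternatives.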

The new experiment consists of performing two pulses and 15 optimally located sampling times (see Figure **
μ
**

**Experiments performed throughout the identification procedure**.

The estimates of _{3}, _{1} and _{1a} are now satisfactory, with less than 0.5% error with respect to the nominal value and expected uncertainties of around 10%. The next step is to compute a new optimal experimental design for the remaining parameters by using **μ**

Table _{5} with a value of around 17%, which is quite reasonable. In addition, the maximum eccentricity is now 5.6, so the correlation among the parameters is substantially reduced with respect to the first experiment. Figure

Summary of the practical identifiability analysis for the successive experimental schemes: a) Predicted maximum uncertainty of the given parameter in %, b) Relative distance between the mean and the nominal value of the parameters in %.

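**a)**

| Parameter | Scheme 1 (ES1) | Scheme 2 | Scheme 3 |
|---|---|---|---|
| _{3a} | **5.40** | | |
| _{4a} | **9.07** | | |
| _{5} | 40.8 | <u>32.3</u> | |
| _{1} | 20.1 | 18.0 | **10.7** |
| _{2} | 44.0 | 14.9 | **7.85** |
| _{3} | 33.7 | **5.47** | |
| _{prod} | 53.7 | 23.8 | **13.2** |
| _{deg} | <u>60.8</u> | 26.3 | **15.6** |
| _{1} | 16.9 | **10.4** | |
| _{1a} | 18.1 | **8.94** | |

**b)**

| Parameter | Scheme 1 (ES1) | Scheme 2 | Scheme 3 |
|---|---|---|---|
| _{3a} | **0.02** | | |
| _{4a} | **0.66** | | |
| _{5} | 0.26 | 0.38 | **0.6** |
| _{1} | 0.46 | 0.19 | **0.18** |
| _{2} | 1.97 | <u>0.51</u> | **0.25** |
| _{3} | 0.95 | **0.10** | |
| _{prod} | 2.90 | 0.42 | **0.05** |
| _{deg} | <u>3.41</u> | 0.44 | **0.03** |
| _{1} | 0.26 | 0.12 | |
| _{1a} | 0.27 | 0.40 | |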

Underlined values represent the worst value for the given experimental scheme. Bold face values represent the best value achieved for each parameter at the end of the identification procedure.

Figure

**Expected uncertainties for all parameters at the end of the identification procedure**. Red line indicates the nominal value of the parameter, blue line indicates the mean value for the given experiment and yellow line indicates the estimated expected uncertainty.

**Illustrative examples of the evolution of the robust uncertainty ellipses for several pairs of parameters**. _{prod}-_{deg}: the most correlated parameters in all experimental schemes; _{1}-_{1a}: the least correlated parameters in _{1}-_{2}: the least correlated parameters in

Conclusions

It is widely recognized that parameter identification problems become harder to solve as the size of the problem grows, particularly when the ratio between the number of observables and experimental data and the number of parameters is low, since this induces multimodality and a lack of structural and/or practical identifiability.

This research describes an iterative identification procedure for non-linear dynamic biological models that is intended to improve parameter identification, i.e. to reduce the dimensionality of the problem when possible and to improve identifiability properties, and therefore to avoid premature (and probably wrong) conclusions about the explanatory and predictive capabilities of a particular model. The procedure involves the following steps: structural and practical identifiability analysis, global ranking of parameters, parameter estimation using efficient global optimization techniques and optimal experimental design.

As an illustrative example, we considered parameter estimation of the model describing the NF-

The methodology described here has been implemented in a software toolbox, AMIGO, which is available from the authors upon request.

Authors' contributions

EBC and JRB contributed to the conception and design of the work. EBC implemented the iterative identification procedure, performed the computations and drafted the manuscript. AAA and JRB gave valuable advice and helped to draft the manuscript. All authors read and approved the final manuscript.

Acknowledgements

This work was supported by the Spanish MICINN project "MultiSysBio", Ref. DPI2008-06880-C03-02.