Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia 24061, USA

Bioinformatics and Systems Biology Program, School of Bioresources and Technology, King Mongkut’s University of Technology Thonburi, Bangkok 10150, Thailand

Departments of Computer Science and Mathematics, Virginia Tech, Blacksburg, Virginia 24061, USA

Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, Virginia 24061, USA

Abstract

Background

Parameter estimation from experimental data is critical for mathematical modeling of protein regulatory networks. For realistic networks with dozens of species and reactions, parameter estimation is an especially challenging task. In this study, we present an approach for parameter estimation that is effective in fitting a model of the budding yeast cell cycle (comprising 26 nonlinear ordinary differential equations containing 126 rate constants) to the experimentally observed phenotypes (viable or inviable) of 119 genetic strains carrying mutations of cell cycle genes.

Results

Starting from an initial guess of the parameter values, which correctly captures the phenotypes of only 72 genetic strains, our parameter estimation algorithm quickly improves the success rate of the model to 105–111 of the 119 strains. This success rate is comparable to the best values achieved by a skilled modeler manually choosing parameters over many weeks. The algorithm combines two search and optimization strategies. First, we use Latin hypercube sampling to explore a region surrounding the initial guess. From these samples, we choose ∼20 different sets of parameter values that correctly capture wild type viability. These sets form the starting generation of differential evolution that selects new parameter values that perform better in terms of their success rate in capturing phenotypes. In addition to producing highly successful combinations of parameter values, we analyze the results to determine the parameters that are most critical for matching experimental outcomes and the most competitive strains whose correct outcome with a given parameter vector forces numerous other strains to have incorrect outcomes. These “most critical parameters” and “most competitive strains” provide biological insights into the model. Conversely, the “least critical parameters” and “least competitive strains” suggest ways to reduce the computational complexity of the optimization.

Conclusions

Our approach proves to be a useful tool to help systems biologists fit complex dynamical models to large experimental datasets. In the process of fitting the model to the data, the tool identifies suggestive correlations among aspects of the model and the data.

Background

The challenges facing molecular systems biologists include the development of accurate mathematical models of complex biological processes.

Our focus in this study is parameter estimation of a nonlinear and high-dimensional ODE model (>100 model parameters) that is constrained by a large number of dissimilar experimental observations. The non-differentiable nature of our objective function (described in the next section) led to our choice of a stochastic global optimization approach.

Parameter estimation is not only about finding an “optimal” set of parameter values for fitting a collection of experimental observations. During the course of the global optimization procedure, we expect to find many different parameter vectors that do equally well (or nearly as well) as the best one. Working with this sample of “quite good” sets of parameter values, we can quantify how well the experimental data constrain individual parameter values. We can distinguish critical parameters (highly constrained by the data) from irrelevant parameters (those that have little bearing on optimization of the objective function).

Our research group has been interested for many years in the molecular mechanisms controlling the cell division cycle of budding yeast. The main events of the cell cycle (DNA synthesis and mitosis) are controlled in budding yeast, and indeed in all eukaryotic cells, by a family of protein kinases called cyclin-dependent kinases (CDKs).

In this study, we present an optimization procedure to maximize the number of strains for which the model correctly captures viability or inviability (Figure

**Schematic of the parameter optimization and model reduction approach.**

Methods

Problem formulation

In this paper, we focus on biochemical reaction networks modeled by nonlinear ODEs. Typical models of these networks that are considered high dimensional, at the present time, consist of 10–100 ODEs defined in terms of ∼100 (or more) rate constants and other numerical parameters. The models are developed in light of, and the parameters are constrained on the basis of, large collections of experimental data, which characterize the behavior of cells under a wide variety of experimental conditions. The data are rarely replicate measurements of time courses of biochemical variables, the sort of ideal data assumed in many optimization methods. Rather, the data are often a disparate collection of quantitative measurements and qualitative observations on a number of different mutant strains under a wide variety of conditions. In this context, a data-fitting algorithm must be able to search a high-dimensional parameter space for parameter vectors that are consistent with as much of the data as possible. In our case, we characterize the data as a set of discrete constraints. For a given parameter vector, the model either satisfies the i-th constraint (O_i = 1) or not (O_i = 0), and the total objective function that we seek to maximize is the number of satisfied constraints, Φ = Σ_i O_i.
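In code, this objective is simply a hit counter over the constraint set. The following Python sketch is illustrative only; `simulate_strain` is a hypothetical stand-in for the ODE simulation of one genetic strain (the authors' actual implementation uses Matlab with a C integrator):

```python
def objective(x, strains, simulate_strain):
    """Count the phenotypes correctly captured by the model with
    parameter vector x (the quantity maximized in the text).

    `strains` is a list of dicts with an observed "phenotype" entry
    ("viable" or "inviable"); `simulate_strain(x, strain)` returns the
    simulated outcome for one strain (hypothetical interface).
    """
    hits = 0
    for strain in strains:
        if simulate_strain(x, strain) == strain["phenotype"]:
            hits += 1  # O_i = 1 for this constraint
    return hits        # Phi = sum_i O_i
```

With the 119 strains of this study, the best score reported below is 111.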

Using this collection of optimal (or near optimal) parameter vectors, our second aim is to characterize the roles of specific parameters and specific experiments in the data-fitting exercise. Looking at the sensitivity of experimental constraints with respect to parameter variations, we distinguish “critical” parameters, which have strong effects on the total objective function, from “dispensable” parameters, which have little or no effect on the total objective function. We also distinguish “fragile” phenotypes, which are most often broken (i.e., incorrectly simulated) under parameter variations, from “robust” phenotypes, which are correctly simulated even when parameter values are widely perturbed. These distinctions provide insights into the relationships between the model and the data, and they also allow us to reduce the complexity of the model (by eliminating dispensable parts) and the computational demands of the algorithm. Finally, we look at competition (negative correlations) between experimental constraints (phenotypes). If two phenotypes compete with each other, then it is difficult for the model to account simultaneously for both. The list of most competitive phenotypes suggests places where the structure of the model may be incorrect or the experimental observations may be suspect.

A mathematical model of the budding yeast cell cycle

The cell cycle is the sequence of events by which a growing cell replicates all its components and divides them more-or-less equally between two daughter cells, so that the daughters inherit all the machinery and information necessary to repeat the process.

A mathematical model of this reaction network was developed by Chen et al.

We provide descriptions of the 126 model parameters and 26 model variables (Additional file

**Supplementary Tables.** This pdf file includes the supplementary tables referred to in the main text.

**viarray.txt, integrate.cpp, runOPTIMAL.m, runTL.m, integrate.mexw64, OPTIMAL.txt and TLset.txt.** “viarray.txt” holds the viability array of the 119 strains (array values of 1 for viable strains and 2 for inviable strains), “integrate.cpp” is the C subroutine for the ODEs, and “integrate.mexw64” is the MEX file that allows Matlab to use the C subroutine for solving the ODEs. Altogether, these files reproduce the number of hits with the parameter values from the best performing DE run (OPTIMAL.txt from Additional file

The parameter estimation algorithm

We start our search of parameter space from a point supplied by the modeler (initial guess). We assume that the starting point is a reasonable (but not particularly good) estimate of the parameters. That is, the starting parameter values are consistent with some but not all experimental constraints, and we expect that a much better parameter vector is in the neighborhood. In our case, the initial guess is consistent with 60% of the mutant phenotypes, and we plan to search in a hypercube (e.g. ±40% or ±90%) around the starting point. First, we explore this domain by Latin Hypercube (LH) sampling, as described in detail in the Additional file.

**Supplementary Text.** This pdf file includes a detailed description of Latin Hypercube sampling.

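The LH sampling step can be sketched as follows (Python/NumPy; an illustrative implementation, not the authors' code, using the ±40% hypercube as an example and assuming positive parameter values):

```python
import numpy as np

def latin_hypercube(center, n_samples, spread=0.40, rng=None):
    """Latin hypercube sample in the hypercube center*(1 +/- spread).

    Each parameter's range is split into n_samples equal subintervals;
    one point is drawn from every subinterval, and the subintervals are
    shuffled independently per parameter, so each subinterval of each
    parameter is used exactly once. Assumes positive parameter values.
    """
    rng = np.random.default_rng(rng)
    center = np.asarray(center, dtype=float)
    lo, hi = center * (1.0 - spread), center * (1.0 + spread)
    dim = center.size
    # one jittered point per subinterval, in [0, 1)
    u = (np.arange(n_samples)[:, None] + rng.random((n_samples, dim))) / n_samples
    # shuffle the strata independently for each parameter
    for j in range(dim):
        u[:, j] = u[rng.permutation(n_samples), j]
    return lo + u * (hi - lo)
```

Samples for which the simulated wild-type cell is inviable in glucose would then be discarded, and the surviving vectors seed the DE population.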
For the second phase of the search, we use differential evolution (DE) to improve the performance of the LH-derived population of parameter vectors.

To be precise, let **x** be a vector of parameter values, with components x_i. **x** includes both the 126 kinetic constants in the model and the 26 ODE initial conditions described above; hence, **x** has 152 components. Φ(**x**) is an integer-valued function that counts the number of phenotypes that are correctly captured by the model given the parameter values in the vector **x**. (Notice that we sometimes refer to a particular parameter vector as a “set of parameter values”.)

During DE, parameter vectors are propagated from generation to generation by processes of diversification and selection. Each generation (indexed by g) consists of a population of parent vectors **x**_j(g), j = 1, …, N_p, with components x_{i,j}(g), the value of the i-th parameter in the j-th parent in the g-th generation. Let **u**_j(g) be the trial vector generated from the j-th parent in the g-th generation. The components of this vector, u_{i,j}(g), are derived from the parent **x**_j(g) by the mutation and crossover rules below; selection then determines whether the parent **x**_j(g) or the trial vector **u**_j(g) survives.

The specific rules are:

1. Mutation: First, we create a “mutant” vector **v**_j(g) by adding a difference vector **d**_j(g) to the parent vector **x**_j(g):

**v**_j(g) = **x**_j(g) + **d**_j(g).

By analogy to biological evolution, we might let the components of **d**_j(g) be small, independent random perturbations. Instead, DE constructs the difference vector from the population itself, using two distinct parents j′ and j″ chosen at random from the current generation:

**d**_j(g) = F · (**x**_{j′}(g) − **x**_{j″}(g)),

where 0 < F ≤ 1 is a scaling factor.

2. Crossover: Next we allow for crossover between the parental parameter vector **x**_j(g) and the mutant vector **v**_j(g) to produce the trial vector **u**_j(g), component by component:

u_{i,j}(g) = v_{i,j}(g) if rand(0,1) < CR, and u_{i,j}(g) = x_{i,j}(g) otherwise,

where rand(0,1) is a random number chosen uniformly from the interval [0,1], drawn anew for each component, and CR is a fixed crossover probability that we choose.

3. Selection: The objective function determines whether the parent **x**_j(g) or the trial vector **u**_j(g) survives into generation g + 1. With the “greedy” version, the trial vector replaces its parent only if it strictly increases the objective function: Φ(**u**_j(g)) > Φ(**x**_j(g)).

With the “non-greedy” version, the selection condition is Φ(**u**_j(g)) ≥ Φ(**x**_j(g)), so a trial vector that scores at least as well as its parent replaces it, and the population can continue to drift across plateaus of the objective function.

In a few hundred generations, DE produces an elite set of parameter vectors that reproduce the behavior of nearly all the experimental constraints despite the suboptimal performance of the starting point of the optimization.
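The three DE rules above can be sketched compactly. The following Python/NumPy sketch is illustrative only (the authors' implementation is in Matlab/C); F = 0.5 and CR = 0.5 are placeholder settings, not values reported in the paper:

```python
import numpy as np

def differential_evolution(pop, objective, n_gen, F=0.5, CR=0.5,
                           greedy=False, rng=None):
    """Sketch of the DE loop described above: parent-based mutation,
    component-wise crossover, and greedy (>) or non-greedy (>=)
    selection. `pop` has shape (n_parents, n_params); `objective` maps
    a parameter vector to an integer number of captured phenotypes.
    """
    rng = np.random.default_rng(rng)
    pop = np.array(pop, dtype=float)
    n, dim = pop.shape
    scores = np.array([objective(x) for x in pop])
    for _ in range(n_gen):
        for j in range(n):
            # mutation: add a scaled difference of two other parents
            j1, j2 = rng.choice([k for k in range(n) if k != j],
                                size=2, replace=False)
            v = pop[j] + F * (pop[j1] - pop[j2])
            # crossover: take each component from the mutant with prob. CR
            mask = rng.random(dim) < CR
            u = np.where(mask, v, pop[j])
            # selection (for simplicity, no guarantee that u != parent)
            s = objective(u)
            if (s > scores[j]) if greedy else (s >= scores[j]):
                pop[j], scores[j] = u, s
    return pop, scores
```

Because a member is replaced only when its trial vector scores at least as well, each member's score never decreases from one generation to the next.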

All computations were performed in the Advanced Research Computing lab at Virginia Tech. The computational time was ∼4 minutes for a single generation of DE (19 parameter vectors and 119 simulations per vector) and ∼20 minutes for 100 LH samples (12 seconds per sample). Computation time could be significantly reduced by parallel computing, e.g., 500 generations of DE, which took ∼33 hours in our code, could be completed in ∼1 hour by using 33 processors in parallel. Such a reduction may be important in the future when we impose additional constraints on the model.

In concluding this section we note that, in addition to varying the values of

Results and discussion

Rapid evolution to high-scoring parameter vectors

We performed LH sampling around the starting point (initial guess) in a hypercube formed by ±40% perturbations on each parameter value. To create 100 sample points inside this hypercube, each parameter range is divided into 100 subintervals (see Additional file). The 19 feasible samples, i.e., those consistent with wild-type viability in glucose, were taken as the initial DE population **x**_j(0), j = 1, …, 19.

**First step of the optimization: Latin Hypercube sampling.** Feasible and infeasible parameter vectors during LH sampling (green and black dots), and the starting point of the parameter search (red dot). This is a projection of the whole parameter space onto the axes of two model parameters (basal Pds1 synthesis and basal Cdc15 inactivation rates).

**Second step of the optimization: differential evolution.** Starting population (green dots, from Figure

**Evolution of the objective function during optimization.** Increase in the value of the objective function (number of mutants captured) during DE runs with a population size of 19 (the maximum possible number of hits is Φ_max = 119). Green and blue lines: number of mutants captured in two independent DE runs with the same initial population of 19 parameter vectors, but different random number sequences in the mutation and crossover operations.

Varying the settings of the optimization procedure

The initial phase of LH sampling can be quite variable in its outcome. For example, when we resampled the ±40% hypercube around the initial guess with 50 LH samples, we found

**Effects of population size and criterion used in LH sampling on the optimization.** Increase in the number of mutants captured during independent DE runs for different population sizes (the maximum possible number of hits is Φ_max = 119). Red line: WT viability in glucose is not enforced during the LH sampling phase.

We also investigated how the size of the LH affects the performance of DE, by starting with hypercubes generated by ±20%, ±40% and ±90% perturbations around the initial guess. In each case, we started the DE with a population of 19 parameter vectors generated by LH sampling without enforcing the viability of wild type cells in glucose. As illustrated in Figure

**Effects of hypercube size used in LH sampling on the optimization.** Increase in the number of mutants captured with initial populations generated by ±20% (red lines), ±40% (green lines) and ±90% (blue lines) perturbations around the starting parameter vector. Once again, the independent runs for each perturbation setting have identical initial populations, but different random number sequences used in the mutation and crossover operations in DE.

**Supplementary Figures.** This pdf file includes two additional figures.

Another variation in the algorithm is the criterion we use for deciding when a trial parameter vector can replace a parent. As shown in Figure

**Effects of selection strategy on the optimization.** Performance comparison of greedy and non-greedy selection rules. Solid lines: non-greedy selection. Dashed lines: greedy selection. All six runs start with the same initial population but with different random number sequences in mutation and crossover operations.

We also investigated the effect of the starting point on the performance of our optimization procedure. Starting from the initial guess, we ran DE for 200 generations without enforcing any improvement (random walks) and randomly picked a parameter vector from the last generation as a potential starting point. Repeating this process two more times gave us three new starting points for optimization with 54, 69 and 57 hits. These new starting points differed from the initial guess by ∼25% across all parameter values. Starting from these three points, we used both 40% and 90% LH sampling, followed by DE with non-greedy selection. The success rates of these runs (Table

Optimization results with different initial search points used in LH sampling.

| Initial search point indicator | Initial # hits | Final # hits (40% LH) | Final # hits (90% LH) |
| --- | --- | --- | --- |
| 0 | 72 | 105–108 (3 runs, Figure ) | 104–109 (3 runs, Figure ) |
| 1 | 69 | 107 | 103 |
| 2 | 57 | 104 | 107 |
| 3 | 54 | 101 | 106 |

A further variation of our parameter estimation study involved stopping DE at a particular generation, grabbing the best population member, resampling around it with the LH approach and continuing DE. Here, we focused on the ten worst performing runs in Additional file

Robustness of the model

As an indicator of the model’s robustness with respect to the phenotype of a particular yeast strain we introduce the “acceptance ratio” for an experimental constraint, which is simply the fraction of sample parameter vectors that are consistent with an observed phenotype.

By comparison, if we optimize overall success rate, then we find that WT viability is an extremely robust property of the model. For example, we maximize the total number of hits on a population of 19 parameter vectors over 1000 generations, without enforcing WT viability in the LH and DE stages. In the collection of 19,000 samples (trial parameter vectors, some of which did not replace the parents), the acceptance ratio for “WT viability in glucose” was 0.9964.
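Computing acceptance ratios from a sample is a column average. A sketch, assuming a 0/1 acceptance matrix with one row per sampled parameter vector and one column per phenotype:

```python
import numpy as np

def acceptance_ratios(acceptance):
    """Acceptance ratio of each phenotype: the fraction of sampled
    parameter vectors consistent with that phenotype.

    `acceptance[i, j]` = 1 if sample i captures phenotype j, else 0
    (one row per sampled parameter vector, one column per phenotype).
    """
    return np.asarray(acceptance, dtype=float).mean(axis=0)
```

For example, the 0.9964 acceptance ratio for “WT viability in glucose” quoted above is the mean of that phenotype's column over the 19,000 DE samples.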

For these three sample sets (19,000 DE samples, 19,000 LH samples and 100 LH samples) we computed the acceptance ratios of all 119 experimental strains and sorted them in ascending order, as shown in Figure

**Overall acceptance ratios of phenotypes with DE and LH sampling.**

Competition between the experimental constraints

Given the high dimensionality of the model, one may think that it is relatively easy to capture the biological behavior of the majority of the mutants.

where a_{i,j} = 1 if parameter vector i captures phenotype j, and a_{i,j} = 0 otherwise. Then, we compute the correlation matrix

ρ_{k,l} = C_{k,l} / (C_{k,k} C_{l,l})^{1/2},

where C_{k,l} is the covariance of the acceptance values of phenotypes k and l over the sample of parameter vectors.

Here, ρ_{k,l} quantifies the correlation between the k-th and l-th phenotypes. For each phenotype, we compute an overall correlation value that measures the competition of the k-th phenotype during the optimization against the remaining phenotypes.

Next, we identify the strongly anticorrelated mutants and explain how they influence the search for model parameters while we maximize the fraction of the phenotypes that are captured. Our focus is on the DE run that resulted in 111 phenotypes being captured (the blue line in Figure

Among the 3415 top-performing parameter vectors in the DE run with 111 hits, there is a common set of eight phenotypes that are not captured. Seven of these missed phenotypes (#46, 48, 55, 66, 67, 74, and 117 listed in Additional file

The eighth missed phenotype (# 12) is among the most competitive phenotypes in the 19,000 DE samples. On the other hand, within the 100 LH samples, strain 12 is non-competitive, as illustrated in Additional file. Additional file

Some of the eight non-captured phenotypes considered here are also among the “most troublesome” strains identified by us during trial-and-error parameter estimation. For example, it is difficult to explain the inviability of strains 66 (

Some of the other non-captured phenotypes point out limitations of our model and/or objective function. Strain 74 (

Using phenotype competitiveness to accelerate the evolutionary algorithm

By ignoring non-competitive phenotypes we can reduce the number of phenotypes that need to be simulated during optimization. To illustrate, we identified the 50 least competitive phenotypes from the

Optimization results (four individual runs with groups 1–5) with reduced sets of phenotypes.

| Group | # phenotypes in selection | # phenotypes captured |
| --- | --- | --- |
| 0 | 119 | 105–111 |
| 1 | 109 | 106–108 |
| 2 | 99 | 104–109 |
| 3 | 89 | 102–109 |
| 4 | 79 | 66–105 |
| 5 | 69 | 58–67 |

In five individual non-greedy DE runs (which produced 105–111 hits out of 119), we found that all five captured a common set of 95 of the 119 phenotypes. Of the 24 phenotypes that were missed by at least one of the five DE runs, none were among the 30 least competitive phenotypes identified by the 100 LH samples. Furthermore, three phenotypes (# 67, 74, and 117 listed in Additional file

Order of events

During DE we have applied only the most basic phenotypic constraint on the simulated strains, namely viability or inviability. In our simulations, viability means that cells divide periodically and that cell mass at division converges to a specific value (±5%). Inviability means that the cell mass exceeds 25, which only happens when the cell becomes arrested in the cell cycle, never dividing but continuing to grow. (Other behavior, such as double-period oscillations, was considered neither inviable nor viable).
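These viability criteria can be encoded as a small classifier. A hedged sketch with hypothetical inputs (the cell masses at successive divisions and the final simulated mass); the ±5% convergence tolerance and the mass cap of 25 are taken from the text:

```python
def classify_strain(division_masses, final_mass, mass_cap=25.0, tol=0.05):
    """Classify one simulated strain by the criteria in the text:
    viable if cell mass at successive divisions converges (within 5%),
    inviable if the cell grows past `mass_cap` without dividing
    (cell-cycle arrest), and "other" for behavior such as
    double-period oscillations.
    """
    if final_mass > mass_cap:
        return "inviable"      # arrested: growth without division
    if len(division_masses) >= 2:
        prev, last = division_masses[-2], division_masses[-1]
        if abs(last - prev) <= tol * last:
            return "viable"    # division mass has converged (+/- 5%)
    return "other"             # neither viable nor inviable
```

A phenotype is then a hit when this label matches the experimental observation for the strain.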

Additional constraints could be introduced. For example, for a cell to be viable, not only must it divide periodically at a characteristic size, but also it should execute cell cycle events (origin relicensing, origin activation, spindle alignment, Esp1 activation and cell division) in the correct order. And inviable cells should be checked to see that they have arrested in the observed phase of the cell cycle. In addition, we could check other commonly measured cell cycle properties of mutants: for example, cell size at division, and duration of the unbudded phase of the cell cycle. Although we intend to examine these additional constraints in later studies, in this study we have checked the order of events in all 119 strains produced by the optimum parameter vectors. These parameter vectors reproduced the five events for all experimentally viable mutants in the correct order with the following exceptions:

1. In three viable strains that had no copy of PDS1 (

2. In simulations of three other “viable strains” (

As a result, even with the event-order constraints, our number of hits is only reduced from 111 to 110. Hence, it appears that by selecting according to our simple definition of viability/inviability, the model (in the majority of cases) automatically reproduces the correct sequence of events in mutant strains. This property of the model is an indication that it is correctly representing the sequence of dependencies in the molecular mechanism underlying cell cycle progression in budding yeast.

Sensitivity analysis of the model

Sensitivity analysis is widely applied to study biological systems in order to quantify the robustness of biological behavior to changes in model parameters, to determine the most sensitive model parameters and experimental constraints, and to guide further experimental work and model refinement.

Fragile and robust phenotypes

Our objective function is the number of phenotypes successfully simulated by the model for a particular set of parameter values. In this section, we identify the effects of single parameter perturbations on this function in order to identify those parameters to which the model is most sensitive. When perturbed, these “critical” parameters cause the loss of already captured phenotypes more frequently than non-critical parameters. In addition, because the objective function encompasses all experimental constraints, we can look for links between individual parameters and individual genetic strains. For large and complex networks, such as the budding yeast cell cycle, the identification of such input-output relationships can be challenging and counterintuitive.

Our approach to sensitivity analysis is to produce a large sample of perturbations away from the best parameter vectors identified by DE. We then ask of this sample which parameters—when perturbed—cause the most drastic loss of correctly simulated phenotypes; these are the critical parameters. The phenotypes most often incorrectly simulated are the fragile phenotypes. To generate the sample of parameter vector perturbations, we must first choose a representative collection of parameter vectors that are most successful in capturing phenotypes. The most successful DE run (non-greedy Run 1, in Figure

Recall that each parameter vector consists of 126 kinetic constants (Additional file

Starting from each of the 15 successful parameter vectors chosen from the DE run, we introduced one of the parameter perturbations to create a “new” parameter vector. We then simulated WT cells in glucose (all our parameter vectors at this point define models that reproduce WT viability). Next, we simulated each of the 118 other strains by adjusting the parameter values in the “new” parameter vector according to the rules that mimicked these strains. In total, from the 15 initial parameter vectors we created 20115 new parameter vectors (15 sets × 9 perturbation levels × 149 varying model parameters) and ran 119 simulations per vector. Out of these 2,393,685 simulations, we discarded the ones where setting a parameter to zero was not sensible, leaving us with 2,393,461 to assess model robustness. The total number of hits among the 20115 parameter vectors generated after perturbations in individual parameters ranged from 54 to 111. Although some of the eight phenotypes that were missed by optimization were captured by some of these perturbations, the overall number of hits never exceeded 111. These eight phenotypes (# 12, 46, 48, 55, 66, 67, 74, and 117 listed in Additional file
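The bookkeeping for this perturbation analysis can be sketched as follows (Python; `captures` is a hypothetical stand-in for running the 119 strain simulations, and the two perturbation factors in the test below are illustrative, not the paper's nine levels):

```python
import numpy as np
from collections import Counter

def perturbation_losses(base_vectors, factors, captures):
    """Tally losses of captured phenotypes under single-parameter
    perturbations. `captures(x)` returns the set of phenotype indices
    captured with parameter vector x. Each parameter of each base
    vector is scaled by each factor in turn, and we count, per
    parameter and per phenotype, how often a previously captured
    phenotype is lost.
    """
    by_param, by_phenotype = Counter(), Counter()
    for x in base_vectors:
        baseline = captures(x)
        for i in range(len(x)):
            for f in factors:
                y = np.array(x, dtype=float)
                y[i] *= f                      # perturb one parameter
                lost = baseline - captures(y)  # phenotypes broken
                by_param[i] += len(lost)       # "critical" parameters
                for p in lost:
                    by_phenotype[p] += 1       # "fragile" phenotypes
    return by_param, by_phenotype
```

Ranking `by_param` in descending order gives the most critical parameters; ranking `by_phenotype` gives the most fragile phenotypes.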

We computed the number of times each of the 111 captured phenotypes was lost after a parameter perturbation, and then we ranked the 111 phenotypes according to their frequency of loss (“fragility”). The 20 most fragile phenotypes (18% of the 111 phenotypes) accounted for 46% of the total number of losses, and each contributed at least 1.5% to this total. Only four of these phenotypes are experimentally inviable, which indicates that viability is more vulnerable to parametric perturbations. On the other hand, out of the 33 single-mutation phenotypes captured before perturbations, only four of them are among the most fragile 20 phenotypes. This prediction aligns with the increasing fragility observed in biological systems with increasing number of structural changes (mutations).

These are the phenotypes that are most often lost (i.e., incorrectly simulated) when perturbations are applied to individual model parameters. Fragility decreases from top to bottom.

| Strain name | Phenotype | Percent of total losses |
| --- | --- | --- |
|  | Viable | 3.38 |
|  | Viable | 3.19 |
|  | Viable | 3.16 |
|  | Viable | 2.85 |
|  | Inviable | 2.84 |
|  | Viable | 2.74 |
|  | Inviable | 2.70 |
|  | Viable | 2.40 |
|  | Viable | 2.39 |
|  | Viable | 2.35 |
|  | Viable | 2.31 |
|  | Viable | 2.24 |
|  | Viable | 2.03 |
|  | Viable | 1.93 |
|  | Viable | 1.84 |
|  | Inviable | 1.72 |
|  | Viable | 1.69 |
|  | Inviable | 1.67 |
|  | Viable | 1.54 |
|  | Viable | 1.53 |

The 33 most robust phenotypes are all inviable. The first viable phenotype that is also robust is ranked 34th on the list, and there are nine other robust viable phenotypes among the top 45 robust phenotypes (see Table

These are the viable phenotypes that are least often lost (i.e., incorrectly simulated) when perturbations are applied to individual model parameters. Robustness decreases from top to bottom.

| Rank | Strain name | Phenotype | Percent of total losses |
| --- | --- | --- | --- |
| 34 |  | Viable | 0.51 |
| 36 |  | Viable | 0.53 |
| 37 |  | Viable | 0.53 |
| 39 | Multicopy | Viable | 0.57 |
| 40 |  | Viable | 0.60 |
| 41 | WT in glucose | Viable | 0.61 |
| 42 | WT in galactose | Viable | 0.62 |
| 43 |  | Viable | 0.62 |
| 44 |  | Viable | 0.62 |
| 45 |  | Viable | 0.62 |

Critical and dispensable parameters

We used the same data set to analyze model robustness with respect to parameters. For each model parameter, we counted the number of times a perturbation caused the loss of a phenotype, and ranked parameters according to their ability to affect our objective function (the total number of captured phenotypes). “Critical” parameters are parameters that when perturbed cause frequent losses of phenotypes. On the other hand, parameters that can be perturbed with little or no change in our objective function are clearly “dispensable” parameters to the optimization process considered here. The 20 most critical parameters, which account for 44% of the total losses, are listed in Table

Most critical model parameters that had the largest effects on the objective function upon perturbations. Criticality decreases from top to bottom.

| Parameter name | Percent of losses |
| --- | --- |
| Total amount of Cdc14 | 3.14 |
| SPN synthesis rate | 2.81 |
| Total amount of Esp1 | 2.60 |
| Total amount of Net1 | 2.57 |
| Degradation rate of Cdc20 | 2.53 |
| PPX inactivation by Esp1 | 2.51 |
| Efficiency of Cdc14-Net1 complex (RENT) formation | 2.50 |
| Time scale for protein activation | 2.41 |
| Net1 phosphorylation by Clb2 | 2.30 |
| Total amount of Mcm1 | 2.25 |
| Transcriptional activation of | 2.17 |
| Transcriptional activation of | 2.06 |
| Sigmoidicity of protein activation | 1.99 |
| Degradation rate of Swi5 | 1.99 |
| CKI phosphorylation rate | 1.96 |
| Cdh1 inactivation rate | 1.83 |
| Total amount of SBF | 1.79 |
| Clb2 degradation by active Cdc20 | 1.76 |
| Polo activation by Clb2 | 1.67 |
| Synthesis rate of Bck2 | 1.66 |

Twelve of the 20 most critical parameters are involved in the EXIT module. Whereas EXIT module parameters are critical in terms of capturing phenotypes, viable strains with mutations in EXIT module genes are highly robust to perturbations when the effects are summed over all parameters in the model. These contrasting results underscore the difference between critical parameters and robust strains. The evaluation of model parameters and phenotypes is performed by looking into different sets of outputs. A robust strain (which is insensitive to changes in a majority of the parameters) is identified by taking perturbations in all parameters into account. On the other hand, a critical parameter (causing the loss of more phenotypes than other parameters once it is perturbed) is identified by taking all phenotypes into account.

We also identified the 70 least critical model parameters, i.e., those parameters with little or no effect on the objective function. To determine if some of these parameters are dispensable as far as our optimization problem is concerned, we constructed a series of “reduced models” by setting more and more of these least-critical parameters to zero. The results, presented in Table

Optimization results with the full model and the reduced models (two individual runs for each reduced model). The least sensitive 50 parameters are set to zero in Reduced Model 1, whereas the least sensitive 60 and 70 parameters are set to zero in Reduced Models 2 and 3, respectively.

| Reduced model # | # Model parameters | # Hits |
| --- | --- | --- |
| 0 | 152 (full model) | 105–111 |
| 1 | 102 | 107–107 |
| 2 | 92 | 93–97 |
| 3 | 82 | 94–96 |

In an independent repeat of this sensitivity analysis, 49 of the top 50 least critical parameters were the same, and in another repeat after an independent DE run that produced 1803 parameter vectors with 110 hits, 44 of the 50 were the same. It is noteworthy that 25 initial conditions (all except the initial mass) and 6 BUD-related parameters always formed the least critical 31 parameters. In hindsight, this is not surprising, because the initial conditions are used only to start up the simulation of a WT cell in glucose. If the WT cell is viable, we replace the “initial” ICs by the values of all variables in a newborn WT cell for all further strain simulations. So the “initial” ICs have no bearing on the calculation of the objective function. Initial mass plays a different role, because if it is too large or too small, then the WT cell may not be correctly simulated. Regarding BUD-related parameters, the BUD variable is part of the model in order to time the appearance of the bud in relation to the onset of DNA synthesis (the ORI variable), but the BUD variable has no effect on further progress through the cell cycle. Hence, the BUD-related parameters have no effect on the viability or inviability of simulated strains. In future versions of the model, where the timing of budding events will enter into the objective function, the BUD-related parameters will no longer be dispensable. Of the remaining 50 least critical parameters (Additional file

Strongly connected phenotype-parameter pairs

We also computed the number of times a model parameter, upon perturbation, caused the loss of a specific captured phenotype. The ten most strongly connected mutant-parameter pairs are listed in Table

| Phenotype | Perturbed model parameter | Probability of phenotype loss |
|---|---|---|
|  | Total amount of Cdc14 | 1.00 |
|  | Total amount of Net1 | 0.99 |
|  | Basal SBF dephosphorylation | 0.93 |
|  | SBF-dependent Cln2 synthesis | 0.90 |
|  | Total amount of Esp1 | 0.89 |
|  | CKI phosphorylation rate | 0.88 |
|  | Efficiency of Cdc14–Net1 complex (RENT) formation | 0.88 |
|  | PPX inactivation by Esp1 | 0.87 |
|  | Degradation rate of Cdc20 | 0.87 |
|  | Total amount of Net1 | 0.87 |

Upon a perturbation, with at least 0.87 probability, these parameters caused the loss of the corresponding phenotype (all phenotypes are experimentally viable).
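The probability-of-phenotype-loss statistic above can be estimated by Monte Carlo perturbation of one parameter at a time. A hedged sketch: `simulate_viable` is a toy stand-in for the 26-ODE cell-cycle simulation, and the fold-change range is illustrative, not the paper's protocol.

```python
import random

def simulate_viable(params, strain):
    # Toy stand-in for the ODE model: the strain is "viable"
    # while parameter 0 stays inside a viability window.
    return 0.5 < params[0] < 1.5

def phenotype_loss_probability(params, strain, index, n_trials=1000, seed=0):
    """Fraction of random perturbations of params[index] that lose viability."""
    rng = random.Random(seed)
    lost = 0
    for _ in range(n_trials):
        perturbed = list(params)
        # multiply one rate constant by a random factor in [1/4, 4]
        perturbed[index] *= 4 ** rng.uniform(-1.0, 1.0)
        if not simulate_viable(perturbed, strain):
            lost += 1
    return lost / n_trials

prob = phenotype_loss_probability([1.0, 2.0], "wild type", index=0)
```

With the toy viability window, roughly 60% of fold-change perturbations push the parameter out of range, so `prob` lands near 0.6.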

Novel phenotypes predicted by the elimination of phosphorylation/dephosphorylation reactions

Notably, only two phosphorylation/dephosphorylation rates appear among the 20 most critical parameters in Table

| Eliminated reaction | Impacted single-mutation strains that are viable (inviable) before (after) perturbation |
|---|---|
| Whi5 phosphorylation by Bck2 |  |
| CKI phosphorylation by Cln2 |  |
| CKI phosphorylation by Clb2 |  |
| CKI dephosphorylation by Cdc14 |  |
| Whi5 phosphorylation by Cln3 |  |
| SBF phosphorylation by Clb2 |  |
| Whi5 phosphorylation by Cln2 |  |
| Whi5 dephosphorylation by Cdc14 |  |
| Net1 dephosphorylation by PPX |  |

Upon setting phosphorylation/dephosphorylation rate constants to zero (left column), viability is lost in several single-mutation strains (right column).

Multicopy

The three genetic strains that are most commonly affected by elimination of these nine specific phosphorylation/dephosphorylation reactions are

Conclusions

The physiological characteristics of a living cell—for example, how it progresses through the cell division cycle, how it responds to external stimuli, or how it develops within a specialized tissue—depend ultimately on the dynamical properties of macromolecular regulatory networks. The dynamics of these networks can be described accurately by systems of differential equations (in a deterministic setting) or sets of reaction probabilities (in a stochastic setting). In principle, these systems of equations can be simulated numerically and the results compared to the behavior of living cells under a variety of experimental conditions. Unfortunately, this vision of a grand theory of molecular cell biology is subverted by the "curse of parameter space". Any realistic model of a functional cellular control system will contain dozens of interacting genes, proteins and metabolites and many dozens of rate constants (or reaction probabilities), which are generally unknown. A major part of the challenge of model building in molecular systems biology is to estimate the system parameters from the available experimental data and, in the process, to assess how well the data constrain the model and how well the parameterized model accounts for the data and makes reliable predictions of future experiments. It is in this context that systems biologists need practical approaches to parameter estimation. Brute-force exploration of parameter space is not an option: for a model with 100 parameters, merely evaluating parameter vectors at each corner of a hypercube bounding a feasible region of parameter space would take 2^100 ≈ 10^30 evaluations, a number well beyond foreseeable computational power. In light of this fact, any optimization procedure must propose ways to trade off wider exploration of parameter space for denser sampling of more promising regions found by previous sampling.
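The corner-counting arithmetic behind that estimate is immediate: each of the 100 parameters takes one of two extreme values, so the bounding hypercube has 2^100 corners.

```python
# Each parameter at its low or high extreme -> 2**100 hypercube corners.
n_params = 100
corners = 2 ** n_params
print(f"{corners:.3e}")  # 1.268e+30 corner evaluations
```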

Furthermore, the optimization problem is itself a moving target. New experimental observations are continuously being reported, forcing the model to adapt and change. There is no point in employing a heavy-duty optimization procedure that provides a single "optimal" solution (at great expense) for a problem that may be outdated tomorrow. What modelers need are computationally light optimization approaches that can help them make quick and flexible progress. The approach we are proposing, based on Latin hypercube sampling and differential evolution, appears well suited to the task. In addition, our approach can be of great use to modelers by highlighting parts of the model where the structure may be insufficient (or overly complex) to explain the observed data.
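The two-stage strategy described in the Abstract—Latin hypercube sampling to seed a starting population, then differential evolution's mutate/crossover/select loop—can be sketched as follows. The toy objective stands in for the model's strain-fitting score, and all names, bounds, and DE settings here are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def latin_hypercube(n_samples, bounds):
    """One stratified sample per equal-width stratum, in each dimension."""
    d = len(bounds)
    strata = np.tile(np.arange(n_samples), (d, 1))   # (d, n) stratum indices
    strata = rng.permuted(strata, axis=1).T          # shuffle per dimension
    u = (strata + rng.random((n_samples, d))) / n_samples
    lo, hi = np.array(bounds, dtype=float).T
    return lo + u * (hi - lo)

def toy_objective(p):
    # Stand-in for the "number of correctly captured phenotypes" score
    # (here: negative squared distance to a known target vector).
    target = np.array([1.0, 2.0, 3.0])
    return -np.sum((p - target) ** 2)

def differential_evolution(pop, objective, n_gen=200, F=0.8, CR=0.9):
    """Classic DE/rand/1/bin with greedy selection (maximization)."""
    scores = np.array([objective(x) for x in pop])
    n, d = pop.shape
    for _ in range(n_gen):
        for i in range(n):
            a, b, c = pop[rng.choice(n, 3, replace=False)]
            mutant = a + F * (b - c)                 # mutation
            cross = rng.random(d) < CR               # crossover mask
            trial = np.where(cross, mutant, pop[i])
            s = objective(trial)
            if s > scores[i]:                        # keep only improvements
                pop[i], scores[i] = trial, s
    return pop[np.argmax(scores)], scores.max()

bounds = [(0.0, 5.0)] * 3
pop = latin_hypercube(20, bounds)                    # ~20 starting vectors
best, score = differential_evolution(pop, toy_objective)
```

In the paper's setting, the starting population is additionally filtered to vectors that keep the wild-type strain viable before DE begins.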

We believe that our approach is a practical and informative strategy for studying how the assignment of parameter values affects the ability of a complex reaction network to account for extensive data sets. Of course, other approaches are possible, and it is difficult to compare the relative merits and liabilities of various approaches. The only study directly comparable to ours, in the sense of estimating a large number of cell cycle parameters based on mutant phenotype data, is a paper by Panning et al.

In conclusion, we have presented a parameter estimation approach for high dimensional ODE models with a large number of experimental constraints. Our approach, which makes use of established parameter sampling and optimization tools, is quite successful in locating points in the parameter space with high model performance, even when the starting point of the search is quite "suboptimal". Sensitivity analysis of the objective function provides additional information on "critical" and "dispensable" parameters and on fragile and robust phenotypes; information that suggests directions for model refinement. We also used random sampling to measure "competition" between experimental constraints, information that is useful for streamlining the number of simulations needed for parameter estimation and for assessing trade-offs in a model's ability to reproduce a given set of experimental observations. Finally, we identified rare parameter–mutant combinations that highlight particular fragilities in the system, illustrating the usefulness of our approach for predicting novel phenotypes and designing new experiments. Overall, we have demonstrated that these methods of parameter estimation and analysis can be a powerful tool to propel systems biology research forward.

Abbreviations

APC: Anaphase promoting complex; CDK: Cyclin-dependent kinase; DE: Differential evolution; IC: Initial condition; LH: Latin hypercube; ODE: Ordinary differential equation.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

Conceived and designed the model: CO, TL, KCC, WTB, JJT. Performed the optimization, model reduction and analysis: CO. Wrote the paper: CO, WTB, LTW, JJT. All authors read and approved the final manuscript.

Acknowledgements

This work was supported by an NIH Grant (R01 GM078989-06) to JJT and WTB. We are grateful to the Advanced Research Computing Lab at Virginia Tech for computing resources. We also thank Tyson Lab members, especially Dr. Alida Palmisano and Dr. Janani Ravi for useful discussions. Finally, we thank the anonymous reviewers for their comments that improved the manuscript. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.