Abstract
Background
Statistical approaches to describing the behaviour, including the complex relationships between input parameters and model outputs, of nonlinear dynamic models (referred to as metamodelling) are gaining more and more acceptance as a means for sensitivity analysis and to reduce computational demand. Understanding such inputoutput maps is necessary for efficient model construction and validation. Multiway metamodelling provides the opportunity to retain the blockwise structure of the temporal data typically generated by dynamic models throughout the analysis. Furthermore, a clusterbased approach to regional metamodelling allows description of highly nonlinear inputoutput relationships, revealing additional patterns of covariation.
Results
By presenting the Nway Hierarchical Clusterbased Partial Least Squares Regression (Nway HCPLSR) method, we here combine multiway analysis with regional clusterbased metamodelling, together making a powerful methodology for extensive exploration of the inputoutput maps of complex dynamic models. We illustrate the potential of the Nway HCPLSR by applying it both to predict model outputs as functions of the input parameters, and in the inverse direction (predicting input parameters from the model outputs), to analyse the behaviour of a dynamic model of the mammalian circadian clock. Our results display a more complete cartography of how variation in input parameters is reflected in the temporal behaviour of multiple model outputs than has been previously reported.
Conclusions
Our results indicated that the Nway HCPLSR metamodelling provides a gain in insight into which parameters that are related to a specific model output behaviour, as well as variations in the model sensitivity to certain input parameters across the model output space. Moreover, the Nway approach allows a more transparent and detailed exploration of the temporal dimension of complex dynamic models, compared to alternative 2way methods.
Keywords:
Parameterphenotype map; Dynamic models; Metamodelling; Nway Partial Least Squares Regression; Hierarchical analysis; HCPLSR; Clusteranalysis; Inputoutput relationships; Circadian clockBackground
Dynamic models in systems biology as well as in other fields become increasingly complex as more detailed knowledge is incorporated. The massive presence of nonlinear relationships between their highdimensional parameter and solution spaces is a key characteristic of such systems. Moreover, dynamic models typically generate multidimensional blocks of temporal data. Clearly it is very challenging to obtain a comprehensive overview of the behavioural repertoires of such models across the highdimensional input parameter space, including the sensitivity of the model output to changes in the various input parameters, as well as interactions between input parameters and correlation patterns between model outputs. For dynamic model construction and validation, sound handling of such information is crucial. Since most of the existing methods for parameter estimation and sensitivity analysis are appropriate only for systems of relatively low output dimensionality and typically focus on one output variable at a time 12, a generic methodology for analysis of model behaviour that is able to handle the entire range of model complexities and give a comprehensive overview of the relationships between the input parameters and all model outputs, is sorely needed.
Statistical approaches are gaining acceptance as a means for analysis of inputoutput relationships of complex dynamic models 2345678910, and statistical emulation of dynamic models (metamodelling 11) has been demonstrated to be a useful tool both for speeding up computations 12 and as a basis for sensitivity analysis 2313 and uncertainty assessment 141516. Multiway (Nway) methods have previously been shown to be effective for data integration in e.g. systems biology 1718. We therefore hypothesise that Nway approaches will be especially advantageous for metamodelling of dynamic models due to the capability of integrating temporal data from several output state variables simultaneously while retaining the information about which state trajectory that corresponds to which state variable throughout the analysis (with 2way methods, this information is lost when concatenating the trajectories for the different state variables prior to the analysis). Consequently, a more detailed exploration of the temporal dimension of dynamic models is possible. This is important in order to obtain a comprehensive overview of how variation in the input parameters is manifested in the model output. Moreover, methods utilising several model outputs simultaneously have already been demonstrated to reduce the model sloppiness by imposing more constraints on the system 5.
The Nway Hierarchical Clusterbased Partial Least Squares Regression (Nway HCPLSR) presented here is designed for efficient handling of blockwise nonlinear data structures, and works by combining several regional Nway Partial Least Squares Regression (NPLSR) 19 models within which the mappings between input parameters and output state variables can be more adequately represented than in a global NPLSR model. The nonlinear capabilities of this metamodelling approach are obtained by combining clustering and generation of local linear metamodels for the various cluster regions. This is an Nway extension of our previously published method Hierarchical Clusterbased Partial Least Squares Regression (HCPLSR) 8. HCPLSR is based on separate (2way) PLSR 20212223 analyses of distinct regions of the parameter space (where the resulting regression coefficients are measures of the model sensitivity to the different input parameters), while in Nway HCPLSR, the separated regions are defined according to the dynamic behaviour of the output state variables and the output from the dynamic model is represented as an Nway array (in our example the number of ways or modes N=3; observations×state variables×time points of the state trajectories (see Figure 1)). A common metamodel based on all state variable trajectories can thereby be generated. This allows simultaneous analysis of nonlinear relationships between all model outputs and input parameters of complex dynamic models, in a lowdimensional subspace spanned by estimated latent variables (called NPLSR factors). The NPLSR factors represent the features that are most important for the covariance between the inputs and outputs (see Additional file 1: Section S1 for a description of the NPLSR methodology). The method is therefore suited for visualising covariance structures both within and between the input parameters and the model outputs, and thereby also useful for finding and removing possible redundancies (leading to model reduction) and prioritizing experiments for e.g. model validation by identification of key inputs and outputs describing the system behaviour. Hence, this can also guide experimentalists in the choice of quantities to measure when studying a biological system. Moreover, the NPLSR approach provides considerable dimension reduction possibilities through projection of the data into a lowdimensional subspace. In our opinion, this methodology should be considered as particularly useful in future multivariate metamodelling for analysis of complex spatiotemporal models. In Nway metamodelling, the three spatial dimensions can be included as separate modes in the Nway analysis, in addition to the temporal dimension on which we focus in this paper. The spatial structure of the data can thereby be kept throughout the analysis. We hypothesise that this will be a great advantage in the analysis of spatiotemporal models.
Additional file 1. S1. Description of the multivariate metamodelling methodology; S2. Statistics of the global classical and inverse metamodels of the mammalian circadian clock model; S3. Supplementary sensitivity analyses of the mammalian circadian clock model; S4. Results from the method benchmarking. Additional file 1 includes references 578111920212223293435363738404142434445464748495051525354555657.
Format: PDF Size: 1MB Download file
This file can be viewed with: Adobe Acrobat Reader
Figure 1. Illustration of the Nway data structure used in Nway HCPLSR. Illustration of the data structure used in Nway metamodelling. Here the number of modes (ways) N = 3, where the first mode is the different simulations carried out using varying parameter combinations and/or initial conditions, the second mode is the various state variables of the analysed dynamic model and the third mode is the trajectories of the state variables. Hence, the data is here represented as a 3way array. However, using more than three modes is possible. The decomposition of the 3way data is described and illustrated in Additional file 1: Section S1.
Traditionally, metamodelling is carried out in the causal direction, predicting model outputs as functions of the input parameters using e.g. regression methods. Application of metamodelling in the reverse direction is, however, also of potential interest 5. The two modelling directions can be understood as extensions of the classical/inverse calibration modelling 22. Accordingly, we refer to the causal direction as classical metamodelling, and the reverse direction as inverse metamodelling, and in our application of the Nway HCPLSR we demonstrate how their combination provides more detailed insight into the complexity of the mapping between input parameters and model outputs. Inverse metamodelling may also facilitate fitting of nonlinear models to large amounts of experimental data. Given that the results from the computations can be substituted with relevant experimentally measured data or quantities calculated from measurements, these can be used to predict corresponding parameters. Moreover, combinations of classical and inverse metamodelling can identify the key metrics to measure in order to validate the models. A more comprehensive introduction to the multivariate metamodelling methodology is given in Additional file 1: Section S1.
As long as they handle highdimensional data with nonlinear relationships and yield interpretable representations, a wide variety of statistical methods can be effectively used for multivariate metamodelling. We have recently shown that multivariate metamodelling based on PLSR and our nonlinear extension HCPLSR 8 provides good approximations of the inputoutput mappings 8 as well as informative insight into complex interaction patterns between parameters 9 of advanced nonlinear dynamic models. PLSR can use multiple response variables simultaneously and utilise intercorrelations between them for model stabilisation. PLSR analysis has been shown to effectively reveal covariation patterns in large and complex data sets, and extract correlations between possibly noisy and partially redundant input variables and outputs 6. The success of PLSR in the context of sensitivity analysis and for constraining input parameter values from dynamic model outputs has also been demonstrated by Sobie et al. 56. Highly nonlinear inputoutput structures may, however, be difficult to model adequately with linear models such as PLSR, even with polynomial extensions. To confront these problems, HCPLSR was introduced 8. Heterogeneity in model sensitivity to certain parameters between various regions in the parameter space of a dynamic model of the mouse ventricular myocyte was identified by HCPLSRbased sensitivity analysis in 9. Similarly, zooming into different regions of the state variable behavioural domain provides the opportunity to identify regions where the relationship between certain parameters and the model output is less ambiguous, indicating that these parameters are especially important for defining a specific type of temporal model behaviour. In cases where variation in the input parameters can be directly related to genotypic variations, this may provide valuable information about how a specific genotype can be of particular importance for the manifestation of certain phenotypic characteristics.
Here, we combine three different aspects of multivariate metamodelling: 1) Description of highly nonlinear inputoutput relationships by regional metamodelling, 2) NPLSR, allowing a retention of a tensor data structure throughout the analysis and 3) Inverse metamodelling in addition to the classical approach, providing more confident conclusions and a more comprehensive model overview. Moreover, particularly complex details are pursued by more detailed metamodelling of individual outputs and their relationships to the varied input parameters. Altogether, this provides a powerful, robust and efficient approach to exploration of the behavioural repertoire of complex dynamic models.
We illustrate our methodology by an application to a complex dynamic model of the mammalian circadian clock developed by Leloup and Goldbeter 24, which is a wellestablished and validated model. Models of circadian rhythms have e.g. been used for identifying mechanisms of chronotolerance and chronoefficacy for anticancer drugs 25. The dynamic model we analyse describes circadian oscillations of cellular activity in conditions of continuous darkness, and consists of 16 coupled ordinary differential equations (ODEs) describing the dynamics of three genes through intertwined positive and negative feedback loops. By combining the classical and inverse approaches of the Nway HCPLSR, we capture several interesting parts of the present complex inputoutput relationships, which are difficult to deduce directly from the model’s differential equations.
Results
In silico data set
The analysed mammalian circadian clock model consisted of 16 linear and nonlinear ODEs coupled together through numerous feedback mechanisms. To analyse the behaviour of this complex nonlinear dynamic model, nine of the model input parameters were systematically varied at eight equally spaced levels each in an Optimised Multilevel Binary Replacement (OMBR) design 726, using the ranges given in Table 1. This resulted in 8192 simulations with the circadian clock model, 99.3% of which (8135 simulations) converged to a stable limit cycle. The results of these 8135 simulations, represented as a 3way array of 8135 observations x 16 state variables (corresponding to the 16 coupled differential equations in the dynamic model) x 200 time points in each trajectory, were related to the matrix of input parameter combinations (8135 x 9) in the metamodelling study described below.
Table 1. Description and range of the parameters varied in the mammalian circadian clock model simulations
A separate test set based on 8192 parameter combinations found by random Monte Carlo sampling 2728 within the same parameter levels as used in the calibration set was also generated, resulting in 8125 converging simulations.
Results from the Nway HCPLSR metamodelling of the mammalian circadian clock model
A combined classical (parameter matrix as X, 3way state trajectory array as Y) and inverse (3way state trajectory array as X, parameter matrix as Y) metamodelling with Nway HCPLSR was carried out, in order to assess the complex inputoutput map of the mammalian circadian clock model. The developed methodology is illustrated in Figure 2 and described in more detail in the Methods section and in Additional file 1: Section S1. The Nway HCPLSR metamodels each contain both a global NPLSR model, and several regional NPLSR models calibrated within clusters corresponding to different model behaviour. Here we present the results from the global NPLSR metamodelling first, and the additional insights provided through the regional metamodelling thereafter. The statistics of the global metamodels can be found in Additional file 1: Section S2.
Figure 2. Illustration of the combined classical and inverse Nway HCPLSR metamodelling. The inverse metamodelling was carried out first, defining the clusters to use also in the classical metamodelling. The classification of the test set observations to be predicted in the classical metamodelling was based on , predicted from (see Additional file 1, eq. S12c for a definition) using second order polynomial Ordinary Least Squares (OLS) regression (called function (.)). See Additional file 1 sections S1.5S1.7 for a more comprehensive description of this methodology, including predicting equations for test set observations. *See Additional file 1, equation S9b. ^{**}C_{A }and C2_{A }were calculated by equation S12b in Additional file 1.
The low percentage explained Yvariance (Additional file 1: Section S2, Figure S3) showed that the global metamodelling was inadequate in both the classical and inverse direction. However, before proceeding to improved, regional metamodelling, the dominating relationships and patterns between the 9 input parameters and the 16 output state trajectories of the mammalian circadian clock model were assessed using the global approximations.
Inputoutput map characteristics revealed by the global classical and inverse metamodels
The dominating inputoutput covariation patterns
In NPLSR, like in other subspace regression methods, the highdimensional data is projected into a lowdimensional subspace spanned by estimated latent variables that represent the most relevant patterns of input (regressor)output (response or regressand) covariation (see Additional file 1: Section S1). The couplings between the original variables and the latent variables are called loadings. The global relationship patterns between the 9 varied mammalian circadian clock parameters (Table 1) and the 16 state variables (Table 2) were assessed through plots of the first three global second mode NPLSR loadings for the output state variables and the input parameters (Figure 3). Variables placed close to each other in the loading plots are positively correlated, while variables placed opposite each other are negatively correlated in the NPLSR factor space.
Table 2. Description of the mammalian circadian clock model state variables
Figure 3. Maps of the global covariance patterns between the circadian clock state variables and input parameters. Global NPLSR second mode loadings (Fac 1Fac 3) for the state variables (red dots) and the parameters (blue dots) from A) the inverse metamodelling and B) the classical metamodelling.
Within the parameter space analysed here, the parameters v_{mB} (maximum rate of Bmal1 mRNA degradation), v_{mC} (maximum rate of Cry mRNA degradation) and v_{mP} (maximum rate of Per mRNA degradation) had the highest correlation to the circadian clock state variables along the first three global NPLSR factors, both in the inverse (Figure 3A) and classical (Figure 3B) metamodelling. The same overall covariation patterns could be seen both in the inverse and the classical global metamodelling, since the orthogonal design in the parameters used in the global metamodelling has no dominant covariance directions, and both the inverse and classical NPLSR metamodels will consequently be dominated by the covariance structures of the state variables (but restricted to the parameter design). The input parameter v_{mB} were e.g. negatively correlated with the state variables B_{C}, B_{N}, B_{NP} and B_{CP} in the NPLSR factor space, which was not surprising since these state variables represent the dynamics of the phosphorylated and nonphosphorylated protein Bmal1 concentrations in the cytosol and nucleus 24. Similarly, v_{mP} was negatively correlated with the state variables M_{P} (dynamics of Per mRNA) and P_{CP} (dynamics of phosphorylated Per protein concentration in the cytosol), while v_{mC} was negatively correlated with M_{C} (dynamics of Cry mRNA) and PC_{CP} (dynamics of phosphorylated PerCry complex in the cytosol). These patterns were all in concordance with our intuition of the mammalian circadian clock model.
Prediction results from the global inverse metamodelling
The test set prediction results from the inverse metamodelling shown in Figure 4A, indicated that the input parameters v_{mB}, v_{mC}, v_{mP} and k_{5} (rate constant for entry of the Bmal1 protein into the nucleus) were predicted with reasonably high accuracy (correlation coefficient (R^{2})values higher than 0.8) from the circadian clock state trajectories, indicating that the circadian clock model was highly sensitive to changes in these input parameters and that the relationship between these parameters and the model output was quite linear. For the input parameters v_{dPCN}, v_{dIN}, k_{1}, k_{3} and k_{7}, the prediction error was high using global NPLSR metamodelling.
Figure 4. Test set prediction results from the global NPLSR metamodelling of the mammalian circadian clock model. A) Results from the test set validation of the inverse metamodelling. Correlation coefficient (R^{2})values from the global NPLSR test set prediction of the parameters from the state variable trajectories are shown, using 19 factors in the global NPLSR model. B) Results from the test set validation of the classical metamodelling. R^{2}values from the global NPLSR test set prediction of the state variable trajectories from the parameters are shown, using 8 factors in the global NPLSR model.
Prediction results from the global classical metamodelling
The results from the test set prediction of the state variable trajectories from the input parameters in the classical metamodelling shown in Figure 4B, indicated that the temporal behaviour of the following state variables could be predicted with high accuracy from the input parameters using global NPLSR: B_{N}, M_{C}, M_{B}, B_{C}, B_{CP} and B_{NP}, while the prediction error was especially high for the state variables P_{C}, PC_{C}, C_{C}, PC_{N}, PC_{NP} and I_{N}.
Analogous to the results from the inverse metamodelling described above, the matrix plot of the global NPLSRestimated sensitivities (estimated as products between the X and Y loadings) of the model output state variables to the nine varied parameters in Figure 5 indicated that the circadian clock model was only sensitive to the input parameters v_{mB}, v_{mC}, v_{mP} and k_{5}. However, the predictive ability obtained with the global metamodelling was not adequate, and important patterns of variation were therefore left undescribed. Hence, the parameterstate variable map of the circadian clock model was relatively complex and nonlinear. Since the analysed model was deterministic, we assumed that a higher number of state trajectories would be predicted with high accuracy from the input parameters using a nonlinear metamodel. We therefore hypothesised that hierarchical clusterbased metamodelling would reveal more details of the inputoutput relationship of the mammalian circadian clock model through regional metamodelling.
Figure 5. Mammalian circadian clock model sensitivities estimated from the global classical NPLSR metamodel. Model sensitivities to variations in the nine varied input parameters calculated as the products between the second mode Xfactors (Xloadings) and the transpose of the second mode Yfactors (Yloadings) from the global classical NPLSR metamodel.
Separately analysed output space regions in the hierarchical clusterbased metamodelling
To facilitate comparison, it was decided to use the same grouping (clustering) of the observations in both classical and inverse metamodelling. The state variable NPLSR Xfactors from the inverse metamodelling were more directly related to the state variable behaviour than the Yfactors representing the state variables in the classical metamodelling due to the asymmetric nature of the NPLSR models (defined primarily based on the Xscores, not the Yscores). The inverse metamodelling was therefore carried out first, i.e. the clustering of the 8135 calibration set observations was carried out on the inverse metamodelling Xfactors obtained from the output state trajectories (T_{Output,A,Inverse}, see Figure 2), thereby ensuring that the clusters represented different model behaviours. The same clusters were then also used in the classical Nway HCPLSR metamodelling. This was chosen due to that clustering on the Xfactors or the predicted Yfactors () in the classical metamodelling (as would be a more traditional procedure) would both make the clustering more related to the designed parameter combinations instead of the state variable behaviour, since the predicted Yfactors were here predicted as linear combinations of the Xfactors that are related to the parameter combinations (see Additional file 1, equation S12c). Predicted Yfactors would have to be used in the clustering instead of the Yfactors directly calculated from the state variable data in the classical metamodelling, since otherwise the classification of new observations (for which state variable data are not available) would not be possible on variables equivalent to those used to cluster the calibration set observations.
Based on an assessment of the ability to constrain parameters from the state trajectories using from 1 to 20 clusters (Figure 6), using six clusters was considered optimal in order to balance between predictive ability and interpretational complexity. As seen from Figure 6, some of the parameters and state variables could be predicted even more accurately using a higher number of clusters in the Nway HCPLSR, but that would lead to a more complex model that would be more difficult to interpret in a sensitivity analysis. Keeping the number of clusters as low as possible also helps avoiding overfitting of the data. We therefore assumed that the most important inputoutput map characteristics could be revealed using six regional NPLSR models in the Nway HCPLSR.
Figure 6. Optimalisation of the number of clusters in hierarchical metamodelling of the mammalian circadian clock model. A) Results from inverse hierarchical metamodelling using from 1–20 clusters in the Nway HCPLSR. Left: Mean parameter prediction correlation coefficient (R^{2})values within the calibration set, over the nine varied circadian clock model input parameters vs. the number of clusters used in the Nway HCPLSR metamodelling. The calibration set observations were here treated as "new observations" (see Figure 2), and classified in the prediction stage. Using six clusters was considered optimal. Right: Parameter prediction R^{2}values within the calibration set for the nine different circadian clock model input parameters vs. the number of clusters used in the Nway HCPLSR metamodelling. B) Results from classical hierarchical metamodelling using from 1–20 clusters in the Nway HCPLSR. Left: Mean state variable prediction R^{2}values within the calibration set, over the 16 circadian clock model state variables vs. the number of clusters used in the Nway HCPLSR metamodelling. The calibration set observations were here treated as "new observations", and classified in the prediction stage. Right: State variable prediction R^{2}values within the calibration set for the 16 circadian clock state variables vs. the number of clusters used in the Nway HCPLSR metamodelling. Using six clusters was considered optimal.
The clustering of the calibration set observations used in the final Nway HCPLSR metamodelling is illustrated in Figure 7 both in the NPLSR factor spaces from the inverse (Figure 7A) and classical (Figure 7B) metamodelling and in the original state variable trajectory space (Figure 7C). Figure 7B illustrates that the NPLSR Yfactors from the classical metamodelling were (as expected) highly related to the designed parameter combinations, and hence did not give as good representation of the state variable behaviour as the Xfactors from the inverse metamodelling. As described above, the clustering was therefore based on the latter both in the classical and the inverse metamodelling. Figure 7C confirmed that the six clusters represented different types of dynamic behaviour for the mammalian circadian clock model. For example, Cluster 1 was characterised by e.g. an especially large spread in the values of the state variable C_{C}, while Cluster 2 was characterised by high values of several of the circadian clock state variables (especially M_{P}, P_{C}, P_{CP}, PC_{C}, PC_{N}, PC_{NP} and I_{N}). The parameter ranges for the clusters are given in Table 3, and showed that the clustering of the observations had a close relation to the values of the parameters v_{mB}, v_{mC} and v_{mP}, which also spanned the first three global NPLSR factors both in the inverse and classical metamodelling.
Figure 7. Clustering results used in the Nway HCPLSR metamodelling with six clusters. A) Plot of the factors (Fac 1Fac 3) from the global inverse metamodelling (=T_{Output,A,Inverse}). The observations are coloured according to cluster memberships. Cluster1=blue, cluster2=red, cluster3=yellow, cluster4=green, cluster5=magenta, cluster6=cyan. X is the 3way state variable trajectory array, while Y is the parameters. The clustering was done on the factors explaining a significant amount of the variation in the state variable space, that is the 19 first factors. B) Plot of the predicted Yscores (see Additional file 1, eq. S12c) from the global classical metamodelling, colour coded according to the cluster memberships of the observations found using T_{Output,A,Inverse}. The classification of the test set observations to be predicted in the classical metamodelling was based on , predicted from using second order polynomial OLS regression. This OLS prediction model was calibrated in the calibration step of the classical metamodelling, based on the factors from the inverse metamodelling (=, plotted in panel A) and calibration set (plotted here). C) Circadian clock state trajectories for the observations belonging to each cluster, coloured according to cluster memberships from the inverse Nway HCPLSR. Cluster1 = blue, cluster2 = red, cluster3 = yellow, cluster4 = green, cluster5 = magenta, cluster6 = cyan. All state variables are given in nM units.
Table 3. Parameter ranges and mean values for the observations in the six Nway HCPLSR clusters
Additional inputoutput map characteristics revealed by the regional classical and inverse metamodelling
Prediction results from the hierarchical inverse metamodelling
The test set prediction results from the hierarchical inverse metamodelling shown in Figure 8A indicated that the two input parameters k_{1} (rate constant for entry of the PerCry complex into the nucleus) and k_{3} (rate constant for the formation of the PerCry complex) were predicted with considerably higher accuracy in the hierarchical metamodelling compared to the global metamodelling. Figure 6 indicated that increasing the number of clusters in the Nway HCPLSR had a large effect on the prediction accuracy for these two parameters; R^{2}values higher than 0.8 could be achieved using 20 clusters. However, the increase in prediction accuracy obtained also using only six clusters indicated that the circadian clock model was sensitive to these two parameters, in contrast to what the global metamodelling indicated. Hence, the hierarchical metamodelling could provide additional insights into the inputoutput map of the analysed model.
Figure 8. Prediction results from the hierarchical inverse and classical metamodelling. A) R^{2}values from the hierarchical NPLSR test set prediction of the parameters from the state variable time series using six regional regression models, using 18, 19, 19, 18, 19 and 17 NPLSR factors, respectively. The clustering was done on the global factors, using 19 factors. B) R^{2}values from the hierarchical NPLSR test set prediction of the state variable trajectories from the parameters using six regional regression models, all using 9 NPLSR factors. The same clusters as in the inverse metamodelling were used.
The three parameters v_{dPCN} (maximum rate of degradation of nuclear phosphorylated PerCry complex), v_{dIN} (maximum rate of degradation of nuclear PerCryClockBmal1 complex) and k_{7} (rate constant for the formation of the inactive PerCryClockBmal1 complex) were predicted with low accuracy also in the inverse hierarchical metamodelling. This indicated that either the circadian clock model was relatively insensitive to variations in these input parameters, or our metamodelling was not able to describe the complex relationships between these parameters and the model outputs. This has been assessed in more detail below.
Prediction results from the hierarchical classical metamodelling
Several of the circadian clock state variable trajectories could be predicted with considerably higher accuracy in classical metamodelling using Nway HCPLSR compared to the global NPLSR (Figure 8B). Only the state variables PC_{C} (concentration of the PerCry protein complex in the cytosol), PC_{N} (concentration of the PerCry protein complex in the nucleus) and I_{N} (concentration of the inactive complex between PerCry and ClockBmal1 in the nucleus) were predicted with low accuracy (R^{2}values below 0.4), indicating that the metamodelling with Nway HCPLSR was able to capture the main features of the inputoutput mappings for most of the 16 circadian clock state variables.
Figure 7 indicated that the data for the three above mentioned state variables contained extreme outliers (especially in Cluster 1 and 2), which may have caused the low prediction accuracy for these state variables. Both for the calibration set and the test set, being an outlier seemed to be closely associated to having the lowest level of the parameter v_{mP}, but this was not itself sufficient to generate extreme values of these three state variables.
Detailed interpretation of the revealed model sensitivity patterns
The parameter and state variable prediction results within each of the regional NPLSR metamodels, shown in Figure 9 and Figure 10, indicated clear regional differences in the state variable space with regard to the prediction accuracy for the different parameters and state trajectories. This may be used to identify the parameters that are especially important for certain types of model behaviour. We recently showed that regional differences in model sensitivity to the input parameters also represent complex interaction patterns between the parameters 9. In order to illustrate the usefulness of the methodology in providing biological insight, we give below some examples of detailed interpretations of the sensitivity patterns illustrated in Figures 9, 10 and 11.
Figure 9. Prediction results from hierarchical inverse metamodelling within each regional regression model in the Nway HCPLSR. The R^{2}values from test set prediction of the parameters from the state variables are shown for regional model 1–6, corresponding to the clusters used in the Nway HCPLSR. The regional models use 18, 19, 19, 18, 19 and 17 NPLSR factors, respectively. The clustering was done on the global factors, using 19 factors.
Figure 10. Prediction results from hierarchical classical metamodelling within each regional regression model in the Nway HCPLSR. The R^{2}values from test set prediction of the state variable trajectories from the parameters are shown for regional model 1–6, corresponding to the clusters used in the Nway HCPLSR. All regional models use 9 NPLSR factors. The same clusters as in the inverse metamodelling were used.
Figure 11. Mammalian circadian clock model sensitivities estimated from each of the regional classical NPLSR metamodels. Model sensitivities to variations in the nine varied input parameters calculated as the products between the second mode Xfactors and the transpose of the second mode Yfactors from regional NPLSR model 1–6 in the Nway HCPLSR metamodelling.
As shown in Figure 10, the temporal behaviours of the state variables PC_{C} and PC_{N} were predicted with higher accuracy by all regional metamodels except regional model 1 and 2 (corresponding to clusters containing outliers for these state variables), compared to the global NPLSR. However, the state variable I_{N} was predicted with very low accuracy in all the regional metamodels, and the parameters v_{dPCN}, v_{dIN} and k_{7} could not be well predicted in any of the regional inverse metamodels (Figure 9). Two of the parameters that were predicted with low accuracy, v_{dIN} and k_{7}, appeared in the differential equation corresponding to the state variable I_{N}. Hence, the low prediction accuracy was probably due to an insufficiently described mapping between I_{N} and these parameters in the Nway HCPLSR.
In order to reveal the sensitivity patterns for the state variable I_{N}, a separate sensitivity analysis of the relationship between I_{N} and the circadian clock input parameters was carried out using 2way second order polynomial HCPLSR analogous to the analysis presented in 8, but with the parameter ranges given in Table 1. This showed that in order to be wellpredicted, the state variable I_{N} had to be logarithmised prior to the analysis. This might explain why this state variable trajectory could not be well described together with the other state variables in the Nway HCPLSR. The results are given in Additional file 1: Section S3.1, and indicated that the parameters having the largest effects on the I_{N} state trajectory were v_{mB}, v_{mC}, v_{mP}, v_{dIN}, k_{1} and k_{7}. Several interactions between these input parameters were also identified.
The parameter k_{7} (rate constant for the formation of the inactive PerCryClockBmal1 complex) was also involved in the differential equations representing the dynamics of the concentration of nonphosphorylated Bmal1 protein in the nucleus (state variable B_{N}) 24 and the concentration of the nonphosphorylated PerCry protein complex in the nucleus (state variable PC_{N}), in addition to I_{N}. PC_{N} was not well described by the classical metamodelling, but B_{N} was predicted with high accuracy from the parameters (Figure 8B and Figure 10). Hence, the low prediction accuracy for k_{7} indicated that the state variable B_{N} was relatively insensitive to the parameter k_{7} according to our analysis (within the analysed parameter range), even though the differential equation for this state variable involved k_{7}. This was confirmed by the plot of the model sensitivities estimated from the regional metamodels shown in Figure 11 (only in regional model 3 a sensitivity was identified, but this was also low), as well as the separate sensitivity analysis for the state variable B_{N} described in Additional file 1: Section S3.2 (carried out in the same way as for I_{N}). This illustrates how a combination of a classical and inverse metamodelling can provide more confident conclusions about model behaviour and sensitivity patterns.
The third parameter that could not be constrained from the state variable data was v_{dPCN} (maximum rate of degradation of nuclear phosphorylated PerCry complex), which was only involved in the differential equation describing the dynamics of the concentration of the phosphorylated PerCry complex in the nucleus (state variable PC_{NP}). This state variable was predicted with an R^{2}value of approximately 0.7 in the classical metamodelling, which is not particularly low. Thus our results indicated a low sensitivity of PC_{NP} to the rate of degradation of the corresponding protein. This result was confirmed by the results shown in Figure 11 and the separate sensitivity analysis for the state variable PC_{NP} described in Additional file 1: Section S3.3. This was not straightforward to explain, and calls for a more comprehensive analysis of the relationship between the state variable PC_{NP} and the input parameter v_{dPCN}. Possible explanations might be that our analysis did not cover the relevant range for this parameter, causing the model sensitivity to this parameter not to be detected, or that its inputoutput relationship is very complex.
As seen from Figure 11, the parameter v_{dPCN} seemed to have negative impact on the state variable P_{C} (concentration of nonphosphorylated Per protein in the cytosol) in regional NPLSR model 2, even though neither this parameter nor the state variable PC_{NP} (concentration of the phosphorylated PerCry complex in the nucleus, for which this parameter was involved in the differential equation), appeared in the differential equation representing P_{C}. Both P_{C} and PC_{NP} were related to the Per protein, though. In this region of the state variable space, the effect of v_{dPCN} on P_{C} was either more pronounced, or the relationship between these variables was less complex and therefore more visible in the analysis. As seen from Figure 7C, Cluster 2 was characterised by high values of several of the circadian clock state variables (especially M_{P}, P_{C}, P_{CP}, PC_{C}, PC_{N}, PC_{NP} and I_{N}).
The input parameter k_{3} (rate constant for the formation of the PerCry complex) had a large negative effect on C_{CP} that was visible only in Cluster 5 (Figure 11). This could be explained by the fact that k_{3} was involved in the equation for C_{C}, which was part of the differential equation for C_{CP}. Furthermore, v_{dIN} (maximum rate of degradation of nuclear PerCryClockBmal1 complex) seemed to have a slight negative effect on PC_{NP} in Cluster 1. This result was not easily deducible from the equation structure of the circadian clock model, and could not be detected in the global metamodelling. Cluster 1 was characterised by e.g. an especially large spread in the values of the state variable C_{C}.
The parameter k_{7} (rate constant for the formation of the inactive PerCryClockBmal1 complex) seemed to have a positive effect on the state variable C_{C} (nonphosphorylated Cry protein in the cytosol) in regional metamodel 1. However, since the inactive PerCryClockBmal1 complex represses the Per and Cry genes in the nucleus, a positive effect of k_{7} on C_{C} seemed unlikely. An additional sensitivity analysis was therefore carried out by adding eight simulations to the data set, keeping all parameters except k_{7} constant at the mean values found for Cluster 1. The results are shown in Additional file 1: Section S3.4, and indicated that increasing k_{7} resulted in a very small decrease in C_{C} and C_{CP}, had a clear positive effect on I_{N} (as expected from the differential equation for I_{N}), and a negative effect on PC_{N} and PC_{NP}. In order to try to explain the positive effect of k_{7} on C_{C} seen in Cluster 1, a separate 2way PLSRbased sensitivity analysis was therefore also carried out for the state variable C_{C} in Cluster 1 (see Additional file 1: Section S3.4). The effects of v_{mB}, v_{mC}, v_{mP} and k_{3} on C_{C} indicated for Cluster 1 were also manifested in the 2way PLSR analysis, but a positive main effect of k_{7} was not confirmed. However, several interaction terms involving k_{7} seemed to have effects on C_{C}, such as the interaction between v_{mP} and k_{7} (which had a positive effect). Since crossterms between the input parameters were not included in the Nway PLSR analysis, confounding of these interaction effects with the main effect of k_{7} may explain the positive sensitivity to k_{7} indicated by the Nway PLSR. The indication of a positive effect of k_{7} on C_{C} could also have been caused by other sources of uncertainties in the NPLSR analysis.
Analogous to the increased prediction accuracy obtained for the two parameters k_{1} and k_{3}, model sensitivity to these parameters could be revealed in several local regions of the state variable space (Figure 11), even though this was not evident from the global metamodelling (Figure 5). According to Figure 11, the model seemed to be insensitive to these parameters in the region represented by Cluster 6. However, the inverse metamodelling indicated that also in Cluster 6 these parameters could be predicted with a much higher accuracy than by the global NPLSR metamodel, indicating model sensitivity to these parameters also in this region of the state variable space. This illustrates the importance of carrying out both classical and inverse metamodelling to gain a more detailed insight into the sensitivity patterns of a complex model.
Discussion
The main traditional approach to analysis of inputoutput relationships has been to use aggregated outputs derived from the state trajectories, representing the dynamics of the state variables. For instance, in their original publication of the mammalian circadian clock model 24, the authors employed a sensitivity analysis of only one aggregated output – the circadian clock period– a very important trait, but too aggregated to give sufficient overview of the entire model behaviour. Multivariate metamodelling has, at least in principle, the capacity to reveal the relationships between all input parameters and all model outputs simultaneously. This has here been illustrated for the nine input parameters assumed to be most interesting for the mammalian circadian clock and the 16 state variables of the model, where the generated Nway metamodels allowed flexible quantitative inputoutput regressions yielding informative graphical insight into the main underlying inputoutput map characteristics. In our example N=3, but the analysis can be extended to more than three modes.
Our analysis confirmed the main conclusions from the original classical sensitivity analysis of the circadian clock period carried out by Leloup and Goldbeter 24, namely that the mammalian circadian clock model was highly sensitive to parameters related to synthesis and degradation of the protein Bmal1 and its mRNA. However, our analysis improved the overview of the inputoutput relationships on which the circadian clock period is based. The main patterns found in our previous analysis of the same model, using conventional (2way) PLSR 7, were also confirmed in the global NPLSR metamodelling. However, the present clusterbased Nway analysis revealed additional aspects of the inputoutput relationships, for example the negative effect of increasing v_{dPCN} on the state variable P_{C} in the part of the model output space defined by Cluster 2. Hence, the Nway HCPLSRbased metamodelling worked as intended in this illustration example. In the example used here, the focus was on oscillating state variables. Other types of behaviour of nonlinear dynamic systems such as multiple steady states could potentially lead to additional nonlinearities in the inputoutput mapping, probably increasing the gain of using a clusterbased approach compared to a global analysis.
An alternative to using NPLSR would be to unfold the state variable trajectory array by concatenating all the trajectory data into one 2way matrix and use 2way HCPLSR to analyse the data. However, the information about related trajectories for different state variables would then be left unused, leading e.g. to loss of the opportunity to visualise covariance structures. In order to evaluate the gain of keeping the 3way structure in the data, the same analysis was carried out using 2way HCPLSR on unfolded state trajectory data as well as on aggregated outputs calculated from the state trajectories. The clustering results from these analyses (shown in Additional file 1: Section S4) indicated that the increased resolution achieved using Nway HCPLSR could not be achieved when the state trajectory array was unfolded into two dimensions or transformed into aggregated outputs, due to a less reasonable clustering of the observations. Hence, using the NPLSR factors seems to enable identification of more relevant clusters in which to carry out regional metamodelling. The global parameter prediction accuracies obtained were comparable to those obtained with the global inverse Nway PLSR. However, in the hierarchical metamodelling, neither the unfolded state trajectories nor the aggregated outputs could predict the circadian clock parameters with as high accuracy (on average) with 2way HCPLSR as with the Nway HCPLSR.
In contrast to the results obtained using Nway HCPLSR, our previously published metamodelling of each of the circadian clock state variables separately 8 showed that all circadian clock state variables could be predicted with high accuracy from the parameters (within the parameter space analysed in that publication, which was slightly different in the present analysis). However, there is a clear gain of using a common metamodel for all state variables in terms of obtaining overview of the inputoutput relationships as well as covariance patterns between the state variables. Nevertheless, as demonstrated here, a separate analysis of the inputoutput relationships for insufficiently described state variables should accompany this type of analysis in order to gain a more complete insight into the inputoutput relationships. This was illustrated in our application example for e.g. the state variable I_{N}, which had to be logarithmised and analysed separately in order for its relationships to the input parameters to be adequately described.
In NPLSR, relations between model outputs and input parameters are easily interpretable through plots of the loadings, in contrast to results produced e.g. by genetic algorithms which are often more difficult to interpret (although the latter can also handle multiple outputs). Moreover, due to the decomposition of the data into estimated latent variables, NPLSR can provide efficient dimension reduction possibilities in highdimensional systems. However, since the NPLSR models presented here used a high number of factors to explain the inputoutput covariance, the dimension reduction possibilities of NPLSR may not have been fully utilised. This was caused by the differences in the timetopeak for the different state variables, which the NPLSR uses many factors to describe. Hence, a more careful preprocessing of the data would probably result in NPLSR models using fewer factors, perhaps through shift correction as described by Westad and Martens 29. Work is in progress on testing whether this allows the NPLSR models to use fewer factors while still keeping the same predictive ability. However, even when using relatively many factors, the NPLSR models still enable great dimension reduction possibilities.
In regional regression modelling, there is a risk that the variance in some input or output variables is highly reduced in the regional models compared to the entire data set. Hence, the robustness of the predictions may decrease and the regression coefficients as well as the R^{2}values may be misleading for these variables. However, as shown in Table 4, the criterion used for defining local clusters (based on fuzzy clustering searching for spherical clusters) ensured that variances in the nine input parameters did not differ much between the clusters, and were about the same in the calibration set and the test set; the only parameters for which the variance decreased in the clusters compared to in the original datasets were v_{mB}, v_{mC} and v_{mP}. This was not surprising, since these three parameters had the largest impacts on the first three NPLSR factors of the global NPLSR models, and hence the clustering using the NPLSR factors was mostly based on these three parameters. However, since these three parameters were also predicted with high R^{2}values in the global inverse metamodel, high R^{2}values were not artefacts of low cluster variance in this study. This is primarily a problem occurring when using small test sets, and here the test set was of approximately the same size as the calibration set (more than 8000 simulations in each).
Table 4. Parameter value variances in the calibration and test set and the six Nway HCPLSR clusters
Since the selection of data subsets in Nway HCPLSR is based on fuzzy clustering, no prior knowledge about the structure of the data is needed. Hence, this method automatically detects regions of different model behaviour. The number of clusters to use in the hierarchical metamodelling was here specified in advance, based on exploration of the predictive ability of metamodels of varying complexity. However, using instead an optimisation algorithm to find the optimal number of clusters would make semiautomatic exploration of inputoutput relationships of computational models possible.
Conclusions
The Nway HCPLSR method presented here provides the opportunity to improve both prediction accuracy and analytical insight by identification of regional subsets of the data within which the relationships between input parameters and model outputs are more transparent than in a global regression analysis. This was exemplified by the model sensitivity to the two parameters k_{1} and k_{3} that was detected in the regional analysis but not in the global metamodelling.
Our results also indicate that analysing all state trajectories simultaneously using Nway methodology is more effective for identification of different behavioural domains for a system and regions where inputoutput mappings can be predicted with higher accuracy, than unfolding the state trajectory array into two dimensions or transforming state trajectories into aggregated outputs prior to the analysis. This is due to a more reasonable clustering of the observations. Moreover, application of the method for metamodelling in both the classical and the inverse direction represents a more comprehensive approach to the analysis of complex relationships between the model inputs and the temporal behaviour of the outputs, and allows more confident conclusions. Our results showed that the mammalian circadian clock model was highly sensitive to parameters related to the protein Bmal1, as previously found by Leloup and Golbeter 24, but in addition our approach revealed also more complex sensitivity patterns of the model.
Based on these results, we believe that the presented Nway HCPLSR method will be instrumental for effective construction and validation of complex models. Due to its efficient handling of Nway data structures, demonstrated here in the analysis of the temporal model behaviour, we hypothesise that Nway HCPLSR will be an especially useful tool for multivariate metamodelling of spatiotemporal models, a large future application area.
Methods
Generation of the in silico data set
A model of the mammalian circadian clock developed by Leloup and Goldbeter 24 was used to estimate the circadian oscillations of cellular activity in conditions of continuous darkness. The model consists of 16 coupled differential equations with state variables describing the dynamics of three key genes (Bmal1, Per and Cry), including their mRNA level, nonphosphorylated and phosporylated proteins as well as protein complexes. The model contains intertwined positive and negative feedback loops driving the circadian oscillations. A curated CellML implementation 303132 of the model was downloaded from http://models.cellml.org webcite. The integration was carried out in SUNDIALS 2.3 33 using a wrapper for PySundials (http://pysundials.sourceforge.net webcite) in the same way as in 8.
The parameter combinations in the calibration set were generated using an Optimised Multilevel Binary Replacement (OMBR) Design 26 of 9 variables with 8 equally spaced levels each (Table 1). This resulted in 8192 simulations with the circadian clock model. The ranges of each parameter are given in Table 1, and were found by an initial rangefinding experiment published in 7. A full factorial design of 9 variables Â´a 8 levels would require 8^{9}>134 million runs. Hence, the OMBR design was chosen, in order to explore the effects of as many parameters and parameter values as possible. In the OMBR design method, the values of a original parameters are replaced by multibit binary representations, and the binary factor bits are then combined in a fractional factorial design according to a chosen confounding pattern. Thereby drastically reduced experimental designs are obtained, yet maintaining reasonable coverage of the parameter space. The OMBR design has been compared to central composite designs and semirandom designs, and has been shown to give good predictive ability 7.
For each parameter combination the resulting differential equation model was solved from the original initial conditions (see 24) until convergence to a stable limit cycle. The test for convergence was done as follows: First the system was solved with rootfinding for variable M_{B} to extract two complete cycles. Convergence of the cycle period was tested by requiring that the period difference relative to the mean of the periods for the two cycles should be less than 0.001. Convergence to synchronous oscillations was tested by (i) interpolating all 16 state variables at 200 equally spaced time points for each cycle, (ii) linearly transforming each state variable such that the minimum and maximum values of each cycle was 0 and 1, respectively, and (iii) requiring that the sum of absolute difference between the two cycles across all the 3200 interpolated time points should be less than 0.0001.
The data set resulting from the simulations of the mammalian circadian clock consisted of sampled values for one period (here 200 timesteps) of 16 state variables (corresponding to the 16 differential equations in the model), for the set of 8192 combinations of values for the nine varied input parameters. This gave a 3way array of 8192x16x200 data points. A description of the mammalian circadian clock model state variables is given in Table 2. Due to the wide parameter ranges used, 57 (0.7%) of the 8192 simulations did not result in a stable limit cycle. These simulations were omitted in the following analysis. The parameters and state variables were meancentred and standardised globally in the calibration set prior to the metamodelling.
A separate test set based on 8192 parameter combinations found by random Monte Carlo sampling 2728 within the same parameter levels as used in the calibration set was used. This resulted in 8125 converging test set simulations. In the test set, the variables were preprocessed in the same way as for the calibration set, using the global calibration set means and standard deviations.
Nway HCPLSR
Our previously published method for nonlinear metamodelling, HCPLSR 8, has here been extended to enable use of Nway data by using NPLSR 1934, giving Nway HCPLSR. HCPLSR 8 includes regional analysis using subsets of the original data set generated by fuzzy Cmeans (FCM) clustering 35363738 (see Additional file 1: Section S1).
In Nway HCPLSR, a global NPLSR model comprising all observations is first generated, and FCM clustering (using Euclidian distance) on a chosen number of first mode (see Figure 1) factors (scores) of the global NPLSR Xfactors (or alternatively, Yfactors) is used to separate the observations into groups within which regional NPLSR models are made. The FCM fuzzifier parameter was here chosen equal to 2. To prevent possibly unstable regression models due to a small number of calibration observations, we postprocessed the clustering with the requirement that each cluster should contain at least ten observations. Smaller clusters (and their associated observations) were regarded as outliers, and not included in the subsequent regional regression analysis (but were still included in the global regression analysis).
The optimal number of factors to use in the global and regional NPLSR models, respectively, was here chosen according to the minimum crossvalidated mean squared error (MSE) of prediction of the response array Y, with the extra requirement that each included component accounts for at least 1% of the total crossvalidated Yvariance, in the same way as in 8. Here, 10fold crossvalidation where the ten segments were randomly chosen was used both in the global and the regional NPLS regressions. These ten segments were successively kept out of the NPLSR calibration, and predicted using an NPLSR model based on the remaining observations. NPLSR is described in Additional file 1: Section S1, and the MATLAB® Nway Toolbox v.3.11 34 was used here.
In our Nway HCPLSR implementation, Linear Discriminant Analysis (LDA) 39, Quadratic Discriminant Analysis (QDA) 40 (the MATLAB® function "classify" from the Statistics Toolbox™ v7.6) or Naive Bayes classification (the MATLAB® function "NaiveBayes" from the Statistics Toolbox™ v7.6) can be used for classification of new observations to be predicted (based on predicted NPLSR factors for new observations). The implementation contains two options for prediction: 1) Prediction using the local regression model calibrated in the most probable cluster, and 2) Prediction using a weighted sum of the local regression models, using the estimated cluster membership values as weights. The Nway HCPLSR was carried out in MATLAB® 41 Version 7.13 (R2011b), using inhouse code which can be obtained from the authors upon request.
Classical and inverse metamodelling of the mammalian circadian clock model
The 3way array of state variable data (observations x outputs x time points) was first used as regressor in a test set validated Nway HCPLSR using the parameter combinations as responsevariables (inverse metamodelling). To complement this analysis, analogous classical metamodelling with Nway HCPLSR was carried out, where the parameter combinations were used as regressor variables to predict the 3way state trajectory array. The calibration and test sets calculated with the mammalian circadian clock model described above were used. The methodology is illustrated in Figure 2. The most probable local regression model was chosen for prediction, since this gave higher prediction accuracy than using a weighted sum of the local models. No clusters containing less than ten observations were identified, so all calibration set observations were included in the regional regression analysis.
In the inverse metamodelling, the clustering (and classification in the prediction stage) of the observations was based on the global predicted first mode Xfactors, the Xscores ( in Additional file 1: Section S1), representing the state variable trajectory data. All factors found to explain a significant amount of the crossvalidated calibration set Yvariation were included. The number of clusters to use was chosen by evaluating the ability to constrain (i.e. correctly predict) the input parameters from the model output in the inverse metamodelling, using from 1 to 20 clusters within the calibration set. The calibration set observations were treated as "new observations" (see Figure 2) here, that is, the same procedure as for the test set observations was used in the prediction stage. Using crossvalidation to find the optimal number of clusters, as was done to find the optimal number of factors for the NPLSR models, would be too time consuming given the size of the datasets used here. The mean correlation coefficient (R^{2})values for the prediction of the parameters were used as selection criterion, and using 6 clusters was considered optimal in order to balance metamodel complexity against predictive ability.
The same clusters were also used for the classical metamodelling, since the Xfactors in the inverse metamodelling are more directly related to the model output state variables than the Yfactors from the classical metamodelling (a PLSR model is asymmetric). However, for the classification in the prediction stage of the classical metamodelling to be relevant, was predicted from the first mode predicted Yfactors, the Yscores (calculated using equation S12c in Additional file 1), since in this case the Yfactors represent the state variable trajectories. This was done using second order polynomial Ordinary Least Squares (OLS) regression (including square terms and crossterms), calibrated using calibration set from the classical metamodelling (* C_{A} (see Additional file 1, eq. S12c)) as regressors and calibration set T_{XA,NWay} from the inverse metamodelling () as response variables. Hence, , where is calibrated using polynomial OLS regression, and includes the calculation of from . The test set observations were then classified based on , according to the clusters found in the inverse metamodelling. See Additional file 1, sections S1.6 and S1.7 for a more comprehensive description of this methodology, including all predicting equations for test set observations.
QDA was chosen instead of LDA and Naive Bayes classification in this study, since LDA assumes the covariance matrix to be equal for all classes and Naive Bayes classification assumes that the presence of a particular feature of a class is unrelated to the presence of any other feature. In QDA, these assumptions are not made.
In the classical metamodelling, the sensitivity of each state variable to variations in the different parameters was estimated as the product of the second mode Xfactors and the transpose of the second mode Yfactors (also called X and Y loadings), using the optimal number of NPLSR factors for the corresponding NPLSR model. The NPLSR loadings calculated here were not orthogonal. Work is currently in progress to assess the effects of using nonorthogonal loadings to estimate the model sensitivities.
Additional sensitivity analyses
Some of the inputoutput relationships were not well described by the Nway HCPLSR. Additional separate sensitivity analyses were therefore carried out for some of the state variables using 2way second order polynomial HCPLSR with the parameters and their crossterms and second order terms as regressors and the state trajectories as response variables, analogous to the analysis presented in 8. The regressors were meancentred and standardised prior to the HCPLSR, while the state trajectories were only centred. Some of the state trajectories were logarithmised prior to the regression analysis.
The same clusters as in the Nway HCPLSR described above were used. QDA 40 on predicted PLSR Yscores (see Additional file 1, eq. S7d) were used for classification, and all PLS components (PCs) explaining a significant amount of the crossvalidated Yvariance were included. Both in the global and regional regression analyses the optimal number of PLS components to use was found by 10fold crossvalidation. In the regional regression models, the matrices of crossterms and second order terms were deflated with respect to the variation described by the first order terms (in an OLS regression) in order to better separate the effects of nonlinear terms and first order terms. The HCPLSR was carried out in MATLAB® 41 Version 7.13 (R2011b), using inhouse code that can be obtained from the authors upon request.
Method benchmarking
For comparison, the inverse metamodelling was carried out using 2way HCPLSR where the 3way state variable array was unfolded by concatenating the time series for all state variables, as well as by using aggregated outputs representing the state variable trajectories. The following aggregated outputs were derived from the state trajectories: period of oscillation, bottom, peak, timetobottom and timetopeak for each state variable curve (see Additional file 1: Section S4). This resulted in 65 aggregated outputs. Both parameters, state variables and aggregated outputs were meancentred and standardised prior to the HCPLSR.
The number of PLS components to use in the PLSR models was chosen based on the percent explained crossvalidated Yvariance as described in 8 and in the same way as for the Nway HCPLSR, clustering and classification were done on the global Xscores of all PCs explaining a significant amount of the crossvalidation variance. The same settings as described in 8 were used for the HCPLSR, except that QDA 40 (the MATLAB® function "classify" from the Statistics Toolbox™ v7.6) was used for classification as in the Nway HCPLSR. The same number of clusters as in the Nway HCPLSR (6 clusters) was used also in the 2way HCPLSR analyses.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
KT contributed to conception, wrote the MATLAB® code for the Nway HCPLSR pipeline, performed the data analysis and wrote the paper. UGI participated in debugging of the Nway HCPLSR code and in writing of the paper. ABG performed the computational experiments with the mammalian circadian clock model. SWO participated in writing the paper and HM contributed to conception and writing of the paper. All authors read and approved the final manuscript.
Acknowledgements
This study was supported by the National Program for Research in Functional Genomics in Norway (FUGE) (RCN grant no. NFR151924/S10) and by the Norwegian eScience program (eVITA) (RCN grant no. NFR178901/V30). Rasmus Bro is thanked for providing us with the newest version of The Nway Toolbox for MATLAB.
References

Lillacci G, Khammash M: Parameter Estimation and Model Selection in Computational Biology.
PLoS Comput Biol 2010, 6:e1000696. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Saltelli A, Ratto M, Andres T, Campolongo F, Cariboni J, Gatelli D, Saisana M, Tarantola S: Global Sensitivity Analysis: The Primer. Chichester: WileyInterscience; 2008.

Saltelli A, Chan K, Scott EM: Sensitivity Analysis. 1st edition. Chichester: Wiley; 2000.

Santner TJ, Williams BJ, Notz W: The design and analysis of computer experiments. New York, USA: Springer; 2003.

Sarkar AX, Sobie EA: Regression Analysis for Constraining Free Parameters in Electrophysiological Models of Cardiac Cells.
PLoS Comput Biol 2010, 6:e1000914. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Sobie EA: Parameter sensitivity analysis in electrophysiological models using multivariable regression.
Biophys J 2009, 96:12641274. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Tøndel K, Gjuvsland AB, Måge I, Martens H: Screening design for computer experiments: Metamodelling of a deterministic mathematical model of the mammalian circadian clock.
J Chemometr 2010, 24:738747. Publisher Full Text

Tøndel K, Indahl UG, Gjuvsland AB, Vik JO, Hunter P, Omholt SW, Martens H: Hierarchical Clusterbased Partial Least Squares Regression is an efficient tool for metamodelling of nonlinear dynamic models.
BMC Syst Biol 2011, 5:90. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Tøndel K, Vik JO, Martens H, Indahl UG, Smith N, Omholt SW: Hierarchical Multivariate Regressionbased Sensitivity Analysis Reveals Complex Parameter Interaction Patterns in Dynamic Models.

Vik JO, Gjuvsland AB, Li L, Tøndel K, Niederer SA, Smith N, Hunter PJ, Omholt SW: Genotypephenotype map characteristics of an in silico heart cell.

Kleijnen JPC: Design and Analysis of Simulation Experiments. 1st edition. New York, USA: Springer; 2007.

Conti S, O’Hagan A: Bayesian emulation of complex multioutput and dynamic computer models.
J Stat Plan Infer 2010, 140:640651. Publisher Full Text

Campbell K, McKay MD, Williams BJ: Sensitivity analysis when model outputs are functions.
Reliab Eng Syst Safe 2006, 91:14681472. Publisher Full Text

Cacuci DG: Sensitivity and Uncertainty Analysis: Theory v.1: Theory Vol 1. 1st edition. Boca Raton: Chapman and Hall/CRC; 2003.

Cacuci DG, IonescuBujor M, Navon IM: Sensitivity and Uncertainty Analysis: Applications to Largescale Systems Vol 2. 1st edition. Boca Raton: CRC Press; 2005.

Ayyub BM, Klir GJ: Uncertainty modeling and analysis in engineering and the sciences. Boca Raton: Chapman & Hall/CRC; 2006.

Yener B, Acar E, Aguis P, Bennett K, Vandenberg S, Plopper G: Multiway modeling and analysis in stem cell systems biology.
BMC Syst Biol 2008, 2:63. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Conesa A, PratsMontalbán JM, Tarazona S, Nueda MJ, Ferrer A: A multiway approach to data integration in systems biology based on Tucker3 and NPLS.
Chemometr Intell Lab
In Press.

Bro R: Multiway calibration. Multilinear PLS.
J Chemometr 1996, 10:4761. Publisher Full Text

Krishnaiah PR: Multivariate Analysis. New York: Academic Press Inc; 1967.

Wold S, Martens H, Wold H: The multivariate calibration method in chemistry solved by the PLS method. In Lecture notes in Mathematics, Matrix Pencils. Heidelberg: Springer; 1983:286293.

Martens H, Næs T: Multivariate calibration. Chichester, UK: John Wiley and Sons; 1989.

Martens H, Martens M: Multivariate Analysis of Quality: An Introduction. 1st edition. Chichester: Wiley; 2001.

Leloup JC, Goldbeter A: Modeling the mammalian circadian clock: Sensitivity analysis and multiplicity of oscillatory mechanisms.
J Theor Biol 2004, 230:541562. PubMed Abstract  Publisher Full Text

Altinok A, Lévi F, Goldbeter A: Identifying mechanisms of chronotolerance and chronoefficacy for the anticancer drugs 5fluorouracil and oxaliplatin by computational modeling.
Eur J Pharm Sci 2009, 36:2038. PubMed Abstract  Publisher Full Text

Martens H, Måge I, Tøndel K, Isaeva J, Høy M, Sæbø S: Multilevel Binary Replacement (MBR) design for computer experiments in highdimensional nonlinear systems.
J Chemometr 2010, 24:748756. Publisher Full Text

Kalos MH, Whitlock PA: Monte Carlo Methods Volume 1: Basics. 1st edition. New York, USA: Wiley; 1986.

Liu JS: Monte Carlo Strategies in Scientific Computing. New York, USA: Springer; 2008.

Westad F, Martens H: Shift and intensity modeling in spectroscopy–general concept and applications.
Chemometr Intell Lab 1999, 45:361370. Publisher Full Text

Lloyd CM, Halstead MDB, Nielsen PF: CellML: its future, present and past.
Prog Biophys Mol Bio 2004, 85:433450. Publisher Full Text

Cuellar AA, Lloyd CM, Nielsen PF, Bullivant DP, Nickerson DP, Hunter PJ: An Overview of CellML 1.1, a Biological Model Description Language.
SIMULATION 2003, 79:740747. Publisher Full Text

Lloyd CM, Lawson JR, Hunter PJ, Nielsen PF: The CellML Model Repository.
Bioinformatics 2008, 24:21222123. PubMed Abstract  Publisher Full Text

Hindmarsh AC, Brown PN, Grant KE, Lee SL, Serban R, Shumaker DE, Woodward CS: SUNDIALS: Suite of nonlinear and differential/algebraic equation solvers.
ACM Trans Math Softw 2005, 31:363396. Publisher Full Text

Andersson CA, Bro R: The Nway Toolbox for MATLAB.
Chemometr Intell Lab 2000, 52:14. Publisher Full Text

Bezdek JC: Pattern Recognition with Fuzzy Objective Function Algorithms. Norwell: Kluwer Academic Publishers; 1981.

Berget I, Mevik BH, Næs T: New modifications and applications of fuzzy Cmeans methodology.
Comput Stat Data Anal 2008, 52:24032418. Publisher Full Text

Næs T, Isaksson T: Splitting of calibration data by cluster analysis.
J Chemometr 1991, 5:4965. Publisher Full Text

Næs T, Kubberød E, Sivertsen H: Identifying and interpreting market segments using conjoint analysis.
Food Qual Prefer 2001, 12:133143. Publisher Full Text

McLachlan GJ: Discriminant Analysis and Statistical Pattern Recognition. 1st edition. Hoboken: WileyInterscience; 1992.

Hastie T, Tibshirani R, Friedman JH: The Elements of Statistical Learning. New York, US: Springer; 2003.
Corrected edition.

MATLAB®, v. 7.13. 2011. PubMed Abstract

Martens H, Veflingstad S, Plahte E, Martens M, Bertrand D, Omholt S: The genotypephenotype relationship in multicellular patterngenerating models  the neglected role of pattern descriptors.
BMC Syst Biol 2009, 3:87. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Isaeva J, Sæbø S, Wyller JA, Nhek S, Martens H: Fast and comprehensive fitting of complex mathematical models to massive amounts of empirical data.
Chemometr Intell Lab
In press.

Isaeva J, Martens M, Sæbø S, Wyller JA, Martens H: The modelome of line curvature: many nonlinear models approximated by a single bilinear metamodel with verbal profiling.
Physica D 2012, 241:877889. Publisher Full Text

Tenenhaus M, Vinzi VE, Chatelin YM, Lauro C: PLS path modeling.
Comput Stat Data Anal 2005, 48:159205. Publisher Full Text

Pearson K: On lines and planes of closest fit to systems of points in space.
Philos Mag 1901, 2:559572. Publisher Full Text

Jolliffe IT: Principal Component Analysis. 2nd edition. New York, US: Springer; 2002.

Jolliffe IT: A note on the use of principal components in regression.

Hoerl AE, Kennard RW: Ridge regression: biased estimation for nonorthogonal problems.
Technometrics 1970, 12:5567. Publisher Full Text

Tibshirani R: Regression Shrinkage and Selection via the Lasso.

Zou H, Hastie T: Regularization and variable selection via the Elastic Net.
J Roy Stat Soc B 2005, 67:301320. Publisher Full Text

de Jong S: SIMPLS: an alternative approach to partial least squares regression.
Chemometr Intell Lab 1993, 18:251263. Publisher Full Text

Martens H: Nonlinear multivariate dynamics modelled by PLSR. In Proceedings of the 6th International Conference on Partial Least Squares and Related Methods. Beijing: Publishing House of Electronics Industry; 2009:139144.

Berglund A, Wold S: INLR, implicit nonlinear latent variable regression.
J Chemometr 1997, 11:141156. Publisher Full Text

Gath I, Geva AB: Unsupervised optimal fuzzy clustering.
IEEE Trans Pattern Anal Mach Intell 1989, 11:773780. Publisher Full Text

Frigui H, Krishnapuram R: A robust competitive clustering algorithm with applications in computer vision.
IEEE Trans Pattern Anal Mach Intell 1999, 21:450465. Publisher Full Text

Bro R: PARAFAC. Tutorial and applications.
Chemometr Intell Lab 1997, 38:149171. Publisher Full Text