Abstract
Background
Reverse engineering in systems biology entails inference of gene regulatory networks from observational data. This data typically include gene expression measurements of wild type and mutant cells in response to a given stimulus. It has been shown that when more than one type of experiment is used in the network inference process the accuracy is higher. Therefore the development of generally applicable and effective methodologies that embed multiple sources of information in a single computational framework is a worthwhile objective.
Results
This paper presents a new method for network inference, which uses multiobjective optimisation (MOO) to integrate multiple inference methods and experiments. We illustrate the potential of the methodology by combining ODE and correlationbased network inference procedures as well as time course and gene inactivation experiments. Here we show that our methodology is effective for a wide spectrum of data sets and method integration strategies.
Conclusions
The approach we present in this paper is flexible and can be used in any scenario that benefits from integration of multiple sources of information and modelling procedures in the inference process. Moreover, the application of this method to two case studies representative of bacteria and vertebrate systems has shown potential in identifying key regulators of important biological processes.
Background
In the last ten years the development of functional genomics technologies has provided us with the ability to generate quantitative data representing the molecular state of cells and tissues at a genome level [1,2]. These datasets can be in the form of a time series representing the dynamics of gene expression profiles (e.g. mRNA, proteins and metabolites) in response to a given stimulus, such as an environmental perturbation, the effect of a growth factor or an experimentally induced gene deletion. Despite the relatively large amount of information, predicting underlying regulatory networks from observational data is still not trivial and is a matter of intense research [3].
A number of reverseengineering approaches have been proposed. Some of these are designed to infer networks from a compendium of perturbation experiments while others are able to use time course data to develop dynamical models of gene interaction. Bayesian networks have been among the first to be applied to biological problems [4]. They work by inferring probabilistic relationships between variables, can use either time course or steady state data and allow integration of prior knowledge in the model. Correlationbased methods [5,6] compute correlation coefficients between variables to infer the underlying network topology. Statespace models (SSMs) [7,8], and ODEbased methods [9,10], on the other hand use timecourse data to develop dynamic models of gene regulatory networks (GRN). For an extensive overview of these methodologies see: [11,12].
The general validity of the principal of integrating multiple data sources in the reverseengineering process is exemplified by the observation that the best performing methods utilize some degree of integration between different experiments [13]. For example, the top performing method in the third edition of the "Dialogue for Reverse Engineering Assessments and Methods" (DREAM), developed by Yip et al. [14], was based on a combination of a statistical errormodel and ODE modeling to integrate gene knockout (KO) and timecourse experiments. Interestingly, Yip et al. [14] also noted that a relatively simple differential geneexpression analysis, comparing wildtype and mutant strains, was in itself a very good representation of the underlying gene regulatory network. However, not all KO experiments are likely to be equally informative and identifying a priori the most relevant genes is not a trivial task. Moreover, largescale geneinactivation experiments are not a viable option for many nonmodel species.
Therefore, there is the need to expand the repertoire of available network inference tools by developing more methods that allow integration of multiple data sources and have the flexibility to use a wide range of datasets and information. In order to achieve this objective, we set out to develop a computational framework that has the potential to combine different inference methodologies, multiple datasets, as well as any preexisting biological knowledge. We based this approach on an ODE framework combined to a multiobjective optimization (MOO) procedure for parameter estimation. We named this method "NetworkInference with Multi Objective Optimization" (NIMOO).
Methods
The basic network inference framework: Model Equations and parameter estimation of a single objective optimization procedure
Gene interactions in a regulatory network can be modelled using a set of ordinary differential equations [9,10]. In this implementation we have used a linear ODE model where the interaction between genes is additive. In this context, changes in the expression of a given gene depend on a weighted linear sum of the expression of its regulators:
where, x_{i }represents expression level for gene i, b_{i }represents the effect of the external perturbation on gene i, and, N is the number of genes in the dataset. The parameter matrix w is obtained by minimizing the Squared Error (E^{SQE})
The gene regulatory network (GRN) is then inferred from the optimized parameter matrix w. The matrix element w_{ij} indicates the strength of the interaction between genes i and j (with gene j regulating gene i), and, sign (w_{ij}) indicates whether the effect is excitatory (w_{ij }> 0) or inhibitory (w_{ij }< 0)
In our implementation of single objective optimisation (SOO), minimisation of E^{SQE }was achieved using the trustregion method based on the interiorreflective Newton method [15,16]. In this method the minimisation process involves defining a trust region where the objective function SQE can be approximated with a simpler function q. For successive iterations, function q, in conjunction with the Preconditioned Conjugate Gradient Method [16], is used to find a new trust region where the function SQE is lower. The process is terminated when the change in function value is less than a predetermined tolerance (10^{6}).
Parameter estimation using a multiobjective optimization procedure
Multiobjective optimisation (MOO) is based on minimisation of E^{SQE }in conjunction to additional objective functions, E^{object}, which are built as Euclidean distance between the parameter matrix w and objectives O constructed from additional data and/or existing knowledge:
To implement multiobjective optimization we have used the goal attainment method [17,18]. In this method the problem of simultaneously optimizing multiple functions is reduced to the task of standard minimization. A set of goals [J_{1}, J_{2}, ...,J_{m}] and weights [θ_{1}, θ_{2}, ..., θ_{m}] are assigned to the objective functions F = [F_{1}, F_{2}, ..., F_{m}], where, F_{1 }= E^{SQE}, F_{2 }= E^{Object }etc. Also, a scalar dummy variable γ is introduced so that the aim is to minimize for γ such that
The term θ_{k}γ introduces flexibility in the degree of goals attained. Also, the weight factor θ_{k }can be used to assign relative importance to the objectives: Thus, from Equation 4, θ_{k }= 0 implies hard goal for the corresponding objective function F_{k}. For all results presented in this paper, unless mentioned otherwise, the goal and weight corresponding to the objective SQE were set to 0.
Overall strategy for the development of a MOObased inference method
Figure 1 shows in a schematic format the different procedures that are part of the NIMOO framework and their relationships with the experimental datasets.
Figure 1. Overview of the NIMOO methodology. The figure shows in a schematic format the relationships between type of experiment, methods used to build the O_{ij }objectives and the MOO procedures. Details are given in the Methods section.
The principle behind NIMOO, as detailed in the above sections, is to infer the gene regulatory matrix (GRM) w by minimizing the E^{SQE }for the ODE system in conjunction with additional objective functions, E^{object}, which represent the distance between the parameter matrix w_{ij }and objectives O_{ij }constructed from additional information.
In principal, objectives O_{ij }can be constructed from any data available on the underlying regulatory network. In this paper, we focused on two possible scenarios.
In the first case, we considered the possibility that MOO might be used to integrate two different networkinference procedures, for example applied to independent replicates of a timecourse experiment. In this implementation we used timedelayed Spearman rankcorrelation [6] to develop a matrix O_{ij }(Equation 3) representing the degree of statistical correlation from any pair of genes within a set time delay interval (Figure 1, objective DSp).
Alternatively, O_{ij }can be built from the results of gene inactivation experiments. We reasoned that these experiments might fall in at least two categories. In the first case the gene is deleted at some stage of the lifecycle of the organism so that gene expression measurements can only be acquired after the new steady state has been reached. This can be easily achieved, by a plethora of gene knockouts (KOs) methodologies in model systems ranging from E. coli to mice. Alternatively, gene inactivation could be achieved by using biochemical inhibitors or RNA interference. In this scenario mRNA expression can be monitored at different time points following intervention. In the context of MOO both scenarios lead to a geneexpression matrices where rows are represented by genes and columns by gene KO experiments. From each of these matrices an objective O_{ij }can be computed to represent the relationship between every gene pair (Figure 1, objectives Tc, Tr are derived from expression data at a given timepoint shortly after gene inactivation whereas objectives Sc and Sr derive from expression data at a single time point at steady state; for further details on how to compute O_{ij }see sections below).
Different procedures may be used in combination using an ensemble approach; in this paper we describe the results of combining MOOTr with MOOTc (Figure 1, MOOT_{ens}) and MOOSr with MOOSc (Figure 1, MOOS_{ens}).
All MOO procedures developed within NIMOO have been compared with the accuracy of an ODE model developed by minimizing E^{SQE}, a procedure that we called singleobjective optimization (Figure 1, SOO).
The paragraphs below describe in detail how the different objectives were computed.
Construction of a timedelay correlation matrix (objective DSp)
To test the potential of MOO to combine different network inference approaches we choose to build an objective based on timedelayed Spearman Rankcorrelation [6] (Figure 1, objective DSp). DSp was computed as follows: For each gene pair (i,j), the expression profile of gene i is shifted along the time axis with respect to that of gene j. The Spearman Rankcorrelation coefficient is calculated for varying time delays and the largest coefficient from this list forms the (i,j)^{th }element of the delayed Spearman Rankcorrelation matrix dSRC. We also construct a time delay matrix dt from the corresponding values. The objective DSp is then obtained from dSRC by equating all dSRC(i,j) = 0 for which dt(i,j) < t_{o}, so that only gene pairs with delay of t_{o }or more are considered.
Construction of a gene KO matrix: a ratiosbased procedure
The objectives Tr and Sr (Figure 1) were constructed by computing the ratios between the expressions of each gene i in the mutant j and the expression of gene i in the wild type. The expression of gene i is taken either at a given time point t_{p }after inactivation (Tr) or at the steady state (Sr). We selected t_{p }as the time point where the largest numbers of genes have the highest derivative in absolute value. We found that this heuristic rule allowed us to identify a value of t_{p}, which often (8 out of 9 networks tested) corresponded to the highest AUC values within a tolerance of 5% (Figure S1).
Construction of a gene KO matrix: a correlationbased procedure
The objectives Tc and Sc (Figure 1) were computed by calculating the correlation between the expressions of every pair of genes (gene i, gene j) across all KO samples. Similarly to the ratio procedure, the Tc matrix was built using the measure of gene i expression at time t_{p}, where t_{p }was chosen as detailed above.
Combining MOO procedures using an ensemble approach
The ratio and correlation methods were integrated to produce a single model by using an ensemble approach. Within this procedure, a GRM w_{a }was constructed so that w_{a}(i,j) = w_{r }(i,j) and sign(w_{a}(i,j)) = sign(w_{c }(i,j); Where w_{r }and w_{c }represent two GRMs obtained from the ratio and correlation procedures, respectively. As exemplified in Figure 1, MOOS_{ens }represent the result of combining the MOOSr and MOOSc procedures whereas MOOT_{ens}, is the result of combining the MOOTr and MOOTc procedures.
Simulated data
The validation study has been performed using the java application GeneNetWeaver (GNW) http://gnw.sourceforge.net webcite[19]. This network generator has been used as part of the DREAM Initiative [20]. It builds synthetic networks by specifying a biologically relevant topology and implementing an ODE model to generate synthetic data. GNW grows the initial topology from a seed node (selected randomly) in a Source Gene Network (E. Coli in this application) by progressively adding a randomly selected neighbouring node till the desired size is reached. Each model can be used to generate simulated time course gene expression data either with the intact network or following deletion of one of the nodes.
We tested the performance of MOO in conjunction with the objectives DSp (MOOSp), Tc (MOOTc), Tr (MOOTr), Sc (MOOSc) and Sr (MOOSr). Each of these procedures was applied to ten independent networks of size 20, 35 and 50 genes. The gene KOs dataset associated with every network was build by generating synthetic data after the stepwise deletion of each gene in the network.
All GNWgenerated networkmodels were used to simulate timeseries datasets (26 time points, t_max = 200) as well as steadystate data for all KOs.
Data processing and optimisation procedure
Noise was added to the simulated data (5% of the signal) to mimic variability in experimental replication. Polynomial fitting was used for interpolation of the timeseries data [21] after averaging three experimental replicates. 200 interpolated, equally spaced timepoints were then computed and used as input of the MOO procedures. Optimisation of the matrix w was initiated from a randomly generated matrix. In order to test the reproducibility of the optimization methods, fifty independent runs of optimization from each MOO procedure were carried out for a subset of the GNW networks. We found that the AUCs values were always within 0.2%.
Network inference accuracy
In order to evaluate various MOO methods we compared the inferred generegulatory matrix w with the true network topologies. The accuracy of the inference process for undirected networks was quantified by using the area under curve (AUC) of a ROC plot (False Positive Rate (FPR) versus True Positive Rate (TPA)). For directsigned networks the AUC was computed by plotting TNR (True Negative Rate) versus TPR as described in [10]. The distribution of AUC values for boxplots and these represented each batch of networks were compared when appropriate using a Wilcoxon's nonparametric rank sum test [22].
Modelling in vivo tumour development
In order to assess the potential of NIMOO to model true biological systems we have used two microarray datasets generated in our laboratory.
We first used an in vivo model of glioblastoma development to test the MOOSp procedure. In this experimental model [23] U87 human glioma cells (ATCC, USA) were maintained in DMEM with 10% FBS, antibiotics, and lglutamine. Fertilized chicken eggs (Gallus gallus; E.A.R.L. Morizeau, Dangers, France) were incubated at 37°C and 80% humidified atmosphere. On day 4 of development, a window was made in the eggshell after punctuating the air chamber and sealed with Durapore tape. On embryonic day 10, a plastic ring was placed on the embryo chorioallantoic membrane (CAM), and 3 million to 5 million U87 cells in 20 μl of medium were deposited after gentle laceration of surface. Implantation experiments were performed in triplicate, and, from day 11 to day 15, mRNA from the growing tumour was extracted every 12 hours using the standard protocol provided in the Qiagen RNeasy kit. Labelling was performed using protocol V5.7 provided in Agilent's Quick Amp OneColour labelling kit and hybridized onto human Agilent microarrays (AMADID:014850). Data were normalized using quantile normalization and genes differentially expressed over time were identified using the statistical methodology SAM [24]. 58 genes were selected among the most differentially expressed across the time course (Table S3) and used as input of the modelling procedure.
Modelling E. coli acid stress
In order to fully test the potential of MOO methodology we have applied the MOOS_{ens }procedure to model the E. coli response to mild acid conditions, a stress signal relevant to pathogenesis in diarrheagenic E. coli strains [25]. The procedure was used to integrate two microarray datasets representing the dynamic response of the E. coli MG1655 strain to acid exposure (pH = 5.5) and a gene KO experiment performed in the related strain E. coli BW 25113, representing the transcriptional state of strains mutated in the 26 most differentially expressed genes. In this analysis we aim to reverse engineer the interactions between these 26 genes. The timecourse analysis of the response of the E. coli strain MG1655 to acid exposure was performed maintaining a constant cell number (OD_{600 nm }= 2) using a media replenishment procedure. Samples were collected every 5 minutes for 1 hour in three replicated experiments. Mutant strains representing 26 of the most differentially regulated genes over time were selected from the BW25113 KEIO mutant collection [26] and analysed using microarrays as described below. Experiments were performed exactly in the same conditions as the MG1655 strain but only control and 15 minutes in acid were processed for microarray analysis.
Microarray analysis was performed as follows. 10 ml of cultures were samples at the different time points and stabilized by adding a solution of phenolethanol (final concentration of 19% phenol and 1% ethanol). Cell pellets were recovered by centrifugation and stored at 80°C. mRNA was extracted using the standard protocol provided in the Quiagen RNEasy kit (QUIAGEN, USA) and labelled with Cy5 labelled dCTP (Amersham Biosciences, USA) using the CyScribe PostLabelling Kit (Amersham Biosciences, USA). Probes were then purified using CyScribe purification Kit (Amersham Biosciences, USA) according to the manufacturer's instructions. Labelled probes (80 pmol) were then hybridized on Operon E. coli Ultra GAPS microarray slides (Corning, USA). Slides were washed in AdvaWash automated washing station (Adavlytix, USA) and scanned with the ScanArray^{® }GX (PerkinElmer^{®}, USA), using the ScanArray^{® }software. Data were normalized using quantile normalization and genes differentially expressed over time were identified using the statistical methodology SAM [25]. We modelled the E. coli datasets by using the ensemble approach integrating both correlation and ratio procedures as described above. In order to generate comparable sparse networks we thresholded the connectivity matrix w to obtain networks with same number of genes in the networks (25).
Method implementation and datasets availability
NIMOO was implemented in MATLAB [27]. Code and datasets are available at this URL: http://biptemp.bham.ac.uk/NI_MOO/NI_MOO.zip webcite.
Results
Combining different inference methodologies within the MOO framework improves the accuracy of network reconstruction
The first scenario we considered involved combining two network inference methods to model replicated time course experiments. To achieve this, we used delayed Spearman Rankcorrelation [6] to build the objective O_{ij }(Equation 3) for MOO.
We discovered that the simple timedelayed correlation matrix DSp (Figure 1) was more effective than SOO to reverse engineer undirected networks of size 20 and 35 (up to 10% increase, p < 0.05) (Figure 2A, C and Table 1). The MOODSp procedure was always more effective than SOO (up to 11% increase, p < 10^{3}) (Figure 2A, C, D and Table 1) and gave higher AUC values than the simple DSp matrix for networks of size 50 (8% increase, p < 0.05) (Figure 2D and Table 1). With directedsigned networks the dSP matrix was more effective than SOO although p values were borderline except for the larger 50 genes network size (7% increase, p < 0.01) (Figure 2B, D, F and Table 1).
Figure 2. Distribution of AUC values for the MOOSp procedure. Boxplots representing the distribution of AUC values for 20, 35 and 50gene networks. Accuracy of GRN reconstruction for both undirected (panels A, C and E) and directedsigned (panels B, D and F) networks is given for the SOO, DSp and MOOdSp procedures. p values are indicated in red when significant (α = 0.05). Borderline p values and indicated in black (α = 0.2).
Table 1. Accuracy of GRN inference with MOOdSp
Overall, we can conclude that in the event that only replicated timecourse experiments are available, a situation which is not uncommon, the integration between two methodologies can lead to a dynamical model with better accuracy than one solely based on a SOO procedure.
Integrating timecourse and gene inactivation experiments within the MOO framework: A ratiobased procedure
Having shown that MOO is an effective approach to combine different networkinference methodologies we set to test whether it may also provide a solution to integrate timecourse and geneinactivation experiments.
We initially approached this challenge by applying the MOOSr and MOOTr procedures to simulated data, representing gene expression in KO experiments either at the steady state or at a single time point t_{p }after inactivation. We discovered that AUC could vary considerably (up to 25%) depending on the value of t_{p }(Figure S1 in Additional File 1), suggesting that the choice of the right timepoint was an important factor. We also observed that the time point at which the largest number of gene expression profiles had the highest derivative often lead to higher AUC values within a tolerance of 5% (Figure S1 in Additional File 1). Although this has not to be considered a criterion to identify the optimal t_{p }value we believe it represents a useful guideline. MOO substantially improved the prediction of undirected networks, with all network sizes tested. The largest gain we observed was a 20% increase respect to SOO with 35gene networks, with the MOOSr procedure (p < 10^{3}) (Figures 3A, C, E, Table 1 and Table 2). Overall, the MOOSr procedure also gave consistently higher AUC values than MOOTr although p values were borderline significant (p value = 0.16). Combining Tr with Sr in the MOO procedure (MOO(Tr+Sr)) did not further improve the accuracy of network inference (Figures 3A, C and 3E and Table 2). For directsigned networks, only MOOTr gave consistent higher AUC values respect to SOO although p values were borderline significant (p value = 0.12) (Figures 3B, D, F and Table 2).
Additional File 1. A method comparison study and additional tables and figures as detailed in the body of the paper.
Format: PDF Size: 232KB Download file
This file can be viewed with: Adobe Acrobat Reader
Figure 3. Distribution of AUC values for ratiobased inference procedures. Boxplots representing the distribution of AUC values for 20, 35 and 50gene networks. Accuracy of GRN reconstruction for both undirected (panels A, C and E) and directedsigned (panels B, D and F) networks is given for the SOO, MOOTr, MOOSr, MOO(Tr+Sr) procedures. p values are indicated in red when significant (α = 0.05). Borderline p values and indicated in black (α = 0.2).
Table 2. Accuracy of GRN inference by integrating gene KO datasets in the MOO framework
Integrating time course and gene inactivation experiments within the MOO framework: A correlationbased procedure
In this section, we describe the results of the correlationbased procedure to construct MOO objectives from mutant gene expression data. As detailed in the methods section, this approach works by computing the correlation between the expression profiles of every pair of genes across the mutant samples.
We discovered that inference accuracy of the ratio and correlation methods had opposite trends with respect to undirected and directedsigned networks. More precisely, the correlationbased objectives gave higher AUC values for directsigned networks and lower AUC values than the ratio method for undirected networks. The differential in AUC values between the two methods was statistically significant for both undirected and directedsigned networks (up to 12% with undirected networks and up to 37% with directedsigned networks, p < 0.05 and p < 10^{3 }respectively) (Figure 4 and Table 2) Interestingly, the largest differential corresponded to the directedsigned larger 50gene networks (37%, p < 10^{3}).
Figure 4. Comparison of AUC values for ratio and correlation based inference procedures. In this figure the distributions of AUC values for ratio and correlationbased methods are represented on the same plots for comparison. Accuracy of GRN reconstruction for both undirected (panels A, C and E) and directedsigned (panels B, D and F) networks is given for the SOO, MOOTc, MOOTr, MOOSc and MOOSr procedures. Distribution of AUC value for ratiobased procedures are represented by notched boxplots whereas these for correlationsbased procedures are represented by rectangular boxplots. p values are indicated in red when significant (α = 0.05). Borderline p values and indicated in black (α = 0.2).
We discovered that the method of correlation is efficient even when a partial dataset is available. Figure 5 shows the results of the analysis for a 50gene network when KO data is available for 50% of the genes. We did not observe any increase in inference accuracy for undirected networks with the MOOSc and MOOTc procedures (Figure 5A and Table 3). However, a considerable increase in accuracy was detected when inferring directedsigned networks (Figure 5A and Table 3, up to 27% improvement versus a SOO approach, p < 10^{3}).
Figure 5. MOO with incomplete gene KO datasets. Boxplots representing the distribution of average AUC values for 50 gene undirected (panel A) and directedsigned (panel B) networks. Accuracy of GRN reconstruction is given for the SOO, MOOTc_{50% }and MOOSc_{50% }procedures. p values are indicated in red when significant (α = 0.05). Borderline p values and indicated in black (α = 0.2).
Table 3. Accuracy of GRN inference with partial coverage gene KO datasets
Combining ratio and correlationbased procedures further improve inference acuracy
Since we have shown that correlation and ratiobased methods provide complementary information, we decided to test whether combining them using an ensemble approach could result in an even higher accuracy of the network inference process.
This approach was successful. AUC values for the ensemble models built from combining the MOOTr and MOOTc approaches (MOOT_{ens}) were comparable to the best performing MOOTc models (Figure 6A, C and 6E and Table 2) whereas models built from combining MOOSr and MOOSc (MOOS_{ens}) yield even higher AUC values than MOOSc models for the larger 35 and 50gene networks (15% and 10% increased AUC values, p < 0.05) (Figure 6D and 6F and Table 2).
Figure 6. Combining MOO procedures. Boxplots representing the distribution of AUC values for 20, 35 and 50gene networks obtained by the application of ensemble approach combining correlation and ratiobased MOO procedures. Accuracy of GRN reconstruction for directedsigned networks is given for the MOOTr, MOOTc, MOOT_{ens }procedures (panels A, C and E) and for the MOOSr, MOOSc, MOOS_{ens }procedures (panels B, D and F). p values are indicated in red when significant (α = 0.05). Borderline p values and indicated in black (α = 0.2). p values are indicated in red when significant (α = 0.05). Borderline p values and indicated in black (α = 0.2).
Overall, MOOS_{ens }was the best performing procedure in inferring directedsigned networks. Therefore, we concluded that if both timecourse and KO data are available for a subset of genes of interest, MOOS_{ens }may be the procedure of choice.
Modelling biological systems with NIMOO
In order to test the validity of NIMOO to model real biological systems, we have analysed two datasets generated in our own laboratory. The first was a replicated geneexpressionprofiling timecourse experiment representing a model of in vivo glioblastoma development. A subset of these data were modelled with the MOOSp procedure. The second dataset included a time course representing the transcriptional response of E. coli during acid adaptation and the expression profiling of a compendium of 26 mutants exposed to acid. Because of the availability of both timecourse and mutant steadystate data we applied the MOOS_{ens }procedure.
Modelling in vivo tumour development
Our model identified a network organized around three main hubs (NFE2L2, ERBB2 and HSPB1) (Figure 7B). NFE2L2 (Nuclear factor E2 p45related factor 2; commonly called Nrf2) is a transcription factor that binds to the cisregulatory, antioxidant response element and transcriptionally upregulate an environmental stress response program [28]. Nrf2 is critical for protection against a wide range of inflammatory conditions, hyperoxia, fibrosis, hepatotoxicity, carcinogenesis, neurodegeneration, cardiovascular disease and aging [29]. Inactivation of Nrf2 in some cancers, promote tumorigenicity and resistance to an array of chemotherapeutic compounds [30]. The biological role of Nrf2 as a master regulator of a crucial response is fully reflected in our model that identifies Nrf2 as the most upstream network node with the highest number of connections. Note that without the application of the MOO methodology this network feature was not inferred (Figure 7A).
Figure 7. Inference of of biologically relevant networks. Gene regulatory networks obtained from the glioblastoma (panels A, B) and E. coli acid stress datasets (panels C and D). Networks obtained from SOO (panels A and C) and MOO (panels B and D) procedures are shown.
The other network hubs are also known important signalling factors. ERBB2 is a gene that encode for a member of the epidermal growth factor (EGF) receptor family of receptor tyrosine kinases. This protein has no ligandbinding domain but it does bind tightly to other ligandbound EGF receptor family members enhancing kinasemediated activation of downstream signaling pathways. HSPB1has a cytoprotective function and support of cell survival under stress conditions. This gene is also involved in the apoptotic signalling pathway and interacts with actin and intermediate filaments to protect actin filaments from fragmentation. It also preserves the focal contacts fixed at the cell membrane. These observations support the hypothesis that Nrf2 sits high in the hierarchy of events leading to the development of a fully vascularized tumour.
Reverse engineering an E coli acid response network
Both single objective (SOO) and multiobjective (MOO) optimization methods were applied to investigate regulatory networks representative of E. coli acid adaptation. In order to test the full potential of the NIMOO methodology, we used both timecourse and geneinactivation experiments.
The networks identified using either the time course data (SOO procedure) or the combination of time course and gene KO profiles (MOO procedure) are represented in Figure 7C and 7F respectively. In order to validate the model, we have compared our results with the gene interactions known in literature or extracted from the REGULON DB database [31].
The SOO method identified a number of gene connections that were known to play a role in acid adaptation. These included the interaction between two of the glutamatedependent acidstress response genes gadW and gadX [32]. However, in this model the directions of the gene interactions are mostly incorrect and not representative of the known E. coli acid response mechanisms. For example, the coding glutamate decarboxylase gene gadB is unlikely to be involved in the modulation of the twocomponent system PhoP/PhoQ.
On the contrary, the gene regulatory network derived from the application of the MOO procedure (Figure 7D) includes several gene interactions known to be important in acid adaptation.
A key interaction involved the twocomponent system PhoP/PhoQ [33]. This complex is a known upstream regulator of acid adaptation. The model we developed (Figure 7C) reflects the upstream regulatory role of this complex and correctly predicted its influence in the regulation of the acid resistance genes gadW and hdeA [34]. The network also shows the known negative interaction between gadX and gadW [32] and the inhibition of the crp gene by fis [35,36]. Another validated interaction found by the MOO procedure is represented by the link between the histonelike protein hns and cadA [37]. Our model shows that hns may activate the expression of cadA. The connection is consistent with the literature, however, in GNB7145K hns mutants Shi et al. [37] have shown that hns acts as a repressor.
Some of the interactions in the network represent potentially novel regulatory mechanisms in E. coli response to acid stress. An example is the hypothesis that phoP may be involved in the activation of narX, a nitrite/nitrate sensor protein. This is a gene that is part of a twocomponent system regulating a component of anaerobic metabolism, which is a function known to be regulated during acid response [38].
Discussion
In this paper we presented the gene regulatory network inference method "Network Inference with Multi Objective Optimization" (NIMOO).
When tested on simulated and "real world" datasets, NIMOO performs well even if incomplete data are available. The main feature of this methodology is that it can be used to develop dynamical models of gene regulatory networks integrating multiple data sources and combining any existing network inference methodology to identifying the network topology.
Although other methods have the potential to include prior knowledge in the inference process the ability to plugin different inference methods in the same modelling procedure is so far a unique feature of NIMOO. In this paper we tested this concept and demonstrated that the approach can be successful even if a relatively simple procedure is integrated in the ODE model parameter estimation. However, a more comprehensive testing may be required to explore the full potential of this approach, for example combining more advanced methods in the MOO optimization procedure.
In terms of data integration, we have mainly focused on gene KO experiments. However, some of the procedures we have tested (e.g. MOOTc and MOOSc) are directly applicable to other types of experimental data. For example, a compendium of environmental and growth factorinduced perturbations could be employed to develop an objective compatible with these procedures. Such objectives could be for example computed by using the information theoretical approach ARACNE [5].
Moreover, additional information, for instance the confidence level in transcription factor binding consensus sequences in a gene's promoter region could also be incorporated within the optimisation process. More generally, in the event that multiple objectives are used within a MOO procedure, each function's relative importance could be weighted by adjusting the optimization parameters, such as weights θ_{k }(Equation 4). Additionally, any definite qualitative knowledge of the presence or absence of gene connections may be used as a constraint on the inferred generegulatory matrix (hard prior).
Because of the ability to integrate different methods the user can very easily customize NIMOO. In this respect, NIMOO is an integration framework rather than a specific method. Comparing its performance with existing methods is therefore not necessarily consequential. However, we have performed an initial assessment comparing some implementations of NIMOO to other methods. For example, all NIMOO procedures outperformed TSNI [10] in inferring undirected networks (Table S1 in Additional File 1) and the MOOS_{ens }and MOOT_{ens }performed better with both undirected and directsigned networks (Table S1 in Additional File 1). Moreover, NIMOO performed in a comparable manner to the method developed by Yip et al. [15], which won the DREAM3 competition http://wiki.c2b2.columbia.edu/dream/index.php/ webcite (Table S2 in Additional File 1).
So far the application of multiobjective optimization methods to inferring biological networks has been limited: For example, van Someren et al. [39] and FomekongNanfack et al. [40] used MOO to incorporate multiple constraints arising from the requirement of stability and robustness of gene networks, and, Liu and Wang [41] have used MOO to infer biochemical networks by simultaneously minimizing for the concentration error and the slope error. However, in all these cases a single data set and a single reverse engineering criterion were used.
Conclusions
The networkinference framework NIMOO is flexible and can be used in many different scenarios, even when available information is incomplete. The application of NIMOO to biological datasets representing two different "real world" scenarios produced very interesting results. The analysis of the experimental datasets illustrated that inclusion of additional objectives from the same dataset could significantly improve our ability to identify key regulators of relevant biological processes.
Authors' contributions
RG implemented the method, performed the validation and contributed to the writing of manuscript. PA contributed with the processing and analysis of the experimental data. AS and SD performed the microarray experiments. AB and RB designed the tumor implantation experiments and AB performed the experiment. FF designed the approach and contributed to writing the paper. All authors read and approved the final manuscript.
Acknowledgements and Funding
The work described in this paper was funded by the CRUK grant C8504/A9488 and partially funded by the BBSRC grant BBC5151041. AS is a recipient of a Darwin Trust PhD fellowship and PA is a recipient of a BBSRC PhD studentship. We would like to thank Victor Trevino for useful comments on the manuscript.
References

Fodor SP, Rava RP, Huang XC, Pease AC, Holmes CP, Adams CL: Multiplexed biochemical assays with biological chips.
Nature 1993, 364:555556. PubMed Abstract  Publisher Full Text

Schena M, Shalon D, Davis RW, Brown PO: Quantitative monitoring of gene expression patterns with a complementary DNA microarray.
Science 1995, 270:467470. PubMed Abstract  Publisher Full Text

Stolovitzky G, Monroe D, Califano A: Dialogue on reverseengineering assessment and methods: the DREAM of highthroughput pathway inference.
Ann N Y Acad Sci 2007, 1115:122. PubMed Abstract  Publisher Full Text

Yu J, Smith VA, Wang PP, Hartemink AJ, Jarvis ED: Advances to Bayesian network inference for generating causal networks from observational biological data.
Bioinformatics 2004, 20:35943603. PubMed Abstract  Publisher Full Text

Basso K, Margolin AA, Stolovitzky G, Klein U, DallaFavera R, Califano A: Reverse engineering of regulatory networks in human B cells.
Nature Genetics 2005, 37:382390. PubMed Abstract  Publisher Full Text

Schmitt WA jr, Raab RM, Stephanopoulos G: Elucidation of gene interaction networks through timelagged correlation analysis of transcriptional data.
Genome Research 2004, 14:165463. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Rangel C, Angus J, Ghahramani Z, Lioumi M, Sotheran E, Gaiba A, Wild DL, Falciani F: Modeling Tcell activation using gene expression profiling and statespace models.
Bioinformatics 2004, 20:136172. PubMed Abstract  Publisher Full Text

Hirose O, Yoshida R, Imoto S, Yamaguchi R, Higuchi T, CharnockJones DS, Print C, Miyano S: Statistical inference of transcriptional modulebased gene networks from time course gene expression profiles by using state space models.
Bioinformatics 2008, 24:932942. PubMed Abstract  Publisher Full Text

Guthke R, Möller U, Hoffmann M, Thies F, Töpfer S: Dynamic network reconstruction from gene expression data applied to immune response during bacterial infection.
Bioinformatics 2005, 21:16261634. PubMed Abstract  Publisher Full Text

Bansal M, Della Gatta G, di Bernardo D: Inference of gene regulatory networks and compound mode of action from time course gene expression profiles.
Bioinformatics 2006, 22:815822. PubMed Abstract  Publisher Full Text

Bansal M, Belcastro V, AmbesiImpiombato A, di Bernardo D: How to infer gene networks from expression profiles.

Hecker M, Lambeck S, Toepfer S, van Someren E, Guthke R: Gene regulatory network inference: Data integration in dynamic models: A review.
Biosystems 2009, 96:86103. PubMed Abstract  Publisher Full Text

Marbach D, Prill RJ, Schaffter T, Mattiussia C, Floreanoa D, Stolovitzkyc G: Revealing strengths and weaknesses of methods for gene network inference.
PNAS 2010, 107(14):628691. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Yip KY, Alexander RP, Yan KK, Gerstein M: Improved Reconstruction of In Silico Gene Regulatory Networks by Integrating Knockout and Perturbation Data.
PLoS ONE 2010, 5(1):e8121. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Coleman TF, Li Y: On the Convergence of Reflective Newton Methods for Large Scale Nonlinear Minimization Subject to Bounds.
Mathematical Programming 1994, 67(2):189224. Publisher Full Text

Coleman TF, Li Y: An Interior, Trust Region Approach for Nonlinear Minimization Subject to Bounds.
SIAM Journal on Optimization 1996, 6:418445. Publisher Full Text

Gembicki FW: Vector Optimization for Control with Performance and Parameter Sensitivity Indices. In Ph.D. Dissertation. Case Western Reserve Univ., Cleveland, OH; 1974.

Gembicki F, Haimes Y: Approach to Performance and Sensitivity Multiobjective Optimization: The Goal Attainment Method.
IEEE Transactions on Automatic Control 1975, 20:769. Publisher Full Text

Marbach D, Schaffter T, Mattiussi C, Floreano D: Generating Realistic In Silico Gene Networks for Performance Assessment of Reverse Engineering Methods.
Journal of Computational Biology 2009, 16:229. PubMed Abstract  Publisher Full Text

Prill RJ, Marbach D, SaezRodriguez J, Sorger PK, Alexopoulos LG, Xue X, Clarke ND, AltanBonnet G, Stolovitzky G: Towards a rigorous assessment of systems biology models: the DREAM3 challenges.
PLoS One 2010, 5(2):e9202. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Atkinson KendellA: An Introduction to Numerical Analysis. Volume Chapter 3. 2nd edition. John Wiley and Sons; 1988.

Wilcoxon F: Individual comparisons by ranking methods.
Biometrics 1945, 1:8083. Publisher Full Text

Hagedorn M, Javerzat S, Gilges D, Meyre A, de Lafarge B, Eichmann A, Bikfalvi A: Accessing key steps of human tumor progression in vivo by using an avian embryo model.
PNAS 2005, 102(5):16431648. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response.
PNAS 2001, 98(9):511621. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Foster JW: Escherichia coli acid resistance: tales of an amateur acidophile.
Nat Rev Microbiol 2004, 2(11):898907. PubMed Abstract  Publisher Full Text

Baba T, Ara T, Hasegawa M, Takai Y, Okumura Y, Baba M, Datsenko KA, Tomita M, Wanner BL, Mori H: Construction of Escherichia coli K12 inframe, singlegene knockout mutants: the Keio collection.
Mol Syst Biol 2006., 2
2006 0008
Publisher Full Text 
MATLAB version 7.10.0 (R2010a) The MathWorks Inc. Natick, Massachusetts;

Kensler TW, Wakabayashi N, Biswal S: Cell survival responses to environmental stresses via the Keap1Nrf2ARE pathway.
Annu Rev Pharmacol Toxicol 2007, 47:89116. PubMed Abstract  Publisher Full Text

Giudice A, Arra C, Turco MC: Review of molecular mechanisms involved in the activation of the Nrf2ARE signaling pathway by chemopreventive agents.
Methods Mol Biol 2010, 647:3774. PubMed Abstract  Publisher Full Text

Kensler TW, Wakabayashi N: Nrf2: friend or foe for chemoprevention?
Carcinogenesis 31(1):9099. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

GamaCastro S, Salgado H, PeraltaGil M, SantosZavaleta A, MuñizRascado L, Solano Lira H, JimenezJacinto V, Weiss V, GarcíaSotelo JS, LópezFuentes A, PorrónSotelo L, AlquiciraHernández S, MedinaRivera A, MartínezFlores I, AlquiciraHernández K, MartínezAdame R, BonavidesMartínez C, MirandaRíos J, Huerta AM, MendozaVargas A, ColladoTorres L, Taboada B, VegaAlvarado L, Olvera M, Olvera L, Grande R, Morett E, ColladoVides J: RegulonDB version 7: transcriptional regulation of Escherichia coli K12 integrated within genetic response units (Gensor Units).

Ma Z, Richard H, Tucker DL, Conway T, Foster JW: Collaborative regulation of Escherichia coli glutamatedependent acid resistance by two AraClike regulators, GadX and GadW (YhiW).
J Bacteriol 2002, 184(24):700112. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Groisman EA: The pleiotropic twocomponent regulatory system PhoPPhoQ.
J Bacteriol 2001, 183(6):183542. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Eguchi Y, Itou J, Yamane M, Demizu R, Yamato F, Okada A, Mori H, Kato A, Utsumi R: B1500, a small membrane protein, connects the twocomponent systems EvgS/EvgA and PhoQ/PhoP in Escherichia coli.
PNAS 2007, 104(47):187127. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Kolb A, Igarashi K, Ishihama A, Lavigne M, Buckle M, Buc H: E. coli RNA polymerase, deleted in the Cterminal part of its alphasubunit, interacts differently with the cAMPCRP complex at the lacP1 and at the galP1 promoter.
Nucleic Acids Res 1993, 21(2):31926. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

GonzálezGil G, Kahmann R, Muskhelishvili G: Regulation of crp transcription by oscillation between distinct nucleoprotein complexes.
EMBO J 1998, 17(10):287785. PubMed Abstract  PubMed Central Full Text

Shi X, Waasdorp BC, Bennett GN: Modulation of acidinduced amino acid decarboxylase gene expression by hns in Escherichia coli.
J Bacteriol 1993, 175(4):11826. PubMed Abstract  PubMed Central Full Text

Stewart V: Nitrate regulation of anaerobic respiratory gene expression in Escherichia coli.
Molecular Microbiology 1993, 9:425434. PubMed Abstract  Publisher Full Text

Van Someren EP, Wessels LFA, Backer E, Reinders MJT: Multicriterion optimization for genetic network modeling.
Signal Processing 2003, 83:763775. Publisher Full Text

FomekongNanfack Y, Postma M, Kaandorp JA: Inferring Drosophila gap gene regulatory network: a parameter sensitivity and perturbation analysis.
BMC Systems Biology 2009, 3:94. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Liu , Wang : Inference of biochemical models in Ssystem using multiobjective optimization approach.
Bioinformatics 2008, 24:10851092. PubMed Abstract  Publisher Full Text