T.J. Watson Research Center, Yorktown Heights, New York, USA

European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK

Autodesk Research, San Francisco, CA, USA

Department of Bioengineering, University of Washington, William H. Foege Building, Box 355061, Seattle, WA 98195-5061, USA

Department of Mathematics, MIT, Cambridge, Massachusetts, USA

BIOSS Centre for Biological Signalling Studies, University of Freiburg, Schänzlestr. 18, 79104 Freiburg, Germany

Merrimack Pharmaceuticals, One Kendall Square, Suite B7201, Cambridge, MA 02139, USA

Physics Department, University of Freiburg, Hermann-Herder-Str. 3, 79104 Freiburg, Germany

Abstract

Background

Accurate estimation of parameters of biochemical models is required to characterize the dynamics of molecular processes. This problem is intimately linked to identifying the most informative experiments for accomplishing such tasks. While significant progress has been made, effective experimental strategies for parameter identification and for distinguishing among alternative network topologies remain unclear. We approached these questions in an unbiased manner using a unique community-based approach in the context of the DREAM initiative (Dialogue for Reverse Engineering Assessment of Methods). We created an *in silico* gene network challenge for this purpose.

Results

We proposed two challenges; in the first, participants were given the topology and underlying biochemical structure of a 9-gene regulatory network and were asked to determine its parameter values. In the second challenge, participants were given an incomplete topology with 11 genes and asked to find three missing links in the model. In both challenges, a budget was provided to buy experimental data generated *in silico*.

Conclusions

A total of 19 teams participated in this competition. The results suggest that combining state-of-the-art parameter estimation with a varied set of experimental methods using a few datasets, mostly fluorescence imaging data, can accurately determine the parameters of biochemical models of gene regulation. However, the task is considerably more difficult if the gene network topology is not completely defined, as in challenge 2. Importantly, we found that aggregating independent parameter and network-topology predictions across submissions creates a solution that can be better than the one from the best-performing submission.

Background

Predictive and mechanistic models are powerful tools to understand biological processes at the core of systems biology. Building models requires a list of molecular components and their interactions. This list can be assembled from prior knowledge and/or inferred, or reverse engineered, from dedicated experimental data.

In a real-life scenario of limited resources, the key question is how to design experiments that are most useful for parameter characterization.

To explore this fundamental problem in a rational and unbiased fashion, we first set up the *Parameter Estimation Challenge*.

Besides the question of the algorithmic/experimental strategy used to infer the kinetic parameters of a model, we also addressed how well new connections in a network could be inferred. This is also a relevant question, as many canonical pathways are only approximations to the system under study. We therefore ran a second challenge, the *Network Topology Challenge*.

We complemented the analysis of the submissions by analyzing the participants’ algorithmic strategies and credit usage for data acquisition. We concluded that using fluorescent data from protein time courses is a key component of parameter estimation strategies, and that in both challenges aggregation created solutions that fared as well as or better than the best-performing approaches. We chose an *in silico* setting so that all predictions could be evaluated against a known ground truth.

Results

In both the *Parameter Estimation* and the *Network Topology* challenges, participants worked with *in silico* gene regulatory network models and a limited budget for purchasing data.

A realistic model of a gene regulatory network

Model 1, used for the *Parameter Estimation Challenge*, describes a gene regulatory network of 9 genes (Figure ).

**Supplementary material files – Models and Submissions.** The model and data for each challenge, as well as the participants’ submissions, are provided as supplementary material. Models are provided in MATLAB and Systems Biology Markup Language (SBML) formats, and submission file names reflect team rank, except for the best-performing teams. They are also available at the DREAM site.


The regulation of each gene was inspired by prokaryotes and modeled as follows: each gene can have, upstream of the protein coding region, an activator binding site, an inhibitory binding site, a promoter, and a ribosomal binding site (Figure ).

Model and gene regulatory network of the parameter estimation challenge

**Model and gene regulatory network of the parameter estimation challenge. A**. Example of the regulation of transcription of a coding sequence. **B**. Gene network from model 1 of the Parameter Prediction challenge, consisting of 9 genes; its 45 parameters and the prediction of responses to perturbations were requested from challenge participants.

For each regulatory process, activation or repression, two parameters have to be estimated: the dissociation constant K_{d} and the Hill coefficient h. In model 1, for each protein production process, there are two further parameters to be estimated: the promoter strength and the ribosomal binding site strength (see Figure ). In total, model 1 has 9 promoter strengths, 9 rbs strengths, 1 degradation rate, 13 K_{d} values, and 13 Hill coefficients (see Table ). Participants were asked to estimate these 45 parameters and to predict the time courses of proteins p3, p5, and p8 under perturbed conditions defined below.

Scores and correlation between parameter and protein prediction distances for model 1

**Scores and correlation between parameter and protein prediction distances for model 1. A**. Dynamics of the mRNAs of the 9 genes of the model 1 network. Dots are the noisy data, lines represent the data without noise, and shades the associated noise model. **B**. Overall scores of the participants, calculated from the p-values as indicated by the formula. P-values were obtained from the two metrics used for challenge scoring described in Additional file . **C**. The distances defined for scoring the submitted parameter predictions and protein perturbation predictions are plotted on the y- and x-axes, respectively; the R^{2} coefficient for a linear fit in log scale is 0.23, and the red line is a visual reference for a perfect fit. **D**. For each of the 45 parameters in the model, the vector of parameter values submitted by the 12 participants is correlated (R^{2}) to the vector of the participants’ protein prediction distances.

Parameters involved in the two models (X: parameter type not present in that model).

| **Parameter** | **Model 1** | **Model 2** |
|---|---|---|
| Promoter strength | 9 | X |
| rbs strength | 9 | X |
| Protein synthesis | X | 16 |
| Basal transcription rates | X | 2 |
| Degradation rate | 1 | 11 |
| K_{d} | 13 | 16 |
| Hill coefficient | 13 | 16 |
| **Total** | **45** | **61** |

Although the basic structure of both challenges is similar, the model for the *Network Topology Challenge* (model 2) is larger: it has 16 protein synthesis rates, 16 K_{d} values, 16 Hill coefficients, 11 degradation rates, and 2 basal transcription rates for genes 5 and 11, which are not regulated by any other gene (see Table ). Each regulatory link, including the missing ones, is described by two parameters (K_{d} and h).

**Network topology challenge gene network and scores. A**. Gene network for model 2, consisting of 11 genes and 45 parameters, where three links are missing.


A credit system mimicking a limited experimental budget

The participants are given a virtual budget of ‘credits’ to buy data from experiments (produced *in silico*), choosing among the following perturbations:

i. gene deletion, which eliminates both the mRNA and the protein of the targeted gene, for 800 credits;

ii. siRNA-mediated knockdown, which increases the mRNA degradation rate 10-fold, for 350 credits;

iii. a decrease of RBS (ribosomal binding site) activity, which leads to a 10-fold decrease in translation rate, for 450 credits.

Upon each of these types of perturbation, the teams could purchase data collected with different technologies, reflecting the relative ease or difficulty of acquiring this type of data in reality. Specifically, participants could buy time course data for:

i. protein abundance for 2 proteins of their choice at the highest resolution (every time unit) using fluorescence protein fusion for 400 credits;

ii. mRNA (for all genes) measured with a microarray, at either low resolution (every 4 time units) or high resolution (every 2 time units), at 500 and 1000 credits, respectively. Microarrays were only available in challenge 1, since the model of challenge 2 does not include mRNA;

iii. protein abundance for all proteins measured via mass spectrometry, also at high and low resolution for 500 and 1000 credits, respectively. This was available only in challenge 2, as an alternative to the microarrays of challenge 1.

Specific parameter values, namely the binding affinity (K_{d}) and the Hill coefficient (h) of a chosen regulation, could also be purchased directly.

Finally, in both data modalities a noisy measurement is simulated by adding noise to the deterministic value of each variable. More precisely, if v is the deterministic value, then v_{noisy} = v + 0.1 × g1 + 0.2 × g2 × v, where g1 and g2 are Gaussian random variables with mean 0 and standard deviation 1. That is, for small v the standard deviation of v_{noisy} is close to 0.1, while for large v, v_{noisy} amounts to measuring v with a standard error close to 20% of the true value. Note that if the value after noise addition is smaller than 0, v_{noisy} is clipped at 0.
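The noise model above can be sketched in a few lines (a sketch in Python with NumPy; the function name is ours):

```python
import numpy as np

def add_measurement_noise(v, rng=None):
    """Challenge noise model: v_noisy = v + 0.1*g1 + 0.2*g2*v,
    with g1, g2 independent standard normals, clipped at 0 from below."""
    rng = np.random.default_rng(rng)
    v = np.asarray(v, dtype=float)
    g1 = rng.standard_normal(v.shape)
    g2 = rng.standard_normal(v.shape)
    return np.maximum(v + 0.1 * g1 + 0.2 * g2 * v, 0.0)
```

For large v the standard deviation of the result approaches 0.2·v, matching the stated 20% relative error.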

Challenge results

The two challenges were run as part of the DREAM6 and DREAM7 initiatives.

In order to solve the challenge, participants were allowed to spend credits to procure data generated *in silico* from the model.

As the questions posed in models 1 and 2 are different, identifying topology in one case and identifying parameters in the other, we decided to separate the two challenges and select a winner for each one (Figure ).

Table for Model 1 of the *Parameter Estimation Challenge*: anonymized teams (except for the best performer) ordered by Score rank.

| **Model 1** | **Parameter distance D**^{param} | **P-value for parameter predictions** | **Protein distance D**^{prot} | **P-value for protein time course predictions** | **Score** | **Bayesian** | **Decompose network** | **Selection of data** | **Sampling** |
|---|---|---|---|---|---|---|---|---|---|
| Orangeballs | 0.0229 | 3.25E-03 | 0.002438361 | 1.21E-25 | 27.4 | no | yes | Game Tree | Sequential local search |
| 2 | 0.8404 | 1.00E+00 | 0.016023721 | 3.39E-18 | 17.5 | no | no | Manual based on parameter uncertainty | Global method |
| 3 | 0.1592 | 6.00E-01 | 0.035404398 | 4.45E-15 | 14.6 | yes | no | Manual | LH |
| 4 | 0.0899 | 1.88E-01 | 0.047495432 | 6.28E-14 | 13.9 | no | yes | Manual | LM + Particle Swarm |
| 5 | 0.1683 | 6.45E-01 | 0.09791128 | 4.01E-11 | 10.6 | yes | no | Train + Sim | UKF |
| 6 | 0.0453 | 1.37E-02 | 0.198785197 | 1.93E-08 | 9.6 | no | no | A-criterion | Local (LM) |
| 7 | 0.1702 | 6.45E-01 | 0.362463945 | 2.90E-06 | 5.7 | no | yes | Sensitivity analysis | Hybrid (Local + Global) |
| 8 | 0.8128 | 1.00E+00 | 0.356429217 | 2.53E-06 | 5.6 | yes | no | Estimation of improved uncertainty | Global (MH) |
| 9 | 0.3766 | 9.99E-01 | 0.817972877 | 1.34E-03 | 2.9 | yes | yes | MI | ABC-SMC |
| 10 | 0.0699 | 9.83E-02 | 19.32326868 | 1.00E+00 | 1.0 | no | yes | Minimize variance based on FI | Multistart local search |
| 11 | 0.1883 | 7.29E-01 | 3.222767988 | 6.90E-01 | 0.3 | no | no | Train + Sim | LH + DE |
| 12 | 5.0278 | 1.00E+00 | 14.77443631 | 1.00E+00 | 0.0 | no | no | Manual | Local method |

Score calculation of the Parameter Estimation Challenge. **A**. A distance, as shown by the equation, is calculated from the 45 predicted parameter values, and a p-value is computed by comparison to a distribution generated under a relative null hypothesis. **B**. A distance, as shown by the equation, is calculated from the predicted protein concentration values for the three proteins under perturbed conditions.


Table for Model 2 of the *Network Topology Challenge*: anonymized teams (except for the best performer) ordered by Score rank. Next to each team are listed their network score S^{network}, the associated p-value, the final score, and the way links were added.

| **Model 2** | **Network score** S^{network} | **p-value** | **Score** | **Link addition** |
|---|---|---|---|---|
| crux | 12 | 1.49E-02 | 1.83 | Manual |
| 2 | 9 | 5.60E-02 | 1.25 | Manual |
| 3 | 8 | 1.07E-01 | 0.97 | Manual first + algorithm |
| 4 | 8 | 1.07E-01 | 0.97 | Manual (‘logic reasoning’) |
| 5 | 8 | 1.07E-01 | 0.97 | Manual |
| 6 | 7 | 2.10E-01 | 0.68 | Algorithm (Grenits) |
| 7 | 6 | 3.83E-01 | 0.42 | Manual |
| 8 | 5 | 6.01E-01 | 0.22 | Manual |
| 9 | 4 | 8.01E-01 | 0.10 | Did not participate |
| 10 | 4 | 8.01E-01 | 0.10 | Did not participate |
| 11 | 3 | 9.86E-01 | 0.01 | Manual |
| 12 | 2 | 1.00E+00 | 0 | Algorithm GP-DREAM |

Parameter inference results

An intriguing result of the *Parameter Estimation Challenge* was that the 10^{th} overall ranked team was second in parameter estimation but last in protein prediction. Conversely, the second overall ranked team was next to last in parameter estimation but second in protein prediction (Figure ). This is reflected in a low R^{2} = 0.23 for the correlation of the parameter distance D_{1}^{param} to the protein prediction distance D_{1}^{prot} (Figure ). The contrast between the 2^{nd} and 10^{th} overall ranked teams was puzzling. After contacting the 10^{th} team we learned that their optimization objective was centered on the parameters and not on protein prediction. This underscores how the choice of scoring metric is not a trivial question and can dramatically influence the results. The 2^{nd} ranked team focused on the prediction of the protein values, and grouped together parameters that they found to be non-identifiable. Combinations of such non-identifiable parameters, such as K_{d} and h, can reproduce the observed dynamics even when the individual values are inaccurate. For many applications, reproducing the original dynamical system behaviors, as the 2^{nd} ranked team did, might be more relevant than parameter estimation.

To further investigate this possibility, we analyzed the dependence of the protein perturbation predictions on each individual parameter, and calculated, for each of the 45 parameters, the correlation of the vector of participants’ submitted parameter values to their protein prediction distance D_{1}^{prot}. D_{1}^{prot} was most dependent on the values of parameters directly involved in the requested predictions: the K_{d} for r13 (R^{2} = 0.88), rbs4 (R^{2} = 0.66), rbs8 (R^{2} = 0.61), rbs5 (R^{2} = 0.59), and rbs3 (R^{2} = 0.45) (Figure ), as well as on the degradation rate (R^{2} = 0.35), which is a global parameter. The strong dependency of D_{1}^{prot} on this small set of parameters helps explain the weak correlation between D_{1}^{prot} and D_{1}^{param}.

Aggregation of participants’ results

For model 1, most participants’ time-course predictions of proteins p3, p5, and p8 were close to the solution (Figure ).

Scores of aggregated participant results

**Scores of aggregated participant results. A**. Protein concentrations of participants’ predictions (blue) and the solution (green), plotted against time for proteins p3, p5, and p8. **B**. Participant submissions are aggregated by averaging each protein concentration at individual time points, starting from the 2 best-performing teams until all 12 teams are included. Each aggregated result is plotted in blue and the solution in green. **C**. Log-scale distance to the solution of parameter predictions, plotted for participant teams ordered by rank (blue line), and for geometric means of parameter predictions aggregated over increasing numbers of teams, following parameter distance rank order (green line) or inverse rank order (red line). **D**. Log-scale distance to the solution of protein predictions, plotted analogously to C.

This phenomenon also occurs when aggregating the participants’ submitted parameters by geometric mean using the same procedure as above. D_{1}^{param} for this aggregation shows that, for up to eight aggregated teams, the aggregated submission is closer to the solution than the best individual team submission (Figure ).
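The two aggregation rules used here, arithmetic mean for protein time courses and geometric mean for parameters, can be sketched as follows (illustrative function names; rows are teams):

```python
import numpy as np

def aggregate_parameters(submissions):
    """Geometric mean, per parameter, of the teams' (positive) estimates."""
    return np.exp(np.log(np.asarray(submissions, float)).mean(axis=0))

def aggregate_time_courses(submissions):
    """Arithmetic mean, per time point, of the teams' predicted concentrations."""
    return np.asarray(submissions, float).mean(axis=0)
```

The geometric mean is the natural choice for parameters scored on the log scale: averaging log-ratios corresponds to multiplying and taking roots, not summing.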

Results are mitigated when one considers D_{1}^{prot} as a measure of the effectiveness of the aggregation of solutions. Indeed, choosing as a solution the aggregation of all teams yields a D_{1}^{prot} that is worse than that of eight of the teams (Figure ). This may be explained by the p-value of the best protein prediction being 1.21 · 10^{-25}, compared to 3.25 · 10^{-3} for the best parameter estimation result (see Additional file ).

In model 2, to find the participants’ consensus for the 3 missing links, we counted how often each link was submitted by participants and chose the 3 most popular ones (Figure ).
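This vote-counting construction can be sketched directly (the tuple encoding of a link as (source, sign, target) is our assumption):

```python
from collections import Counter

def consensus_links(team_submissions, n_links=3):
    """Return the n_links most frequently submitted links.
    Each link is a (source, sign, target) tuple, e.g. ('g1', '+', 'g2')."""
    counts = Counter(link for links in team_submissions for link in set(links))
    return [link for link, _ in counts.most_common(n_links)]
```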

Dynamics and scores of the network topology challenge

**Dynamics and scores of the network topology challenge. A**. Time courses of the 11 proteins in the model 2 network. Dots are the noisy data, lines represent the data without noise, and shades the associated noise model. **B**. Ordered scores of the participants, together with the score of the consensus solution defined by the 3 most frequently submitted links. Scores were calculated from the p-values as indicated in Methods and Additional file . **C**. The 3 links most frequently submitted by participants. **D**. Diagrams of the consensus network of links (blue) and the solution (black). The dashed arrow indicates an indirect regulation.

Analysis of participants’ strategies and experimental credit usage

The various types of data and perturbations were used differently by the teams in each challenge. The available data types differed slightly between the challenges, since mass spectrometry data was not available in challenge 1 and microarray data was not available in challenge 2. Of the 13 possible combinations of experiments, the most frequently purchased was the fluorescence-microscopy time course of two proteins (Figure ).

Analysis of experimental credit usage in challenges

**Analysis of experimental credit usage in challenges. A**. Histogram indicating the number of times credits were spent on each experiment type in the *Parameter Estimation Challenge*. **B**. Histogram indicating the number of times credits were spent on each experiment type in the *Network Topology Challenge*. **C**. Diagram indicating the sequence of experiments performed in the *Parameter Estimation Challenge*. **D**. Diagram indicating the sequence of experiments performed in the *Network Topology Challenge*.

For the two challenges, the sequence in which experiments were purchased also differed (Figure ).

These differences indicate alternative strategies for the solution of both challenges (see details in Tables

Winning strategy for the *Parameter Estimation Challenge*

The basic idea of our approach was to compute a maximum-likelihood fit of the model parameters given the observed data purchased from the challenge’s *in silico* experiments.

We began our analysis of each model by buying time courses of all proteins under wildtype conditions. These experiments were by far the cheapest and allowed us to start making initial guesses at parameter values. For example, the protein degradation rate can be estimated from the time course of a non-regulated protein (e.g., a protein whose gene has no regulator in the given topology).

Having initial guesses of the parameters, we then viewed the problem of choosing successive data purchases as a game tree of possible sequences of experiments, with the goal being to identify paths most likely to reduce the uncertainty as much as possible at minimum cost. Given that the optimal sequences change as data is purchased (revealing information about the model parameters), we generally tried to find experiments to perform early on that (i) were likely to be necessary regardless of the actual parameter values, or (ii) would provide information distinguishing the most disparate possibilities (e.g., in some cases it was impossible to tell initially whether a regulator was performing full activation or zero activation).

Because of the combinatorial complexity of possible data purchase paths, however, it was critical to apply heuristics to estimate the utility of purchases and to limit the search space. Given the heuristic nature of the search and the relatively small size of the networks, we found it most practical to map out plausible purchase paths on paper rather than codifying our game tree search scheme. We now describe a few key heuristics we developed that we found most valuable.

• Steady-state values provide the cleanest measurements of parameters because having a multiplicity of measurements of the same steady-state value allows for averaging out noise. Moreover, combining different steady-state values enables direct inference of activation and repression parameters (K_{d} and h). For instance, a protein p produced at maximal rate s under repression by a regulator R and degraded at rate d obeys

dp/dt = s/(1 + (R/K_{d})^{h}) − d·p

and at steady state

p* = (s/d) · 1/(1 + (R/K_{d})^{h}).

• Combining these equations, the measured steady states directly constrain the regulatory term 1/(1 + (R/K_{d})^{h}).

• Considering for the moment the case of a single repressor, there are two unknowns, K_{d} and h.

• Different steady states under experimental perturbations yield values of the right-hand side corresponding to different values of the regulatory protein concentration, and taking ratios of these values isolates the effect of the regulation. It follows that 3 steady-state measurements are theoretically enough to determine K_{d} and h, although in practice additional measurements improve the estimate of K_{d}.

• For the purpose of obtaining new steady-state measurements at minimal cost, a trade-off has to be considered between protein measurements (which yield 2 new steady states) and mRNA measurements (which yield values for all genes, but at much lower resolution). Additionally, a given perturbation typically only produces new steady states for a small number of genes because the effect of the perturbation is often mitigated downstream (by saturation of an activator or repressor). We found that 2-protein measurements generally seemed to be most cost-effective, with a few exceptions.

• Most protein and mRNA time courses simply converge to steady-state behavior, but in cases with interesting dynamics, the time trace information is highly informative and can allow inference of parameters with fewer perturbations; this is important to keep in mind to reduce costs.

• For some regulations, the only option is to buy the direct measurement of K_{d} and h, because the regulator concentration does not stay in the range of K_{d} for a reasonable amount of time; most often this happens when K_{d} is very small and the regulating protein increases quickly in concentration. Another case is if the regulator cannot be brought into the informative concentration range at an affordable cost.
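The steady-state reasoning above can be sketched as a small fitting routine (a sketch assuming the simple repression form p* = A/(1 + (R/K_d)^h), with A the unregulated steady state; function name and data are illustrative):

```python
import numpy as np

def fit_repression(R, p_ss, p_unregulated):
    """Infer (K_d, h) of a single repressor from steady states.
    p_ss[j] is the steady state at regulator level R[j] > 0;
    p_unregulated is the steady state with the repressor absent (A = s/d).
    Uses the linearization log(A/p* - 1) = h*log(R) - h*log(K_d)."""
    R = np.asarray(R, float)
    p_ss = np.asarray(p_ss, float)
    y = np.log(p_unregulated / p_ss - 1.0)
    h, intercept = np.polyfit(np.log(R), y, 1)  # slope = h
    K_d = np.exp(-intercept / h)
    return K_d, h
```

With 3 steady states plus the unregulated level, the linear fit is exactly determined, matching the counting argument in the text.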

These heuristics collectively allowed us to drastically limit the number of candidate experiments to consider at each purchasing step, typically to just one or two possible experiments directed at investigating each unknown parameter. Because the scoring function was based on total squared relative error, prioritizing the least constrained parameters was clearly advantageous and further reduced the search space. Additionally, whenever we were able to identify components of a model that functioned approximately independently, we applied a divide-and-conquer approach to analyze each component in isolation (again limiting the combinatorial explosion of search paths) and then aggregated the results.

As a final note, after finding potential perturbations to run using these heuristics, we were able to test whether the experiments were likely to achieve their objectives by simply simulating the effects of the perturbations and checking whether different values of the parameters led to noticeably different time traces. We found this simple check to be very useful in helping decide which data to buy.

Winning strategy for the *Network Topology Challenge*

From the point of view of statistical methodology, inferring missing links in a gene regulatory network model based on experimental data constitutes a model discrimination problem. We applied classical maximum-likelihood methods to address this challenge.

For the given error model, which can be described by a probability distribution p(y_{i} | θ), the likelihood of the data is the product

L(θ) = ∏_{i} p(y_{i} | θ)

over all data points y_{i}, interpreted as a function of the parameters θ.

Since the model is nonlinear with respect to the parameters, the likelihood landscape can exhibit local minima. Therefore, optimization was repeated using multiple initial guesses. For this purpose, we used Latin hypercube sampling to efficiently explore the parameter space.
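A minimal Latin hypercube sampler for generating multistart initial guesses might look like this (a sketch, not the team’s actual code; sampling in log10-space of assumed parameter bounds):

```python
import numpy as np

def latin_hypercube(n_samples, bounds, rng=None):
    """Latin hypercube sample in log10-space of per-parameter (lo, hi) bounds:
    each dimension gets one point in each of n_samples equal strata."""
    rng = np.random.default_rng(rng)
    lo, hi = np.log10(np.asarray(bounds, float)).T      # shapes (d,), (d,)
    d = lo.size
    # one independent stratified permutation per dimension
    strata = rng.permuted(np.tile(np.arange(n_samples), (d, 1)), axis=1).T
    u = (strata + rng.random((n_samples, d))) / n_samples
    return 10 ** (lo + u * (hi - lo))
```

Each of the returned rows would then serve as the starting point of one local optimization run.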

To assess the model’s ability to explain the data, we used the least-squares goodness-of-fit statistic

χ^{2}(θ) = Σ_{i} (y_{i} − x_{i}(θ))^{2}/σ_{i}^{2},

where x denotes the concentrations predicted by the model. Moreover, likelihood ratios were utilized to statistically test whether extending the model by additional parameters significantly improves the fit. Since in the challenge the measurement errors were given as normally distributed, log-likelihood ratios are in fact proportional to differences of χ^{2}.
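A sketch of these two statistics (the 2-degrees-of-freedom shortcut assumes the extended model adds one Hill-type link, i.e. K_{d} and h; for a χ² distribution with 2 degrees of freedom the survival function is exp(−x/2)):

```python
import math
import numpy as np

def chi_square(y, x_model, sigma):
    """Least-squares goodness of fit: sum of variance-normalized squared residuals."""
    y, x, s = (np.asarray(a, float) for a in (y, x_model, sigma))
    return float(np.sum(((y - x) / s) ** 2))

def lr_pvalue_2df(chi2_base, chi2_extended):
    """p-value of the likelihood-ratio test when the extended model adds two
    parameters; for Gaussian errors the chi^2 difference is the LR statistic."""
    return math.exp(-(chi2_base - chi2_extended) / 2.0)
```

A small p-value indicates that the added link improves the fit beyond what chance would allow.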

The profile likelihood was used to assess the identifiability of the estimated parameters.

Initially, we performed the less costly protein measurements in the wildtype setting, to have a minimal amount of experimental information enabling the application of the tools introduced above. At this stage, we already gained confidence that the data required an extension of the model allowing for oscillations. Introducing a negative feedback on protein p1 improved our fit the most.

In the next stage, we favored mass spectrometry experiments since they provide comprehensive information on all regulators and targets. Having a complete data set for a perturbation setting is advantageous to minimize the risk of erroneously proposing links. Moreover, we preferred high-resolution data to obtain as much information as possible about the dynamics. We noticed that missing links with a Hill-type kinetic are only identifiable if the concentrations of the regulator cover the range around the respective Michaelis constant K_{d}. Therefore, we primarily concentrated on perturbations where we expected largely different concentration ranges of potential regulators.

Additional file  summarizes our experimental design considerations, which allowed us, for example, to estimate the Michaelis constant K_{d} = 17.9 of this missing link.

Summary of the experimental design considerations of team crux for the network inference challenge. The second column denotes the chosen experimental conditions in the notation used during the challenge. The arguments underlying the decisions are denoted by abbreviations. Wild-type measurements provide data for substantially fewer credits (argument “WT”); such measurements were chosen initially to obtain a setting with a reasonable set of identifiable parameters. Data with high resolution over time (argument “High-Res”) provides more detailed information about the dynamics and was therefore expected to be more efficient for distinguishing potentially missing links with similar qualitative effects. Using a measurement technique providing data for all compounds (argument “All”) is advantageous to obtain a comprehensive overview of the effect of a perturbation. The argument abbreviated by “Range” indicates the fact that missing links are only identifiable if the concentration range of the regulator is not far from the respective Michaelis constant K_{d}; therefore, perturbations were performed that shift the concentration range of potential regulators in a desired direction. Finally, the remaining credits had to be taken into account, indicated by the argument “Budget”.


Discussion

In order to evaluate how well mechanistic models could be built upon inferred biological networks, we tested the accuracy of model parameter predictions and missing link identification. Surprisingly, with a limited amount of data, participants were able to reliably predict the values of the parameters and the temporal evolution of 3 proteins under perturbed conditions in the *Parameter Estimation Challenge*.

Aggregation of participant results

DREAM results for a diverse set of challenges have recurrently demonstrated the “wisdom of crowds” phenomenon, where aggregation of participants’ results has proven to give robust and top performing results

In spite of these original features, we have been able to obtain, as in other DREAM challenges, a robust and high-performing set of predictions based on the geometric mean for the parameters and arithmetic mean for the protein predictions (Figure

For the *Network Topology Challenge*, the consensus solution built from the 3 most frequently submitted links fared as well as the best-performing team (Figure ).

Participants’ methods and credit usage

The strategies for data acquisition differed between the two challenges.

Conclusions

Our results show that from a defined gene network model it is possible to accurately determine the kinetic parameters of a gene regulatory circuit, given simple fluorescence-based experimental data and an adequate inference strategy. More generally, our results suggest that state-of-the-art parameter estimation and experimental design methods can in principle determine accurate parameters of biochemical models of gene regulation, but the task is considerably more difficult, or perhaps impossible to solve unequivocally, if the knowledge of the topology is not precise, as is often the case.

As they stand, this study and the underlying data and models are a useful resource for those interested in developing parameter inference methods and in benchmarking them against state-of-the-art approaches. This strategy could be extended and tested on larger, genome-size gene networks using whole-cell models.

Methods

Scoring the *Parameter Estimation Challenge*

Distance between simulated and predicted values

For model 1, participants were requested to predict three protein time courses from t_{0} = 0 onwards. Denote by t_{i} the time at data point i, and by y_{k}^{pred}(t_{i}) and y_{k}^{sim}(t_{i}) the predicted and simulated abundances of protein k at time t_{i}. The three requested time courses were those of proteins p3, p5, and p8 (see Figure ) under perturbed conditions: a perturbation of a K_{d}, a 2-fold increase in rbs3 strength, and a 10-fold increase of rbs5 strength. These proteins and perturbed states were chosen so that predictions could not be trivially inferred from purchased data.

Because the initial conditions are given, the really challenging predictions take place after some time has elapsed from t_{0}. We considered that time to be 10 time intervals and thus evaluated the predictions from the 11^{th} time point onwards. Accordingly, with n the total number of time points, the squared distance between predicted and measured protein abundances is:

D_{1}^{prot} = (1/(3(n − 10))) Σ_{k∈{3,5,8}} Σ_{i=11}^{n} (y_{k}^{pred}(t_{i}) − y_{k}^{sim}(t_{i}))^{2} / (σ_{b}^{2} + σ_{s}^{2} · y_{k}^{sim}(t_{i})^{2})

Note that the squared difference terms are normalized by the variance, which follows the noise model implemented in the provided data (with σ_{b} = 0.1 and σ_{s} = 0.2). Finally, the sum is divided by 3(n − 10), the total number of evaluated data points over the 3 proteins.
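This distance can be sketched directly from the stated noise model (a sketch; the array layout, rows for the 3 proteins and columns for time points, is our assumption):

```python
import numpy as np

def protein_distance(pred, sim, sigma_b=0.1, sigma_s=0.2, skip=10):
    """Mean variance-normalized squared error between predicted and simulated
    time courses, evaluated from the (skip+1)-th time point onwards."""
    pred = np.asarray(pred, float)[:, skip:]
    sim = np.asarray(sim, float)[:, skip:]
    var = sigma_b ** 2 + (sigma_s * sim) ** 2   # noise-model variance per point
    return float(np.mean((pred - sim) ** 2 / var))
```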

To statistically evaluate the performance of the teams, a relative null hypothesis was created from this distance, based on the predictions of all the participants. For each protein, we chose at random one of the 12 participants’ predictions for the first time point, y_{k}^{pred}(t_{1}), then at random one of the 12 predictions for the next time point, and so on. We therefore obtained a distance value for a randomized hybrid prediction; repeating this procedure yields the null distribution from which each team’s p-value is computed.

Distance between estimated and known parameters

As degradation rates are equal for all proteins, only one degradation parameter has to be determined, and thus model 1 has N_{p} = 45 parameters to be considered for scoring.

Let us denote by p_{i}^{pred} and p_{i}^{real} the predicted and actual parameter values used in the simulations, where i = 1, …, N_{p}. The mismatch between estimated and true parameters is assessed on the log scale. In this way, a mismatch by a factor of x has the same penalty independent of the parameter’s nominal value, and the ratio is also independent of changes of physical units. Therefore the ‘distance’ between predicted and real parameters is calculated as follows:

D_{1}^{param} = (1/N_{p}) Σ_{i=1}^{N_{p}} (log_{10}(p_{i}^{pred}/p_{i}^{real}))^{2}
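In code, the log-scale mismatch is essentially one line (sketch):

```python
import numpy as np

def parameter_distance(p_pred, p_real):
    """Mean squared log10-ratio between predicted and true parameters:
    a factor-of-x error costs the same whatever the nominal value."""
    p_pred = np.asarray(p_pred, float)
    p_real = np.asarray(p_real, float)
    return float(np.mean(np.log10(p_pred / p_real) ** 2))
```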

Similar to the case of the distance between simulated and predicted protein abundances, a relative null hypothesis is created for the distance between estimated and known parameters, based on the predictions of all the participants. For each parameter, we chose at random one of the 12 participants’ predictions. We thereby obtained a null distribution of D_{1}^{param} values and, for each team, the corresponding p-value. That p-value will be denoted as p_{1}^{param} (see Additional file ).

For each team, the overall score Score_{1}, combining both parameter and protein predictions, is defined as

Score_{1} = −log_{10}(p_{1}^{param} · p_{1}^{prot})

Scoring the *Network Topology Challenge*

Distance between the estimated and true network

For model 2 we requested the prediction of 3 missing links of the network as shown in Additional file

For each of the three predicted links i = 1, 2, 3, we defined a score s_{i}^{link} that takes a value between 0 and 6 depending on how well the link is captured: a perfect prediction of the link is rewarded with 6 points, while correctly predicting only the starting gene, the end gene, or the sign of the effect is given a lower score. Specifically, the score is computed as

s_{i}^{link} = a_{i} + b_{i}

where a_{i} = 6 if the connection has all its elements correctly predicted (that is, the source gene, the sign of the connection, and the destination gene are all correct). For the special case that a link regulates an operon composed of two genes and both connections are correct, reflecting the correct prediction of two connections, a doubled number of points a_{i} = 12 was awarded. Otherwise, a_{i} = 0 if some element of the connection is not fully correct. If a_{i} = 6 or 12, then b_{i} = 0 and the scoring for that link is complete, with a final score s_{i}^{link} of 6 or 12, respectively. In case a link is not correctly predicted (a_{i} = 0), b_{i} adds to the score a value (less than 6) indicating how good the prediction is. Each gene interaction is positive or negative and composed of a source and a destination gene. Then, b_{i} is increased by 1 for each correctly predicted gene, and by 2 if the destination gene and the nature of the regulation (i.e. +/−) are correct. Correct (+/−) predictions without the correct associated genes are given no points. Some examples of these scores are provided in the non-exhaustive Additional file .
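A sketch of this scoring rule (ignoring the doubled operon special case; the tuple encoding of a link and the exact reading of the partial-credit rules are our assumptions):

```python
def link_score(pred, true):
    """Score one predicted link against the corresponding true link.
    A link is a (source, sign, target) tuple, e.g. ('g4', '+', 'g7')."""
    if pred == true:
        return 6                      # a = 6, fully correct
    b = 0
    if pred[0] == true[0]:
        b += 1                        # correct source gene
    if pred[2] == true[2]:
        b += 1                        # correct destination gene
        if pred[1] == true[1]:
            b += 2                    # destination gene and sign both correct
    return b

def network_score(preds, trues):
    """Sum of the three link scores."""
    return sum(link_score(p, t) for p, t in zip(preds, trues))
```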

Table used to score the submitted links for the network topology challenge. A link is defined by a source and a destination gene, and a source gene may or may not have two destination genes. Each row of the table represents a possible link submission, and s_{i}^{link} represents the number of points given for the submitted link.


The scores for the predictions of the three missing links are added in a global score

$$s^{\mathrm{netw}} = \sum_{i=1}^{3} s_i^{\mathrm{link}}.$$
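The per-link and global scoring rules above can be sketched in code. This is a minimal sketch: the tuple representation of a link, the function names, and the exact additivity of the partial points are our own assumptions, not the challenge's implementation.

```python
# Hypothetical representation: a link is a (source, sign, destination)
# tuple, with sign in {"+", "-"}.

def link_score(predicted, true, operon=False):
    """Return the per-link score for one predicted link vs. the true link."""
    src_p, sign_p, dst_p = predicted
    src_t, sign_t, dst_t = true
    # Full credit: source, sign, and destination all correct (6 points;
    # doubled to 12 when the link regulates a two-gene operon and both
    # connections are correct -- modeled here by the `operon` flag).
    if (src_p, sign_p, dst_p) == (src_t, sign_t, dst_t):
        return 12 if operon else 6
    # Partial credit (always less than 6): +1 per correctly predicted
    # gene, and +2 more if destination gene and sign are both correct.
    points = 0
    points += 1 if src_p == src_t else 0
    points += 1 if dst_p == dst_t else 0
    points += 2 if (dst_p == dst_t and sign_p == sign_t) else 0
    return points

def network_score(predicted_links, true_links):
    """Global score: sum of the per-link scores over the three links."""
    return sum(link_score(p, t) for p, t in zip(predicted_links, true_links))
```

For example, predicting the right destination gene and sign but the wrong source gene would earn 1 + 2 = 3 of the 6 possible points under this reading.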

A null model is calculated by generating a distribution of scores from a large number of surrogate gene networks obtained by randomly adding 3 links that follow the connection rules indicated in the challenge description. For each participant, a p-value $p_2^{\mathrm{netw}}$ associated with the score under the null hypothesis is calculated (see Additional file ). The final score $s_2$ for this challenge is computed from $p_2^{\mathrm{netw}}$.
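The null-model comparison can be sketched as an empirical tail probability. This is a minimal sketch; the surrogate scores below are random stand-ins, since generating real surrogate networks requires the challenge's connection rules.

```python
import random

def empirical_pvalue(observed_score, null_scores):
    """Fraction of surrogate networks whose global score is at least the
    observed one: the empirical p-value under the null hypothesis."""
    hits = sum(1 for s in null_scores if s >= observed_score)
    return hits / len(null_scores)

# Hypothetical usage: in the challenge, null_scores would be the global
# scores of many random networks with 3 admissible links added at random.
random.seed(0)
null_scores = [random.randint(0, 18) for _ in range(10_000)]  # stand-in
p_value = empirical_pvalue(12, null_scores)
```

A small p-value indicates that the participant's score is unlikely to arise from randomly placed links.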

Dialogue for Reverse Engineering Assessment of Methods 6 (DREAM6) & 7 (DREAM7) parameter estimation consortium

D6 or D7 indicates that a team participated only in DREAM6 or DREAM7, respectively.

team ALF D6

Alberto de la Fuente, Andrea Pinna, Nicola Soranzo. CRS4 Bioinformatica, c/o Parco Tecnologico della Sardegna, Edificio 3, Loc. Piscina Manna, 09010 Pula, Italy

team amis2011

Adel Mezine (1), Artemis Llamosi (1, 3; current address: Université Paris Diderot, Sorbonne Paris Cité, MSC, UMR 7057 CNRS, 75013 Paris, France), Véronique Letort (2), Arnaud Fouchet (1), Michele Sebag (3), Florence d’Alché-Buc (1, 3)

1 : IBISC EA 4526, Université d’Evry Val d’Essonne, 23 Bd de France, 91000, Evry, France,

2 : Ecole Centrale Paris, Laboratory of Applied Mathematics and Systems (MAS), F92295 Châtenay Malabry, France,

3 : INRIA Saclay, LRI umr CNRS 8623, Université Paris Sud, Orsay, France.

team BadgerNets D6

Devesh Bhimsaria, Parameswaran Ramanathan, Aseem Ansari, Parmesh Ramanathan

Dept. of Electrical & Computer Engineering, University of Wisconsin, Madison, WI 53706-1691. Tel: (608) 263-0557; Fax: (608) 262-1267

Team BIOMETRIS D7

Laura Astola, Jaap Molenaar, Maarten de Gee, Hans Stigter, Aalt-Jan van Dijk, Simon van Mourik, Johannes Kruisselbrink

Wageningen University, Plant Sciences Subdivision, Mathematical and Statistical Methods, PO Box 100, 6700 AC Wageningen, Netherlands

team BioProcessEngi D6

Julio Banga, Eva Balsa Canto, Alejandro F Villaverde, Oana Chis, and David Henriques. Bioprocess Engineering Group, Institute for Marine Research (IIM-CSIC), R/Eduardo Cabello, 6, Vigo 36208, Galiza, Spain

team COSBI D6

Paola Lecca

The Microsoft Research – University of Trento Centre for Computational and Systems Biology, Piazza Manifattura 1, 38068 Rovereto, Italy

Current affiliation: Centre for Integrative Biology, University of Trento, Via delle Regole 101, 38123 Mattarello (TN), Italy. Email: paola.lecca@unitn.it

team Crux

Clemens Kreutz, Andreas Raue, Bernhard Steiert, Jens Timmer

Freiburg Institute for Advanced Studies (FRIAS), University of Freiburg, Albertstr. 19, 79104 Freiburg, Germany

Institute of Bioinformatics and Systems Biology, Helmholtz Center Munich, Ingolstaedter Landstr. 1, 85764 Neuherberg, Germany

Physics Department, University of Freiburg, Hermann-Herder-Str. 3, 79104 Freiburg, Germany

team ForeC_in_HS D7

Julian Brandl, Thomas Draebing, Priyata Kalra, Ching Chiek Koh, Jameson Poon, Dr. Sven Sahle, Dr. Frank Bergmann, Dr. Kathrin Huebner, Prof. Dr. Ursula Kummer. University of Heidelberg, Seminarstraße 2, 69117 Heidelberg, Germany

team GIANO6 D6

Gianna Toffolo, Federica Eduati and Barbara Di Camillo

University of Padova Department of Information Engineering Via Gradenigo 6B 35131 Padova, ITALY

team ipk_sys D6

Syed Murtuza Baker, Kai Schallau, Hart Poskar, Bjorn Junker, Swetlana Friedel. Data Inspection group and Systems Biology Group, Leibniz Institute of Plant Genetics and Crop Plant Research.

team KroneckerGen D6

David R Hagen

1) Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA

2) Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA

3) Department of Computer Science and Electrical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA

team 2pac

Cihan Oguz, Tyson Lab,

Departments of Biological Sciences Virginia Polytechnic Institute & State University Blacksburg, VA 24061 USA

team LBM D6

Michael Mekkonen, MIT

Lu Chen, WUSTL School of Medicine

Vipul Periwal, LBM, NIDDK, NIH

team ntu D7

Ching Chang (1), Juo-Yu Lee (1), Mei-Ju May Chen (2), Yu-Yu Lin (3) and Chien-Yu Chen (1, 2)

1 Department of BioIndustrial Mechatronics Engineering, National Taiwan University, Taipei, Taiwan;

2 Genome and Systems Biology Degree Program, National Taiwan University and Academia Sinica, Taipei, Taiwan;

3 Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan

team orangeballs

Po-Ru Loh, George Tucker, Mark Lipson, Bonnie Berger

Department of Mathematics, MIT, Cambridge, Massachusetts

team Reinhardt

Christian Lajaunie, Edouard Pauwels, Jean Philippe Vert

Centre for Computational Biology, Mines ParisTech, Fontainebleau, F-77300, France; Institut Curie, Paris, F-75248, France; U900, INSERM, Paris, F-75248, France

team TBP D7

Orianne Mazemondet, Friedemann Uschner, Katja Tummler, Max Floettmann, Sebastian Thieme, Abel Vertesy, Marvin Schultz, Till Scharp, Thomas Spiesser, Marcus Krantz, Ulrike Mänzner, Magdalena Rother, Matthias Reis, Katharina Albers, Wolfgang Giese and Edda Klipp, Theoretical Biophysics, Humboldt-Universität zu Berlin

team thetasigmabeta

Juliane Liepe, Siobhan MacMahon, Paul Kirk, Sarah Filippi, Christopher Barnes, Thomas Thorne, Michael P.H. Stumpf. Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK

team ZiBIOSS D6

Zhike Zi, BIOSS Centre for Biological Signalling Studies, University of Freiburg, Schänzlestr. 18, 79104 Freiburg, Germany

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

DC, GS, HS, JRS, KHK, PM, and TC designed the challenge; DC generated the data; TC, EB, and PM scored the challenge; PM, JRS, and TC wrote the manuscript. All authors read and approved the final manuscript.

Acknowledgments

We acknowledge the financial support received from the EU through the project “BioPreDyn” (EC FP7-KBBE-2011-5, grant number 289434). HS, KK, and DC acknowledge support from the National Institute of General Medical Sciences of the National Institutes of Health under award number R01GM081070, NSF support (0827592) in Theoretical Biology (MCB), and NSF support (1158573) to EF. We thank Michael Menden for useful comments on the manuscript and the analysis. PL and GT acknowledge support from Department of Defense NDSEG graduate fellowships. PL and ML acknowledge support from NSF graduate fellowships. AR, BS, and CK are funded by the German Federal Ministry of Education and Research [Virtual Liver (grant no. 0315766) and LungSys II (grant no. 0316042G)].