(Bio)Process Engineering Group, IIM-CSIC, Spanish National Research Council, C/Eduardo Cabello, 6, 36208 Vigo, Spain

Process Control Research Group, MTA SZTAKI, Kende u. 13-17, H-1111 Budapest, Hungary

Abstract

Background

The inference of biological networks from high-throughput data has received considerable attention during the last decade and can be considered an important problem class in systems biology. However, it has been recognized that reliable network inference remains an unsolved problem. Most authors have identified the lack of data and deficiencies in the inference algorithms as the main reasons for this situation.

Results

We claim that another major difficulty in solving these inference problems is the frequent lack of uniqueness of many of these networks, especially when prior assumptions have not been properly taken into account. Our contributions aid the distinguishability analysis of chemical reaction network (CRN) models with mass action dynamics. The novel methods are based on linear programming (LP); therefore, they allow the efficient analysis of CRNs containing several hundred complexes and reactions. Applying these new tools, together with previously published ones, to biological network structures taken from the literature, we find that a unique topology often cannot be determined, even if the structure of the corresponding mathematical model is assumed to be known and all dynamical variables are measurable. In other words, certain mechanisms may remain undetected (or be falsely detected) while the inferred model is fully consistent with the measured data. It is also shown that sparsity-enforcing approaches for determining 'true' reaction structures are generally not sufficient without additional prior information.

Conclusions

The inference of biological networks can be an extremely challenging problem even in the utopian case of perfect experimental information. Unfortunately, the practical situation is often more complex, since the measurements are typically incomplete, noisy and sometimes not dynamically rich enough, which introduces further obstacles to the structure/parameter estimation process. In this paper, we show how the structural uniqueness and identifiability of the models can be guaranteed by carefully adding extra constraints, and that these important properties can be checked through appropriate computation methods.

Background

During the last decade, the wide availability of high-throughput biological data has made it possible to produce new knowledge via a systems biology approach.

In this context, it is particularly worth mentioning the DREAM initiative (Dialogue for Reverse Engineering Assessments and Methods).

The use of a performance profiling framework with the DREAM3 benchmark problems revealed that current inference methods are affected by different types of systematic prediction errors.

Identifiability analysis studies whether it is theoretically possible to determine the parameters of a mathematical model uniquely, assuming perfect noise-free measurements and error-free modeling.

The importance of identifiability has also been recognized in systems biology.

It has long been known that chemical reaction networks with different structures and/or parametrizations may produce the same dynamical models describing the time evolution of species concentrations.

As a novel contribution, we define in this paper the so-called core reactions, which are present in every dynamically equivalent reaction network if the set of complexes is given a priori, and we present a computational method to find them. Moreover, a computationally more efficient method is introduced for computing dense realizations of CRNs, together with a modified algorithm to check the uniqueness of a constrained reaction network structure. Structural non-uniqueness and the use of the proposed computational methods will be illustrated with biological models known from the literature.

The paper is structured as follows. The 'Methods' section introduces the notions of chemical reaction networks, structural identifiability and distinguishability of dynamical models. Moreover, it contains the procedures to obtain the core reactions of a network and its sparse and dense representations, which rely on standard methods of linear programming (LP) and mixed integer linear programming (MILP).

Methods

The model class considered in this paper is of the following form

where ^{n }
^{m }
^{k }
^{d }

Basic notions and known results related to mass-action models

In this subsection, the basic definitions for the description of CRNs will be given together with the already published results on finding dynamically equivalent network realizations with certain prescribed properties.

Structural and dynamical description of mass-action networks

Following

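1. Species: $\mathcal{S} = \{X_1, \dots, X_n\}$

2. Complexes: $\mathcal{C} = \{C_1, \dots, C_m\}$, where

$$C_i = \sum_{j=1}^{n} \alpha_{ij} X_j, \qquad i = 1, \dots, m,$$

where $\alpha_{ij} \in \mathbb{N}_0$ is the stoichiometric coefficient of species $X_j$ in complex $C_i$.

3. Reactions: $\mathcal{R} \subseteq \{(C_i, C_j) \mid C_i, C_j \in \mathcal{C},\ C_i \neq C_j\}$. The ordered pair $(C_i, C_j) \in \mathcal{R}$ corresponds to the reaction $C_i \to C_j$, to which a positive reaction rate coefficient $k_{ij} > 0$ is assigned. If $C_i \to C_j$ is not a reaction of the network, then $k_{ij} = 0$.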

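The above characterization naturally gives rise to the following graph structure of a CRN, often called the 'Feinberg-Horn-Jackson graph' or simply the reaction graph. The vertices of the graph are the complexes, $V = \{C_1, C_2, \dots, C_m\}$. There is a directed edge from $C_i$ to $C_j$ if and only if $C_i \to C_j \in \mathcal{R}$, and this edge is weighted by the rate coefficient $k_{ij}$.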

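Assuming mass-action kinetics, the following dynamical description will be used for the time evolution of species concentrations:

$$\dot{x} = Y \cdot A_k \cdot \psi(x),$$

The symbols are defined as follows:

- $x \in \mathbb{R}^n$ is the vector of species concentrations ($x_i = [X_i]$).
- $Y \in \mathbb{N}_0^{n \times m}$ is the stoichiometric matrix. Its $i$th column contains the composition of complex $C_i$, i.e. $[Y]_{j,i} = \alpha_{ij}$.
- $\psi: \mathbb{R}^n \to \mathbb{R}^m$ is the vector of monomials $\psi_i(x) = \prod_{j=1}^{n} x_j^{[Y]_{j,i}}$.
- $A_k \in \mathbb{R}^{m \times m}$ is the Kirchhoff matrix of the reaction graph, with $[A_k]_{i,j} = k_{ji}$ for $i \neq j$ and $[A_k]_{i,i} = -\sum_{l \neq i} k_{il}$.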
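The mass-action description above can be evaluated numerically in a few lines. The following is a minimal Python sketch, assuming NumPy and 0-based indices. The dictionary `rates` is a hypothetical convention in which the key `(i, j)` stands for the reaction $C_i \to C_j$. The one-species network with complexes $0$, $X$ and $2X$ is a toy example, not one of the paper's models.

```python
import numpy as np

def kirchhoff_matrix(m, rates):
    """A_k for m complexes; rates[(i, j)] = k_ij is the coefficient of C_i -> C_j."""
    A = np.zeros((m, m))
    for (i, j), k in rates.items():
        A[j, i] += k      # off-diagonal entry [A_k]_{j,i} = k_ij
        A[i, i] -= k      # diagonal entry: minus the total outflow rate of C_i
    return A

def monomials(Y, x):
    """psi_i(x) = prod_j x_j ** Y[j, i], one monomial per complex (column of Y)."""
    return np.prod(np.power(x[:, None], Y), axis=0)

def mass_action_rhs(Y, A, x):
    """Right-hand side of dx/dt = Y @ A_k @ psi(x)."""
    return Y @ A @ monomials(Y, x)

# Hypothetical toy network: one species X, complexes C_0 = 0, C_1 = X, C_2 = 2X,
# with reactions X -> 0 (k = 2) and X -> 2X (k = 1); the net dynamics is dx/dt = -x.
Y = np.array([[0, 1, 2]])
A = kirchhoff_matrix(3, {(1, 0): 2.0, (1, 2): 1.0})
print(mass_action_rhs(Y, A, np.array([3.0])))  # [-3.]
```

Every column of $A_k$ sums to zero by construction, since each reaction adds $k_{ij}$ to one entry of column $i$ and subtracts it from the diagonal.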

Finally, ^{n }
^{m }

Dynamical equivalence of mass-action networks

As is known even from the early literature

where for

In this case,

We will assume throughout the paper that the set of complexes (i.e. the stoichiometric matrix $Y$) is given a priori.

where
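With a fixed complex set, dynamical equivalence reduces to a linear condition. Distinct complexes give distinct, and hence linearly independent, monomials. Two realizations sharing the same $Y$ therefore generate the same dynamics exactly when the products $Y \cdot A_k$ coincide.

Below is a minimal sketch of that check. It assumes NumPy, a hypothetical `rates` dictionary with key `(i, j)` for $C_i \to C_j$, and hypothetical toy complexes $C_0 = 0$, $C_1 = X$, $C_2 = 2X$.

```python
import numpy as np

def kirchhoff_matrix(m, rates):
    """A_k for m complexes; rates[(i, j)] = k_ij is the coefficient of C_i -> C_j."""
    A = np.zeros((m, m))
    for (i, j), k in rates.items():
        A[j, i] += k
        A[i, i] -= k
    return A

def dynamically_equivalent(Y, A1, A2, atol=1e-9):
    """With a fixed complex matrix Y, the dynamics coincide iff Y @ A1 == Y @ A2."""
    return np.allclose(Y @ A1, Y @ A2, atol=atol)

# Hypothetical toy complexes C_0 = 0, C_1 = X, C_2 = 2X (one species X).
Y = np.array([[0, 1, 2]])
original = kirchhoff_matrix(3, {(1, 0): 1.0})                  # X -> 0
alternative = kirchhoff_matrix(3, {(1, 0): 3.0, (1, 2): 2.0})  # X -> 0 and X -> 2X
print(dynamically_equivalent(Y, original, alternative))  # True: both give dx/dt = -x
```

The two structures differ, yet no concentration measurement can tell them apart. This is the kind of non-uniqueness studied in the rest of the paper.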

Among the dynamically equivalent realizations, it is important to recall the following characteristic ones described in

Known computation approaches for finding preferred CRN realizations

Here we briefly summarize the already published results corresponding to the computation of preferred dynamically equivalent CRN realizations (more details of these methods can be found in the publications _{k}
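Sparse realizations, i.e. those with the minimal number of reactions, are typically computed with a MILP in which a binary variable switches each reaction on or off. The sketch below is one such big-M formulation using SciPy's `scipy.optimize.milp` (SciPy 1.9 or later). It is illustrative, not the original implementation, and rests on these assumptions:

- `M` is the coefficient matrix $Y \cdot A_k$ of the given dynamics.
- The bound `ub` must exceed every rate coefficient of the sought realization.
- The function name, the `excluded` argument (forbidden reactions `(i, j)`) and the toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

def sparse_realization(Y, M, ub=100.0, excluded=(), tol=1e-7):
    """Return {(i, j): k_ij} of a realization with the fewest reactions C_i -> C_j."""
    n, m = Y.shape
    pairs = [(i, j) for i in range(m) for j in range(m) if i != j]
    N = len(pairs)
    A_eq = np.zeros((n * m, N))
    for v, (i, j) in enumerate(pairs):
        # C_i -> C_j contributes k_ij * (Y[:, j] - Y[:, i]) to column i of Y @ A_k
        A_eq[np.arange(n) * m + i, v] = Y[:, j] - Y[:, i]
    b_eq = np.asarray(M, dtype=float).reshape(-1)
    # decision vector z = [k_1..k_N (rates), d_1..d_N (binary: reaction present)]
    constraints = [
        LinearConstraint(np.hstack([A_eq, np.zeros((n * m, N))]), b_eq, b_eq),
        LinearConstraint(np.hstack([np.eye(N), -ub * np.eye(N)]), -np.inf, 0.0),  # k_v <= ub * d_v
    ]
    upper = np.array([0.0 if p in excluded else ub for p in pairs] + [1.0] * N)
    res = milp(
        c=np.concatenate([np.zeros(N), np.ones(N)]),           # minimise the number of reactions
        integrality=np.concatenate([np.zeros(N), np.ones(N)]),  # only the d_v are integer
        bounds=Bounds(np.zeros(2 * N), upper),
        constraints=constraints,
    )
    if not res.success:
        raise ValueError(f"no sparse realization found: {res.message}")
    k = res.x[:N]
    return {p: float(k[v]) for v, p in enumerate(pairs) if k[v] > tol}

# Hypothetical toy data: complexes 0, X, 2X and dynamics dx/dt = -x.
Y = np.array([[0, 1, 2]])
M = np.array([[0, -1, 0]])
print(sparse_realization(Y, M))  # a single reaction, X -> 0 with rate 1
```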

Constrained realizations of CRNs and testing their structural uniqueness

The following is a straightforward extension of the results published in

where _{k }
_{k }
_{k}
_{k}

**P1** Given a CRN $(Y, A_k)$

**P2** If the sets of complexes and constraints are fixed, then for any CRN, the structure of the constrained dense realization is unique.

**P3** The reaction graph structure of a CRN with given sets of complexes and constraints is unique if and only if the unweighted directed graphs of its constrained dense and sparse realizations are identical.

The proofs of **P1**, **P2** and **P3** follow lines similar (although not completely identical) to those published in
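The convexity argument behind properties of this kind can be sketched as follows. This is a hedged outline, not a reproduction of the original proofs. It assumes that constraints only fix selected off-diagonal entries of $A_k$ to zero, and it writes $M$ for the fixed matrix $Y \cdot A_k$ of the given dynamics.

```latex
% Two constrained realizations with the same complex set Y (l = 1, 2):
Y A_k^{(l)} = M, \qquad [A_k^{(l)}]_{i,j} \ge 0 \ (i \ne j), \qquad
\mathbf{1}^{T} A_k^{(l)} = \mathbf{0}^{T}, \qquad [A_k^{(l)}]_{i,j} = 0 \ \text{for excluded } (i,j).
% All conditions are linear, so for every \lambda \in (0,1)
A_k^{(\lambda)} = \lambda A_k^{(1)} + (1 - \lambda) A_k^{(2)}
% is again a constrained realization. Since the off-diagonal entries are nonnegative,
[A_k^{(\lambda)}]_{i,j} > 0 \iff [A_k^{(1)}]_{i,j} > 0 \ \text{or} \ [A_k^{(2)}]_{i,j} > 0 \qquad (i \ne j),
% i.e. the reaction graph of A_k^{(\lambda)} is the union of the two reaction graphs.
```

There are only finitely many possible reaction sets, so the argument can be repeated over one realization per achievable set. This yields a realization whose graph contains the graph of every other constrained realization, which is consistent with the uniqueness statement in **P2**.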

New concepts and computation results related to dynamically equivalent networks

This subsection contains new methodological contributions that extend the previously published results.

Making the computation of dense realizations more efficient

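Although the computation of dense realizations was originally treated in a MILP framework, property **P1** makes it easy to give a polynomial-time algorithm based on a finite series of linear programming (LP) optimization steps. The idea of the improved algorithm is simple. The reaction $C_i \to C_j$ belongs to the dense realization if and only if there exists a dynamically equivalent (constrained) realization $(Y, A_k)$ in which $[A_k]_{j,i} > 0$.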

The task of determining which reactions of a CRN belong to the dense realization can be effectively solved through the following problem set consisting of

where the decision variables are the off-diagonal entries of _{k}
_{ij }
_{k}
_{i,j }
_{q }
_{p }
_{pq }

By construction, _{ij }
_{ij }> _{k }

It is important to remark that the definition of _{ij }
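The LP-based procedure for dense realizations can be sketched in Python as follows. The sketch assumes SciPy's `linprog` with the HiGHS backend and writes `M` for $Y \cdot A_k$ of the given dynamics. It works in three steps:

1. It finds one base realization.
2. For each candidate reaction, it maximizes that reaction's rate. Only this rate is capped, at its base value plus one, so each LP stays bounded and feasible.
3. It keeps every solution in which the reaction is positive and averages these solutions. Because the feasible set is convex, the average is again a realization, and every reaction that can be positive is present in it.

The encoding, the function name and the `excluded` argument are hypothetical, and the authors' exact LP formulation may differ.

```python
import numpy as np
from scipy.optimize import linprog

def dense_realization(Y, M, excluded=(), tol=1e-9):
    """Return {(i, j): k_ij} of a dense realization; (i, j) stands for C_i -> C_j."""
    n, m = Y.shape
    pairs = [(i, j) for i in range(m) for j in range(m) if i != j]
    A_eq = np.zeros((n * m, len(pairs)))
    for v, (i, j) in enumerate(pairs):
        # C_i -> C_j contributes k_ij * (Y[:, j] - Y[:, i]) to column i of Y @ A_k
        A_eq[np.arange(n) * m + i, v] = Y[:, j] - Y[:, i]
    b_eq = np.asarray(M, dtype=float).reshape(-1)
    bounds = [(0.0, 0.0) if p in excluded else (0.0, None) for p in pairs]

    def solve(c, bnds):
        return linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bnds, method="highs")

    base = solve(np.zeros(len(pairs)), bounds)     # any (constrained) realization
    if base.status != 0:
        raise ValueError("no (constrained) realization exists")
    solutions = [base.x]
    for v, p in enumerate(pairs):
        if p in excluded:
            continue
        capped = list(bounds)
        capped[v] = (0.0, base.x[v] + 1.0)         # bounded LP; the base point still fits
        c = np.zeros(len(pairs))
        c[v] = -1.0                                # linprog minimises, so maximise k_p
        res = solve(c, capped)
        if res.status == 0 and res.x[v] > tol:
            solutions.append(res.x)                # a realization containing C_i -> C_j
    k = np.mean(solutions, axis=0)                 # convex combination: still a realization
    return {p: float(k[v]) for v, p in enumerate(pairs) if k[v] > tol}

# Hypothetical toy data: complexes 0, X, 2X and dynamics dx/dt = -x.
Y = np.array([[0, 1, 2]])
M = np.array([[0, -1, 0]])
print(sorted(dense_realization(Y, M)))  # [(1, 0), (1, 2)]: X -> 0 and X -> 2X
```

The number of LPs grows only quadratically with the number of complexes, which is the practical advantage over enumerating structures.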

Using the notion of constrained realizations and the properties described above, we can now test the structural uniqueness of a given CRN. To accomplish this, only the (constrained) dense and sparse realizations need to be computed and compared (see **P3**). This method will be illustrated in Example 2.
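The comparison in **P3** can be sketched for very small networks as follows. It assumes SciPy's `linprog`, writes `M` for $Y \cdot A_k$, and uses hypothetical function names and toy data.

- The dense structure is found with one capped LP per reaction.
- For brevity, the sparse structure is found by brute force over subsets of the dense reactions, in increasing size, rather than by the MILP approach. This is exponential and only meant for toy examples.
- Excluding reactions (the `excluded` argument) is how constraints are imposed.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def _realization_lp(Y, M, allowed, maximise=None, cap=None):
    """LP over rates k_ij (C_i -> C_j); only reactions in `allowed` may be nonzero."""
    n, m = Y.shape
    pairs = [(i, j) for i in range(m) for j in range(m) if i != j]
    A_eq = np.zeros((n * m, len(pairs)))
    for v, (i, j) in enumerate(pairs):
        A_eq[np.arange(n) * m + i, v] = Y[:, j] - Y[:, i]   # encodes Y @ A_k = M
    bounds = [(0.0, None) if p in allowed else (0.0, 0.0) for p in pairs]
    c = np.zeros(len(pairs))
    if maximise is not None:
        v = pairs.index(maximise)
        c[v] = -1.0                       # maximise the chosen rate
        bounds[v] = (0.0, cap)            # the cap keeps the LP bounded
    b_eq = np.asarray(M, dtype=float).reshape(-1)
    return pairs, linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")

def structure_is_unique(Y, M, excluded=(), tol=1e-9):
    """Return (unique, dense_structure, sparse_structure) in the sense of P3."""
    m = Y.shape[1]
    allowed = {(i, j) for i in range(m) for j in range(m) if i != j} - set(excluded)
    pairs, base = _realization_lp(Y, M, allowed)
    if base.status != 0:
        raise ValueError("no (constrained) realization exists")
    # dense structure: reactions that are positive in at least one realization
    dense = set()
    for p in allowed:
        cap = base.x[pairs.index(p)] + 1.0
        _, res = _realization_lp(Y, M, allowed, maximise=p, cap=cap)
        if res.status == 0 and -res.fun > tol:
            dense.add(p)
    # sparse structure: smallest feasible subset of the dense reactions (brute force)
    for size in range(len(dense) + 1):
        for subset in itertools.combinations(sorted(dense), size):
            _, res = _realization_lp(Y, M, set(subset))
            if res.status == 0:
                return set(subset) == dense, dense, set(subset)
    raise RuntimeError("unreachable: the dense set itself is feasible")

# Hypothetical toy data: complexes 0, X, 2X and dynamics dx/dt = -x.
Y = np.array([[0, 1, 2]])
M = np.array([[0, -1, 0]])
print(structure_is_unique(Y, M)[0])                       # False
print(structure_is_unique(Y, M, excluded={(1, 2)})[0])    # True once X -> 2X is forbidden
```

In the toy case, forbidding a single reaction is enough to make the dense and sparse structures coincide.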

Definition and computation of core and non-core reactions

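We will call a reaction a core reaction if it is present in every dynamically equivalent realization of a CRN with the given set of complexes. Reactions that appear in some, but not all, dynamically equivalent realizations are called non-core reactions. Whether a reaction $C_p \to C_q$ is a core reaction can be decided by solving the following LP problem: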

where the matrix _{k }
_{k }
_{p }
_{q }
_{p }
_{q }
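The core/non-core classification can be expressed with two LPs per candidate reaction $C_p \to C_q$:

- The reaction is core if the smallest achievable value of its rate is still positive.
- It is non-core if that minimum is zero but some realization makes the rate positive.

The following is a minimal SciPy sketch of this idea. The rate of $C_i \to C_j$ is the variable keyed `(i, j)`, and `M` denotes $Y \cdot A_k$ of the given dynamics. The function name, the `excluded` argument and the toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def core_and_non_core(Y, M, excluded=(), tol=1e-9):
    """Split the possible reactions (i, j), i.e. C_i -> C_j, into core and non-core sets."""
    n, m = Y.shape
    pairs = [(i, j) for i in range(m) for j in range(m) if i != j]
    A_eq = np.zeros((n * m, len(pairs)))
    for v, (i, j) in enumerate(pairs):
        A_eq[np.arange(n) * m + i, v] = Y[:, j] - Y[:, i]   # encodes Y @ A_k = M
    b_eq = np.asarray(M, dtype=float).reshape(-1)
    bounds = [(0.0, 0.0) if p in excluded else (0.0, None) for p in pairs]

    def solve(c, bnds):
        return linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bnds, method="highs")

    base = solve(np.zeros(len(pairs)), bounds)
    if base.status != 0:
        raise ValueError("no (constrained) realization exists")
    core, non_core = set(), set()
    for v, p in enumerate(pairs):
        if p in excluded:
            continue
        c = np.zeros(len(pairs))
        c[v] = 1.0
        low = solve(c, bounds)                     # smallest achievable k_p (>= 0)
        if low.status == 0 and low.fun > tol:
            core.add(p)                            # k_p > 0 in every realization
            continue
        capped = list(bounds)
        capped[v] = (0.0, base.x[v] + 1.0)         # bounded, still feasible (base fits)
        high = solve(-c, capped)                   # largest achievable k_p under the cap
        if high.status == 0 and -high.fun > tol:
            non_core.add(p)                        # present in some realizations only
    return core, non_core

# Hypothetical toy data: complexes 0, X, 2X and dynamics dx/dt = -x.
Y = np.array([[0, 1, 2]])
M = np.array([[0, -1, 0]])
print(core_and_non_core(Y, M))  # ({(1, 0)}, {(1, 2)}): X -> 0 is core, X -> 2X is not
```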

Basic concepts on structural identifiability and distinguishability

Let us recall eq. (1). In short, global structural identifiability means that

where

and

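Let us denote two parameterized models with possibly different structures by $\mathcal{M}_i(p_i)$, $i = 1, 2$. The two models are called structurally distinguishable if, for almost any $p_1$ (possibly except for a finite number of values), there is no $p_2$ such that the input-output behaviour of $\mathcal{M}_1(p_1)$ and $\mathcal{M}_2(p_2)$ is identical.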
In the case of CRNs, we will assume that all species concentrations are measured (i.e. _{k}

Results and discussion

In this section, the application of the previously described methods for finding different dynamically equivalent structures is illustrated using biological models taken from the literature. The detailed numerical data for Examples 1-3 are provided in spreadsheet form, with brief explanations, in Additional file

**Detailed numerical data of the CRNs shown in Examples 1-3**. This file contains the detailed data (i.e. stoichiometric matrices and reaction rate coefficients) of the dynamically equivalent reaction networks studied in Examples 1, 2 and 3. The individual sheets correspond to the different examples.


Example 1: a positive feedback motif

The first example is a positive feedback motif shown in Figure 1a. The species concentrations are:

- $x_1$ and $x_2$: protein monomers and dimers, respectively;
- $x_3$ and $x_4$: unoccupied and occupied promoters, respectively;
- $x_5$: the mRNA.

The degradation of dimers is ignored. The reaction rate coefficients have the following roles:

- $k_1$ and $k_2$ are the dimerization and re-dimerization rates, respectively;
- $k_3$ and $k_4$ are the binding and dissociation rates of the dimer to the promoter;
- $k_5$ and $k_6$ denote the activated and basal transcription rates, respectively;
- $k_7$ is the degradation rate of the mRNA;
- $k_8$ is the degradation rate of the monomer;
- $k_9$ denotes the translation rate.

The time evolution of the species concentrations is described by the following ODEs:


**Positive feedback motif: original reaction graph and dense realization structure**. (a) This subfigure shows the reaction graph of a gene regulation network model with positive feedback described originally in

Our starting point is that we have a dynamic model of the process in the standard polynomial form of (20)-(24), the parameters of which are known from the results of identification and/or from literature. As we will see below, without well-defined constraints on the possible set of complexes and reactions, exactly the same dynamics can be realized in principle by a wide range of mechanisms.

The matrices characterizing the stoichiometry and graph structure of the system are the following (only the nonzero off-diagonal elements of $A_k$ are indicated):

We used the following parameter values that were taken from the Appendix of

where the units of measure are [M^{-1}] for $k_1, \dots, k_4$, and [min^{-1}] for $k_5, \dots, k_9$. The dynamically equivalent dense realization of the network is shown in Figure


**Sparse realization structures for the positive feedback motif**. Three different dynamically equivalent structures can be given for the positive feedback motif with the minimal number of reactions. The core and non-core reactions are indicated in the same way as in Figure 1.b.

As expected, the possible structures of sparse/dense realizations and the corresponding core and non-core reactions can change when the parameter values are modified. This is illustrated in Figure


**The effect of modifying the complex set and the parameters**. (a) The core and non-core reactions of the dense realization of the positive feedback motif are shown in this subfigure with a randomly selected parametrization that is different from the one given in _{2 }+ _{4 }is involved into the model.

It can be seen that the structure of the dense realization is the same as in Figure

In the next step, let us assume that another complex, namely $X_2 + X_4$, is allowed in the model (again without necessarily assuming biological meaningfulness in this particular case). With the addition of this new complex, the stoichiometric matrix of the system can be written as

The dense CRN realization of the dynamics (20)-(24) with the updated

The above results clearly show that certain mechanisms may remain undetectable (or be falsely detected) even if we have complete species concentration measurements and full information about possible complex formation, which are rather unrealistic assumptions. Moreover, the sparsest dynamically equivalent structure of mass-action models is not necessarily unique. Therefore, sparsity-enforcing approaches for determining 'true' reaction structures are not sufficient on their own without the necessary amount of prior information given in the form of additional constraints. The practical situation is usually even worse, since the measurements are typically incomplete, noisy and sometimes not dynamically rich enough, which may introduce further obstacles to the structure/parameter estimation process.

Example 2: a biochemical switch in yeast cells

The following example is taken from

$x_1$: [Sic1], $x_2$: [Sic1P], $x_3$: [Clb], $x_4$: [Clb·Sic1], $x_5$: [Clb·Sic1P], $x_6$: [Cdc14], $x_7$: [Sic1P·Cdc14], $x_8$: [Clb·Sic1P·Cdc14], $x_9$: [Clb·Sic1·Clb]. The original structure with 18 reactions is shown in Figure


**Model of a biochemical switch in yeast cells**. (a) The subfigure shows the original structure of a CRN describing a biochemical switch published in

The non-zero off-diagonal elements of $A_k$

Since there are no parameter values published in

The structure of the dense realization indicating the 12 core and 16 non-core reactions can be seen in Figure

Using the computational methods described in the 'Methods' section, it can be shown that the only possible sparse realization structure is identical to that of the original network. Therefore, in this special case, there is only one possible reaction structure containing the minimal number of reactions. A straightforward approach to ensure the structural uniqueness of the whole network is to exclude all reactions that are not meaningful for the examined application or that contradict modeling assumptions. For the current example, removing a surprisingly small number of reactions is enough to obtain a unique structure. Computing the corresponding constrained dense and sparse realizations shows that it suffices to exclude the reactions $X_5 \to X_3 + X_5$, $X_4 \to X_3 + X_4$, $X_2 + X_3 \to X_3 + X_5$ and $X_3 + X_1 \to X_3 + X_4$. The resulting unique reaction structure is identical to the original one shown in Figure

Example 3: a repressilator structure with 5 nodes and auto-activation

Consider the repressilator model shown in Figure


**A standard repressilator structure**. A repressilator structure with 5 nodes and auto-activation is shown in the figure. The mass-action type CRN model of this structure contains 51 distinct complexes and 55 reactions.

for the index pairs (_{i }
_{i }
_{i,k }
_{i,k }

Two cases with different sets of randomly selected rate coefficients were studied, and the resulting structures were the same. The numerical details can be found on the 3rd sheet of Additional File

The model contains 45 core reactions. The set of non-core reactions (which, in principle, can be substituted by other reactions) is given by

In particular, it is easy to show (see also Additional File

Example 4: sparse linear gene regulation network models

For structural identification, gene regulation networks are often modeled as linear time-invariant systems

where ^{n×n }
_{i,j }> _{i,j }< ^{n }

First, consider the 'true' genetic network structure that was simulated and inferred in


**A sparse gene regulation network and its structural identifiability properties**. (a) This subfigure is a reproduction of one of the sparse gene regulation networks used for structural identification in

where '+', '-' and '*' represent positive, negative and nonzero (but otherwise undefined) parameter values, respectively. If there are no prior assumptions about the structure of the interconnection matrix or about the relations between certain parameters, we can easily test the structural non-identifiability of the model by checking whether or not all nodes are reachable from the perturbed node via a directed path in the interconnection graph
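The reachability test mentioned above is an ordinary graph search. In a linear model $\dot{x} = Ax + bu$, a nonzero entry $a_{i,j}$ means that $x_j$ appears in the equation of $x_i$, i.e. a directed edge $j \to i$. Assume a zero initial state. A node that is not reachable from the perturbed node then never responds to the perturbation, so the parameters in its equation cannot be recovered from that experiment.

The sketch below only checks this graph condition. Reachability is not claimed to be sufficient for identifiability. The 4-gene sign pattern is hypothetical, not the network from the figure.

```python
from collections import deque

def unreachable_nodes(pattern, perturbed):
    """Nodes not reachable from `perturbed` in the interconnection graph.

    pattern[i][j] != 0 means x_j appears in dx_i/dt, i.e. a directed edge j -> i.
    """
    n = len(pattern)
    seen = {perturbed}
    queue = deque([perturbed])
    while queue:                       # breadth-first search from the perturbed node
        j = queue.popleft()
        for i in range(n):
            if pattern[i][j] != 0 and i not in seen:
                seen.add(i)
                queue.append(i)
    return sorted(set(range(n)) - seen)

# Hypothetical 4-gene sign pattern ('+' -> 1, '-' -> -1, '*' -> 1); gene 0 is perturbed.
pattern = [
    [-1, 0, 0, 0],
    [1, -1, 0, 0],
    [0, -1, -1, 0],
    [0, 0, 0, -1],
]
print(unreachable_nodes(pattern, 0))  # [3]: gene 3 never sees the perturbation
```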

Relation between high level networks and CRN structure

As shown in Example 3, the various possible dynamically equivalent CRN structures do not correspond to different GRN structures if all species concentration measurements are available and the mapping described in

Conclusions

It has been shown in this paper using illustrative examples that biological network structures modeled by CRNs often cannot be uniquely determined even if the structure of the corresponding mathematical model is assumed to be known and all dynamical variables are measurable. The structural uniqueness and identifiability of the models often require additional constraints.

The main new contributions of the paper are the following. Firstly, core reactions, which are present in every dynamically equivalent CRN realization with a given complex set, have been defined, and a simple procedure with polynomial time complexity has been given for determining them. Clearly, the core reactions are mandatory elements of every dynamically equivalent CRN realization assuming a fixed complex set. Secondly, a polynomial-time method based on linear programming for computing dense realizations has been outlined. It scales better than the previously used MILP-based method and is therefore a clear improvement over it. As an additional minor extension of previous results, constrained realizations of CRNs have been defined, and a computational method has been proposed to check the uniqueness of constrained realizations.

The presented concepts and algorithms were illustrated on previously published models describing biological processes. It was shown that the set of core reactions may change when the complex set is modified. The examples also show that the frequently applied sparsity assumption alone is not enough for the structural uniqueness of CRNs. Moreover, in the case of simple linear genetic network models, overly sparse structures can degrade identifiability properties. The results further support the view that as much prior information as possible should be incorporated into structural and parametric inference problems.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

All authors contributed to the conception and design of the work. JRB and GS selected and evaluated the examples. GS performed the numerical computations. All authors contributed to the writing of the manuscript. All authors read and approved the final manuscript.

A Appendix

**P1**. Let us denote the $i$th column of a matrix $A$ by $A_{\cdot,i}$. The proof is based on the following well-known fact of linear algebra. Consider an inhomogeneous set of linear equations:

If

The matrix equation _{k }

Let us choose any _{k}
_{·,i
}, _{·,i
}. Let us assume that there are _{k }

The equation sets (46) and (47) can be combined into a single set of equations as

where _{j }
_{j }
^{m }
_{c }

**P2**. This is a straightforward consequence of **P1**, since the unweighted directed graphs of all constrained dense realizations must be identical.

**P3**. If the graph structure of the constrained realization is unique, then the structures of the constrained dense and sparse realizations are trivially identical, since there exists only one possible constrained reaction structure. Conversely, if the structures of the constrained dense and sparse realizations are identical, then the number of nonzero reaction rates is the same in every constrained realization, including the constrained dense ones. It then follows from **P1** that the constrained reaction structure is unique.

Acknowledgements

This work was financially supported by project CAFE (Computer Aided Food Process Engineering) FP7-KBBE-2007-1 (Grant no: 212754). The authors acknowledge the support by the Spanish government, MICINN project 'MultiSysBio' (ref. DPI2008-06880-C03-02), and by CSIC intramural project 'BioREDES' (ref. PIE-201170E018). GS acknowledges the partial support of the Hungarian Scientific Research Fund through grant no. K83440. The authors thank Dr. Irene Otero Muras (Dept. of Computational Systems Biology, ETH Zurich) and Prof. Zsolt Tuza (MTA SZTAKI) for the fruitful discussions. The authors wish to thank the anonymous reviewers for their helpful comments.