Abstract
Background
A biochemical mechanism with mass action kinetics can be represented as a directed bipartite graph (bipartite digraph), and modeled by a system of differential equations. If the differential equations (DE) model can give rise to some instability such as multistability or Turing instability, then the bipartite digraph contains a structure referred to as a critical fragment. In some cases the existence of a critical fragment indicates that the DE model can display oscillations for some parameter values. We have implemented a graphtheoretic method that identifies the critical fragments of the bipartite digraph of a biochemical mechanism.
Results
GraTeLPy lists all critical fragments of the bipartite digraph of a given biochemical mechanism, thus enabling a preliminary analysis on the potential of a biochemical mechanism for some instability based on its topological structure. The correctness of the implementation is supported by multiple examples. The code is implemented in Python, relies on open software, and is available under the GNU General Public License.
Conclusions
GraTeLPy can be used by researchers to test large biochemical mechanisms with mass action kinetics for their capacity for multistability, oscillations and Turing instability.
Keywords:
Biochemical mechanism; Bipartite digraph; Multistability; Turing instability; Oscillations; Parameterfree model discriminationBackground
Biochemical mechanisms are often modeled by differential equations (DE) systems. Instabilities, such as multistability, oscillations, or Turing instability, are ubiquitous in DE models of biochemical mechanisms. Methods from bifurcation analysis are usually applied in order to analyze DE models for instabilities [1]. Bifurcation analysis methods are easily applied when the DE model has one or two concentration species (phase plane analysis) or has a relatively small number of parameters (numerical bifurcation analysis). However, it is both difficult and expensive to apply bifurcation methods to analyze large DE models with many variables for instabilities.
On the other hand a biochemical mechanism can be represented as a directed bipartite graph (bipartite digraph), which is a graph with two different sets of nodes representing species and reactions, and directed edges connecting a species and a reaction node. The existence of structures referred to as critical fragments in the bipartite digraph of a biochemical mechanism is necessary for the existence of multistability or Turing instability in the DE model [24]. Thus biochemical mechanisms that do not have the potential for multistability or Turing instability can be ruled out early in the modeling process. The existence of a critical fragment that does not contain all species nodes can indicate that oscillations exist for some parameter values for the DE model [3]. Thus graphtheoretic methods can be used to determine the potential of various biochemical mechanisms to exhibit some desired behavior, including multistability related to cell decision [5,6], oscillations related to circadian rhythms [7], or Turing instability related to pattern formation [8].
Graphtheoretic methods are applicable to mechanisms with any number of species and reactions, which enables the screening of large biochemical mechanisms for potential instabilities. However, application of graphtheoretic methods by hand becomes challenging for large mechanisms, making a computational implementation highly desirable.
The graphtheoretic method implemented by GraTeLPy identifies all critical fragments in the bipartite digraph of a biochemical mechanism that can give rise to some instability (multistability, oscillations and Turing instability) [3,4]. GraTelPy is implemented in Python and can run in parallel on computer clusters which increases the size of testable biochemical mechanisms.
Other software packages implement theoretical and computational methods for studying chemical reaction networks for multistability. Using the deficiency theory developed by M. Feinberg and collaborators [9], it can be shown that a chemical network model does not admit multistability for any choice of parameter values. The CRNT toolbox [10] developed originally by M. Feinberg implements the Deficiency One algorithm [11], that can be used to detect if a given network has the capacity for multistability [12]. If a given network admits multiple positive equilibria, in many cases the CRNT toolbox returns rate constant values such that the corresponding model system has at least two positive equilibria. In recent years, the CRNT toolbox has been extended to implement an algorithm for the massaction injectivity test. A special case of this test is the Jacobian criterion, which provides a sufficient condition for excluding the existence of multiple positive equilibria and is based on the theory developed in [2,13,14].
Related software packages include BioNetX [15] and CoNtRol [16]. BioNetX is based on the work of M. Banaji and G. Craciun [17,18] and is created by C. Pantea. BioNetX is used to analyze unimolecular and bimolecular reaction networks for the existence of multiple positive equilibria in [19,20]. CoNtRol [16] is a webbased software package that employs matrix and graphtheoretic methods based on the DSR graph [17,18,21]. In particular CoNtRol provides information about the capacity of a given chemical network for multistability based on the DSR graph and on some additional tests. In addition CoNtRol calculates the deficiency of a network and checks if a network is weakly reversible. BioNetX and CoNtRol are available to download for free, they are opensource and are conveniently webbased.
We describe in Section Mathematical background the DE model and the bipartite digraph of a biochemical mechanism, as well as the instability criteria. In Section Implementation we describe the algorithm for finding critical fragments. In Section Results and discussion we present several examples along with some concluding remarks. A guide for downloading and installing GraTeLPy for Mac, Windows and Linux operating systems is available in the Additional file 1.
Additional file 1. GraTeLPy Manual: A Practical Software Guide.
Format: PDF Size: 55KB Download file
This file can be viewed with: Adobe Acrobat Reader
Mathematical background
Here we introduce the differential equations model and the bipartite digraph representation of a biochemical mechanism. In this section we also briefly describe the instability criteria for multistability, oscillations and Turing instability. More details on the instability criteria are available in [3,4].
Mathematical model
A biochemical mechanism withn species A_{i}, i=1,…,n, and m elementary reactions B_{j} can be written as
where k_{j}>0, j=1,…,m are the rate constants. The constants α_{ji}≥0 and β_{ji}≥0 in (1) are small integers called stoichiometric coefficients that account for the number of molecules of species A_{i} participating in the j^{th} elementary reaction. An example of a biochemical mechanism, the reversible substrate inhibition mechanism, is given below:
where the first two reactions represent an inflow and outflow reaction, respectively.
We will assume that every species A_{k} in (1) is consumed and produced in at least one true reaction, i.e, a reaction which is different from an outflow reaction A_{k}→∅ or an inflow reaction ∅→A_{k}. However, we do not specifically require that all species participate in an inflow and an outflow reaction.
We further assume mass action kinetics for the mechanism (1) with rate functions
where u_{k}(t) is the concentration at time t of a species A_{k}, k=1,…,n.
The ordinary differential equations (ODE) model of a massaction biochemical mechanism (1) can be written in vector form as
where u(t)=(u_{1}(t),…,u_{n}(t))^{T} is the concentration vector of the chemical species of (1), S_{ji}=β_{ji}−α_{ji} are the entries of the stoichiometric matrix S and w(u)=(w_{1}(u),…,w_{m}(u))^{T} is the vector of rate functions (3). Throughout the paper it will be assumed that the ODE system (4) has a positive equilibrium.
The model equations of the reversible substrate mechanism (2) are given below
The rank of the stoichiometric matrix S of (5) equals 3 since there is one conservation relationship u_{2}+u_{3}+u_{4}=c_{0}.
Since
where S_{ji} are the stoichiometric matrix entries and w_{j}(u) are the rate functions (3), the Jacobian matrix J(u,w) has entries that can be written as
Note that the concentrations u_{k}, k=1,…,n and the rate functions w_{j}(u), j=1,…,m (both considered evaluated at a positive equilibrium) are used as parameters in (6). The rank of the Jacobian (6) equals the rank of the stoichiometric matrix S[3].
The Jacobian matrix of the model (5) parametrized in (u,w) has rank 3 and is given below
The characteristic polynomial of J(u,w) is
where I is the identity matrix. Note that the coefficients a_{i}=a_{i}(u,w), i=1,…,n of (8) are also functions of (u,w). For example, the last nonzero coefficient of the characteristic polynomial of the Jacobian (7) is
The bipartite digraph of a biochemical mechanism
For the convenience of the reader, in this section we present definitions regarding the bipartite digraph of a biochemical mechanism (1) [3,4,22]. To illustrate the definitions in this section we will continue to use as an example the reversible substrate mechanism (2).
A directed bipartite graph (bipartite digraph) has a node set that consists of two disjoint subsets, V_{1} and V_{2}, and each of its directed edges (arcs) has one end in V_{1} and the other in V_{2}[23].
The bipartite digraphG of a biochemical reaction network (1) is defined as follows. The nodes are separated into two sets, one for the chemical species V_{1}={A_{1},A_{2},…,A_{n}} and one for the elementary reactions V_{2}={B_{1},B_{2},…,B_{m}}. We draw an arc from A_{k} to B_{j} if and only if species A_{k} is a reactant in reaction j, i.e., if the stoichiometric coefficient α_{jk}>0 in (1). Similarly, we draw an arc from B_{j} to A_{i} if and only if A_{i} is a product in reaction j, i.e., if the stoichiometric coefficient β_{ji}>0 in (1). Therefore the set of arcs E(G) consists of arcs such as (A_{k},B_{j}) and (B_{j},A_{i}). Hence the bipartite digraph can be defined as G={V,E(G)} where V=V_{1}∪V_{2} is the set of nodes and E(G) is the set of arcs. If an arc is not weighted explicitly, we assume that its weight equals 1. The corresponding bipartite digraph of the reversible substrate inhibition mechanism (2) is shown in Figure 1.
Figure 1. Bipartite digraph of the reversible substrate inhibition mechanism. Bipartite digraph of the reversible reaction mechanism (2). Circles denote species nodes and squares denote reaction nodes of the mechanism.
The element [ A_{k},B_{j}] is an edge if α_{jk}>0, i.e., if species A_{k} is a reactant in reaction j. The weight of an edgeE=[ A_{k},B_{j}] is defined as
For example, the edge E=[ A_{1},B_{3}] in Figure 1 has weight K_{E}=−1.
If α_{jk}β_{ji}>0, then the arcs (A_{k},B_{j}) and (B_{j},A_{i}) form a positive path [ A_{k},B_{j},A_{i}] that corresponds to the production of A_{i} from A_{k} in a reaction j. The weight of the positive path [ A_{k},B_{j},A_{i}] is defined as α_{jk}β_{ji}. For example, the positive path [ A_{1},B_{3},A_{3}] in Figure 1 has weight 1.
If α_{jk}α_{ji}>0, then the arcs (A_{k},B_{j}) and (A_{i},B_{j}) form a negative path that corresponds to A_{k} and A_{i} interacting as reactants in reaction j. The weight of the negative path is defined as −α_{jk}α_{ji}. Note that the negative paths and are considered to be different since they start at a different species node. For example, both and in Figure 1 are negative paths with weight −1. We note that the direction of the arcs is followed in the positive paths but not in the negative paths.
A cycleC of G is a sequence of distinct paths with the last species node of each path being the same as the first species node of the next path , , …, , . A cycle will be denoted by , where the number of species nodes defines its order. The set of species nodes in a cycle is distinct, but there may be a repetition among the reaction nodes. This is because negative paths containing the same nodes are considered different depending on the starting species node. For example, in Figure 1 is a cycle formed by the two negative paths and .
A cycle is positive if it contains an even number of negative paths and negative if it contains an odd number of negative paths. The sign of a cycle C can also be determined by the cycle weight which is a product of all corresponding weights of negative and positive paths of C
For example, (see Figure 1) is a negative cycle of order 2 with weight K_{C}=−1. The cycle (see Figure 1) is a positive cycle of order 2 with weight K_{C}=1.
A subgraphg={L_{1},L_{2},…,L_{s}} of G consists of edges or cycles L_{i}, i=1,…,s, where each species is the beginning of only one edge, or one path participating in a cycle. In other words, the edges and cycles in a subgraph are species mutually disjoint. The number of species nodes in a subgraph is defined as its order. The subgraph weight is defined using the product of the cycle weights (11) and the edges weights (10) of the cycles and edges in g
where c is the number of cycles in g. For example, the subgraph with weight K_{g}=−1 is shown in Figure 2 (bottom right).
Figure 2. Critical fragment and subgraphs of the reversible substrate inhibition mechanism. Critical fragment and constituent subgraphs of the reversible substrate inhibition mechanism computed by GraTeLPy. (top left) Critical fragment . (top right) Subgraph g_{3}={[A_{1},B_{5}],[A_{2},B_{3}],[A_{3},B_{4}]}. (bottom left) Subgraph . (bottom right) Subgraph .
Since more than one path can exist between species nodes via different reaction nodes in a bipartite digraph, the number of subgraphs through the same node sets may be greater than one. The set of all subgraphs g of order k with the same species nodes and reaction nodes sets is called a fragment of order k and is denoted by . For a fragment we define the number
as the fragment weight. If , then S_{k} is a critical fragment.
For example, the fragment is shown in Figure 2 (top left) together with its three subgraphs , and g_{3}={[ A_{1},B_{5}],[ A_{2},B_{3}],[ A_{3},B_{4}]}. Each of the first two subgraphs g_{1} and g_{2} contains a positive cycle, and thus is a critical fragment since
In [3,22] it is shown that the coefficients of the characteristic polynomial (8) have the following graphtheoretic representation
Note that similar terms in a_{k} have been combined using summation over the subgraphs of a fragment (13) and (14) is in a simplified form. It follows by (14) that the correspondence between a fragment and a nonzero term in a_{k}(u,w) is onetoone. For example, the negative coefficient in a_{3}(u,w) given in (9) corresponds to the critical fragment shown in Figure 2 (top left).
Critical fragments corresponding uniquely to negative terms in (14) are important for the existence of instabilities as it is explained next.
Instability criteria for the Jacobian and the bipartite digraph
Here we summarize classical results from bifurcation analysis [1] and more recent results relating graphtheoretic methods to instabilities [3,4,22].
Multistability often arises from a saddlenode bifurcation in an ordinary differential equations (ODE) model, [1,24]. If a saddlenode bifurcation occurs, then a real eigenvalue λ(u,w) of J(u,w) changes sign as the parameters (u,w) change values. Hence, a necessary condition for multistability arising from a saddlenode bifurcation is a_{n}(u,w)= det(−J(u,w))=0 for some parameter values of (u,w) [1].
Often ODE models of biochemical mechanisms (4) have mass conservation relations reducing the rank r of the stoichiometric matrix S and the Jacobian J(u,w) to r<n, which means that the last nonzero coefficient in (8) is a_{r}(u,w). Thus if a saddlenode bifurcation exists, then a_{r}(u,w)=0 for some values of (u,w) [3]. Therefore a critical fragment of order r, corresponding uniquely to a negative term in (14) for k=r, is required for a saddlenode bifurcation, and thus for multistability [3,22]. Thus the potential of a biochemical mechanism (1) for multistability depends on the structure of its bipartite digraph.
Oscillations in ODE models of biochemical mechanisms (1) often arise from Hopf bifurcation. It is shown in [3], that if a coefficient a_{k}(u,w)≥0, k∈{1,…,n−1} is close to zero, then it is possible to choose parameter values for (u,w) such that oscillations arising from Hopf bifurcation occur.
The existence of a critical fragment of order k∈{1,…,n−1} makes it possible to minimize a_{k}(u,w)≥0, k<n for some parameter values of (u,w) by increasing the magnitude of the corresponding negative term in a_{k}(u,w). If there are mass conservation relations reducing the rank of the Jacobian matrix to r<n, a critical fragment of order k<r is required to detect possible oscillations in an ODE model (4) of a biochemical mechanism (1). Thus, the existence of oscillations in the ODE model of a biochemical mechanism (1) can also be determined by the structure of the bipartite digraph.
Patterns in a corresponding reaction–diffusion model to (4) usually arise as a result of Turing instability. Turing instability arises when a spatially homogeneous equilibrium is asymptotically stable in the absence of diffusion and becomes unstable when diffusion is added to the model [25]. For the existence of Turing instability, we study the matrix J(u,w)−μD, where J(u,w) is the Jacobian matrix (6), D is a diagonal matrix with positive diffusion coefficients d_{i}>0, i=1,…,n on the diagonal and μ>0 is a parameter (μ represents an eigenvalue of the negative Laplacian) [25]. Turing instability is associated with a real eigenvalue of the matrix J(u,w)−μD passing through zero from left to right as parameter values are varied. In [4,22] it is shown that a necessary condition for Turing instability is the existence of a critical fragment of order k<n. Thus, the potential of a biochemical mechanism to display Turing instability can be inferred from the structure of its bipartite digraph.
Implementation
Recall that the existence of a critical fragment in the bipartite digraph of a biochemical mechanism, where r is the rank of the stoichiometric matrix S, can induce multistability. Similarly a critical fragment of order k<n can induce Turing instability or even oscillations. On the other hand if no critical fragments of order r are found, then the existence of multistability can be excluded for any values of the parameters. Similarly if no critical fragments of order k<n are found, then the existence of Turing instability can also be excluded for any values of the parameters.
GraTeLPy enumerates all critical fragments of userdefined order k for a given biochemical mechanism, thus providing the user with information on the potential of a biochemical mechanism for multistability, oscillations or Turing instability.
We present in Figure 3 a flowchart that describes schematically the algorithm implemented by GraTeLPy.
Figure 3. Flowchart that summarizes the steps taken by GraTeLPy to find all critical fragments of a given order. The division of tasks between a single server and one or more clients is highlighted. (top diamonds) The fragment server reads in the userspecified mechanism file and generates the bipartite digraph. The server generates all fragments of an userdefined order k and places them in a queue. (center rectangles) One or more client scripts fetch fragments off the queue and process them independently. For each fragment S_{k}, a client generates all subgraphs and computes the weight of each subgraph. The subgraph weights are then added to compute the weight of the corresponding fragment. The client passes the computed data back to the server and fetches another fragment off the queue if the queue is not yet exhausted. (bottom diamonds) After preparing the fragment queue, the server waits for the results sent by the clients. Upon receipt of clientcomputed results for a fragment, the server stores these results if the fragment is found to have nonzero weight. Once the queue is exhausted, the server informs the user about the number of critical fragments discovered and generates other informative output.
First a biochemical mechanism is read from an userprovided input text file and its bipartite digraph is generated. Then all fragments of an userdefined order k are enumerated and placed in a computational queue. Each fragment from the queue will be further processed in order to compute its weight (top diamond nodes, Figure 3).
For each fragment in the queue, a linear sequence of operations is carried out (central rectangular nodes, Figure 3). First all subgraphs g of a fragment are enumerated, and the weight K_{g} of each subgraph g is computed. Then the weights of all subgraphs g contained in are added to compute the weight of the given fragment. At this point it is decided based on the sign of the weight , if the fragment is critical, i.e., is satisfied.
Once all of the fragments from the queue have been processed, an output based on the potential of the biochemical mechanism for some desired instability is created (bottom diamond nodes, Figure 3). The information in the output includes the number of critical fragments of an userdefined order found by GraTeLPy. Based on the number of critical fragments found, GraTeLPy states if a biochemical mechanism meets the necessary condition for multistability or Turing instability, and if the mechanism can exhibit oscillations for some parameter values. In addition a list of all critical fragments of a given order detected by GraTeLPy can be provided.
Processing the queue of the enumerated fragments (central rectangular nodes, Figure 3) is inherently parallel as each fragment may be handled independently of all other fragments. To use this parallelism to our advantage, two scripts are created for GraTeLPy that implement a server role and a client role, respectively.
The server script takes care of actions in the top and bottom diamond nodes in Figure 3. The server creates the bipartite digraph, enumerates all fragments and places them in a queue (top diamond nodes). At the end the server collects all computed data from the client processes before displaying them for the user (bottom diamond nodes).
The client script deals with actions in the central rectangular nodes in Figure 3. The client fetches a fragment from the queue presented by the server, generates all subgraphs of the fragment, computes the weights of the subgraphs, computes the weight of the fragment, and reports all computed results back to the server. If the fragment queue has not been exhausted, then the client fetches another fragment and repeats these steps.
This serverclient architecture allows the user to run one or multiple instances of the client script to analyze several fragments of a large mechanism in parallel. We discuss the technical details of the parallelization in more detail in Implementation challenges below.
In the following subsections we describe in detail the implementation of both fragment and subgraph enumeration as these parts presented considerable technical challenges during development.
Fragment enumeration
It follows by the definition of a fragment given in Section The bipartite digraph of a biochemical mechanism that fragments are identified by the species and reaction indices of their subgraphs. A fragment of order k contains k unique species indexed by {i_{1},…,i_{k}}, and k possibly repeated reactions indexed by {j_{1},…,j_{k}}.
Suppose that a given biochemical mechanism has N species and R reactions. Using a combinatorial approach, we can generate all fragments of order k by pairing the unique combinations of species with R^{k} combinations of reaction nodes. This approach generates possible fragments that need to be filtered. This is because many of the combinatorially generated fragments do not exist in the bipartite digraph of a given biochemical mechanism.
To save computational time and cost we use a different approach. We note that each fragment contains one subgraph that consists of edges , s=1,…,k. Let us denote by E_{i} (i=1,…,n_{i}) the number of edges that a species A_{i} in a biochemical mechanism induces, or, is the starting node of. If we assume that each species A_{i} is on average the starting node of E=Avg(E_{i}) edges, then this approach generates approximately fragments. Empirically, we observe that E is usually considerably less than some common values for the number of reactions R. Hence this latter approach generates fewer fragments than the former combinatorial approach. In fact, since fragments correspond uniquely to the subgraphs consisting of edges, using this method we generate only the fragments that are present in the bipartite digraph.
By using the method of onetoone correspondence between fragments and subgraphs consisting of edges, we reduce the number of fragments generated by the combinatorial approach by multiple orders of magnitude. A reduction in the number of the generated fragments translates directly to a reduction in computational cost. Hence the latter approach for fragment generation is an important development in the implementation of GraTeLPy that allows for analyzing larger biochemical mechanisms. To highlight this reduction in computational cost we plot the number of fragments (of varying order) generated with both methods for the doublelayer mitogenactivated protein kinase (MAPK) mechanism in Figure 4. The doublelayer MAPK mechanism is discussed in more detail in the last example in Section Results and discussion.
Figure 4. Fragment enumeration for doublelayer MAPK mechanism. Number of fragments of different orders generated for the doublelayer MAPK network (i) combinatorially (gray) and (ii) generated from the unique correspondence between fragments and edgesonly subgraphs (black).
Subgraph enumeration
Given a fragment we generate all edges , positive paths and negative paths , where l,s=1,…,k, that are induced by the species and reactions of the fragment. We will refer collectively to edges, and positive and negative paths of a subgraph as subgraph components.
The subgraph components of a fragment are stored in a lookup table that lists for each species and corresponding reaction all subgraph components induced by the pair . The subgraph components of a fragment are generated as follows:
(i) For a fragment each edge (s=1,…,k) is identified and stored in the lookup table.
(ii) For each edge in the lookup table, arcs starting at , such as (l=1,…,k) are identified. This way all positive paths induced by are generated and added to the lookup table as part of the record for species .
(iii) Similarly to (ii), for each edge in the lookup table, arcs ending at , such as are identified. This way all negative paths induced by are generated and added to the lookup table as part of the record for species .
To gain some intuition on how subgraphs can be generated, we first describe a simple combinatorial approach before we introduce the method implemented by GraTeLPy. Suppose that for a fragment there are subgraph components induced by each species . We can generate combinatorially a subgraph g of by selecting at random one subgraph component per species since each species must be the starting node of exactly one component [3]. Using this approach we can generate combinatorially all possible combinations of subgraph components , that represent all possible subgraph candidates of a fragment . For a fragment with L=Avg(L_{i}) subgraph components per species on average, this method generates L^{k} possible subgraphs.
Even though this method guarantees that each species is the starting node of exactly one subgraph component, there may be combinations of paths that do not form cycles as defined in Sec. Mathematical background. This is because the end species node of a path has to be the starting species node of another path in a cycle [3]. If we use the combinatorial method for generating subgraphs, then all candidate subgraphs that do not satisfy the definition of a subgraph given in Sec. Mathematical background need to be removed which would increase the computational cost.
In the next two subsections we introduce the path graph and the cycle graph that will allow us to generate only the subgraphs that belong to a given fragment. The implementation of the algorithms associated with the path graph and the cycle graph by GraTeLPy will allow us to further reduce the computational cost.
Cycle detection: the path graph
We can avoid generating invalid subgraphs if paths are not joined combinatorially, but rather only paths that form cycles are joined. Recall that a cycle is a sequence of paths where the end species node of each path is the starting species node of exactly one other path in the sequence.
Next, we introduce expanded paths, where a negative path is converted into two expanded paths [ A_{i},B_{m},A_{j}] and [ A_{j},B_{m},A_{i}] that are positive. This expansion is necessary as negative paths can be traversed in both directions as explained in Section The bipartite digraph of a biochemical mechanism. To enumerate all cycles of a given fragment , we construct the directed graph (digraph) Φ. The nodes of Φ correspond uniquely to the expanded negative paths and the positive paths of a fragment . We connect the nodes of the digraph Φ representing paths whose end nodes and starting nodes are the same. For example, there is a directed edge in Φ that starts at a node representing and ends at a node representing . Selfloops in Φ from a node back to itself are also permitted and they correspond to paths of the form .
To summarize, we generate a digraph Φ with the following properties:
● The nodes Φ_{i} of Φ are the expanded negative paths and the positive paths of a given fragment .
● A directed edge (Φ_{i},Φ_{j}) exists if and only if the end species node of the path corresponding to Φ_{i} is the starting species node of the path corresponding to Φ_{j}. Selfloops (Φ_{i},Φ_{i}) are permitted and correspond to positive paths of the form [ A_{i},B_{j},A_{i}].
We refer to the digraph Φ as the path graph. For a fragment with a total of P paths that include all expanded negative paths and all positive paths of , the generation of the path graph Φ has time complexity .
To detect the cycles of the path graph Φ, and ultimately the cycles of a given fragment , an implementation of Johnson’s algorithm [26] provided by NetworkX [27] is used by GraTeLPy. For a fragment with a total number of P expanded negative paths and positive paths (corresponding uniquely to the nodes of Φ), P_{E} sequential relations between these paths (corresponding uniquely to the directed edges of Φ), and P_{C} cyclic sequential relations (corresponding uniquely to the cycles of Φ), the enumeration of all P_{C} cycles requires units of time [26].
Next we illustrate the construction and usage of the path graph Φ described above. The path graph Φ for the critical fragment (see Figure 2) is shown in Figure 5, together with the cycles c_{1} and c_{2} produced by Johnson’s algorithm.
Figure 5. Path graph for the critical fragment of the reversible substrate inhibition mechanism. The path graph Φ and the detected cycles, that do not have repeated species as starting nodes of paths, of the critical fragment shown in Figure 2. Only two cycles c_{1} and c_{2} that are reported previously in [3] are found.
Some of the cycles of Φ enumerated by NetworkX correspond to closed paths with revisited nodes in the bipartite digraph G, and are therefore not cycles of G. This is the case because Johnson’s algorithm finds all cycles of all lengths of the path graph Φ. In our current implementation, we remove cycles of Φ that correspond to closed paths with revisited nodes of the bipartite digraph G. However, further optimization of Johnson’s algorithm is likely possible, so that only cycles that exist in the bipartite digraph G are generated in the first place.
Cycle combinations: the cycle graph
Suppose that P_{C} valid cycles of a given fragment have been found using the algorithm from the previous subsection. Possible candidates for subgraphs of can be constructed by creating all possible combinations of cycles. In total, there are possible ways to combine P_{C} cycles into combinations of k cycles with no repeating cycles. Then, edges may have to be added to the combinations of cycles in order to construct the subgraphs of a fragment .
Generally, not all combinations of cycles or edges form subgraphs since such combinations may not contain every species of a fragment exactly once. Suppose that a given set of cycles has mutually disjoint species sets, but the orders of the cycles sum to less than k. In order to form a subgraph of a fragment we need to amend such a cycle combination with a set of edges whose species nodes are in , but not in any of the cycles. If on average a species A_{i} is the starting node of E edges and if on average we have to add μ edges to a cycle combination, then we generate combinatorially possible subgraphs.
Many of the combinatorially generated cycle and edges combinations will have repeated species nodes, thus rendering such a combination of edges or cycles invalid as a subgraph. Hence we need to verify which of the generated combinations of cycles or edges are subgraphs of . If validating a subgraph requires units of time then, on average, validating all possible candidates for subgraphs has time complexity . In reality, validating a combination of cycles or edges as a subgraph has greater time complexity than . Therefore, the computational cost will be greatly reduced if we can generate only subgraphs that require no further validation steps.
We use a similar approach to the one for finding the cycles of a given fragment. We will reduce the problem of generating cycle or edge combinations forming subgraphs to a problem that can be solved with available algorithms from the literature. To this end we generate an undirected graph Γ whose nodes correspond uniquely to the cycles of a given fragment , that are found using the path graph Φ. Drawing an edge between two nodes (representing cycles of ) means that these two cycles do not share species nodes and can be combined as a part of a subgraph. Next, we formally define the undirected graph Γ
● A node Γ_{i} of Γ represents a cycle of a given fragment .
● An edge (Γ_{i},Γ_{j}) exists if and only if the set of species nodes of the cycle represented by Γ_{i} and the set of species nodes of the cycle represented by Γ_{j} are disjoint.
We refer to the undirected graph Γ defined above as a cycle graph.
If a given set of cycles does not contain a number of species equal to the order of the subgraph constructed, then speciesdisjoint edges need to be added. To this end the problem of generating a subgraph of a given fragment can be reduced to finding all cliques in Γ. Recall that a clique is a set of nodes of an undirected graph such that every node is connected to every other node from the set [23]. To find all subgraphs of a fragment , its corresponding undirected graph Γ should be searched for all cliques. This is a standard problem in graph theory, known as clique enumeration, that can be solved using existing algorithms from the literature [28].
As an example, we construct the cycle graph Γ corresponding to the fragment of the Reversible Inhibition reaction (2) shown in Figure 2 (top left). We use the fact that the cycles c_{1} and c_{2} of the path graph Φ, shown in Figure 5 have paths that share at least one species. Hence, the cycle graph Γ consists of two nodes corresponding to the two valid cycles c_{1} and c_{2} with no edge connecting them. Since the cycle graph Γ constructed from the valid cycles in Figure 5 is completely disconnected, we can choose one cycle at a time and attempt to construct a valid subgraph by adding edges to the cycle. If c_{1} is chosen, then the remaining nodes A_{1} and B_{5} form the edge [ A_{1}, B_{5}] yielding a valid subgraph of order 3, Figure 2 (bottom right). If c_{2} is chosen, then no other nodes remain. Thus the cycle c_{2} forms a valid subgraph of order 3, Figure 2 (bottom left).
When generating subgraphs of a fragment combinatorially, the number of subgraph candidates depends on the number of subgraph components of a given fragment. Using the improved algorithm (based on the path and cycle graphs) implemented by GraTeLPy, the number of generated subgraphs depends on the number of cycles in the path graph.
To compare the computational cost of the two approaches, we count the number of subgraphs generated in both cases for 100 randomly selected fragments of varying order for the doublelayer MAPK network. The results are presented in Figure 6, and show that multiple orders of magnitude fewer subgraphs are generated by the path and cycle graph method in comparison to the combinatorial method.
Figure 6. Subgraph enumeration for doublelayer MAPK mechanism. Number of subgraph candidates generated for 100 randomly selected fragments (of indicated order) of the doublelayer MAPK network. Gray: combinatorial approach, black: using the path and cycle graphs. Circles denote averages, squares denote maxima (maximal number of subgraph candidates generated for any one of 100 randomly selected fragments).
Implementation challenges
As an overarching principle, we have strived to reduce code duplication hence we reuse as many components as possible from open source libraries. To this end, we have used combinatorial standard libraries distributed with the programming language Python [29], NetworkX [27] for all graphrelated operations, and matplotlib [30] for graphical output. We note however that matplotlib is an optional package and is not required for the core functionality of GraTeLPy.
Over the course of implementation of GraTeLPy we encountered combinatorial blowup and memory usage as major challenges. Thus we have designed GraTeLPy to minimize storage of fragments, subgraphs, and intermediate structures in memory. To this end we make considerable use of the standard Python library itertools and the concept of generators that allow us to transfer many results from one method to another with minimal memory footprint.
The cycle enumeration method provided by NetworkX [27] stores all detected cycles, causing memory shortage and overflow due to the large number of generated invalid cycles revisiting species. We have amended this library method to only store and return valid cycles of the bipartite digraph, i.e., those cycles that do not revisit species nodes.
Analyzing large networks is computationally unfeasible when only a single processor is used. Hence, we have parallelized the code. Python’s global interpreter lock causes threaded code to run slowly, so we have used the Multiprocessing module [29], which operates by using subprocesses rather than threads. We have implemented two parallelized versions of the code:
1. Single multiprocessor machine. Here we use the Multiprocessing.Pool system, whereby the Multiprocessing module itself launches the requisite subprocesses. The code allows the user to specify the number of subprocesses that should be run (ideally this should match the number of processors available on the machine).
2. Multiple machines client/server. In this case, we launch a server process that generates the list of fragments to be analyzed. Clients can then be launched on any machine with a network connection to the server. These clients receive fragments from the server and pass back to the server the results of the analysis of the fragments. The server collates the responses.
Because the time spent analyzing a fragment is orders of magnitude higher than the time required to pass a representation of the fragment between a client and the server, the parallelization is extremely efficient. We have tested this client/server implementation with over 500 client processes, and the processing time scales very well.
Results and discussion
GraTeLPy allows the user to enumerate critical fragments of an userdefined order. Thus biochemical mechanisms can be analyzed for their potential for some instability in an efficient way. The existence of multistability requires at least one critical fragment of order r, which is the rank of the stoichiometric matrix. If a critical fragment of order k<n exists, then oscillations may exist for some parameter values. The existence of Turing instability requires at least one critical fragment of order k<n, where n is the number of species.
Several examples of biochemical mechanisms of different sizes are presented in this section. We have used GraTeLPy to find the critical fragments of a given order in the bipartite digraph of each biochemical mechanism. The first three examples are smaller mechanisms and are used to verify the correctness of implementation of GraTeLPy, since their critical fragments have already been found elsewhere. Furthermore, we show that by using GraTeLPy, finding critical fragments in larger biochemical mechanisms such as the MAPK singlelayer and MAPK doublelayer networks becomes feasible. The median running time for finding the critical fragments for each biochemical mechanism is presented.
The models and data discussed in this section are available at https://github.com/gratelpy/gratelpysupplementaryinformation webcite.
Reversible substrate inhibition
The reversible substrate inhibition model is analyzed for multistability in [3] using the graphtheoretic method presented here.
GraTeLPy reads in the biochemical mechanism from a plain text file.
Recall that the bipartite digraph of (15) shown in Figure 1.
The bipartite digraph of (15) contains one critical fragment of order 3 found in [3]. GraTeLPy reproduces this fragment and its constituent subgraphs shown in Figure 2.
The median running time with one processor for finding the critical fragment is 0.05 sec.
Remark
Note that the critical fragment corresponds to the negative term in the last nonzero coefficient (9) of the characteristic polynomial of the Jacobian matrix (7). This suggests how we may choose parameter values for (u,w) so that a saddlenode bifurcation and multistability occur. The inequality w_{4}>w_{1} should be satisfied, otherwise a_{3}(u,w)>0. Also u_{4}≫u_{i}, i=1,2,3 so that a_{3}(u,w) is close to zero. In general if is a critical fragment, then the species concentrations at equilibrium , s=1,…,k should be chosen small and the rate functions , s=1,…,k should be chosen large in order for a saddlenode bifurcation to occur.
As a future extension of GraTeLPy, we plan to make parameter choices for (u,w) such that some desired instability occurs available to the user.
GlycolysisGluconeogenesis switch
Critical fragments of order smaller than n, the number of species in a biochemical mechanism, can induce Turing instability in a reaction–diffusion model [4] as well as oscillations in an ODE model [3].
The biochemical mechanism
represents a glycolysisgluconeogenesis switch and is studied for oscillations in [31]. It has been shown that the critical fragments (identified here by GraTeLPy as well) are the structural reason for the oscillations [31]. Based on the existence of the critical fragments, parameter values are chosen such that oscillations occur [31]. Similarly the mechanism (15) is studied for the existence of Turing instability in [4]. In fact, parameter values are found such that Turing instability exists in the reaction–diffusion model of (15).
The bipartite digraph of (15) is shown in Figure 7. The stoichiometric matrix associated with the biochemical mechanism (15) has rank 5. The biochemical mechanism meets the necessary criterion for Turing instability since critical fragments of order 1≤k≤5 exist [4]. In Figure 8 we show the critical fragments of order 2 and 3 reported in [4] and identified by GraTeLPy.
Figure 7. Bipartite digraph of the GlycolysisGluconeogenesis switch mechanism. Bipartite digraph of the glycolysisgluconeogenesis switch.
The median running time with one processor for finding the critical fragments of the bipartite digraph of the glycolysisgluconeogenesis switch is 5.9 sec.
Cdc42 network in yeast
A biochemical mechanism that describes the Cdc42 dynamics of yeast and cell polarity is studied in [32]. The corresponding reactiondiffusion model displays Turing instability and patten formation for some parameter values [32].
The biochemical mechanism of the Cdc42 network is given below
where RD and RT denote the membranebound inactive and active form of Cdc42 respectively; I denotes cytoplasmic GDI that forms a membranebound complex with RD, RDI_{m}, that detaches from the membrane and diffuses as RDI_{c} in the cytoplasm. The enzyme E is a complex that contains Cdc42activating Cdc24 and exists in both a cytoplasmic and membranebound form, E_{c} and E_{m}, respectively. If E is on the membrane, it can form a catalytic complex M together with RT, that aids activation of membranebound RD.
The bipartite digraph of the Cdc42 network is shown in Figure 9. The Cdc42 network has a corresponding stoichiometric matrix of rank 5. The necessary condition for Turing instability requires that a critical fragment of order 1≤k≤5 exists. GraTeLPy identifies 35 critical fragments – among which we find the two critical fragments reported in [32] and shown in Figure 10.
Figure 9. Bipartite digraph of the yeast Cdc42 mechanism. Bipartite digraph of the yeast Cdc42 network described in [32].
Figure 10. Critical fragments of the yeast Cdc42 mechanism. Critical fragments of order 5 of the yeast Cdc42 network reported in [32] and reproduced by GraTeLPy.
The median running time with one processor for finding the critical fragments of the bipartite digraph of the Cdc42 network is 9.7 sec, and with two processors 6.1 sec.
Singlelayer MAPK network
As the size of biochemical mechanisms increases, enumerating the critical fragments of their corresponding bipartite digraphs by hand becomes tedious and difficult. Using GraTeLPy we can find the critical fragments of a given order of larger mechanisms in a short period of time.
An example of a larger biochemical mechanism that is difficult to analyze by hand is the singlelayer MAPK network
whose bipartite digraph is shown in Figure 11. The MAPK network is a wellknown example of a multistable system [5,33]. The necessary condition for multstability requires the existence of a critical fragment of order equal to the rank of the stoichiometric matrix. Since the rank of the stoichiometric matrix for the MAPK network equals 6, using GraTeLPy, we enumerate all critical fragments of order 6. The 9 critical fragments of order 6 of the MAPK network are shown in Figure 12.
Figure 11. Bipartite digraph of the singlelayer MAPK mechanism.
Figure 12. Critical fragments of the singlelayer MAPK mechanism. Critical fragments of the singlelayer MAPK network found by GraTeLPy.
The median running time with two processors for finding the critical fragments of the bipartite digraph of the singlelayer MAPK network is 10.7 sec.
Doublelayer MAPK network
For large biochemical mechanisms the number of critical fragments of a given order may grow into the dozens or hundreds. Thus the task of enumeration by hand of all critical fragments of a given order becomes unfeasible, but can be accomplished with the help of GraTeLPy.
An example of a larger biochemical mechanism is provided by the doublelayer MAPK network which has 12 species and 18 reactions
Similarly to the singlelayer MAPK network, the doublelayer MAPK network is known to display multistability. The stoichiometric matrix for the doublelayer MAPK network has rank 9. Therefore the necessary condition for multistability requires the existence of at least one critical fragment of order 9. GraTeLPy detects 88 critical fragments of order 9. The list of all critical fragments of order 9 of the bipartite digraph of (17) can be obtained from https://github.com/gratelpy/gratelpysupplementaryinformation webcite.
The median running time of each client for finding the critical fragments of the bipartite digraph of the doublelayer MAPK network is as follows: 141 sec with 100 clients, 270 sec with 50 clients and roughly 4 hours with a single client.
Conclusions
We have implemented a graphtheoretic method that allows for parameterfree model testing of biochemical mechanisms with mass action kinetics for multistability, oscillations and Turing instability. GraTeLPy is opensource and is based on a free software. GraTeLPy enables users to identify the graph structures referred to as critical fragments that can be responsible for the existence of some instability in a differential equations model of a biochemical mechanism (1).
At present, GraTeLPy expects that the user converts a biochemical mechanism to a text format such as the one presented in (15). In a future release we plan to include additional functionality so that biochemical mechanisms provided in SBML [34] and other formats can be analyzed. A list of the critical fragments of a userdefined order is provided upon completion.
We plan a future extension of GraTeLPy where choices of parameter values such that some desired instability occurs will be offered to the user. This extension will be based on the existence of a critical fragment and its onetoone correspondence with a negative term in a coefficient of the characteristic polynomial (See Remark in the Reversible substrate inhibition Example).
We also plan to combine GraTeLPy with a new analytic method, local perturbation analysis (LPA) [35,36], in order to test biochemical mechanisms for pattern formation.
An extension of GraTeLPy to multigraphs [37] that can be used for the analysis of gene regulatory networks [38] is also left as a future extension.
Availability and requirements
GraTeLPy is available from https://github.com/gratelpy/gratelpy webcite and has the following requirements: Python 2.6 or 2.7 and NetworkX 1.6 or above.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
GRW and MH implemented the software. MM provided the mathematical background. GRW, MH and MM wrote the manuscript. All authors read and approved the final manuscript.
Acknowledgements
We are grateful to Verônica Grieneisen, Robert Ietswaart, Richard Morris and Marc Roussel for useful feedback on the manuscript. MH and GRW are grateful to the John Innes Centre for financial support.
References

Kuznetsov Y: Elements of Applied Bifurcation Theory, Volume 112. New York: Springer; 1998.

Craciun G, Feinberg M: Multiple equilibria in complex chemical reaction networks. II.
SIAM J Appl Math 2006, 66(4):13211338. Publisher Full Text

Mincheva M, Roussel MR: Graphtheoretic methods for the analysis of chemical and biochemical networks. I.
J Math Biol 2007, 55:6186. PubMed Abstract  Publisher Full Text

Mincheva M, Roussel MR: A graphtheoretic method for detecting potential Turing bifurcations.
J Chem Phys 2006, 125(20):204102. PubMed Abstract  Publisher Full Text

Markevich NI, Hoek JB, Kholodenko BN: Signaling switches and bistability arising from multisite phosphorylation in protein kinase cascades.
J Cell Biol 2004, 164(3):353359. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Ozbudak EM, Thattai M, Lim HN, Shraiman BI, Van Oudenaarden A: Multistability in the lactose utilization network of Escherichia coli.
Nature 2004, 427(6976):737740. PubMed Abstract  Publisher Full Text

Smolen P, Baxter DA, Byrne JH: Modeling circadian oscillations with interlocking positive and negative feedback loops.
J Neurosci 2001, 21(17):66446656. PubMed Abstract  Publisher Full Text

Murray J: Mathematical Biology, Volume 2. New York: Springer; 2002.

Feinberg M: Chemical reaction network structure and the stability of complex isothermal reactors. I.
Chem Eng Sci 1987, 42(10):22292268. Publisher Full Text

Ellison P, Feinberg M, Ji H: Chemical Reaction Network Toolbox. Available online at http://www.crnt.osu.edu/CRNTWin webcite 2011

Feinberg M: Chemical reaction network structure and the stability of complex isothermal reactors–II. Multiple steady states for networks with deficiency one.
Chem Eng Sci 1988, 43:125. Publisher Full Text

Ellison P, Feinberg M: How catalytic mechanisms reveal themselves in multiple steadystate data. I.

Craciun G, Feinberg M: Multiple equilibria in complex chemical reaction networks: I. The injectivity property.
SIAM J Appl Math 2005, 65(5):15261546. Publisher Full Text

Ji H: Uniqueness of equilibria for complex chemical reaction networks.
PhD thesis.
Mathematics, Ohio State University; 2011

Pantea C: BioNetX. Available online at http://bionetx.uwbacter.org/ webcite 2010

Banaji M, Donnell P, Marginean A, Pantea C: CoNtRolChemical reaction network analysis tool. Available online at http://math.wvu.edu/~cpantea/ webcite 2013

Banaji M, Craciun G: Graphtheoretic approaches to injectivity and multiple equilibria in systems of interacting elements.
Comm Math Sci 2009, 7(4):867900. Publisher Full Text

Banaji M, Craciun G: Graphtheoretic criteria for injectivity and unique equilibria in general chemical reaction systems.
Adv Appl Math 2010, 44(2):168184. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Pantea C, Craciun G: Computational methods for analyzing bistability in biochemical reaction networks. In Circuits and Systems (ISCAS), Proceedings of 2010 IEEE International Symposium. Paris: IEEE; 2010:549552.

Pantea C, Koeppl H, Craciun G: Global injectivity and multiple equilibria in uniand bimolecular reaction networks.

Banaji M, Pantea C: Some results on injectivity and multistationarity in chemical reaction networks. Available on http://arxiv.org/pdf/1309.6771v2.pdf webcite 2013

Volpert A, Ivanova A: Mathematical models in chemical kinetics. In Mathematical modeling (Russian). Nauka, Moscow; 1987:57102.

Tyson J: Classification of instabilities in chemical reaction systems.
J Chem Phys 1975, 62(3):10101015. Publisher Full Text

Conway ED: Diffusion and PredatorPrey Interaction: Pattern in Closed Systems, Volume 101. Boston: Pitman; 1984.

Johnson D: Finding all the elementary circuits of a directed graph.
SIAM J Comput 1975, 4:7784. Publisher Full Text

Hagberg AA, Schult DA, Swart PJ: Exploring network structure, dynamics, and function using NetworkX. In Proceedings of the 7th Python in Science Conference (SciPy2008). Pasadena: 7th Annual Python in Science Conference; 2008:1115.

Zhang Y, AbuKhzam FN, Baldwin NE, Chesler EJ, Langston MA, Samatova NF: Genomescale computational approaches to memoryintensive applications in systems biology. In Supercomputing, 2005. Proceedings of the ACM/IEEE SC 2005 Conference. Washington, DC: IEEE Computer Society; 2005:1212.

Python Software Foundation: Python Language Reference, Version 2.7. Available at http://www.python.org webcite 2010

Goldstein B, Maevsky A: Critical switch of the metabolic fluxes by phosphofructo2kinase: fructose2, 6bisphosphatase. A kinetic model.
FEBS Lett 2002, 532(3):295299. PubMed Abstract  Publisher Full Text

Goryachev A, Pokhilko A: Dynamics of Cdc42 network embodies a turingtype mechanism of yeast cell polarity.
FEBS Lett 2008, 582(10):14371443. PubMed Abstract  Publisher Full Text

Conradi C, Flockerzi D, Raisch J: Multistationarity in the activation of a MAPK: Parametrizing the relevant region in parameter space.
Math Biosci 2008, 211:105131. PubMed Abstract  Publisher Full Text

Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H, the SBML Forum: The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models.
Bioinformatics 2003, 19(4):524531. PubMed Abstract  Publisher Full Text

Holmes WR: An efficient, nonlinear stability analysis for detecting pattern formation in reaction diffusion systems.
Bull Math Biol 2014, 76:157183. PubMed Abstract  Publisher Full Text

Walther GR, Marée AF, EdelsteinKeshet L, Grieneisen VA: Deterministic versus stochastic cell polarisation through wavepinning.
Bull Math Biol 2012, 74(11):25702599. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Mincheva M, Craciun G: Multigraph conditions for multistability, oscillations and pattern formation in biochemical reaction networks.

Karlebach G, Shamir R: Modelling and analysis of gene regulatory networks.
Nat Rev Mol Cell Bio 2008, 9(10):770780. Publisher Full Text