Department of Bioinformatics, University of Würzburg, Am Hubland, Würzburg, Germany

Abstract

Background

Boolean networks capture switching behavior of many naturally occurring regulatory networks. For semi-quantitative modeling, interpolation between ON and OFF states is necessary. The high degree polynomial interpolation of Boolean genetic regulatory networks (GRNs) in cellular processes such as apoptosis or proliferation allows for the modeling of a wider range of node interactions than continuous activator-inhibitor models, but suffers from scaling problems for networks which contain nodes with more than ~10 inputs. Many GRNs from literature or new gene expression experiments exceed those limitations, so a new approach was developed.

Results

(i) As a part of our new GRN simulation framework Jimena we introduce and set up Boolean-tree-based data structures; (ii) corresponding algorithms greatly expedite the calculation of the polynomial interpolation in almost all cases, thereby expanding the range of networks which can be simulated by this model in reasonable time. (iii) Stable states for discrete models are efficiently counted and identified using binary decision diagrams. As an application example, we show how system states can now be sampled efficiently in small up to large scale hormone disease networks.

Conclusions

Jimena simulates currently available GRNs about 10-100 times faster than the previous implementation of the polynomial interpolation model and even greater gains are achieved for large scale-free networks. This speed-up also facilitates a much more thorough sampling of continuous state spaces which may lead to the identification of new stable states. Mutants of large networks can be constructed and analyzed very quickly enabling new insights into network robustness and behavior.

Background

For the simulation of genetic regulatory networks (GRNs) two important paradigms have been used: Discrete models, where each node has a value of either 0 or 1 and Boolean expressions are used to update the values of the nodes in each simulation step using an updating scheme like CRBN (classical random Boolean networks) or ARBN (asynchronous random Boolean networks)

Two commonly used continuous modeling paradigms for GRNs are activator-inhibitor-models such as the exponential standardized qualitative dynamical systems model

These interpolations extend the domain and the codomain of Boolean functions {0, 1}^{n} → {0, 1} by defining functions [0, 1]^{n} → [0, 1] which mimic the behavior of the original function for intermediate input values in the interval (0,1). For example, an adequate interpolation of the function

Wittmann et al. ^{
n
}.

**Interpolations of the Boolean function x OR y.** The panels show **A)** piecewise linear functions, **B)** product-sum fuzzy logic, **C)** min-max fuzzy logic, **D)** BooleCubes and **E)** HillCubes.

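For a Boolean function B: {0, 1}^{n} → {0, 1}, the BooleCube interpolation I[B(x_{1},…,x_{n})] is given by

I[B](x_{1},…,x_{n}) = Σ_{ξ ∈ {0,1}^{n}} B(ξ_{1},…,ξ_{n}) ⋅ Π_{i=1}^{n} (x_{i} ξ_{i} + (1 − x_{i})(1 − ξ_{i}))

As an example consider the Boolean function B(x, y) = x OR y. Its BooleCube interpolation I[B](x, y) = x + y − x ⋅ y is defined on all of [0, 1]^{2} and is smooth (see panel D of the figure on interpolations of x OR y).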
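The exhaustive form of the BooleCube interpolation, a sum over all 2^{n} corners of the unit hypercube, can be sketched in a few lines of Python. This is only an illustration of the defining formula, not code from Jimena or Odefy; the names `boolecube` and `logical_or` are hypothetical.

```python
from itertools import product


def boolecube(boolean_fn, values):
    """Evaluate the BooleCube (multilinear) interpolation of boolean_fn.

    boolean_fn takes n arguments in {0, 1} and returns 0 or 1;
    values holds n real inputs in [0, 1].
    """
    total = 0.0
    # Sum over all 2^n corners of the unit hypercube.
    for corner in product((0, 1), repeat=len(values)):
        if not boolean_fn(*corner):
            continue  # corners where the function is 0 contribute nothing
        weight = 1.0
        for x, xi in zip(values, corner):
            # factor x if the corner coordinate is 1, (1 - x) if it is 0
            weight *= x if xi else 1.0 - x
        total += weight
    return total


def logical_or(x, y):
    return int(x or y)


print(boolecube(logical_or, (0.5, 0.5)))  # 0.75, i.e. x + y - x*y
```

The loop visits every corner, so the cost grows as 2^{n} with the number of inputs, which is the scaling problem discussed in this article.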

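Wittmann et al. further transform the inputs with the sigmoid Hill function x^{n} / (x^{n} + k^{n}), leading to HillCubes, and, with the normalized sigmoid function (x^{n}/(x^{n} + k^{n}))/(1/(1 + k^{n})), to normalized HillCubes (see panel E of the figure on interpolations of x OR y).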
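The two sigmoid functions are easy to state in code. The following is a minimal sketch; the function names are hypothetical, and the default parameters n = 2 and k = 0.5 are the values reported later in this article for the applied example.

```python
def hill(x, n=2.0, k=0.5):
    """Sigmoid Hill function x^n / (x^n + k^n) for x >= 0."""
    return x ** n / (x ** n + k ** n)


def normalized_hill(x, n=2.0, k=0.5):
    """Hill function divided by its value at x = 1, so that 1 maps to 1."""
    return hill(x, n, k) / (1.0 / (1.0 + k ** n))


print(hill(0.5))             # 0.5: k is the half-activation threshold
print(normalized_hill(1.0))  # 1.0: the normalized variant reaches 1 at x = 1
```

Without normalization the Hill function only reaches 1 / (1 + k^{n}) at x = 1 (0.8 for the defaults above), which is why the normalized variant is needed if fully active inputs should map exactly to 1.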

These high degree polynomial interpolations of Boolean functions are implemented in the Matlab package Odefy ^{
n
}) (where

In extension of such approaches we show how a tree data structure to store the functions of the network leads to a straightforward and efficient way to calculate the polynomial interpolation for almost any example of practical importance, thereby greatly expanding the range of networks that can be simulated and analyzed in reasonable time using this model. Since semi-quantitative models allow for a range of new analysis techniques such as sensitively quantifying the basins of attraction of the stable states or the influence of noise on network behavior, this paves the way for additional insights into network dynamics. We demonstrate this new algorithm as a part of Jimena, a new Java GRN simulation framework which focuses on computational efficiency and a modularized architecture to facilitate the development and testing of new algorithms and models surrounding GRNs.

Implementation

A recursive algorithm to calculate the BooleCube polynomial

To tackle the space and time complexity issues of the polynomial interpolation present in previous implementations, we use simple Boolean trees to represent the Boolean functions of the network. In a Boolean tree (Figure _{1} in Figure

**A Boolean tree for the function B_{1}(x_{1}, x_{2}, x_{3}) = (NOT x_{1}) OR (x_{2} AND x_{3}).** Input variables x_{i} are connected by Boolean operators. The OR node is the root of the tree, i.e. its value determines the value of the function represented by the tree.

Boolean trees can be straightforwardly created in linear time by parsing the Boolean expression which defines a Boolean function, and the function represented by the tree can be interpolated very quickly using a recursive algorithm which we will describe in detail below.
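As an illustration of how such a tree can be built in a single pass over the expression, here is a small recursive-descent parser sketch in Python. It is not Jimena's parser: the nested-tuple tree representation, the function names and the operator precedence (NOT binds tighter than AND, AND tighter than OR) are assumptions made for this example.

```python
import re


def parse(expression):
    """Parse e.g. '(NOT x1) OR (x2 AND x3)' into a nested-tuple Boolean tree."""
    # Tokens are parentheses and words; any other characters are ignored.
    tokens = re.findall(r"\(|\)|\w+", expression)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        node = parse_and()
        while peek() == "OR":
            advance()
            node = ("OR", node, parse_and())
        return node

    def parse_and():
        node = parse_not()
        while peek() == "AND":
            advance()
            node = ("AND", node, parse_not())
        return node

    def parse_not():
        token = advance()
        if token == "NOT":
            return ("NOT", parse_not())
        if token == "(":
            node = parse_or()
            if advance() != ")":
                raise ValueError("missing closing parenthesis")
            return node
        return ("VAR", token)

    tree = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return tree


def evaluate(tree, state):
    """Evaluate a tree for a dict mapping variable names to 0 or 1."""
    kind = tree[0]
    if kind == "VAR":
        return state[tree[1]]
    if kind == "NOT":
        return 1 - evaluate(tree[1], state)
    left, right = evaluate(tree[1], state), evaluate(tree[2], state)
    return left & right if kind == "AND" else left | right


tree = parse("(NOT x1) OR (x2 AND x3)")
print(tree)
```

Every token is consumed exactly once, so the running time of this sketch is linear in the length of the expression.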

While Odefy, which uses exhaustive value tables stored as multidimensional arrays to represent the functions, needs a space and time in Θ(2^{N}) to store a function where

In addition to the speed up of the creation of the GRN, the tree structure also expedites the calculation of BooleCube (and therefore HillCube) interpolations since we can essentially apply the interpolation separately to all logic gates of the function and recursively evaluate the tree from the root node to the leaves. For a more precise description of the algorithm consider a regulatory network with nodes {x_{1},…,x_{n}}. Let the Boolean function B_{k}(…) of a node x_{k} be given by a Boolean tree consisting of nodes {a_{1},…,a_{m}}. Note that, as shown in the example Boolean tree figure, the nodes a_{i} represent binary or unary Boolean gates or inputs to the function B_{k}(…). For each function in the network we get a separate tree and therefore a separate set {a_{1},…,a_{m}}.

To illustrate the relationship between {x_{1},…,x_{n}} and {a_{1},…,a_{m}} consider the network {x_{1},x_{2}} where B_{1}(x_{1},x_{2}) = x_{1} AND x_{2} and B_{2}(x_{1},x_{2}) = x_{1} OR x_{2}. A possible Boolean tree for the function B_{1} could then be given by the nodes a_{1},a_{2},a_{3}, where the root node a_{1} is an AND node with the leaves a_{2} and a_{3}, a_{2} is an input node representing x_{1} and a_{3} is an input node representing x_{2}.

We call the function given by the subtree whose root is a_{i} A_{i}, where A_{i}(x_{j}) = x_{j} for some x_{j} for all input nodes. If a node a_{i} is not an input node to the network we call its binary or unary logic gate ⊗_{i}. In our example Boolean tree from above we would get A_{2}(x_{1}) = x_{1}, A_{3}(x_{2}) = x_{2}, A_{1}(x_{1}, x_{2}) = B_{1}(x_{1}, x_{2}) = x_{1} AND x_{2} and ⊗_{1} = AND.

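For an arbitrary Boolean function B: {0, 1}^{τ} → {0, 1}, let I[B] denote its BooleCube interpolation. We recursively determine the interpolation I[A_{i}] of a node a_{i}'s function A_{i} using the following rules:

If a_{i} represents an input node of the tree for which A_{i}(x_{j}) = x_{j} we set I[A_{i}] ≡ x_{j}

If a_{i} is a unary negating node whose input is a node a_{j}, we set I[A_{i}] ≡ 1 − I[A_{j}]

If a_{i} is a binary node with two inputs a_{j1} and a_{j2} whose functions are A_{j1} and A_{j2} we set I[A_{i}] ≡ ξ(I[A_{j1}], I[A_{j2}])

where ξ(I[A_{j1}], I[A_{j2}]) = I[A_{j1}] ⋅ I[A_{j2}] if ⊗_{i} = ∧ (i.e. the logic gate is an AND) and ξ(I[A_{j1}], I[A_{j2}]) = I[A_{j1}] + I[A_{j2}] − I[A_{j1}] ⋅ I[A_{j2}] if ⊗_{i} = ∨ (i.e. the logic gate is an OR), both of which can be calculated very efficiently.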

If we apply this algorithm to the root node of the tree we get the interpolation I[B_{k}] of the function B_{k}. An overview of the algorithm written in pseudocode, as well as a proof that the result of this algorithm is identical to the high degree polynomial of the BooleCube definition, is given in the additional file.

**In this document file (.doc) we include a proof summary of the BooleCube interpolation algorithm, the topologies for the benchmarks used and the pseudocode for the interpolation algorithm.**

For our example network we get I[B_{1}(x_{1}, x_{2})] = ξ(I[A_{2}], I[A_{3}]) = I[A_{2}] ⋅ I[A_{3}] = x_{1} ⋅ x_{2}. As a second example consider the function B_{1} from the Boolean tree figure, (NOT x_{1}) OR (x_{2} AND x_{3}). Starting at the root node a_{OR} we get I[B_{1}] = I[A_{OR}] = I[A_{NOT}] + I[A_{AND}] − I[A_{NOT}] I[A_{AND}] = (1 − x_{1}) + x_{2} x_{3} − (1 − x_{1}) x_{2} x_{3}.
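The recursive evaluation can be sketched in Python as follows. This is an illustration under stated assumptions, not Jimena's implementation: the nested-tuple tree format and the name `interpolate` are hypothetical, and the sketch does not attempt to handle an input that occurs more than once in a tree.

```python
def interpolate(tree, values):
    """Recursively evaluate the gate-wise interpolation of a Boolean tree.

    tree is a nested tuple: ("VAR", name), ("NOT", child),
    ("AND", left, right) or ("OR", left, right).
    values maps variable names to reals in [0, 1].
    """
    kind = tree[0]
    if kind == "VAR":
        return values[tree[1]]  # input node: pass the value through
    if kind == "NOT":
        return 1.0 - interpolate(tree[1], values)
    u = interpolate(tree[1], values)
    v = interpolate(tree[2], values)
    if kind == "AND":
        return u * v
    if kind == "OR":
        return u + v - u * v
    raise ValueError(f"unknown gate: {kind}")


# (NOT x1) OR (x2 AND x3), the second example from the text
TREE = ("OR", ("NOT", ("VAR", "x1")), ("AND", ("VAR", "x2"), ("VAR", "x3")))

point = {"x1": 0.5, "x2": 0.5, "x3": 0.5}
print(interpolate(TREE, point))  # 0.625 = (1 - x1) + x2*x3 - (1 - x1)*x2*x3
```

Each tree node is visited once, so the cost of one evaluation is proportional to the number of nodes in the tree rather than to the number of corners of the input hypercube.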

Obtaining the stable steady states for discrete models from the Boolean tree

As a side effect, Boolean tree data structures instead of value tables also expedite and simplify the creation of binary decision diagrams (BDDs) equivalent to the Boolean functions of the network (see

BDDs, whose algorithmic potential was first investigated by Bryant et al.

**A binary decision diagram for the function (a OR b) AND (b OR c) AND (a OR c).** Evaluation starts at the node “**a**”, which does **not** feature any incoming connections from other nodes. If the value of the node is 1, the solid line is followed; if it is 0, the dashed line is followed. For the input values a = c = 1 and b = 0 one would go down from the “

A possible application of BDDs is the search for all stable steady states (SSS) in discrete models, i.e. network states which reproduce themselves in each following step of a discrete simulation. In contrast, a temporary state will be left if the system is simulated. The calculated steady states can be enumerated and applied in systems biology (e.g.

If B_{i} are the Boolean functions defining a network consisting of the nodes x_{i}, a network state (x_{1},…,x_{n}) is a stable steady state if and only if B_{i}(x_{1},…,x_{n}) = x_{i} for all i. In other words, all Boolean functions must evaluate to the value which their target node already holds. In common BDD frameworks, such as the JavaBDD framework, BDDs are built up by synthesis: an operation such as OR constructs the logical OR of two BDDs.

Recursively traversing the Boolean tree of the functions _{i}, the BDDs of these functions can be straightforwardly constructed in the framework by the synthesis method described above, and then combined to a BDD for the expression
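The stable-steady-state condition itself, that every Boolean function evaluates to the value its target node already holds, can be checked with a few lines of Python. Note that this sketch deliberately swaps in plain exhaustive enumeration instead of BDDs, so it only works for small networks; the function name and the list-of-callables network format are assumptions made for illustration.

```python
from itertools import product


def stable_steady_states(functions):
    """Return all states x with functions[i](x) == x[i] for every i.

    functions is a list of callables, one per network node; each takes the
    full state tuple of 0/1 values and returns the node's next value.
    """
    n = len(functions)
    return [
        state
        for state in product((0, 1), repeat=n)
        if all(f(state) == state[i] for i, f in enumerate(functions))
    ]


# Two-node example network: x1 <- x1 AND x2, x2 <- x1 OR x2
network = [
    lambda x: x[0] & x[1],
    lambda x: x[0] | x[1],
]
print(stable_steady_states(network))  # [(0, 0), (0, 1), (1, 1)]
```

The enumeration always visits all 2^{n} states, whereas a BDD represents the set of satisfying states symbolically; that difference is the reason for using a BDD framework in practice.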

In essence, Boolean trees speed up the simulation of continuous networks, while BDDs are essential for the efficient calculation of SSS.

Results and discussion

A jar-library version of Jimena, its source code, a ready-to-use Eclipse workspace including a commented usage example, further documentation and example networks are available

Speed up of the BooleCube calculation

While previous implementations need a time in Θ(2^{N}) to compute the polynomial interpolation, the tree algorithm runs in a time in

To benchmark the time needed to simulate a network with a given node degree we used a scalable artificial network topology which features 2.5

Directly comparing the simulation speed of Odefy and Jimena is not trivial: the time Odefy needs to simulate a network does not depend on the simulated time t; instead, its simulation accuracy decreases as t increases.

Jimena, on the other hand, uses a standard fixed-step fourth-order Runge–Kutta method to simulate the networks, hence its performance greatly depends on the step size of this solving method. For Figure
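To make the role of the step size concrete, the following Python sketch integrates a network with a fixed-step fourth-order Runge–Kutta scheme. It is not Jimena's solver: the function name is hypothetical, and the ODE form dx_{i}/dt = (f_{i}(x) − x_{i}) / τ, with f_{i} the interpolated function of node i and τ a decay parameter, is an assumption taken from the Odefy-style models discussed here.

```python
def rk4_simulate(interpolations, x0, t_end, step, tau=1.0):
    """Integrate dx_i/dt = (f_i(x) - x_i) / tau with fixed-step RK4.

    interpolations is a list of callables f_i mapping the state list to [0, 1];
    x0 is the initial state; the state at time t_end is returned.
    """

    def derivative(x):
        return [(f(x) - xi) / tau for f, xi in zip(interpolations, x)]

    def shifted(x, k, factor):
        return [xi + factor * ki for xi, ki in zip(x, k)]

    x = list(x0)
    for _ in range(round(t_end / step)):
        k1 = derivative(x)
        k2 = derivative(shifted(x, k1, step / 2))
        k3 = derivative(shifted(x, k2, step / 2))
        k4 = derivative(shifted(x, k3, step))
        x = [
            xi + step / 6 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
        ]
    return x


# Two-node network: x1 <- x1 AND x2, x2 <- x1 OR x2 (BooleCube interpolations)
network = [
    lambda x: x[0] * x[1],
    lambda x: x[0] + x[1] - x[0] * x[1],
]
print(rk4_simulate(network, [0.9, 0.2], t_end=10.0, step=0.1))
```

Each step evaluates every interpolated function four times, so the run time is proportional to the number of steps, i.e. inversely proportional to the step size.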

**Simulation time for a continuous model.** x-axis: number of involved nodes. y-axis: Time (in seconds) to simulate a standardized network with the given number of nodes. Note that in all figures the number of nodes refers to the number of actual network nodes x_{i} as opposed to the number of nodes in the Boolean tree. To highlight the time complexity of the different calculation methods, the data series are scaled to coincide for a network with 4 nodes. Actual simulation times for 4 nodes: Jimena (red) = 0.019 s, Odefy (blue) = 0.040 s, Squad (green) = 0.046 s. The Additional file

Since we chose test networks for which analogous activator-inhibitor-networks could be constructed, we were also able to benchmark the simulation of the equivalent networks using the Octave code obtained from the SQUAD Export-to-Octave function. As one would expect from the design of the differential equations, the integration of the ODEs from the SQUAD model exhibits a linear time complexity with respect to the maximum degree of the network nodes.

While this example shows Jimena's performance for high node degrees, it does not cover networks with large numbers of nodes. We therefore compared the runtime behavior of BooleCube interpolations in Odefy and Jimena in small to large networks created by the random Erdős–Rényi paradigm, where each connection between nodes is set with equal probability, and by the random scale-free paradigm, where the node degree distribution follows a power law, i.e. the number of network nodes with k connections is proportional to k^{-λ}, where λ is a constant usually between 2 and 3. It has been established that a large majority of naturally occurring networks are scale-free (see

The run times (creation and simulation) are plotted in Figure

**BooleCube network performance of Jimena and Odefy.** Random Erdős–Rényi and scale-free networks with a given number of nodes **n** and 3·n interactions (

Since SQUAD, BooleanNet and other simulation frameworks cannot simulate BooleCube networks, they are not included in this comparison. With the limitation to networks consisting only of simple activating or inhibiting influences, SQUAD's runtime behavior is similar to that of Jimena (cf. Figure

Speed of the SSS calculation

Since the number of SSS in discrete models can be in Θ(2^{E}) this is also the minimum time complexity of a search algorithm. To benchmark our implementation we used the same scalable test topology as before which features 2^{(n-2)/2} SSS for

**Stable state calculation time for a discrete model.** x-axis: number of involved nodes. y-axis: Time (in milliseconds) needed to determine the stable states of a standardized network with the given number of nodes. The Additional file

For medium sized random scale-free networks (100 nodes, 200 interactions, 100 unique networks) we obtained a mean run time of 3890 ms (median 1401 ms). Further experimentation showed that the calculation of the stable steady states using JavaBDD as a BDD framework is usually possible for random networks of up to about 150-200 network nodes and 500 interactions on standard hardware, with the limit being the main memory available in the computational environment.

Since larger networks for which Jimena takes a measurable time to calculate the SSS cannot be loaded in Odefy, we could not directly compare the two frameworks in this respect.

This time complexity makes the search feasible even for larger and highly interconnected networks which could not even be loaded using a multidimensional array implementation.

Multithreading

To determine as many stable steady states of a network as possible for continuous models such as the Odefy and SQUAD models, it is necessary to exhaustively sample a large state space. This task can be greatly expedited by distributing the sampling to multiple CPU cores as done automatically by the search algorithms implemented in Jimena.

Since Jimena’s tree-based networks are very lightweight compared to multidimensional array implementations, they can be copied quickly and many of them can be held in memory at the same time. This not only allows for an excellent scaling behavior on commonly used multi-core systems, resulting for example in an almost 8 times higher sampling rate on an 8 core system, but also facilitates the efficient comparison of variants of a given network to analyze its stability with regard to certain manipulations such as null mutations

Applied example: Arabidopsis thaliana development

The first example takes the plant

We simulated the network for 10 simulation-time seconds using the normalized HillCube (NHC) model on a standard 2.67 GHz CPU. For the step size of Jimena’s ODE solver we tested 0.01 s and 0.1 s. Even with a step size of 0.1 s, the absolute error of the simulation is in the order of 10^{-5} when simulating 10 seconds and much lower when searching for a stable state, which should already be more than enough for practical applications. To benchmark the original SQUAD ODEs we used the simplified activator-inhibitor version of the network.

Simulating the NHC model from 10^{6} random initial states, we found that, although both models are based on the same Boolean functions, the inflorescence states INF1, INF2, INF3 and INF4 (inflorescence attractors 1-4), whose biological validity has been confirmed by gene expression experiments, have much smaller basins of attraction in the NHC model: of 10^{6} initial states whose node values were chosen randomly from the interval [0,1], only 0.06% of the simulations converge on a state corresponding to a non-flowering phenotype.

| **Package** | **Calculation** | **Time (ms)** |
|---|---|---|
| Odefy | Network loading | 467 (±2) |
| Odefy | Network simulation | 583 (±4) |
| SQUAD | Network simulation | 467 (±0.3) |
| Jimena | Network loading | 3.7 (±0.2) |
| Jimena | Network simulation (time step: 0.1 s) | 3.1 (±0.4) |
| Jimena | Network simulation (time step: 0.01 s) | 28.2 (±0.7) |

An NHC model based on the Boolean functions of the corrigendum to the original article was simulated from 10^{6} random initial vectors. The parameters of the Hill function were n = 2 and k = 0.5 and the decay parameter was τ = 1 for all nodes. The values for the corresponding discrete and continuous model from the original article are cited from there.

| **Attractor** | **NHC model (%)** | **Discrete model (%)** | **Continuous model (%)** |
|---|---|---|---|
| INF1 | 0.005 | 1.66 | 4.74 |
| INF2 | 0.016 | 1.66 | 4.77 |
| INF3 | 0.010 | 0.88 | 4.01 |
| INF4 | 0.032 | 0.88 | 4.06 |
| SEP | 0.144 | 9.91 | 11.01 |
| PET1 | 0.477 | 10.05 | 12.74 |
| PET2 | 0.024 | 0.14 | 1.89 |
| STM1 | 74.556 | 37.4 | 28.46 |
| STM2 | 7.920 | 1.15 | 6.54 |
| CAR | 16.816 | 36.25 | 21.79 |

Using active EMF1 (embryonic flower 1) and TFL1 (terminal flower 1) nodes (i.e. EMF1 > 0.5 AND TFL1 > 0.5) as an indicator of an inflorescence state, we then determined the basins of attraction of the same model assuming null mutations for all 42 interactions (arrows) of the network by simulating from 10^{4} random start vectors per mutant. The combined basin of attraction size of each mutant stayed below 0.5%, except for a removal of the influence of AP1 (APETALA1) on TFL1 whose mutation directly causes our condition for inflorescence state to fail, leading to a combined basin size of ~3.5%.

These results corroborate the hypothesis that the inflorescence attractors are transitory in nature, such that small perturbations lead to progress in plant development and cell differentiation arriving at few and robust standard outcomes of floral organs. Furthermore, the low size of the inflorescence basins of attraction of the mutant networks is consistent with a reported strong robustness of

Applied example II: Arabidopsis thaliana immunity and pathogen Pst DC3000

A second example considers a different area, the immune response of the

As we expected, the network exhibits a strong robustness against null mutations, with only 2 mutations changing the number of stable states (from 2 to 1). These are the null mutation of the influence of SA (salicylic acid) on ROS (reactive oxygen species) and of ROS on SA, where SA is a key hub node of the network and the small cycle SA → ROS → SA is crucial for its number of stable states. For all other mutations (n = 154) the changes of the stable states are minor, with only one mutation affecting more than four nodes per stable state, namely the removal of ETR/CTR1 (ethylene response / cytosolic serine/threonine kinase constitutive triple response 1) → AHP (Histidine-containing phosphotransmitters), which causes five nodes to change, and most single mutations (n = 142) leading to no change at all.

To check whether the number of stable states increases assuming multiple mutations we then determined the stable states for up to 4 null mutations (n ≈ 2.4⋅10^{7}) in the discrete network model and found that the number of stable states never exceeds 2. Using a single 2.67 GHz CPU core Jimena constructs and analyzes 2,700 mutants per second in this network, and more than 24,000 mutants per second in the

Conclusion

In recent years the size and complexity of discovered genetic regulatory networks have increased substantially, partly due to automated network creation techniques using time series data from methods such as real-time RT-PCR or RNA-seq.

Motivated by current limitations of Odefy (version 1.18, year 2013), the use of tree data structures and corresponding algorithms in Jimena paves the way for the simulation and analysis of more sophisticated networks than possible previously, including those beyond the scope of simple activating and inhibiting influences covered by SQUAD. This may provide additional insight especially with regard to the role of nodes that are influenced by many other nodes, which seem to greatly influence the behavior of many GRNs.

For an overview of all currently published features of Jimena see Figure

**Feature overview of the Jimena simulation framework.** Included are all features as of version 160913.

Availability and requirements

The software, its source code, example data and a tutorial are available from

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

SK developed, formally verified, implemented and benchmarked the algorithms, contributed ideas and algorithms to the applied examples and wrote the first draft of the manuscript. TD conceptualized and analyzed the applied examples and the biological insights, reviewed and revised the manuscript and led the project. Both authors have read the manuscript and approved the final version.

Acknowledgements

Support by DFG (Da 208/12-1) is gratefully acknowledged.