School of Computer and Communication Sciences EPFL, 1015 Lausanne, Switzerland

Abstract

Background

A central goal of Systems Biology is to model and analyze biological signaling pathways that interact with one another to form complex networks. Here we introduce

Results

We consider networks in which elements range over a small finite domain, allowing more flexibility than Boolean values, and add target functions that make it possible to model a rich set of behaviors. We propose a symbolic algorithm for analyzing the steady state of these networks, allowing us to scale up to a system consisting of 144 elements and state spaces of approximately 10^{86} states. We illustrate the usefulness of this approach through a model of the interaction between the Notch and the Wnt signaling pathways in mammalian skin, and its extensive analysis.

Conclusion

We introduce an approach for constructing computational models of biological systems that extends the framework of Boolean networks and uses formal verification methods for the analysis of the model. This approach can scale to multicellular models of complex pathways, and is therefore a useful tool for the analysis of complex biological systems. The hypotheses formulated during in-silico testing suggest new avenues to explore experimentally. Hence, this approach has the potential to efficiently complement experimental studies in biology.

Background

The emerging field of Systems Biology aims to gain high-level understanding of complex biological systems by studying the structure and dynamics of cellular functions

Executable models of biological systems can be analyzed using formal verification methods, which were originally designed for the validation of computer systems. Two possible approaches are

In order to construct executable models of biological pathways, it is necessary to precisely define the interactions between biochemical components. One widely used formalism for representing interactions in biology is Boolean networks

In this work, we propose an extension of Boolean networks called

Our approach to the modeling of signaling pathways allows the detection of gaps in the mechanistic understanding of the studied process, and suggests new biochemical interactions at a similar level of abstraction to the one observed in diagrammatic models. We illustrate this approach by a multicellular model of the interaction between the Notch and the Wnt pathways in mammalian skin cells.

Results

Qualitative networks

In this study we propose an approach for constructing executable models of signaling pathways which is based on Qualitative networks, an extension of Boolean networks. In this section, we first explain how biological pathways can be modeled using Boolean networks and then define Qualitative networks. Interactions in biological pathways can be represented as an edge-weighted interaction graph _{i }∈ _{ij }∈ _{i }to node _{j }has a weight _{ij}, corresponding to the effect of the component represented by _{i }on the component represented by _{j}. Activation is specified by a positive value of _{ij}, inhibition by a negative value. This graph therefore represents the type of interaction between components as well as their strength in a similar way to the diagrammatic models used by biologists. Li

A Boolean network _{i }∈ _{i }= 1 if the component is active and c_{i }= 0 if the component is absent or inactive. We use _{1}, ..., _{k}) ∈ {0, 1}^{k }of the states of all components of the Boolean network. A boolean function _{i }∈ _{i }: {0, 1}^{k }→ {0,1}). Time in Boolean networks is represented by a discrete variable _{i }(

_{i }(_{i }(

Given an interaction graph

We extend the Boolean network framework by using discrete domains to represent the state of a component and by allowing a rich set of interactions between components. The state of an active component can take several values, rather than being simply

A Qualitative network _{i }∈ _{i}, there exists a target function _{i }∈ _{i }: {0, ..., ^{k }→ {0, ...,

When

A particular kind of target function is directly obtained from the qualitative graph describing the studied biological system. It is used to model activation and inhibition in a similar way to the one described for Boolean networks. In Boolean networks, the sum of all contributions was sufficient to decide on the next value of a component. Here we need to obtain a target function that ranges over {0, ..., _{i}, we compute separately the amount of activation _{i }and the amount of inhibition _{i }sensed by this component. Both of these values are scaled so that they are proportional to the weighted amount of activation or inhibition, respectively. When all activating components are at their highest value, then _{i }= _{i }=

The target function is the difference between the amount of activation and the amount of inhibition. When there is no inhibition, the output of the target function corresponds to the amount of activation. The level of the component will not exceed the amount of the activation. When the amount of inhibition is higher than the amount of activation, the value of the target function is zero. We need to consider a special case when all incoming edges of a component are inhibitions. In this case, we must assume that the mechanism leading to the presence of this component is not modeled, and that, in the absence of inhibition, the target function of this variable will return the maximal value. We obtain the following definition of the target function:
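As a hedged illustration of the prose description above, the following Python sketch computes such a target value from the incoming weighted edges of one component. The function name `target_value`, the dictionary inputs, the parameter name `max_level`, the use of a weighted average for scaling, and the round-half-up step are all assumptions made for this example; they are one possible reading of the description, not the authors' definition.

```python
from fractions import Fraction

def target_value(levels, weights, max_level):
    """Sketch of an activation/inhibition target function.

    levels    -- current level of each source component, in 0..max_level
    weights   -- signed integer weight of each incoming edge
                 (> 0 activates, < 0 inhibits, 0 is ignored)
    max_level -- highest level a component can take (hypothetical name)
    """
    def scaled(selected):
        # Weighted average: equals max_level exactly when every
        # selected source is at its highest level.
        total = sum(selected.values())
        return Fraction(sum(w * levels[src] for src, w in selected.items()), total)

    activators = {src: w for src, w in weights.items() if w > 0}
    inhibitors = {src: -w for src, w in weights.items() if w < 0}

    inhibition = scaled(inhibitors) if inhibitors else Fraction(0)
    # Special case: with no modeled activator, production is assumed maximal.
    activation = scaled(activators) if activators else Fraction(max_level)

    # Difference between activation and inhibition, never below zero.
    value = max(Fraction(0), activation - inhibition)
    return int(value + Fraction(1, 2))  # assumption: round half up to a level
```

For instance, with `max_level = 3`, one activator at level 3 and one inhibitor at level 1 give a target of 2, and a component whose only incoming edge is an inhibitor at level 1 also gets a target of 2.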

We use Qualitative networks to define executable models of biological systems. The list of target functions specifies how the different molecules in the system influence each other. When creating a model based on the mechanistic understanding of a biological process, we use this information to define a set of target functions. Where necessary, we separate the representation of protein activity from the expression level of the corresponding gene. This is useful for modeling cases in which there are interactions at the transcription level as well as protein-protein interactions. During the analysis process, we may modify the definition of the target functions. While Qualitative networks allow for any kind of target function, we use a subset of functions that represent specific biological interactions, such as the activation/inhibition function described above. This ensures that new hypothesized interactions between components are defined at a level where they have a clear biological meaning. The implementation of models defined as Qualitative networks is described in the

Iterative model improvement

A model in general, and an executable model in particular, is meant to represent how the system that we study actually works. Thus, it is natural to expect that the model would be consistent with the data obtained from laboratory experimentation with this system. We use

Our methodology in using Qualitative networks is the

**Iterative improvement of the model**. Schematic view of the iterative improvement process used to build a model which is consistent with the experimental data. The verification process is represented in blue. The improvement of the model based on counter examples is represented in red. The outer improvement loop, based on laboratory experiments, is represented in green.

The iterative improvement process consists of the following. We first derive a set of specifications from prior laboratory experimental results. We build an initial putative model based on the current mechanistic understanding of the studied biological process and the working hypotheses we would like to evaluate. We then use formal verification methods to verify whether the proposed model conforms to the specifications. If the model fails this test, we deduce that the model does not conform to the data. We use counter examples to suggest a revised model by making new hypotheses. The new model is then evaluated again, and these steps are repeated until no contradiction is found between the model and the specifications. At this point, we might have had to modify our hypotheses, both in discarding unverifiable assumptions and in adding assumptions that are necessary for the model to comply with the specifications. The resulting model, and the corresponding new set of hypotheses, are then known to be consistent with the experimental results. At this stage, it is possible to query the model to gain additional knowledge about the biological system. This can be done either by modifying the model to evaluate or by adding specifications that could provide additional information on the behavior of the system. Hypotheses and changes in the model leading to interesting behaviors are candidates for experimental validation. The result of such an experiment, whether confirming the information gained from the computational analysis or not, can then be added to the specification. This creates a second, outer, improvement loop which combines computational and experimental validation.
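The inner loop of this process can be summarized as a short Python skeleton. It is only a sketch: `verify` and `revise` are hypothetical callbacks standing in for the formal verification step and for the manual formulation of new hypotheses, and the outer loop driven by laboratory experiments is not modeled.

```python
def iterative_improvement(model, specifications, verify, revise, max_rounds=100):
    """Refine `model` until `verify` finds no counter example.

    verify(model, specifications) -> None if consistent, else a counter example
    revise(model, counter_example) -> a new candidate model
    Both callbacks are hypothetical placeholders for this sketch.
    """
    for _ in range(max_rounds):
        counter_example = verify(model, specifications)
        if counter_example is None:
            return model  # consistent with all specifications
        model = revise(model, counter_example)
    raise RuntimeError("no consistent model found within max_rounds")

# Toy usage: the "model" is a single integer and the specification
# asks for a value of at least 3.
toy_verify = lambda m, spec: None if m >= spec else m
toy_revise = lambda m, counter_example: m + 1
refined = iterative_improvement(0, 3, toy_verify, toy_revise)
```

In practice the revision step is a modeling decision guided by the counter example rather than an automatic function; the callback form is used here only to make the control flow explicit.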

The specifications derived from laboratory experimental results can be separated into two categories.

This iterative improvement approach can be applied to highly non-deterministic models, hence the use of model checking. Qualitative networks, like Boolean networks, have a deterministic behavior, and it is therefore sufficient to perform one execution per initial state, and verify if each of these executions satisfies the specifications. If some executions do satisfy the specifications, but others do not, then the model is not constrained enough, and we need to formulate hypotheses that allow avoiding the executions which do not satisfy the specifications. Ideally, the new hypotheses would not contradict the current mechanistic understanding of the system, but rather suggest additional putative interactions between components. If there is a specification which is not satisfied by any execution of the model, then there is a contradiction between the mechanistic understanding of the system and the experimental data. In this case, the modifications needed for the model to satisfy the requirement will suggest alternative mechanistic models of the studied system that are more consistent with the experimental data.
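The three outcomes described above can be sketched for a deterministic model as follows. This is an enumerative illustration under stated assumptions: the specification is taken to be a predicate that must hold on every state of the loop an execution ends in, and the two-component network `toy_step` is a hypothetical example, not part of the Notch/Wnt model.

```python
from itertools import product

def attractor_of(state, step):
    """Follow a deterministic execution until a state repeats; return its loop."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return seen[seen.index(state):]

def classify(initial_states, step, spec):
    """Compare the loop reached from every initial state with `spec`."""
    verdicts = [all(spec(s) for s in attractor_of(s0, step)) for s0 in initial_states]
    if all(verdicts):
        return "consistent"
    if any(verdicts):
        return "under-constrained"  # add hypotheses to exclude the failing executions
    return "contradiction"          # mechanistic model conflicts with the data

def toward(level, target):
    """Move at most one level in the direction of the target."""
    return level + (target > level) - (target < level)

def toy_step(state):
    """Hypothetical network: each component follows the other one."""
    a, b = state
    return (toward(a, b), toward(b, a))

toy_initial = list(product(range(4), repeat=2))
verdict = classify(toy_initial, toy_step, lambda s: s[0] == s[1])
```

Here `verdict` is `"under-constrained"`: executions starting from equal levels satisfy the predicate, while the execution starting from `(0, 1)` oscillates between `(0, 1)` and `(1, 0)` and does not.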

In the following sections, we describe two additional methods for analyzing the model during the improvement process: the use of non-determinism on parts of the model, and the efficient computation of all infinitely visited states.

Use of non-determinism

In the case where every interaction in the model is precisely defined, Qualitative networks are deterministic. Non-determinism can, however, be used to represent unknown interactions. In particular, rather than studying a complete multicellular model, the model of a single cell can be studied by representing inter-cellular interactions non-deterministically.

The levels of proteins that are external to the cell are modeled by non-deterministic variables. These variables cannot jump arbitrarily from one value to another, but only change by one level per time step. At each time step, the variable can either increase, decrease or remain constant. This is sufficient to capture any possible behavior since updates of components are bounded by Equation 2. The set of possible behaviors of the state of this protein is therefore a superset of the possible behaviors for any possible target function.
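The superset argument can be checked exhaustively on a small domain. The sketch below assumes that a deterministic update moves a component by at most one level toward its target value, which is how the bound mentioned above is read here; the function names are illustrative.

```python
def step_towards(level, target):
    """Deterministic update: move at most one level toward the target."""
    if target > level:
        return level + 1
    if target < level:
        return level - 1
    return level

def nondet_successors(level, max_level):
    """All next levels allowed for a non-deterministic external variable."""
    return {v for v in (level - 1, level, level + 1) if 0 <= v <= max_level}

MAX_LEVEL = 3  # four qualitative levels, as in the case study

# Whatever value a target function returns, the deterministic successor
# is one of the successors allowed by the non-deterministic variable.
covered = all(
    step_towards(level, target) in nondet_successors(level, MAX_LEVEL)
    for level in range(MAX_LEVEL + 1)
    for target in range(MAX_LEVEL + 1)
)
```

Because `covered` holds for every level and every possible target, any execution of a concrete target function is also an execution of the abstracted variable.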

Instead of simulation, we use model checking, which explores all possible executions of a non-deterministic system. This leads to the analysis of a superset of the possible executions of any model with more constrained inter-cellular interactions. If no execution of the system satisfies the specification, then it is impossible that any model including this cell can satisfy the specifications, no matter how the inter-cellular interactions are defined. If some executions satisfy the specifications, but others do not, then it is not possible to predict the behavior of the cell with more constrained inter-cellular interactions. This approach is therefore useful for finding out whether the improvement should be performed at the level of the inter-cellular communication or at the intracellular level.

Finding infinitely visited states

We propose an efficient symbolic algorithm for computing the infinitely visited states of a Qualitative network. A state

Analysis of Boolean networks also includes studies of the attractors of the model. In discrete, deterministic models, attractors are loops of one or more states that are visited infinitely often. Questions of interest include whether the system ends up in one or several attractors, and which initial conditions lead to which attractor. This analysis requires exploring the executions starting from all initial states. The number of possible initial states is exponential in the number of variables, and therefore we try to prune them by using the available information on the system. If no biologically meaningful definition of the set of initial states exists, it is necessary to consider all states as being initial. In this case, enumerative exploration is practically intractable even for relatively small models. In the

The algorithm we propose uses the structure of multicellular Qualitative networks by interleaving composition and computation of infinitely visited states. We first consider a partial model consisting only of the first cell. The behavior of components in neighboring cells is abstracted by allowing them to change non-deterministically. We compute the set of infinitely visited states of this partial model. This set contains the projection of any infinitely visited state of the complete model onto the variables of the partial model. We use this information to compute the infinitely visited states of a partial model consisting of the first two cells. This process is repeated to obtain the complete model and the set of infinitely visited states. We further optimize this algorithm by using
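The idea of pruning with a partial model can be illustrated on a toy example. The following Python sketch is not the symbolic algorithm: it uses explicit sets of states where the algorithm uses symbolic representations, and the two-cell network with one variable per cell and its target functions are assumptions made purely for illustration.

```python
from itertools import product

N = 3  # levels 0..N; hypothetical two-cell toy with one variable per cell

def toward(level, target):
    """Move at most one level in the direction of the target."""
    return level + (target > level) - (target < level)

def complete_step(state):
    x1, x2 = state
    # Toy target functions (assumptions): cell 2 inhibits cell 1,
    # cell 1 activates cell 2.
    return {(toward(x1, N - x2), toward(x2, x1))}

def partial_step(state):
    x1, x2 = state
    # Cell 2 is abstracted: it may rise, fall, or stay, within 0..N.
    return {(toward(x1, N - x2), y) for y in (x2 - 1, x2, x2 + 1) if 0 <= y <= N}

def on_cycle(candidates, successors):
    """States of `candidates` lying on a cycle that stays inside `candidates`."""
    result = set()
    for s in candidates:
        frontier, reached = [s], set()
        while frontier:
            for nxt in successors(frontier.pop()):
                if nxt in candidates and nxt not in reached:
                    reached.add(nxt)
                    frontier.append(nxt)
        if s in reached:
            result.add(s)
    return result

all_states = set(product(range(N + 1), repeat=2))
partial = on_cycle(all_states, partial_step)   # over-approximation from cell 1 alone
pruned = on_cycle(partial, complete_step)      # complete model, restricted candidates
direct = on_cycle(all_states, complete_step)   # complete model, no pruning
```

Every transition of the complete model is also a transition of the partial model, so `direct` is contained in `partial`, and restricting the search to `partial` yields the same result as the direct computation while starting from fewer candidate states.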

Modeling network motifs using Qualitative networks

Transcription regulation networks contain patterns that occur significantly more often in a real network than in a random network with the same characteristics. These patterns are called network motifs

In this motif, transcription factor

In the _{inh}) unbind _{inh}. Second, the time needed by _{inh }is absent than when it is present. Figure

**Model of the gal system in E. coli**. _{inh}, which is a shortcut representing both _{S }on _{St}. The situation in which no _{inh }is not expressed is represented in blue. This corresponds to an I1-FFL. The situation in which _{inh }is present is represented in red. In this situation, the model behaves like with a simple activation. The response time, measured as the time needed to reach _{St}/2 is shorter in the absence of _{inh}.

Qualitative networks are able to provide realistic qualitative approximations of the behavior of further common network motifs, such as other types of feed-forward loops and regulated-feedback motifs that appear in developmental transcription networks

Case study: a model of the interaction between the Notch and Wnt signaling pathways

We apply our modeling approach to a model representing the crosstalk between the Notch and Wnt pathways in mammalian keratinocytes. Both pathways play a key role in the control of cell proliferation and differentiation and have been linked to several types of cancer. We first present the biological background, then introduce the Qualitative network model for a single cell and for a model with five cells representing the multiple layers of the mammalian epidermis. We define a set of specifications derived from laboratory experimental perturbations on these pathways, and use them to analyze the possible executions of the model.

Crosstalk between the Notch and Wnt signaling pathways

Notch signaling plays a key role in the control of the development of various tissues, with the particularity that, depending on the context, it can either induce differentiation or maintain cells in a proliferating state (role in carcinogenesis reviewed in

The Wnt signaling pathway has a critical regulating role in stem cells, and has been associated with cancer in several tissues (reviewed in

The outermost layer of the human skin, the epidermis, can be divided into several layers: the basal layer, which is the closest to the underlying dermis, the suprabasal layer, and the cornified layer, which is mainly composed of dead cells (Figure

**Schematic view of the multicellular model**. View of the different layers of the mammalian skin, and how they are represented in the multicellular model. Connections through the Wnt pathway are represented in red and connections through the Notch pathway are represented in blue. The level of Notch receptor of each cell is fixed, and the corresponding values are indicated in the cells. Both the dermis and the cornified layer (composed of dead cells) are not represented in the model. The required result for each cell in the case of wild type simulation is indicated below the cell. Cells of the basal layer are proliferating, while cells of the suprabasal layers are differentiated.

Model construction

We build a model of the Notch and Wnt pathways in the epidermal layer of mammalian skin. Our model is composed of five identical keratinocytes, each of them representing one layer of the epidermis. Protein activity and gene expression are represented by variables that can take four values: _{ij }∈ {-1, 0, 1}). Figure

**Visualization of a single cell**. The components of the Wnt pathway are represented in red and the components of the Notch pathway in blue. The canonical Wnt pathway starts from the extracellular level of the short range signaling Wnt protein, which we represent by the _{ext }variable. This short range molecule binds to the _{ext }to Frizzled and then to DSH. The level of the scaffolding protein _{exp}. _{exp }and inhibited by Axin. The family of downstream target genes of the canonical Wnt signaling pathway, (_{in}). Since a successful binding between the receptor and the ligand leads to the cleavage of the intracellular part of Notch (

The complete model consists of a single row of five cells, which are named _{1 }to _{5}. Neighboring cells are connected both through the Wnt and the Notch pathways (Figure _{1}, the leftmost cell on the visualization of the model, represents the lowest layer of keratinocytes of the epidermis, which means that the dermis is to its immediate left. This cell should adopt a proliferating fate. The upper skin, to which the keratinocytes migrate before they die, is represented on the right side of the model. We consider that the two immediate neighbors contribute to the level of ligand sensed by a cell. The level of ligand sensed by a cell (Ligand_{ext}) is activated by the level of ligand on the membrane of its immediate neighbor, and is thus maximal when the level of ligand in both neighbors is high. The level of Jagged on both extremities of the model is constant at medium. Similarly, we connect the level of Wnt produced by a cell to the level of Wnt sensed by the nearby cells. We consider the case in which a cell does not sense Wnt emitted by itself. In this case we can use the same double activation scheme as for the ligands. On the left side of the model, the level of Wnt is fixed to high since the dermis is known to emit Wnt signaling proteins. In contrast, the level on the right side is fixed to low since Wnt signaling does not occur in the upper layers of the skin.

Specifications

In order to be able to define high level specifications, we need to map changes in the level of protein activity to a certain cell fate. We consider the balance between the downstream target genes of Wnt signaling (referred as

The first high level specification (_{1 }is proliferating (_{4–5 }are differentiated (

**H1.1:** GT1 > GT2 in Cell_{1}

**H1.2:** GT1 < GT2 in Cell_{4–5}
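Specifications of this kind are simple predicates over a state of the model. A minimal Python sketch, assuming a state is given as a mapping from cell index to its (GT1, GT2) levels; the data layout and the label `"undecided"` for the case GT1 = GT2 are illustrative choices, not taken from the model.

```python
def fate(gt1, gt2):
    """Map the GT1/GT2 balance to a cell fate (labels are illustrative)."""
    if gt1 > gt2:
        return "proliferating"
    if gt1 < gt2:
        return "differentiated"
    return "undecided"

def satisfies_h1(cells):
    """cells maps a cell index (1..5) to its (GT1, GT2) levels."""
    h1_1 = fate(*cells[1]) == "proliferating"
    h1_2 = all(fate(*cells[i]) == "differentiated" for i in (4, 5))
    return h1_1 and h1_2

# Hypothetical state used only to exercise the predicate.
example_state = {1: (3, 0), 2: (2, 2), 3: (1, 2), 4: (0, 3), 5: (0, 3)}
example_ok = satisfies_h1(example_state)
```

Note that the predicate deliberately places no constraint on cells 2 and 3.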

We do not use Cell_{2–3} in these specifications, but if a cell has adopted a differentiated fate, cells in a higher sub-layer (on its right in the model) cannot be proliferating. We formulate this as an additional requirement:

**H2.1:** GT1 = GT2 in Cell_{i} ⇒ ∀_{j}.

**H2.2:** GT1 < GT2 in Cell_{i} ⇒ ∀_{j}.

We derive further specifications from the experiments performed by Nicolas _{4 }is also proliferating.

**H3:** Notch = KO ⇒ GT1 > GT2 in Cell_{4}

After several months, mice in which Notch is knocked out begin to develop basal cell-like carcinoma. They are also significantly more sensitive to chemically induced carcinogenesis. In order to relate these long-term changes to variations at the protein level, we assume that the short term effect of the Notch knockout is an increase of GT1 in several cells, or a decrease of GT2 in several cells, or both.

**H4:** Notch = KO ⇒ GT1 increases in Cell_{1–5} or GT2 decreases in Cell_{1–5}

The impact of p21 knockout was also considered. Mice with p21 deficiency were more sensitive to chemically induced carcinogenesis, but did not develop tumors on their own. We relate this result to the level of GT1 and GT2 in a similar way to Notch knockout.

**H5:** p21 = KO ⇒ GT1 increases in Cell_{1–5} or GT2 decreases in Cell_{1–5}

The work by Nicolas

Analysis of the model

We compute the set of infinitely visited states of the model described above. We obtain a total of 6561 states. All of these states adhere to the specifications _{2 }is concerned. This cell is either proliferating (_{2 }can change between the different states of a single attractor. We can therefore conclude that, in this model, Cell_{2} is no longer receiving constant signals for proliferation, but has not yet committed to terminal differentiation. After the model reaches a stable state, modifying it to knock out Notch or p21 allows us to verify that the model also satisfies specifications

Further insights can be gained by studying variations of the model. We consider the hypothesis that Notch-IC activates the transcription of _{in}). No execution of this model can satisfy requirement

We also consider the hypothesis that Notch signaling uses the Delta ligand instead of the Jagged ligand. While there is a positive feedback mechanism from Notch-IC to Jagged, Notch-IC inhibits the expression of the Delta ligand on the cell surface. We find that such a model cannot reproduce the wild type behavior (specification

Performance analysis

In this section, we describe the performance of the symbolic algorithm used to compute all infinitely visited states of a Qualitative network. We first consider variations of the Notch/Wnt model described above. We then use an arbitrary model to show that the method can be applied to models of more complex interactions between components. Finally, we show that our algorithm performs better than simpler symbolic algorithms. The

The computation of all infinitely visited states of the five-cell Notch/Wnt model takes 21 minutes. This model has 4^{60} ≈ 10^{36} initial states. This is a significantly larger number than previously studied with Boolean networks (for example, the model used by Li has 2^{11} = 2048 states). In order to evaluate the performance of our symbolic algorithm on larger models, we extend the Notch/Wnt model up to 12 cells. We add cells both to the basal layer (on the left side of the model) and to the suprabasal layer (on the right side of the model). As in the five-cell model, the level of Notch receptor is set to ^{86 }initial states.
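The orders of magnitude quoted here can be reproduced with exact integer arithmetic. In the sketch below, the figure of 12 four-valued variables per cell is an assumption inferred from the 4^{60} initial states of the five-cell model; the function names are illustrative.

```python
LEVELS = 4               # qualitative levels per variable
VARIABLES_PER_CELL = 12  # assumption inferred from 4^{60} states for five cells

def initial_states(cells):
    """Number of states when every state is considered initial."""
    return LEVELS ** (VARIABLES_PER_CELL * cells)

def order_of_magnitude(n):
    """Largest k with 10**k <= n, computed exactly on integers."""
    return len(str(n)) - 1

sizes = {cells: order_of_magnitude(initial_states(cells)) for cells in (3, 5, 12)}
boolean_example = 2 ** 11  # an 11-variable Boolean network
```

The three-, five- and twelve-cell models give exponents 21, 36 and 86 respectively, against 2048 states for an 11-variable Boolean network.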

Execution times for variations of the Notch/Wnt model

| Size (cells) | Pattern | Time | Initial states | Infinitely visited states |
|---|---|---|---|---|
| 3 | OMH | 33 sec. | 4^{36} ≈ 10^{21} | 1 |
| 4 | OMHH | 4 min. 27 sec. | 4^{48} ≈ 10^{28} | 256 |
| 5 | OMHHH | 21 min. | 4^{60} ≈ 10^{36} | 6561 |
| 6 | OOMHHH | 24 min. | 4^{72} ≈ 10^{43} | 6561 |
| 7 | OOOMHHH | 26 min. | 4^{84} ≈ 10^{50} | 6561 |
| 8 | OOOLMHHH | 63 min. | 4^{96} ≈ 10^{57} | 256 |
| 9 | OOOLMHHHH | 171 min. | 4^{108} ≈ 10^{65} | 256 |
| 10 | OOOOLMHHHH | 181 min. | 4^{120} ≈ 10^{72} | 256 |
| 11 | OOOOLMHHHHH | 513 min. | 4^{132} ≈ 10^{79} | 256 |
| 12 | OOOOOLMHHHHH | 543 min. | 4^{144} ≈ 10^{86} | 256 |

This table indicates the execution time for various sizes of the model. The patterns indicate the fixed amount of

Adding a cell in the basal layer has a significantly smaller impact on performance than adding a cell in the suprabasal layer. This is because the possible behaviors of the cells in the basal layer are very limited due to the lack of Notch signaling. In general, the performance of symbolic algorithms depends on how compact the representations of the sets of states and of the transition relation are. This complexity depends directly on the structure of the model. In the case of Qualitative networks, choosing a different combination of target functions has an impact on the performance even though the number of components stays the same. We illustrate this by constructing an arbitrary Qualitative network (Figure

**Arbitrary pathways for performance evaluation**. This cell contains arbitrarily chosen pathways that have no biological meaning. The interactions between components in this example were chosen in order to obtain a complex transition function. The component _{ext }is connected to the component _{ext }is connected to the component

Our algorithm interleaves the composition of a multicellular model with the computation of infinitely visited states. This approach outperforms symbolic algorithms that directly compute the set of infinitely visited states of the complete model. The application of the direct algorithm to models of the Notch/Wnt pathway with four or more cells fails, because the symbolic representations of the sets of states become too large. On the three-cell model, our algorithm needs 33 seconds and the direct algorithm 231 seconds. The advantage of first computing the infinitely visited states on partial models is that the sets of potentially infinitely visited states are smaller than in the direct algorithm. This is illustrated in Table ^{13 }states. An algorithm that directly computes all infinitely visited states would start from the set of all states of the system (approximately 10^{72}).

Size of infinitely visited state sets for partial models

| Size (cells in partial model) | States before | States after | BDD nodes |
|---|---|---|---|
| 1 | 1.4 · 10^{6} | 3424 | 427 |
| 2 | 2.9 · 10^{8} | 3424 | 451 |
| 3 | 2.9 · 10^{8} | 3424 | 475 |
| 4 | 2.9 · 10^{8} | 3424 | 499 |
| 5 | 4.5 · 10^{8} | 1.3 · 10^{6} | 1189 |
| 6 | 2.5 · 10^{11} | 5.2 · 10^{6} | 23092 |
| 7 | 1.0 · 10^{12} | 7.2 · 10^{8} | 70626 |
| 8 | 1.4 · 10^{13} | 5.6 · 10^{8} | 146169 |
| 9 | 1.1 · 10^{14} | 5.5 · 10^{9} | 253558 |
| 10 | 6.1 · 10^{13} | 255 | 8830 |

This table describes the execution of the algorithm on the 10-cell Notch/Wnt model. For each partial model, the number of potentially infinitely visited states before the execution of the algorithm and the number of infinitely visited states found by the algorithm are indicated. The size of the symbolic representation of the infinitely visited states of the partition is indicated in terms of number of BDD nodes.

The symbolic approach we propose is able to compute the infinitely visited states for large multicellular Qualitative networks and to handle complex models. By tailoring the algorithm specifically for Qualitative networks, we can analyze larger models than by directly computing the infinitely visited states.

Discussion

In this work, we extend Boolean networks to Qualitative networks, in which variables take discrete values representing qualitative levels of protein activity or gene expression. By introducing a target function, we allow a rich set of interactions between components. We propose a method for analyzing Qualitative networks that uses formal verification of specifications derived from experimental observations to iteratively improve the model. In order to efficiently analyze the properties of infinitely visited states of large models, we introduce a symbolic algorithm which builds on the structure of our modeling method. We apply this method to build and analyze a multicellular model of the interactions between the Wnt and Notch pathways in mammalian keratinocytes. We show that the proposed symbolic algorithm can be applied to large and complex multicellular models.

Model analysis approach

Our modeling approach aims to use formal verification methods for the analysis of signaling networks in an iterative improvement process based on specifications that are directly derived from experimental results. This work therefore combines formal verification methods with the widely used formalism of Boolean networks and its extension to qualitative domains. Since Boolean networks are a particular case of Qualitative networks, the iterative improvement approach can be directly applied to Boolean networks as well.

Our method makes it possible to build a model at a level of abstraction similar to the one commonly used in cell biology, and to analyze it using specifications that are directly derived from experimental studies. This approach is therefore useful for biologists to identify gaps between the hypothesized mechanistic description of a pathway of interest and the experimental data on which the hypothesis has been formulated. Furthermore, the careful analysis of counter examples found using this approach can lead to new hypotheses on how to reconcile the model with the experimental data. We illustrate the usefulness of this approach by an example showing that we can infer a known interaction from the analysis of a model in which this interaction was purposefully ignored. It is known that in mammalian keratinocytes, Notch activates the expression of its ligand Jagged. For the purpose of this example, we ignore that information when building our original model, and consider instead that Notch inhibits the expression of the Delta ligand, as is the case in Drosophila. Such a model is not able to reproduce the wild type specification (as formulated in the results section), and we have thus identified a gap between the hypothesized model and the experimental data. The counter example traces we obtain indicate that the model is unable to reproduce the sustained Notch signaling that is necessary in order to reproduce the observed wild type behavior. This strongly suggests that the type of interaction between Notch and its ligand in keratinocytes needs to be changed from inhibition to activation. Applying this change to the model and then running the verification algorithm again shows that the modified model is consistent with all specifications. This means that our refinement process allows us to infer the hypothesis that was left out in this example. Furthermore, a similar approach, although using a different formalism, has led to new biological insights about the vulval development in

We believe that the possibility of

An area in which our approach appears to be particularly useful is the analysis of cell-cell communication in multicellular models. Given the size of such models, efficient algorithms are required. The efficient use of the modularity in multicellular models by our symbolic algorithm makes it particularly suitable for analyzing cell-cell communication processes together with complex intracellular pathways, a level of detail that would be intractable for enumerative or simpler symbolic algorithms. The utility of this algorithm extends beyond the scope of our iterative improvement approach. The computation of attractors of Boolean networks is, for example, mainly done using enumerative methods. Such methods become intractable even for small models, and since Boolean networks are a particular instance of Qualitative networks, our symbolic algorithm can be directly applied to this problem.

Modeling approach

In addition to being well suited for our formal verification based iterative improvement approach, using Qualitative networks has several benefits compared to other existing frameworks, such as quantitative models using differential equations and Boolean networks. In this section, we discuss these advantages, and then stress the benefit of clearly separating the modeling of a system using a well defined framework like Qualitative networks from its implementation.

Using discrete qualitative variables allows modeling at a level of abstraction similar to the one used in many biological studies. Quantitative methods, such as differential equation-based models, allow continuous and precise modeling of the execution of a system, but require fine-tuning of multiple parameters that are often not easily available for many biological systems. The precision provided by such methods may be unnecessary for analyzing a putative model on the basis of experimental results that only provide qualitative levels for the components of the system. The results obtained using Qualitative network models of frequently occurring network motifs, such as the I1-FFL, indicate that the Qualitative networks approach provides a realistic qualitative approximation of the experimentally observed behavior of the system.

The iterative improvement approach used for the construction of a consistent model, as well as our analysis algorithms, can be applied to Boolean networks. However, extending Boolean networks allows both creating more detailed models in terms of granularity and modeling a richer set of biological interactions thanks to the introduction of the target function. Using more than two levels is necessary if an experimental result cannot be formulated as a requirement over Boolean values. This is the case if, for example, the knockout of gene

The advantages of using Qualitative networks are not limited to the increased expressive power of the specifications. Boolean networks are not sufficient to model some commonly observed patterns. For example, the action of two transcription activators of a gene can be cumulative, whereas Boolean networks only allow modeling the case in which the presence of one transcription factor is sufficient for transcription to happen. Furthermore, while the influence of the I1-FFL motif on the response time can be modeled using Qualitative networks, this is not the case with Boolean networks. In the case of the studied example, the requirement that the presence of

These examples illustrate the need for extending Boolean networks to the qualitative domain. Depending on the structure of the studied network and the corresponding specifications, Boolean networks may however be sufficient in some cases. Since it is still possible to use the exact same analysis approach, Boolean values should be used in such a situation to reduce the size of the state space. Possible extensions of the Qualitative network framework include the addition of probabilities, resulting in Markov processes, for which formal verification methods exist. However, the necessary data for such models is often not available, and verification of probabilistic systems is computationally harder.

Separation between the model and the implementation

We specify the molecular interactions between components by defining a Qualitative network of the studied system. Compared to directly implementing an executable model, this approach clearly separates the specification from the implementation. This allows discussion at the level of the Qualitative network, and thus leads to an improvement process at the level of abstraction often observed in diagrammatic models, which are commonly used by biologists. Furthermore, the clarity and coherence of the model are increased by defining a small set of target functions that are consistently reused to model the same kind of interactions. The benefits of a clear separation between the model and its implementation extend beyond Qualitative networks. Using a modular approach based on simple building blocks allows faster development of coherent executable models. Each type of building block represents a specific kind of biological interaction between components. This approach separates the executable model into two distinct parts: the definition of the interactions between proteins, such as activation or inhibition, and the internal implementation of the building blocks. This modular design is particularly useful for quickly exploring multiple variations of the interaction model. Once the set of basic building blocks is implemented, interactions between biological elements of the model can be changed by replacing one building block by another, and new elements can be added as instances of the appropriate kind of building block, all with minimal changes to the implementation of the model. Changes to the implementation of building blocks only need to be made in one place and can then be reused by all variations of the model.
If we want to explore the behavior of the model under different ways of representing the interactions between biological elements, it is sufficient to replace one set of basic building blocks by another. The clear separation between the interaction model and the implementation is also helpful for assessing the validity of the Qualitative networks approach to modeling biological systems. Rather than having to understand the behavior of every single protein of every variation of the model, our approach allows us to first agree on the definition of specific target functions, and then assess the plausibility of a particular Qualitative network, whose representation is close to the common visualization of biochemical pathways. The clear distinction between what we model and how we do it therefore makes the whole approach easier to understand. The clear and concise definition of the behavior of a small set of basic building blocks makes it possible to evaluate (and criticize) the conceptual elements of our approach. Finally, this approach clearly separates methodological issues from issues that are only related to a particular Qualitative network.

Notions of concurrency

Qualitative networks, like Boolean networks, are synchronous. At each time step, the state of all variables is updated according to the previous state of the system. In this section, we discuss the applicability of other notions of concurrency in the context of Qualitative networks. The variables we use are qualitative representations of the level of protein activity or gene expression. Hence, these variables represent a large population of individual molecules. The interactions between individual molecules are highly stochastic. When using discrete levels of protein activity or gene expression, we consider that the biochemical reactions have a delay. The stochasticity of the individual reactions results in small variations of this delay. The level of all variables is updated according to the previous values. A change therefore takes one time step to propagate from one variable to the next. The time step can thus be considered as being the delay of the biochemical reactions. If there are major differences between the delays of the different reactions, it is possible to use target functions on earlier states of the system rather than on the current state. The impact of the small variations of the delay depends on the granularity of the model. The minimal difference between two possible values of a variable is inversely proportional to the granularity of the variable. In a model with a small granularity, variations of the delay would lead to the same value due to rounding. In this case, the synchronous execution is sufficient to reproduce the behavior of the model. As the granularity increases, rounding is no longer sufficient for handling the variability of the delays. In order to model this variability, it is necessary to introduce non-determinism by allowing asynchronous execution of the model. In a totally asynchronous system, a scheduler would choose any subset of the system to be updated, while the other variables remain constant.
This means that the delay of an interaction is completely independent of the delay of other interactions of the system as well as of the previous delays of the same interaction. This independence is thus an over-approximation of the expected behavior of the system, and is therefore likely to result in unrealistic executions. Therefore, while a synchronous execution is appropriate when the granularity is low, more precise models in terms of granularity call for a better handling of concurrency.
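The difference between the two execution modes can be sketched as follows. This is a minimal illustration, not the implementation used in this work: the function names and the two-variable network are hypothetical, and each update function is simply assumed to return the next value of its variable from the previous state.

```python
from itertools import combinations

def synchronous_successor(state, updates):
    """Update every variable at once from the previous state."""
    return tuple(update(state) for update in updates)

def asynchronous_successors(state, updates):
    """All successors when a scheduler may update any subset of variables."""
    full = synchronous_successor(state, updates)
    indices = range(len(state))
    successors = set()
    for size in range(len(state) + 1):
        for chosen in combinations(indices, size):
            # Chosen variables take their new value, the others stay constant.
            successors.add(
                tuple(full[i] if i in chosen else state[i] for i in indices)
            )
    return successors

# Hypothetical network: each of the two variables copies the other one.
updates = [lambda s: s[1], lambda s: s[0]]
state = (0, 1)

sync_next = synchronous_successor(state, updates)
async_next = asynchronous_successors(state, updates)
print(sync_next)           # (1, 0)
print(sorted(async_next))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

In this toy network the synchronous execution only alternates between (0, 1) and (1, 0), whereas the fully asynchronous scheduler can also reach (0, 0) and (1, 1), which illustrates how asynchrony over-approximates the set of executions.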

The state explosion problem

The exhaustive exploration of all possible executions of a system is exponential in the number of variables of the system. This issue is called the ^{k}. Enumerative methods are practically intractable even for small values of ^{144 }≈ 10^{86 }initial states. Our approach is therefore well suited for analyzing multicellular Qualitative networks. This kind of model is particularly useful for studying pathways that involve inter-cellular communication.

In addition to using the specificity of the model for tailoring the verification algorithm, methods used in software and hardware verification, such as assume-guarantee reasoning

Conclusion

In this work we propose an extension to Boolean networks by allowing variables that represent the level of protein activity or gene expression to range over larger domains, combined with more flexible update options for these values. We show that a Qualitative network model of a frequent network motif offers a valid qualitative approximation of the experimentally observed behavior of this motif. This framework can be used to model recurrent networks. In addition, we use formal verification methods to analyze the steady state behavior of Qualitative networks. Reasoning about sets of states rather than individual states, and using a partition reduction, allows us to scale up to very large models. In particular, Qualitative networks can be used to analyze multicellular models, and thus to study pathways that involve cell-cell communication. Similar to Boolean networks, this approach could be useful for the analysis of biological pathways even where quantitative data is missing. We believe that the ability to formally verify a biological model against laboratory observations offers great advantages in improving working hypotheses and suggesting new experimental directions that are likely to yield new findings.

Methods

We use the

Verification of reactive systems

Reactive modules

Reactive modules is a modeling language for reactive systems

Mocha

Mocha

Modular implementation of Qualitative networks

The implementation of Qualitative networks extends and adapts the modularity of reactive modules to biological models. The list of target functions defines the relation between components by specifying one target function per variable. We separate this model into _{ij }used in the model. Instances of the basic building blocks are combined together according to the list of target functions by connecting the output variables of one building block to the input variables of one or several other building blocks. A cell is represented by the parallel composition of several building block instances that hides internal components (makes them private variables) and represents interactions between the cell and the environment as interface and external variables, thus allowing for connection with other cells. This architecture is a consequence of the modular approach of RM, which uses a concept of connection similar to the connections between components in an electrical circuit.

The implementation of each basic building block constitutes one Module, with one or more external variables (corresponding to the input of the target function) and one interface variable (the value of the component controlled by the building block). An additional variable is used to control the behavior of the building block. This variable is only awaited and therefore does not contribute to the size of the state space. The building block is composed of two atoms. The first atom computes the value of the target function of the building block based on the values of the input variables in the current state. This atom is specific to every building block. It is obtained directly from the definition of the corresponding target function, by mapping all possible valuations of the input to the corresponding target value. The second atom implements Equation 2 to compute the value of the output variable in the next state of the system based on the target, the current value of the output variable and the control value. This atom is the same for all building blocks. It makes sure that the output variable increases or decreases by at most one level per step rather than directly jumping from its current value to the target. The control variable allows specifying the initial value or choosing to start non-deterministically from every possible value.
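The two-atom structure described above can be sketched in Python rather than in reactive modules. This is only an illustration under stated assumptions: the names `make_target_atom` and `controller_atom`, the example target function and the four levels are hypothetical, the control variable is omitted, and the controller assumes that Equation 2 amounts to moving at most one level toward the target per step, as the text describes.

```python
from itertools import product

def make_target_atom(target_function, input_domains):
    """First atom: tabulate the target for every valuation of the inputs."""
    table = {
        valuation: target_function(*valuation)
        for valuation in product(*input_domains)
    }
    return lambda *inputs: table[inputs]

def controller_atom(current, target):
    """Second atom: move at most one level toward the target per step."""
    if target > current:
        return current + 1
    if target < current:
        return current - 1
    return current

# Hypothetical building block: one activator and one inhibitor, levels 0..3.
levels = range(4)
target_atom = make_target_atom(
    lambda activator, inhibitor: max(activator - inhibitor, 0),
    [levels, levels],
)

value = 0
trace = [value]
for _ in range(4):
    # Inputs are held constant at activator = 3 and inhibitor = 1.
    value = controller_atom(value, target_atom(3, 1))
    trace.append(value)
print(trace)  # [0, 1, 2, 2, 2]
```

Only the first atom changes from one kind of building block to another; the controller is shared, which mirrors the separation described in the text.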

We also introduce the possibility for a building block to determine its initial output value based on the initial values of its inputs rather than having to set it arbitrarily or non-deterministically. This yields an initial state of the system in which only a few variables are set and the others are logically derived from them. Furthermore, choosing non-deterministic initial values for a few variables and then deriving the other initial values leads to significantly fewer possible initial states than choosing the initial value of every variable non-deterministically, while still exploring the most plausible initial states. If there are mutual dependencies between variables (i.e., a component ultimately affects itself in a loop), this approach can be applied only to some of the substances. In such a case, trying to derive the initial state of all substances would create a loop of dependencies that cannot be solved. It is therefore necessary to choose at least one variable in every loop of the model for which the initial value is set or non-deterministic.

Formulating specifications

Specifications play a key role in our approach: they are the link between the experimental results and the computational model. A putative model needs to adhere to the specifications in order to be considered as a potentially valid representation of the biological system. If the model does not adhere to a requirement of the specification, then we already know that the model is not able to explain all the existing experimental results, and therefore needs to be improved.

When defining specifications, we start from the description of the experiments and derive safety requirements. The observed result can then be formulated as a predicate over the values of the different components of the system. In the simplest case, the specification can be expressed as a predicate over the current state of the system. We can therefore formulate an invariant which must hold on all states of the system. It is however often necessary to consider that the model starts from an arbitrary initial state and needs some iterations to stabilize. We therefore choose an arbitrary number of steps during which the invariant is not checked. This setting allows us to verify whether the wild type (normal) model satisfies certain specifications.
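Checking an invariant only after a stabilization delay can be sketched as follows. This is an enumerative illustration in Python, not the Mocha-based check used in this work: the function name, the bounded `horizon`, and the one-variable decay model are assumptions made for the example, and the model is taken to be deterministic.

```python
def check_invariant_after_delay(initial_states, successor, invariant, delay, horizon):
    """Return a counterexample trace, or None if the invariant holds.

    The invariant is ignored during the first `delay` steps and checked on
    every later state up to step `horizon`.
    """
    for start in initial_states:
        trace = [start]
        state = start
        for step in range(horizon + 1):
            if step >= delay and not invariant(state):
                return trace  # the last state of the trace violates the invariant
            state = successor(state)
            trace.append(state)
    return None

# Hypothetical one-variable model whose level decays to 0, one level per step.
def decay(level):
    return max(level - 1, 0)

def at_rest(level):
    return level == 0

holds = check_invariant_after_delay(range(4), decay, at_rest, delay=3, horizon=6)
counterexample = check_invariant_after_delay(range(4), decay, at_rest, delay=2, horizon=6)
print(holds)           # None
print(counterexample)  # [3, 2, 1]
```

With a delay of three steps every execution has stabilized before the invariant is checked; with a delay of two, the execution starting at level 3 is reported as a counterexample, which shows how the result depends on the chosen delay.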

We are also interested in verifying the outcome of experiments that include modifications of the biological model. We first consider under which conditions the experiment has been performed. We consider experimental conditions involving two kinds of manipulations: the over-expression of a gene, in which the level of this gene remains at a

Using monitors, we can therefore verify specifications that are richer than invariants but remain a subset of general safety properties. The monitor described above allows us to verify whether all executions of the model satisfy a certain specification. When this is not the case, we are interested in knowing whether some execution satisfies the specification. We can do this with monitors, by checking whether there is a trace on which the dual of the specification is violated on every state on which the specification is checked.

We are also interested in extending the notion of delays for stabilization of the model, by only verifying the specification on the infinitely visited states. We consider an execution of the model as a prefix followed by a loop in which the model stays indefinitely unless there is some change in the model. The change triggers a similar behavior, with a second prefix and a second loop. Since Mocha does not provide a direct way of detecting loops, we use an over-approximation of the length of the prefix and the loop as the delay for the stabilization. When a contradiction is found, we verify that the trace contains a loop, which means that the contradiction occurred in the loop and is thus valid; otherwise we know that we need to increase our approximation. This solution is not usable when the number of different executions is large. In the section below, we propose a symbolic algorithm which finds all infinitely visited states of the model, and verifies a property only on these states. This method does not require the use of monitors.
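The view of an execution as a prefix followed by a loop can be illustrated with a small enumerative sketch. It is written in Python outside of Mocha, assumes a deterministic model over a finite state space, and uses a hypothetical four-state transition table; it is not the over-approximation procedure described above.

```python
def split_prefix_and_loop(start, successor):
    """Follow a deterministic execution until a state repeats."""
    first_seen = {}
    trace = []
    state = start
    while state not in first_seen:
        first_seen[state] = len(trace)
        trace.append(state)
        state = successor(state)
    loop_start = first_seen[state]
    return trace[:loop_start], trace[loop_start:]

# Hypothetical transition table: 0 -> 1 -> 2 -> 3 -> 2 -> ...
transitions = {0: 1, 1: 2, 2: 3, 3: 2}

prefix, loop = split_prefix_and_loop(0, transitions.__getitem__)
print(prefix)  # [0, 1]
print(loop)    # [2, 3]

# A specification restricted to infinitely visited states is checked on the loop only.
spec_holds = all(state >= 2 for state in loop)
```

Any delay of at least `len(prefix)` steps would skip the transient part of this execution, and a contradiction found on a state of `loop` is a genuine one.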

Symbolic computation of all infinitely visited states

We propose a symbolic algorithm for computing all infinitely visited states of a Qualitative network. Infinitely visited states of the model represent the stable states of the biological system, and correspond to the notion of attractors in Boolean networks. Our algorithm builds on the modularity of Qualitative networks. It interleaves the computation of infinitely visited states of parts of the model with the composition of these parts. This process ultimately results in a complete model and the set of infinitely visited states of the model. In this section, we first introduce symbolic methods and the tool used for the implementation of the algorithm, and then describe the algorithm.

Symbolic methods

Symbolic methods are used to represent sets of states of a system in a compact way. Rather than enumerating all the states of a set, a symbolic representation uses constraints identifying all states in the set. A symbolic representation may therefore be exponentially more compact than the enumerative representation of the same set. Hence, symbolic algorithms operate on representations of sets rather than considering individual states. Symbolic algorithms are successfully used for model checking systems that have very large state spaces
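As a small illustration of why constraints can be exponentially more compact than enumeration, the sketch below represents a region by one set of allowed levels per variable. The class `BoxRegion`, the 100 variables and the three levels are hypothetical choices for the example; this simple representation is far less general than the relational encoding used by actual symbolic model checkers.

```python
from math import prod

class BoxRegion:
    """A set of states given by one constraint (allowed levels) per variable."""

    def __init__(self, allowed):
        self.allowed = [frozenset(levels) for levels in allowed]

    def __contains__(self, state):
        return all(value in levels for value, levels in zip(state, self.allowed))

    def size(self):
        # Number of states in the region, computed without enumerating them.
        return prod(len(levels) for levels in self.allowed)

    def intersect(self, other):
        return BoxRegion(a & b for a, b in zip(self.allowed, other.allowed))

# Hypothetical system: 100 variables, each with levels 0, 1 and 2.
everything = BoxRegion([range(3)] * 100)
first_is_high = BoxRegion([{2}] + [range(3)] * 99)

region = everything.intersect(first_is_high)
print(region.size() == 3 ** 99)       # True
print((2,) + (0,) * 99 in region)     # True
print((0,) * 100 in region)           # False
```

The region contains 3^99 states, yet it is stored as 100 small sets, and intersection and membership never touch individual states.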

Mocha offers both enumerative and symbolic methods for verifying invariants. However, in order to find the infinitely visited states of large models, we need an algorithm which is specifically tailored for Qualitative networks. We implement this algorithm using the Relational Manipulation Language. We also need to translate the RM implementation of the model into this language, a process that we do not describe here. We use CrocoPat

Algorithm

We want to find the set of all infinitely visited states of a module _{M }is a set of states of _{M }and the set of all variables of the module by _{M}._{
}For a given state _{M}, we define the projection _{M}) maps the region _{M }to the union of the successors of all states in _{M}. This function is constructed according to the definition of the module

In order to obtain the set of infinitely visited states of

**repeat**

**until **

The region ^{i }contains all states that can be visited at time step ^{in f }contains all states that are infinitely visited. In order to show that this algorithm terminates, it is sufficient to show that
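The fixpoint iteration can be sketched with explicit sets standing in for symbolic regions. Since the body of the loop is not reproduced above, the sketch assumes that the iteration starts from the set of all states and repeatedly applies the successor function until the region no longer changes; the function names and the four-state transition table are hypothetical, and the code is not the Relational Manipulation Language implementation used in this work.

```python
def post(region, successor):
    """Union of the successors of all states in the region."""
    return frozenset(successor(state) for state in region)

def infinitely_visited(all_states, successor):
    """Iterate the successor function from all states until a fixpoint is reached."""
    region = frozenset(all_states)
    while True:
        next_region = post(region, successor)
        if next_region == region:
            return region
        region = next_region

# Hypothetical deterministic model: 0 -> 1 -> 2 -> 3 -> 2 -> ...
transitions = {0: 1, 1: 2, 2: 3, 3: 2}

stable = infinitely_visited(transitions.keys(), transitions.__getitem__)
print(sorted(stable))  # [2, 3]

# A predicate holds on all infinitely visited states iff they lie in its region.
predicate_region = frozenset(s for s in transitions if s >= 2)
print(stable <= predicate_region)  # True
```

Because the first region contains every state, each successive region is a subset of the previous one, so on a finite state space the iteration must stop.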

We are interested in knowing if a predicate _{M }as the set of all states for which the predicate is true. Therefore we can say that the predicate holds on all infinitely visited sets iff

**repeat**

**until **

If

The general algorithm described above can be applied to every kind of module. We use a more specific algorithm tailored for Qualitative networks. Each module consists of an atom computing the target function and a controller atom which enforces the transition rules specified in Equation (2). Rather than directly computing the infinitely visited region of a composed module _{1}||_{2}, we first compute the infinitely visited region of each module separately, and then compose them in parallel. The separate modules might have external variables, whose behavior is not specified. Rather than allowing them to behave arbitrarily, we enforce the transition rules of Equation (2). The set of possible behaviors of an external variable is therefore a superset of the behavior of any building block. Using the algorithm above, we compute the infinitely visited region _{1}. We consider the projection _{1}. Since the set of behaviors of the non-deterministic building block is a superset of the behavior of any other building block we have that _{2},

When applying this method to a multicellular model, we use

Setting of the performance analysis

The performance tests are performed on a 3 GHz Intel Xeon computer running Linux Fedora Core 3 (Kernel version 2.6.11). The CrocoPat tool can use up to 3 GB of memory.

Authors' contributions

MS carried out the modeling work, conceived the idea of the Qualitative network framework, and performed the analysis of the model. JF conceived the study and participated in its design. TAH participated in the design of this study. All authors collaborated for the elaboration of the manuscript, read, and approved the final manuscript.

Acknowledgements

We thank Nir Piterman, Freddy Radtke and Grégory Théoduloz for fruitful discussions and Dirk Beyer for assistance with the implementation of the symbolic algorithms. This work was supported in part by SNSF grant 205301-111840.