Center for Cell Analysis and Modeling, University of Connecticut Health Center, Cell and Genome Sciences Building, 400 Farmington Ave, Farmington, CT 06030-6406, USA

Abstract

Logic-derived modeling has been used to map biological networks and to study arbitrary functional interactions, and fine-grained kinetic modeling can accurately predict the detailed behavior of well-characterized molecular systems; at present, however, neither approach comes close to unraveling the full complexity of a cell. The current data revolution offers significant promises and challenges to both approaches - and could bring them together as it has spurred the development of new methods and tools that may help to bridge the many gaps between data, models, and mechanistic understanding.

Have you used logic modeling in your research? It would not be surprising if many biologists answered no to this hypothetical question. And it would not be true. In high school biology we already became familiar with cartoon diagrams that illustrate basic mechanisms of the molecular machinery operating inside cells. These are nothing but simple logic models. If receptor and ligand are present, then receptor-ligand complexes form; if a receptor-ligand complex exists, then an enzyme gets activated; if the enzyme is active, then a second messenger is produced; and so on. Such chains of causality are the essence of logic models (Figure 1a). Arbitrary events and mechanisms are abstracted; relationships are simplified and usually involve just two possible conditions and three possible consequences. The presence or absence of one or more molecules, activities, or functions [some icons in the cartoon] will determine whether another one of them will be produced (created, up-regulated, stimulated) [a 'positive' link], destroyed (degraded, down-regulated, inhibited) [a 'negative' link], or left unaffected [there is no link]. The icons and links often do not follow a standardized format, but when we look at such a cartoon diagram, we believe that we 'understand' how the system works. Because our brain is easily able to process these relationships, these diagrams allow us to answer two fundamental types of questions related to the system: why (are certain things happening)? What if (we make some changes)?
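The if-then chain described above can be written down almost word for word as Boolean rules. The following is a minimal sketch, not taken from any published model; the function name `simulate_chain` and the node names are chosen here purely for illustration.

```python
def simulate_chain(receptor: bool, ligand: bool) -> dict:
    """Evaluate the receptor -> enzyme -> second messenger chain link by link.

    Hypothetical illustration: each assignment is one 'positive' link
    from the cartoon diagram.
    """
    # If receptor and ligand are present, receptor-ligand complexes form.
    complex_formed = receptor and ligand
    # If a receptor-ligand complex exists, the enzyme gets activated.
    enzyme_active = complex_formed
    # If the enzyme is active, a second messenger is produced.
    second_messenger = enzyme_active
    return {
        "complex": complex_formed,
        "enzyme": enzyme_active,
        "second_messenger": second_messenger,
    }


# 'Why' question: trace which upstream nodes are on when the output is on.
baseline = simulate_chain(receptor=True, ligand=True)
# 'What if' question: remove the ligand and re-evaluate.
no_ligand = simulate_chain(receptor=True, ligand=False)

print(baseline["second_messenger"], no_ligand["second_messenger"])
```

Answering 'what if' amounts to changing an input and re-evaluating the rules, which is what we do mentally when reading the cartoon.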

Untangling the ridiculome

But how about looking at a similar diagram that contains thousands of components, interconnected near and far? We may be able to infer the properties of certain subsystems, but we would not intuitively be able to predict overall behavior; to understand it as a whole. This is exactly what led to the development of formal logic-based modeling applications in biology. Even somebody with little mathematics training can recognize that the causal relationships represented in Figure


**Model representations**. **(a)** Typical cartoon diagrams and schematic interactions (adapted from **(b)** The zoomed-in part of the cartoon diagram in (a) translated into a logic model. Shapes and arrow styles are represented in the Systems Biology Graphical Notation (SBGN) standard (Entity Relationship (ER) format), which provides a defined one-to-one correspondence with a logic formalism. Arrows correspond to activation reactions. **(c)** A truth table corresponding to the logic model in (b) that shows how presence (1) or absence (0) of molecules of the input nodes leads to presence or absence of activity in the selected molecules of interest (output nodes). **(d)** Dynamic representation of the same model in the SBGN standard (Process Diagram (PD) format). It can be considered either as a logic model or as a reaction diagram for a kinetic time-course simulation, where every node represents the concentration of a chemical species.
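A truth table such as the one in panel (c) can be generated mechanically by evaluating the logic rule for every combination of input states. The sketch below uses a hypothetical three-input gate, `(A and B) or C`, because the actual rule in the figure is not reproduced in the text; the node names and the helper `truth_table` are assumptions for illustration only.

```python
from itertools import product


def truth_table(inputs, rule):
    """Return one row per input combination: (input values..., output)."""
    rows = []
    for values in product([0, 1], repeat=len(inputs)):
        state = dict(zip(inputs, values))
        rows.append((*values, int(bool(rule(state)))))
    return rows


def output_rule(state):
    # Hypothetical gate: the output is active if A and B are both present,
    # or if C is present on its own.
    return (state["A"] and state["B"]) or state["C"]


rows = truth_table(["A", "B", "C"], output_rule)
for row in rows:
    print(row)
```

With n input nodes the table has 2^n rows, which is one reason why exhaustive tables are only practical for small subnetworks.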

Logic models offer a conceptually simple representation of biology that is easy to simulate. They are naturally suited to exploring large-scale biological networks where causality links are being hypothesized, or sought: genome, transcriptome, proteome, metabolome, interactome, microbiome - the list goes on. We are witnessing an unprecedented increase in the amount and quality of data available for describing and modeling biology at the cellular level. Graphical representation of these data as a network of (putative) relationships with nodes and edges (Figure

Peeking under the rug

The fact that logic models are easy to compute makes them useful for random searches and screening (for example, analyzing perturbations at multiple elements of the network), and for processing large amounts of individualized data (for example, comparing proteomics data from tumor cells or mutants with data from their normal counterpart). They therefore generated a lot of excitement because they appear very attractive for fields such as drug discovery
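To illustrate why cheap evaluation matters for screening, the sketch below runs a single-node knockout screen on a small hypothetical network in which two enzymes redundantly drive a response. The network, the node names, and the helpers `evaluate` and `knockout_screen` are illustrative assumptions, not part of any tool mentioned in the text.

```python
# Hypothetical feed-forward network, listed in evaluation order.
RULES = [
    ("complex", lambda s: s["receptor"] and s["ligand"]),
    ("enzyme_A", lambda s: s["complex"]),
    ("enzyme_B", lambda s: s["complex"]),
    ("response", lambda s: s["enzyme_A"] or s["enzyme_B"]),
]


def evaluate(inputs, knocked_out=()):
    """Evaluate all rules once; knocked-out nodes are forced to False."""
    state = {name: bool(value) and name not in knocked_out
             for name, value in inputs.items()}
    for node, rule in RULES:
        state[node] = bool(rule(state)) and node not in knocked_out
    return state


def knockout_screen(inputs, readout):
    """Knock out every node except the readout, one at a time."""
    nodes = list(inputs) + [node for node, _ in RULES if node != readout]
    return {node: evaluate(inputs, knocked_out=(node,))[readout]
            for node in nodes}


screen = knockout_screen({"receptor": True, "ligand": True}, "response")
for node, response in screen.items():
    print(f"knockout {node}: response = {response}")
```

In this toy case the screen shows that removing either enzyme alone leaves the response intact, whereas removing anything upstream abolishes it. Each evaluation is a handful of Boolean operations, so the same loop scales to many perturbations and many networks.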


**Logic models**. Models can be encoded using simple Boolean logic (nodes may accept only true or false value), Bayesian probability (node values represent likelihood of events), or fuzzy logic (nodes have 'variable degrees of truth'). Depending on the availability of time-resolved data, these can be simulated to describe the system at steady state (one or a few selected time points), or dynamically (time course as a discrete sequence or as a continuous function). Solid red lines show methods implemented by the CellNOptR toolkit. ODE, ordinary differential equation.
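The difference between a steady-state readout and a discrete time course can be sketched with a synchronous Boolean update, in which all nodes are updated at once from the previous state. The cascade below (stimulus, A, B, C) is a hypothetical example, and synchronous updating is only one of several possible schemes; it is not meant to reproduce the CellNOptR implementation.

```python
def step(state):
    """One synchronous update: every node reads the previous state."""
    return {
        "stimulus": state["stimulus"],  # inputs are held constant
        "A": state["stimulus"],         # stimulus activates A
        "B": state["A"],                # A activates B
        "C": state["B"],                # B activates C
    }


def simulate(initial_state, max_steps=20):
    """Iterate until the state stops changing (a fixed point) or max_steps."""
    trajectory = [initial_state]
    for _ in range(max_steps):
        next_state = step(trajectory[-1])
        if next_state == trajectory[-1]:
            break
        trajectory.append(next_state)
    return trajectory


trajectory = simulate({"stimulus": True, "A": False, "B": False, "C": False})
for t, state in enumerate(trajectory):
    print(t, state)

steady_state = trajectory[-1]
```

The last entry of the trajectory is the steady state that a simple input-output Boolean model would report; the earlier entries are the discrete time course leading to it.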

Logic-derived models differ not only in the level of fine-graining of the functional relationships, but also in their ability to handle time - the dynamics of the systems. Boolean networks were originally designed to provide simple input-output relationships - that is, the steady-state achieved under varying conditions. This is appropriate, for example, for analyzing traditional transcriptomics or proteomics experiments. Whether we measure expression levels before or after some external perturbation (for example, applying a stimulus or drug), or compare different cell populations, it is still just a collection of different steady-states. True time-course data were typically limited to small scale experiments, but are now becoming available also in high-throughput technologies. Algorithms to allow logic-derived models to simulate dynamic systems aim to retain the simplicity of Boolean networks but with a fine-grained representation of time (Figure

More detail comes with the burden of increasing computational complexity and the risk of over-parameterizing: the extensions to logic models described above require both choosing a functional form and including additional parameters such as coefficients and thresholds, all of which are often arbitrary or at best phenomenological. Von Neumann once famously quipped that 'With four parameters I can fit an elephant, and with five I can make him wiggle his trunk' (in fact, this was recently rigorously proven to be true

Is modeling software only for the initiated?

The right choice of mathematical formalism thus depends on both the purpose of the model and the type, quantity, and quality of data at hand - and for systems of any complexity will most likely be a combination of multiple methods. This was (perhaps painfully) reinforced recently by the report by Karr

The gaps and uncertainties in the knowledge of networks are still prevalent in most cases, but custom -omics data are now much easier to obtain. So perhaps advances in logic-based modeling could help. A recent paper in

A collection of logic-derived model simulators implementing several algorithms in a single free open-source package would normally be regarded as an incremental advance. But here the whole is much greater than the sum of its parts. As mentioned before, the ease of simulating logic models makes them well suited to the difficult task of reverse engineering. Indeed, CellNOptR was designed primarily to be used as a network inference tool, but using a novel approach (Figure


**Data to model pipeline**. The data world (public and private pathway databases/interactomes and -omics data sets supplementing existing models and kinetics data) is used to inform the model building process via several processes: building of a prior knowledge network, of a data-derived (inferred) network, and their refinement and training to data (color coding and terminology are described in

The holy grail of cellular models

Why is this a powerful approach? Because it can greatly help to understand the system being modeled. We do not wish to engage here in discussing the meaning of 'understanding' and of the usefulness of models; these have been frequent topics in biological discourse in recent years. We will rather illustrate by a hypothetical example, in very broad and practical terms (the interested reader is referred to

How does this relate to large multi-scale models? Covert and colleagues

In fact, much more is swept under the rug than we have alluded to so far. Even the simple cartoon diagram shown in Figure

Does this detract from our praise of the advances in logic-derived models discussed above? No. To the contrary, this is why we are really excited. Let us return to those logic-derived ordinary differential equations (ODEs) and how they (can) relate to the other side of the field. Of course, they are phenomenological constructs of whatever arbitrary mathematical form is being provided (in this case Hill-type equations, which can capture a variety of common non-linear relationships with only two parameters). But such mathematical approximations are sometimes the starting point for discovering the underlying mechanism. In what is arguably one of the most influential modeling works related to biology, almost exactly 100 years ago Leonor Michaelis and Maud Menten used a phenomenological equation to fit the experimental measurements of the initial velocity of the invertase-catalyzed reaction (at time zero, when no product has formed yet, the reaction can be simplified and modeled as being irreversible). Based on that approximation, they posited that the enzyme activity could be explained by mass-action kinetics involving an intermediary reaction complex - the fundamental mechanism of enzymatic catalysis that was confirmed three decades later
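As a rough illustration of what a Hill-type, logic-derived ODE might look like, the sketch below uses the standard Hill function with its two parameters (a half-maximal constant `k` and a Hill coefficient `n`) to drive a single node. The relaxation form dy/dt = (hill(x) - y) / tau, the time constant `tau`, and the forward-Euler integration are assumptions made here for simplicity; this is not the formulation used by any particular package.

```python
def hill(x, k, n):
    """Hill-type activation: 0 at x = 0, 0.5 at x = k, approaching 1 for x >> k."""
    return x**n / (k**n + x**n)


def michaelis_menten(s, vmax, km):
    """Michaelis-Menten initial velocity: v = vmax * s / (km + s)."""
    return vmax * s / (km + s)


def simulate_node(input_level, k, n, tau=1.0, dt=0.01, t_end=10.0):
    """Integrate dy/dt = (hill(input) - y) / tau with forward Euler from y = 0.

    tau, dt, and t_end are illustrative choices, not fitted values.
    """
    y = 0.0
    target = hill(input_level, k, n)
    for _ in range(round(t_end / dt)):
        y += dt * (target - y) / tau
    return y


# With n = 1 the Hill function has the same shape as the Michaelis-Menten law.
same_shape = abs(2.0 * hill(0.5, 1.5, 1) - michaelis_menten(0.5, 2.0, 1.5)) < 1e-12

# A steep Hill coefficient makes the node behave almost like a Boolean switch.
low = simulate_node(input_level=0.2, k=0.5, n=8)
high = simulate_node(input_level=0.8, k=0.5, n=8)
print(same_shape, round(low, 3), round(high, 3))
```

The same functional form therefore spans the range from a nearly Boolean switch (large `n`) to the hyperbolic saturation familiar from enzyme kinetics (`n` = 1), which is part of why it is a convenient bridge between the two modeling styles.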

Moreover, if we can modify a logic-derived model and end up with a differential equations-based model, why not jump over the fence and use what is available in the world of kinetic models? For starters, much more powerful optimization algorithms and tools have been developed in that domain

It sounds trite to say that we need to use multiple approaches and tools in order to build truly complete and accurate cellular models. We are getting closer not only to integrating multiple logic-based formalisms easily, but also to crossing over into kinetic, spatial, rule-based models, and more. And the experimental data required for building all these different types of computational models at different scales and levels of detail will have to come from both 'small science' and 'big science'

Acknowledgements

The authors wish to acknowledge NIH grants RR013186 (IIM) and GM095485 (MLB, IIM) as well as many stimulating discussions with colleagues at the RD Berlin Center for Cell Analysis and Modeling.