With recent improvements of protocols for the assembly of transcriptional parts, synthetic biological devices can now more reliably be assembled according to a given design. The standardization of parts open up the way for in silico design tools that improve the construct and optimize devices with respect to given formal design specifications. The simplest such optimization is the selection of kinetic parameters and protein abundances such that the specified design constraints are robustly satisfied. In this work we address the problem of determining parameter values that fulfill specifications expressed in terms of a functional on the trajectories of a dynamical model. We solve this inverse problem by linearizing the forward operator that maps parameter sets to specifications, and then inverting it locally. This approach has two advantages over brute-force random sampling. First, the linearization approach allows us to map back intervals instead of points and second, every obtained value in the parameter region is satisfying the specifications by construction. The method is general and can hence be incorporated in a pipeline for the rational forward design of arbitrary devices in synthetic biology.
Synthetic biology places emphasis on small, standardized molecular parts and devices, mostly operating at the transcriptional level [1,2]. With standardization comes the need for rigorous quantitative characterization of such devices and for a compositional theory to reliably build larger systems from small canonical circuits. For now most synthetic circuits implemented in vivo were constructed from a small number of components with topology and parameter values found by trial-and-error. The development of larger synthetic systems necessitates the use of appropriate design methodologies. In silico analyses can provide significant insights into the construction of complex synthetic systems, but due to the poor quantification of experimental and micro-environmental conditions, the predictive capability of in silico models for in vivo implementations remains limited. Apart from experimental limitations, modeling attempts to date most often make simplifying assumptions about all the perturbations that a synthetic construct is facing in vivo. For instance, only a few studies account for the large extrinsic noise [3-5] and in particular the one introduced by variations of plasmid copy number . Incorporating those realistic in vivo constraints will make computational models more predictive, eventually enabling the upfront in silico optimization of transcriptional circuits. A first step toward this goal is to investigate the parameter dependency of certain behaviorial properties of a circuits. In systems biology attempts have already been made to address this problem, however, they either rely on purely local measures [7,8] such as considered in classical sensitivity analysis [9,10], or perform random parameter sampling  to determined parameter dependencies.
For a given circuit topology, kinetic parameters and other parameters that are involved in controlling the expression level of molecular species (e.g. promoter activity or number of ribosome binding sites) are important design parameters in synthetic biology. A major challenge is to find a set of parameters that satisfies the behavioral specification of a device . Computer science offers various languages to formally define the proper functioning of a piece of code or hardware. Such specification languages of formal verification are used to check important behavioral properties, such as liveness, safety or fairness . One convenient way to specify such properties is to use temporal logic, which is considered an extension of classical propositional reasoning, where propositional variables may change their truth values over time. A prominent such logic is the linear temporal logic (LTL), where the truth value of the propositions is interpreted over a linear timeline . Such techniques were already applied to investigate robustness of computational models in system biology .
Mathematically, the design problem is an inverse problem and hence inherits the general feature of such problems, namely ill-posedness [15,16]. More specifically, for a certain behavioral specification one aims to find the corresponding parameter set that gives rise to such behavior. An simple example for a quantity in feature space could be the concentration of a molecular species at particular time-points. The problem is closely related to parameter optimization and even more so to robust optimization, where an objective function - generally encoding some behavioral constraint (e.g. making model trajectories close to the measurements) - is optimized to yield the optimal parameter set. Ill-posedness refers to the observation that two close-by points in specification or behavioral feature space may map to very distant points in the parameter space, indicating that this mapping is generally not contractive but rather expansive. The inverse and corresponding forward problem is illustated in Figure 1.
Figure 1. (A) The forward problem of defining a parameter set from which trajectories and their behavioral features are computed. (B) The inverse problem of finding a parameter regions for a predetermined behavioral specification region S. Columns from left to right correspond to parameter space, trajectory space and behavioral feature space, respectively. Connected convex sets can map to nonconvex non-connected regions.
In the current analysis we restrict ourselves to models obeying the reaction rate equation and hence constitute a set of nonlinear ordinary differential equations. In general, connected domains may map to disconnected domains, for instance if the dynamical system contains bifurcation points (e.g. see Figure 1). For the proposed linearization approach we will further restrict ourselves to connected domains in the respective image space. Moreover, we will not resort to specifying behavior through temporal logics but will define general specification functionals. These are mappings ψ from an appropriate function space χ of n-dimensional trajectories (e.g. )) to the m-dimensional reals and we choose the form
with and the feature kernel , where . A special and more tractable version of the kernel is the convolution, i.e. g(t, x(t)) = h(T − t)x(t). In the following we will only require the map x → g(·, x) to be once-differentiable. With this, we can define the forward map from a p-dimensional parameter space to the feature space as the composition F ≡ ψ ο φ, with . The trajectories are generated by the reaction rate equation
The brute-force method of determining the parameter region that satisfies a certain behavioral specification usually proceeds by Monte Carlo sampling of parameter sets, generating corresponding trajectories according to (1), checking whether those satisfy S and finally retaining only those parameter sets that led to satisfied specification S. There are two immediate downsides of this approach. First, most draws will be unsuccessful for high dimensional parameter spaces, for tight specifications, or for both. Different approaches using an optimized sampling [11,17] have been developed to mitigate this problem, but are not solving it as they require convergence of the sampling. Second, drawing parameter points in does not provide guarantees that those points belong to a connected domain of consistent parameter sets. Here we provide first attempts to tackle both problems.
The main idea is to locally linearize the forward map F around some point and then locally invert it. Hence, a small enough local patch in feature space can be mapped backward to a small patch in parameter space. By successively sampling expansion points in their neighborhoods (e.g. by the ball-walk algorithm ) we can systematically cover the entire specification S and obtain the corresponding parameter region. A series expansion of F around some initial parameter set k0 reads
Defining df ≡ F (k0 + dk) − F (k0) we see that a neighborhood df in feature space to first order can be mapped backward using the Moore-Penrose pseudo-inverse
that we define with care as
where L denotes the linearized forward map and hence is just the m × p matrix
Note, that the limit in (2) exists even if the inverse of LT L and LLT do not exist. Such situations are encountered as soon as the number of specification features m are less than the number of parameters, i.e. the dimension p of the parameter space. Importantly, we can compute (3) efficiently using the variational equation for the system (1). Observe that
where the last terms in the integral is just the sensitivity of the solution of (1) to perturbations in k around k0. According to the variational equation the sensitivity obeys the following ordinary n × p matrix differential equation
where we skipped the explicit dependency on k0 for brevity. Note, that (4) is equivalent to the transient sensitivity analysis of metabolic networks [9,10], proposed as an extension of classical metabolic control analysis that only deals with steady state sensitivities. For a certain k0 the sensitivity of the kernel g is a constant m × n matrix that can be computed explicitly. Thus, by jointly solving (1) and (4) for some k0 together with
up to time T we obtain the linearized map L = L(T). Hence, for every sampled k0 and associated feature point f 0 we propose to design a feature ball
and map it backward using L†. According to the singular value decomposition L† = UΣV with Σ a diagonal matrix with non-negative entries , the backward transformation needs to be a sequence of a rotation, a scaling and another rotation and hence the image of under L† can only be a ellipsoid in the parameter space
Clearly, sampling a multivariate region with balls of same dimension allow for a complete coverage of the region - something that can only be extrapolated when using pointwise sampling . The question to efficiently sample a region with balls has been addressed in computational geometry and efficient randomized algorithms are available .
We remark that the map L is not the best local approximation to F(k) in some norm sense. More specifically we can improve on L if we are giving additional samples of the neighborhood . Consider we draw another , then we can construct a rank-one update to L
Thus, knowing how to construct rank-one updates over the domain of interest is equivalent to knowing F(k) locally. In fact, is the matrix closest to L, with respect to the Frobenius norm, that satisfies (6). Subsequently we will use this improved linear approximation to F to bound the error that one can incurrs if one uses the pseudoinverse L† for the backward map. This will also provide means to determine the maximal ball size δ to stay below a certain error bound. We quantify the error in the feature space by the backward map followed by a forward map. That is, we want to find a δ such that
Now suppose we know a bound ρ(δ) for the Frobenius norm of the rank-one perturbation, i.e. in the local domain of interest. Note, that ρ(δ) could and need to be estimated by sampling. Given a the maximal error of the inverse-forward map is
which is known from robust linear squares  to be equivalent to the error
Assuming that L has linearly independent rows, LL† is the identity matrix and thereby the error simplifies to
This result provides one way to determine the radius of the feature ball δ when relying on the pseudo-inverse
As a proof of concept of our method, we applied it to a simple synthetic sensor construct . The system is made of several gene copies (e.g. with plasmid transfection), expressing a protein that dimerizes and activates the gene by binding to the promoter. In presence of the inhibitor (input of the system), the dimer is trapped and cannot bind to the promoter. A schematic of the involved reactions is depicted in Figure 2.
Figure 2. Simple transcriptional sensor construct. The dimerized form (A2) of a protein (A) is its own positive regulator; the inhibitor (I) tethers the dimer away in an inactive form (A2 − I).
The system is simulated according to mass-action and obeys
where the states xi denote the concentration of mRNA, protein, protein-dimer and dimer-promoter complex, respectively. The quantities and y(t) refer the total number of promoters and the external inhibitor concentration, respectively. The nominal value and the meaning of the model parameters are summarized in Table 1. We remark that such continuous state-space model have their limitations for transcriptional circuits because they require several gene copies in order to neglect the discrete Boolean nature of a single gene.
Table 1. Nominal values and meaning of the kinetic parameters for the model of the synthetic sensor construct.
For the specified behavioral features, we expect the dimer to drop quickly after introduction of inhibitor and then quickly regain a high level after the inhibitor is washed out of the medium. We also constrain the monomeric protein. The specification functionals are the integral of the absolute difference to some target value x* (s) for the monomer and the dimer concentration over two small time intervals for each. More specifically,
where w is the temporal weight function chosen to be
The actual values for time-intervals for w1 and w2, as well as the target values are shown together with the trajectories for the nominal system (9) in Figure 3.
Figure 3. Time courses of monomer (A, x2) and dimer (A2, x3) concentration of (9) for an addition and removal of the inhibitor (I, y); the target values and time intervals chosen for the specification functionals are indicated by solid black lines.
For this case study we assume that we have means to design the binding rate of the inhibitor to the dimer k7 and the binding rate of the dimer to the promoter k9. To assess the error incurred by the linearization we consider the reverse-forward mapping as described in (7). Hence for various size of δ we perform the inverse mapping with L† and the forward mapping with F. If the inverse map is exact we should obviously obtain a ball with the same δ. Any deviation ε thereof reflects the approximation of F−1 by L†. In Figure 4 the images of under L† and F ◦ L† are shown for various radii δ.
Figure 4. Contours of (blue) in feature space (first row) are mapped back to the parameter space via L† (second row) and mapped forward using F (red) for increasing size of δ (from left to right).
Hence, for an intermediate size of δ a good trade-off between approximation accuracy and sampling coverage is achievable. A systematic sampling of a predetermined specification area S would proceed by successively sampling overlapping balls with radii adapted to maintain ε under a certain value as illustrated in Figure 5. In this example, the coverage of the region S is above 98% using 50 balls of different radii. The lower left corner of the specification space (Figure 5A) maps to a strongly nonlinear region of the parameter space (upper right corner in Figure 5B) and therefore forces the use of smaller balls to keep the error in acceptable range. On the contrary, the upper right region of the specification space is more linear and larger balls can be used with limited relative error (Figure 5C).
Figure 5. Covering a certain specification range S (black rectangle) by overlapping balls (A) which in turn yields overlapping ellipsoids in the parameter space (B). The precision of the mapping is illustrated by the reverse-forward map in (C). The centers of the balls are illustrated by crosses.
We presented a novel method to determine the parameter region of a biochemical reaction network that is consistent with a certain dynamical, behavioral specification. We defined specifications in a novel and general way that requires only the specification map to be once differentiable with respect to the states of the underlying differential equations. We showed that by locally linearizing this map we can solve the desired inverse problem of finding a parameter region for a given specification. As regions, instead of points, are mapped back to parameter space the scheme is in principle able to cover (given some regularity conditions) the feature and parameter space - something that is not possible with point-wise sampling. We also discuss means for estimating the size of the local neighborhood in order to guarantee certain approximation errors. The computational framework allows a very flexible definition of biologically relevant behavorial features and efficient determination of the corresponding parameter region. Hence, the range of experimentally modifiable parameters, such as promoter binding strength can be determined upfront before the experimental synthesis of a synthetic construct.
Throughout this work we only considered models based on ordinary differential equations, but the outlined framework can be extended to include stochastic dynamical models through the use of moment closure methods, for instance. In general, the specification functional will then involve the expectation operator and Monte Carlo sampling may be required to approximate it. Methods from stochastic sensitivity analysis  can be applied in order to perform the local inversion.
The authors declare that they have no competing interests.
HK, MH and JL devised the method, MH and JL implemented the algorithm and generated results for the case study. HK and MH wrote the paper.
Publication of this article was supported by the Swiss National Science Foundation (SNSF) grant number PP00P2_128503.
This article has been published as part of BMC Bioinformatics Volume 14 Supplement 10, 2013: Selected articles from the 10th International Workshop on Computational Systems Biology (WCSB) 2013: Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/14/S10.
Miller M, Hafner M, Sontag E, Davidsohn N, Subramanian S, Purnick PEM, Lauffenburger D, Weiss R: Modular Design of Artificial Tissue Homeostasis: Robust Control through Synthetic Cellular Heterogeneity.
Theoretical Computer Science 2011, 412(26):2827-2839. Publisher Full Text