Abstract
With recent improvements of protocols for the assembly of transcriptional parts, synthetic biological devices can now more reliably be assembled according to a given design. The standardization of parts open up the way for in silico design tools that improve the construct and optimize devices with respect to given formal design specifications. The simplest such optimization is the selection of kinetic parameters and protein abundances such that the specified design constraints are robustly satisfied. In this work we address the problem of determining parameter values that fulfill specifications expressed in terms of a functional on the trajectories of a dynamical model. We solve this inverse problem by linearizing the forward operator that maps parameter sets to specifications, and then inverting it locally. This approach has two advantages over bruteforce random sampling. First, the linearization approach allows us to map back intervals instead of points and second, every obtained value in the parameter region is satisfying the specifications by construction. The method is general and can hence be incorporated in a pipeline for the rational forward design of arbitrary devices in synthetic biology.
Introduction
Synthetic biology places emphasis on small, standardized molecular parts and devices, mostly operating at the transcriptional level [1,2]. With standardization comes the need for rigorous quantitative characterization of such devices and for a compositional theory to reliably build larger systems from small canonical circuits. For now most synthetic circuits implemented in vivo were constructed from a small number of components with topology and parameter values found by trialanderror. The development of larger synthetic systems necessitates the use of appropriate design methodologies. In silico analyses can provide significant insights into the construction of complex synthetic systems, but due to the poor quantification of experimental and microenvironmental conditions, the predictive capability of in silico models for in vivo implementations remains limited. Apart from experimental limitations, modeling attempts to date most often make simplifying assumptions about all the perturbations that a synthetic construct is facing in vivo. For instance, only a few studies account for the large extrinsic noise [35] and in particular the one introduced by variations of plasmid copy number [6]. Incorporating those realistic in vivo constraints will make computational models more predictive, eventually enabling the upfront in silico optimization of transcriptional circuits. A first step toward this goal is to investigate the parameter dependency of certain behaviorial properties of a circuits. In systems biology attempts have already been made to address this problem, however, they either rely on purely local measures [7,8] such as considered in classical sensitivity analysis [9,10], or perform random parameter sampling [11] to determined parameter dependencies.
For a given circuit topology, kinetic parameters and other parameters that are involved in controlling the expression level of molecular species (e.g. promoter activity or number of ribosome binding sites) are important design parameters in synthetic biology. A major challenge is to find a set of parameters that satisfies the behavioral specification of a device [12]. Computer science offers various languages to formally define the proper functioning of a piece of code or hardware. Such specification languages of formal verification are used to check important behavioral properties, such as liveness, safety or fairness [13]. One convenient way to specify such properties is to use temporal logic, which is considered an extension of classical propositional reasoning, where propositional variables may change their truth values over time. A prominent such logic is the linear temporal logic (LTL), where the truth value of the propositions is interpreted over a linear timeline [13]. Such techniques were already applied to investigate robustness of computational models in system biology [14].
Mathematically, the design problem is an inverse problem and hence inherits the general feature of such problems, namely illposedness [15,16]. More specifically, for a certain behavioral specification one aims to find the corresponding parameter set that gives rise to such behavior. An simple example for a quantity in feature space could be the concentration of a molecular species at particular timepoints. The problem is closely related to parameter optimization and even more so to robust optimization, where an objective function  generally encoding some behavioral constraint (e.g. making model trajectories close to the measurements)  is optimized to yield the optimal parameter set. Illposedness refers to the observation that two closeby points in specification or behavioral feature space may map to very distant points in the parameter space, indicating that this mapping is generally not contractive but rather expansive. The inverse and corresponding forward problem is illustated in Figure 1.
Figure 1. (A) The forward problem of defining a parameter set from which trajectories and their behavioral features are computed. (B) The inverse problem of finding a parameter regions for a predetermined behavioral specification region S. Columns from left to right correspond to parameter space, trajectory space and behavioral feature space, respectively. Connected convex sets can map to nonconvex nonconnected regions.
In the current analysis we restrict ourselves to models obeying the reaction rate equation and hence constitute a set of nonlinear ordinary differential equations. In general, connected domains may map to disconnected domains, for instance if the dynamical system contains bifurcation points (e.g. see Figure 1). For the proposed linearization approach we will further restrict ourselves to connected domains in the respective image space. Moreover, we will not resort to specifying behavior through temporal logics but will define general specification functionals. These are mappings ψ from an appropriate function space χ of ndimensional trajectories (e.g. )) to the mdimensional reals and we choose the form
with and the feature kernel , where . A special and more tractable version of the kernel is the convolution, i.e. g(t, x(t)) = h(T − t)x(t). In the following we will only require the map x → g(·, x) to be oncedifferentiable. With this, we can define the forward map from a pdimensional parameter space to the feature space as the composition F ≡ ψ ο φ, with . The trajectories are generated by the reaction rate equation
with the stoichiometric matrix , the reaction flux vector and the parameter set.
Methods
The bruteforce method of determining the parameter region that satisfies a certain behavioral specification usually proceeds by Monte Carlo sampling of parameter sets, generating corresponding trajectories according to (1), checking whether those satisfy S and finally retaining only those parameter sets that led to satisfied specification S. There are two immediate downsides of this approach. First, most draws will be unsuccessful for high dimensional parameter spaces, for tight specifications, or for both. Different approaches using an optimized sampling [11,17] have been developed to mitigate this problem, but are not solving it as they require convergence of the sampling. Second, drawing parameter points in does not provide guarantees that those points belong to a connected domain of consistent parameter sets. Here we provide first attempts to tackle both problems.
The main idea is to locally linearize the forward map F around some point and then locally invert it. Hence, a small enough local patch in feature space can be mapped backward to a small patch in parameter space. By successively sampling expansion points in their neighborhoods (e.g. by the ballwalk algorithm [18]) we can systematically cover the entire specification S and obtain the corresponding parameter region. A series expansion of F around some initial parameter set k^{0 }reads
Defining df ≡ F (k^{0 }+ dk) − F (k^{0}) we see that a neighborhood df in feature space to first order can be mapped backward using the MoorePenrose pseudoinverse
that we define with care as
where L denotes the linearized forward map and hence is just the m × p matrix
Note, that the limit in (2) exists even if the inverse of L^{T }L and LL^{T }do not exist. Such situations are encountered as soon as the number of specification features m are less than the number of parameters, i.e. the dimension p of the parameter space. Importantly, we can compute (3) efficiently using the variational equation for the system (1). Observe that
where the last terms in the integral is just the sensitivity of the solution of (1) to perturbations in k around k^{0}. According to the variational equation the sensitivity obeys the following ordinary n × p matrix differential equation
where we skipped the explicit dependency on k^{0 }for brevity. Note, that (4) is equivalent to the transient sensitivity analysis of metabolic networks [9,10], proposed as an extension of classical metabolic control analysis that only deals with steady state sensitivities. For a certain k^{0 }the sensitivity of the kernel g is a constant m × n matrix that can be computed explicitly. Thus, by jointly solving (1) and (4) for some k^{0 }together with
up to time T we obtain the linearized map L = L(T). Hence, for every sampled k^{0 }and associated feature point f ^{0 }we propose to design a feature ball
and map it backward using L^{†}. According to the singular value decomposition L^{† }= UΣV with Σ a diagonal matrix with nonnegative entries [16], the backward transformation needs to be a sequence of a rotation, a scaling and another rotation and hence the image of under L^{† }can only be a ellipsoid in the parameter space
Clearly, sampling a multivariate region with balls of same dimension allow for a complete coverage of the region  something that can only be extrapolated when using pointwise sampling [11]. The question to efficiently sample a region with balls has been addressed in computational geometry and efficient randomized algorithms are available [18].
We remark that the map L is not the best local approximation to F(k) in some norm sense. More specifically we can improve on L if we are giving additional samples of the neighborhood . Consider we draw another , then we can construct a rankone update to L
where ΔF ≡ F(k^{i}) − F(k^{0}) and Δk ≡ k^{i }− k^{0}. In particular, the rankone term (5) captures the nonlinear part of F. From (5) it follows that the matrix satisfies the consistency property
Thus, knowing how to construct rankone updates over the domain of interest is equivalent to knowing F(k) locally. In fact, is the matrix closest to L, with respect to the Frobenius norm, that satisfies (6). Subsequently we will use this improved linear approximation to F to bound the error that one can incurrs if one uses the pseudoinverse L^{† }for the backward map. This will also provide means to determine the maximal ball size δ to stay below a certain error bound. We quantify the error in the feature space by the backward map followed by a forward map. That is, we want to find a δ such that
Now suppose we know a bound ρ(δ) for the Frobenius norm of the rankone perturbation, i.e. in the local domain of interest. Note, that ρ(δ) could and need to be estimated by sampling. Given a the maximal error of the inverseforward map is
which is known from robust linear squares [16] to be equivalent to the error
Assuming that L has linearly independent rows, LL^{† }is the identity matrix and thereby the error simplifies to
This result provides one way to determine the radius of the feature ball δ when relying on the pseudoinverse
Results
As a proof of concept of our method, we applied it to a simple synthetic sensor construct [19]. The system is made of several gene copies (e.g. with plasmid transfection), expressing a protein that dimerizes and activates the gene by binding to the promoter. In presence of the inhibitor (input of the system), the dimer is trapped and cannot bind to the promoter. A schematic of the involved reactions is depicted in Figure 2.
Figure 2. Simple transcriptional sensor construct. The dimerized form (A_{2}) of a protein (A) is its own positive regulator; the inhibitor (I) tethers the dimer away in an inactive form (A_{2 }− I).
The system is simulated according to massaction and obeys
where the states x_{i }denote the concentration of mRNA, protein, proteindimer and dimerpromoter complex, respectively. The quantities and y(t) refer the total number of promoters and the external inhibitor concentration, respectively. The nominal value and the meaning of the model parameters are summarized in Table 1. We remark that such continuous statespace model have their limitations for transcriptional circuits because they require several gene copies in order to neglect the discrete Boolean nature of a single gene.
Table 1. Nominal values and meaning of the kinetic parameters for the model of the synthetic sensor construct.
For the specified behavioral features, we expect the dimer to drop quickly after introduction of inhibitor and then quickly regain a high level after the inhibitor is washed out of the medium. We also constrain the monomeric protein. The specification functionals are the integral of the absolute difference to some target value x* (s) for the monomer and the dimer concentration over two small time intervals for each. More specifically,
where w is the temporal weight function chosen to be
The actual values for timeintervals for w_{1} and w_{2}, as well as the target values are shown together with the trajectories for the nominal system (9) in Figure 3.
Figure 3. Time courses of monomer (A, x_{2}) and dimer (A_{2}, x_{3}) concentration of (9) for an addition and removal of the inhibitor (I, y); the target values and time intervals chosen for the specification functionals are indicated by solid black lines.
For this case study we assume that we have means to design the binding rate of the inhibitor to the dimer k_{7} and the binding rate of the dimer to the promoter k_{9}. To assess the error incurred by the linearization we consider the reverseforward mapping as described in (7). Hence for various size of δ we perform the inverse mapping with L^{† }and the forward mapping with F. If the inverse map is exact we should obviously obtain a ball with the same δ. Any deviation ε thereof reflects the approximation of F^{−}1 by L^{†}. In Figure 4 the images of under L^{† }and F ◦ L^{† }are shown for various radii δ.
Figure 4. Contours of (blue) in feature space (first row) are mapped back to the parameter space via L^{† }(second row) and mapped forward using F (red) for increasing size of δ (from left to right).
Hence, for an intermediate size of δ a good tradeoff between approximation accuracy and sampling coverage is achievable. A systematic sampling of a predetermined specification area S would proceed by successively sampling overlapping balls with radii adapted to maintain ε under a certain value as illustrated in Figure 5. In this example, the coverage of the region S is above 98% using 50 balls of different radii. The lower left corner of the specification space (Figure 5A) maps to a strongly nonlinear region of the parameter space (upper right corner in Figure 5B) and therefore forces the use of smaller balls to keep the error in acceptable range. On the contrary, the upper right region of the specification space is more linear and larger balls can be used with limited relative error (Figure 5C).
Figure 5. Covering a certain specification range S (black rectangle) by overlapping balls (A) which in turn yields overlapping ellipsoids in the parameter space (B). The precision of the mapping is illustrated by the reverseforward map in (C). The centers of the balls are illustrated by crosses.
Conclusion
We presented a novel method to determine the parameter region of a biochemical reaction network that is consistent with a certain dynamical, behavioral specification. We defined specifications in a novel and general way that requires only the specification map to be once differentiable with respect to the states of the underlying differential equations. We showed that by locally linearizing this map we can solve the desired inverse problem of finding a parameter region for a given specification. As regions, instead of points, are mapped back to parameter space the scheme is in principle able to cover (given some regularity conditions) the feature and parameter space  something that is not possible with pointwise sampling. We also discuss means for estimating the size of the local neighborhood in order to guarantee certain approximation errors. The computational framework allows a very flexible definition of biologically relevant behavorial features and efficient determination of the corresponding parameter region. Hence, the range of experimentally modifiable parameters, such as promoter binding strength can be determined upfront before the experimental synthesis of a synthetic construct.
Throughout this work we only considered models based on ordinary differential equations, but the outlined framework can be extended to include stochastic dynamical models through the use of moment closure methods, for instance. In general, the specification functional will then involve the expectation operator and Monte Carlo sampling may be required to approximate it. Methods from stochastic sensitivity analysis [20] can be applied in order to perform the local inversion.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
HK, MH and JL devised the method, MH and JL implemented the algorithm and generated results for the case study. HK and MH wrote the paper.
Declarations
Publication of this article was supported by the Swiss National Science Foundation (SNSF) grant number PP00P2_128503.
This article has been published as part of BMC Bioinformatics Volume 14 Supplement 10, 2013: Selected articles from the 10th International Workshop on Computational Systems Biology (WCSB) 2013: Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/14/S10.
References

Nandagopal N, Elowitz MB: Synthetic biology: integrated gene circuits.
Science (New York, NY) 2011, 333(6047):12448. PubMed Abstract  Publisher Full Text

Lu TK, Khalil AS, Collins JJ: Nextgeneration synthetic gene networks.
Nature Biotechnology 2009, 27(12):113950. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Bowsher CG, Swain PS: Identifying sources of variation and the flow of information in biochemical networks.
Proceedings of the National Academy of Sciences of the United States of America 2012, 109(20):E13208. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Hilfinger A, Paulsson J: Separating intrinsic from extrinsic fluctuations in dynamic biological systems.
Proceedings of the National Academy of Sciences of the United States of America 2011., 108(29) PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Zechner C, Ruess J, Krenn P, Pelet S, Peter M, Lygeros J, Koeppl H: Momentbased inference predicts bimodality in transient gene expression.
Proceedings of the National Academy of Sciences of the United States of America 2012, 109(21):83405. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Bleris L, Xie Z, Glass D, Adadey A, Sontag E, Benenson Y: Synthetic incoherent feedforward circuits show adaptation to the amount of their genetic template.
Molecular Systems Biology 2011, 7(519):519. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Brown SK, Sethna JP: Statistical mechanical approach to models with many poorly known parameters.

Gutenkunst RN, Waterfall JJ, Casey FP, Brown KS, Myers CR, Sethna JP: Universally Sloppy Parameter Sensitivities in Systems Biology Models.
PLoS Comput Biol 2007, 3(10):e189. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Hatzimanikatis V, Bailey JE: MCA has more to say.
Journal of Theoretical Biology 1996, 182(3):23342. PubMed Abstract  Publisher Full Text

Ingalls BP, Sauro HM: Sensitivity analysis of stoichiometric networks: an extension of metabolic control analysis to nonsteady state trajectories.
Journal of Theoretical Biology 2003, 222:2336. PubMed Abstract  Publisher Full Text

Hafner M, Koeppl H, Hasler M, Wagner A: 'Glocal' robustness analysis and model discrimination for circadian oscillators.
PLoS Computational Biology 2009, 5(10):e1000534. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Miller M, Hafner M, Sontag E, Davidsohn N, Subramanian S, Purnick PEM, Lauffenburger D, Weiss R: Modular Design of Artificial Tissue Homeostasis: Robust Control through Synthetic Cellular Heterogeneity.
PLoS Comput Biol 2012, 8(7):e1002579. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Baier C, Katoen JP: Principles of Model Checking. The MIT Press, London; 2008.

Rizk A, Batt G, Fages F, Soliman S: Continuous valuations of temporal logic specifications with applications to parameter optimization and robustness measures.
Theoretical Computer Science 2011, 412(26):28272839. Publisher Full Text

Engl H, Hanke M, Neubauer A: Regularization of inverse problems, Mathematics and its applications. Kluwer; 1996.

Christian HP: RankDeficient and Discrete IllPosed Problems. Society for Industrial and Applied Mathematics; 1998.

ZamoraSillero E, Hafner M, Ibig A, Stelling J, Wagner A: Efficient characterization of highdimensional parameter spaces for systems biology.
BMC Systems Biology 2011, 5:142. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Hooshangi S, Thiberge S, Weiss R: Ultrasensitivity and noise propagation in a synthetic transcriptional cascade.
Proceedings of the National Academy of Sciences of the United States of America 2005, 102(10):35816. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Sheppard P, Rathinam M, Khammash M: A pathwisederivative approach to the computation of parameter sensitivities in discrete stochastic chemical systems.
Journal of Chemical Physics 2012, 136:034115. PubMed Abstract  Publisher Full Text  PubMed Central Full Text