Skip to main content
  • Methodology article
  • Open access
  • Published:

Modeling miRNA-mRNA interactions: fitting chemical kinetics equations to microarray data

Abstract

Background

The miRNAs are small non-coding RNAs of roughly 22 nucleotides in length, which can bind with and inhibit protein coding mRNAs through complementary base pairing. By degrading mRNAs and repressing proteins, miRNAs regulate the cell signaling and cell functions. This paper focuses on innovative mathematical techniques to model gene interactions by algorithmic analysis of microarray data. Our goal was to elucidate which mRNAs were actually degraded or had their translation inhibited by miRNAs belonging to a very large pool of potential miRNAs.

Results

We proposed two chemical kinetics equations (CKEs) to model the interactions between miRNAs, mRNAs and the associated proteins. In order to reduce computational cost, we used a non linear profile clustering method named minimal net clustering and efficiently condensed the large set of expression profiles observed in our microarray data sets. We determined unknown parameters of the CKE models by minimizing the discrepancy between model prediction and data, using our own fast non linear optimization algorithm. We then retained only the CKE models for which the optimized fit to microarray data is of high quality and validated multiple miRNA-mRNA pairs.

Conclusion

The implementation of CKE modeling and minimal net clustering reduces drastically the potential set of miRNA-mRNA pairs, with a high gain for further experimental validations. The minimal net clustering also provides good miRNA candidates that have similar regulatory roles.

Background

Transcriptional and translational processes are fundamental cell mechanisms, involving three main molecular species: messenger RNA (mRNA) and their associated proteins, as well as microRNAs (miRNAs).

The miRNAs are small non-coding RNAs of roughly 22 nucleotides in length, which can bind with and inhibit protein coding mRNAs through complementary base pairing. A given miRNA can potentially bind and silence hundreds of mRNAs across a number of signaling pathways. These repressive miRNA-mRNA interactions occur in multiple cellular processes [13], and involve two distinct modalities: they may directly degrade their target mRNAs, or more often inhibit their translation [49].

The best characterized features determining the targets of a specific miRNA are the conserved Watson-Crick pairing to the 5’ region (positions 2–7) of the miRNA, which are the so-called “seed pairing rules” [3, 1013]. Since seed pairing rules are neither sufficient nor necessary for miRNA-target functions [4, 14], they have usually been combined with microarray or proteomic analysis to find potential miRNA-target pairs[1517]. Classical microarray data analysis relies mostly on massive application of correlation analysis and linear statistical techniques to simultaneously acquired gene expression profiles. Combined with profile thresholding and heat map displays, these techniques provide commonly used clues for qualitative inference.

In [18], Principal Component Analysis and linear correlation had been linked with comparative sequence analysis to study microarray data recorded during mouse stem cells differentiation, and to broadly predict potential miRNA-mRNA interactions.

To go beyond the results of the linear microarray analysis applied on time-course microarray data in [18], we have formalized two basic architectures for repressive miRNA-mRNA interactions: “Transcription Degradation” (TD) and “Translation Inhibition” (TI).

Traditional chemical kinetics equations had been proposed to model the transcriptional and translational processes without involving the interaction of miRNAs [19, 20].

We have derived Chemical Kinetics Equations (CKEs) to model the dynamics of TD and TI motifs, in the spirit of [2030]. The equations are algebraically invariant under affine transformation and allow data condensation to reduce computational cost. We have implemented “minimal net” clustering method, which can control the maximum diameter of the clusters, to condense large data sets of gene expression.

Modeling by nonlinear CKEs involves complicated parameter estimation problems to fit the very large set of expression profiles recorded by microarrays. We have developed innovative fast algorithms dedicated to CKE parameter estimation, by optimization of the quality of fit between model and data.

We validate only the parameterized motifs having a high quality of fit to data. To reach robust conclusions we apply a “parameter parsimony” principle, favoring the models having the smallest number of parameters. And we have also evaluated the robustness of our parameter estimation algorithm by algorithm by direct testing on simulated data. These tests did validate our parameter estimation method is quite robust. This nonlinear approach goes further than well-established analysis based on correlation techniques combined with heat map displays.

Methods

Basic interaction architectures

To validate if an miRNA M does indeed repress a given gene G, we model the chemical interactions of M and G within a small network containing the pair (M,G). We now sketch two basic interaction architectures and their CKE models.

Transcription Degradation Motifs (TD-motifs)

We call “TD-motif” any interaction architecture, as sketched in Figure 1, involving a single miRNA-mRNA pair (M,G) where G is in Target(M), and M degrades the transcription of G. The TD motif includes also two sets of proteins rep(G) and act(G), namely the transcriptional “repressors” and “activators” of G, denoted by

rep ( G ) = { R 1 , R k } and act ( G ) = { A 1 , A 2 , , A q }

Let g(t),p(t),m(t),r i (t),a j (t), be the expression levels at time t for the chemical species G,P,M,R i ,A j . We model the transcription process by a CKE similar to CKEs proposed in [20, 22, 25, 29, 30], but with a complementary term encoding the repressive impact of miRNA M on its target mRNA G:

dg ( t ) dt =-βg(t)-vg(t)m(t)+κREP(t)[1-ACT(t)]
(1)
Figure 1
figure 1

TD-motif and TI-motif.

where β>0 is the degradation rate of G, v>0 is the reaction rate between G and M, κ>0 is the product of the transcription rate by the concentration c of DNA templates, concentration which we assume to be some constant not depending on time (see [20]).

The percentage 0≤F(t)=R E P(t)[ 1-A C T(t)]≤1% is the fraction of existing DNA templates which are committed at time t to transcription of the mRNA gene G. Here the percentages R E P(t) and A C T(t) are modeled by the following products,

REP ( t ) = REP 1 ( t ) × × REP q ( t ) ACT ( t ) = ACT 1 ( t ) × × ACT k ( t )
(2)

where

REP i ( t ) = 1 ( 1 + u i r i ( t ) ) SR i ACT j ( t ) = 1 ( 1 + w j a j ( t ) ) SA j
(3)

The parameters S R i >0,u i >0 and S A j >0,w j >0 are the number of binding sites and the affinity constant for the transcriptional factors R i and A j .

Note that the transcription repressors R i combine multiplicatively their individual impacts R E P i in R E P(t), and that the R E P i are analogous to Hill function (see [19, 25]); similar remarks apply to the transcription activators. The multiplicative expressions of R E P(t) and A C T(t) are typical of a so-called “cis-regulatory” function and have been derived by J. Goutsias [20].

The term κ R E P(t)[ 1-A C T(t)]d t is the concentration of new G molecules synthesized by transcription during the small time interval [ t,t+d t], while the repressive interactions of M and G eliminates v g(t)m(t)d t molecules of G, and natural decay destructs β g(t)d t molecules of G.

Translation-Inhibition Motifs (TI-motifs)

We call “TI-motif” any interaction architecture, as sketched in Figure 1, involving a set M 1,…,M r of miRNAs inhibiting the translation of mRNA gene G, by repressing the expression of the protein P generated by G. Let (p(t),m i (t),g(t)) be the concentrations at time t of protein P, miRNA M i , and mRNA G. In the spirit of [20, 21], we model the translation inhibition dynamics by the CKE

dp dt =-γp(t)+λg(t)H(t)
(4)

where γ>0 and λ>0 are the degradation rate and the translation rate for protein P and where H(t) is the percentage of G molecules committed at time t to the translation of G. Thus H(t) encodes the inhibiting impact of the miRNAs M 1,…,M k on the translation of G, and is modeled as a product of terms similar to R E P(t):

H ( t ) = H 1 ( t ) × × H k ( t ) where H i ( t ) = 1 ( 1 + u i m i ( t ) ) SM i
(5)

The parameters S M i >0 and u i >0 are resp. the number of binding sites and the affinity constant controlling the inhibiting impact of miRNA M i on the translation of gene G. Note that H(t) decreases when the miRNA concentrations m i (t) increase.

Our model for TI motif has been inspired by J. Goutsias [20], and we present below the hypotheses and arguments justifying the expression of H(t). There is a key difference between the CKEs modeling TD and TI motifs. For TI motifs, the concentration of G-molecules committed to translation is g(t)H(t), where g(t) is the concentration of G-molecules and H(t)<100%.

For TD motifs, the concentration of G-molecules synthesized by transcription is κ F(t)=α c F(t) with F(t)<100%, where α and c are respectively the transcription rate and the concentration of DNA templates, assumed to be constant in time.

Derivation of chemical kinetics equations

Our derivation of the regulation equation for TI motif is quite similar to presentation given for TD motifs in [20], but has several changes in assumptions and formulations. To derive the CKEs (1) and (4), we propose a few hypotheses.

  • Hyp. 1: (TD-motifs) The molecules of the miRNA repressor of gene G can strongly bind only at one unique specific site of G- molecules, and once a single such strong bind occurs, the corresponding G molecule degrades extremely fast.

  • Hyp. 2: (TI-motifs) Each miRNA M j in the set r e p(G) of translation inhibitors of G can weakly bind with G but only at specific binding sites constituting a set B I N D j of size S j . The sets B I N D 1,B I N D 2,… are pairwise disjoint. Once a G molecule thus binds with one inhibitor M j , then this G molecule will fail to translate.

  • Hyp. 3: For any given G-molecule, call X j the random number of sites in B I N D j which actually bind with one “ M j ”-molecule. We assume that the random variables X 1,X 2,…. are independent.

Hyp.1 is based on the fact that only a small fraction of all messenger RNAs have more than a single miRNA binding site and miRNA bound with an mRNA gene G has a limited effect on the mRNA G, but affects more substantially the protein P generated by G ([4, 31].

Consider first any TI motif involving an mRNA gene G generating the protein P, as well as a set r e p(G)={M 1,M 2,…M k } of k translation inhibitors. We now derive the CKE (4) for this TI motif. Call p(t),g(t),m j (t) the concentrations of P,G,M j and let H(t) be at time t the percentage of existing G molecules committed to the translation of G into P. The basic CKE driving translation of G into P is then, as seen in [20],

dp dt =-γp(t)+λg(t)H(t)
(6)

where γ,λ, are the degradation and translation rates of P.

The main point is to compute H(t).

For k=0, there are no translation inhibitors, and hence H(t)=100%. For k=1, let Q[ S] be the set of G molecules with exactly S binding sites bound to M 1 molecules, where 0≤SS 1. Let q(S,t) be the concentration of Q[S] molecules. We have the forward and backward reactions

Q [ S ] + M 1 Q [ S + 1 ] and Q [ S + 1 ] Q [ S ] + M 1
(7)

By molecular collision theory [23], the concentration ρ(t) of free M 1 molecules which bind at time t on Q[ S]-molecules to produce Q[ S+1]-molecules by forward reaction 7, is proportional to the product of m 1(t)q(S,t) by the number (S 1-S) of vacant binding sites. Hence, for some constant c,

ρ(t)=c( S 1 -S) m 1 (t)q(S,t)
(8)

Similarly the concentration τ(t) of M 1 molecules freed by backward reaction 7 is given by

τ(t)= c ~ (S+1)q(S+1,t)
(9)

for some constant c ~ . At chemical equilibrium, we have τ(t)=ρ(t), and hence

q ( S + 1 , t ) = u m 1 ( t ) S 1 - S S + 1 q ( S , t )

where u is a new constant. By recurrence on S, this implies

q ( S , t ) = C S 1 S [ u m 1 ( t ) ] S q ( 0 , t ) where C S 1 S = ( S 1 ) ! S ! ( S 1 - S ) !
(10)

This entails

g ( t ) = S = 0 S 1 q ( S , t ) = q ( 0 , t ) S = 0 S 1 C S 1 S [ um ( t ) ] S = [ 1 + u m 1 ( t ) ] S 1 q ( 0 , t )
(11)

The set of G-molecules committed to translation into P at time t is identical to q(0,t). Thus H(t)=q(0,t)/g(t), and (11) yields H(t)=1/ [ 1 + u m 1 ( t ) ] S 1 .

Using hypothesis Hyp 3, a recurrence on k extends the argument just given for k=1, to prove that for k≥1, the percentage H(t) of G molecules committed to translation into P is given by:

H(t)= 1 [ 1 + u 1 m 1 ( t ) ] S 1 ×× 1 [ 1 + u k m k ( t ) ] S k
(12)

which proves CKE (4) for TI motifs.

For TD motifs, the cis-regulation function F(t) in CKE (1) has been derived in [20]. By using hypothesis 1 and molecular collision theory, we directly deduce that the concentration the concentration of mRNA G degraded by miRNA M is proportional to the product of both concentrations of G and M. Thus we justify the miRNA degradation term -v g(t)m(t).

Invariance by affine profile transformations

Affine profile transformations

Call ΓR q the set of all possible expression level profiles r(t) indexed by time t 1,…,t q . To any pair T=(a,b) of real numbers, we associate an affine profile transformation

r Tr defined by r ( t ) ( ar ( t ) + b )

for all time dates t. Let be the set of all such affine profile transformations.

In microarray data sets, expression levels of genes are recorded via optical analysis of fluorescence intensities, and hence depend strongly on experimental acquisition modalities. So relative expression levels between pairs of recorded chemical species are more meaningful quantities, and graphic displays of microarray data by “heat maps” often involve logarithms of raw data.

Our microarray data record expression levels which as a first approximation can be viewed as unknown affine transformations of concentrations. Since our CKEs (1), (4) were derived for concentrations, we need to check how these CKEs and their parameters change under generic affine profile transformations.

Affine invariance of CKE models

Consider first a TD-motif involving an mRNA gene G, a protein P, an miRNA M, transcription repressive proteins R 1,R 2,…, and transcription activating proteins A 1,A 2,…. The CKE model (1) links the concentrations g,p,m,r i ,a j of G,M,P,R i ,A j . Let ĝ, p ̂ , m ̂ , r ̂ i , â j be the corresponding recorded expression profiles. Assume that each such recorded profiles r ̂ is linked to the concentration profile 4 by some unknown affine profile transformation T r .

From CKE (1), one directly deduces that ĝ, p ̂ , m ̂ , r ̂ i , â j verify a new CKE having an algebraic form completely similar to CKE (1), but where the original parameters β,v,κ,u i ,w j are replaced by new parameters β ̂ , v ̂ , κ ̂ , û i , ŵ j , and where the integers S R 1,S R 2,…, and S A 1,S A 2,…, remain unchanged.

The new parameters are easily expressed in terms of the original ones and of the coefficients of the affine transformations, but this is irrelevant practically since we will use microarray data to directly compute the new parameters for a CKE of type (1) linking recorded expression levels.

Similar computations for TI-motifs show that this affine invariance property also holds for CKE (4). Hence to model a TD or a TI motif, we can fit a CKE model of type (1) or (4) to recorded expression profiles, even though the theoretical model justification involved true concentrations, which are not directly measured by microarrays.

The key assumption is that, for each chemical species C, the expression levels ĉ(t) of C recorded by microarray are approximately linked to the concentration c(t) of C by some affine relation ĉ(t)=a(C)c(t)+b(C), where the unknown coefficients a(C) and b(C) may depend on the species C.

The preceding algebraic model invariance under multiple affine profile transformations strongly suggests that adequate distances between dynamic profiles of recorded expression levels should be invariant under affine profiles transformations, as developed in the next section.

Condensation of expression levels profiles

Microarray data typically record several tens of thousands gene expression profiles. So computational costs to fit microarray data to all potential TD-motifs and TI-motifs would of course be prohibitive. A natural option to reduce combinatorial explosion is to cluster the observed profiles.

Since our goal is to model expression profiles by ODEs of type (1) and (4), we need to control the diameters of all clusters of expression profiles. This led us to reject hierarchical clustering as well as K-means clustering, and to implement in the space of expression profiles a “minimal-net” clustering technique, inspired by an innovative technique for automatic generation of prototypes in shape spaces (see [32]). In view of the preceding section, the diameters of clusters in the space of profiles should be measured by a distance invariant under affine profiles transformations.

Affine invariant distance between profiles

Recall that ΓR q is the set of all profiles r(t) indexed by time dates t 1,…,t q . The mean and variance of a profile r are denoted by

r ¯ = 1 q [ r ( t 1 ) + + r ( t q ) ] ; var ( r ) = 1 q [ ( r ( t 1 ) - r ¯ ) 2 + + ( r ( t q ) - r ¯ ) 2 ]

Define normalized profiles n o r(r) and profiles correlations c o r r(r 1,r 2) by

nor ( r ) ( t ) = r ( t ) - r ¯ var ( r ) ; corr ( r 1 , r 2 ) = < nor ( r 1 ) , nor ( r 2 ) >

where <·,·> is the usual scalar product in R p.

We then define a distance D(r 1,r 2) between profiles r 1 and r 2 by

D ( r 1 , r 2 ) = 2 - 2 corr ( r 1 , r 2 ) )

For any affine profile transformation T in , one has n o r(r)=n o r(T r), and hence D(r 1,r 2)=D(n o r(r 1),n o r(r 2). Thus for any affine profile transformations T 1,T 2, and any profiles r 1,r 2 one has

D ( T 1 r 1 , T 2 r 2 ) = D ( r 1 , r 2 )

So the profile distance D is invariant by affine profile transformations

Minimal net clustering

The set Γ of all profiles is now endowed with a distance D invariant by affine profile transformations. Call mPR the large set of miRNA profiles recorded at time t 1,…,t q by our microarrays. We fix a maximum radius ε for profiles clusters. We seek to partition mPR into disjoint clusters C L 1,…,C L r such that each C L j has diameter inferior to 2ε, and we also want the number NN of clusters to be as small as possible. We have implemented an iterative algorithm to generate this type of minimal net clustering, in the spirit of [32].

Define a distance function D(x,y), where x,y represent the observations of the data. Denote by D(x,Y) the distance between an observation x and a set Y, where D(x,Y)= minyY D(x,y). Let Let [ x] denote the cluster containing single point x, and Γ be the set of all observations, the minimal net algorithm is as follows:

  1. 1.

    step 1, let (x 1,y 1)=a r g m a x x,y D(x,y). Then let x 1 and y 1 become the representatives of two initial clusters. Let C L 1= [ x 1],C L 2= [ x 2], C 1={C L 1,C L 2} and R 1=ΓC 1 representing the remaining points in Γ excluding the 2 clusters.

  2. 2.

    After step n-1, we obtain C n-1={C L 1,C L 2,…C L n } and R n-1=ΓC n-1, where C L j is a cluster that has single point, j=1,…,n+1.

    In step n, let HD= max x R n - 1 D(x, C n - 1 ), representing the maximum distance between observation x in R n-1 and set C n-1. If H D>ε, find x HD = argmax x R n - 1 D(x, C n - 1 ). Let C n ={C n-1,x H D }, R n =ΓC n . Repeat until H Dε.

  3. 3.

    Assume the loop stop at step NN, we have C N N ={C L 1,C L 2,…C L N N+1}. For an observation x, find the point y CL k , belonging to cluster C L k in C N N , that is closest to x, i.e. y CL k = argmin y C NN (D(x, C NN )), then assign x to the cluster C L k .

We apply this minimal net clustering algorithm to the set of all miRNA profiles recorded by our microarrays. We define the cluster diameter d i a m(C L) of cluster CL as the maximum distance of any two observations in the cluster, i.e. d i a m(C L)= maxx,yC L D(x,y), where. Compared with the commonly used clustering method such as K-means and Hierarchical clustering, the minimal net algorithm allows us to control the diameter of all clusters by setting the threshold ε representing the supremum radius of the cluster, while the K-means and hierarchical clustering is often used to determine the number of of clusters but not their diameters. If we select a very small threshold ε, the expression levels of genes in the same cluster can be considered almost identical.

Parameter estimation for the CKE models

Parameters estimation strategy

A generic strategy for parameter estimation in CKEs systems is to minimize a cost function evaluating the discrepancy between model predictions and experimental data. Each one of our CKE models has a single output variable, namely the expression level g(t) of mRNA gene G for a TD motif, and the expression level p(t) of protein P for a TI motif. The output variable g(t) or p(t) of a CKE model can be estimated by a function ĝ(t) or p ̂ (t) once one knows the profiles of all other molecular species involved in the model. The estimators ĝ(t) or p ̂ (t) are respectively determined by CKE (1) or CKE (4).

Each CKE models is parameterized by a parameter vector w of dimension n p a r . The quality of fit of this model with recorded profiles data is quantified by the size E R R(w) of the estimation error defined as follows

ERR ( w ) = max t | f ( t ) - f ̂ ( t ) |

where the output variable f(t) is equal to g(t) for TD motifs and to p(t) for TI motifs. The concrete goal is to find the best parameter vector w by minimization of the lack of fit E R R(w) over all possible values of w.

There are no magic solutions for such non linear minimization problems. Moreover fast computing was essential here, since we usually have to solve a very large number of similar “quality of fit maximization” problems when dealing with large microarray datasets.

We have tested several generic cost minimization approaches (see [3335]) such as “genetic algorithms”, as well as “gradient descent” to minimize a sum of squared modeling residuals. These two techniques turned out to require far too much computing time and were often unreliable due to their high dependence on initialization values.

We hence developed our own fast CKE parametrization algorithms to optimize the quality of fit between CKE models and microarray profiles data. This optimized quality of fit, adequately balanced by a systematic emphasis on parsimoniously parameterized models, becomes an essential clue to decide which potential interactions one should validate between miRNAs, mRNAs, and associated proteins.

Parameters parsimony requirement

Robustness of CKE parametrization is the main motivation for our parameter parsimony requirements. Consider CKE (1) modeling a TD-motif TDM involving n a c t activators and n r e p repressors for the transcription of mRNA gene G, and one miRNA M degrading the transcription of G. Then the number n p a r of unknown parameters is n p a r =3+2(n a c t +n r e p ). Each profile g(t),m(t),a i (t),r j (t) is recorded at t=t 1,…,t p . The CKE outputs for TDM are g(t 1),…,g(t p ). Hence each microarray data set provides (p-1) equations linking the n p a r unknown parameters, since the recorded g(t) should be very close to the predicted values ĝ(t) obtained by solving the ODE (1) with initial value g(t 1).

Vapnik’s results on model fitting (see [36]) show that robust accuracy of parameter estimates requires fairly high values of the ratio of (p-1)/n p a r . So n p a r <<(p-1) is a necessary constraint, and we will impose the parameter parsimony requirement n p a r ≤(p-1)/4. Indeed when n p a r >(t p -1), CKE models are overfitted and parameters are poorly estimated.

CKE Parameter estimation

TI-motifs: plausible ranges for parameters

Consider a generic TI motif TIM involving an mRNA gene G, its associated protein P, and k translation inhibiting miRNAs [ M 1,…,M k ]. Let p(t),g(t),m i (t) be the expression levels of P,G,M i . We want to model TIM by CKE (4).

Parametrized by the vector

w = [ γ , λ , u 1 , S 1 , , u k , S k ]

which involves (2k+2)≤8 parameters.

The degradation and translation rates γ and λ of protein P were unknown for most proteins.

According to results from [31], only a small percentage 2% of our 30,000 mRNAs have more than 2 potential binding sites for miRNAs, and only 0.02% mRNAs have as many as 7 such binding sites. So in our parameter estimations it is reasonable to restrict the number of binding sites S j for miRNA M j to be at most 5.

TI-motifs: optimizing the quality of fit

The inputs of CKE (4) are the initial value p(t 1) and the expression levels g(t),m 1(t),…,m k (t) recorded at time dates t 1,…,t q .

Discretizing the ODE (4) at time t 1,…,t q , we get

p( t j + 1 )-p( t j )=-γp( t j )+λg( t j )H( t j )
(13)

where the percentage H(t) is as recalled above. By summation this implies, for j=2,3,.…,q-1, the relation

p( t j + 1 )-p( t 1 )=-γQ( t j )+λL( t j )
(14)

where

Q t j = n = 1 j g t n ; L t j = sum n = 1 j g t n H t n

In view of equation (14), when all expression profiles involved have been recorded until time t j , one can predict the still unknown value of p(t j+1) by the following natural estimator p ̂ ( t j + 1 ),

p ̂ ( t j + 1 )=p( t 1 )-γQ( t j )+λL( t j )
(15)

The quality of fit of this CKE model with recorded profiles data will be quantified by the size E R R(w ) of the estimation error numerically computed as follows

ERR(w)= max j = 1 , , q |p( t j )- p ̂ ( t j )|
(16)

which in view of (15) can be reformulated as

ERR(w)= max j = 1 , , T | π j + μ j γ- ν j λ|
(17)

where for j=1,…,q we have set

π j =p( t j )-p( t 1 ) μ j =Q( t j ) ν j =L( t j )
(18)

We seek a parameter vector w minimizing the cost function E R R(w) in the parametric domain defined by

γ > 0 ; λ > 0 ; u i > 0 ; 1 S i 5

TI-motif: parameter estimation algorithm

For each i=1…k, fix an arbitrary integer 1≤S i ≤5. Call m ̄ i and H ̄ i the respective medians over time t of the functions m i (t) and H i (t). Since the function H i (t)=1/ [ 1 + u i m i ( t ) ] S i is monotonous in m i (t), we have

H ̄ i =1/ [ 1 + u i m ̄ i ] S i
(19)

The assumption 0<H i (t)<1 yields 0< H ̄ i (t)<1. We will discretize the possible values of H ̄ i (t) by constraining them to belong to a grid h 1,…,h s of 1≤s≤99 percentage values equally spaced in [ 0,1]. Since m ̄ i is known, we invert equation (19) to compute a corresponding grid G R D i of s potential values for u i = 1 m ̄ i ( ( 1 H ̄ i ) 1 / S i -1).

Now for 1≤ik, select and fix arbitrary values u i in the grid G R D i . After selecting as restrictive above the set of parameter vector U= [ u 1,S 1,…u k ,S k ], the function H(t) is then completely determined for all t.

Note that the set of all possible such choices for U is of cardinal inferior to N=(5s)k. Since we do explore each one of these possibilities separately, we need to keep the number N at a reasonable level, so we selected s=99 for k=1, s=60 for k=2, s=20 for k=3 etc.

Fix any U in . Since all recorded profiles involved in the TD motif are available at time t 1,…,t q , we can use the U value to directly compute all the values H(t j ), and then all the numbers π j ,μ j ,ν j defined by (18). We want to find values of the two last parameters γ>0 and λ>0 which will minimize

ERR ( γ , λ ) = max j = 1 , , q | π j + μ j γ - ν j λ |

This problem is equivalent to minimizing the linear objective function:

Ψ ( γ , λ , z ) = z

under the (2q+3) linear inequality constraints

γ > 0 ; λ > 0 ; z > 0 z - ( π j + μ j γ - ν j λ ) 0 for j = 1 , , q z + ( π j + μ j γ - ν j λ ) 0 for j = 1 , , q

This is a classical constrained linear programming problem, which can be solved by well known fast linear programming algorithms [37], to provide the optimal values γ ,λ ,z . Then z =E R R(γ ,λ ) is the minimal value of E R R(γ,λ).

These optimal values are functions γ (U),λ (U),z (U) of the partial vector of parameters U in . We then select the optimal U in as the value of U which minimizes z (U) over . The optimal parametrization w of our TD motif is then given by w = [ γ (U ),λ (U ),U ]. This new parametrization algorithm is fairly fast and has good accuracy. On a current laptop PC, our non-optimized MATLAB code implementation required less than 5 minutes of CPU time for the parametrization of a typical TI-model with 38 time points and 9 parameters [38]. After code optimization and a re-implementation in C, we expect this CPU time to be reduced to 2 minutes. The algorithm does not require any knowledge of the parameters ranges except for the number of binding sites S, which is an advantage for the range of reaction rates of of many molecules of interest are usually unknown.

TD-motifs: parameter estimation algorithm

The parametrization algorithms just presented also apply to TD models, as we now sketch. The output variable is now the expression level g(t) of mRNA gene G.

Discretize the ODE (1) at time dates t 1,…,t q , to get

g( t j + 1 )-g( t j )=-βg( t j )-vg( t j )m( t j )+κF( t j )
(20)

where F(t) is defined in (1). By summation this implies the relations

g( t j + 1 )-g( t 1 )=-βB( t j )-vV( t j )+κK( t j )
(21)

where

B ( t j ) = n = 1 j g ( t n ) ; V ( t j ) = n = 1 j g ( t n ) m ( t n ) ; K ( t j ) = sum n = 1 j F ( t n )

When all expression profiles involved have been recorded until time t j , one can predict the unknown value of g(t j+1) by the estimator ĝ( t j + 1 ),

ĝ( t j + 1 )=g( t 1 )-βB( t j )-vV( t j )+κK( t j )
(22)

The quality of fit of this CKE model with the recorded profiles data is quantified by the size of the prediction error which is a function E R R(w) of the parameter vector w

ERR w = max j = 1 , , q | g t j - g ̂ t j |

so that

ERR w = max j = 1 , , q | τ j + ρ j β+ η j v- θ j κ|
(23)

where we have set

τ j = g ( t j ) - g ( t 1 ) ρ j = B ( t j ) η j = V ( t j ) θ j = K ( t j )

The TD model parameters S R i >0,u i >0 and S A j >0,w j >0 are the number of binding sites and the affinity constants for the transcriptional factors R i and A j . They constitute a partial parameter vector U which we restrict as above by first imposing a moderate upper bound S m a x on all the integers S R i ,S A j . and by selecting, also as done above, adequate finite grids for the values of the u i and the w j . This constrains U to belong to a finite set . The cardinal N of is forced to remain at most of order 105, by adequate constraints on S m a x and on the coarseness of the u i grids and the v j grids.

To minimize E R R(w), we fix as above an arbitrary U in . We can then compute all the numbers τ j ,ρ j ,η j ,θ j . Since U is fixed, the error size E R R(w) becomes a function E(β,v,κ) of the last 3 positive parameters (β,v,κ), still given by equation (23). As above the minimization of E(β,v,κ) is equivalent to a constraint linear programming problem where we want to minimize the linear function Φ(β,v,κ,z)=z over the following set of (2q+4) linear inequalities

β > 0 ; v > 0 ; κ > 0 ; z > 0 z - ( τ j + ρ j β + η j v - θ j κ ) 0 for j = 1 , , q z + ( τ j + ρ j β + η j v - θ j κ ) 0 for j = 1 , , q

Solving this constraint linear programming problem generates optimal parameters (β ,v ,κ ) and a minimal error z , which are all functions of U. One concludes as above by selecting an optimal U minimizing z (U) over all U in .

Quality of fit for CKE models

Consider any TD motif or TI motif . We have seen how to compute a parameter vector w optimizing the quality of fit between microarray data and our CKE model for . This was done by minimizing ERR(w)= max t |f(t)- f ̂ (t)|, where f(t) is the main output variable of , and f ̂ (t) is the estimation of f(t) based on the CKE parametrized by w.

For the optimal CKE parametrization w , the lack of fit to data can then be evaluated by E R R(w ). However we have seen in section ‘Invariance by affine profile transformations’ that when comparing two expression profiles recorded by microarray, natural distances between profiles should be roughly invariant by changes of scale for these profiles. It would then be tempting to replace the absolute error of estimation |f(t)- f ̂ (t)| at time t by the relative error of estimation | f ( t ) - f ̂ ( t ) | f ( t ) . But relative errors become quite large whenever the output profile f((t) is close to zero. To avoid such spuriously large error values, while still preserving scale invariance whenever f(t) is not close to zero, we define the Smoothed Relative Error of estimation S R E(t) at time t by the following formula, where f ̄ denotes the mean value of the profile f,

SRE(t)= | f ( t ) - f ̂ ( t ) | f ( t ) whenf(t)>0.15 f ̄
(24)
SRE(t)= | f ( t ) - f ̂ ( t ) | f ̄ whenf(t)0.15 f ̄
(25)

We finally quantify the Modeling Error MODER for the optimally parametrized CKE model of motif by

MODER = max t SRE ( t )

Results and discussion

Examples of application

We implemented our model of TI on microarray data of mouse stem cells undergoing RA-induced differentiation, as provided by LC Science Inc, and previously analyzed by classical techniques in [18]. We took the recorded expression profiles for proteins/mRNAs GCNF, Oct4, Nanog and Sox2 at time points (0, 1.5, 3, 6)/(0, 3, 6) and expression levels for 266 miRNAs on days 0, 1, 3, 6 from [38] during ES cell differentiation. These profile data were interpolated at 19 intermediary time points, by Piecewise Cubic Hermite Interpolation (PCHIP) and the number of parameters were limited to be 4, i.e. only 1 upstream miRNA was selected for the model, to satisfy the parameter parsimony requirement.

For miRNA M i , the following linear transformation, which could be viewed as normalization, was done:

m i ̂ t = m i t - m i ̄ t σ m i + 1

where σ( m i )= t ( m i ( t ) - m i ̄ ( t ) ) 2 , t=0,1/3,…,6. Since m i ( t ) - m i ̄ ( t ) σ ( m i ) 1, m i ̂ (t) is positive for t=0,1/3,…,6. Taking ε=0.15, we applied Minimal Net clustering (the MATLAB code can be downloaded through Additional file 1) to the transformed data of miRNAs, m i ̂ (t),i=1,2,,266, and obtained 107 clusters, in which the maximum cluster contains 14 miRNAs. We consider that the miRNAs belonging to the same cluster share the same normalized expression level within a negligibly small error. For each cluster C L j , j=1,…,107, we determine the miRNA M C j which is the representative of the cluster C L j for the distance D, and we let m c j (t) be the expression level of M C j at time t. We call m c j (t) is the expression level of cluster C L j .

We successively implemented the TI model with only one repressor miRNA (the MATLAB code can be downloaded through Additional file 2). After parametric modeling of our pre-selected 107 TI motifs, and evaluation of their quality of fit, we have validated only 3 clusters of miRNAs as translational inhibitors repressing protein Oct4. If an miRNA in C L j is validated by modeling as a potential repressor, all other miRNAs belonging to C L j are also potential repressors and can be validated numerically as well by the form invariability of the model under affine transformation. Here we present one of these three validated miRNAs of Oct4 (see Figure 2), where the centroid of the cluster is miRNA mmu-miR-10a, while the other 6 miRNAs in the cluster are mmu-miR-203, 330, 342, 470 and 99b. We used TarBase [39] to search all the experimentally validated miRNAs targeting Oct4 (Pou5f1) in Mus musculus. By TarBase, only mmu-miR-470 has been experimentally validated and it is also numerically validated by our TI model.

Figure 2
figure 2

Example of validated TI motif. Example of TI motif repressing Oct4. All expression profiles are over days 0–6. Upper 6 profiles: normalized expression levels of numerically validated miRNAs in the same cluster: mmu-miR-10a, 203, 330, 342, 470 and 99b. Bottom 2 profiles: Blue line = recorded levels. Red dash line = predicted levels. “MODER” is the model global relative error of prediction.

For protein GCNF, only two miRNAs, mmu-let-7b and mmu-let-181a, have been experimentally validated by TarBase and both of them belong to the list of the 20 miRNAs numerically validated by our TI modeling (we presented the validated cluster containing mmu-let-7b in Figure 3). Our modeling approach did not validate any miRNA repressing both proteins Nanog and Sox2, while there are 3 miRNAs and 1 miRNA experimentally validated as separate repressors of Nanog and Sox2 respectively.

Figure 3
figure 3

Example of validated TI motif. Example of TI motif repressing GCNF. All expression profiles are over days 0–6. Upper 6 profiles: normalized expression levels of numerically validated miRNAs in the same cluster: mmu-let-7b, let-7g, let-7i, miR-106b, miR-122a, miR-98. Bottom 2 profiles: Blue line = recorded levels. Red dash line = predicted levels. “MODER” is the model global relative error of prediction.

We here also present one example of TD motif for downstream factor mRNA Sox2 (Figure 4). With the assumption that the transcription factors are proteins Oct4 and Nanog [38], we validated cluster mmu-miR-134, 30a-3p, 30b, 335, 431, 433-3p, 434-3p, and 487b as degraders. Although (mmu-miR-134, Sox2) is also an experimentally tested pair, we will not discuss deep in detail the validation results of the TD motifs in this paper. The main reason is that the validation results of TD motifs depends much on our knowledge of the transcription factors. More discussion is in the subsection below.

Figure 4
figure 4

Example of validated TD motif. Example of TD motif repressing Sox2. All expression profiles are over days 0–6. Upper 8 profiles: normalized expression levels of numerically validated miRNAs in the same cluster: mmu-miR-134, 30a-3p, 30b, 335, 431, 433-3p, 434-3p, 487b. Bottom 2 profiles: Blue line = recorded levels. Red dash line = predicted levels. “MODER” is the model global relative error of prediction.

Discussion

In [38], we pre-selected the potential miRNAs for each gene/protein by TargetScan 5.0 and miRanda before applying the two models. The pre-selection of miRNA candidates were not necessary though it greatly reduced the computational cost. However, the application of the CKE modeling was dependant on the target prediction algorithm, such as TargetScan or miRanda. Therefore, we introduced Minimal Net Clustering in this paper so that the data was condensed and the computational cost could be reduced by a purely numerical method without biological bias.

Since the results of TD model depend on information of transcription factors, the modeling validates not only miRNAs acting as mRNA degraders but also upstream transcription factors simultaneously. In [38] we numerically validated proteins GCNF, Oct4 and Nanog as transcription factors for mRNAs Oct4 and Nanog, while Marson et al. and Boyer et al. [40, 41] claimed that Oct4 and Sox2 are bounded and act together with Nanog as transcription factors. Considering the impact of the transcription factors, the TD modeling may be less convincing unless the transcription factors are fixed as experimentally validated. In this paper we presented more details on the implementation and validation results of the TI model in order to focus on the validation of miRNAs and avoid the influence of assumptions of transcription factors.

TarBase shows that, for protein GCNF the experimentally validated miRNAs are mmu-let-7b and mmu-miR-181a; for protein Oct4 the experimentally validated miRNA is mmu-miR-470; for protein Nanog the experimentally validated miRNAs are mmu-miR-134, mmu-miR-470, and mmu-miR-296; for protein Sox2 the experimentally validated miRNA is mmu-miR-134. In this paper, we have validated by TI modeling and minimal net clustering all the experimentally tested miRNA repressors of GCNF and Oct4. In our previous work [38], actually none of these miRNA repressors had yet been studied for modeling for the four proteins because of the restriction of pre-selection. And only the pair (mmu-miR-181a, GCNF) was validated by the classical correlation analysis done in Gu et al. [18] for the 4 proteins GCNF, Oct4, Nanog, Sox2. Therefore, the TI modeling combined with data condensation not only reduced computational cost but also clearly extended the set of miRNA inhibitors validated by model fitting to microarray data.

Since each numerically validated miRNA cluster may contain two or more miRNAs, the miRNAs in the same cluster could also be considered as potential candidate inhibitors for further experiments to validate. For instance, as Figure 2 shows, mmu-miR-203, 330, 342, 10a and 99b are potential candidates for protein Oct4 for they are in the same cluster as mmu-miR-470, which is validated by both the numerical modeling and experiments.

After the pair (mmu-miR-181a, GCNF) was well validated by our CKE model, we found that the cluster (see Figure 5, top left) containing mmu-miR-181a also includes mmu-miR-103 and mmu-miR-107, which are two known miRNAs that have the same roles in regulating insulin sensitivity and promoting metastasis of colorectal cancer [42, 43]. We also checked that miR-103 and miR-107 have almost the same mature sequences:

  • mmu-miR-103: AGCAGCAUUGUACAGGGCUAUGA

  • mmu-miR-107: AGCAGCAUUGUACAGGGCUAUCA

Figure 5
figure 5

miRNA clusters and analysis of mature sequences and expression levels. Top left, expression profiles for a cluster of 7 miRNAs. Top right, expression profiles for a cluster of 6 miRNAs. Bottom left, the blue curve is the quantile curve of expression distances for miRNA pairs with high alignment scores (N W A>10), the red curve is the analogous quantile curve for miRNA pairs with low alignment scores (N W A≤10). Note that the blue curve lies below the red curve indicates that when NWA increases, then expression distance tends to stochastically decrease. Bottom right, similar to bottom left but the NWA threshold changes to N W A=15.

After the pair (mmu-let-7b, GCNF) was well validated by our CKE model, we observed mmu-let-7g, and mmu-let-7i are in the same cluster as mmu-let-7b (see Figure 5, top right). It was claimed that let-7b and 7g reduce tumor growth in mouse models of lung cancer [44]. We then checked that indeed these three miRNAs have very similar mature sequences, namely

  • mmu-let-7b: UGAGGUAGUAGGUUGUGUGGUU

  • mmu-let-7g: UGAGGUAGUAGUUUGUACAGU

  • mmu-let-7i: UGAGGUAGUAGUUUGUGCUGU

To evaluate the correlation between mature sequence and expression profile of our set of 266 miRNAs, we systematically explored all the 266×265/2=35,245 pairs (m i r i ,m i r j ) of distinct miRNAs in this set, ij,i,j=1,…,35245. For each such pair we then computed the Euclidean distance between the expression level profiles (expression distance in short) and the Needleman-Wunsch alignment score (NWA) between the mature sequences of each miRNA pair. We then divided the 35,245 miRNA pairs into two groups: GroupHigh includes the pairs with high alignment score (N W A>10), i.e. GroupHigh includes miRNA pairs with similar mature sequences, and GroupLow includes the pairs with low alignment score (N W A≤10). We then compared the distribution of expression distances for all miRNA pairs in GroupHigh with the distribution of these distances for miRNA pairs in GroupLow. As seen in Figure 5 (bottom left), the quantiles of these distances in GroupLow are consistently larger than the corresponding quantiles in GroupHigh. This is fully confirmed by Kolmogorov-Smirnov test which yielded the very significant p-value 7×10-32. The result still holds when we change the alignment score threshold N W A=10 used to define GroupHigh and GroupLow (see Figure 5, bottom right, where the NWA threshold is now N W A=15). We conclude that miRNAs having similar mature sequences tend, with high probability, to have similar expression levels. The above analysis and examples indicate that miRNAs belonging to the same cluster are good candidates to have similar mature sequence. Since the match between miRNA mature sequence and target sites is the main determinant for miRNA targets, miRNAs belonging to same cluster may hence also have similar regulatory roles.

Robustness of parameter estimation

Estimation algorithms for nonlinear models may yield parameter estimates that are dependent on the particular set of data or on initial estimates of parameters. Since our parameter estimation algorithm is independent from initial estimates of parameters, we now focus on on the measurements errors affecting microarray data and on their impact for parameter estimation. Considering that the noises of the microarray data are not negligible, we have analyzed the robustness of our parameters estimators when one perturbs randomly the observed expression levels of miRNAs. We have selected an arbitrary validated model MD, where the corresponding downstream factor is denoted as D, and has expression levels d(t). Denote miRNAs M 1,…,M j pertaining to model MD and call their expression levels m 1(t),…,m j (t). Then we have perturbed the expression levels of the miRNAs by independent random noises having the recorded standard deviation σ 1(t),…,σ j (t) and obtained simulated expression levels s m 1(t),…,s m j (t). After injecting the perturbed expression levels s m 1(t),…,s m j (t) into the model MD and get the predicted expression level p d(t) of D. With p d(t),m 1(t),…,m j (t) and expression levels of other upstream factors, we then applied our parameter estimation algorithm to re-estimate the model parameters. This procedure was repeated 100 times. Then for each model parameter, we plotted the histograms of those 100 re-estimated parameter values and compared them with the parameter values estimated from unperturbed data of model MD. This analysis showed that our parameter estimation algorithm is quite robust. Here we present the histograms of perturbed estimates of model parameters for one TD motif (Figure 6) and one TI motif (Figure 7).

Figure 6
figure 6

Histogram of re-estimated parameters for a TD model. a) the histogram of re-estimated parameter β, degradation rate of Sox2, the originally estimated β value of the model is below the graph. b) the histogram of re-estimated parameter κ, transcription rate of Sox2, the originally estimated κ value of the model is below the graph. c) the histogram of re-estimated parameter v, reaction rate of Sox2 and mmu-mir-21, the originally estimated v value of the model is below the graph. d) the histogram of re-estimated parameter μ 1, reaction constant of Sox2 and Oct4, the originally estimated μ 1 value of the model is below the graph. e) the histogram of re-estimated parameter μ 2, reaction constant of Sox2 and Nanog, the originally estimated μ 2 value of the model is below the graph. f) the histogram of re-estimated parameter S 1,S 2, number of binding sites of Oct4 and Nanog on Sox2 respectively, the originally estimated S 1 and S 2 value of the model is below the graph.

Figure 7
figure 7

Histogram of re-estimated parameters for a TI model. a) the histogram of re-estimated parameter γ, degradation rate of GCNF, the originally estimated γ value of the model is below the graph. b) the histogram of re-estimated parameter λ, translation rate of GCNF, the originally estimated λ value of the model is below the graph. c) the histogram of re-estimated parameter μ, reaction constant of GCNF and mmu-let-7b, the originally estimated μ value of the model is below the graph. d) the histogram of re-estimated parameter S, number of binding sites of mmu-let-7b on GCNF respectively, the originally estimated S value of the model is below the graph.

Conclusion

We have separately modeled by chemical kinetics equations the 2 distinct modalities of the repressive actions of miRNAs on post-transcriptional processes of mRNA genes and the associated proteins. This was achieved by first defining the formal structure of two types of interaction architectures (Transcription Degradation motifs and Translation Inhibition motifs) linking miRNAs to subgroups of mRNA genes. The plausibility of each one of these potential TD motifs or TI-motifs was then evaluated by computerized parametric modeling, based on microarray data, of adequate formal chemical kinetics equations (CKEs).

We have sketched the formal derivation of 2 specific CKEs modeling by dynamic ODEs the interactions between concentrations of different species of molecules involved in each architecture. This led to a motif validation strategy based on the quantified quality of fit between our optimally parametrized models and the corresponding microarray data.

Our computerized parameter estimation is implemented by an innovative fast algorithm that does not require knowledge of range of molecular reaction rates. On a current standard laptop PC, our implementation of parameter estimation for a typical 9-parameters CKE model requires about 5 minutes of computing time.

Our parameter estimation algorithm also provides relatively high-quality optimization for the fit between model and microarray data, by integrating both global and local cost minimization techniques, in contexts where plausible ranges of values for most of the unknown parameters are not available in the literature. By perturbing the expression levels of miRNAs and re-estimating the parameters, we showed that our parameter algorithm has a satisfactory level of robustness. We believe that our parameter estimation technique with associated evaluation of quality of fit would be quite applicable as a generic algorithm to similar problems in chemical kinetics modeling of molecular interactions.

Modeling very large microarray data is computationally quite expensive. We have hence sketched clustering methods to condense large microarray data. This approach has of course been attempted before our work, but the main point is that we have carefully studied the mathematical compatibility of our CKE models with condensation of the profiles data. Since we have proved that the abstract form of our CKE models is invariant by arbitrary multiple affine transformations of profiles data, we have made sure to constrain the distance of two expression levels profiles to be invariant by these types of affine transformations.

We have implemented a Minimal Net Clustering algorithm based on this distance, which allows us to control the radius of the clusters. The number of CKEs to parameterize can be strongly reduced after condensation of the large data sets, and the affine invariance of our CKEs show that the condensed genes network can then still be modeled by similar CKEs.

By applying our TI modeling to multiple proteins such as GCNF, Oct4, Nanog, Sox2, we showed that 3 miRNA-target pairs experimentally validated can be also validated by the TI model.

References

  1. Ambros V:The functions of animal microRNAs. Nature. 2004, 431 (7006): 350-355. 10.1038/nature02871.

    Article  PubMed  Google Scholar 

  2. Aravin A, Tuschl T:Identification and characterization of small RNAs involved in RNA silencing. FEBS Lett. 2005, 579 (26): 5830-5840. 10.1016/j.febslet.2005.08.009.

    Article  PubMed  Google Scholar 

  3. Bartel DP:MicroRNAs: target recognition and regulatory functions. Cell. 2009, 136 (2): 215-233. 10.1016/j.cell.2009.01.002.

    Article  PubMed Central  PubMed  Google Scholar 

  4. Tay Y, Zhang J, Thomson AM, Lim B, Rigoutsos I:MicroRNAs to Nanog, Oct4 and Sox2 coding regions modulate embryonic stem cell differentiation. Nature. 2008, 455: 1124-1128. 10.1038/nature07299.

    Article  PubMed  Google Scholar 

  5. Wang XJ, Reyes JL, Chua NH, Gaasterland T:Prediction and identification of arabidopsis thaliana microRNAs and their mRNA targets. Genome Biol. 2004, 5 (9): R65-10.1186/gb-2004-5-9-r65.

    Article  PubMed Central  PubMed  Google Scholar 

  6. Kawasaki H, Taira K:MicroRNA-196 inhibits HOXB8 expression in myeloid differentiation of HL60 cells. Nucleic Acids Symp Ser. 2004, 48 (48): 211-212.

    Article  Google Scholar 

  7. Moxon S, Jing R, Szittya G, Schwach F, RPR L, Moulton V, Dalmay T:Deep sequencing of tomato short RNAs identifies microRNAs targeting genes involved in fruit ripening. Genome Res. 2008, 18 (10): 1602-1609. 10.1101/gr.080127.108.

    Article  PubMed Central  PubMed  Google Scholar 

  8. Williams AE:Functional aspects of animal microRNAs. Cell Mol Life Sci. 2008, 65 (4): 545-562. 10.1007/s00018-007-7355-9.

    Article  PubMed  Google Scholar 

  9. Mazière P, Enright AJ:Prediction of microRNA targets. Drug Discov Today. 2007, 12 (11–12): 452-458.

    Article  PubMed  Google Scholar 

  10. Lewis BP, Shih IH, Jones-Rhoades MW, Bartel DP, Burge C:Prediction of mammalian microRNA targets. Cell. 2003, 115: 787-798. 10.1016/S0092-8674(03)01018-3.

    Article  PubMed  Google Scholar 

  11. Lewis BP, Burge CB, Bartel D:Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell. 2005, 120: 15-20. 10.1016/j.cell.2004.12.035.

    Article  PubMed  Google Scholar 

  12. Brennecke J, Stark A, Russell RB, Cohen S:Principles of microRNA-target recognition. PLoS Biol. 2005, 3: e85-10.1371/journal.pbio.0030085.

    Article  PubMed Central  PubMed  Google Scholar 

  13. Krek A, Grun D, Poy MN, Wolf R, Rosenberg L, Epstein E, MacMenamin P, da Piedade I, Gunsalus KC, Stoffel M, Rajewsky N:Combinatorial microRNA target predictions. Nat Genet. 2005, 37: 495-500. 10.1038/ng1536.

    Article  PubMed  Google Scholar 

  14. Chi SW, Hannon GJ, Darnell RB:An alternative mode of microRNA target recognition. Nat Struct Mol Biol. 2012, 19 (3): 321-327. 10.1038/nsmb.2230.

    Article  PubMed Central  PubMed  Google Scholar 

  15. Baek D, Villé J, Shin C, Camargo FD, Gygi SP, Bartel DP:The impact of microRNAs on protein output. Nature. 2008, 455: 64-71. 10.1038/nature07242.

    Article  PubMed Central  PubMed  Google Scholar 

  16. Selbach M, Schwanhäusser B, Thierfelder N, Fang Z, Khanin R, Rajewsky N:Widespread changes in protein synthesis induced by microRNAs. Nature. 2008, 455: 58-63. 10.1038/nature07228.

    Article  PubMed  Google Scholar 

  17. Mourelatos Z:Small RNAs: The seeds of silence. Nature. 2008, 455: 44-45. 10.1038/455044a.

    Article  PubMed  Google Scholar 

  18. Gu P, Reid JG, Gao X, Shaw CA, Creighton C, Tran PL, Zhou X, Drabek RB, Steffen DL, Hoang DM, Weiss MK, Naghavi AO, El-daye J, Khan MF, Legge GB, Wheeler DA, Gibbs RA, Miller JN, Cooney AJ, Gunaratne PH:Novel miRNA candidates and miRNA-mRNA pairs in ES cells. PLoS ONE. 2008, 3 (7): e2548-10.1371/journal.pone.0002548.

    Article  PubMed Central  PubMed  Google Scholar 

  19. de Jong H:Modeling and simulation of genetic regulatory systems: a literature review. J Comput Biol. 2002, 9 (1): 67-103. 10.1089/10665270252833208.

    Article  PubMed  Google Scholar 

  20. Goutsias J, Kim S:A nonlinear discrete dynamical model for transcriptional regulation: construction and properties. Biophys J. 2004, 86 (4): 1922-1945. 10.1016/S0006-3495(04)74257-5.

    Article  PubMed Central  PubMed  Google Scholar 

  21. Cornish-Bowden A: Fundamentals of Enzyme Kinetics, 3rd Edition. 2004, London: Portland Press

    Google Scholar 

  22. Goutsias J, Lee NH:Computational and experimental approaches for modeling gene regulatory networks. Curr Pharm Des. 2007, 13: 1415-1436. 10.2174/138161207780765945.

    Article  PubMed  Google Scholar 

  23. Moore JW, Pearson RG: Kinetics and Mechanism, 3rd Edition. 1981, New York: Wiley

    Google Scholar 

  24. Slonim DK, Yanai I:Getting started in gene expression microarray analysis. PLoS Comput. Biol. 2009, 5 (10): e1000543-10.1371/journal.pcbi.1000543.

    Article  PubMed Central  PubMed  Google Scholar 

  25. Engl HW, Flamm C, Kügler P, Lu J, Müller S, Schuster P:Inverse problems in systems biology. IOP Sci. 2009, 25: 123014-

    Google Scholar 

  26. Arkin A, Ross J, McAdams HH:Stochastic kinetic analysis of developmental pathway bifurcation in phage l-infected escherichia coli cells. Genetics. 1998, 149: 1633-1648.

    PubMed Central  PubMed  Google Scholar 

  27. Wang Y, Liu CL, Storey JD, Tibshirani RJ, Herschlag D, Brown PO:Precision and functional specificity in mRNA decay. Proc Natl Acad Sci. 2002, 99 (9): 5860-5865. 10.1073/pnas.092538799.

    Article  PubMed Central  PubMed  Google Scholar 

  28. Meir E, Munro EM, Odell GM, von Dassow G:Ingeneue: a versatile tool for reconstituting genetic networks, with examples from the segment polarity network. J Exp Zool. 2002, 294: 216-251. 10.1002/jez.10187.

    Article  PubMed  Google Scholar 

  29. Müller S, Hofbauer J, Endler L, Flamm C, Widder S, Schuster P:A generalized model of the repressilator. J Math Biol. 2006, 53: 905-937. 10.1007/s00285-006-0035-9.

    Article  PubMed  Google Scholar 

  30. Widder S, Schicho J, Schuster P:Dynamic patterns of gene regulation I: simple two-gene systems. J Theor Biol. 2007, 246: 395-419. 10.1016/j.jtbi.2007.01.004.

    Article  PubMed  Google Scholar 

  31. Hornstein E, Shomron N:Canalization of development by microRNAs. Nat Genet. 2006, 38: S20-S24. 10.1038/ng1803.

    Article  PubMed  Google Scholar 

  32. Azencott R, Coldefy F, Younes L:A distance for elastic matching in object recognition. Pattern Recognition, Proceedings of 13th Int. Conf. ICPR 96, vol. 1. 1996, IEEE, 687-691.

    Chapter  Google Scholar 

  33. Periwal V, Chow CC, Bergman RN, Ricks M, Vega GL, Sumner AE:Evaluation of quantitative models of the effect of insulin on lipolysis and glucose disposal. Am J Physiol Regul Integr Comp Physiol. 2008, 295: R1089-R1096. 10.1152/ajpregu.90426.2008.

    Article  PubMed Central  PubMed  Google Scholar 

  34. Gregory PC: Bayesian Logical Data Analysis for the Physical Sciences: A Comparative Approach With Mathematical Support. 2005, London: Cambridge University Press

    Book  Google Scholar 

  35. Küegler P, Gaubitzer E, Müller S:Parameter identification for chemical reaction systems using sparsity enforcing regularization: a case study for the Chlorite-Iodide Reaction. J Phys Chem A. 2009, 113: 2775-2785. 10.1021/jp808792u.

    Article  Google Scholar 

  36. Vapnik V: Statistical Learning Theory. 1998, New York: Wiley

    Google Scholar 

  37. Boyd S, Vandenberghe L: Convex Optimization. 2004, Cambridge: Cambridge University Press

    Book  Google Scholar 

  38. Luo Z, Xu X, Gu P, Lonard D, Gunaratne P, Cooney AJ, Azencott R:Regulatory circuits of miRNAs in ES cell differentiation: a chemical kinetics modeling approach. PLoS One. 2011, 6 (10): e23263-10.1371/journal.pone.0023263.

    Article  PubMed Central  PubMed  Google Scholar 

  39. Vergoulis TI, Vlachos P, Alexiou G, Georgakilas M, Maragkakis M, Reczko M, Gerangelos S, Koziris N, Dalamagas T, Hatzigeorgiou A:Tarbase 6.0: capturing the exponential growth of miRNA targets with experimental support. Nucl Acids Res. 2012, 40 (D1): D222-D229. 10.1093/nar/gkr1161.

    Article  PubMed Central  PubMed  Google Scholar 

  40. Marson A, Levine SS, Cole MF, Frampton GM, Brambrink T, Johnstone S, Guenther MG, Johnston WK, Wernig M, Newman J, Calabrese JM, Dennis LM, Volkert TL, Gupta S, Love J, Hannett N, Sharp PA, Bartel DP, Jaenisch R, YR A:Connecting microRNA genes to the core transcriptional regulatory circuitry of embryonic stem cells. Cell. 2008, 10: 1016-

    Google Scholar 

  41. Boyer LA, Lee TI, Cole MF, Johnstone SE, Levine SS, Zucker JP, Guenther MG, Kumar RM, Murray HL, Jenner RG, Gifford DK, Melton DA, Jaenisch R, Young RA:Core transcriptional regulatory circuitry in human embryonic stem cells. Cell. 2005, 122 (6): 947-956. 10.1016/j.cell.2005.08.020.

    Article  PubMed Central  PubMed  Google Scholar 

  42. Trajkovski M, Hausser J, Soutschek J, Bhat B, Akin A, Zavolan M, Heim MH, Stoffel M:MicroRNAs 103 and 107 regulate insulin sensitivity. Nature. 2011, 474 (7353): 649-653. 10.1038/nature10112.

    Article  PubMed  Google Scholar 

  43. Chen HY, Lin YM, Chung HC, Lang YD, Lin CJ, Huang J, Wang WC, Lin FM, Chen Z, Huang HD, Shyy JY, Liang JT, Chen RH:miR-103/107 promote metastasis of colorectal cancer by targeting the metastasis suppressors DAPK and KLF4. Cancer Res. 2012, 72 (14): 3631-3641. 10.1158/0008-5472.CAN-12-0667.

    Article  PubMed  Google Scholar 

  44. Esquela-Kerscher A, Trang P, Wiggins JF, Patrawala L, Cheng A, Ford L, Weidhaas JB, Brown D, Bader AG, Slack FJ:The let-7 microRNA reduces tumor growth in mouse models of lung cancer. Cell Cycle. 2008, 7 (6): 759-764. 10.4161/cc.7.6.5834.

    Article  PubMed  Google Scholar 

Download references

Acknowledgements

We would like to thank Drs P. Gunaratne, A. J. Cooney, D. Lonard, X. Xu for multiple discussions concerning their experimental data and results on miRNA-mRNA interactions. These discussions led us to a joint publication.

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Zijun Luo or Yi Zhao.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

ZL developed the chemical kinetics equations and the parameter estimation algorithm; RA proposed the transformation invariance of chemical kinetics equations and data condensation method; YZ did the computation work. ZL and RA wrote the paper. All authors read and approved the final manuscript.

Electronic supplementary material

Authors’ original submitted files for images

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reprints and permissions

About this article

Cite this article

Luo, Z., Azencott, R. & Zhao, Y. Modeling miRNA-mRNA interactions: fitting chemical kinetics equations to microarray data. BMC Syst Biol 8, 19 (2014). https://doi.org/10.1186/1752-0509-8-19

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/1752-0509-8-19

Keywords