Laboratory of Biosystem Dynamics, Computational Systems Biology Research Group, Department of Signal Processing, Tampere University of Technology, 33101 Tampere, Finland

Institute for Systems Biology, 1441 N 34th St, Seattle, WA, 98103-8904, USA

Abstract

Background

In

Results

We measured the

Conclusions

The two rate-limiting steps in initiation are found to be important regulators of the dynamics of RNA production under the control of the lar promoter in the regimes of weak and medium induction. Variability in the intervals between consecutive RNA productions is much lower than if there were only one rate-limiting step with a duration following an exponential distribution. The methodology proposed here to analyze the

Background

Gene expression is inherently stochastic and most RNA molecules exist in very low copy numbers in

RNA numbers depend on the kinetics of their production and degradation. A genome-wide study of degradation rates of RNA molecules in

The mean rate of transcription of a gene is mostly determined by the promoter sequence as well as by the present concentrations of possible activator and repressor molecules. In bacteria, the process of transcription initiation at the promoter region includes diffusion of the RNA polymerase (RNAp) along the template until reaching a transcription start site (TSS), DNA bending and loading in the active site of the RNAp, DNA unwinding and positioning in the TSS, loading of the NT strand, and assembly of the clamp/jaw on downstream DNA

The durations of the rate-limiting steps in initiation vary widely between promoters, even when the sequences only differ slightly ^{2+ }and other metabolites

The initiation mechanism is dynamically complex as it involves, e.g., uni-dimensional diffusion of the RNAp on the DNA template and conformational changes of the RNAp and template _{1 }to I_{3 }are intermediates of the isomerization step. The last step in (1) competes with abortive initiation

All steps in (1), except for the last one, are reversible

Let t(RP_{c}) be the duration of the closed complex formation (first step in (2)), which includes the time for the RNAp to find the TSS. Also, let t(RP_{o}) be the duration of the open complex formation (second step in (2)), and let t(RP_{cl}) be the time for RNA chain elongation initiation and promoter clearance (third step in (2)). Finally, let t_{pt} be the time to start a productive transcription, equal to the sum of t(RP_{c}), t(RP_{o}) and t(RP_{cl}). t_{pt} is of the order of 10-1000 seconds, depending on the concentrations of inducers and environmental factors such as temperature.

The

Recently, a method was developed in

The kinetics of transcription initiation of the lar promoter, as well as of several variants, have been studied

Here, we report

Results

We measured the dynamics of transcript production for weak, medium and full induction of the lar promoter (see Methods and Additional File

**Example of a movie generated from temporal images of a cell**. Images were taken approximately 7 min after induction, one every minute, for approximately 2 hours. The cell was induced with 0.01 mM of IPTG and 0.067 mM of Arabinose. The cell identification number and the time (s) when the frame was captured are shown in the top right and left corners, respectively.


The difference in the mean rate of production of mRNAs between weak, medium and high induction levels was confirmed with qPCR (Additional File

**Supplementary information**. Supplementary information: qPCR analysis of the target RNA; image analysis and cell segmentation, detection and counting of mRNA in cells; analyses of the intervals between production events assuming an ON-OFF mechanism of RNA production; measurements of RNA numbers under full induction.


The distributions of intervals between consecutive productions of transcripts, for weak and medium induction are shown in Figure


**Histogram of the measured intervals superimposed with the probability density functions of the models**. Distributions of intervals between consecutive transcription events for weak (left) and medium (right) inductions. Each bar is 180 s. Measurement time is 2 hours (measured every 60 s). (A) mean of measured intervals is 2233 s and standard deviation is 1506 s (data from 233 intervals extracted from 283 cells). (B) mean of measured intervals is 1433 s and standard deviation is 1243 s (data from 99 intervals extracted from 40 cells). The histograms of measured intervals are superimposed with probability density functions of models with 1, 2 and 3 steps that best fit the data. Dotted line: 1-step model, solid line: 2-step model and dashed line: 3-step model (partially covered by solid line).

Log-likelihood and duration of the steps of the models

| **d** | **Weak: log-likelihood** | **Weak: duration of steps (s)** | **Medium: log-likelihood** | **Medium: duration of steps (s)** |
|---|---|---|---|---|
| 1 | -2029.0 | 2233 | -818 | 1433 |
| 2 | -2000.8 | 1116, 1116 | -801 | 716, 716 |
| 3 | -2000.5 | 1099, 1099, 35 | -800 | 640, 640, 152 |
| 4 | -2000.4 | 1095, 1095, 21, 21 | -800 | 640, 640, 152, 0 |

Log-likelihood and duration of the steps of the models with

The goodness of fit of the models can be assessed by a likelihood-ratio test between pairs of models to reject a null model in favour of the alternative. The results in Table

Likelihood-ratio test between the models

| **(d_{0}, d_{1})** | **Weak: p-value** | **Medium: p-value** |
|---|---|---|
| (1, 2) | 3.10 × 10^{-14} | 3.57 × 10^{-9} |
| (2, 3) | 0.4451 | 0.0955 |
| (3, 4) | 0.7186 | 1 |

Likelihood-ratio test between pairs of models. The null model is a d_{0}-step model (where d_{0} equals 1, 2, or 3), while the alternative model is a d_{1}-step model (where d_{1} = d_{0} + 1), in the regimes of weak and medium induction. For p-values smaller than 0.01, it is generally accepted that the null model should be rejected in favour of the alternative. The single-step model is insufficient in both regimes.
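As a rough illustration of this test, the sketch below computes p-values from the rounded log-likelihoods listed in the table of log-likelihoods above. It assumes that twice the log-likelihood difference is compared against a chi-square distribution with one degree of freedom (one extra step in the alternative model). This reference distribution and the helper name `lrt_p_value` are assumptions, not details given in the text. Because the tabulated log-likelihoods are rounded, the resulting p-values only approximate the tabulated ones.

```python
import math


def lrt_p_value(loglik_null, loglik_alt):
    """P-value of a likelihood-ratio test with one extra parameter.

    Assumption: the statistic 2 * (loglik_alt - loglik_null) is referred
    to a chi-square distribution with 1 degree of freedom, whose survival
    function is erfc(sqrt(x / 2)).
    """
    stat = max(2.0 * (loglik_alt - loglik_null), 0.0)
    return math.erfc(math.sqrt(stat / 2.0))


# Rounded log-likelihoods, copied from the table (keys are d).
weak = {1: -2029.0, 2: -2000.8, 3: -2000.5, 4: -2000.4}
medium = {1: -818.0, 2: -801.0, 3: -800.0, 4: -800.0}

for name, loglik in (("weak", weak), ("medium", medium)):
    for d0 in (1, 2, 3):
        p = lrt_p_value(loglik[d0], loglik[d0 + 1])
        print(f"{name:6s} ({d0}, {d0 + 1}): p = {p:.3g}")
```

With these inputs, only the (1, 2) comparisons fall below the 0.01 threshold, in line with the conclusion that one step is insufficient and that more than two steps are not supported.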

Since tagged RNA molecules are visible soon after completion, or even while elongating ^{3 }s, elongation only takes tens of seconds (two orders of magnitude smaller)

From all of the above, the events that shape the observed distributions of intervals need to occur during transcription initiation, between the finding of the TSS and initiation of a productive elongation by an RNA polymerase.

From Table

Finally, we compared our measurements with previous

Under maximum induction, we observed a production rate of approximately 4 RNA/h per cell. However, this rate of production was only observed if the cells were kept in liquid culture until the moment when they were imaged (see Additional File

Aside from the regime of full induction, the mean intervals measured

Discussion

Information on the kinetics of the intermediate steps of the multi-step process of transcription initiation in prokaryotes has been limited so far to mean values in

From the distribution of intervals between consecutive transcription events under the control of the lar promoter in the regimes of weak and medium induction, we inferred the number and duration of the rate-limiting steps in initiation. In both regimes, two rate-limiting steps with approximately equal duration were identified. Their durations were found to be longer, but of the same order of magnitude as the

The measurements in the regimes of weak and medium induction reflect the activity of the promoter, which is regulated by the repressors (LacI and AraC) and activators (IPTG and Arabinose), since the binding and unbinding of these molecules to the promoter is a process whose speed is orders of magnitude faster than the process of initiation. Due to this, the intervals between consecutive productions of RNA molecules reflect the kinetics of the promoter, rather than the binding and unbinding dynamics of these molecules

It is known from studies of models of gene expression

Recently, the mRNA copy-number statistics of various promoters were studied in

Our measurements for the lar promoter favour the model proposed here. However, to account for more complex repression and activation mechanisms that other promoters may have, our model may need to relax the assumption of the exponential duration of the closed complex formation.

Perhaps the most intriguing result reported here is that, in both induction regimes, the inferred time scales of the two rate-limiting steps are identical to each other. It is of interest to speculate whether this is due to some unknown artefact of the inference method or is representative of the real kinetics of transcription initiation of the lar promoter. As noted, we verified that our method of inference reliably distinguishes the duration of each step when they differ by ~25% or more in duration, from 200 intervals sampled from a model of gene expression. However, for smaller differences, the solution is biased towards inferring steps with identical durations, for unknown reasons. Given this, we believe that the inference method is biased towards identical values if and only if the two steps are similar in duration, resulting in a gamma distribution.

In vitro measurements also suggest that, while not identical, the two steps are similar in duration for both weak and medium inductions

Finally, the method used to fit the measured distributions cannot determine the order of the rate-limiting steps. Further measurements and analysis will be required to do so. Nevertheless, our results are informative of the

Conclusions

The intermediate steps of transcription initiation are a key regulator of the dynamics of RNA production under the control of the lar promoter in the regimes of weak and medium induction. Since transcription initiation in this promoter has at least two rate-limiting steps, the fluctuations in RNA numbers will be weaker than if there were only one exponentially distributed rate-limiting step with the same total mean duration. Consequently, cell-to-cell diversity in RNA numbers will also be smaller.
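The reduction in variability can be made concrete with the squared coefficient of variation (CV²) of the intervals. For a sum of independent exponential steps with means μ_{i}, the variance is the sum of μ_{i}² and the mean is the sum of μ_{i}, so CV² equals 1 for a single step and 1/2 for two equal steps. The sketch below compares these model values with the CV² implied by the means and standard deviations quoted in the figure caption. The function names are illustrative, and the calculation ignores the finite observation window.

```python
def cv2_of_steps(mus):
    """CV^2 of a sum of independent exponential steps with means `mus`."""
    total = sum(mus)
    return sum(m * m for m in mus) / (total * total)


def cv2_measured(mean, std):
    """CV^2 from a reported mean and standard deviation."""
    return (std / mean) ** 2


# Model values for the fitted step durations of the weak-induction regime.
print("1 step :", cv2_of_steps([2233]))
print("2 steps:", cv2_of_steps([1116, 1116]))

# Values implied by the reported interval statistics (in seconds).
print("weak   :", round(cv2_measured(2233, 1506), 3))
print("medium :", round(cv2_measured(1433, 1243), 3))
```

Both measured values lie below 1, the value expected for a single exponentially distributed step.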

Since most genes in

Methods

Expressing mRNA tagged with MS2d-GFP fusion protein in

The method of RNA detection and quantification was proposed in

Cells with both MS2d-GFP and transcript target plasmids were grown in Miller LB medium, supplemented with antibiotics according to the specific plasmids. For full induction of protein and RNA, cells were grown overnight at 37°C with aeration, then diluted into fresh medium to maintain exponential growth until reaching an optical density of OD600 ≈ 0.3-0.5. The inducer aTc (100 ng/ml) was added to fully induce MS2d-GFP production. Approximately 60 min of incubation allows sufficient production for RNA detection. Afterwards, expression of the target RNA is induced (see below).

Following induction, cells are placed on a microscope slide between a cover slip and a 0.8% LB-agarose gel pad, and visualized by fluorescence microscopy, using a Nikon Eclipse (TE2000-U, Nikon, Tokyo, Japan) inverted C1 confocal laser-scanning system with a 100× Apo TIRF (1.49 NA, oil) objective. GFP fluorescence is measured using a 488 nm laser (Melles-Griot) and a 515/30 nm detection filter. Images of cells are taken from each slide using C1 with Nikon software EZ-C1, starting approximately 7 min after induction, one per minute, for approximately 2 hours. Measurements under the microscope were made at room temperature (~24°C).

Maximum induction of target RNA is achieved with 1 mM of IPTG and 6.7 mM of arabinose

We measured the relative changes in mean mRNA numbers with induction strength with quantitative real time PCR. Target RNA was induced with low and high concentrations of inducers. From isolated RNA, complementary DNA was prepared and used for expression analysis

Forward: 5' TAC GAC GCC GAG GTC AAG 3'

Reverse: 5' TTG TGG GAG GTG ATG TCC A 3'

and for 16S rRNA:

Forward: 5' CGT CAG CTC GTG TTG TGA A 3'

Reverse: 5' GGA CCG CTG GCA ACA AAG 3'

For details and results of qPCR measurements see Additional File

Segmentation of cells, MS2-GFP-RNA spots in cells, RNA molecules from spots, and intervals between transcription events

We detect cells from raw images as in

The automatic spot detection method segments the MS2d-GFP-RNA spots with the kernel density estimation method for spot detection as in


**MS2d-GFP-tagged RNA molecules in E. coli cells**. Unprocessed image of MS2d-GFP-tagged RNA molecules in

By performing this analysis for each frame, it is possible to determine when new RNA molecules appear in the cell. From that, we calculate intervals between the productions of consecutive RNAs in individual cells. For a detailed description of this analysis, as well as examples, refer to

Finally, we only count intervals between consecutive RNA molecules which are produced in the same cell. If a cell division occurs, the interval between the last RNA produced in the mother cell and the first RNA produced in a daughter cell is not included in the counts of intervals between consecutive RNA molecules.
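The bookkeeping described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes that each cell has a per-frame series of RNA counts, that a rise above the highest count seen so far marks one production event at that frame time (a jump of more than one is recorded as a single event, and transient dips are ignored), and that mother and daughter cells are stored as separate series so that no interval spans a division. All names are hypothetical.

```python
def production_times(frame_times, rna_counts):
    """Frame times at which the RNA count of one cell rises to a new maximum."""
    events = []
    highest = rna_counts[0]
    for t, n in zip(frame_times[1:], rna_counts[1:]):
        if n > highest:  # new RNA detected in this frame
            events.append(t)
            highest = n
    return events


def intervals_within_cell(frame_times, rna_counts):
    """Intervals between consecutive production events in a single cell."""
    events = production_times(frame_times, rna_counts)
    return [later - earlier for earlier, later in zip(events, events[1:])]


# Hypothetical example: frame times in seconds, one frame per minute.
cells = {
    "mother": ([0, 60, 120], [0, 1, 2]),
    "daughter": ([180, 240, 300, 360], [1, 2, 2, 3]),
}

# Intervals are computed per cell and then pooled, so the gap between the
# last event in the mother and the first in the daughter is never counted.
pooled = [
    dt
    for times, counts in cells.values()
    for dt in intervals_within_cell(times, counts)
]
print(pooled)
```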

Fitting the model to a d-step model, each step with an exponentially distributed duration

Given the distribution of time intervals between consecutive transcription events, obtained from multiple cells subject to the same induction, it is possible to determine the maximum likelihood fit of a model with d steps. Given the vector of mean step durations μ = [μ_{1}, μ_{2}, ..., μ_{d}], and given N measured intervals between transcription events, Δt_{k}, where k goes from 1 to N, the log-likelihood is:

where π_{d} is the probability density function for a sum of d exponential random variables with means μ_{1}, ..., μ_{d}. The probability density function for the sum can be found by the convolution of the probability density functions of the individual exponential random variables. The density functions for d = 1, ..., 3 are:
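For reference, the standard forms that follow from the definitions above are sketched here. The log-likelihood is the sum of the log-densities of the measured intervals, and the densities are the convolutions of exponential densities with distinct means. The numbering (3) to (6) is assumed from the references to formulas (5) and (6) elsewhere in this section.

```latex
\begin{align}
\log L(\mu) &= \sum_{k=1}^{N} \log \pi_d(\Delta t_k;\, \mu) \tag{3} \\
\pi_1(t) &= \frac{1}{\mu_1}\, e^{-t/\mu_1} \tag{4} \\
\pi_2(t) &= \frac{e^{-t/\mu_1} - e^{-t/\mu_2}}{\mu_1 - \mu_2} \tag{5} \\
\pi_3(t) &= \sum_{i=1}^{3} \frac{\mu_i\, e^{-t/\mu_i}}{\prod_{j \neq i} (\mu_i - \mu_j)} \tag{6}
\end{align}
```

Forms (5) and (6) are undefined when two of the means coincide, which is the singularity discussed in this section.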

The values of μ = [μ_{1}, μ_{2}, ..., μ_{d}] are the expected means and standard deviations of the durations of each of the steps composing the intervals between production events (for an exponentially distributed duration, the mean and the standard deviation coincide). We use this procedure to find the values of μ that provide the highest log-likelihood for d = 1, ..., 4. No significant improvement of fit was observed for values of d > 2 (Table

We note that the singularities of the probability density functions, formulas (5) and (6), were not problematic since the maximum likelihood estimates of the μ's differed from the second decimal onward. Furthermore, the singularities can be removed. For example, in (5), if μ_{1} = μ_{2}, the singularity can be removed by various means (e.g. L'Hôpital's rule), so that π_{2} equals the density function of the gamma distribution with parameters k = 2 and θ = μ_{1} = μ_{2}.
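A minimal sketch of how the removable singularity can be handled numerically for the two-step model is given below. It assumes the standard two-exponential convolution density and switches to the gamma limit (k = 2, θ = μ) when the two means agree within a relative tolerance. The function names and the tolerance are illustrative choices, not taken from the study.

```python
import math


def two_step_pdf(t, mu1, mu2, rel_tol=1e-9):
    """Density of the sum of two exponential durations with means mu1, mu2."""
    if t < 0:
        return 0.0
    if math.isclose(mu1, mu2, rel_tol=rel_tol):
        # Limit mu1 -> mu2: gamma density with k = 2 and theta = mu.
        mu = 0.5 * (mu1 + mu2)
        return t * math.exp(-t / mu) / mu**2
    return (math.exp(-t / mu1) - math.exp(-t / mu2)) / (mu1 - mu2)


def log_likelihood(intervals, mu1, mu2):
    """Log-likelihood of measured intervals under the two-step model."""
    return sum(math.log(two_step_pdf(dt, mu1, mu2)) for dt in intervals)


# The general formula approaches the gamma limit as the means converge.
limit_value = two_step_pdf(1000.0, 1000.0, 1000.0)
nearby_value = two_step_pdf(1000.0, 1000.0, 1000.001)
print(limit_value, nearby_value)
```

In the equal-step limit the model is a gamma distribution with fixed shape k = 2, whose maximum likelihood scale is half the sample mean. This agrees with the fitted two-step durations of 1116 s and 716 s for mean intervals of 2233 s and 1433 s.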

The goodness of fit of the models can be assessed by comparison. For that, we perform a likelihood-ratio test between pairs of models to reject a null model in favour of the alternative. Finally, we verified that the method reliably distinguishes the duration of each step, when they differ by ~25% in duration, from 200 intervals sampled from a model of gene expression.
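A check of this kind can be sketched as follows: 200 intervals are sampled from a two-step model and then fitted with one-step and two-step models. Everything specific here is an assumption made for illustration: the true means (1250 s and 1000 s, a 25% difference), the random seed, and the coarse grid search, which stands in for whatever optimiser was actually used.

```python
import math
import random


def log_pdf_two_step(t, mu_long, mu_short):
    """Log density of the sum of two exponential steps (mu_long >= mu_short)."""
    if mu_long == mu_short:
        # Equal means: gamma density with k = 2 and theta = mu.
        return math.log(t) - t / mu_long - 2.0 * math.log(mu_long)
    # log[(exp(-t/mu_long) - exp(-t/mu_short)) / (mu_long - mu_short)]
    gap = t * (1.0 / mu_short - 1.0 / mu_long)
    return (-t / mu_long + math.log1p(-math.exp(-gap))
            - math.log(mu_long - mu_short))


def fit_one_step(intervals):
    """Closed-form maximum likelihood fit of a single exponential step."""
    mu = sum(intervals) / len(intervals)
    return sum(-math.log(mu) - t / mu for t in intervals), mu


def fit_two_step(intervals, grid):
    """Grid search for the two-step means with the highest log-likelihood."""
    best = None
    for i, mu_long in enumerate(grid):
        for mu_short in grid[: i + 1]:
            ll = sum(log_pdf_two_step(t, mu_long, mu_short) for t in intervals)
            if best is None or ll > best[0]:
                best = (ll, mu_long, mu_short)
    return best


rng = random.Random(1)          # hypothetical seed
true_means = (1250.0, 1000.0)   # hypothetical step durations in seconds
intervals = [
    sum(rng.expovariate(1.0 / mu) for mu in true_means) for _ in range(200)
]

grid = [50.0 * k for k in range(1, 61)]  # candidate means, 50 s to 3000 s
ll1, mu_single = fit_one_step(intervals)
ll2, mu_long, mu_short = fit_two_step(intervals, grid)
print(f"1 step : logL = {ll1:.1f}, mean = {mu_single:.0f} s")
print(f"2 steps: logL = {ll2:.1f}, means = {mu_long:.0f} s, {mu_short:.0f} s")
```

The two log-likelihoods can then be compared with a likelihood-ratio test, and repeating the sampling over many seeds gives an estimate of how reliably the two step durations are recovered.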

Authors' contributions

ASR conceived the manuscript. MK executed the experiments. HM, AH, JLP and AG executed the analytical studies. OYH participated in the planning with ASR. ASR wrote most of the manuscript. All authors contributed to the writing, and read and approved the final manuscript.

Acknowledgements

This work was supported by the Academy of Finland (MK) and the FiDiPro programme of the Finnish Funding Agency for Technology and Innovation (HM). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. We thank S. Chowdhury and E. Lihavainen for useful advice.