Abstract
Background
Reverseengineering regulatory networks is one of the central challenges for computational biology. Many techniques have been developed to accomplish this by utilizing transcription factor binding data in conjunction with expression data. Of these approaches, several have focused on the reconstruction of the cell cycle regulatory network of Saccharomyces cerevisiae. The emphasis of these studies has been to model the relationships between transcription factors and their target genes. In contrast, here we focus on reverseengineering the network of relationships among transcription factors that regulate the cell cycle in S. cerevisiae.
Results
We have developed a technique to reverseengineer networks of the timedependent activities of transcription factors that regulate the cell cycle in S. cerevisiae. The model utilizes linear regression to first estimate the activities of transcription factors from expression time series and genomewide transcription factor binding data. We then use least squares to construct a model of the time evolution of the activities. We validate our approach in two ways: by demonstrating that it accurately models expression data and by demonstrating that our reconstructed model is similar to previouslypublished models of transcriptional regulation of the cell cycle.
Conclusion
Our regressionbased approach allows us to build a general model of transcriptional regulation of the yeast cell cycle that includes additional factors and couplings not reported in previouslypublished models. Our model could serve as a starting point for targeted experiments that test the predicted interactions. In the future, we plan to apply our technique to reverseengineer other systems where both genomewide time series expression data and transcription factor binding data are available.
Background
Reverseengineering networks of transcriptional regulation is a major goal of biology [16]. In the past few years, dramatic progress has been made in this effort due to significant improvements in experimental methodologies. Using chromatin immunoprecipitation in conjunction with microarrays, it is now possible to measure the binding of many transcription factors to the promoters of most genes, thus elucidating the underlying structure of the transcriptional regulatory network. For example, in the case of the yeast Saccharomyces cerevisiae, the binding profiles of 204 transcription factors to all promoters have been measured in rich media [7,8].
Despite the significant advance in our understanding that this data permits, it leaves a number of important questions unanswered. Does the binding of a specific transcription factor to the promoter of a gene regulate its expression? Does the transcription factor act alone or in cooperation with other factors? Under which conditions is a particular transcription factor active? In general, how does the connectivity of the network change in different conditions (e.g., during the cell cycle)?
To answer these questions, one must analyze the transcription factor binding data in conjunction with expression data collected from multiple conditions. Expression data is typically obtained using microarrays that allow monitoring of changes in the transcription of all mRNAs between samples. A variety of methods have been developed that utilize both expression data and transcription factor binding data to model transcriptional regulation. The simplest models identify transcription factor binding sites (or cis regulatory motifs) that are overrepresented in the promoters of differentiallyexpressed genes [9]. Another approach estimates which transcription factors are globally active in a particular expression profile using multiple linear regression [5,10,11]. A more complex approach models the coupling strengths between transcription factors and their targets using Network Component Analysis [12].
These approaches usually work with the logarithm of the expression and binding data, and typically assume a linear or affine relationship between the two. However, it is likely that transcription factor binding affects gene expression in a nonlinear fashion, e.g., below some level it has no effect and above some higher level the effect might saturate. This type of behavior can be modelled using linear splines [13] or sigmoidal functions [14,15]. Moreover, there is abundant evidence that cooperative effects among transcription factors also play a critical role in regulating gene expression. As a result, a variety of approaches have been developed to identify nonlinear cooperative effects between factors [13,1619].
These methods allow one to identify biologicallymeaningful couplings between transcription factors and their target genes in specific conditions. Other approaches attempt to model timedependent changes in the transcriptional regulatory network. The dynamics of transcriptional regulation are usually studied during the cell cycle because time courses of gene expression of synchronized yeast cells have been measured [20,21]. Various approaches, including stochastic differential equations [14] and maximum likelihoodbased methods [15], have been used to parameterize models of transcriptional regulation during the yeast cell cycle. Other groups have used experimental data to parameterize differential equations that describe a model of the yeast cell cycle [22].
All of these previous studies have focused on modelling the interactions between transcription factors and their target genes. In contrast, here we focus on the regulatory network of interactions among transcription factors themselves – such interactions play a central role in regulating cellular programs. Thus, we set out to construct a dynamical model of the couplings between the activities of transcription factors, hoping to elucidate the dynamics of this module. As in previous studies, we focus on the cell cycle of S. cerevisiae due to the availability of genomewide time series expression data. The resulting model gives a view of the temporal program of transcriptional regulation for the key cell cycle regulators, thus defining the basic machinery that regulates this fundamental process.
Previous studies have attempted to reconstruct the network of transcription factors that regulate the cell cycle primarily using transcription factor binding data [23,24]. Other studies have utilized expression and proteinprotein interaction data to study the dynamics of transcription factor complex formation during the cell cycle [25]. These studies were able to show that the primary regulators of the cell cycle – the canonical transcription factors – are the Swi4Swi6Mbp1, Fkh2Ndd1Mcm1, and Mcm1Swi5Ace2 complexes. These regulate each other in a serial fashion: the transcription factors that control one phase of the cycle activate those that control the next phase (see Figure 1). The underlying assumption of this model – the canonical model – is that if a transcription factor binds to the promoter of another factor, it likely regulates its expression. However, as we have discussed above, these models of transcription factor binding do not offer a complete picture of transcriptional regulation since they do not explicitly predict changes in gene expression. To obtain a more complete understanding of transcriptional regulation, one must construct a dynamical model that accounts for expression data in terms of binding data. Here, we construct such a model and compare it to the canonical model.
Figure 1. The canonical model of cell cycle regulation. The canonical model is determined primarily from transcription factor binding data. It contains 8 transcription factors that comprise 3 complexes that activate each other in a serial fashion to regulate the expression of genes during the cell cycle.
For the sake of simplicity, we use a linear regression approach to estimate transcription factor activities. Our approach is similar to that used in Motif Regressor [5] and REDUCE [26] except that we are fitting expression data with transcription factor binding data instead of measurements of the presence of cis regulatory sequences in promoters. The "activity" of a transcription factor is measured by computing an αcoefficient, which is the regression coefficient of that factor in our linear model (see Methods below). The αcoefficient describes whether the genes a transcription factor is bound to are differentially expressed or not were the effects of other factors to be held constant. Factors with α > 0 are generally bound to the promoters of overexpressed genes while factors with α < 0 are binding to the promoters of underexpressed genes. If a transcription factor is an activator, then α > 0 implies that the factor is active and α < 0 implies that it is inactive. For repressors, the interpretation is the opposite: α > 0 and α < 0 correspond to the repressing transcription factor being inactive and active, respectively.
When these calculations are applied to a time series of expression profiles, we obtain the αcoefficients ("activities") of each transcription factor over time. Here, we are interested in the changes in transcription factor αcoefficients during the cell cycle of S. cerevisiae. Yeast cells were synchronized using yeast mating pheromone and the expression of all genes were profiled over two periods of the cell cycle [20]. From this data, we were able to not only compute the αcoefficients of the canonical cellcycleregulating transcription factors, but also able to identify additional factors that may play a significant role in regulation of the cell cycle.
We then investigated whether it was possible to model the temporal couplings of these factors in the form of ordinary differential equations (or difference equations, as expression data was only available at a small number of equallyspaced discrete times). Such a model would enable us to predict which factors are affecting each other's αcoefficients across the different phases of the cell cycle. We were able to construct such a model by generating a timetranslation matrix of transcription factor αcoefficients using least squares [27]. The significant couplings in this model may be displayed in the form of a network and this network can be compared to the canonical model of transcriptional regulation of the yeast cell cycle. Such a comparison demonstrates that our model is consistent with the canonical model. At the same time, the new model provides a more comprehensive view of transcriptional regulation that includes additional factors and couplings.
Results
We set out to reverseengineer the interactions of the transcription factors comprising the core module of regulators of the S. cerevisiae cell cycle. The model captures the timedependent αcoefficients of transcription factors and how they are coupled to control this important cellular program. The first step in our approach is to determine the αcoefficients of transcription factors during the cell cycle and to identify the factors that are the most significant regulators of the cell cycle.
We assume that changes in expression of a specific gene depend on the product of the binding ratios of all the transcription factors that bind to its promoter, i.e.,
where R_{i }is the ratio of the expression level of gene i in two conditions, the αcoefficient α_{j }is a measure of the contribution of transcription factor j, b_{ij }is a constant (equal to 1 when transcription factor j is unrelated to gene i) giving the coupling between gene i and factor j, and N is the number of transcription factors in the model. In practice, we work with logarithms of expression ratios and logarithms of binding profiles so that (see Methods) one may solve for the α_{j }by multiple linear regression. As discussed earlier, α_{j }is a surrogate for the activity of transcription factor j and positive values indicate an active activator or inactive repressor and negative values indicate an active repressor or inactive activator. Without prior knowledge of the nature of transcription factor j, we do not know from its α_{j }if that factor is active or inactive; rather (and more of concern as regards gene expression), we know that it is tending to bind overexpressed (α_{j }> 0) or underexpressed (α_{j }< 0) genes. Of course, in some cases we do know whether a transcription factor is an activator or repressor and can therefore easily transform its αcoefficient to activity. Nonetheless, for the remainder of this manuscript we focus on αcoefficients and not activities.
We applied the above to model the αcoefficients of transcription factors during two periods of the yeast cell cycle. We concentrated on modelling the expression data that was obtained by using yeast mating pheromone to synchronize yeast cells, although we also present models based on data from synchronization using a temperaturesensitive cdc15 mutant and from elutriation [20]. The binding was measured for 204 transcription factors in rich media using chromatin immunoprecipitation [7]. We computed αcoefficients not only for the canonical transcription factors, but also for additional factors we identified to be significant in regulating the cell cycle. These additional factors were selected by determining the significance of the contribution of each coefficient in regressions. For each time point, we identified the most significant transcription factors based on pvalue, and factors were rankordered based on the number of time points in which they were significant. We filtered out factors that did not show periodic behavior. We identified four additional transcription factors – Bas1, Spt2, Ste12, and Yox1 – that are likely significant regulators of the cell cycle; stopping at four additional factors maintains a reasonable datatoparameter ratio in the model. Details of the procedure used appear in the Methods section.
Having selected significant transcription factors and found their αcoefficients at each time point in the cell cycle, the second step in our approach is to model the system's dynamics. We accomplished this by computing a transition (or timetranslation) matrix T that can be used to determine the αcoefficients of transcription factors at other time points from the αcoefficients at the current time point by matrix multiplication. Estimated using least squares, the matrix T permits inference of the network of couplings between transcription factors (see Methods).
We chose to perform constrained least squares and require the estimate of T to have only nonnegative entries. This choice was made for several reasons. First, it generates models that are more readily interpretable biologically. It is clear that a positive entry may be interpreted as the positive αcoefficient of transcription factor A increasing the αcoefficient of transcription factor B at the next time point, as one might expect for activators. A positive entry may also be seen as the negative αcoefficient of transcription factor A decreasing the αcoefficient of transcription factor B at the next time point, as one might expect for repressors. In contrast, a negative entry suggests that the negative αcoefficient of a factor increases the αcoefficient of another, or viceversa, which we expect to be less likely in 7minute intervals for both activators and repressors. A second reason to apply nonnegativity constraints to the entries of the timetranslation matrix is that an unconstrained matrix tends to have no zero entries (i.e., all entries nonzero). This leads to an undesirable model in which all transcription factors depend on all other factors, which we know not to be true. Finally, applying constraints to the timetranslation matrix can reduce the number of parameters needed to be fit and improve the model's datatoparameter ratio.
We compared our model to the canonical model of the network of cellcycleregulating transcription factors (Figure 1) that has been inferred primarily from binding data by Simon, et al. [23,24]. Three groups of transcription factors – Swi4Swi6Mbp1, Fkh2Ndd1Mcm1, and Mcm1Swi5Ace2 – are known to be active in three phases of the cell cycle (G1/S, G2/M, and M/G1, respectively), binding to each other in a serial fashion: the first group binds to promoters of the second, the second to the third, and the third to the first. This basic mechanism serves to regulate the transcription of genes during the cell cycle.
To compare this model to the one generated using our approach, we started with the 8 canonical transcription factors. We computed the αcoefficients of these transcription factors in the cell cycle time course data (see Figure 2) and then computed the transition matrix that describes their dynamics. This matrix may be viewed as a network, seen in Figure 3. (Note that if two transcription factors are connected in a model by arcs in both directions, we display only the more significant direction.) In Additional files 1 and 2, we show the same network computed using two other time courses in the Spellman, et al. data set. 1 shows the network derived from the cdc15 synchronization experiments and 2 displays the network derived from the elutriation time series. In all three networks, we see the same central relationship Fkh2 → Ndd1 → Mcm1 → Ace2 → Swi4/Swi6/Mbp1 between transcription factors, demonstrating the robustness of our approach in reconstructing the core regulatory cycle independently of the synchronization method used to generate the expression data.
Figure 2. αcoefficients of transcription factors regulating the cell cycle. A heat map of the αcoefficients of the canonical transcription factors during two periods of the cell cycle. Red indicates α > 0 (transcription factor generally bound to overexpressed genes), green indicates α < 0 (transcription factor generally bound to underexpressed genes), and black indicates α = 0. Factors are ordered based on their phase in the longterm asymptotic behavior of our dynamical model. The αcoefficients have been quantile normalized for each factor across the time points.
Figure 3. Transcription factor network of canonical cell cycle regulators. We depict the network of canonical transcription factors derived using our approach. We construct a dynamical model of their αcoefficients and show the nonzero edges in this matrix as directed links between factors. A line from factor A to factor B indicates that the activity of A affects the activity of B at the next time point. The displayed order of factors minimizes the number of upward arcs (these arcs being grouped on the right side of the figure).
Additional File 1. Transcription factor network of canonical cell cycle regulators as derived from the cdc15 time course. We show the nonzero entries in the model's timetranslation matrix as directed arcs between transcription factors. We note the general similarity of the causal flow in this network to Figure 3 which was derived from the mating pheromone arrest time course. The order of factors displayed here minimizes the number of upward arcs (these arcs being grouped on the right side).
Format: TIFF Size: 981KB Download file
Additional File 2. Transcription factor network of canonical cell cycle regulators as derived from the elutriation time course. We show the nonzero entries in the model's timetranslation matrix as directed arcs between transcription factors. We note the general similarity of the causal flow in this network to Figure 3 which was derived from the mating pheromone arrest time course. The order of factors displayed here minimizes the number of upward arcs (these arcs being grouped on the right side).
Format: TIFF Size: 908KB Download file
The resulting models are quite similar to the canonical one. They show a flow of influence from Ace2, a component of the M/G1 complex, to the Swi4Swi6Mbp1 complex active in G1/S. The Swi4 subunit of this complex influences the αcoefficient of Fkh2, which in turn affects Ndd1 and subsequently Mcm1, both subunits of the G2/M complex. The cycle is then completed with interactions from Ndd1 and Mcm1 back to Ace2 and Swi5. All of these interactions recapitulate the basic flow of the canonical model indicating strong support for this model using our approach. Nonetheless, we note that although 8 of the 16 edges in Simon, et al.'s model [23] are recapitulated in our 25arc model derived from the pheromonesynchronized data, the resulting pvalue for the overlap of arcs between the two models is not statistically significant. We believe that this is due in part to limitations inherent in Simon's model which captures only binding data and omits influences of the activity of one factor on another which may be mediated by mechanisms other than by the binding of a transcription factor to the promoter of another. The arcs missing from our model and present in Simon's may be explained as well. If multiple transcription factors are activating transcription of another factor (as in Simon's model), our model may select only a single one of these to best fit the data.
We next built a more general model by identifying in an unsupervised fashion which additional transcription factors most significantly regulate the cell cycle. As explained in the Methods section, for each of the 204 transcription factors we measure not only its αcoefficient at each time point, but also the significance of this estimate in the form of a pvalue. We used an iterative approach, keeping only transcription factors that had a pvalue below 0.1, recomputing the αcoefficients and pvalues for the reduced set of transcription factors. We retained the factors that were significant in the largest number of time points. Among these, we selected those that had periodic αcoefficients. Periodicity was determined using a discrete Fourier transform to compute the fraction of the power spectrum of the αcoefficients of a given transcription factor that laid within a range consistent with the measured time period of 1 cell cycle of S. cerevisiae (see Methods).
As a result, we selected four additional transcription factors – Bas1, Spt2, Ste12, and Yox1 – that significantly regulate the cell cycle. Their αcoefficients are shown in 3. As shown in Table I, most of these have some supporting evidence that they are involved in cell cycle regulation. The exception is Bas1, a factor involved in the purine and histidine pathways that has no known connection to the cell cycle. The robustness of our unsupervised selection methodology is evident by its recovery of 6 of the 8 canonical transcription factors among the top 10 factors (and the probability of seeing such a significant overlap by chance is less than 10^{3}). Finally, we computed the transition matrix for the 12 factors (the 8 canonical and 4 additional factors). This gives our most complete model of transcription factor couplings in the yeast cell cycle (see Figure 4).
Additional File 3. αcoefficients of transcription factors regulating the cell cycle. A heat map (in the style of Figure 2) of the αcoefficients of the transcription factors from our extended 12factor model during two periods of the cell cycle.
Format: PDF Size: 28KB Download file
This file can be viewed with: Adobe Acrobat Reader
Figure 4. Full network of transcription factors regulating the cell cycle. We identified the 4 transcription factors that were most significantly regulating the cell cycle and considered them along with the 8 canonical factors. We constructed a dynamical model of their αcoefficients and show here the nonzero entries in this matrix as directed links between the 12 factors. The resulting network represents the complete network of couplings between cell cycle regulating transcription factors. A line from factor A to factor B indicates that the activity of A affects the activity of B at the next time point. The displayed order of factors minimizes the number of upward arcs (these arcs being grouped on the right side of the figure).
Table 1. Predicted cell cycle regulators.
Although the model with 12 transcription factors is significantly more complex than the network of canonical transcription factors, it recapitulates much of the basic flow observed earlier. For example, we still see the flow of influence from Swi4 to Fkh2 to Ndd1 to Swi5 and back to Swi6. Furthermore, we see additional interactions that are supported by experimental evidence. For instance, the edge between Ste12 and Mcm1 is supported by experimental evidence that Ste12 interacts physically with Mcm1 [28]. Similarly, the link between Mcm1 and Yox1 is supported by experimental evidence that these two interact [29]. We show the mean absolute error (i.e., average magnitude of residuals) in the 12factor model in Figure 5. The model has significant predictive power, although (not unexpectedly) this diminishes as the cell cycle progresses; the residuals are an increasing fraction of mean absolute activity as time increases. This is consistent with the experimental yeast cells gradually losing synchrony in the real data but not in the model.
Figure 5. Model residuals. Shown is a comparison of the αcoefficients of transcription factors estimated using regression and displayed in Additional file 3 to those predicted by our dynamical model. We plot the mean absolute value of the αcoefficients across two periods of the cell cycle versus the mean absolute value of the residuals (i.e., the difference between the measured and predicted αcoefficients). Each time step represents a 7minute interval. We see that the model has significant predictive power, although this diminishes as the cell cycle progresses. This may be due in part to gradual loss of synchrony over time in the experimental yeast cells from which expression was measured.
We performed an analysis of the longterm dynamical properties of the model. As explained in detail in the Methods section, an eigensystem decomposition of the model allows us to identify its modes. In particular, we are interested in the modes whose complex eigenvalues are of magnitude closest to unity, since these modes give the longterm cyclical behavior of the system. Modes whose eigenvalues have magnitude less than 1 die off exponentially over time. Our model also has a single real eigenvalue greater than 1. The mode associated with this eigenvalue captures the fact that αcoefficients of the transcription factors grow over multiple cycles.
The modes from the eigenvalues of magnitude closest to unity generate periodic behavior with a period of 60 minutes, which is very close to the 63minute period observed in the Spellman, et al. data. Furthermore, we can estimate the longterm (i.e., asymptotic) behavior of each factor, which we describe in terms of amplitude and phase (see Methods) and report in Table I and Figure 6. We see that the order of the phases of each factor corresponds closely to those expected from the canonical model: Swi5 and Ace2 are around 0°, Swi4, Swi6, and Mbp1 are around 270°, and Ndd1 is close to 90°. In contrast to the canonical model, however, we see that the αcoefficient of Fkh2 peaks before that of Ndd1. We note that the αcoefficient of Mcm1 peaks between that of Ndd1 and those of Swi5 and Ace2, which is not surprising since it is involved both in the G2/M and M/G1 complexes. We also see from our analysis that Mbp1 and Ndd1 appear to have the highest amplitudes during the course of the cell cycle, while Bas1 – the factor that appears to have no other evidence of its involvement in the cell cycle – has the smallest amplitude.
Figure 6. Asymptotic magnitudes and phases of transcription factor αcoefficients. We perform a dynamical analysis of our 7minute timetranslation matrix to identify the mode that corresponds to the longterm oscillatory behavior of our system. For this mode, we compute the amplitudes and phases that describe the αcoefficients of each factor. The cell cycle proceeds clockwise. Relative to Figure 1, this figure is rotated by approximately 180°.
Discussion
Many cellular programs, such as control of the cell cycle or the metabolic cycle [30], are regulated through the temporallypatterned activation of specific transcription factors. In order to understand these programs more completely, it is necessary to build dynamical models of transcriptional regulation. Approaches to the construction of these models often involve either the collection of coupling parameters between molecules by direct experimental measurements or estimation of model parameters from highthroughput data.
Model building using the former approach by using sets of individual molecular coupling measurements can severely limit the scope of the model. Such measurements are often only available for a small number of molecules. As a result of this limitation, most of these types of models capture couplings between proteins and not between transcription factors and their target genes due to the larger number of parameters such would involve. Furthermore, these types of models are difficult to validate since they cannot be compared to systematicallycollected comprehensive time series data covering all model variables.
A previous dynamical model [22] of the yeast cell cycle following the directmeasurement paradigm included about a dozen molecules that are known to be involved in regulating the cell cycle (e.g., clb2, clb5, cln2, and cdc14), including the transcription factors SBF (composed of Swi4 and Swi6), MBF (composed of Swi6 and Mbp1), and Mcm1. The more than 100 parameters that are involved in this model were estimated from individual experiments and so only a very limited number of couplings between the transcription factors in the model and their targets were included. One appealing characteristic, however, is that this model allows one to explicitly predict the dynamics of mutant strains that lack individual components of the model. In fact, the authors were able to demonstrate that their model reproduced the phenotypes of many cellcycledeficient mutants. However, these comparisons are inevitably more qualitative than quantitative and can only be performed for mutations of modelled genes.
Utilizing the second paradigm, various methodologies have been developed to reconstruct the regulatory network of the yeast cell cycle from highthroughput data [14,15]. These approaches allow systematic comparison of model predictions to genomewide data and they are not restricted a priori to include only a small subset of molecules. However, since these models only account for the regulation of genes by transcription factors, they have not been used to predict the phenotype of cell cycle mutants as such phenotypes do not usually involve mutations in transcriptional regulation.
In contrast to both of these approaches, we have presented a methodology for constructing dynamic models of the core modules of transcriptional regulation from highthroughput data sets. Our model describes the timedependent αcoefficients of transcription factors during the cell cycle. As described above, the model was estimated using both expression and transcription factor binding data. As with other cell cycle models derived from highthroughput data, we are able to directly measure the fit of our model with respect to the data (e.g., as in Figure 5). Our approach need not make any a priori assumptions about which proteins should be included as these may be determined directly from the data.
Using our model, we are able to incorporate the effect of transcription factors binding to all of their known targets rather than just a select subset and demonstrate that couplings between transcription factors are similar to those found in the canonical models of cell cycle regulation. These couplings permit the activity of a transcription factor to affect the activity of another by, for example, activating its transcription, as well as allowing for the possibility that two factors act cooperatively to turn on genes. The underlying principle of our model (e.g., as seen in Equation 1) is that all factors act cooperatively at each promoter at some level. The αcoefficient of a given factor at a given time point is a measure of the extent of the global influence that factor has at that time. Cooperation between factors in our model is therefore manifested by correlation in the αcoefficients of two factors. For example, in Figure 2, it is clear that Mbp1 and Swi6 have correlated αcoefficients and they are indeed known to act in a cooperative fashion since they form the MBF complex. The link between these factors in our networks is capturing cooperative activity (and not transcriptional activation of one factor by another).
In conclusion, unlike other genomewide models of cell cycle transcription, our approach focuses on the relationships between transcription factors and not on the identification of the regulators of individual genes. Since transcription factors are the key regulators of transcriptional programs, we believe this approach provides a more comprehensive view of how these processes are regulated temporally. Furthermore, although it is not possible to completely validate the resulting networks, the results are reasonable from a biological perspective and largely conform to previous models of transcriptional regulation of the yeast cell cycle. As a result, our methodology should reasonably reconstruct regulatory networks whenever expression time series and transcription factor binding data are both available.
Conclusion
We have presented a methodology for describing the temporal regulation of the yeast cell cycle. Our approach allows us to first identify the key regulators of this process, recovering most of the known regulators as well as new ones for which there is some supporting evidence for their involvement. More importantly, our methodology allows us to synthesize the manner in which these transcription factors coordinately regulate the cell cycle. This approach goes beyond existing static models of transcriptional factor networks involved in cell cycle regulation by demonstrating how the estimated transcriptional activities of these proteins are coupled in time.
We believe the methodology we have presented may be applied to model many different biological phenomena. In the case of Saccharomyces cerevisiae, time series expression data of a variety of cellular programs (e.g., the diauxic shift [31] and the metabolic cycle [30]) are already available. In higher eukaryotes, highthroughput data measuring transcriptional changes during circadian cycles has been measured (e.g., [32]). The availability of this and other data presents us a unique opportunity to model the key modules of the associated transcriptional networks, shedding new light on how cells control these complex phenomena.
Methods
Estimation of transcription factor αcoefficients
Taking a logarithm of both sides of Equation 1, we model the logarithm of the ratio of expression data as a linear combination of the logarithms of the transcription factor binding data:
Here, R_{i }is the ratio of expression of gene i to control, α_{j }is the regression coefficient of transcription factor j, and b_{ij }is the measured binding of transcription factor j to the promoter of gene i. We estimate the coefficients α_{j }using multiple linear regression as implemented in the MATLAB function robustfit, a function that also returns an estimate of the standard error of each α_{j}. The binding coefficients b_{ij }are from Harbison, et al. [7] and we treat these as known parameters that are not to be fitted. The expression ratios R_{i }are from Spellman, et al. [20].
We prefiltered the binding data by eliminating all transcription factors that had more than 10 missing values across all genes, leaving 181 factors. We next eliminated all genes that had more than 3 missing values across the 18 time points, leaving 4,033 genes.
For each of the 181 transcription factors, robustfit computes the probability that each α_{j }is nonzero. Roughly, a ttest is made for the ratio of each coefficient divided by its standard error. A detailed derivation of the pvalue, which is beyond the scope of this manuscript, may be found in standard statistical texts [33,34]. We next use an iterative approach to select only transcription factors that have a pvalue below 0.1. We first perform a regression step (initially with the full transcription factor binding matrix) and obtain a pvalue for each of the α_{j}. We then remove from the binding matrix b_{ij }all columns that are associated with transcription factor αcoefficients whose pvalues do not pass the threshold. This procedure is repeated with the reduced binding matrix until all transcription factor αcoefficients have pvalues below 0.1. The choice of 0.1 as threshold is somewhat arbitrary, but this choice has only a minimal effect on the calculation; we have repeated the procedure with other pvalue thresholds such as 0.01 and 0.001 and we obtain similar lists of significant factors and αcoefficients as seen in 4. We also note that the pvalues for the final αcoefficients of the factors that are selected by the iterative procedure are very significant (p < 10^{10}) in all cases.
Additional File 4. Transcription factor rankings used to select the 4 noncanonical transcription factors. Factors were ranked based on the number of time points in which their pvalues were significant beyond some threshold (p < 0.1, 0.01, or 0.001). They were further ranked based on the periodicity of their αcoefficients. Canonical factors are shown in red, significant noncanonical factors are shown in yellow, and other factors are shown in white. We see that the same set of 5 noncanonical factors is selected as most significant for all of the pvalue thresholds.
Format: PDF Size: 14KB Download file
This file can be viewed with: Adobe Acrobat Reader
Determination of timetranslation matrices
We model the dynamics of the system by determining the transition matrix T that predicts the αcoefficients at the next time point from the αcoefficients at the current time point:
A(t+1) = TA(t).
The matrix T is computed using least squares constrained to produce only nonnegative entries via the MATLAB function lsqnonneg. From this matrix, we can infer the network of couplings between transcription factors by identifying the most significant interactions. In cases where two factors were linked by arcs in both directions, we display only the more significant arc.
Computation of asymptotic amplitudes and phases of transcription factor αcoefficients
To analyze the dynamics of the timetranslation matrix, we compute an eigensystem of it using the MATLAB function eigen. Since we are interested in the longterm asymptotic behavior of our model, we focus on the two nonreal conjugate eigenvalues whose magnitude is closest to unity and the two conjugate eigenvectors associated with them. The eigenvectors with eigenvalues smaller than these in magnitude decay exponentially in time compared to this mode and hence contribute nothing asymptotically. The spectrum of eigenvalues is shown in 5. The spectrum does contain a single real eigenvalue greater than 1. This generates growing αcoefficients, and is likely a result of the cyclical nature of the network with only nonnegative coefficients, these generating positive feedback. However, for the purpose of estimating the relative phases and amplitudes of the transcription factor αcoefficients, it is appropriate to focus on the alreadymentioned nonreal eigenvalues of greatest magnitude.
Additional File 5. Eigenvalues of the timetranslation matrix of the 12factor model. Each application of the timetranslation matrix moves time forward 7 minutes.
Format: PDF Size: 37KB Download file
This file can be viewed with: Adobe Acrobat Reader
Using the polar form
λ = λe^{iθ}
for the dominant nonreal eigenvalue of the timetranslation matrix in the upper halfplane, the period of the associated mode is given by
where Δt is the time (here, 7 minutes) between successive points (e.g., microarrays) in the expression time series.
Collect the eigenvectors of the timetranslation matrix as the columns of a complex matrix V and let complex matrix W be the matrix inverse of V. To find the longterm asymptotic phase and amplitude of each transcription factor, let z be the complex number that is the dot product of the row of W corresponding to one of the dominant nonreal eigenvalues with the initial state, and let v be the column of V corresponding to this same eigenvalue. Then the asymptotic amplitudes and phases of the transcription factors are the polar forms of the successive entries of 2zv. A detailed derivation is provided in 6.
Additional File 6. Asymptotic phases and amplitudes. A derivation of the formulas used for the asymptotic phases and amplitudes of components of real systems governed by a general class of real timetranslation matrices is given.
Format: PDF Size: 69KB Download file
This file can be viewed with: Adobe Acrobat Reader
Identification of periodic transcription factors
To identify the transcription factors that have periodic αcoefficients, we compute the discrete Fourier transform of the αcoefficient profile for each factor. Taking the magnitudes of Fourier components (i.e., the "power spectrum") we calculate the fraction of spectral power that falls in components 7, 8, 9, and 10 as these correspond to an interval of periods containing the expected period for a single cycle of the Spellman data.
Authors' contributions
The calculations described in this manuscript were performed in roughly equal contributions by SC, SR, and MP. DH and NGJ provided essential comments and guidance. The manuscript was written by MP and edited by SC. Network layouts, Figure and Additional file drawings, and Additional File 6 were by SC.
Acknowledgements
SR was supported by the Southern California Bioinformatics Summer program.
References

Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM: Systematic determination of genetic network architecture.
Nat Genet 1999, 22(3):281285. PubMed Abstract  Publisher Full Text

Pilpel Y, Sudarsanam P, Church GM: Identifying regulatory networks by combinatorial analysis of promoter elements.
Nat Genet 2001, 29(2):153159. PubMed Abstract  Publisher Full Text

Segal E, Shapira M, Regev A, Pe'er D, Botstein D, Koller D, Friedman N: Module networks: identifying regulatory modules and their conditionspecific regulators from gene expression data.
Nat Genet 2003, 34(2):166176. PubMed Abstract  Publisher Full Text

Wyrick JJ, Young RA: Deciphering gene expression regulatory networks.
Curr Opin Genet Dev 2002, 12(2):130136. PubMed Abstract  Publisher Full Text

Conlon EM, Liu XS, Lieb JD, Liu JS: Integrating regulatory motif discovery and genomewide expression analysis.
Proc Natl Acad Sci U S A 2003, 100(6):33393344. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Luscombe NM, Babu MM, Yu H, Snyder M, Teichmann SA, Gerstein M: Genomic analysis of regulatory network dynamics reveals large topological changes.
Nature 2004, 431(7006):308312. PubMed Abstract  Publisher Full Text

Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, Macisaac KD, Danford TW, Hannett NM, Tagne JB, Reynolds DB, Yoo J, Jennings EG, Zeitlinger J, Pokholok DK, Kellis M, Rolfe PA, Takusagawa KT, Lander ES, Gifford DK, Fraenkel E, Young RA: Transcriptional regulatory code of a eukaryotic genome.
Nature 2004, 431(7004):99104. PubMed Abstract  Publisher Full Text

Lee TI, Rinaldi NJ, Robert F, Odom DT, BarJoseph Z, Gerber GK, Hannett NM, Harbison CT, Thompson CM, Simon I, Zeitlinger J, Jennings EG, Murray HL, Gordon DB, Ren B, Wyrick JJ, Tagne JB, Volkert TL, Fraenkel E, Gifford DK, Young RA: Transcriptional regulatory networks in Saccharomyces cerevisiae.
Science 2002, 298(5594):799804. PubMed Abstract  Publisher Full Text

Haverty PM, Hansen U, Weng Z: Computational inference of transcriptional regulatory networks from expression profiling and transcription factor binding site identification.
Nucleic Acids Res 2004, 32(1):179188. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Bussemaker HJ, Li H, Siggia ED: Regulatory element detection using correlation with expression.
Nat Genet 2001, 27(2):167171. PubMed Abstract  Publisher Full Text

Gao F, Foat BC, Bussemaker HJ: Defining transcriptional networks through integrative modeling of mRNA expression and transcription factor binding data.
BMC Bioinformatics 2004, 5:31. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Liao JC, Boscolo R, Yang YL, Tran LM, Sabatti C, Roychowdhury VP: Network component analysis: reconstruction of regulatory signals in biological systems.
Proc Natl Acad Sci U S A 2003, 100(26):1552215527. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Das D, Banerjee N, Zhang MQ: Interacting models of cooperative gene regulation.
Proc Natl Acad Sci U S A 2004, 101(46):1623416239. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Chen KC, Wang TY, Tseng HH, Huang CY, Kao CY: A stochastic differential equation model for quantifying transcriptional regulatory network in Saccharomyces cerevisiae.
Bioinformatics 2005, 21(12):28832890. PubMed Abstract  Publisher Full Text

Chen HC, Lee HC, Lin TY, Li WH, Chen BS: Quantitative characterization of the transcriptional regulatory network in the yeast cell cycle.
Bioinformatics 2004, 20(12):19141927. PubMed Abstract  Publisher Full Text

Banerjee N, Zhang MQ: Identifying cooperativity among transcription factors controlling the cell cycle in yeast.
Nucleic Acids Res 2003, 31(23):70247031. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Tsai HK, Lu HH, Li WH: Statistical methods for identifying yeast cell cycle transcription factors.
Proc Natl Acad Sci U S A 2005, 102(38):1353213537. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Yu X, Lin J, Masuda T, Esumi N, Zack DJ, Qian J: Genomewide prediction and characterization of interactions between transcription factors in Saccharomyces cerevisiae.
Nucleic Acids Res 2006, 34(3):917927. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Chang YH, Wang YC, Chen BS: Identification of transcription factor cooperativity via stochastic system model.

Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B: Comprehensive identification of cell cycleregulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization.
Mol Biol Cell 1998, 9(12):32733297. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Cho RJ, Campbell MJ, Winzeler EA, Steinmetz L, Conway A, Wodicka L, Wolfsberg TG, Gabrielian AE, Landsman D, Lockhart DJ, Davis RW: A genomewide transcriptional analysis of the mitotic cell cycle.
Mol Cell 1998, 2(1):6573. PubMed Abstract  Publisher Full Text

Chen KC, Calzone L, CsikaszNagy A, Cross FR, Novak B, Tyson JJ: Integrative analysis of cell cycle control in budding yeast.
Mol Biol Cell 2004, 15(8):38413862. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Simon I, Barnett J, Hannett N, Harbison CT, Rinaldi NJ, Volkert TL, Wyrick JJ, Zeitlinger J, Gifford DK, Jaakkola TS, Young RA: Serial regulation of transcriptional regulators in the yeast cell cycle.
Cell 2001, 106(6):697708. PubMed Abstract  Publisher Full Text

Wittenberg C, Reed SI: Cell cycledependent transcription in yeast: promoters, transcription factors, and transcriptomes.
Oncogene 2005, 24(17):27462755. PubMed Abstract  Publisher Full Text

de Lichtenberg U, Jensen LJ, Brunak S, Bork P: Dynamic complex formation during the yeast cell cycle.
Science 2005, 307(5710):724727. PubMed Abstract  Publisher Full Text

Roven C, Bussemaker HJ: REDUCE: An online tool for inferring cisregulatory elements and transcriptional module activities from microarray data.
Nucleic Acids Res 2003, 31(13):34873490. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Holter NS, Maritan A, Cieplak M, Fedoroff NV, Banavar JR: Dynamic modeling of gene expression data.
Proc Natl Acad Sci U S A 2001, 98(4):16931698. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Primig M, Winkler H, Ammerer G: The DNA binding and oligomerization domain of MCM1 is sufficient for its interaction with other regulatory proteins.
Embo J 1991, 10(13):42094218. PubMed Abstract  PubMed Central Full Text

Pramila T, Miles S, GuhaThakurta D, Jemiolo D, Breeden LL: Conserved homeodomain proteins interact with MADS box protein Mcm1 to restrict ECBdependent transcription to the M/G1 phase of the cell cycle.
Genes Dev 2002, 16(23):30343045. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Tu BP, Kudlicki A, Rowicka M, McKnight SL: Logic of the yeast metabolic cycle: temporal compartmentalization of cellular processes.
Science 2005, 310(5751):11521158. PubMed Abstract  Publisher Full Text

Gasch AP, WernerWashburne M: The genomics of yeast responses to environmental stress and starvation.
Funct Integr Genomics 2002, 2(45):181192. PubMed Abstract  Publisher Full Text

Blasing OE, Gibon Y, Gunther M, Hohne M, Morcuende R, Osuna D, Thimm O, Usadel B, Scheible WR, Stitt M: Sugars and circadian regulation make major contributions to the global regulation of diurnal gene expression in Arabidopsis.
Plant Cell 2005, 17(12):32573281. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Dumouchel W, O'Brien F: Integrating a robust option into a multiple regression computing environment.
Ima Volumes In Mathematics And Its Applications 1992, 4148.