Abstract
Background
Living cells are realized by complex gene expression programs that are moderated by regulatory proteins called transcription factors (TFs). The TFs control the differential expression of target genes in the context of transcriptional regulatory networks (TRNs), either individually or in groups. Deciphering the mechanisms of how the TFs control the expression of target genes is a challenging task, especially when multiple TFs collaboratively participate in the transcriptional regulation.
Results
We model the underlying regulatory interactions in terms of the directions (activation or repression) and their logical roles (necessary and/or sufficient) with a modified association rule mining approach, called mTRIM. The experiment on Yeast discovered 670 regulatory interactions, in which multiple TFs express their functions on common target genes collaboratively. The evaluation on yeast genetic interactions, TF knockouts and a synthetic dataset shows that our algorithm is significantly better than the existing ones.
Conclusions
mTRIM is a novel method to infer TF collaborations in transcriptional regulation networks. mTRIM is available at http://www.msu.edu/~jinchen/mTRIM webcite.
Background
The complex gene expression programs in living cells are moderated by regulatory proteins called transcription factors (TFs) [1]. In the context of a transcriptional regulatory network (TRN), a TF may act independently or collaboratively with other TFs [2], leading to complex regulatory interactions that influence the transcription of target genes [3,4]. A regulatory interaction includes target genes and all the TFs that control their transcriptional activities. An individualTF regulatory interaction has been defined in terms of two properties: the TF's functional role as an activator or a repressor, and its logical role as being necessary or sufficient (see Figure 1a) [3,5]. The categories in the TF's functional and logical roles are combinable; they can be activator necessary (AN), activator sufficient (AS), or activator necessary and sufficient (ANS). For example, pheromone response elements are necessary and sufficient for basal and pheromoneinduced transcription of the FUS1 gene of yeast [6]. Similarly for TFs that are repressors, they can be RN, RS or RNS [7]. In a multipleTF regulatory interaction, a group of TFs collaborate to control the expression levels of the same target genes. The directions of all the TFs in the group, therefore, form a transcriptional regulation pattern of the target genes. Recent developments in biotechnology (such as ChIP [8] and yeast onehybrid [9]) have been applied to uncover TFtarget binding relationships [10,11] to reconstruct draft regulatory circuits at a systems level [3,4,12]. Furthermore, to identify regulatory interactions in vivo and consequently reveal their functions, TF single/double knockouts and overexpression experiments have been systematically carried out [13]. However, the results of many single or doubleknockout (or overexpression) experiments are often nonconclusive [14], since many genes are regulated by multiple TFs with complementary functions [4]. For example, in yeast (one of the most wellstudied eukaryotic organisms), 47% of genes are bound by at least two TFs [15], and approximately 73% (~4,500) of the known genes are nonessential [16], suggesting that higher order genetic variations are needed for precise inference of transcriptional regulations.
Figure 1. (a) Concepts of regulatory interaction and (b) an illustrative example of regulatory interaction in a TRN, in which three TFs collaboratively coregulate two target genes.
Considering the prohibitive costs and the tremendous number of possible combinations of higherorder gene knockouts, it is currently impossible for researchers to examine all of possible gene knockout combinations experimentally. One solution to this problem is to select only the TF groups that are most likely to bring about the phenotypic change. In order to accomplish this, we need to understand the interactions employed by multiple TFs (called regulatory interactions) to regulate their common target genes. However, this is a difficult task, because when multiple TFs simultaneously or sequentially control their target genes, a single gene responds to merged inputs, resulting in complex gene expression patterns [17,18]. The exhaustive approach requires enumerating all TF combinations, which, given the high complexity of combinatorial, is simply impractical at the whole genome level.
In our previous research [19], a Hidden Markov model was developed to relate gene expression patterns to regulatory interactions, in order to solve a relatively simpler subproblem that considers only two TFs. To predict regulatory interactions for all possible collaborative TFs, we propose an algorithm called "mTRIM" (multiple Transcriptional Regulatory Interaction Mechanism) in this paper. By uncovering the regulatory interactions in terms of their directions (activation or repression) and corresponding logical roles (necessary and/or sufficient) from gene expression and TFDNA binding data, mTRIM identifies TF groups that are collaboratively responsible for target gene expressions. Such inferences may provide highquality candidate sets for further experimentally detecting the collaborative functions of gene regulations that are largely unknown [18]. Yeang and Jaakkola [3] attempted to characterize the combinatorial regulation of multipleTF regulatory interactions using a heuristic approach to measure how well a regulatory module fits the associated binding and gene expression data with a loglikelihood function. The regulatory module's likelihood is maximized with a greedy approach by incrementally adding genes to the module and monitoring the predictions of the regulatory interactions for optimality. However, this incremental approach does not study the functions of the TFs simultaneously because of the scalability issue introduced by the greedy search. This method also uses a probabilitybased approach to calculate the significance of the combinatorial property of TFs, determined by the gap of likelihood scores between their model and a model built on randomized data in the entire time frame. However, as stated in [4], a TF usually functions at specific "activation time points" instead of throughout the entire time course, meaning that the identification of regulatory interaction modules should be focused on activation timepoints rather than the entire time frame.
To derive dynamic regulatory networks that associate TFs with target genes at their activation timepoints, an algorithm called DREM was proposed [4]. DREM integrates timeseries gene expression data and proteinDNA binding data to build a global temporal map, in order to uncover transcriptional regulatory events leading to the observed temporal expression patterns and the underlying factors that control these events during a cell's response to stimuli. The method mainly works by identifying bifurcation timepoints where the expression of a subset of genes diverges from the rest of the genes. The bifurcation points are then annotated with the TFs regulating these transitions, which result in a unified temporal map. The method can therefore facilitate the determination of the time when TFs are exerting their influence, and assigns genes to paths in the map based on their expression profiles and the TFs that control them. Unlike the method by Yeang and Jaakkola [3], DREM's ability to derive dynamic maps that associate TFs with the genes they regulate and their activation timepoints has indeed led to better insights for the regulatory module being studied. However, DREM does not infer the logical roles of the TFs (i.e., whether a specific TF is necessary or sufficient for regulating a set of target genes). Such knowledge is extremely useful for designing highorder genetic variation experiments to understand the complex regulatory mechanisms of biological processes.
TRIM is an HMM based model which was developed to infer the collaboration of at most two TFs that regulate the same target genes. In the HMM, the functions of a TF are hidden states. The model starts with random priors, and then is iteratively trained using EM till convergence. Since each possible function of a TF is a node in the HMM, there are four nodes (AS, AN, RS, and RN) for each TF. With the design of HMM (and the limited training data), the number of TFs TRIM can handle is limited.
The enumeration of all TF combinations is clearly a NP problem. Therefore, we focused on the most important biological problem (i.e., 2TF combination) and therefore "hardcoded the problem in TRIM. In this paper, we solve the efficiency problem by developing an association rule mining algorithm which is capable to handle a large amount of data with highlevel combinations.
In this paper, we propose a new model mTRIM for inferring regulatory interactions for multiple TFs with an EMbased Bayesian inference approach [20,21] and a modified bottomup association rule mining method. Experimental results evaluated with yeast genetic interactions, TF knockouts and a synthetic dataset shows that our algorithm is significantly better than the existing ones.
Methods
mTRIM is developed to efficiently infer regulatory interactions for all possible collaborative TFs in a TRN. The feasibility is achieved in two steps. First, an EMbased Bayesian inference approach is developed to identify all the significant individual TF regulatory interactions, meaning that individual TFs that can regulate the target genes independent to the existence of other TFs. For the TFs which require collaborations with other TFs to drive the target genes, or are actually nondeterministic (meaning lack of clear evidence of regulation), their pvalues are insignificant. They are considered as the inputs of the second step.
Second, in order to identify the collaboration of k TFs (k ≥ 2), i.e., kTF regulatory interaction, a bottomup association rule mining approach is developed. While the significant TF groups are reported to the users, the insignificant ones are joined with each other to mine (k + 1)TF regulatory interactions. It should be noted that unlike the conventional association rule mining which seeks the longest possible patterns, mTRIM outputs the shortest significant results, in that the goal of mTRIM is to discover the smallest group of TFs that can regulate the target genes, so that biological experiments with highorder genetic variations can be subsequently carried out for the understanding of the behavior of TRNs. In terms of time complexity, consider a candidate kTF regulatory interaction . The algorithm computes AfnScore and pvalues of all of the subsets, I  {tf_{j}} (∀j = 1, 2, ..., k). If one of them is significant, I is immediately pruned. Hence the time complexity is O(k) for each candidate kTFs regulatory interaction. Every merging operation requires at most k  2 equality comparisons. In the bestcase scenario, it produces a viable candidate kTF interaction. In the worst case, the algorithm merges every pair of infrequent (k  1)TF candidates. Therefore, the overall cost of merging candidates is between , where P_{k }is the candidate set of kTF regulatory interactions. To improve the algorithm efficiency, a hash tree is constructed for the storage and quick access to all of the candidates. Because the maximum depth of the hash tree is k, the cost for populating the hash tree of candidates is . During candidate pruning, it is required to verify whether the k  1 subsets of every candidate kTF regulatory interactions are significant. Since the cost for looking up an item in a hash tree is O(k), the time complex of candidate pruning step is .
Concepts
A TRN can be represented as a directed graph in which each node is a TF or a gene, and each edge pointing from a TF to a gene represents a regulation relationship between them. In many organisms, indepth transcriptome analysis has revealed the modular architecture of gene expression [22]. A regulatory module is a selfconsistent regulatory unit R(TF, G, I) representing a set of coexpressed genes G = {g_{1}, g_{2}, ..., g_{n}} regulated in concert by a group of TFs in TF = {tf_{1}, tf_{2}, ..., tf_{m}} that govern the target genes' behaviors via regulatory interaction I [5]. An example of the regulatory module is shown in Figure 1b.
A regulatory interaction (which is the final output of mTRIM) is defined as a set of TFs {tf_{1}, ..., tf_{m}} coregulating a set of genes {g_{1}, ..., g_{n}}, where is the behavior of TF i; h_{g }is the behavior of all the target genes in R, and h_{x }∈ {↑, ↓, }, meaning upexpress, downexpress and no change respectively. For example, if tf_{1 }↑ and tf_{2 }↓ always cause the target genes g_{1 }and g_{2 }to be upregulated, the regulatory interaction is <tf_{1 }↑, tf_{2 }↓> ⇒ g ↑. For individual regulatory interactions, I ∈ {AN, AS, RN, RS, ANS, RNS}. In this work, we assume that a regulatory interaction is consistent in the context of transcriptional control as long as the experimental conditions are unchanged. Note that binaries gene expression values are used in mTRIM, since TF activity is not always proportional to its mRNA abundance [23].
mTRIM Step 1. Inferring individual regulatory interactions
To solve a relatively easier problem of inferring the regulatory interactions for each individual TF and to prepare input for multiTF regulatory interaction inference, an EMbased Bayesian inference algorithm has been developed [20,21].
To define the probabilities in Eq. 2 and Eq. 3, we followed the definitions in [20]. Eq 2 represents the prior probability of the interaction model I_{m}, and Eq 3 represents the probability of gene expression correlation between TFs and targets given the interaction model I_{m}. In the Bayesian model, the training dataset is a matrix that contains gene expression levels of TFs and their targets, from which Γ(I_{m}) is estimated using Eq 4. And then, the likelihood is calculated using Eq 3. The prior probabilities are randomly assigned initially. In each iteration, the posterior probabilities and the frequency of I_{m }are updated. The iteration will continue till the posterior probabilities converge.
Let Pos be the posterior probability of a TF tf_{m }to have a specific regulatory interaction I_{m }in regulatory module R_{k}, where I_{m }∈ {AN, AS, RN, RS} (ANS and RNS will be discussed later). To infer Pos, both the prior probabilities Pri and the likelihood Lk of the same TF need to be computed, given that:
where Pri(I_{m}) is the prior probability of regulatory interaction I_{m }(defined in Eq 2) and the likelihood Lk(tf_{m}, R_{k}, I_{m}) is defined in Eq 3.
The prior probability Pri(I_{m}) captures how likely a given interaction I_{m }exists given the background of all of the other TFs:
where fre(I_{m}) is the frequency of regulatory interaction I_{m }in all of the regulatory modules, R is the number of the regulatory modules, TF is the number of TFs, and I_{m }∈ {AS, RS, AN, RN}.
Given the definition of a regulatory interaction, the likelihood Lk(tf_{m}, R_{k}, I_{m}) indicates how likely tf_{m }in R_{k }has regulatory interaction I_{m}, which is defined by the expression level changes of the TF and its targets:
where T is the number of timepoints in the training data, G is the number of genes in regulatory module R_{k}, and Γ(I_{m}) is defined as:
An expectationmaximization (EM) algorithm is adopted to maximize the posterior probabilities Pos(tf_{m}, R_{k}, I_{m}). The EM model is initialized with each TF assigned a random regulatory interaction. In the expectation step, we compute the likelihood of each TF to be a specific interaction using Eq 3. Consequently, the posterior probabilities of interactions for every TF is updated with Eq 1. As a result, each TF is assigned with the regulatory interaction with the highest posterior probability. In the maximization step, we maximize the scoring function for each regulatory module R_{k}, which measures how the interaction of each TF in R_{k }matches the target gene expression changes. Note that in the iteration the priors are updated but the likelihoods are constant.
Finally, in order to determine whether I_{m }is "necessary and sufficient" (ANS and RNS) or "no decision", the following strategy is adopted: if none of the posterior probabilities are significant, the output is "no decision"; if the probabilities of both N and S states are significant, and there is no significant difference between them, the output is ANS or RNS depending on the target gene expression direction; otherwise the output is the regulatory interaction with the highest posterior probability.
An illustrative example is shown in Figure 1b, in which tf_{1}, tf_{2 }and tf_{3 }regulate target genes g_{1 }and g_{2}, and they all belong to the same regulatory module R_{k}. With the gene expression changes in Table 1, we start with equal prior probabilities, i.e., Pri(AS) = Pri(RS) = Pri(AN) = Pri(RN) = 0.25, so Lk(tf_{1}, R_{k}, AN) = 12/26 = 0.461, (Eq 3). After 10 iterations, in the expectation step, Pri(AN) is updated to 0.70 (Eq 2), hence Pos(tf_{1}, R_{k}, RS) = 0.70 × 0.461 = 0.323 (Eq 1). In the maximization step, we have <tf_{1 }↓> =>g ↓, because the maximum posterior probability is assigned to AN with pvalue 0.05 (see Table 2 row 1).
Table 1. Illustrative example of timeseries gene expression data for the genes in Figure 1b.
Table 2. Illustrative example of regulatory interaction identification on the TRN in Figure 1b.
mTRIM Step 2. Mining multipleTF regulatory interactions
Besides the individual TF regulatory interactions, a significant portion of TFs collaboratively work together to regulate the same target genes. In order to identify these multipleTF regulatory interactions, a new association rule mining approach has been developed. Instead of using the concepts of support and confidence that are commonly used in a conventional association rule mining application [24], we define an affinity scoring function (called AfnScore) according to the gene expression agreement between the TF groups and their target genes, to meet the biological meaning of a multipleTF regulatory interaction (see Section Background). Mathematically, AfnScore of each candidate regulatory interaction is calculated with:
where P(x) is the number of times that x appears in the given time series gene expression dataset divided by the product of the total number of time points and the total number of target genes. The pvalue of each candidate regulatory interaction is computed by considering the distribution of AfnScore for the regulatory interactions with the same number of TFs. Only the candidate interactions with pvalues smaller than 0.05 are reported to the user. Specifically, if all the TFs in I are upregulated, the TFs are "sufficient"; if they are all downregulated, the TFs are "necessary"; otherwise, each TF acts differently to drive the target genes to the same direction.
To identify all the significant kTF regulatory interactions, the new association rule mining algorithm starts with an empty set Q_{k }and all the insignificant (k  1)TF interactions saved in P_{k1 }(see pseudocode in Figure 2 line 1). For interactions and in P_{k1}, we combine them and compose a new interaction I_{12 }(line 3), if I_{1 }and I_{2 }are combinable. We define that I_{1 }and I_{2 }are combinable if and only if they satisfy the conditions that (for i = 1, 2, .., k  2) and . If none of the (k  1)TF subsets of I_{12 }is significant (line 28), I_{12 }is added to candidate set C and its AfnScore is computed. Finally, we compute pvalues for all of the kTF candidates in C using ttest, report all of the significant regulatory interactions to the user, and save all the insignificant ones P_{k }to for the identification of the (k + 1)TF regulatory interactions (line 917).
Figure 2. Procedure of identifying significant multipleTF regulatory interactions.
For an illustrative example, there are 40 possible multipleTF regulatory interactions in the regulatory module shown in Figure 1b. Using the timeseries gene expression data in Table 1 all the 2TF regulatory interaction candidates are screened and their pvalues are computed (see Table 2 row 24). Since none of the 2TF regulatory interaction candidates is significant, a 3TF interaction I_{4 }=<tf_{1 }↑, tf_{2 }↑, tf_{3 }↓>=>g ↑ is generated by merging I_{2 }and I_{3}. The AfnScore of I_{4 }is ((10/24) * (10/24))/(12/24)) = 0.347 and its pvalue is 0.04 (see Table 2 row 5). Based on I_{0 }and I_{4}, we conclude that the target genes g_{1 }and g_{2 }are induced by the upexpression of tf_{1 }and tf_{2 }and the downexpression of tf_{3}, and the same target genes are repressed by the downexpression of tf_{1}.
Experimental results
mTRIM was applied on two independentlyconstructed yeast transcriptional regulatory networks (the Harbison dataset [15] and the Reimand dataset [12]) to identify regulatory interactions. For performance comparison, DREM v3.0 [17] and TRIM [19] were both applied on the same datasets. We did not compare mTRIM with Yeang's method [3] because the latter's objective is to build a reliable TRN instead of predicting regulatory interactions. We evaluated these methods systematically with three independent sources: single TF knockouts [16] for individual regulatory interactions, genetic interactions (GI) [25] for 2TF regulatory interactions and synthetic data for highorder regulatory interactions.
Using the EMBased Bayesian inference approach, 658 significant individual regulatory interactions were mined in the Harbison dataset and 164 significant ones were mined in the Reimand dataset (Table 3). The results show that while many individual TFs drive target genes' behaviors, it is clear that most of them (4,414 in the Harbison dataset and 1,539 in the Reimand dataset) are "no decision". It indicates that a large proportion of TFs need to work collaboratively with other TFs.
Table 3. The number and type of the regulatory interactions for individual TFs predicted by mTRIM.
MultipleTF regulatory interactions were inferred with a new association mining algorithm. In total, 670 regulatory interactions with multiple TFs were discovered (Table 4). The results show that at most 6 TFs collaboratively regulate the same target genes. All the TF combinations with more than 6 TFs are either insignificant or have a significant subset. The whole experiments finished in 30 minutes on a high performance computer cluster.
Table 4. Number of the multipleTF regulatory interactions identified by mTRIM.
Data preparation
Yeast ChlPchip binding data [15] was downloaded from http://younglab.wi.mit.edu/regulatory_code webcite, and a pvalue cutoff of 0.001 was applied (the same threshold used in [4]) to obtain the Harbison dataset. It contains 169 TFs, 2,864 target genes and 6,253 TFDNA bindings. Next we applied the same statistical approach as in [12] to filter the union of the yeast ChlPchip binding data [26] and the bindingsite predictions [27,28] to generate the Reimand dataset with 2,230 TFDNA binding relationships between 268 TFs and 1,509 target genes. To obtain the regulatory modules in the TRNs, all the target genes were clustered based on their gene expression values with Cluster 3.0 (specifically, kmeans), which uses Pearson correlation coefficient for gene similarity metric [29], resulting in 50 clusters. The clusters are then evaluated with Gene Ontology enrichment analysis using Bingo [30], and unenriched clusters are discarded. To construct regulatory modules from the clustering results, the target genes that are regulated by the same TFs were partitioned if they are not in the same cluster. Finally, 2,172 and 1,031 regulatory modules were obtained in the Harbison and Reimand networks respectively. The distribution of genes and regulatory modules (Figure SI and Table S2 in Additional file 1) reveal that many genes are bound by multiple TFs.
Additional file 1. Supplementary Materials for Awad et al. Figures SI, S2, Table SI, and S2. This file contains Figures SI, S2, Tables SI and S2.
Format: PDF Size: 1MB Download file
This file can be viewed with: Adobe Acrobat Reader
To identify the individual and collaborative regulatory interactions in the above datasets, three widely used timeseries microarray datasets (alpha, CDC28 and elu) from yeast cell cycle studies were collected [31] as training data. These datasets contain 49 time points in total. In these experiments, yeast cells were first synchronized to the same cell cycle stage, released from synchronization, and then the total RNA samples were taken at even intervals for a period of time (Table SI in Additional file 1). In order to decide whether a gene is significantly up or down regulated, a gene expression change cutoff of 0.35 was applied (the same threshold used in [19]).
To evaluate the individual regulatory relations, singleTF knockout microarray data were collected [16], and a pvalue cutoff of 0.05 (as used in [16]) was applied to determine whether a gene is significantly affected by a TF knockout. To evaluate the 2TF regulatory interactions, we downloaded the SGA genetic interaction dataset [25], which is composed of 1,711 queries crossed to 3,885 array strains. Of the 1,711 queries, 1,377 are deletion mutants of nonessential genes and 334 are essential gene alleles. The SGA dataset contains 762,146 genetic interactions. Two genes are genetically interacted if mutations in both of them produce a phenotype that is significantly different to each mutation's individual effects. In a 2TF regulatory interaction, if TFs collaboratively regulate the same target genes, the downregulation of both TFs should have a significantly different phenotype as the down regulation of each individual TF. Therefore, such TF pairs should have a significant pvalue in the GI dataset. To evaluate the highorder multipleTF regulatory interactions, a synthetic binding network were built, which contains 11 TFs, 17 target genes and 58 regulation/binding relationships. The network also contains two feed forward loops. Corresponding timeseries gene expression data containing 500 timepoints were randomly generated with 10% or 40% noise rate.
Evaluation 1. Single TF knockouts
We used the single TF knockout microarray data to evaluate the performance of mTRIM on individual TF regulatory interaction predictions in terms of the identification of "necessary" TFs (i.e., if the expression values of the target genes are significantly changed when the TF is knocked out). For the Harbison dataset, the prediction precision of mTRIM is 94.44%, higher than the results of TRIM (82.50%). Using the Reimand dataset, mTRIM has a precision of 91.94%, significantly higher than the results of TRIM (61.54%). DREM is not compared since it does not predict "necessary" TFs.
Evaluation 2. Genetic interaction
In a regulatory module with two TFs, if both TFs collaborate to regulate the same target genes, the down regulation of both TFs should have significantly different phenotypes from the downregulation of each individual TF. Therefore, such TF pairs should have a significant pvalue in the GI dataset. To this end, for the pairs of TFs that are predicted by mTRIM to work collaboratively, we adopted the GI dataset [25] for evaluation. Figure 3 (a) and 3 (b) shows the Receiver Operating Characteristic curve (ROC) of mTRIM, TRIM and DREM on Harbison dataset and Reimand dataset respectively. For Harbison dataset, the area under curve (AUC) of mTRIM is 0.81, much higher than the AUC of DREM (0.51) and TRIM (0.75). For Reimand dataset, the AUC of mTRIM is 0.80, higher than DREM (0.52) and TRIM (0.64). In addition, to explore whether the performance of mTRIM is sensitive to parameter settings, we altered its parameters systematically. For the Harbison dataset, Figure S2 in Additional file 1 shows the AUC values with different gene expression cutoffs, GI cutoffs, and pvalue cutoffs of AfnScore respectively. Similarly, for Reimand dataset, Figure S2 in Additional file 1 shows the varying of the AUC values using different thresholds. These show that our method is robust with the GI cutoff and pvalue cutoff of AfnScore, although its performance gradually decreases with the increase of gene expression cutoffs.
Figure 3. Evaluation of the 2TF regulatory interactions using genetic interactions on (a) Harbison dataset and (b) Reimand dataset.
Evaluation 3. Synthetic transcriptional regulatory networks
A synthetic transcriptional regulatory network was generated to evaluate the performance of mTRIM in detecting highorder multipleTF regulatory interactions (see Figure 4). The synthetic network has 28 nodes (11 TFs and 17 target genes) and 58 edges, in which the solid line represents a real transcriptional regulation and 12 (20.69%) dotted lines represent TFDNA bindings but no regulation. The dotted lines were added to the network in order to test the precision of mTRIM. For the synthetic network, two time series gene expression datasets with 500 timepoints were generated. In order to test the robustness of mTRIM, we repeated the simulation test twice with different rates of noises added to the simulated gene expression data sets.
Figure 4. A synthetic transcriptional regulatory network, in which the solid lines represent transcriptional regulations and the dotted lines represent TFDNA bindings only (meaning binding but not regulation).
A comparison between all the three algorithms (see Figure 5) indicates that the performance of mTRIM is constantly the best on precision, specificity and sensitivity (equivalent to recall). Precisely, the precision of mTRIM is 87.5%, while the precisions of DREM and TRIM are 62.5% and 66.67% respectively. The recall of mTRIM is significantly higher than TRIM because it identified 4 out of 5 regulatory interactions with more than two TFs, while TRIM, because of the scalability issue, cannot find any regulatory interactions with more than two TFs. It also shows that mTRIM is less sensitive to the change of the noise rates from 10% to 40% in the gene expression data than the other two algorithms.
Figure 5. Comparing mTRIM with DREM and TRIM on two sets of synthetic data with different noise rates.
Case studies
In Figure 6a, a 2TF regulatory interaction that controls 12 target genes were found in the Harbison dataset. The yellow colored nodes are TFs and the green colored nodes are their target genes. The red boxes of the dotted lines represent the time points when the TFs collaborate with each other to regulate the target genes. STE12 (which is activated by a MAP kinase signaling cascade) activates genes involved in mating or pseudohyphal/invasive growth pathways. DIG1 is the MAP kinaseresponsive inhibitor of STE12. The target genes are enriched in "response to pheromone" (6 genes), "growth" (3 genes) and so on. The collaboration between STE12 and DIG1 on cell growth was captured by mTRIM successfully. Another interesting result found in the same dataset is a 6TF regulatory interaction (Figure 6b). All the six TFs are wellcharacterized in yeast but are considered to function in different pathways. Our finding connects the distinct biological processes, revealing potential TF collaborations at the transcription level.
Figure 6. Case studies of regulatory interactions in the Harbison dataset: (a) a 2TF (STE12 and DIG1) regulatory interaction controlling 12 genes; (b) a 6TF regulatory interaction, which reveals the collaboration of multiple pathways.
Conclusion
Revealing the mechanisms of the transcriptional regulatory programs in TRNs is essential for understanding the complex control by which genes are expressed in living cells. The inference of collaborative proteinDNA functions helps paving the critical path for new drug development. In this work, we identify the regulatory interactions between TFs and target genes with mTRIM, an integration of an EMbased Bayesian inference and a new association rule mining approach built on a set of basic constraints that relate gene expression patterns to regulatory interactions. mTRIM is not limited by the number of TFs. The experimental results show that mTRIM is clearly better than the existing algorithms. Since it is difficult to obtain the ground truth for algorithm performance evaluation on real data, we generated two sets of synthetic data and used them to validate the results of our algorithm. In our future work, we will use thirdparty biological evidences including multiple TF knockouts, metabolic pathways, proteinprotein interactions, etc., for biological validation. In our future work, we would like to extend this work by including extra data in addition to wildtype gene expression datasets. For example, since miRNA can degrade the genes induced by certain TFs [32], we will consider miRNAtarget bindings and miRNA expressions, aiming to understand how miRNAs and TFs collaborate to regulate target gene expressions.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
JC conceived the project. Sa and JC designed the algorithm and experiments. SA implemented the algorithm and finished the experiments.
Acknowledgements
This project has been funded by the Egyptian Government GM 845.
Declarations
The publication costs for this article were funded by the corresponding author's institution.
This article has been published as part of BMC Systems Biology Volume 8 Supplement 1, 2014: Selected articles from the Twelfth Asia Pacific Bioinformatics Conference (APBC 2014): Systems Biology. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcsystbiol/supplements/8/S1.
References

Qiu P: Recent advances in computational promoter analysis in understanding the transcriptional regulatory network.
Biochem Bioph Res Co 2003, 309:495501. Publisher Full Text

MaienscheinCline M, Zhou J, White K, Sciammas R, Dinner A: Discovering Transcription Factor Regulatory Targets Using Gene Expression and Binding Data.
Bioinformatics 2011, 28:206213. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Yeang H, Jaakkola T: Modeling the combinatorial functions of multiple transcription factors.
J Comput Biol 2006, 13:463480. PubMed Abstract  Publisher Full Text

Ernst J, Vainas O, Harbison C, Simon I, BaxJoseph Z: Reconstructing dynamic regulatory maps.

Segal E, Shapira M, Regev A, Pe'er D, Botstein D, Koller D, Friedman N: Module networks: identifying regulatory modules and their conditionspecific regulators from gene expression data.
Nat Genet 2003, 34:166167. PubMed Abstract  Publisher Full Text

Hagen D, McCaffrey G, Sprague G: Pheromone Response Elements Are Necessary and Sufficient for Basal and PheromoneInduced Transcription of the FUS1 Gene of Saccharomyces cerevisiae.

Babur O, Demir E, Gonen M, Sander C, Dogrusoz U: Discovering modulators of gene expression.
Nucleic Acids Res 2010, 38:56485656. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Park P: ChlPseq: advantages and challenges of a maturing technology.
Nat Rev Genet 2009, 10(10):669680. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Deplancke B, Dupuy D, Vidal M, Walhout A: A gatewaycompatible yeast onehybrid system.
Genome Res 2004, 14(10b):20932101. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Deplancke B, Mukhopadhyay A, Ao W, et al.: A genecentered C. elegans proteinDNA interaction network.
Cell 2006, 125:11931205. PubMed Abstract  Publisher Full Text

Ren B, Robert F, Wyrick J, et al.: Genomewide location and function of DNA binding proteins.
Science 2000, 290:23062309. PubMed Abstract  Publisher Full Text

Reimand J, Vaquerizas J, Todd A, Vilo J, Luscombe N: Comprehensive reanalysis of transcription factor knockout expression data in Saccharomyces cerevisiae reveals many new targets.
Nucleic Acids Res 2010, 38:47684777. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Hoth S, Morgante M, Sanchez J, et al.: Genomewide gene expression profiling in Arabidopsis thaliana reveals new targets of abscisic acid and largely impaired gene regulation in the abil1 mutant.

Tong A, Boone C: Synthetic genetic array analysis in Saccharomyces cerevisiae.

Harbison C , B G, Lee T, et al.: Transcriptional regulatory code of a eukaryotic genome.
Nature 2004, 431:99104. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Hu Z, Killion P, Iyer V: Genetic reconstruction of a functional transcriptional regulatory network.
Nat Genet 2007, 39:683687. PubMed Abstract  Publisher Full Text

BarJoseph Z, Gerber G, Lee T, et al.: Computational discovery of gene modules and regulatory networks.
Nat Biotechnol 2003, 21:13371342. PubMed Abstract  Publisher Full Text

Balaji S, Babu M, Iyer M, Luscombe M, Aravind L: Comprehensive analysis of combinatorial regulation using the transcriptional regulatory network of yeast.

Awad S, Panchy N, Ng S, Chen J: Inferring the regulatory interaction types of transcription factors in transcriptional regulatory networks.
J Bioinfo Comp Bio 2012, 10(5):1250012. Publisher Full Text

Duda R, Hart P, Stork D: Pattern Classification. 2nd edition. John Wiley and Sonss; 2001.

Thorne T, Stumpf M: Inference of temporally varying Bayesian Networks.
Bioinformatics 2012, 28(24):32983305. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Ihmels J, Friedlander G, et al.: Revealing modular organization in the yeast transcriptional network.
Nat Genet 2002, 31:370377. PubMed Abstract  Publisher Full Text

Gygi S, Rochon Y, Franza B, Aebersold R: Correlation between protein and mRNA abundance in yeast.
Molecular and cellular biology 1999, 19(3):17201730. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Agrawal R, Srikant R: Fast algorithms for mining association rules.

Costanzo M, Baryshnikova A, Bellay J, et al.: The Genetic Landscape of a Cell.
Science 2010, 327:425431. PubMed Abstract  Publisher Full Text

Lee T, Rinaldi N, Robert F, et al.: Transcriptional Regulatory Networks in Saccharomyces cerevisiae.
Science 2002, 298:799804. PubMed Abstract  Publisher Full Text

Erb I, Nimwegen E: Statistical features of yeast's transcriptional regulatory code.

Maclsaac K, Wang T, Gordon B, Gifford D, Stormo G, Fraenkel E: An improved map of conserved regulatory sites for Saccharomyces cerevisiae.
BMC Bioinformatics 2006, 7:113. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Eisen M, Spellman P, Brown P, Botstein D: Cluster analysis and display of genomewide expression patterns.
P Natl Acad Sci USA 1998, 95:1486314868. Publisher Full Text

Maere S, Heymans K, Kuiper M: BiNGO a Cytoscape plugin to assess overrepresentation of Gene Ontology categories in Biological Networks.
Bioninformatics 2005, 21:34483449. Publisher Full Text

Spellman P, Sherlock G, Zhang M, et al.: Comprehensive Identification of Cell Cycleregulated Genes of the Yeast Saccharomyces cerevisiae by Microarray Hybridization.
Mol Biol Cell 1998, 9:32733297. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Joung J, Hwang K, et al.: Discovery of microRNAmRNA modules via populationbased probabilistic learning.
Bioinformatics 2007, 23:11411147. PubMed Abstract  Publisher Full Text