<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
	<ui>1752-0509-6-101</ui>
	<ji>1752-0509</ji>
	<fm>
		<dochead>Methodology article</dochead>
		<bibl>
			<title>
				<p>Integrating external biological knowledge in the construction of regulatory networks from time-series expression data</p>
			</title>
			<aug>
				<au id="A1"><snm>Lo</snm><fnm>Kenneth</fnm><insr iid="I1"/><email>kenchlo2@gmail.com</email></au>
				<au id="A2"><snm>Raftery</snm><mi>E</mi><fnm>Adrian</fnm><insr iid="I2"/><email>raftery@u.washington.edu</email></au>
				<au id="A3"><snm>Dombek</snm><mi>M</mi><fnm>Kenneth</fnm><insr iid="I3"/><email>kmd@u.washington.edu</email></au>
				<au id="A4"><snm>Zhu</snm><fnm>Jun</fnm><insr iid="I4"/><email>jun.zhu@mssm.edu</email></au>
				<au id="A5"><snm>Schadt</snm><mi>E</mi><fnm>Eric</fnm><insr iid="I4"/><email>eric.schadt@gmail.com</email></au>
				<au id="A6"><snm>Bumgarner</snm><mi>E</mi><fnm>Roger</fnm><insr iid="I1"/><email>rogerb@u.washington.edu</email></au>
				<au id="A7" ca="yes"><snm>Yeung</snm><mnm>Yee</mnm><fnm>Ka</fnm><insr iid="I1"/><email>kayee@u.washington.edu</email></au>
			</aug>
			<insg>
				<ins id="I1"><p>Department of Microbiology, University of Washington, Box 358070, Seattle, WA, 98195, USA</p></ins>
				<ins id="I2"><p>Department of Statistics, University of Washington, Box 354320, Seattle, WA, 98195, USA</p></ins>
				<ins id="I3"><p>Department of Biochemistry, University of Washington, Box 357350, Seattle, WA, 98195, USA</p></ins>
				<ins id="I4"><p>Department of Genetics and Genomic Sciences, Mount Sinai School of Medicine, New York, NY, 10029, USA</p></ins>
			</insg>
			<source>BMC Systems Biology</source>
			<section><title><p>Methods, software and technology</p></title></section><issn>1752-0509</issn>
			<pubdate>2012</pubdate>
			<volume>6</volume>
			<issue>1</issue>
			<fpage>101</fpage>
			<url>http://www.biomedcentral.com/1752-0509/6/101</url>
			<xrefbib><pubidlist><pubid idtype="doi">10.1186/1752-0509-6-101</pubid><pubid idtype="pmpid">22898396</pubid></pubidlist></xrefbib>
		</bibl>
		<history><rec><date><day>25</day><month>1</month><year>2012</year></date></rec><acc><date><day>24</day><month>7</month><year>2012</year></date></acc><pub><date><day>16</day><month>8</month><year>2012</year></date></pub></history>
		<cpyrt><year>2012</year><collab>Lo et al.; licensee BioMed Central Ltd.</collab><note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
		<kwdg>
			<kwd>Systems biology</kwd>
			<kwd>Network inference</kwd>
			<kwd>Data integration</kwd>
			<kwd>Statistics</kwd>
			<kwd>Time-series expression data</kwd>
			<kwd>Model uncertainty</kwd>
		</kwdg>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<sec>
					<st>
						<p>Background</p>
					</st><p>Inference about regulatory networks from high-throughput genomics data is of great interest in systems biology. We present a Bayesian approach to infer gene regulatory networks from time series expression data by integrating various types of biological knowledge.</p>
				</sec>
				<sec>
					<st>
						<p>Results</p>
					</st><p>We formulate network construction as a series of variable selection problems and use linear regression to model the data. We extend the Bayesian model averaging (BMA) variable selection method to select regulators in the regression framework, and we summarize the external biological knowledge by an informative prior probability distribution over the candidate regression models.</p>
				</sec>
				<sec>
					<st>
						<p>Conclusions</p>
					</st><p>We demonstrate our method on simulated data and a set of time-series microarray experiments measuring the effect of a drug perturbation on gene expression levels, and show that it outperforms leading regression-based methods in the literature.</p>
				</sec>
			</sec>
		</abs>
	</fm>
	<bdy>
		<sec>
			<st>
				<p>Background</p>
			</st><p>With recent advances in high-throughput biological data collection, reverse engineering of regulatory networks from large-scale genomics data has become a problem of broad interest to biologists. The construction of regulatory networks is essential for defining the interactions between genes and gene products, and predictive models may be used to develop novel therapies <abbrgrp>
					<abbr bid="B1">1</abbr>
					<abbr bid="B2">2</abbr>
				</abbrgrp>. Both microarrays and, more recently, next-generation sequencing provide the ability to quantify the expression levels of all genes in a given genome. Often, in such experiments, gene expression is measured in response to drug treatment, environmental perturbations, or gene knockouts, either at steady state or over a series of time points. This type of data captures information about the effect of one gene&#8217;s expression level on the expression level of another gene. Hence, such data can, in principle, be reverse engineered to provide a regulatory network that models these effects.</p><p>A regulatory network can be represented as a directed graph, in which each node represents a gene (in our case an mRNA level) and each directed edge (<it>r&#8594;g</it>) represents the relationship between regulator <it>r</it> and gene <it>g</it>. We aim to infer the directed edges that describe the relationships among the nodes. In this case, the causal relationship is statistically inferred, in contrast to the classic definition of causality used in biology to imply direct physical interaction leading to a phenotypic change. This is a challenging problem, especially on a genome-wide scale, since the goal is to unravel a small number of regulators (parent nodes) out of thousands of candidate nodes in the graph. Even with high-dimensional gene expression data, network inference is difficult, in part because of the small number of observations for each gene. To improve network inference, one would like a coherent approach to integrate external knowledge and data, both to fill in gaps in the gene expression data and to constrain or guide the network search.</p><p>In this article, we present a network inference method that addresses the dimensionality challenge with a Bayesian variable selection method. Our method uses a supervised learning framework to incorporate external data sources.
We applied our method to a set of time-series mRNA expression profiles for 95 yeast segregants and their parental strains, over six time points in response to a drug perturbation. This extends our previous work <abbrgrp>
					<abbr bid="B3">3</abbr>
				</abbrgrp> by incorporating prior probabilities of transcriptional regulation inferred using external data sources. Our method also accommodates feedback loops, a feature allowed only in some current network construction methods.</p>
			<sec>
				<st>
					<p>Previous work</p>
				</st><p>Bayesian networks <abbrgrp>
						<abbr bid="B4">4</abbr>
						<abbr bid="B5">5</abbr>
						<abbr bid="B6">6</abbr>
					</abbrgrp> are one of the most popular modeling approaches for network construction using gene expression data <abbrgrp>
						<abbr bid="B7">7</abbr>
						<abbr bid="B8">8</abbr>
						<abbr bid="B9">9</abbr>
						<abbr bid="B10">10</abbr>
						<abbr bid="B11">11</abbr>
						<abbr bid="B12">12</abbr>
						<abbr bid="B13">13</abbr>
						<abbr bid="B14">14</abbr>
						<abbr bid="B15">15</abbr>
						<abbr bid="B16">16</abbr>
						<abbr bid="B17">17</abbr>
					</abbrgrp>. A Bayesian network is a probabilistic graphical model for which the joint distribution of all the nodes is factorized into independent conditional distributions of each node given its parents. The goal of Bayesian network inference is to arrive at a directed graph such that the joint probability distribution is optimized globally. While different Bayesian network structures may give rise to the same probability distribution, so that such networks in general do not imply causal relationships, prior information can be used to break this nonidentifiability so that causal inferences can be made. For example, systematic sources of perturbation such as naturally occurring genetic variation in a population or specific drug perturbations in which response is observed over time can lead to reliable causal inference <abbrgrp>
						<abbr bid="B1">1</abbr>
						<abbr bid="B2">2</abbr>
						<abbr bid="B18">18</abbr>
						<abbr bid="B19">19</abbr>
					</abbrgrp>. A Bayesian network is a directed acyclic graph (DAG). Therefore, cyclic components or feedback loops cannot be accommodated. This DAG constraint is an obstacle to using the Bayesian network approach for modeling gene regulatory networks because feedback loops are typical in many biological systems <abbrgrp>
						<abbr bid="B20">20</abbr>
					</abbrgrp>. The DAG constraint is removed when dynamic Bayesian networks are used to model time-series expression data <abbrgrp>
						<abbr bid="B19">19</abbr>
						<abbr bid="B21">21</abbr>
						<abbr bid="B22">22</abbr>
						<abbr bid="B23">23</abbr>
						<abbr bid="B24">24</abbr>
					</abbrgrp>. Dynamic Bayesian networks represent genes at successive time points as separate nodes, thus allowing for the existence of cycles. Bayesian network construction is an NP-hard problem <abbrgrp>
						<abbr bid="B25">25</abbr>
						<abbr bid="B26">26</abbr>
					</abbrgrp>, with computational complexity increasing exponentially with the number of nodes considered in the network construction process. In spite of some attempts to reduce the computational cost <abbrgrp>
						<abbr bid="B27">27</abbr>
					</abbrgrp>, the Bayesian network approach in general is computationally intensive to implement, especially for network inference on a genome-wide scale.</p><p>In regression-based methods, network construction is recast as a series of variable selection problems to infer regulators for each gene. The greatest challenge is the fact that there are usually far more candidate regulators than observations for each gene. Some authors have used singular value decompositions to regularize the regression models <abbrgrp>
						<abbr bid="B28">28</abbr>
						<abbr bid="B29">29</abbr>
						<abbr bid="B30">30</abbr>
					</abbrgrp>. Others have built a regression tree for each target gene, using a compact set of regulators at each node <abbrgrp>
						<abbr bid="B31">31</abbr>
						<abbr bid="B32">32</abbr>
						<abbr bid="B33">33</abbr>
						<abbr bid="B34">34</abbr>
					</abbrgrp>. Huang et al. <abbrgrp>
						<abbr bid="B35">35</abbr>
					</abbrgrp> used regression with forward selection after pre-filtering of candidates deemed irrelevant to the target gene, and Imoto et al. <abbrgrp>
						<abbr bid="B16">16</abbr>
					</abbrgrp> used nonparametric regression embedded within a Bayesian network. <it>L</it>1-norm regularization, including the elastic net <abbrgrp>
						<abbr bid="B36">36</abbr>
						<abbr bid="B37">37</abbr>
					</abbrgrp> and weighted LASSO <abbrgrp>
						<abbr bid="B38">38</abbr>
					</abbrgrp>, has also been widely used <abbrgrp>
						<abbr bid="B39">39</abbr>
						<abbr bid="B40">40</abbr>
						<abbr bid="B41">41</abbr>
						<abbr bid="B42">42</abbr>
						<abbr bid="B43">43</abbr>
						<abbr bid="B44">44</abbr>
						<abbr bid="B45">45</abbr>
						<abbr bid="B46">46</abbr>
						<abbr bid="B47">47</abbr>
						<abbr bid="B48">48</abbr>
						<abbr bid="B49">49</abbr>
					</abbrgrp>.</p><p>Ordinary differential equations (ODE) provide another class of network construction strategies <abbrgrp>
						<abbr bid="B50">50</abbr>
						<abbr bid="B51">51</abbr>
						<abbr bid="B52">52</abbr>
						<abbr bid="B53">53</abbr>
					</abbrgrp>. Using first-order ODEs, the rate of change in transcription for a target gene is described as a function of the expression of its regulators and the effects caused by applied perturbations. ODE-based methods can be broadly classified into two categories, depending on whether the gene expressions are measured at steady state <abbrgrp>
						<abbr bid="B54">54</abbr>
						<abbr bid="B55">55</abbr>
						<abbr bid="B56">56</abbr>
						<abbr bid="B57">57</abbr>
						<abbr bid="B58">58</abbr>
					</abbrgrp> or over time <abbrgrp>
						<abbr bid="B51">51</abbr>
						<abbr bid="B52">52</abbr>
						<abbr bid="B53">53</abbr>
					</abbrgrp>. As an example, the TSNI (Time Series Network Identification) algorithm used ODEs to model time series expression data subject to an external perturbation <abbrgrp>
						<abbr bid="B53">53</abbr>
					</abbrgrp>. To handle the dimensionality challenge (i.e. the number of observations per gene is much smaller than the number of genes), Bansal et al. employed a cubic smoothing spline to interpolate additional data points, and applied Principal Component Analysis to reduce dimensionality.</p><p>To help mitigate problems with using gene expression data in network inference, external data sources can be integrated into the inference process. Public data repositories provide a rich resource of biological knowledge relevant to transcriptional regulation. Integrating such external data sources into network inference has become an important problem in systems biology. James et al. <abbrgrp>
						<abbr bid="B43">43</abbr>
					</abbrgrp> incorporated documented experimental evidence about the presence of a binding site for each known transcription factor (TF) in the promoter region of its target gene in <it>Escherichia coli</it>. Djebbari and Quackenbush <abbrgrp>
						<abbr bid="B13">13</abbr>
					</abbrgrp> used preliminary networks derived from literature indexed in PubMed and protein-protein interaction (PPI) databases as seeds for their Bayesian network analysis. Zhu et al. <abbrgrp>
						<abbr bid="B59">59</abbr>
					</abbrgrp> showed that combining information from TF binding sites and PPI data increased overall predictive power. Geier et al. <abbrgrp>
						<abbr bid="B15">15</abbr>
					</abbrgrp> examined the impact of external knowledge with different levels of accuracy on network inference, albeit in a simulated setting. Imoto et al. <abbrgrp>
						<abbr bid="B16">16</abbr>
					</abbrgrp> described different ways to specify knowledge about PPI, documented regulatory relationships and well-studied pathways as prior information. Lee et al. <abbrgrp>
						<abbr bid="B44">44</abbr>
					</abbrgrp> presented a systematic way to include various types of biological knowledge, including the gene ontology (GO) database, ChIP-chip binding experiments and a comprehensive collection of information about sequence polymorphisms.</p>
			</sec>
			<sec>
				<st>
					<p>Our contributions</p>
				</st><p>This article is an extension of Yeung et al. <abbrgrp>
						<abbr bid="B3">3</abbr>
					</abbrgrp> which adopted a regression-based framework in which candidate regulators are inferred for each gene using expression data at the previous time point. Iterative Bayesian model averaging (iBMA) <abbrgrp>
						<abbr bid="B60">60</abbr>
						<abbr bid="B61">61</abbr>
						<abbr bid="B62">62</abbr>
					</abbrgrp> was used to account for model uncertainty in the regression models. A supervised framework was used to estimate the relative contribution of each type of external knowledge, and from this a shortlist of promising regulators for each gene was predicted. This shortlist was used to infer regulators for each gene in the regression framework.</p><p>Our contributions are four-fold. First, we develop a new method called iBMA-prior that explicitly incorporates external biological knowledge into iBMA in the form of a prior distribution. Intuitively, we consider models consisting of candidate regulators supported by considerable external evidence to be frontrunners. A model that contains many candidate regulators with little support from external knowledge is penalized. Second, we demonstrate the merits of specifying the expected number of regulators per gene as priors through iBMA-size, a simplified version of iBMA-prior that does not use gene-specific external knowledge. Third, we refine the supervised framework to adjust for sampling bias towards positive cases in the training data, thereby calibrating the prior distribution. Fourth, we expand our benchmark to include simulated data, and compare our iBMA methods to L1-regularized regression-based methods. Specifically, we applied iBMA-prior to real and simulated time-series gene expression data, and found that it outperformed our previous work <abbrgrp>
						<abbr bid="B3">3</abbr>
					</abbrgrp> and other leading methods in the literature on these data, producing more compact and accurate networks. Figure <figr fid="F1">1</figr> summarizes iBMA-prior and our main contributions.</p>
				<fig id="F1"><title><p>Figure 1</p></title><caption><p>Overview of iBMA-prior with a highlight of our main contributions</p></caption><text>
   <p>
      <b>Overview of iBMA-prior with a highlight of our main contributions.</b>
   </p>
</text><graphic file="1752-0509-6-101-1"/></fig>
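				<p>As a rough illustration of the regression framing summarized above, in which the expression of a target gene is modeled using candidate regulators measured at the previous time point, the following Python sketch assembles the lagged observations for one target gene. The nested-dictionary data layout and the function name build_lagged_rows are hypothetical and are not taken from the iBMA software. The sketch shows only the arrangement of the data, not the model averaging itself.</p>
				<p>
```python
def build_lagged_rows(expr, target, regulators):
    """Pair each target value at time t with regulator values at time t-1.

    expr maps a sample id to {gene: [levels over time]} (hypothetical layout).
    Returns (X, y): X is a list of predictor rows, y the matching responses.
    """
    X, y = [], []
    for sample in sorted(expr):
        profiles = expr[sample]
        n_times = len(profiles[target])
        for t in range(1, n_times):
            # Predictors come from the previous time point of the same sample.
            X.append([profiles[r][t - 1] for r in regulators])
            y.append(profiles[target][t])
    return X, y


# Toy data: two samples, three time points, made-up expression levels.
toy = {
    "seg1": {"G": [1.0, 1.5, 2.0], "R1": [0.2, 0.4, 0.6], "R2": [3.0, 2.5, 2.0]},
    "seg2": {"G": [0.8, 1.1, 1.9], "R1": [0.1, 0.3, 0.5], "R2": [2.8, 2.6, 2.1]},
}
X, y = build_lagged_rows(toy, "G", ["R1", "R2"])
print(len(X), "lagged observations for target G")
```
				</p>
				<p>In this arrangement, a sample measured at <it>T</it> time points contributes <it>T</it>&#8201;&#8722;&#8201;1 rows, and variable selection is then carried out over the columns of the predictor matrix.</p>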
			</sec>
		</sec>
		<sec>
			<st>
				<p>Results and discussion</p>
			</st><p>We applied our method, iBMA-prior, to a time-series data set of gene expression levels for 95 genotyped haploid yeast segregants perturbed with the macrolide drug rapamycin over 6 time points <abbrgrp>
					<abbr bid="B3">3</abbr>
				</abbrgrp>. These data are described in detail in the <it>Methods</it> section. To evaluate the performance of iBMA-prior, other published regression-based network construction methods were applied to the same time-series gene expression data set and the resulting networks were assessed for the recovery of documented regulatory relationships that were not used in the network construction process. We also checked whether each method recovered target genes enriched in upstream regions containing the binding sites of known TFs. We further carried out a simulation study to assess our method.</p>
			<sec>
				<st>
					<p>Comparison of different methods</p>
				</st><p>First, we assessed the improvement of iBMA-prior over our previous method, iBMA-shortlist, from Yeung et al. <abbrgrp>
						<abbr bid="B3">3</abbr>
					</abbrgrp> (see Methods for details) when applied to the same yeast time-series gene expression data. Then, we compared our BMA-based methods to several L1-regularized methods, including the least absolute shrinkage and selection operator (LASSO) <abbrgrp>
						<abbr bid="B36">36</abbr>
						<abbr bid="B63">63</abbr>
					</abbrgrp> and least angle regression (LAR) <abbrgrp>
						<abbr bid="B64">64</abbr>
					</abbrgrp>. Regularized regression methods combine shrinkage and variable selection. L1-regularized methods aim to minimize the sum of squared errors with a bound on the sum of the absolute values of the coefficients <abbrgrp>
						<abbr bid="B65">65</abbr>
					</abbrgrp>. Efficient implementations are available for some of these methods, including LASSO and LAR, and these methods have been applied to high-dimensional data in which there are more variables than observations <abbrgrp>
						<abbr bid="B64">64</abbr>
						<abbr bid="B66">66</abbr>
						<abbr bid="B67">67</abbr>
					</abbrgrp>.</p><p>We also compared the performance of our method with and without using external biological knowledge. We assessed hybrid methods by combining LASSO and LAR with the same supervised learning stage that was used in iBMA-prior and iBMA-shortlist. Table <tblr tid="T1">1</tblr> lists all the methods compared in this analysis.</p>
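				<p>For reference, the L1-constrained criterion described above corresponds to the standard LASSO formulation, written here in generic notation. The symbols are illustrative and are not defined elsewhere in this article: <it>y</it><sub><it>i</it></sub> is the response for observation <it>i</it>, <it>x</it><sub><it>ij</it></sub> is the value of candidate regulator <it>j</it>, and <it>t</it> is the bound on the sum of absolute coefficients.</p>
				<p>
```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta} \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^{2}
\quad \text{subject to} \quad \sum_{j=1}^{p} \lvert \beta_j \rvert \le t .
```
				</p>
				<p>Smaller values of <it>t</it> shrink more coefficients exactly to zero, which is how the constraint performs variable selection as well as shrinkage.</p>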
				<table id="T1">
					<title>
						<p>Table 1</p>
					</title>
					<caption>
						<p>
							<b>Different regression-based methods applied to the time-series gene expression data to construct gene regulatory networks</b>
						</p>
					</caption>
					<tgroup align="left" cols="3">
						<colspec align="left" colname="c1" colnum="1" colwidth="1*"/>
						<colspec align="center" colname="c2" colnum="2" colwidth="1*"/>
						<colspec align="center" colname="c3" colnum="3" colwidth="1*"/>
						<thead valign="top">
							<row rowsep="1">
								<entry colname="c1">
									<p>
										<b>Method</b>
									</p>
								</entry>
								<entry colname="c2">
									<p>
										<b>Data used</b>
									</p>
								</entry>
								<entry colname="c3">
									<p>
										<b>Description</b>
									</p>
								</entry>
							</row>
						</thead>
						<tbody valign="top">
							<row>
								<entry colname="c1">
									<p>iBMA-prior</p>
								</entry>
								<entry colname="c2">
									<p>Gene expression&#8201;+&#8201;external data</p>
								</entry>
								<entry colname="c3">
									<p>Our proposed methodology that incorporates prior model probabilities in BMA. These prior probabilities were computed using external data sources.</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>iBMA-shortlist</p>
								</entry>
								<entry colname="c2">
									<p>Gene expression&#8201;+&#8201;external data</p>
								</entry>
								<entry colname="c3">
									<p>Iterative BMA that uses external knowledge to shortlist <it>p</it>&#8201;=&#8201;100 candidates for each target gene. The revised supervised step was used. Unlike iBMA-prior, the information from the external data is not used in variable selection via BMA.</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>Network A from Yeung et al. <abbrgrp>
											<abbr bid="B3">3</abbr>
										</abbrgrp>
									</p>
								</entry>
								<entry colname="c2">
									<p>Gene expression&#8201;+&#8201;external data</p>
								</entry>
								<entry colname="c3">
									<p>This method is the same as iBMA-shortlist, but uses the earlier version of the supervised step described in Yeung et al. <abbrgrp>
											<abbr bid="B3">3</abbr>
										</abbrgrp>. We aim to study the impact of the revised supervised step by comparing iBMA-shortlist to network A.</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>LASSO-shortlist</p>
								</entry>
								<entry colname="c2">
									<p>Gene expression&#8201;+&#8201;external data</p>
								</entry>
								<entry colname="c3">
									<p>LASSO <abbrgrp>
											<abbr bid="B36">36</abbr>
											<abbr bid="B63">63</abbr>
										</abbrgrp> with the use of external knowledge to shortlist <it>p</it>&#8201;=&#8201;100 candidates for each target gene.</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>LAR-shortlist</p>
								</entry>
								<entry colname="c2">
									<p>Gene expression&#8201;+&#8201;external data</p>
								</entry>
								<entry colname="c3">
									<p>LAR <abbrgrp>
											<abbr bid="B64">64</abbr>
										</abbrgrp> with the use of external knowledge to shortlist <it>p</it>&#8201;=&#8201;100 candidates for each target gene.</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>iBMA-size</p>
								</entry>
								<entry colname="c2">
									<p>Gene expression data only</p>
								</entry>
								<entry colname="c3">
									<p>A simplified version of iBMA-prior that disregards external knowledge, except for setting <it>&#960;</it>
										<sub>
											<it>gr</it>
										</sub>&#8201;=&#8201;<it>&#964;</it>&#8201;=&#8201;2.76/6000&#8201;=&#8201;0.00046 for all <it>g</it> and <it>r</it>. This essentially turns Eq. (5) into a function of model size only.</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>iBMA-noprior</p>
								</entry>
								<entry colname="c2">
									<p>Gene expression data only</p>
								</entry>
								<entry colname="c3">
									<p>Iterative BMA without any use of external knowledge.</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>LASSO-noprior</p>
								</entry>
								<entry colname="c2">
									<p>Gene expression data only</p>
								</entry>
								<entry colname="c3">
									<p>LASSO without any use of external knowledge.</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>LAR-noprior</p>
								</entry>
								<entry colname="c2">
									<p>Gene expression data only</p>
								</entry>
								<entry colname="c3">
									<p>LAR without any use of external knowledge.</p>
								</entry>
							</row>
						</tbody>
					</tgroup>
				</table>
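				<p>The arithmetic behind the iBMA-size setting in Table 1 can be sketched as follows. The sketch assumes, for illustration only, that the model prior is a product of independent inclusion probabilities (Eq. (5) itself is not reproduced here). It also reads 2.76 as the expected number of regulators per gene and 6000 as the approximate number of candidate regulators. The names tau and log_prior_by_size are hypothetical.</p>
				<p>
```python
import math

# Assumed reading of Table 1: expected regulators per gene / candidate count.
EXPECTED_REGULATORS = 2.76
N_CANDIDATES = 6000
tau = EXPECTED_REGULATORS / N_CANDIDATES


def log_prior_by_size(k, p=N_CANDIDATES, pi=tau):
    """Log prior of a model with k regulators out of p candidates.

    Assumes every candidate has the same independent inclusion probability pi,
    so the prior depends on the model only through its size k.
    """
    return k * math.log(pi) + (p - k) * math.log(1.0 - pi)


# Change in log prior caused by adding one more regulator to a model.
penalty_per_regulator = math.log(tau / (1.0 - tau))

print("tau =", round(tau, 5))
print("log prior change per added regulator =", round(penalty_per_regulator, 3))
```
				</p>
				<p>Under this assumption, each additional regulator changes the log prior by the same amount, log(<it>&#964;</it>/(1&#8201;&#8722;&#8201;<it>&#964;</it>)), so larger models are penalized in proportion to their size.</p>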
			</sec>
			<sec>
				<st>
					<p>Assessment: recovery of documented relationships</p>
				</st><p>To evaluate the accuracy of the network constructed by each method, we assessed its concordance with the Yeastract database, a curated repository of regulatory relationships between known TFs and target genes in the <it>Saccharomyces cerevisiae</it> literature <abbrgrp>
						<abbr bid="B68">68</abbr>
					</abbrgrp>. If a regulatory relationship documented in Yeastract was also inferred in the network, we concluded that this relationship was recovered by direct evidence. Some of the positive examples used in the supervised learning stage are also documented in Yeastract. To avoid bias, we did not consider those regulatory relationships in the assessment. For each method compared, we applied Pearson&#8217;s chi-square test to a 2 &#215; 2 contingency table that quantified the concordance of the inferred network with the Yeastract database. We also computed the true positive rate (TPR), defined as the proportion of the inferred positive relationships that are documented in Yeastract. It should be noted that Yeastract cannot document all &#8220;true&#8221; relationships as the entire set of regulatory relationships in yeast has yet to be defined. We further considered the ratio of the observed number of recovered relationships to its expected count as a result of random assortment (O/E). More detailed definitions of the assessment criteria can be found in Additional file <supplr sid="S1">1</supplr>: Figure S1.</p>
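				<p>The assessment criteria above can be illustrated with a short Python sketch. The counts used are hypothetical and are not taken from Table 2, and the function name assess_network is illustrative. The chi-square statistic is computed without a continuity correction, which is an assumption because the text does not state whether one was applied.</p>
				<p>
```python
import math


def assess_network(tp, fp, fn, tn):
    """Summarize a 2 x 2 table of inferred versus documented relationships.

    tp: inferred and documented; fp: inferred, not documented;
    fn: documented, not inferred; tn: neither.
    """
    n = tp + fp + fn + tn
    inferred = tp + fp
    documented = tp + fn
    # TPR as defined in the text: share of inferred edges that are documented.
    tpr = tp / inferred
    # Recovered edges expected if inference were independent of documentation.
    expected = inferred * documented / n
    # Pearson chi-square statistic for a 2 x 2 table (no continuity correction).
    chi2 = n * (tp * tn - fp * fn) ** 2 / (
        inferred * (fn + tn) * documented * (fp + tn)
    )
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return {"TPR": tpr, "O/E": tp / expected, "chi2": chi2, "p": p_value}


# Hypothetical counts for illustration only.
result = assess_network(tp=30, fp=70, fn=120, tn=9780)
print(result)
```
				</p>
				<p>An O/E ratio above 1 indicates that more documented relationships were recovered than expected under random assortment, while the TPR reports what fraction of the inferred relationships are documented.</p>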
				<suppl id="S1">
					<title>
						<p>Additional file 1</p>
					</title>
					<text>
						<p>
							<b>Supplementary figures.</b>
						</p>
					</text>
					<file name="1752-0509-6-101-S1.pdf">
   <p>Click here for file</p>
</file>
				</suppl><p>Table <tblr tid="T2">2</tblr> summarizes the assessment results for the nine methods compared. Additional details are presented in Additional file <supplr sid="S2">2</supplr>: Table S1. First, we studied the impact of integrating external knowledge into the network construction process under the iBMA framework. The TPR of iBMA-prior was 18.00%, and the number of recovered positive relationships was 593, which is 4.11 times the number expected by chance. Using the revised supervised step described in this work without incorporating prior probabilities into the iBMA framework, iBMA-shortlist yielded a TPR of 12.78% and an O/E ratio of 2.92. This is an improvement over network A (TPR&#8201;=&#8201;9.98% and O/E&#8201;=&#8201;2.28) constructed using the same algorithm and our previous version of the supervised framework as described in Yeung et al. <abbrgrp>
						<abbr bid="B3">3</abbr>
					</abbrgrp>. All of our methods that incorporate external knowledge (iBMA-prior, iBMA-shortlist and network A) produced higher TPRs than iBMA-noprior, for which only the time-series gene expression data were used. In particular, iBMA-prior produced a TPR of 18.00%, roughly twice that of iBMA-noprior (8.9%). Therefore, the integration of external data improved the recovery of known relationships, and our latest method, iBMA-prior, performed best.</p>
				<suppl id="S2">
					<title>
						<p>Additional file 2</p>
					</title>
					<text>
						<p>
							<b>Supplementary tables.</b>
						</p>
					</text>
					<file name="1752-0509-6-101-S2.pdf">
   <p>Click here for file</p>
</file>
				</suppl>
				<table id="T2">
					<title>
						<p>Table 2</p>
					</title>
					<caption>
						<p>
							<b>Summary of the assessment results for different network construction methods on the time-series gene expression data</b>
						</p>
					</caption>
					<tgroup align="left" cols="8">
						<colspec align="left" colname="c1" colnum="1" colwidth="1*"/>
						<colspec align="center" colname="c2" colnum="2" colwidth="1*"/>
						<colspec align="center" colname="c3" colnum="3" colwidth="1*"/>
						<colspec align="center" colname="c4" colnum="4" colwidth="1*"/>
						<colspec align="center" colname="c5" colnum="5" colwidth="1*"/>
						<colspec align="center" colname="c6" colnum="6" colwidth="1*"/>
						<colspec align="center" colname="c7" colnum="7" colwidth="1*"/>
						<colspec align="center" colname="c8" colnum="8" colwidth="1*"/>
						<thead valign="top">
							<row rowsep="1">
								<entry colname="c1">
									<p>
										<b>Method</b>
									</p>
								</entry>
								<entry colname="c2">
									<p>
										<b>Data used</b>
									</p>
								</entry>
								<entry colname="c3">
									<p>
										<b>Network size</b>
									</p>
								</entry>
								<entry colname="c4">
									<p>
										<b>
											<it>p</it>
										</b><b>-value of chi sq test</b>
										<sup>
											<b>
												<it>a</it>
											</b>
										</sup>
									</p>
								</entry>
								<entry colname="c5">
									<p>
										<b>TPR (%)</b>
										<sup>
											<b>
												<it>b</it>
											</b>
										</sup>
									</p>
								</entry>
								<entry colname="c6">
									<p>
										<b># mis-class.</b>
										<sup>
											<b>
												<it>c</it>
											</b>
										</sup>
									</p>
								</entry>
								<entry colname="c7">
									<p>
										<b>TP</b>
									</p>
								</entry>
								<entry colname="c8">
									<p>
										<b>O/E</b>
										<sup>
											<b>
												<it>d</it>
											</b>
										</sup>
									</p>
								</entry>
							</row>
						</thead>
						<tfoot>
							<p>
								<sup>
									<it>a</it>
								</sup> The <it>p</it>-value of Pearson&#8217;s chi-square test measures the strength of association between an inferred network and the Yeastract database.</p><p>
								<sup>
									<it>b</it>
								</sup> True positive rate (TPR) is defined as the proportion of inferred regulatory relationships that are documented in Yeastract.</p><p>
								<sup>
									<it>c</it>
								</sup> The number of misclassified cases is the sum of false positives and false negatives.</p><p>
								<sup>
									<it>d</it>
								</sup> The O/E ratio is the observed number of recovered relationships (i.e., TP) divided by the number expected to be recovered by chance.</p>
						</tfoot>
						<tbody valign="top">
							<row>
								<entry colname="c1">
									<p>iBMA-prior</p>
								</entry>
								<entry colname="c2">
									<p>Gene expression&#8201;+&#8201;external data</p>
								</entry>
								<entry colname="c3">
									<p>21951</p>
								</entry>
								<entry colname="c4">
									<p>&lt;1.00E-320</p>
								</entry>
								<entry colname="c5">
									<p>18.00</p>
								</entry>
								<entry colname="c6">
									<p>19282</p>
								</entry>
								<entry colname="c7">
									<p>593</p>
								</entry>
								<entry colname="c8">
									<p>4.11</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>iBMA-shortlist</p>
								</entry>
								<entry colname="c2">
									<p>Gene expression&#8201;+&#8201;external data</p>
								</entry>
								<entry colname="c3">
									<p>67440</p>
								</entry>
								<entry colname="c4">
									<p>&lt;1.00E-320</p>
								</entry>
								<entry colname="c5">
									<p>12.78</p>
								</entry>
								<entry colname="c6">
									<p>24673</p>
								</entry>
								<entry colname="c7">
									<p>1287</p>
								</entry>
								<entry colname="c8">
									<p>2.92</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>Network A from Yeung et al.</p>
								</entry>
								<entry colname="c2">
									<p>Gene expression&#8201;+&#8201;external data</p>
								</entry>
								<entry colname="c3">
									<p>65122</p>
								</entry>
								<entry colname="c4">
									<p>1.68E-111</p>
								</entry>
								<entry colname="c5">
									<p>9.98</p>
								</entry>
								<entry colname="c6">
									<p>22485</p>
								</entry>
								<entry colname="c7">
									<p>662</p>
								</entry>
								<entry colname="c8">
									<p>2.28</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>LASSO-shortlist</p>
								</entry>
								<entry colname="c2">
									<p>Gene expression&#8201;+&#8201;external data</p>
								</entry>
								<entry colname="c3">
									<p>255293</p>
								</entry>
								<entry colname="c4">
									<p>&lt;1.00E-320</p>
								</entry>
								<entry colname="c5">
									<p>11.07</p>
								</entry>
								<entry colname="c6">
									<p>46482</p>
								</entry>
								<entry colname="c7">
									<p>4169</p>
								</entry>
								<entry colname="c8">
									<p>2.53</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>LAR-shortlist</p>
								</entry>
								<entry colname="c2">
									<p>Gene expression&#8201;+&#8201;external data</p>
								</entry>
								<entry colname="c3">
									<p>242495</p>
								</entry>
								<entry colname="c4">
									<p>&lt;1.00E-320</p>
								</entry>
								<entry colname="c5">
									<p>11.28</p>
								</entry>
								<entry colname="c6">
									<p>44765</p>
								</entry>
								<entry colname="c7">
									<p>4017</p>
								</entry>
								<entry colname="c8">
									<p>2.57</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>iBMA-size</p>
								</entry>
								<entry colname="c2">
									<p>Gene expression data only</p>
								</entry>
								<entry colname="c3">
									<p>17202</p>
								</entry>
								<entry colname="c4">
									<p>5.75E-56</p>
								</entry>
								<entry colname="c5">
									<p>16.84</p>
								</entry>
								<entry colname="c6">
									<p>17622</p>
								</entry>
								<entry colname="c7">
									<p>114</p>
								</entry>
								<entry colname="c8">
									<p>3.84</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>iBMA-noprior</p>
								</entry>
								<entry colname="c2">
									<p>Gene expression data only</p>
								</entry>
								<entry colname="c3">
									<p>63026</p>
								</entry>
								<entry colname="c4">
									<p>1.75E-23</p>
								</entry>
								<entry colname="c5">
									<p>8.85</p>
								</entry>
								<entry colname="c6">
									<p>18903</p>
								</entry>
								<entry colname="c7">
									<p>186</p>
								</entry>
								<entry colname="c8">
									<p>2.02</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>LASSO-noprior</p>
								</entry>
								<entry colname="c2">
									<p>Gene expression data only</p>
								</entry>
								<entry colname="c3">
									<p>564321</p>
								</entry>
								<entry colname="c4">
									<p>2.56E-10</p>
								</entry>
								<entry colname="c5">
									<p>5.20</p>
								</entry>
								<entry colname="c6">
									<p>38399</p>
								</entry>
								<entry colname="c7">
									<p>1231</p>
								</entry>
								<entry colname="c8">
									<p>1.19</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>LAR-noprior</p>
								</entry>
								<entry colname="c2">
									<p>Gene expression data only</p>
								</entry>
								<entry colname="c3">
									<p>194687</p>
								</entry>
								<entry colname="c4">
									<p>1.38E-40</p>
								</entry>
								<entry colname="c5">
									<p>7.71</p>
								</entry>
								<entry colname="c6">
									<p>22777</p>
								</entry>
								<entry colname="c7">
									<p>511</p>
								</entry>
								<entry colname="c8">
									<p>1.76</p>
								</entry>
							</row>
						</tbody>
					</tgroup>
				</table><p>Next, we compared our iBMA-based methods to L1-regularized methods. All the approaches that used LASSO and LAR generated networks that had far more misclassifications than the iBMA-based methods. Specifically, applications of LASSO or LAR without the supervised framework (LASSO-noprior and LAR-noprior) had TPRs of 5.20% and 7.71%, respectively, the lowest among all the methods considered. Incorporating external knowledge did improve both LASSO and LAR, increasing the TPRs to about 11% in both LASSO-shortlist and LAR-shortlist. However, these TPRs were still lower than the TPRs for our iBMA-based methods. Our iBMA-based methods therefore outperformed methods based on LASSO and LAR for these data.</p><p>Finally, we investigated the impact of priors in iBMA-size, in which we applied a model size prior to calibrate the sparsity of the inferred networks without using any external data sources. iBMA-size can be considered a simplified version of iBMA-prior that sets the regulatory potential (the prior probability that a candidate regulates a given gene) to a constant parameter that controls the expected number of regulators per gene. From Table <tblr tid="T2">2</tblr>, iBMA-size produced a TPR of 16.84%, which was higher than all the other methods considered except iBMA-prior. Although the number of recovered positive relationships was lower than that of iBMA-prior (114 versus 593), iBMA-size also produced a network that was more compact (17,202 edges compared to 21,951 edges). We recommend iBMA-size when gene-specific external information is not available.</p><p>In Table <tblr tid="T2">2</tblr> and <supplr sid="S2">Additional file 2: Table S1</supplr>, all the iBMA networks were thresholded at a posterior probability of 50% (i.e., edges with posterior probability &lt;50% were removed).
We found that iBMA-prior also outperformed the other methods for these data over different posterior probability thresholds (see Additional file <supplr sid="S2">2</supplr>: Table S2).</p>
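				<p>The assessment measures defined in the footnotes of Table <tblr tid="T2">2</tblr> can be illustrated with a short Python sketch. This is not the code used in the study: the function name <it>assess_network</it> and the toy counts are hypothetical, the chi-square statistic is computed without a continuity correction (an assumption), and the 2x2 table is assumed to cross-classify candidate relationships as inferred or not against documented or not.</p>

```python
import math


def assess_network(tp, fp, fn, tn):
    """Summarize a 2x2 table of inferred vs. documented regulatory relationships.

    tp: inferred and documented      fp: inferred but not documented
    fn: documented but not inferred  tn: neither inferred nor documented
    """
    n = tp + fp + fn + tn
    inferred = tp + fp      # relationships present in the inferred network
    documented = tp + fn    # relationships present in the reference database

    # TPR as defined in the table footnote: share of inferred edges that are documented.
    tpr_percent = 100.0 * tp / inferred
    misclassified = fp + fn

    # Count of recovered relationships expected if inference were independent of the reference.
    expected_tp = inferred * documented / n
    oe_ratio = tp / expected_tp

    # Pearson chi-square statistic for a 2x2 table (no continuity correction: an assumption).
    chi2 = n * (tp * tn - fp * fn) ** 2 / (
        inferred * (fn + tn) * documented * (fp + tn)
    )
    # Upper-tail probability of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))

    return {
        "tpr_percent": tpr_percent,
        "misclassified": misclassified,
        "oe_ratio": oe_ratio,
        "chi2": chi2,
        "p_value": p_value,
    }


# Toy counts chosen for illustration only; they are not taken from the study.
toy_summary = assess_network(tp=30, fp=70, fn=170, tn=9730)
```

				<p>Because the expected count depends on the size of the reference set, the O/E ratio complements the TPR when networks of very different sizes are compared.</p>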
			</sec>
			<sec>
				<st>
					<p>Assessment: transcription factor binding site analysis</p>
				</st><p>In another assessment, we checked whether the set of target genes containing known binding sites for a given TF was enriched among the child nodes of that TF in each inferred network. We first extracted the known binding sites for 129 TFs documented in the JASPAR database <abbrgrp>
						<abbr bid="B69">69</abbr>
						<abbr bid="B70">70</abbr>
					</abbrgrp>. Using TFMscan <abbrgrp>
						<abbr bid="B71">71</abbr>
					</abbrgrp>, we retrieved a set of genes containing the known binding sites in their upstream regions for each TF. We then checked for enrichment of these genes among the inferred child nodes of the corresponding TFs in each network with Fisher&#8217;s exact test. Table <tblr tid="T3">3</tblr> reports the number of TFs whose inferred child nodes exhibited such enrichment, at a false discovery rate (FDR) of 10%. All of the methods that made use of external information outperformed all of those that did not, illustrating the benefit of incorporating external knowledge. LASSO-shortlist and LAR-shortlist appeared to produce slightly better results than iBMA-prior in this binding site analysis, but this is likely a consequence of their larger network sizes (more than twice the size of the iBMA-prior network).</p>
				<table id="T3">
					<title>
						<p>Table 3</p>
					</title>
					<caption>
						<p>
							<b>Number of transcription factors whose inferred target genes were enriched for genes containing their known binding sites, for each method compared</b>
						</p>
					</caption>
					<tgroup align="left" cols="3">
						<colspec align="left" colname="c1" colnum="1" colwidth="1*"/>
						<colspec align="center" colname="c2" colnum="2" colwidth="1*"/>
						<colspec align="center" colname="c3" colnum="3" colwidth="1*"/>
						<thead valign="top">
							<row rowsep="1">
								<entry colname="c1">
									<p>
										<b>Method</b>
									</p>
								</entry>
								<entry colname="c2">
									<p>
										<b>Data used</b>
									</p>
								</entry>
								<entry colname="c3">
									<p>
										<b># TFs with enriched gene sets</b>
										<sup>
											<b>
												<it>a</it>
											</b>
										</sup>
									</p>
								</entry>
							</row>
						</thead>
						<tfoot>
							<p>
								<sup>
									<it>a</it>
								</sup> FDR was controlled at 10%.</p>
						</tfoot>
						<tbody valign="top">
							<row>
								<entry colname="c1">
									<p>iBMA-prior</p>
								</entry>
								<entry colname="c2">
									<p>Gene expression&#8201;+&#8201;external data</p>
								</entry>
								<entry colname="c3">
									<p>38</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>iBMA-shortlist</p>
								</entry>
								<entry colname="c2">
									<p>Gene expression&#8201;+&#8201;external data</p>
								</entry>
								<entry colname="c3">
									<p>30</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>LASSO-shortlist</p>
								</entry>
								<entry colname="c2">
									<p>Gene expression&#8201;+&#8201;external data</p>
								</entry>
								<entry colname="c3">
									<p>41</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>LAR-shortlist</p>
								</entry>
								<entry colname="c2">
									<p>Gene expression&#8201;+&#8201;external data</p>
								</entry>
								<entry colname="c3">
									<p>44</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>iBMA-size</p>
								</entry>
								<entry colname="c2">
									<p>Gene expression data only</p>
								</entry>
								<entry colname="c3">
									<p>4</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>iBMA-noprior</p>
								</entry>
								<entry colname="c2">
									<p>Gene expression data only</p>
								</entry>
								<entry colname="c3">
									<p>9</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>LASSO-noprior</p>
								</entry>
								<entry colname="c2">
									<p>Gene expression data only</p>
								</entry>
								<entry colname="c3">
									<p>13</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>LAR-noprior</p>
								</entry>
								<entry colname="c2">
									<p>Gene expression data only</p>
								</entry>
								<entry colname="c3">
									<p>10</p>
								</entry>
							</row>
						</tbody>
					</tgroup>
				</table>
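				<p>The enrichment analysis described in this section can be sketched in Python as follows. The sketch is illustrative only and rests on assumptions not stated in the text: a one-sided Fisher&#8217;s exact test computed from the hypergeometric tail, and the Benjamini-Hochberg procedure for FDR control. The function names and example numbers are hypothetical.</p>

```python
from math import comb


def fisher_enrichment_p(overlap, n_children, n_with_site, n_genes):
    """One-sided Fisher's exact test for enrichment.

    overlap:     inferred child nodes of the TF that contain its binding site
    n_children:  inferred child nodes of the TF
    n_with_site: genes whose upstream region contains the binding site
    n_genes:     all genes considered
    Returns P(overlap or more) under the hypergeometric null.
    """
    upper = min(n_children, n_with_site)
    total = comb(n_genes, n_children)
    tail = sum(
        comb(n_with_site, k) * comb(n_genes - n_with_site, n_children - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def benjamini_hochberg(p_values, fdr=0.10):
    """Return the indices of hypotheses rejected by the Benjamini-Hochberg procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value lies at or below its BH threshold.
        if fdr * rank / m >= p_values[idx]:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical example: one p-value per TF, then FDR control at 10%.
toy_p_values = [
    fisher_enrichment_p(overlap=8, n_children=20, n_with_site=50, n_genes=1000),
    fisher_enrichment_p(overlap=1, n_children=20, n_with_site=50, n_genes=1000),
]
enriched_tfs = benjamini_hochberg(toy_p_values, fdr=0.10)
```

				<p>Counting the TFs returned by the second function gives a quantity analogous to the last column of Table <tblr tid="T3">3</tblr>. With a fixed enrichment ratio, larger child-node sets yield smaller p-values, which is consistent with the caveat about network size above.</p>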
			</sec>
			<sec>
				<st>
					<p>Comparison with Lirnet</p>
				</st><p>Lee et al. <abbrgrp>
						<abbr bid="B44">44</abbr>
					</abbrgrp> proposed a regression-based network construction method called Lirnet, which performed well on a publicly available gene expression data set from Brem et al. <abbrgrp>
						<abbr bid="B72">72</abbr>
					</abbrgrp>. The Brem data set recorded the steady-state expression levels for 112 yeast segregants, 95 of which were profiled in our time-series experiments under different growth conditions. Lee et al. <abbrgrp>
						<abbr bid="B44">44</abbr>
					</abbrgrp> showed that Lirnet out-performed Bayesian networks on the same data, and so we compared our top performer, iBMA-prior, with Lirnet. Because Lirnet was formulated to analyze steady-state expression data with no time components, we adapted our method to static data by removing the subscript referring to the time point from Equation (4):</p><p>
					<display-formula id="M1">
						<m:math name="1752-0509-6-101-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>E</m:mi>
   <m:mo>[</m:mo>
   <m:mrow>
      <m:msub>
         <m:mi>X</m:mi>
         <m:mrow>
            <m:mi>g</m:mi>
            <m:mo>,</m:mo>
            <m:mi>s</m:mi>
         </m:mrow>
      </m:msub>
      <m:mo>|</m:mo>
      <m:mi>D</m:mi>
   </m:mrow>
   <m:mo>]</m:mo>
   <m:mo>=</m:mo>
   <m:msub>
      <m:mi>&#946;</m:mi>
      <m:mrow>
         <m:mi>g</m:mi>
         <m:mo>,</m:mo>
         <m:mn>0</m:mn>
      </m:mrow>
   </m:msub>
   <m:mo>+</m:mo>
   <m:mstyle displaystyle="true">
      <m:munder>
         <m:mo>&#8721;</m:mo>
         <m:mrow>
            <m:mi>r</m:mi>
            <m:mo>&#8712;</m:mo>
            <m:msub>
               <m:mi>R</m:mi>
               <m:mi>g</m:mi>
            </m:msub>
         </m:mrow>
      </m:munder>
      <m:mrow>
         <m:msub>
            <m:mi>&#946;</m:mi>
            <m:mrow>
               <m:mi>g</m:mi>
               <m:mo>,</m:mo>
               <m:mi>r</m:mi>
            </m:mrow>
         </m:msub>
         <m:msub>
            <m:mi>X</m:mi>
            <m:mrow>
               <m:mi>r</m:mi>
               <m:mo>,</m:mo>
               <m:mi>s</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
   </m:mstyle>
   <m:mo>,</m:mo>
</m:mrow>
</m:math>
					</display-formula>
				</p><p>We applied iBMA-prior to the same 3152-gene subset of the Brem et al. data that Lee et al. <abbrgrp>
						<abbr bid="B44">44</abbr>
					</abbrgrp> used. Lirnet constrained the search for regulators of each target gene to 304 known TFs. For a fair comparison, we also confined the set of candidate regulators to the same TFs. Networks constructed from steady-state gene expression data cannot have feedback loops <abbrgrp>
						<abbr bid="B73">73</abbr>
						<abbr bid="B74">74</abbr>
						<abbr bid="B75">75</abbr>
					</abbrgrp>. To detect and remove such loops from our inferred network, we identified all strongly connected components using the igraph R package, and deleted the TF-gene link associated with the lowest posterior probability for each cycle.</p><p>As before, we evaluated the different methods by assessing the concordance of the inferred networks with the Yeastract database using Pearson&#8217;s chi-square test. The assessment results in Table <tblr tid="T4">4</tblr> show that iBMA-prior outperformed Lirnet, almost doubling the TPR and the O/E ratio while producing a comparable number of misclassified regulatory relationships.</p>
				<table id="T4">
					<title>
						<p>Table 4</p>
					</title>
					<caption>
						<p>
							<b>Comparison of iBMA-prior, iBMA-shortlist and Lirnet in network construction on the Brem data</b>
						</p>
					</caption>
					<tgroup align="left" cols="7">
						<colspec align="left" colname="c1" colnum="1" colwidth="1*"/>
						<colspec align="center" colname="c2" colnum="2" colwidth="1*"/>
						<colspec align="center" colname="c3" colnum="3" colwidth="1*"/>
						<colspec align="center" colname="c4" colnum="4" colwidth="1*"/>
						<colspec align="center" colname="c5" colnum="5" colwidth="1*"/>
						<colspec align="center" colname="c6" colnum="6" colwidth="1*"/>
						<colspec align="center" colname="c7" colnum="7" colwidth="1*"/>
						<thead valign="top">
							<row rowsep="1">
								<entry colname="c1">
									<p>
										<b>Method</b>
									</p>
								</entry>
								<entry colname="c2">
									<p>
										<b>Network size</b>
									</p>
								</entry>
								<entry colname="c3">
									<p>
										<b>
											<it>p</it>
										</b><b>-value of chi-square test</b>
										<sup>
											<b>
												<it>a</it>
											</b>
										</sup>
									</p>
								</entry>
								<entry colname="c4">
									<p>
										<b>TPR (%)</b>
										<sup>
											<b>
												<it>b</it>
											</b>
										</sup>
									</p>
								</entry>
								<entry colname="c5">
									<p>
										<b># misclass.</b>
										<sup>
											<b>
												<it>c</it>
											</b>
										</sup>
									</p>
								</entry>
								<entry colname="c6">
									<p>
										<b>TP</b>
									</p>
								</entry>
								<entry colname="c7">
									<p>
										<b>O/E</b>
										<sup>
											<b>
												<it>d</it>
											</b>
										</sup>
									</p>
								</entry>
							</row>
						</thead>
						<tfoot>
							<p>
								<sup>
									<it>a</it>
								</sup> The <it>p</it>-value of Pearson&#8217;s chi-square test measures the strength of association between an inferred network and the Yeastract database.</p><p>
								<sup>
									<it>b</it>
								</sup> True positive rate (TPR) is defined as the proportion of inferred regulatory relationships that are documented in Yeastract.</p><p>
								<sup>
									<it>c</it>
								</sup> The number of misclassified cases is the sum of false positives and false negatives.</p><p>
								<sup>
									<it>d</it>
								</sup> The O/E ratio is the observed number of recovered relationships (i.e., TP) divided by the number expected to be recovered by chance.</p>
						</tfoot>
						<tbody valign="top">
							<row>
								<entry colname="c1">
									<p>iBMA-prior</p>
								</entry>
								<entry colname="c2">
									<p>8000</p>
								</entry>
								<entry colname="c3">
									<p>7.75E-65</p>
								</entry>
								<entry colname="c4">
									<p>15.62</p>
								</entry>
								<entry colname="c5">
									<p>10198</p>
								</entry>
								<entry colname="c6">
									<p>323</p>
								</entry>
								<entry colname="c7">
									<p>2.41</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>iBMA-shortlist</p>
								</entry>
								<entry colname="c2">
									<p>35995</p>
								</entry>
								<entry colname="c3">
									<p>1.02E-59</p>
								</entry>
								<entry colname="c4">
									<p>10.99</p>
								</entry>
								<entry colname="c5">
									<p>14581</p>
								</entry>
								<entry colname="c6">
									<p>818</p>
								</entry>
								<entry colname="c7">
									<p>1.70</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>Lirnet</p>
								</entry>
								<entry colname="c2">
									<p>10491</p>
								</entry>
								<entry colname="c3">
									<p>1.90E-03</p>
								</entry>
								<entry colname="c4">
									<p>8.42</p>
								</entry>
								<entry colname="c5">
									<p>10080</p>
								</entry>
								<entry colname="c6">
									<p>132</p>
								</entry>
								<entry colname="c7">
									<p>1.30</p>
								</entry>
							</row>
						</tbody>
					</tgroup>
				</table>
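				<p>The loop-removal step used for the steady-state network can be sketched in Python. This is a simplified stand-in for the igraph-based procedure described above, not a reproduction of it: it repeatedly finds one directed cycle by depth-first search and deletes the edge on that cycle with the lowest posterior probability until no cycle remains. The function names and the toy network are hypothetical.</p>

```python
def find_cycle(edges):
    """Return the edges of one directed cycle, or None if the graph is acyclic."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)
        adjacency.setdefault(target, [])
    # 0 = unvisited, 1 = on the current DFS path, 2 = fully explored
    state = {node: 0 for node in adjacency}

    def visit(node, path):
        # Recursive DFS: adequate for small examples, not tuned for very deep graphs.
        state[node] = 1
        path.append(node)
        for nxt in adjacency[node]:
            if state[nxt] == 1:
                # Back edge found: the cycle is the path segment from nxt back to nxt.
                cycle_nodes = path[path.index(nxt):] + [nxt]
                return list(zip(cycle_nodes, cycle_nodes[1:]))
            if state[nxt] == 0:
                found = visit(nxt, path)
                if found:
                    return found
        path.pop()
        state[node] = 2
        return None

    for node in adjacency:
        if state[node] == 0:
            found = visit(node, [])
            if found:
                return found
    return None


def remove_feedback_loops(posterior):
    """Break every cycle by deleting its lowest-probability edge.

    posterior maps (regulator, target) pairs to posterior probabilities.
    Returns the retained edges and the list of deleted edges.
    """
    kept = dict(posterior)
    removed = []
    while True:
        cycle = find_cycle(kept)
        if cycle is None:
            return kept, removed
        weakest = min(cycle, key=lambda edge: kept[edge])
        removed.append(weakest)
        del kept[weakest]


# Toy network (hypothetical gene names and probabilities).
toy_network = {("A", "B"): 0.9, ("B", "C"): 0.6, ("C", "A"): 0.7, ("A", "D"): 0.8}
acyclic_edges, deleted_edges = remove_feedback_loops(toy_network)
```

				<p>Restricting the search to strongly connected components, as done with igraph, avoids scanning parts of the network that cannot contain a cycle. The sketch omits that optimization for brevity.</p>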
			</sec>
			<sec>
				<st>
					<p>Simulation study</p>
				</st><p>We designed and conducted a series of simulations to further assess our proposed method. We used the fitted model obtained from applying iBMA-prior to the yeast time-series microarray data set as the true underlying network, and generated simulated expression data from the estimated linear regression model. Twenty data sets, each with the same dimensions as the real time-series expression data, were independently generated as follows:</p><p indent="1">1. Set the prior probability of a regulatory relationship for each gene pair to the same value as the regulatory potential obtained at the supervised learning stage using the real external data.</p><p indent="1">2. Set the expression levels of the 3556 genes for the 95 yeast segregants and the two parental strains at time <it>t</it>&#8201;=&#8201;0 to the observed measurements in the real yeast time-series gene expression data.</p><p indent="1">3. For each target gene <it>g</it>, define the set <it>R</it>
					<sub>
						<it>g</it>
					</sub> of true regulators as those with a posterior probability of &#8805;50% in our inferred network using iBMA-prior and the real time-series data.</p><p indent="1">4. For time <it>t</it>&#8201;=&#8201;1 to 5,</p><p>For gene <it>g</it>&#8201;=&#8201;1 to 3556, generate the simulated true expression level for each segregant <it>s</it> using the following equation:</p><p>
					<display-formula id="M2">
						<m:math name="1752-0509-6-101-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:msubsup>
      <m:mi>X</m:mi>
      <m:mrow>
         <m:mi>g</m:mi>
         <m:mo>,</m:mo>
         <m:mi>t</m:mi>
         <m:mo>,</m:mo>
         <m:mi>s</m:mi>
      </m:mrow>
      <m:mtext mathvariant="italic">true</m:mtext>
   </m:msubsup>
   <m:mo>=</m:mo>
   <m:msub>
      <m:mi>&#946;</m:mi>
      <m:mrow>
         <m:mi>g</m:mi>
         <m:mo>,</m:mo>
         <m:mn>0</m:mn>
      </m:mrow>
   </m:msub>
   <m:mo>+</m:mo>
   <m:mstyle displaystyle="true">
      <m:munder>
         <m:mo>&#8721;</m:mo>
         <m:mrow>
            <m:mi>r</m:mi>
            <m:mo>&#8712;</m:mo>
            <m:msub>
               <m:mi>R</m:mi>
               <m:mi>g</m:mi>
            </m:msub>
         </m:mrow>
      </m:munder>
      <m:mrow>
         <m:msub>
            <m:mi>&#946;</m:mi>
            <m:mrow>
               <m:mi>g</m:mi>
               <m:mo>,</m:mo>
               <m:mi>r</m:mi>
            </m:mrow>
         </m:msub>
         <m:msubsup>
            <m:mi>X</m:mi>
            <m:mrow>
               <m:mi>r</m:mi>
               <m:mo>,</m:mo>
               <m:mi>t</m:mi>
               <m:mo>&#8722;</m:mo>
               <m:mn>1</m:mn>
               <m:mo>,</m:mo>
               <m:mi>s</m:mi>
            </m:mrow>
            <m:mtext mathvariant="italic">true</m:mtext>
         </m:msubsup>
      </m:mrow>
   </m:mstyle>
   <m:mo>,</m:mo>
</m:mrow>
</m:math>
					</display-formula>
				</p><p>where the <it>&#946;</it>&#8217;s are given by the posterior expectations of the regression coefficients corresponding to the set of true regulators determined in Step 3.</p><p indent="1">5. Generate the simulated observed gene expression levels by adding measurement noise to the error-free true expression levels, i.e.,</p><p>
					<display-formula id="M3">
						<m:math name="1752-0509-6-101-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:msub>
      <m:mi>X</m:mi>
      <m:mrow>
         <m:mi>g</m:mi>
         <m:mo>,</m:mo>
         <m:mi>t</m:mi>
         <m:mo>,</m:mo>
         <m:mi>s</m:mi>
      </m:mrow>
   </m:msub>
   <m:mo>=</m:mo>
   <m:msubsup>
      <m:mi>X</m:mi>
      <m:mrow>
         <m:mi>g</m:mi>
         <m:mo>,</m:mo>
         <m:mi>t</m:mi>
         <m:mo>,</m:mo>
         <m:mi>s</m:mi>
      </m:mrow>
      <m:mtext mathvariant="italic">true</m:mtext>
   </m:msubsup>
   <m:mo>+</m:mo>
   <m:msub>
      <m:mi>&#1013;</m:mi>
      <m:mrow>
         <m:mi>g</m:mi>
         <m:mo>,</m:mo>
         <m:mi>t</m:mi>
         <m:mo>,</m:mo>
         <m:mi>s</m:mi>
      </m:mrow>
   </m:msub>
   <m:mo>,</m:mo>
</m:mrow>
</m:math>
					</display-formula>
				</p><p>where <it>&#1013;</it>
					<sub>
						<it>g</it>,<it>t</it>,<it>s</it>
					</sub>&#8201;~&#8201;N(0, <it>&#963;</it>
					<sub>
						<it>g</it>
					</sub>
					<sup>2</sup>) with <it>&#963;</it>
					<sub>
						<it>g</it>
					</sub>
					<sup>2</sup> being given by the sample variance of the regression residuals in the real data analysis. Others, e.g. <abbrgrp>
						<abbr bid="B76">76</abbr>
					</abbrgrp>, have shown that the error in log ratios of expression data is reasonably approximated by a normal distribution.</p><p>To assess the accuracy of networks inferred with the simulated data sets, we compared each of these networks to the true network created in Step 3 of the data generation algorithm. We used the same assessment criteria as in the real data analysis with the true network replacing Yeastract as the reference. As shown in Table <tblr tid="T5">5</tblr>, iBMA-prior outperformed the other iBMA-based methods, yielding a TPR of 71.13% averaged over 20 replications (compared to 47.23% for iBMA-shortlist, 20.31% for iBMA-size, and 8.55% for iBMA-noprior).</p>
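				<p>Steps 2 to 5 of the data generation procedure can be sketched in Python for a single segregant. The sketch is illustrative only: the function name, argument layout and toy values are hypothetical, and the real study used 3556 genes, 97 strains and coefficients taken from the fitted iBMA-prior model.</p>

```python
import random


def simulate_time_series(x0, regulators, beta0, beta, sigma, n_steps=5, seed=0):
    """Simulate expression levels for one segregant from a time-lag linear model.

    x0:         dict gene -> expression level at t = 0 (Step 2)
    regulators: dict gene -> list of its true regulators, the set R_g (Step 3)
    beta0:      dict gene -> intercept
    beta:       dict (gene, regulator) -> regression coefficient
    sigma:      dict gene -> residual standard deviation
    Returns (true_levels, observed_levels), each a list indexed by time point.
    """
    rng = random.Random(seed)

    # Step 4: propagate the error-free levels forward, one time point at a time.
    true_levels = [dict(x0)]
    for t in range(1, n_steps + 1):
        previous = true_levels[t - 1]
        current = {}
        for g in x0:
            current[g] = beta0[g] + sum(
                beta[(g, r)] * previous[r] for r in regulators[g]
            )
        true_levels.append(current)

    # Step 5: add independent Gaussian noise with gene-specific variance.
    observed_levels = [dict(x0)]
    for t in range(1, n_steps + 1):
        observed_levels.append(
            {g: true_levels[t][g] + rng.gauss(0.0, sigma[g]) for g in x0}
        )
    return true_levels, observed_levels


# Toy two-gene example (all values hypothetical).
toy_true, toy_observed = simulate_time_series(
    x0={"g1": 1.0, "g2": 2.0},
    regulators={"g1": ["g2"], "g2": []},
    beta0={"g1": 0.5, "g2": 2.0},
    beta={("g1", "g2"): 2.0},
    sigma={"g1": 0.1, "g2": 0.1},
)
```

				<p>Noise is added only after the error-free trajectory has been generated, so measurement error at one time point does not propagate to later time points, in line with Steps 4 and 5.</p>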
				<table id="T5">
					<title>
						<p>Table 5</p>
					</title>
					<caption>
						<p>
							<b>Assessment results for the different methods applied to data sets generated in the simulation study</b>
						</p>
					</caption>
					<tgroup align="left" cols="7">
						<colspec align="left" colname="c1" colnum="1" colwidth="1*"/>
						<colspec align="center" colname="c2" colnum="2" colwidth="1*"/>
						<colspec align="center" colname="c3" colnum="3" colwidth="1*"/>
						<colspec align="center" colname="c4" colnum="4" colwidth="1*"/>
						<colspec align="center" colname="c5" colnum="5" colwidth="1*"/>
						<colspec align="center" colname="c6" colnum="6" colwidth="1*"/>
						<colspec align="center" colname="c7" colnum="7" colwidth="1*"/>
						<thead valign="top">
							<row rowsep="1">
								<entry colname="c1">
									<p>
										<b>Method</b>
									</p>
								</entry>
								<entry colname="c2">
									<p>
										<b>Data used</b>
									</p>
								</entry>
								<entry colname="c3">
									<p>
										<b>Network size</b>
									</p>
								</entry>
								<entry colname="c4">
									<p>
										<b>
											<it>p</it>
										</b><b>-value of chi-square test</b>
										<sup>
											<b>
												<it>a</it>
											</b>
										</sup>
									</p>
								</entry>
								<entry colname="c5">
									<p>
										<b>TPR (%)</b>
										<sup>
											<b>
												<it>b</it>
											</b>
										</sup>
									</p>
								</entry>
								<entry colname="c6">
									<p>
										<b># misclass.</b>
										<sup>
											<b>
												<it>c</it>
											</b>
										</sup>
									</p>
								</entry>
								<entry colname="c7">
									<p>
										<b>TP</b>
									</p>
								</entry>
							</row>
						</thead>
						<tfoot>
							<p>
								<sup>
									<it>a</it>
								</sup> The <it>p</it>-value of Pearson&#8217;s chi-square test measures the strength of association between an inferred network and the true network for the simulation study.</p><p>
								<sup>
									<it>b</it>
								</sup> True positive rate (TPR) is defined as the proportion of correctly inferred regulatory relationships.</p><p>
								<sup>
									<it>c</it>
								</sup> The number of misclassified cases is the sum of false positives and false negatives.</p><p>Remark: The values reported in the table were averaged across the 20 replications. The true network for the simulation study contained a total of 21951 edges.</p>
						</tfoot>
						<tbody valign="top">
							<row>
								<entry colname="c1">
									<p>iBMA-prior</p>
								</entry>
								<entry colname="c2">
									<p>Generated data&#8201;+&#8201;prior probability matrix</p>
								</entry>
								<entry colname="c3">
									<p>14011</p>
								</entry>
								<entry colname="c4">
									<p>&lt;1.00E-320</p>
								</entry>
								<entry colname="c5">
									<p>71.13</p>
								</entry>
								<entry colname="c6">
									<p>16029</p>
								</entry>
								<entry colname="c7">
									<p>9966</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>iBMA-shortlist</p>
								</entry>
								<entry colname="c2">
									<p>Generated data&#8201;+&#8201;prior probability matrix</p>
								</entry>
								<entry colname="c3">
									<p>30753</p>
								</entry>
								<entry colname="c4">
									<p>&lt;1.00E-320</p>
								</entry>
								<entry colname="c5">
									<p>47.23</p>
								</entry>
								<entry colname="c6">
									<p>23652</p>
								</entry>
								<entry colname="c7">
									<p>14526</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>iBMA-size</p>
								</entry>
								<entry colname="c2">
									<p>Generated data only</p>
								</entry>
								<entry colname="c3">
									<p>9349</p>
								</entry>
								<entry colname="c4">
									<p>&lt;1.00E-320</p>
								</entry>
								<entry colname="c5">
									<p>20.31</p>
								</entry>
								<entry colname="c6">
									<p>27503</p>
								</entry>
								<entry colname="c7">
									<p>1899</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>iBMA-noprior</p>
								</entry>
								<entry colname="c2">
									<p>Generated data only</p>
								</entry>
								<entry colname="c3">
									<p>29393</p>
								</entry>
								<entry colname="c4">
									<p>&lt;1.00E-320</p>
								</entry>
								<entry colname="c5">
									<p>8.55</p>
								</entry>
								<entry colname="c6">
									<p>46317</p>
								</entry>
								<entry colname="c7">
									<p>2513</p>
								</entry>
							</row>
						</tbody>
					</tgroup>
				</table>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Conclusions</p>
			</st><p>In this article, we have proposed a methodology that systematically integrates external biological knowledge into BMA for network construction. A key feature of our approach is a formal mechanism to account for model uncertainty. For each target gene, we arrive at a compact set of promising models from which to draw inference, the weights of which are calibrated by the external biological knowledge. Given a reasonable estimate of network density, our method infers sparse, compact and accurate networks from both real and simulated data. It does not put a hard limit on the number of regulators per target gene, unlike some other methods, such as Bayesian network approaches that impose this constraint to reduce the computational burden. While known TFs are in general favored <it>a priori</it> with the available external biological knowledge, we do not confine the search for regulators to them. This allows for the discovery of new regulatory relationships.</p><p>We showed that our method, iBMA-prior, consistently outperformed our previous method <abbrgrp>
					<abbr bid="B3">3</abbr>
				</abbrgrp> on both real and simulated time-series gene expression data. We showed that this improvement is mostly due to the incorporation of external data sources via prior probabilities (iBMA-prior versus iBMA-shortlist in Table <tblr tid="T2">2</tblr>). We also improved upon our previous supervised method by adjusting for the sampling bias of positive and negative training samples (iBMA-shortlist versus network A in Table <tblr tid="T2">2</tblr>). We further showed that our iBMA-based methods (iBMA-prior and iBMA-shortlist) recovered a higher percentage of known regulatory relationships (i.e., higher TPRs) than other popular variable selection methods (LASSO and LAR).</p><p>A key contribution of this work is the derivation of more compact networks with higher TPRs. Unfortunately, due to incomplete knowledge, the evaluation of false positives and false negatives is difficult using real data. Therefore, we supplemented our study with a simulation study designed to mimic the real data, and showed that iBMA-prior produced fewer misclassified cases (i.e., the sum of false positives and false negatives) than other iBMA-based methods.</p><p>There are many directions for future work. Our methodology uses a time-lag regression model, i.e., one that explains the current expression level of a target gene by the past expression levels of its regulators. This model formulation is in line with many other regression-based methods targeting time-series gene expression data <abbrgrp>
					<abbr bid="B3">3</abbr>
					<abbr bid="B28">28</abbr>
					<abbr bid="B35">35</abbr>
					<abbr bid="B48">48</abbr>
					<abbr bid="B49">49</abbr>
				</abbrgrp>. The expression levels were taken at regular time intervals in our yeast time-series gene expression data set. If the levels were measured at non-uniform time intervals, we could create interpolated time-series data with interpolation strategies employed in the literature <abbrgrp>
					<abbr bid="B51">51</abbr>
					<abbr bid="B53">53</abbr>
				</abbrgrp>. It would be useful to apply our methodology to network construction in prokaryotic systems as we would expect better performance in these less complex systems that tend to be more dominated by transcriptional control <abbrgrp>
					<abbr bid="B77">77</abbr>
				</abbrgrp>.</p>
		</sec>
		<sec>
			<st>
				<p>Methods</p>
			</st>
			<sec>
				<st>
					<p>Time-series gene expression data for yeast segregants</p>
				</st><p>We applied our method to a set of time-series mRNA expression data measuring the gene expression levels of 95 genotyped haploid yeast segregants perturbed with the macrolide drug rapamycin <abbrgrp>
						<abbr bid="B3">3</abbr>
					</abbrgrp>. These segregants, along with their genetically diverse parents, BY4716 (BY) and RM11-1a (RM), have been genotyped previously <abbrgrp>
						<abbr bid="B72">72</abbr>
					</abbrgrp>. Rapamycin was chosen for perturbation because it was expected to induce widespread changes in global transcription, based on a screen of the public microarray data repositories <abbrgrp>
						<abbr bid="B78">78</abbr>
						<abbr bid="B79">79</abbr>
						<abbr bid="B80">80</abbr>
					</abbrgrp>. This perturbation allowed for the capture of a large subset of all regulatory interactions encoded by the yeast genome. Each yeast culture was sampled at 10-minute intervals for 50&#8201;minutes after rapamycin addition. The RNA purified from these samples was profiled with Affymetrix Yeast 2.0 microarrays. Probe signals were summarized into gene expression levels using the Robust Multi-array Average (RMA) method <abbrgrp>
						<abbr bid="B81">81</abbr>
					</abbrgrp> and genes not exhibiting significant changes in expression were filtered from the data as described in <abbrgrp>
						<abbr bid="B3">3</abbr>
					</abbrgrp>. The remaining data subset consisted of the time-dependent mRNA expression profiles of 3556 genes. The complete time-series gene expression data are publicly available at ArrayExpress (<url>http://www.ebi.ac.uk/arrayexpress/</url>) under accession number E-MTAB-412.</p>
			</sec>
			<sec>
				<st>
					<p>Bayesian model averaging (BMA)</p>
				</st><p>BMA is a variable selection approach that takes model uncertainty into account by averaging over the posterior distribution of a quantity of interest based on multiple models, weighted by their posterior model probabilities <abbrgrp>
						<abbr bid="B82">82</abbr>
						<abbr bid="B83">83</abbr>
					</abbrgrp>. In BMA, the posterior distribution of a quantity of interest &#920; given the data <it>D</it> is given by <inline-formula>
						<m:math name="1752-0509-6-101-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mtext mathvariant="normal">Pr</m:mtext>
   <m:mfenced open="(" close=")">
      <m:mrow>
         <m:mi>&#920;</m:mi>
         <m:mo stretchy="true">|</m:mo>
         <m:mi>D</m:mi>
      </m:mrow>
   </m:mfenced>
   <m:mo>=</m:mo>
   <m:mstyle displaystyle="true">
      <m:munderover>
         <m:mo>&#8721;</m:mo>
         <m:mrow>
            <m:mi>k</m:mi>
            <m:mo>=</m:mo>
            <m:mn>1</m:mn>
         </m:mrow>
         <m:mi>K</m:mi>
      </m:munderover>
      <m:mrow>
         <m:mtext mathvariant="normal">Pr</m:mtext>
         <m:mfenced open="(" close=")">
            <m:mrow>
               <m:mi>&#920;</m:mi>
               <m:mo stretchy="true">|</m:mo>
               <m:mi>D</m:mi>
               <m:mo>,</m:mo>
               <m:msub>
                  <m:mi>M</m:mi>
                  <m:mi>k</m:mi>
               </m:msub>
            </m:mrow>
         </m:mfenced>
         <m:mtext mathvariant="normal">Pr</m:mtext>
         <m:mfenced open="(" close=")">
            <m:mrow>
               <m:msub>
                  <m:mi>M</m:mi>
                  <m:mi>k</m:mi>
               </m:msub>
               <m:mo stretchy="true">|</m:mo>
               <m:mi>D</m:mi>
            </m:mrow>
         </m:mfenced>
      </m:mrow>
   </m:mstyle>
</m:mrow>
</m:math>
					</inline-formula>, where <it>M</it>
					<sub>1</sub>,&#8230;,<it>M</it>
					<sub>
						<it>K</it>
					</sub> are the models considered. Each model consists of a set of candidate regulators. In order to efficiently identify a compact set of promising models <it>M</it>
					<sub>
						<it>k</it>
					</sub> out of all possible models, two approaches are sequentially applied. First, the leaps and bounds algorithm <abbrgrp>
						<abbr bid="B84">84</abbr>
					</abbrgrp> is applied to identify the best <it>nbest</it> models for each number of variables (i.e., regulators). Next, Occam&#8217;s window is applied to discard models with much lower posterior model probabilities than the best one <abbrgrp>
						<abbr bid="B85">85</abbr>
					</abbrgrp>. The Bayesian Information Criterion (BIC) <abbrgrp>
						<abbr bid="B86">86</abbr>
					</abbrgrp> is used to approximate each model's integrated likelihood, from which its posterior model probability can be determined.</p><p>While BMA has performed well in many applications <abbrgrp>
						<abbr bid="B60">60</abbr>
					</abbrgrp>, it is hard to apply directly to the current data set in which there are many more variables than samples. Yeung et al. <abbrgrp>
						<abbr bid="B62">62</abbr>
					</abbrgrp> proposed an iterative version of BMA (iBMA) to resolve this problem. At each iteration, BMA is applied to a small number, say, <it>w</it>&#8201;=&#8201;30, of variables that could be efficiently enumerated by leaps and bounds. Candidate predictor variables with a low posterior inclusion probability are discarded, leaving room for other variables in the candidate list to be considered in subsequent iterations. This procedure continues until all the variables have been processed.</p>
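<p>As a minimal sketch of the BIC-based weighting described above, the following Python fragment converts BIC values into approximate posterior model probabilities and forms a model-averaged estimate. It assumes equal prior model probabilities and the convention that a smaller BIC indicates a better model; the function names and numerical values are illustrative and are not part of the published implementation.</p>
<p><![CDATA[
```python
import math


def posterior_model_probs(bics):
    """Approximate Pr(M_k | D) from BIC values, assuming equal model priors."""
    best = min(bics)
    # exp(-BIC/2) approximates the integrated likelihood up to a constant;
    # subtracting the smallest BIC first avoids numerical underflow.
    weights = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(weights)
    return [w / total for w in weights]


def model_averaged(estimates, probs):
    """BMA point estimate: per-model estimates weighted by Pr(M_k | D)."""
    return sum(e * p for e, p in zip(estimates, probs))


# Hypothetical BIC values for three candidate models of one target gene.
bics = [100.0, 102.0, 110.0]
probs = posterior_model_probs(bics)

# Hypothetical estimates of one regression coefficient under each model
# (0.0 where the regulator is absent from the model).
beta = model_averaged([0.8, 0.6, 0.0], probs)
```
]]></p>
<p>Subtracting the smallest BIC before exponentiating leaves the normalized weights unchanged, so the resulting probabilities still sum to one.</p>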
			</sec>
			<sec>
				<st>
					<p>Supervised framework for the integration of external knowledge</p>
				</st><p>We formulated network construction from time series data as a regression problem in which the expression of each gene is predicted by a linear combination of the expression of candidate regulators at the previous time point. Let <it>D</it> be the entire data set and <it>X</it>
					<sub>
						<it>g</it>,<it>t</it>,<it>s</it>
					</sub> be the expression of gene <it>g</it> at time <it>t</it> in segregant <it>s</it>. Denote by <it>R</it>
					<sub>
						<it>g</it>
					</sub> the set of regulators for gene <it>g</it> in a candidate model. The expression of gene <it>g</it> is formulated by the following regression model:</p><p>
					<display-formula id="M4">
						<m:math name="1752-0509-6-101-i5" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>E</m:mi>
   <m:mo>[</m:mo>
   <m:mrow>
      <m:msub>
         <m:mi>X</m:mi>
         <m:mrow>
            <m:mi>g</m:mi>
            <m:mo>,</m:mo>
            <m:mi>t</m:mi>
            <m:mo>,</m:mo>
            <m:mi>s</m:mi>
         </m:mrow>
      </m:msub>
      <m:mo>|</m:mo>
      <m:mi>D</m:mi>
   </m:mrow>
   <m:mo>]</m:mo>
   <m:mo>=</m:mo>
   <m:msub>
      <m:mi>&#946;</m:mi>
      <m:mrow>
         <m:mi>g</m:mi>
         <m:mo>,</m:mo>
         <m:mn>0</m:mn>
      </m:mrow>
   </m:msub>
   <m:mo>+</m:mo>
   <m:mstyle displaystyle="true">
      <m:munder>
         <m:mo>&#8721;</m:mo>
         <m:mrow>
            <m:mi>r</m:mi>
            <m:mo>&#8712;</m:mo>
            <m:msub>
               <m:mi>R</m:mi>
               <m:mi>g</m:mi>
            </m:msub>
         </m:mrow>
      </m:munder>
      <m:mrow>
         <m:msub>
            <m:mi>&#946;</m:mi>
            <m:mrow>
               <m:mi>g</m:mi>
               <m:mo>,</m:mo>
               <m:mi>r</m:mi>
            </m:mrow>
         </m:msub>
         <m:msub>
            <m:mi>X</m:mi>
            <m:mrow>
               <m:mi>r</m:mi>
               <m:mo>,</m:mo>
               <m:mi>t</m:mi>
               <m:mo>&#8722;</m:mo>
               <m:mn>1</m:mn>
               <m:mo>,</m:mo>
               <m:mi>s</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
   </m:mstyle>
   <m:mo>,</m:mo>
</m:mrow>
</m:math>
					</display-formula>
				</p><p>where <it>E</it> denotes expectation and <it>&#946;</it>&#8217;s are regression coefficients. For each gene, we apply iBMA to infer the set of regulators.</p><p>To account for external knowledge in the network construction process, Yeung et al. <abbrgrp>
						<abbr bid="B3">3</abbr>
					</abbrgrp> introduced a supervised framework to estimate the weights of various types of evidence of transcriptional regulation and subsequently derived top candidate regulators. For instance, a target gene is likely to be co-expressed with its regulators across diverse conditions in publicly available, large-scale microarray experiments <abbrgrp>
						<abbr bid="B78">78</abbr>
						<abbr bid="B87">87</abbr>
						<abbr bid="B88">88</abbr>
					</abbrgrp>. ChIP-chip data <abbrgrp>
						<abbr bid="B89">89</abbr>
					</abbrgrp> provide supporting evidence for a direct regulatory relationship between a given TF and a gene of interest by showing that the TF directly binds to the promoter of that gene. A candidate regulator with known regulatory roles in curated databases such as the <it>Saccharomyces</it> Genome Database (SGD) <abbrgrp>
						<abbr bid="B90">90</abbr>
					</abbrgrp> would be favored <it>a priori</it>. Polymorphisms in the amino acid sequence of a candidate regulator that affect its regulatory potential provide further evidence of a regulatory relationship <abbrgrp>
						<abbr bid="B44">44</abbr>
					</abbrgrp>. Common gene ontology (GO) <abbrgrp>
						<abbr bid="B91">91</abbr>
					</abbrgrp> annotations for a target gene and candidate regulators also provide evidence of functional relationship.</p><p>To study the relative importance of the various types of external knowledge from the supervised framework, we collected 583 positive examples of known regulatory relationships between TFs and target genes from the <it>Saccharomyces cerevisiae</it> Promoter Database (SCPD) <abbrgrp>
						<abbr bid="B92">92</abbr>
					</abbrgrp> and the Yeast Protein Database (YPD) <abbrgrp>
						<abbr bid="B93">93</abbr>
					</abbrgrp>. Random sampling of these TF-gene pairs was used to generate 444 negative examples. Logistic regression using BMA was applied to estimate the contribution of each type of external knowledge in the prediction of regulatory relationships. The fitted model was then used to predict the regulatory potential <it>&#960;</it>
					<sub>
						<it>gr</it>
					</sub> of a candidate regulator <it>r</it> for a gene <it>g</it>, i.e., the prior probability that candidate <it>r</it> regulates gene <it>g</it>, for all possible regulator-gene pairs. Next, the regulatory potentials were used to rank and shortlist the top <it>p</it> candidate regulators for each gene (<it>p</it>&#8201;=&#8201;100 by default in our experiments). The shortlisted candidates were then input to BMA for variable selection in the network construction process.</p>
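<p>The time-lag regression model formulated at the start of this subsection pairs the expression of a target gene at time <it>t</it> with the expression of its candidate regulators at time <it>t</it>&#8201;&#8722;&#8201;1 in the same segregant. The sketch below shows one way to assemble these pairs in Python. The nested-dictionary layout, the function name and the toy gene and segregant labels are assumptions made for illustration only.</p>
<p><![CDATA[
```python
def lagged_pairs(X, target, regulators):
    """Build (response, predictors) rows for the lag-1 regression of one gene.

    X[g][s] is the list of expression values of gene g in segregant s,
    ordered by time.
    """
    rows = []
    for s in sorted(X[target]):
        series = X[target][s]
        for t in range(1, len(series)):
            y = series[t]                                  # X_{g,t,s}
            preds = [X[r][s][t - 1] for r in regulators]   # X_{r,t-1,s}
            rows.append((y, preds))
    return rows


# Toy data: two segregants, three time points, hypothetical labels.
X = {
    "G1": {"s1": [1.0, 1.5, 2.0], "s2": [0.5, 0.7, 0.9]},
    "R1": {"s1": [0.1, 0.2, 0.3], "s2": [1.0, 1.1, 1.2]},
    "R2": {"s1": [5.0, 4.0, 3.0], "s2": [2.0, 2.5, 3.0]},
}
rows = lagged_pairs(X, "G1", ["R1", "R2"])
```
]]></p>
<p>Each segregant with <it>T</it> time points contributes <it>T</it>&#8201;&#8722;&#8201;1 rows, because the first time point has no preceding measurement to serve as a predictor.</p>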
			</sec>
			<sec>
				<st>
					<p>Incorporating prior probabilities into iBMA</p>
				</st><p>The potential benefit of using information from external knowledge to refine the search for regulators was shown by Yeung et al. and many others <abbrgrp>
						<abbr bid="B3">3</abbr>
						<abbr bid="B13">13</abbr>
						<abbr bid="B15">15</abbr>
						<abbr bid="B16">16</abbr>
						<abbr bid="B17">17</abbr>
						<abbr bid="B43">43</abbr>
						<abbr bid="B44">44</abbr>
					</abbrgrp>. However, external knowledge was only used to shortlist the top <it>p</it> candidate regulators for each target gene in Yeung et al. Here, we develop a formal framework that fully incorporates external knowledge into the BMA network construction process.</p><p>We associate each candidate model <it>M</it>
					<sub>
						<it>k</it>
					</sub> with a prior probability, namely:</p><p>
					<display-formula id="M5">
						<m:math name="1752-0509-6-101-i6" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mtext mathvariant="normal">Pr</m:mtext>
   <m:mfenced open="(" close=")">
      <m:msub>
         <m:mi>M</m:mi>
         <m:mi>k</m:mi>
      </m:msub>
   </m:mfenced>
   <m:mo>&#8733;</m:mo>
   <m:mstyle displaystyle="true">
      <m:munder>
         <m:mo>&#8719;</m:mo>
         <m:mi>r</m:mi>
      </m:munder>
      <m:mrow>
         <m:msubsup>
            <m:mi>&#960;</m:mi>
            <m:mtext mathvariant="italic">gr</m:mtext>
            <m:msub>
               <m:mi>&#948;</m:mi>
               <m:mtext mathvariant="italic">kr</m:mtext>
            </m:msub>
         </m:msubsup>
         <m:msup>
            <m:mfenced open="(" close=")">
               <m:mrow>
                  <m:mn>1</m:mn>
                  <m:mo>&#8722;</m:mo>
                  <m:msub>
                     <m:mi>&#960;</m:mi>
                     <m:mtext mathvariant="italic">gr</m:mtext>
                  </m:msub>
               </m:mrow>
            </m:mfenced>
            <m:mrow>
               <m:mn>1</m:mn>
               <m:mo>&#8722;</m:mo>
               <m:msub>
                  <m:mi>&#948;</m:mi>
                  <m:mtext mathvariant="italic">kr</m:mtext>
               </m:msub>
            </m:mrow>
         </m:msup>
         <m:mo>,</m:mo>
      </m:mrow>
   </m:mstyle>
</m:mrow>
</m:math>
					</display-formula>
				</p><p>where <it>&#960;</it>
					<sub>
						<it>gr</it>
					</sub> is the regulatory potential of a candidate regulator <it>r</it> for a gene <it>g, &#948;</it>
					<sub>
						<it>kr</it>
					</sub>&#8201;=&#8201;1 if <it>r</it> &#8712;<it>M</it>
					<sub>
						<it>k</it>
					</sub> and <it>&#948;</it>
					<sub>
						<it>kr</it>
					</sub>&#8201;=&#8201;0 otherwise <abbrgrp>
						<abbr bid="B85">85</abbr>
						<abbr bid="B94">94</abbr>
					</abbrgrp>. Intuitively, we consider models consisting of candidate regulators supported by considerable external evidence to be frontrunners. A model that contains many candidate regulators with little support from external knowledge is penalized.</p><p>The posterior model probability of model <it>M</it>
					<sub>
						<it>k</it>
					</sub> is given by</p><p>
					<display-formula id="M6">
						<m:math name="1752-0509-6-101-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mtext mathvariant="normal">Pr</m:mtext>
   <m:mfenced open="(" close=")">
      <m:mrow>
         <m:msub>
            <m:mi>M</m:mi>
            <m:mi>k</m:mi>
         </m:msub>
         <m:mo stretchy="true">|</m:mo>
         <m:mi>D</m:mi>
      </m:mrow>
   </m:mfenced>
   <m:mo>&#8733;</m:mo>
   <m:mi>f</m:mi>
   <m:mfenced open="(" close=")">
      <m:mrow>
         <m:mi>D</m:mi>
         <m:mo stretchy="true">|</m:mo>
         <m:msub>
            <m:mi>M</m:mi>
            <m:mi>k</m:mi>
         </m:msub>
      </m:mrow>
   </m:mfenced>
   <m:mtext mathvariant="normal">Pr</m:mtext>
   <m:mfenced open="(" close=")">
      <m:msub>
         <m:mi>M</m:mi>
         <m:mi>k</m:mi>
      </m:msub>
   </m:mfenced>
   <m:mo>,</m:mo>
</m:mrow>
</m:math>
					</display-formula>
				</p><p>where <it>f</it>(<it>D</it> | <it>M</it>
					<sub>
						<it>k</it>
					</sub>) is the integrated likelihood of the data <it>D</it> under model <it>M</it>
					<sub>
						<it>k</it>
					</sub>, and the proportionality constant ensures that the posterior model probabilities sum up to 1.</p><p>Then Occam&#8217;s window was used to discard any model <it>M</it>
					<sub>
						<it>k</it>
					</sub> having a posterior odds less than 1/<it>OR</it> relative to the model with the highest posterior probability, <it>M</it>
					<sub>
						<it>opt</it>
					</sub>. The parameter <it>OR</it> controls the compactness of the set of selected models, and here we set it to 20.</p>
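<p>To illustrate how Equations (5) and (6) combine with Occam&#8217;s window, the following Python sketch scores a handful of models for one target gene. It assumes that the log integrated likelihood of each model is already available (for example, approximated from the BIC); the function names, regulator labels and numerical values are hypothetical.</p>
<p><![CDATA[
```python
import math


def log_prior(model, potentials):
    """Unnormalized log prior of a model, following Equation (5).

    model: set of included regulators.
    potentials: dict mapping each regulator r to pi_gr, with 0 < pi_gr < 1.
    """
    return sum(
        math.log(pi) if r in model else math.log(1.0 - pi)
        for r, pi in potentials.items()
    )


def occams_window(models, log_liks, potentials, odds_ratio=20.0):
    """Keep models whose posterior odds against the best model are at least
    1/odds_ratio, and return their renormalized posterior probabilities."""
    scores = [ll + log_prior(m, potentials) for m, ll in zip(models, log_liks)]
    best = max(scores)
    cutoff = -math.log(odds_ratio)
    kept = [(m, s) for m, s in zip(models, scores) if s - best >= cutoff]
    total = sum(math.exp(s - best) for _, s in kept)
    return {m: math.exp(s - best) / total for m, s in kept}


# Hypothetical regulatory potentials and log integrated likelihoods.
potentials = {"R1": 0.6, "R2": 0.05, "R3": 0.01}
models = [frozenset({"R1"}), frozenset({"R1", "R2"}), frozenset({"R3"})]
log_liks = [-50.0, -49.5, -50.0]
posterior = occams_window(models, log_liks, potentials)
```
]]></p>
<p>In this toy example the model containing only R3 fits the data as well as the model containing only R1, but its low regulatory potential places it outside Occam&#8217;s window, so it is discarded.</p>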
			</sec>
			<sec>
				<st>
					<p>Extension of iBMA: cumulative model support</p>
				</st><p>In Yeung et al. <abbrgrp>
						<abbr bid="B3">3</abbr>
					</abbrgrp>, the models selected in an intermediate iteration by iBMA were not recorded once that iteration was completed, and the final set of models selected were chosen only from those considered in the last iteration. While computationally efficient, this strategy overlooked the possibility of accumulated model support over multiple iterations. We improve the model selection process by storing all the models selected in any iteration and applying Occam&#8217;s window to this cumulative set of models as the last step in the algorithm.</p><p>At the end of each iteration of iBMA, and after applying Occam&#8217;s window to all models considered, we compute the posterior inclusion probabilities for each candidate regulator <it>r</it> by summing up the posterior probabilities of all models that involve this regulator.</p><p>
					<display-formula id="M7">
						<m:math name="1752-0509-6-101-i8" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mtext mathvariant="normal">Pr</m:mtext>
   <m:mo>(</m:mo>
   <m:mrow>
      <m:msub>
         <m:mi>&#946;</m:mi>
         <m:mtext mathvariant="italic">gr</m:mtext>
      </m:msub>
      <m:mo>&#8800;</m:mo>
      <m:mn>0</m:mn>
      <m:mo>|</m:mo>
      <m:mi>D</m:mi>
   </m:mrow>
   <m:mo>)</m:mo>
   <m:mo>=</m:mo>
   <m:mstyle displaystyle="true">
      <m:munder>
         <m:mo>&#8721;</m:mo>
         <m:mrow>
            <m:mi>k</m:mi>
            <m:mo>:</m:mo>
            <m:msub>
               <m:mi>M</m:mi>
               <m:mi>k</m:mi>
            </m:msub>
            <m:mo>&#8712;</m:mo>
            <m:mi>&#934;</m:mi>
         </m:mrow>
      </m:munder>
      <m:mrow>
         <m:mtext mathvariant="normal">Pr</m:mtext>
         <m:mfenced open="(" close=")">
            <m:mrow>
               <m:msub>
                  <m:mi>M</m:mi>
                  <m:mi>k</m:mi>
               </m:msub>
               <m:mo stretchy="true">|</m:mo>
               <m:mi>D</m:mi>
            </m:mrow>
         </m:mfenced>
      </m:mrow>
   </m:mstyle>
   <m:mo>&#183;</m:mo>
   <m:msub>
      <m:mi>&#948;</m:mi>
      <m:mtext mathvariant="italic">kr</m:mtext>
   </m:msub>
   <m:mtext>,</m:mtext>
</m:mrow>
</m:math>
					</display-formula>
				</p><p>where &#934; is the set of all possible models for gene <it>g</it>, <it>&#946;</it><sub>
						<it>gr</it>
					</sub> is the regression coefficient of a candidate regulator <it>r</it> for a gene <it>g, &#948;</it>
					<sub>
						<it>kr</it>
					</sub>&#8201;=&#8201;1 if <it>r</it> &#8712;<it>M</it>
					<sub>
						<it>k</it>
					</sub> and <it>&#948;</it>
					<sub>
						<it>kr</it>
					</sub>&#8201;=&#8201;0 otherwise. Finally, we infer regulators for each target gene <it>g</it> by thresholding on the posterior inclusion probability at a predetermined level (50% in all our experiments unless otherwise specified).</p>
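<p>The posterior inclusion probability defined above can be computed directly from a set of selected models and their posterior probabilities. The Python sketch below is illustrative only: the function names, regulator labels and probabilities are hypothetical, and the models are assumed to have passed Occam&#8217;s window and been renormalized already.</p>
<p><![CDATA[
```python
def inclusion_probabilities(posterior):
    """Sum Pr(M_k | D) over the models that contain each regulator."""
    probs = {}
    for model, p in posterior.items():
        for r in model:
            probs[r] = probs.get(r, 0.0) + p
    return probs


def infer_regulators(posterior, threshold=0.5):
    """Regulators whose posterior inclusion probability exceeds the threshold."""
    incl = inclusion_probabilities(posterior)
    return sorted(r for r, p in incl.items() if p > threshold)


# Hypothetical posterior model probabilities for one target gene.
posterior = {
    frozenset({"R1"}): 0.55,
    frozenset({"R1", "R2"}): 0.25,
    frozenset({"R2", "R3"}): 0.20,
}
incl = inclusion_probabilities(posterior)
regulators = infer_regulators(posterior)
```
]]></p>
<p>Here only R1 is inferred as a regulator, because R2 and R3 appear only in models whose combined posterior probability stays below the 50% threshold.</p>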
			</sec>
			<sec>
				<st>
					<p>Extensions of the supervised framework</p>
				</st><p>We have extended the supervised framework of Yeung et al. <abbrgrp>
						<abbr bid="B3">3</abbr>
					</abbrgrp> in three ways.</p>
				<sec>
					<st>
						<p>Imputation of missing values in ChIP-chip data</p>
					</st><p>About 9% of the ChIP-chip data used in the training samples were originally undefined. The ChIP-chip data take the form of <it>p</it>-values for the statistical tests of whether candidate regulator <it>r</it> binds to the upstream region of gene <it>g</it> <it>in vivo</it>. In <abbrgrp>
							<abbr bid="B3">3</abbr>
						</abbrgrp>, those undefined values were regarded as lack of evidence for upstream binding and assigned values of one. Here, we used multiple imputation <abbrgrp>
							<abbr bid="B95">95</abbr>
							<abbr bid="B96">96</abbr>
						</abbrgrp>, in which we sampled with replacement from the empirical distribution of the non-missing ChIP-chip data, conditioning on the presence or absence of regulatory relationships. We used 20 imputations as recommended by Graham et al. <abbrgrp>
							<abbr bid="B97">97</abbr>
						</abbrgrp> for scenarios with about 10% missing data. Logistic regression was then performed on the training sample filled with the imputed ChIP-chip values.</p>
				</sec>
				<sec>
					<st>
						<p>Truncation of extreme values in external data</p>
					</st><p>Some of the external data types used in the supervised learning stage contained value ranges for individual genes that far exceeded the ranges for these genes in the training samples, e.g. the SNP-level information in Additional file <supplr sid="S2">2</supplr>: Table S3. Therefore, we truncated all extreme values in the external data to the respective maximum value observed in the training samples.</p>
				</sec>
				<sec>
					<st>
						<p>Adjustment for sampling bias regarding positive and negative cases</p>
					</st><p>In the supervised framework of Yeung et al., the expected number of regulators per target gene, computed as the sum of regulatory potentials of all candidate regulators, mostly fell between 400 and 600 (see Figure <figr fid="F2">2</figr>(a)). This apparent overestimation of positive regulatory relationships arose because similar numbers of positive and negative examples were used in the supervised learning stage. Given the sparse nature of a gene regulatory network, we expect the number of TF-gene pairs with regulatory relationships to be a small proportion of the total.</p>
					<fig id="F2"><title><p>Figure 2</p></title><caption><p>The expected number of regulators per target gene in accordance with external knowledge</p></caption><text>
   <p><b>The expected number of regulators per target gene in accordance with external knowledge.</b> Histograms of the expected number of regulators per target gene (<b>A</b>) without and (<b>B</b>) with an adjustment for the difference in sampling rates between positive and negative examples at the supervised learning stage.</p>
</text><graphic file="1752-0509-6-101-2"/></fig><p>Here, we address this issue by using a strategy that is commonly used in case&#8211;control studies, in which disease (positive) cases are usually rare <abbrgrp>
							<abbr bid="B98">98</abbr>
							<abbr bid="B99">99</abbr>
						</abbrgrp>. Let <it>&#960;</it>
						<sub>1</sub> and <it>&#960;</it>
						<sub>0</sub> be the sampling rates for positive and negative cases respectively. To adjust for the difference in the sampling rates, we add an offset of -log(<it>&#960;</it>
						<sub>1</sub>/<it>&#960;</it>
						<sub>0</sub>) to the logistic regression model. Equivalently, we divide the predicted odds by <it>&#960;</it>
						<sub>1</sub>/<it>&#960;</it>
						<sub>0</sub>. Previous literature has suggested that the in-degree distribution of gene regulatory networks decays exponentially <abbrgrp>
							<abbr bid="B100">100</abbr>
							<abbr bid="B101">101</abbr>
							<abbr bid="B102">102</abbr>
						</abbrgrp>. Based on regulatory relationships documented in various yeast databases <abbrgrp>
							<abbr bid="B90">90</abbr>
							<abbr bid="B92">92</abbr>
							<abbr bid="B93">93</abbr>
							<abbr bid="B103">103</abbr>
							<abbr bid="B104">104</abbr>
						</abbrgrp>, Guelzim et al. <abbrgrp>
							<abbr bid="B100">100</abbr>
						</abbrgrp> empirically estimated the in-degree distribution of the regulatory network as 157<it>e</it>
						<sup>-0.45<it>m</it>
						</sup>, where <it>m</it> denotes the number of TFs for a target gene. This implies that each target gene is regulated by approximately 2.76 TFs on average. Since we have 583 positive training examples, 444 negative examples, and 6000 yeast genes, we characterize such a network with density <it>&#964;</it>&#8201;=&#8201;2.76/6000&#8201;=&#8201;0.00046, and compute <inline-formula>
							<m:math name="1752-0509-6-101-i9" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:msub>
      <m:mi>&#960;</m:mi>
      <m:mn>1</m:mn>
   </m:msub>
   <m:mo>=</m:mo>
   <m:mfrac>
      <m:mn>583</m:mn>
      <m:mrow>
         <m:mn>6000</m:mn>
         <m:mo>&#215;</m:mo>
         <m:mn>2.76</m:mn>
      </m:mrow>
   </m:mfrac>
   <m:mo>=</m:mo>
   <m:mn>3</m:mn>
   <m:mo>.</m:mo>
   <m:mn>52</m:mn>
   <m:mo>%</m:mo>
</m:mrow>
</m:math>
						</inline-formula>, and <inline-formula>
							<m:math name="1752-0509-6-101-i10" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:msub>
      <m:mi>&#960;</m:mi>
      <m:mn>0</m:mn>
   </m:msub>
   <m:mo>=</m:mo>
   <m:mfrac>
      <m:mn>444</m:mn>
      <m:mrow>
         <m:mfenced open="[" close="]">
            <m:mrow>
               <m:mn>6000</m:mn>
               <m:mo>&#215;</m:mo>
               <m:mfenced open="(" close=")">
                  <m:mrow>
                     <m:mn>6000</m:mn>
                     <m:mo>&#8722;</m:mo>
                     <m:mn>2.76</m:mn>
                  </m:mrow>
               </m:mfenced>
            </m:mrow>
         </m:mfenced>
      </m:mrow>
   </m:mfrac>
   <m:mo>=</m:mo>
   <m:mn>0.0012</m:mn>
   <m:mo>%</m:mo>
</m:mrow>
</m:math>
						</inline-formula>. Therefore, we divide all the predicted odds by <it>&#960;</it>
						<sub>1</sub>/<it>&#960;</it>
						<sub>0</sub>&#8201;=&#8201;2853. For instance, if the original predicted probability is 0.9, i.e., the predicted odds are 9, then the odds adjusted for sampling bias become 9/2853&#8201;=&#8201;0.0032, implying an adjusted probability of approximately 0.0031. As shown in Figure <figr fid="F2">2</figr>(b), the expected number of regulators per target gene drops substantially, to around 0.5, once our three correction strategies (adjustment of sampling bias, imputation of missing ChIP-chip values and truncation of extreme values) are applied. Additional file <supplr sid="S1">1</supplr>: Figure S2 shows the incremental merit of our correction strategies. Additional file <supplr sid="S2">2</supplr>: Table S3 gives the estimated regression coefficient and the posterior probability for each external data type in our revised supervised framework.</p><p>To assess the sensitivity of our results to changes in the assumed prior average number of regulators per target gene, we repeated the analysis with various levels of the network density <it>&#964;</it>, and found that the assessment results were comparable. See <supplr sid="S3">Additional file 3</supplr> for complete details.</p>
					<suppl id="S3">
						<title>
							<p>Additional file 3</p>
						</title>
						<text>
							<p>
								<b>Text containing supplementary materials and methods.</b>
							</p>
						</text>
						<file name="1752-0509-6-101-S3.pdf">
   <p>Click here for file</p>
</file>
					</suppl>
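<p>The arithmetic behind the sampling-bias adjustment in this subsection can be retraced with the short Python sketch below. It assumes that the exponential in-degree distribution is taken over <it>m</it>&#8201;=&#8201;1, 2, &#8230;, an interpretation that reproduces the stated mean of 2.76; the function and variable names are illustrative.</p>
<p><![CDATA[
```python
import math


def mean_in_degree(decay=0.45):
    """Mean of a distribution proportional to exp(-decay * m), m = 1, 2, ..."""
    return 1.0 / (1.0 - math.exp(-decay))


def adjust_probability(p, ratio):
    """Divide the predicted odds by ratio = pi_1 / pi_0, then convert the
    adjusted odds back to a probability."""
    odds = p / (1.0 - p) / ratio
    return odds / (1.0 + odds)


n_genes = 6000
n_pos, n_neg = 583, 444

avg_tfs = round(mean_in_degree(), 2)              # 2.76 TFs per target gene
pi_1 = n_pos / (n_genes * avg_tfs)                # about 3.52%
pi_0 = n_neg / (n_genes * (n_genes - avg_tfs))    # about 0.0012%
ratio = pi_1 / pi_0                               # about 2853
adjusted = adjust_probability(0.9, ratio)         # about 0.0031
```
]]></p>
<p>Because the adjusted odds are small, the adjusted probability is close to the adjusted odds themselves; the two differ only in the fourth decimal place.</p>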
				</sec>
			</sec>
			<sec>
				<st>
					<p>Summary: outline of algorithm</p>
				</st><p indent="1">1. For each gene <it>g</it>, rank the candidate regulators based on the regulatory potentials predicted from the supervised framework.</p><p indent="1">2. Shortlist the top <it>p</it> candidates from the ranked list (<it>p</it>&#8201;=&#8201;100 in our experiments).</p><p indent="1">3. Fill the BMA window with the top <it>w</it> candidates in the shortlist (<it>w</it>&#8201;=&#8201;30 in our experiments).</p><p indent="1">4. Apply BMA with prior model probabilities based on the external knowledge:</p><p indent="2">a. Determine the best <it>nbest</it> models for each number of variables using the leaps and bounds algorithm (<it>nbest</it>&#8201;=&#8201;10 in our experiments).</p><p indent="2">b. For each selected model, compute its prior probability relative to the <it>w</it> candidates in the current BMA window using Equation (5).</p><p indent="2">c. Remove from the <it>w</it> candidate regulators those with posterior inclusion probability Pr(<it>&#946;</it>
					<sub>
						<it>gr</it>
					</sub>&#8201;&#8800;&#8201;0 | <it>D</it>) &lt;&#8201;5%.</p><p indent="1">5. Refill the <it>w</it>-candidate BMA window with candidates in the shortlist that have not yet been considered.</p><p indent="1">6. Repeat steps 4&#8211;5 until all the <it>p</it> candidates in the shortlist have been processed.</p><p indent="1">7. Compute the prior probability for all selected models relative to all the <it>p</it> shortlisted candidates using Equation (5).</p><p indent="1">8. Take the collection of all models selected at any iteration of BMA, and apply Occam&#8217;s window to reduce the set of models.</p><p indent="1">9. Compute the posterior inclusion probability for each candidate regulator using the set of selected models, and infer candidates associated with a posterior probability exceeding a pre-specified threshold (50%) to be regulators for target gene <it>g</it>.</p><p>External knowledge is used in the following ways:</p><p indent="1">1. All the candidate regulators are ranked according to their regulatory potentials, which were predicted using the available external data sources at the supervised learning stage.</p><p indent="1">2. Model selection is performed by comparing models against each other based on their posterior odds. As shown by Equation (6), the posterior odds are proportional to the product of the integrated likelihood and the prior odds. The prior probability and, therefore, the prior odds, of a candidate model are formulated as a function of regulatory potentials.</p><p indent="1">3. The posterior inclusion probability of each candidate regulator, from which inference is made about the presence or absence of a regulatory relationship, is positively related to its regulatory potential. As shown in Equation (5), a factor of <it>&#960;</it>
					<sub>
						<it>gr</it>
					</sub> is contributed to each model in which the candidate <it>r</it> is included. Otherwise, a factor of 1&#8201;&#8722;&#8201;<it>&#960;</it>
					<sub>
						<it>gr</it>
					</sub> is contributed to each model.</p>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Abbreviations</p>
			</st><p>BMA: Bayesian Model Averaging; iBMA: Iterative Bayesian Model Averaging; LAR: Least angle regression; LASSO: Least absolute shrinkage and selection operator; TF: Transcription factor.</p>
		</sec>
		<sec>
			<st>
				<p>Competing interests</p>
			</st><p>The authors declare that they have no competing interests.</p>
		</sec>
		<sec>
			<st>
				<p>Author contributions</p>
			</st><p>KL and AER developed the methodology. KL implemented the methods. KL and KYY analyzed the data. JZ, EES and REB designed the experiments, and KMD performed them. AER and KYY conceived the study. KL, AER and KYY wrote the manuscript. All authors read, edited and approved the final manuscript.</p>
		</sec>
	</bdy>
	<bm>
		<ack>
			<sec>
				<st>
					<p>Acknowledgments</p>
				</st><p>We thank Dr. Chris Fraley for her code for generating the precision-recall curves in Supplementary Figure S4 and Supplementary Table S5, and Dr. John E. Mittler for helpful comments and discussions. In addition, we thank the Western Canada Research Grid (WestGrid) for providing computational resources.</p><p>KYY, KL, AER, KMD and REB are supported by NIH grant 5R01GM084163. REB, KL and KYY are also supported by 3R01GM084163-02S2. REB, KMD and KYY were supported by a generous basic research grant from Merck. AER was also supported by NIH grants R01 HD54511 and R01 HD070936.</p>
			</sec>
		</ack>
		<refgrp><bibl id="B1"><title><p>Molecular networks as sensors and drivers of common human diseases</p></title><aug><au><snm>Schadt</snm><fnm>EE</fnm></au></aug><source>Nature</source><pubdate>2009</pubdate><volume>461</volume><fpage>218</fpage><lpage>223</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature08454</pubid><pubid idtype="pmpid" link="fulltext">19741703</pubid></pubidlist></xrefbib></bibl><bibl id="B2"><title><p>Advances in systems biology are enhancing our understanding of disease and moving us closer to novel disease treatments</p></title><aug><au><snm>Schadt</snm><fnm>EE</fnm></au><au><snm>Zhang</snm><fnm>B</fnm></au><au><snm>Zhu</snm><fnm>J</fnm></au></aug><source>Genetica</source><pubdate>2009</pubdate><volume>136</volume><fpage>259</fpage><lpage>269</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1007/s10709-009-9359-x</pubid><pubid idtype="pmpid" link="fulltext">19363597</pubid></pubidlist></xrefbib></bibl><bibl id="B3"><title><p>Construction of regulatory networks using expression time-series data of a genotyped population</p></title><aug><au><snm>Yeung</snm><fnm>KY</fnm></au><au><snm>Dombek</snm><fnm>KM</fnm></au><au><snm>Lo</snm><fnm>K</fnm></au><au><snm>Mittler</snm><fnm>JE</fnm></au><au><snm>Zhu</snm><fnm>J</fnm></au><au><snm>Schadt</snm><fnm>EE</fnm></au><au><snm>Bumgarner</snm><fnm>RE</fnm></au><au><snm>Raftery</snm><fnm>AE</fnm></au></aug><source>Proc Natl Acad Sci U S A</source><pubdate>2011</pubdate><volume>108</volume><fpage>19436</fpage><lpage>19441</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.1116442108</pubid><pubid idtype="pmcid">3228453</pubid><pubid idtype="pmpid" link="fulltext">22084118</pubid></pubidlist></xrefbib></bibl><bibl id="B4"><title><p>A tutorial on learning with Bayesian networks</p></title><aug><au><snm>Heckerman</snm><fnm>D</fnm></au></aug><source>Studies in Computational Intelligence</source><pubdate>2008</pubdate><volume>156</volume><fpage>33</fpage><lpage>82</lpage><xrefbib><pubid 
idtype="doi">10.1007/978-3-540-85066-3_3</pubid></xrefbib></bibl><bibl id="B5"><aug><au><snm>Jensen</snm><fnm>FV</fnm></au><au><snm>Nielsen</snm><fnm>TD</fnm></au></aug><source>Bayesian networks and decision graphs</source><publisher>New York, NY: Springer</publisher><edition>2</edition><pubdate>2007</pubdate></bibl><bibl id="B6"><aug><au><snm>Pearl</snm><fnm>J</fnm></au></aug><source>Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference</source><publisher>San Francisco, CA: Morgan Kaufmann</publisher><pubdate>1988</pubdate></bibl><bibl id="B7"><title><p>Inferring cellular networks using probabilistic graphical models</p></title><aug><au><snm>Friedman</snm><fnm>N</fnm></au></aug><source>Science</source><pubdate>2004</pubdate><volume>303</volume><fpage>799</fpage><lpage>805</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1094068</pubid><pubid idtype="pmpid" link="fulltext">14764868</pubid></pubidlist></xrefbib></bibl><bibl id="B8"><title><p>Using Bayesian networks to analyze expression data</p></title><aug><au><snm>Friedman</snm><fnm>N</fnm></au><au><snm>Linial</snm><fnm>M</fnm></au><au><snm>Nachman</snm><fnm>I</fnm></au><au><snm>Pe&apos;er</snm><fnm>D</fnm></au></aug><source>J Comput Biol</source><pubdate>2000</pubdate><volume>7</volume><fpage>601</fpage><lpage>620</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1089/106652700750050961</pubid><pubid idtype="pmpid" link="fulltext">11108481</pubid></pubidlist></xrefbib></bibl><bibl id="B9"><title><p>Using graphical models and genomic expression data to statistically validate models of genetic regulatory networks</p></title><aug><au><snm>Hartemink</snm><fnm>AJ</fnm></au><au><snm>Gifford</snm><fnm>DK</fnm></au><au><snm>Jaakkola</snm><fnm>TS</fnm></au><au><snm>Young</snm><fnm>RA</fnm></au></aug><source>Pac Symp Biocomput</source><pubdate>2001</pubdate><volume>6</volume><fpage>422</fpage><lpage>433</lpage></bibl><bibl id="B10"><title><p>Combining location and expression data for 
principled discovery of genetic regulatory network models</p></title><aug><au><snm>Hartemink</snm><fnm>AJ</fnm></au><au><snm>Gifford</snm><fnm>DK</fnm></au><au><snm>Jaakkola</snm><fnm>TS</fnm></au><au><snm>Young</snm><fnm>RA</fnm></au></aug><source>Pac Symp Biocomput</source><pubdate>2002</pubdate><volume>7</volume><fpage>437</fpage><lpage>449</lpage></bibl><bibl id="B11"><title><p>Sensitivity and specificity of inferring genetic regulatory interactions from microarray experiments with dynamic Bayesian networks</p></title><aug><au><snm>Husmeier</snm><fnm>D</fnm></au></aug><source>Bioinformatics</source><pubdate>2003</pubdate><volume>19</volume><fpage>2271</fpage><lpage>2282</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btg313</pubid><pubid idtype="pmpid" link="fulltext">14630656</pubid></pubidlist></xrefbib></bibl><bibl id="B12"><title><p>Inferring subnetworks from perturbed expression profiles</p></title><aug><au><snm>Pe&apos;er</snm><fnm>D</fnm></au><au><snm>Regev</snm><fnm>A</fnm></au><au><snm>Elidan</snm><fnm>G</fnm></au><au><snm>Friedman</snm><fnm>N</fnm></au></aug><source>Bioinformatics</source><pubdate>2001</pubdate><volume>17</volume><fpage>S215</fpage><lpage>S224</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/17.suppl_1.S215</pubid><pubid idtype="pmpid" link="fulltext">11473012</pubid></pubidlist></xrefbib></bibl><bibl id="B13"><title><p>Seeded Bayesian Networks: constructing genetic networks from microarray data</p></title><aug><au><snm>Djebbari</snm><fnm>A</fnm></au><au><snm>Quackenbush</snm><fnm>J</fnm></au></aug><source>BMC Syst Biol</source><pubdate>2008</pubdate><volume>2</volume><fpage>57</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1752-0509-2-57</pubid><pubid idtype="pmcid">2474592</pubid><pubid idtype="pmpid" link="fulltext">18601736</pubid></pubidlist></xrefbib></bibl><bibl id="B14"><title><p>Modelling regulatory pathways in E. 
coli from time series expression profiles</p></title><aug><au><snm>Ong</snm><fnm>IM</fnm></au><au><snm>Glasner</snm><fnm>JD</fnm></au><au><snm>Page</snm><fnm>D</fnm></au></aug><source>Bioinformatics</source><pubdate>2002</pubdate><volume>18</volume><fpage>S241</fpage><lpage>S248</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">12386008</pubid></xrefbib></bibl><bibl id="B15"><title><p>Reconstructing gene-regulatory networks from time series, knock-out data, and prior knowledge</p></title><aug><au><snm>Geier</snm><fnm>F</fnm></au><au><snm>Timmer</snm><fnm>J</fnm></au><au><snm>Fleck</snm><fnm>C</fnm></au></aug><source>BMC Syst Biol</source><pubdate>2007</pubdate><volume>1</volume><fpage>11</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1752-0509-1-11</pubid><pubid idtype="pmcid">1839889</pubid><pubid idtype="pmpid" link="fulltext">17408501</pubid></pubidlist></xrefbib></bibl><bibl id="B16"><title><p>Bayesian network and nonparametric heteroscedastic regression for nonlinear modeling of genetic network</p></title><aug><au><snm>Imoto</snm><fnm>S</fnm></au><au><snm>Kim</snm><fnm>S</fnm></au><au><snm>Goto</snm><fnm>T</fnm></au><au><snm>Aburatani</snm><fnm>S</fnm></au><au><snm>Tashiro</snm><fnm>K</fnm></au><au><snm>Kuhara</snm><fnm>S</fnm></au><au><snm>Miyano</snm><fnm>S</fnm></au></aug><source>J Bioinform Comput Biol</source><pubdate>2003</pubdate><volume>1</volume><fpage>231</fpage><lpage>252</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1142/S0219720003000071</pubid><pubid idtype="pmpid">15290771</pubid></pubidlist></xrefbib></bibl><bibl id="B17"><title><p>Integrating large-scale functional genomic data to dissect the complexity of yeast regulatory 
networks</p></title><aug><au><snm>Zhu</snm><fnm>J</fnm></au><au><snm>Zhang</snm><fnm>B</fnm></au><au><snm>Smith</snm><fnm>EN</fnm></au><au><snm>Drees</snm><fnm>B</fnm></au><au><snm>Brem</snm><fnm>RB</fnm></au><au><snm>Kruglyak</snm><fnm>L</fnm></au><au><snm>Bumgarner</snm><fnm>RE</fnm></au><au><snm>Schadt</snm><fnm>EE</fnm></au></aug><source>Nat Genet</source><pubdate>2008</pubdate><volume>40</volume><fpage>854</fpage><lpage>861</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng.167</pubid><pubid idtype="pmcid">2573859</pubid><pubid idtype="pmpid" link="fulltext">18552845</pubid></pubidlist></xrefbib></bibl><bibl id="B18"><title><p>An integrative genomics approach to infer causal associations between gene expression and disease</p></title><aug><au><snm>Schadt</snm><fnm>EE</fnm></au><au><snm>Lamb</snm><fnm>J</fnm></au><au><snm>Yang</snm><fnm>X</fnm></au><au><snm>Zhu</snm><fnm>J</fnm></au><au><snm>Edwards</snm><fnm>S</fnm></au><au><snm>Guhathakurta</snm><fnm>D</fnm></au><au><snm>Sieberts</snm><fnm>SK</fnm></au><au><snm>Monks</snm><fnm>S</fnm></au><au><snm>Reitman</snm><fnm>M</fnm></au><au><snm>Zhang</snm><fnm>C</fnm></au><au><snm>Lum</snm><fnm>PY</fnm></au><au><snm>Leonardson</snm><fnm>A</fnm></au><au><snm>Thieringer</snm><fnm>R</fnm></au><au><snm>Metzger</snm><fnm>JM</fnm></au><au><snm>Yang</snm><fnm>L</fnm></au><au><snm>Castle</snm><fnm>J</fnm></au><au><snm>Zhu</snm><fnm>H</fnm></au><au><snm>Kash</snm><fnm>SF</fnm></au><au><snm>Drake</snm><fnm>TA</fnm></au><au><snm>Sachs</snm><fnm>A</fnm></au><au><snm>Lusis</snm><fnm>AJ</fnm></au></aug><source>Nat Genet</source><pubdate>2005</pubdate><volume>37</volume><fpage>710</fpage><lpage>717</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng1589</pubid><pubid idtype="pmcid">2841396</pubid><pubid idtype="pmpid" link="fulltext">15965475</pubid></pubidlist></xrefbib></bibl><bibl id="B19"><title><p>Characterizing dynamic changes in the human blood transcriptional 
network</p></title><aug><au><snm>Zhu</snm><fnm>J</fnm></au><au><snm>Chen</snm><fnm>Y</fnm></au><au><snm>Leonardson</snm><fnm>AS</fnm></au><au><snm>Wang</snm><fnm>K</fnm></au><au><snm>Lamb</snm><fnm>JR</fnm></au><au><snm>Emilsson</snm><fnm>V</fnm></au><au><snm>Schadt</snm><fnm>EE</fnm></au></aug><source>PLoS Comput Biol</source><pubdate>2010</pubdate><volume>6</volume><fpage>e1000671</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pcbi.1000671</pubid><pubid idtype="pmcid">2820517</pubid><pubid idtype="pmpid" link="fulltext">20168994</pubid></pubidlist></xrefbib></bibl><bibl id="B20"><title><p>A genomic regulatory network for development</p></title><aug><au><snm>Davidson</snm><fnm>EH</fnm></au><au><snm>Rast</snm><fnm>JP</fnm></au><au><snm>Oliveri</snm><fnm>P</fnm></au><au><snm>Ransick</snm><fnm>A</fnm></au><au><snm>Calestani</snm><fnm>C</fnm></au><au><snm>Yuh</snm><fnm>CH</fnm></au><au><snm>Minokawa</snm><fnm>T</fnm></au><au><snm>Amore</snm><fnm>G</fnm></au><au><snm>Hinman</snm><fnm>V</fnm></au><au><snm>Arenas-Mena</snm><fnm>C</fnm></au><au><snm>Otim</snm><fnm>O</fnm></au><au><snm>Brown</snm><fnm>CT</fnm></au><au><snm>Livi</snm><fnm>CB</fnm></au><au><snm>Lee</snm><fnm>PY</fnm></au><au><snm>Revilla</snm><fnm>R</fnm></au><au><snm>Rust</snm><fnm>AG</fnm></au><au><snm>Pan</snm><fnm>Z</fnm></au><au><snm>Schilstra</snm><fnm>MJ</fnm></au><au><snm>Clarke</snm><fnm>PJ</fnm></au><au><snm>Arnone</snm><fnm>MI</fnm></au><au><snm>Rowen</snm><fnm>L</fnm></au><au><snm>Cameron</snm><fnm>RA</fnm></au><au><snm>McClay</snm><fnm>DR</fnm></au><au><snm>Hood</snm><fnm>L</fnm></au><au><snm>Bolouri</snm><fnm>H</fnm></au></aug><source>Science</source><pubdate>2002</pubdate><volume>295</volume><fpage>1669</fpage><lpage>1678</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1069883</pubid><pubid idtype="pmpid" link="fulltext">11872831</pubid></pubidlist></xrefbib></bibl><bibl id="B21"><title><p>Learning the structure of dynamic probabilistic 
networks</p></title><aug><au><snm>Friedman</snm><fnm>N</fnm></au><au><snm>Murphy</snm><fnm>K</fnm></au><au><snm>Russell</snm><fnm>S</fnm></au></aug><source>Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence</source><publisher>San Mateo, CA: Morgan Kaufmann</publisher><pubdate>1998</pubdate><fpage>139</fpage><lpage>147</lpage></bibl><bibl id="B22"><title><p>Inferring gene networks from time series microarray data using dynamic Bayesian networks</p></title><aug><au><snm>Kim</snm><fnm>SY</fnm></au><au><snm>Imoto</snm><fnm>S</fnm></au><au><snm>Miyano</snm><fnm>S</fnm></au></aug><source>Brief Bioinform</source><pubdate>2003</pubdate><volume>4</volume><fpage>228</fpage><lpage>235</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bib/4.3.228</pubid><pubid idtype="pmpid" link="fulltext">14582517</pubid></pubidlist></xrefbib></bibl><bibl id="B23"><title><p>Modeling gene expression data using dynamic Bayesian networks</p></title><aug><au><snm>Murphy</snm><fnm>K</fnm></au><au><snm>Mian</snm><fnm>S</fnm></au></aug><source>Technical Report, Computer Science Division</source><publisher>Berkeley, CA: University of California</publisher><pubdate>1999</pubdate></bibl><bibl id="B24"><title><p>Advances to Bayesian network inference for generating causal networks from observational biological data</p></title><aug><au><snm>Yu</snm><fnm>J</fnm></au><au><snm>Smith</snm><fnm>VA</fnm></au><au><snm>Wang</snm><fnm>PP</fnm></au><au><snm>Hartemink</snm><fnm>AJ</fnm></au><au><snm>Jarvis</snm><fnm>ED</fnm></au></aug><source>Bioinformatics</source><pubdate>2004</pubdate><volume>20</volume><fpage>3594</fpage><lpage>3603</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/bth448</pubid><pubid idtype="pmpid" link="fulltext">15284094</pubid></pubidlist></xrefbib></bibl><bibl id="B25"><title><p>Learning Bayesian Networks is NP-Complete</p></title><aug><au><snm>Chickering</snm><fnm>DM</fnm></au></aug><source>Learning from Data: Artificial Intelligence and Statistics 
V</source><publisher>Springer-Verlag</publisher><editor>Fisher D, Lenz HJ</editor><pubdate>1996</pubdate><fpage>121</fpage><lpage>130</lpage></bibl><bibl id="B26"><title><p>Large-sample learning of Bayesian networks is NP-hard</p></title><aug><au><snm>Chickering</snm><fnm>DM</fnm></au><au><snm>Heckerman</snm><fnm>D</fnm></au><au><snm>Meek</snm><fnm>C</fnm></au></aug><source>J Mach Learn Res</source><pubdate>2004</pubdate><volume>5</volume><fpage>1287</fpage><lpage>1330</lpage></bibl><bibl id="B27"><title><p>A new dynamic Bayesian network (DBN) approach for identifying gene regulatory networks from time course microarray data</p></title><aug><au><snm>Zou</snm><fnm>M</fnm></au><au><snm>Conzen</snm><fnm>SD</fnm></au></aug><source>Bioinformatics</source><pubdate>2005</pubdate><volume>21</volume><fpage>71</fpage><lpage>79</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/bth463</pubid><pubid idtype="pmpid" link="fulltext">15308537</pubid></pubidlist></xrefbib></bibl><bibl id="B28"><title><p>A new multiple regression approach for the construction of genetic regulatory networks</p></title><aug><au><snm>Zhang</snm><fnm>SQ</fnm></au><au><snm>Ching</snm><fnm>WK</fnm></au><au><snm>Tsing</snm><fnm>NK</fnm></au><au><snm>Leung</snm><fnm>HY</fnm></au><au><snm>Guo</snm><fnm>D</fnm></au></aug><source>Artif Intell Med</source><pubdate>2010</pubdate><volume>48</volume><fpage>153</fpage><lpage>160</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.artmed.2009.11.001</pubid><pubid idtype="pmpid" link="fulltext">19963359</pubid></pubidlist></xrefbib></bibl><bibl id="B29"><title><p>Reverse engineering gene networks using singular value decomposition and robust regression</p></title><aug><au><snm>Yeung</snm><fnm>MK</fnm></au><au><snm>Tegner</snm><fnm>J</fnm></au><au><snm>Collins</snm><fnm>JJ</fnm></au></aug><source>Proc Natl Acad Sci U S A</source><pubdate>2002</pubdate><volume>99</volume><fpage>6163</fpage><lpage>6168</lpage><xrefbib><pubidlist><pubid 
idtype="doi">10.1073/pnas.092576199</pubid><pubid idtype="pmcid">122920</pubid><pubid idtype="pmpid" link="fulltext">11983907</pubid></pubidlist></xrefbib></bibl><bibl id="B30"><title><p>Dynamic network reconstruction from gene expression data applied to immune response during bacterial infection</p></title><aug><au><snm>Guthke</snm><fnm>R</fnm></au><au><snm>Moller</snm><fnm>U</fnm></au><au><snm>Hoffmann</snm><fnm>M</fnm></au><au><snm>Thies</snm><fnm>F</fnm></au><au><snm>Topfer</snm><fnm>S</fnm></au></aug><source>Bioinformatics</source><pubdate>2005</pubdate><volume>21</volume><fpage>1626</fpage><lpage>1634</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/bti226</pubid><pubid idtype="pmpid" link="fulltext">15613398</pubid></pubidlist></xrefbib></bibl><bibl id="B31"><title><p>Inferring regulatory networks from expression data using tree-based methods</p></title><aug><au><snm>Huynh-Thu</snm><fnm>VA</fnm></au><au><snm>Irrthum</snm><fnm>A</fnm></au><au><snm>Wehenkel</snm><fnm>L</fnm></au><au><snm>Geurts</snm><fnm>P</fnm></au></aug><source>PLoS One</source><pubdate>2010</pubdate><volume>5</volume><fpage>e12776</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pone.0012776</pubid><pubid idtype="pmcid">2946910</pubid><pubid idtype="pmpid" link="fulltext">20927193</pubid></pubidlist></xrefbib></bibl><bibl id="B32"><title><p>Identifying regulatory mechanisms using individual variation reveals key role for chromatin modification</p></title><aug><au><snm>Lee</snm><fnm>SI</fnm></au><au><snm>Pe&apos;er</snm><fnm>D</fnm></au><au><snm>Dudley</snm><fnm>AM</fnm></au><au><snm>Church</snm><fnm>GM</fnm></au><au><snm>Koller</snm><fnm>D</fnm></au></aug><source>Proc Natl Acad Sci U S A</source><pubdate>2006</pubdate><volume>103</volume><fpage>14062</fpage><lpage>14067</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.0601852103</pubid><pubid idtype="pmcid">1599912</pubid><pubid idtype="pmpid" 
link="fulltext">16968785</pubid></pubidlist></xrefbib></bibl><bibl id="B33"><title><p>Inferring gene regression networks with model trees</p></title><aug><au><snm>Nepomuceno-Chamorro</snm><fnm>IA</fnm></au><au><snm>Aguilar-Ruiz</snm><fnm>JS</fnm></au><au><snm>Riquelme</snm><fnm>JC</fnm></au></aug><source>BMC Bioinforma</source><pubdate>2010</pubdate><volume>11</volume><fpage>517</fpage><xrefbib><pubid idtype="doi">10.1186/1471-2105-11-517</pubid></xrefbib></bibl><bibl id="B34"><title><p>Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data</p></title><aug><au><snm>Segal</snm><fnm>E</fnm></au><au><snm>Shapira</snm><fnm>M</fnm></au><au><snm>Regev</snm><fnm>A</fnm></au><au><snm>Pe&apos;er</snm><fnm>D</fnm></au><au><snm>Botstein</snm><fnm>D</fnm></au><au><snm>Koller</snm><fnm>D</fnm></au><au><snm>Friedman</snm><fnm>N</fnm></au></aug><source>Nat Genet</source><pubdate>2003</pubdate><volume>34</volume><fpage>166</fpage><lpage>176</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng1165</pubid><pubid idtype="pmpid" link="fulltext">12740579</pubid></pubidlist></xrefbib></bibl><bibl id="B35"><title><p>Using GeneReg to construct time delay gene regulatory networks</p></title><aug><au><snm>Huang</snm><fnm>T</fnm></au><au><snm>Liu</snm><fnm>L</fnm></au><au><snm>Qian</snm><fnm>Z</fnm></au><au><snm>Tu</snm><fnm>K</fnm></au><au><snm>Li</snm><fnm>Y</fnm></au><au><snm>Xie</snm><fnm>L</fnm></au></aug><source>BMC Res Notes</source><pubdate>2010</pubdate><volume>3</volume><fpage>142</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1756-0500-3-142</pubid><pubid idtype="pmcid">2892504</pubid><pubid idtype="pmpid" link="fulltext">20500822</pubid></pubidlist></xrefbib></bibl><bibl id="B36"><title><p>Regularization paths for generalized linear models via coordinate 
descent</p></title><aug><au><snm>Friedman</snm><fnm>J</fnm></au><au><snm>Hastie</snm><fnm>T</fnm></au><au><snm>Tibshirani</snm><fnm>R</fnm></au></aug><source>J Stat Softw</source><pubdate>2010</pubdate><volume>33</volume><fpage>1</fpage><lpage>22</lpage><xrefbib><pubidlist><pubid idtype="pmcid">2929880</pubid><pubid idtype="pmpid">20808728</pubid></pubidlist></xrefbib></bibl><bibl id="B37"><title><p>Regularization and variable selection via the elastic net</p></title><aug><au><snm>Zou</snm><fnm>H</fnm></au><au><snm>Hastie</snm><fnm>T</fnm></au></aug><source>Journal of the Royal Statistical Society, Series B</source><pubdate>2005</pubdate><volume>67</volume><fpage>301</fpage><lpage>320</lpage><xrefbib><pubid idtype="doi">10.1111/j.1467-9868.2005.00503.x</pubid></xrefbib></bibl><bibl id="B38"><title><p>Weighted lasso in graphical Gaussian modeling for large gene network estimation based on microarray data</p></title><aug><au><snm>Shimamura</snm><fnm>T</fnm></au><au><snm>Imoto</snm><fnm>S</fnm></au><au><snm>Yamaguchi</snm><fnm>R</fnm></au><au><snm>Miyano</snm><fnm>S</fnm></au></aug><source>Genome Inform</source><pubdate>2007</pubdate><volume>19</volume><fpage>142</fpage><lpage>153</lpage><xrefbib><pubid idtype="pmpid">18546512</pubid></xrefbib></bibl><bibl id="B39"><title><p>Weighted-LASSO for structured network inference from time course data</p></title><aug><au><snm>Charbonnier</snm><fnm>C</fnm></au><au><snm>Chiquet</snm><fnm>J</fnm></au><au><snm>Ambroise</snm><fnm>C</fnm></au></aug><source>Stat Appl Genet Mol Biol</source><pubdate>2010</pubdate><volume>9</volume><fpage>Article 15</fpage><xrefbib><pubid idtype="pmpid" link="fulltext">20196750</pubid></xrefbib></bibl><bibl id="B40"><title><p>Gene expression prediction by soft integration and the elastic net-best performance of the DREAM3 gene expression challenge</p></title><aug><au><snm>Gustafsson</snm><fnm>M</fnm></au><au><snm>Hornquist</snm><fnm>M</fnm></au></aug><source>PLoS 
One</source><pubdate>2010</pubdate><volume>5</volume><fpage>e9134</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pone.0009134</pubid><pubid idtype="pmcid">2821917</pubid><pubid idtype="pmpid" link="fulltext">20169069</pubid></pubidlist></xrefbib></bibl><bibl id="B41"><title><p>Integrative modeling of transcriptional regulation in response to antirheumatic therapy</p></title><aug><au><snm>Hecker</snm><fnm>M</fnm></au><au><snm>Goertsches</snm><fnm>RH</fnm></au><au><snm>Engelmann</snm><fnm>R</fnm></au><au><snm>Thiesen</snm><fnm>HJ</fnm></au><au><snm>Guthke</snm><fnm>R</fnm></au></aug><source>BMC Bioinforma</source><pubdate>2009</pubdate><volume>10</volume><fpage>262</fpage><xrefbib><pubid idtype="doi">10.1186/1471-2105-10-262</pubid></xrefbib></bibl><bibl id="B42"><title><p>Network analysis of transcriptional regulation in response to intramuscular interferon-beta-1a multiple sclerosis treatment</p></title><aug><au><snm>Hecker</snm><fnm>M</fnm></au><au><snm>Goertsches</snm><fnm>RH</fnm></au><au><snm>Fatum</snm><fnm>C</fnm></au><au><snm>Koczan</snm><fnm>D</fnm></au><au><snm>Thiesen</snm><fnm>HJ</fnm></au><au><snm>Guthke</snm><fnm>R</fnm></au><au><snm>Zettl</snm><fnm>UK</fnm></au></aug><source>Pharmacogenomics J</source><pubdate>2010</pubdate><note>in press</note></bibl><bibl id="B43"><title><p>Sparse regulatory networks</p></title><aug><au><snm>James</snm><fnm>G</fnm></au><au><snm>Sabatti</snm><fnm>C</fnm></au><au><snm>Zhou</snm><fnm>N</fnm></au><au><snm>Zhu</snm><fnm>J</fnm></au></aug><source>Annals of Applied Statistics</source><pubdate>2010</pubdate><volume>4</volume><fpage>663</fpage><lpage>686</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1214/10-AOAS350</pubid><pubid idtype="pmcid">3102251</pubid><pubid idtype="pmpid">21625366</pubid></pubidlist></xrefbib></bibl><bibl id="B44"><title><p>Learning a prior on regulatory potential from eQTL 
data</p></title><aug><au><snm>Lee</snm><fnm>SI</fnm></au><au><snm>Dudley</snm><fnm>AM</fnm></au><au><snm>Drubin</snm><fnm>D</fnm></au><au><snm>Silver</snm><fnm>PA</fnm></au><au><snm>Krogan</snm><fnm>NJ</fnm></au><au><snm>Pe&apos;er</snm><fnm>D</fnm></au><au><snm>Koller</snm><fnm>D</fnm></au></aug><source>PLoS Genet</source><pubdate>2009</pubdate><volume>5</volume><fpage>e1000358</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pgen.1000358</pubid><pubid idtype="pmcid">2627940</pubid><pubid idtype="pmpid" link="fulltext">19180192</pubid></pubidlist></xrefbib></bibl><bibl id="B45"><title><p>Recovering genetic regulatory networks from micro-array data and location analysis data</p></title><aug><au><snm>Li</snm><fnm>F</fnm></au><au><snm>Yang</snm><fnm>Y</fnm></au></aug><source>Genome Inform</source><pubdate>2004</pubdate><volume>15</volume><fpage>131</fpage><lpage>140</lpage><xrefbib><pubid idtype="pmpid">15706499</pubid></xrefbib></bibl><bibl id="B46"><title><p>Incorporating predictor network in penalized regression with application to microarray data</p></title><aug><au><snm>Pan</snm><fnm>W</fnm></au><au><snm>Xie</snm><fnm>B</fnm></au><au><snm>Shen</snm><fnm>X</fnm></au></aug><source>Biometrics</source><pubdate>2010</pubdate><volume>66</volume><fpage>474</fpage><lpage>484</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1111/j.1541-0420.2009.01296.x</pubid><pubid idtype="pmcid">3338337</pubid><pubid idtype="pmpid" link="fulltext">19645699</pubid></pubidlist></xrefbib></bibl><bibl id="B47"><title><p>Regularized multivariate regression for identifying master predictors with application to integrative genomics study of breast cancer</p></title><aug><au><snm>Peng</snm><fnm>J</fnm></au><au><snm>Zhu</snm><fnm>J</fnm></au><au><snm>Bergamaschi</snm><fnm>A</fnm></au><au><snm>Han</snm><fnm>W</fnm></au><au><snm>Noh</snm><fnm>D-Y</fnm></au><au><snm>Pollack</snm><fnm>JR</fnm></au><au><snm>Wang</snm><fnm>P</fnm></au></aug><source>Ann Appl 
Stat</source><pubdate>2010</pubdate><volume>4</volume><fpage>53</fpage><lpage>77</lpage></bibl><bibl id="B48"><title><p>Least absolute regression network analysis of the murine osteoblast differentiation network</p></title><aug><au><snm>van Someren</snm><fnm>EP</fnm></au><au><snm>Vaes</snm><fnm>BL</fnm></au><au><snm>Steegenga</snm><fnm>WT</fnm></au><au><snm>Sijbers</snm><fnm>AM</fnm></au><au><snm>Dechering</snm><fnm>KJ</fnm></au><au><snm>Reinders</snm><fnm>MJ</fnm></au></aug><source>Bioinformatics</source><pubdate>2006</pubdate><volume>22</volume><fpage>477</fpage><lpage>484</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/bti816</pubid><pubid idtype="pmpid" link="fulltext">16332709</pubid></pubidlist></xrefbib></bibl><bibl id="B49"><title><p>Multi-criterion optimization for genetic network modeling</p></title><aug><au><snm>van Someren</snm><fnm>EP</fnm></au><au><snm>Wessels</snm><fnm>LFA</fnm></au><au><snm>Backer</snm><fnm>E</fnm></au><au><snm>Reinders</snm><fnm>MJT</fnm></au></aug><source>Signal Process</source><pubdate>2003</pubdate><volume>83</volume><fpage>763</fpage><lpage>775</lpage><xrefbib><pubid idtype="doi">10.1016/S0165-1684(02)00473-5</pubid></xrefbib></bibl><bibl id="B50"><title><p>How to infer gene networks from expression profiles</p></title><aug><au><snm>Bansal</snm><fnm>M</fnm></au><au><snm>Belcastro</snm><fnm>V</fnm></au><au><snm>Ambesi-Impiombato</snm><fnm>A</fnm></au><au><snm>di Bernardo</snm><fnm>D</fnm></au></aug><source>Mol Syst Biol</source><pubdate>2007</pubdate><volume>3</volume><fpage>78</fpage><xrefbib><pubidlist><pubid idtype="pmcid">1828749</pubid><pubid idtype="pmpid" link="fulltext">17299415</pubid></pubidlist></xrefbib></bibl><bibl id="B51"><title><p>Linear modeling of mRNA expression levels during CNS development and 
injury</p></title><aug><au><snm>D&apos;Haeseleer</snm><fnm>P</fnm></au><au><snm>Wen</snm><fnm>X</fnm></au><au><snm>Fuhrman</snm><fnm>S</fnm></au><au><snm>Somogyi</snm><fnm>R</fnm></au></aug><source>Pac Symp Biocomput</source><pubdate>1999</pubdate><fpage>41</fpage><lpage>52</lpage></bibl><bibl id="B52"><title><p>Modeling and simulation of genetic regulatory systems: a literature review</p></title><aug><au><snm>de Jong</snm><fnm>H</fnm></au></aug><source>J Comput Biol</source><pubdate>2002</pubdate><volume>9</volume><fpage>67</fpage><lpage>103</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1089/10665270252833208</pubid><pubid idtype="pmpid" link="fulltext">11911796</pubid></pubidlist></xrefbib></bibl><bibl id="B53"><title><p>Inference of gene regulatory networks and compound mode of action from time course gene expression profiles</p></title><aug><au><snm>Bansal</snm><fnm>M</fnm></au><au><snm>Della Gatta</snm><fnm>G</fnm></au><au><snm>di Bernardo</snm><fnm>D</fnm></au></aug><source>Bioinformatics</source><pubdate>2006</pubdate><volume>22</volume><fpage>815</fpage><lpage>822</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btl003</pubid><pubid idtype="pmpid" link="fulltext">16418235</pubid></pubidlist></xrefbib></bibl><bibl id="B54"><title><p>The Inferelator: an algorithm for learning parsimonious regulatory networks from systems-biology data sets de novo</p></title><aug><au><snm>Bonneau</snm><fnm>R</fnm></au><au><snm>Reiss</snm><fnm>DJ</fnm></au><au><snm>Shannon</snm><fnm>P</fnm></au><au><snm>Facciotti</snm><fnm>M</fnm></au><au><snm>Hood</snm><fnm>L</fnm></au><au><snm>Baliga</snm><fnm>NS</fnm></au><au><snm>Thorsson</snm><fnm>V</fnm></au></aug><source>Genome Biol</source><pubdate>2006</pubdate><volume>7</volume><fpage>R36</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2006-7-5-r36</pubid><pubid idtype="pmcid">1779511</pubid><pubid idtype="pmpid" link="fulltext">16686963</pubid></pubidlist></xrefbib></bibl><bibl 
id="B55"><title><p>Chemogenomic profiling on a genome-wide scale using reverse-engineered gene networks</p></title><aug><au><snm>di Bernardo</snm><fnm>D</fnm></au><au><snm>Thompson</snm><fnm>MJ</fnm></au><au><snm>Gardner</snm><fnm>TS</fnm></au><au><snm>Chobot</snm><fnm>SE</fnm></au><au><snm>Eastwood</snm><fnm>EL</fnm></au><au><snm>Wojtovich</snm><fnm>AP</fnm></au><au><snm>Elliott</snm><fnm>SJ</fnm></au><au><snm>Schaus</snm><fnm>SE</fnm></au><au><snm>Collins</snm><fnm>JJ</fnm></au></aug><source>Nat Biotechnol</source><pubdate>2005</pubdate><volume>23</volume><fpage>377</fpage><lpage>383</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nbt1075</pubid><pubid idtype="pmpid" link="fulltext">15765094</pubid></pubidlist></xrefbib></bibl><bibl id="B56"><title><p>Inferring genetic networks and identifying compound mode of action via expression profiling</p></title><aug><au><snm>Gardner</snm><fnm>TS</fnm></au><au><snm>di Bernardo</snm><fnm>D</fnm></au><au><snm>Lorenz</snm><fnm>D</fnm></au><au><snm>Collins</snm><fnm>JJ</fnm></au></aug><source>Science</source><pubdate>2003</pubdate><volume>301</volume><fpage>102</fpage><lpage>105</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1081900</pubid><pubid idtype="pmpid" link="fulltext">12843395</pubid></pubidlist></xrefbib></bibl><bibl id="B57"><title><p>A parallel implementation of the network identification by multiple regression (NIR) algorithm to reverse-engineer regulatory gene networks</p></title><aug><au><snm>Gregoretti</snm><fnm>F</fnm></au><au><snm>Belcastro</snm><fnm>V</fnm></au><au><snm>di Bernardo</snm><fnm>D</fnm></au><au><snm>Oliva</snm><fnm>G</fnm></au></aug><source>PLoS One</source><pubdate>2010</pubdate><volume>5</volume><fpage>e10179</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pone.0010179</pubid><pubid idtype="pmcid">2858156</pubid><pubid idtype="pmpid" link="fulltext">20422008</pubid></pubidlist></xrefbib></bibl><bibl id="B58"><title><p>Reverse engineering gene networks: 
integrating genetic perturbations with dynamical modeling</p></title><aug><au><snm>Tegner</snm><fnm>J</fnm></au><au><snm>Yeung</snm><fnm>MK</fnm></au><au><snm>Hasty</snm><fnm>J</fnm></au><au><snm>Collins</snm><fnm>JJ</fnm></au></aug><source>Proc Natl Acad Sci U S A</source><pubdate>2003</pubdate><volume>100</volume><fpage>5944</fpage><lpage>5949</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.0933416100</pubid><pubid idtype="pmcid">156306</pubid><pubid idtype="pmpid" link="fulltext">12730377</pubid></pubidlist></xrefbib></bibl><bibl id="B59"><title><p>An integrative genomics approach to the reconstruction of gene networks in segregating populations</p></title><aug><au><snm>Zhu</snm><fnm>J</fnm></au><au><snm>Lum</snm><fnm>PY</fnm></au><au><snm>Lamb</snm><fnm>J</fnm></au><au><snm>GuhaThakurta</snm><fnm>D</fnm></au><au><snm>Edwards</snm><fnm>SW</fnm></au><au><snm>Thieringer</snm><fnm>R</fnm></au><au><snm>Berger</snm><fnm>JP</fnm></au><au><snm>Wu</snm><fnm>MS</fnm></au><au><snm>Thompson</snm><fnm>J</fnm></au><au><snm>Sachs</snm><fnm>AB</fnm></au><au><snm>Schadt</snm><fnm>EE</fnm></au></aug><source>Cytogenet Genome Res</source><pubdate>2004</pubdate><volume>105</volume><fpage>363</fpage><lpage>374</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1159/000078209</pubid><pubid idtype="pmpid" link="fulltext">15237224</pubid></pubidlist></xrefbib></bibl><bibl id="B60"><title><p>Bayesian model selection in social research (with discussion)</p></title><aug><au><snm>Raftery</snm><fnm>AE</fnm></au></aug><source>Sociol Methodol</source><pubdate>1995</pubdate><volume>25</volume><fpage>111</fpage><lpage>193</lpage></bibl><bibl id="B61"><title><p>Bayesian model averaging for linear regression models</p></title><aug><au><snm>Raftery</snm><fnm>AE</fnm></au><au><snm>Madigan</snm><fnm>D</fnm></au><au><snm>Hoeting</snm><fnm>JA</fnm></au></aug><source>J Am Stat Assoc</source><pubdate>1997</pubdate><volume>92</volume><fpage>179</fpage><lpage>191</lpage><xrefbib><pubid 
idtype="doi">10.1080/01621459.1997.10473615</pubid></xrefbib></bibl><bibl id="B62"><title><p>Bayesian model averaging: development of an improved multi-class, gene selection and classification tool for microarray data</p></title><aug><au><snm>Yeung</snm><fnm>KY</fnm></au><au><snm>Bumgarner</snm><fnm>RE</fnm></au><au><snm>Raftery</snm><fnm>AE</fnm></au></aug><source>Bioinformatics</source><pubdate>2005</pubdate><volume>21</volume><fpage>2394</fpage><lpage>2402</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/bti319</pubid><pubid idtype="pmpid" link="fulltext">15713736</pubid></pubidlist></xrefbib></bibl><bibl id="B63"><title><p>Regression shrinkage and selection via the LASSO</p></title><aug><au><snm>Tibshirani</snm><fnm>R</fnm></au></aug><source>J R Stat Soc Series B Stat Methodol</source><pubdate>1996</pubdate><volume>58</volume><fpage>267</fpage><lpage>288</lpage></bibl><bibl id="B64"><title><p>Least angle regression</p></title><aug><au><snm>Efron</snm><fnm>B</fnm></au><au><snm>Hastie</snm><fnm>T</fnm></au><au><snm>Johnstone</snm><fnm>I</fnm></au><au><snm>Tibshirani</snm><fnm>R</fnm></au></aug><source>Ann Stat</source><pubdate>2004</pubdate><volume>32</volume><fpage>407</fpage><lpage>499</lpage><xrefbib><pubid idtype="doi">10.1214/009053604000000067</pubid></xrefbib></bibl><bibl id="B65"><title><p>Least angle and L1 penalized regression: a review</p></title><aug><au><snm>Hesterberg</snm><fnm>T</fnm></au><au><snm>Choi</snm><fnm>NH</fnm></au><au><snm>Meier</snm><fnm>L</fnm></au><au><snm>Fraley</snm><fnm>C</fnm></au></aug><source>Statistics Surveys</source><pubdate>2008</pubdate><volume>2</volume><fpage>61</fpage><lpage>92</lpage><xrefbib><pubid idtype="doi">10.1214/08-SS035</pubid></xrefbib></bibl><bibl id="B66"><title><p>Regularization paths for generalized linear models via coordinate descent</p></title><aug><au><snm>Friedman</snm><fnm>J</fnm></au><au><snm>Hastie</snm><fnm>T</fnm></au><au><snm>Tibshirani</snm><fnm>R</fnm></au></aug><source>J 
Stat Softw</source><pubdate>2010</pubdate><volume>33</volume><fpage>1</fpage><lpage>22</lpage><xrefbib><pubidlist><pubid idtype="pmcid">2929880</pubid><pubid idtype="pmpid">20808728</pubid></pubidlist></xrefbib></bibl><bibl id="B67"><title><p>glmnet: Lasso and elastic net regularized generalized linear models</p></title><aug><au><snm>Friedman</snm><fnm>J</fnm></au><au><snm>Hastie</snm><fnm>T</fnm></au><au><snm>Tibshirani</snm><fnm>R</fnm></au></aug><note>R package available at <url>http://cran.r-project.org/web/packages/glmnet/index.html</url></note></bibl><bibl id="B68"><title><p>The YEASTRACT database: a tool for the analysis of transcription regulatory associations in Saccharomyces cerevisiae</p></title><aug><au><snm>Teixeira</snm><fnm>MC</fnm></au><au><snm>Monteiro</snm><fnm>P</fnm></au><au><snm>Jain</snm><fnm>P</fnm></au><au><snm>Tenreiro</snm><fnm>S</fnm></au><au><snm>Fernandes</snm><fnm>AR</fnm></au><au><snm>Mira</snm><fnm>NP</fnm></au><au><snm>Alenquer</snm><fnm>M</fnm></au><au><snm>Freitas</snm><fnm>AT</fnm></au><au><snm>Oliveira</snm><fnm>AL</fnm></au><au><snm>Sa-Correia</snm><fnm>I</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2006</pubdate><volume>34</volume><fpage>D446</fpage><lpage>D451</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkj013</pubid><pubid idtype="pmcid">1347376</pubid><pubid idtype="pmpid" link="fulltext">16381908</pubid></pubidlist></xrefbib></bibl><bibl id="B69"><title><p>JASPAR, the open access database of transcription factor-binding profiles: new content and tools in the 2008 update</p></title><aug><au><snm>Bryne</snm><fnm>JC</fnm></au><au><snm>Valen</snm><fnm>E</fnm></au><au><snm>Tang</snm><fnm>MH</fnm></au><au><snm>Marstrand</snm><fnm>T</fnm></au><au><snm>Winther</snm><fnm>O</fnm></au><au><snm>da Piedade</snm><fnm>I</fnm></au><au><snm>Krogh</snm><fnm>A</fnm></au><au><snm>Lenhard</snm><fnm>B</fnm></au><au><snm>Sandelin</snm><fnm>A</fnm></au></aug><source>Nucleic Acids 
Res</source><pubdate>2008</pubdate><volume>36</volume><fpage>D102</fpage><lpage>D106</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkn449</pubid><pubid idtype="pmcid">2238834</pubid><pubid idtype="pmpid" link="fulltext">18006571</pubid></pubidlist></xrefbib></bibl><bibl id="B70"><title><p>Applied bioinformatics for the identification of regulatory elements</p></title><aug><au><snm>Wasserman</snm><fnm>WW</fnm></au><au><snm>Sandelin</snm><fnm>A</fnm></au></aug><source>Nat Rev Genet</source><pubdate>2004</pubdate><volume>5</volume><fpage>276</fpage><lpage>287</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nrg1315</pubid><pubid idtype="pmpid" link="fulltext">15131651</pubid></pubidlist></xrefbib></bibl><bibl id="B71"><title><p>Large scale matching for Position Weight Matrices</p></title><aug><au><snm>Liefooghe</snm><fnm>A</fnm></au><au><snm>Touzet</snm><fnm>H</fnm></au><au><snm>Varr&#233;</snm><fnm>J-S</fnm></au></aug><source>Combinatorial Pattern Matching, Lecture Notes in Computer Science. 
Springer Verlag</source><pubdate>2006</pubdate><volume>4009</volume><fpage>401</fpage><lpage>412</lpage></bibl><bibl id="B72"><title><p>The landscape of genetic complexity across 5,700 gene expression traits in yeast</p></title><aug><au><snm>Brem</snm><fnm>RB</fnm></au><au><snm>Kruglyak</snm><fnm>L</fnm></au></aug><source>Proc Natl Acad Sci U S A</source><pubdate>2005</pubdate><volume>102</volume><fpage>1572</fpage><lpage>1577</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.0408709102</pubid><pubid idtype="pmcid">547855</pubid><pubid idtype="pmpid" link="fulltext">15659551</pubid></pubidlist></xrefbib></bibl><bibl id="B73"><aug><au><snm>Pearl</snm><fnm>J</fnm></au></aug><source>Causality: Models, Reasoning, and Inference</source><publisher>Cambridge University Press</publisher><pubdate>2000</pubdate></bibl><bibl id="B74"><aug><au><snm>Shipley</snm><fnm>B</fnm></au></aug><source>Cause and Correlation in Biology: A User&apos;s Guide to Path Analysis, Structural Equations and Causal Inference</source><publisher>Cambridge University Press</publisher><pubdate>2002</pubdate><xrefbib><pubidlist><pubid idtype="pmcid">2638159</pubid><pubid idtype="pmpid" link="fulltext">19091018</pubid></pubidlist></xrefbib></bibl><bibl id="B75"><aug><au><snm>Spirtes</snm><fnm>P</fnm></au><au><snm>Glymour</snm><fnm>C</fnm></au><au><snm>Scheines</snm><fnm>R</fnm></au></aug><source>Causation, Prediction and Search</source><publisher>MIT Press</publisher><pubdate>2000</pubdate></bibl><bibl id="B76"><title><p>Error distribution for gene expression data</p></title><aug><au><snm>Purdom</snm><fnm>E</fnm></au><au><snm>Holmes</snm><fnm>SP</fnm></au></aug><source>Stat Appl Genet Mol Biol</source><pubdate>2005</pubdate><volume>4</volume><fpage>Article16</fpage><xrefbib><pubid idtype="pmpid" link="fulltext">16646833</pubid></xrefbib></bibl><bibl id="B77"><title><p>Methods to reconstruct and compare transcriptional regulatory 
networks</p></title><aug><au><snm>Babu</snm><fnm>MM</fnm></au><au><snm>Lang</snm><fnm>B</fnm></au><au><snm>Aravind</snm><fnm>L</fnm></au></aug><source>Methods Mol Biol</source><pubdate>2009</pubdate><volume>541</volume><fpage>163</fpage><lpage>180</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1007/978-1-59745-243-4_8</pubid><pubid idtype="pmpid" link="fulltext">19381525</pubid></pubidlist></xrefbib></bibl><bibl id="B78"><title><p>The Stanford Microarray Database accommodates additional microarray platforms and data formats</p></title><aug><au><snm>Ball</snm><fnm>CA</fnm></au><au><snm>Awad</snm><fnm>IA</fnm></au><au><snm>Demeter</snm><fnm>J</fnm></au><au><snm>Gollub</snm><fnm>J</fnm></au><au><snm>Hebert</snm><fnm>JM</fnm></au><au><snm>Hernandez-Boussard</snm><fnm>T</fnm></au><au><snm>Jin</snm><fnm>H</fnm></au><au><snm>Matese</snm><fnm>JC</fnm></au><au><snm>Nitzberg</snm><fnm>M</fnm></au><au><snm>Wymore</snm><fnm>F</fnm></au><au><snm>Zachariah</snm><fnm>ZK</fnm></au><au><snm>Brown</snm><fnm>PO</fnm></au><au><snm>Sherlock</snm><fnm>G</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2005</pubdate><volume>33</volume><fpage>D580</fpage><lpage>D582</lpage><xrefbib><pubidlist><pubid idtype="pmcid">539960</pubid><pubid idtype="pmpid" link="fulltext">15608265</pubid></pubidlist></xrefbib></bibl><bibl id="B79"><title><p>NCBI GEO: mining tens of millions of expression profiles - database and tools update</p></title><aug><au><snm>Barrett</snm><fnm>T</fnm></au><au><snm>Troup</snm><fnm>DB</fnm></au><au><snm>Wilhite</snm><fnm>SE</fnm></au><au><snm>Ledoux</snm><fnm>P</fnm></au><au><snm>Rudnev</snm><fnm>D</fnm></au><au><snm>Evangelista</snm><fnm>C</fnm></au><au><snm>Kim</snm><fnm>IF</fnm></au><au><snm>Soboleva</snm><fnm>A</fnm></au><au><snm>Tomashevsky</snm><fnm>M</fnm></au><au><snm>Edgar</snm><fnm>R</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2007</pubdate><volume>35</volume><fpage>D760</fpage><lpage>D765</lpage><xrefbib><pubidlist><pubid 
idtype="doi">10.1093/nar/gkl887</pubid><pubid idtype="pmcid">1669752</pubid><pubid idtype="pmpid" link="fulltext">17099226</pubid></pubidlist></xrefbib></bibl><bibl id="B80"><title><p>ArrayExpress - a public repository for microarray gene expression data at the EBI</p></title><aug><au><snm>Brazma</snm><fnm>A</fnm></au><au><snm>Parkinson</snm><fnm>H</fnm></au><au><snm>Sarkans</snm><fnm>U</fnm></au><au><snm>Shojatalab</snm><fnm>M</fnm></au><au><snm>Vilo</snm><fnm>J</fnm></au><au><snm>Abeygunawardena</snm><fnm>N</fnm></au><au><snm>Holloway</snm><fnm>E</fnm></au><au><snm>Kapushesky</snm><fnm>M</fnm></au><au><snm>Kemmeren</snm><fnm>P</fnm></au><au><snm>Lara</snm><fnm>GG</fnm></au><au><snm>Oezcimen</snm><fnm>A</fnm></au><au><snm>Rocca-Serra</snm><fnm>P</fnm></au><au><snm>Sansone</snm><fnm>SA</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2003</pubdate><volume>31</volume><fpage>68</fpage><lpage>71</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkg091</pubid><pubid idtype="pmcid">165538</pubid><pubid idtype="pmpid" link="fulltext">12519949</pubid></pubidlist></xrefbib></bibl><bibl id="B81"><title><p>Exploration, normalization, and summaries of high density oligonucleotide array probe level data</p></title><aug><au><snm>Irizarry</snm><fnm>RA</fnm></au><au><snm>Hobbs</snm><fnm>B</fnm></au><au><snm>Collin</snm><fnm>F</fnm></au><au><snm>Beazer-Barclay</snm><fnm>YD</fnm></au><au><snm>Antonellis</snm><fnm>KJ</fnm></au><au><snm>Scherf</snm><fnm>U</fnm></au><au><snm>Speed</snm><fnm>TP</fnm></au></aug><source>Biostatistics</source><pubdate>2003</pubdate><volume>4</volume><fpage>249</fpage><lpage>264</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/biostatistics/4.2.249</pubid><pubid idtype="pmpid" link="fulltext">12925520</pubid></pubidlist></xrefbib></bibl><bibl id="B82"><title><p>Bayesian model averaging: a 
tutorial</p></title><aug><au><snm>Hoeting</snm><fnm>JA</fnm></au><au><snm>Madigan</snm><fnm>D</fnm></au><au><snm>Raftery</snm><fnm>AE</fnm></au><au><snm>Volinsky</snm><fnm>CT</fnm></au></aug><source>Stat Sci</source><pubdate>1999</pubdate><volume>14</volume><fpage>382</fpage><lpage>401</lpage><xrefbib><pubid idtype="doi">10.1214/ss/1009212519</pubid></xrefbib></bibl><bibl id="B83"><title><p>Bayes Factors</p></title><aug><au><snm>Kass</snm><fnm>RE</fnm></au><au><snm>Raftery</snm><fnm>AE</fnm></au></aug><source>J Am Stat Assoc</source><pubdate>1995</pubdate><volume>90</volume><fpage>773</fpage><lpage>795</lpage><xrefbib><pubid idtype="doi">10.1080/01621459.1995.10476572</pubid></xrefbib></bibl><bibl id="B84"><title><p>Regression by leaps and bounds</p></title><aug><au><snm>Furnival</snm><fnm>GM</fnm></au><au><snm>Wilson</snm><fnm>RW</fnm></au></aug><source>Technometrics</source><pubdate>1974</pubdate><volume>16</volume><fpage>499</fpage><lpage>511</lpage><xrefbib><pubid idtype="doi">10.1080/00401706.1974.10489231</pubid></xrefbib></bibl><bibl id="B85"><title><p>Model selection and accounting for model uncertainty in graphical models using Occam's window</p></title><aug><au><snm>Madigan</snm><fnm>D</fnm></au><au><snm>Raftery</snm><fnm>A</fnm></au></aug><source>J Am Stat Assoc</source><pubdate>1994</pubdate><volume>89</volume><fpage>1335</fpage><lpage>1346</lpage></bibl><bibl id="B86"><title><p>Estimating the dimension of a model</p></title><aug><au><snm>Schwarz</snm><fnm>G</fnm></au></aug><source>Ann Stat</source><pubdate>1978</pubdate><volume>6</volume><fpage>461</fpage><lpage>464</lpage><xrefbib><pubid idtype="doi">10.1214/aos/1176344136</pubid></xrefbib></bibl><bibl id="B87"><title><p>Genomic expression programs in the response of yeast cells to environmental 
changes</p></title><aug><au><snm>Gasch</snm><fnm>AP</fnm></au><au><snm>Spellman</snm><fnm>PT</fnm></au><au><snm>Kao</snm><fnm>CM</fnm></au><au><snm>Carmel-Harel</snm><fnm>O</fnm></au><au><snm>Eisen</snm><fnm>MB</fnm></au><au><snm>Storz</snm><fnm>G</fnm></au><au><snm>Botstein</snm><fnm>D</fnm></au><au><snm>Brown</snm><fnm>PO</fnm></au></aug><source>Mol Biol Cell</source><pubdate>2000</pubdate><volume>11</volume><fpage>4241</fpage><lpage>4257</lpage><xrefbib><pubidlist><pubid idtype="pmcid">15070</pubid><pubid idtype="pmpid" link="fulltext">11102521</pubid></pubidlist></xrefbib></bibl><bibl id="B88"><title><p>Functional discovery via a compendium of expression profiles</p></title><aug><au><snm>Hughes</snm><fnm>TR</fnm></au><au><snm>Marton</snm><fnm>MJ</fnm></au><au><snm>Jones</snm><fnm>AR</fnm></au><au><snm>Roberts</snm><fnm>CJ</fnm></au><au><snm>Stoughton</snm><fnm>R</fnm></au><au><snm>Armour</snm><fnm>CD</fnm></au><au><snm>Bennett</snm><fnm>HA</fnm></au><au><snm>Coffey</snm><fnm>E</fnm></au><au><snm>Dai</snm><fnm>H</fnm></au><au><snm>He</snm><fnm>YD</fnm></au><au><snm>Kidd</snm><fnm>MJ</fnm></au><au><snm>King</snm><fnm>AM</fnm></au><au><snm>Meyer</snm><fnm>MR</fnm></au><au><snm>Slade</snm><fnm>D</fnm></au><au><snm>Lum</snm><fnm>PY</fnm></au><au><snm>Stepaniants</snm><fnm>SB</fnm></au><au><snm>Shoemaker</snm><fnm>DD</fnm></au><au><snm>Gachotte</snm><fnm>D</fnm></au><au><snm>Chakraburtty</snm><fnm>K</fnm></au><au><snm>Simon</snm><fnm>J</fnm></au><au><snm>Bard</snm><fnm>M</fnm></au><au><snm>Friend</snm><fnm>SH</fnm></au></aug><source>Cell</source><pubdate>2000</pubdate><volume>102</volume><fpage>109</fpage><lpage>126</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S0092-8674(00)00015-5</pubid><pubid idtype="pmpid" link="fulltext">10929718</pubid></pubidlist></xrefbib></bibl><bibl id="B89"><title><p>Transcriptional regulatory code of a eukaryotic 
genome</p></title><aug><au><snm>Harbison</snm><fnm>CT</fnm></au><au><snm>Gordon</snm><fnm>DB</fnm></au><au><snm>Lee</snm><fnm>TI</fnm></au><au><snm>Rinaldi</snm><fnm>NJ</fnm></au><au><snm>Macisaac</snm><fnm>KD</fnm></au><au><snm>Danford</snm><fnm>TW</fnm></au><au><snm>Hannett</snm><fnm>NM</fnm></au><au><snm>Tagne</snm><fnm>JB</fnm></au><au><snm>Reynolds</snm><fnm>DB</fnm></au><au><snm>Yoo</snm><fnm>J</fnm></au><au><snm>Jennings</snm><fnm>EG</fnm></au><au><snm>Zeitlinger</snm><fnm>J</fnm></au><au><snm>Pokholok</snm><fnm>DK</fnm></au><au><snm>Kellis</snm><fnm>M</fnm></au><au><snm>Rolfe</snm><fnm>PA</fnm></au><au><snm>Takusagawa</snm><fnm>KT</fnm></au><au><snm>Lander</snm><fnm>ES</fnm></au><au><snm>Gifford</snm><fnm>DK</fnm></au><au><snm>Fraenkel</snm><fnm>E</fnm></au><au><snm>Young</snm><fnm>RA</fnm></au></aug><source>Nature</source><pubdate>2004</pubdate><volume>431</volume><fpage>99</fpage><lpage>104</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature02800</pubid><pubid idtype="pmcid">3006441</pubid><pubid idtype="pmpid" link="fulltext">15343339</pubid></pubidlist></xrefbib></bibl><bibl id="B90"><title><p>Genetic and physical maps of Saccharomyces cerevisiae</p></title><aug><au><snm>Cherry</snm><fnm>JM</fnm></au><au><snm>Ball</snm><fnm>C</fnm></au><au><snm>Weng</snm><fnm>S</fnm></au><au><snm>Juvik</snm><fnm>G</fnm></au><au><snm>Schmidt</snm><fnm>R</fnm></au><au><snm>Adler</snm><fnm>C</fnm></au><au><snm>Dunn</snm><fnm>B</fnm></au><au><snm>Dwight</snm><fnm>S</fnm></au><au><snm>Riles</snm><fnm>L</fnm></au><au><snm>Mortimer</snm><fnm>RK</fnm></au><au><snm>Botstein</snm><fnm>D</fnm></au></aug><source>Nature</source><pubdate>1997</pubdate><volume>387</volume><fpage>67</fpage><lpage>73</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/387067a0</pubid><pubid idtype="pmcid">3057085</pubid><pubid idtype="pmpid">9169866</pubid></pubidlist></xrefbib></bibl><bibl id="B91"><title><p>Gene ontology: tool for the unification of biology. 
The Gene Ontology Consortium</p></title><aug><au><snm>Ashburner</snm><fnm>M</fnm></au><au><snm>Ball</snm><fnm>CA</fnm></au><au><snm>Blake</snm><fnm>JA</fnm></au><au><snm>Botstein</snm><fnm>D</fnm></au><au><snm>Butler</snm><fnm>H</fnm></au><au><snm>Cherry</snm><fnm>JM</fnm></au><au><snm>Davis</snm><fnm>AP</fnm></au><au><snm>Dolinski</snm><fnm>K</fnm></au><au><snm>Dwight</snm><fnm>SS</fnm></au><au><snm>Eppig</snm><fnm>JT</fnm></au><au><snm>Harris</snm><fnm>MA</fnm></au><au><snm>Hill</snm><fnm>DP</fnm></au><au><snm>Issel-Tarver</snm><fnm>L</fnm></au><au><snm>Kasarskis</snm><fnm>A</fnm></au><au><snm>Lewis</snm><fnm>S</fnm></au><au><snm>Matese</snm><fnm>JC</fnm></au><au><snm>Richardson</snm><fnm>JE</fnm></au><au><snm>Ringwald</snm><fnm>M</fnm></au><au><snm>Rubin</snm><fnm>GM</fnm></au><au><snm>Sherlock</snm><fnm>G</fnm></au></aug><source>Nat Genet</source><pubdate>2000</pubdate><volume>25</volume><fpage>25</fpage><lpage>29</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/75556</pubid><pubid idtype="pmcid">3037419</pubid><pubid idtype="pmpid" link="fulltext">10802651</pubid></pubidlist></xrefbib></bibl><bibl id="B92"><title><p>SCPD: a promoter database of the yeast Saccharomyces cerevisiae</p></title><aug><au><snm>Zhu</snm><fnm>J</fnm></au><au><snm>Zhang</snm><fnm>MQ</fnm></au></aug><source>Bioinformatics</source><pubdate>1999</pubdate><volume>15</volume><fpage>607</fpage><lpage>611</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/15.7.607</pubid><pubid idtype="pmpid" link="fulltext">10487868</pubid></pubidlist></xrefbib></bibl><bibl id="B93"><title><p>The yeast proteome database (YPD) and Caenorhabditis elegans proteome database (WormPD): comprehensive resources for the organization and comparison of model organism protein 
information</p></title><aug><au><snm>Costanzo</snm><fnm>MC</fnm></au><au><snm>Hogan</snm><fnm>JD</fnm></au><au><snm>Cusick</snm><fnm>ME</fnm></au><au><snm>Davis</snm><fnm>BP</fnm></au><au><snm>Fancher</snm><fnm>AM</fnm></au><au><snm>Hodges</snm><fnm>PE</fnm></au><au><snm>Kondu</snm><fnm>P</fnm></au><au><snm>Lengieza</snm><fnm>C</fnm></au><au><snm>Lew-Smith</snm><fnm>JE</fnm></au><au><snm>Lingner</snm><fnm>C</fnm></au><au><snm>Roberg-Perez</snm><fnm>KJ</fnm></au><au><snm>Tillberg</snm><fnm>M</fnm></au><au><snm>Brooks</snm><fnm>JE</fnm></au><au><snm>Garrels</snm><fnm>JI</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2000</pubdate><volume>28</volume><fpage>73</fpage><lpage>76</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/28.1.73</pubid><pubid idtype="pmcid">102421</pubid><pubid idtype="pmpid" link="fulltext">10592185</pubid></pubidlist></xrefbib></bibl><bibl id="B94"><title><p>Bayesian variable selection in linear regression</p></title><aug><au><snm>Mitchell</snm><fnm>TJ</fnm></au><au><snm>Beauchamp</snm><fnm>JJ</fnm></au></aug><source>J Am Stat Assoc</source><pubdate>1988</pubdate><volume>83</volume><fpage>1023</fpage><lpage>1032</lpage><xrefbib><pubid idtype="doi">10.1080/01621459.1988.10478694</pubid></xrefbib></bibl><bibl id="B95"><title><p>Regression with missing X's: a review</p></title><aug><au><snm>Little</snm><fnm>RJA</fnm></au></aug><source>J Am Stat Assoc</source><pubdate>1992</pubdate><volume>87</volume><fpage>1227</fpage><lpage>1237</lpage></bibl><bibl id="B96"><aug><au><snm>Rubin</snm><fnm>DB</fnm></au></aug><source>Multiple Imputation for Nonresponse in Surveys</source><publisher>New York: John Wiley</publisher><pubdate>1987</pubdate></bibl><bibl id="B97"><title><p>How many imputations are really needed? 
Some practical clarifications of multiple imputation theory</p></title><aug><au><snm>Graham</snm><fnm>JW</fnm></au><au><snm>Olchowski</snm><fnm>AE</fnm></au><au><snm>Gilreath</snm><fnm>TD</fnm></au></aug><source>Prev Sci</source><pubdate>2007</pubdate><volume>8</volume><fpage>206</fpage><lpage>213</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1007/s11121-007-0070-9</pubid><pubid idtype="pmpid" link="fulltext">17549635</pubid></pubidlist></xrefbib></bibl><bibl id="B98"><aug><au><snm>Breslow</snm><fnm>NE</fnm></au><au><snm>Day</snm><fnm>NE</fnm></au><au><snm>Davis</snm><fnm>W</fnm></au></aug><source>Statistical Methods in Cancer Research, Volume I: The Analysis of Case&#8211;control Studies</source><publisher>Lyon: International Agency for Research on Cancer</publisher><pubdate>1980</pubdate><xrefbib><pubid idtype="pmpid">21938797</pubid></xrefbib></bibl><bibl id="B99"><aug><au><snm>Lachin</snm><fnm>JM</fnm></au></aug><source>Biostatistical Methods: The Assessment of Relative Risks</source><publisher>New York, NY: Wiley</publisher><pubdate>2000</pubdate></bibl><bibl id="B100"><title><p>Topological and causal structure of the yeast transcriptional regulatory network</p></title><aug><au><snm>Guelzim</snm><fnm>N</fnm></au><au><snm>Bottani</snm><fnm>S</fnm></au><au><snm>Bourgine</snm><fnm>P</fnm></au><au><snm>K&#233;p&#232;s</snm><fnm>F</fnm></au></aug><source>Nat Genet</source><pubdate>2002</pubdate><volume>31</volume><fpage>60</fpage><lpage>63</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng873</pubid><pubid idtype="pmpid" link="fulltext">11967534</pubid></pubidlist></xrefbib></bibl><bibl id="B101"><title><p>Network motifs in the transcriptional regulation network of Escherichia coli</p></title><aug><au><snm>Shen-Orr</snm><fnm>S</fnm></au><au><snm>Milo</snm><fnm>R</fnm></au><au><snm>Mangan</snm><fnm>S</fnm></au><au><snm>Alon</snm><fnm>U</fnm></au></aug><source>Nat 
Genet</source><pubdate>2002</pubdate><volume>31</volume><fpage>64</fpage><lpage>68</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng881</pubid><pubid idtype="pmpid" link="fulltext">11967538</pubid></pubidlist></xrefbib></bibl><bibl id="B102"><title><p>Degree dependence in rates of transcription factor evolution explains the unusual structure of transcription networks</p></title><aug><au><snm>Stewart</snm><fnm>AJ</fnm></au><au><snm>Seymour</snm><fnm>RM</fnm></au><au><snm>Pomiankowski</snm><fnm>A</fnm></au></aug><source>Proc R Soc B</source><pubdate>2009</pubdate><volume>276</volume><fpage>2493</fpage><lpage>2501</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1098/rspb.2009.0210</pubid><pubid idtype="pmcid">2690470</pubid><pubid idtype="pmpid" link="fulltext">19364737</pubid></pubidlist></xrefbib></bibl><bibl id="B103"><title><p>MIPS: a database for genomes and protein sequences</p></title><aug><au><snm>Mewes</snm><fnm>HW</fnm></au><au><snm>Frishman</snm><fnm>D</fnm></au><au><snm>G&#252;ldener</snm><fnm>U</fnm></au><au><snm>Mannhaupt</snm><fnm>G</fnm></au><au><snm>Mayer</snm><fnm>K</fnm></au><au><snm>Mokrejs</snm><fnm>M</fnm></au><au><snm>Morgenstern</snm><fnm>B</fnm></au><au><snm>M&#252;nsterkoetter</snm><fnm>M</fnm></au><au><snm>Rudd</snm><fnm>S</fnm></au><au><snm>Weil</snm><fnm>B</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2002</pubdate><volume>30</volume><fpage>31</fpage><lpage>34</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/30.1.31</pubid><pubid idtype="pmcid">99165</pubid><pubid idtype="pmpid" link="fulltext">11752246</pubid></pubidlist></xrefbib></bibl><bibl id="B104"><title><p>The Swiss-Prot protein knowledgebase and its supplement TrEMBL in 
2003</p></title><aug><au><snm>Boeckmann</snm><fnm>B</fnm></au><au><snm>Bairoch</snm><fnm>A</fnm></au><au><snm>Apweiler</snm><fnm>R</fnm></au><au><snm>Blatter</snm><fnm>M-C</fnm></au><au><snm>Estreicher</snm><fnm>A</fnm></au><au><snm>Gasteiger</snm><fnm>E</fnm></au><au><snm>Martin</snm><fnm>MJ</fnm></au><au><snm>Michoud</snm><fnm>K</fnm></au><au><snm>O&apos;Donovan</snm><fnm>C</fnm></au><au><snm>Phan</snm><fnm>I</fnm></au><au><snm>Pilbout</snm><fnm>S</fnm></au><au><snm>Schneider</snm><fnm>M</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2003</pubdate><volume>31</volume><fpage>365</fpage><lpage>370</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkg095</pubid><pubid idtype="pmcid">165542</pubid><pubid idtype="pmpid" link="fulltext">12520024</pubid></pubidlist></xrefbib></bibl></refgrp>
	</bm>
</art>