<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art><ui>1471-2105-12-359</ui><ji>1471-2105</ji><fm>
<dochead>Methodology article</dochead>
<bibl>
<title>
<p>Quantitative utilization of prior biological knowledge in the Bayesian network modeling of gene expression data</p>
</title>
<aug>
<au id="A1"><snm>Gao</snm><fnm>Shouguo</fnm><insr iid="I1"/><insr iid="I2"/><email>sgao@uab.edu</email></au>
<au ca="yes" id="A2"><snm>Wang</snm><fnm>Xujing</fnm><insr iid="I1"/><insr iid="I2"/><email>xujingw@uab.edu</email></au>
</aug>
<insg>
<ins id="I1"><p>Department of Physics, University of Alabama at Birmingham, 1300 University Blvd, Birmingham, AL 35294, USA</p></ins>
<ins id="I2"><p>The Comprehensive Diabetes Center, University of Alabama at Birmingham, 1825 University Blvd, Birmingham, AL 35294, USA</p></ins>
</insg>
<source>BMC Bioinformatics</source>
<issn>1471-2105</issn>
<pubdate>2011</pubdate>
<volume>12</volume>
<issue>1</issue>
<fpage>359</fpage>
<url>http://www.biomedcentral.com/1471-2105/12/359</url>
<xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-12-359</pubid><pubid idtype="pmpid">21884587</pubid></pubidlist></xrefbib>
</bibl>
<history><rec><date><day>1</day><month>1</month><year>2011</year></date></rec><acc><date><day>31</day><month>8</month><year>2011</year></date></acc><pub><date><day>31</day><month>8</month><year>2011</year></date></pub></history>
<cpyrt><year>2011</year><collab>Gao and Wang; licensee BioMed Central Ltd.</collab><note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
<abs>
<sec>
<st>
<p>Abstract</p>
</st>
<sec>
<st>
<p>Background</p>
</st>
<p>Bayesian Network (BN) is a powerful approach to reconstructing genetic regulatory networks from gene expression data. However, expression data by itself suffers from high noise and lack of power. Incorporating prior biological knowledge can improve the performance. As each type of prior knowledge on its own may be incomplete or limited by quality issues, integrating multiple sources of prior knowledge to utilize their consensus is desirable.</p>
</sec>
<sec>
<st>
<p>Results</p>
</st>
<p>We introduce a new method to incorporate the quantitative information from multiple sources of prior knowledge. It first uses a na&#239;ve Bayesian classifier to assess the likelihood of functional linkage between gene pairs based on prior knowledge. In this study we included co-citation in PubMed and semantic similarity in Gene Ontology annotation. A candidate network edge reservoir is then created in which the copy number of each edge is proportional to the estimated likelihood of linkage between the two corresponding genes. During network simulation, a Markov Chain Monte Carlo sampling algorithm draws from this reservoir at each iteration to generate new candidate networks. We evaluated the new algorithm using both simulated and real gene expression data, including data from a yeast cell cycle study and a mouse pancreas development/growth study. Incorporating prior knowledge led to a ~2 fold increase in the number of known transcription regulations recovered, without significant change in the false positive rate. In contrast, without the prior knowledge BN modeling is not always better than random selection, demonstrating the necessity in network modeling of supplementing the gene expression data with additional information.</p>
</sec>
<sec>
<st>
<p>Conclusion</p>
</st>
<p>Our new development provides a statistical means to utilize the quantitative information in prior biological knowledge in the BN modeling of gene expression data, which significantly improves the performance.</p>
</sec>
</sec>
</abs>
</fm><bdy>
<sec>
<st>
<p>Background</p>
</st>
<p>Reverse engineering of genetic networks will greatly facilitate the dissection of cellular functions at the molecular level <abbrgrp>
<abbr bid="B1">1</abbr>
<abbr bid="B2">2</abbr>
<abbr bid="B3">3</abbr>
</abbrgrp>. Time course gene expression studies offer an ideal data source for transcription regulatory network modeling. However, a typical microarray experiment measures up to tens of thousands of genes in only several dozen samples or fewer. Data from such experiments alone are therefore significantly underpowered, leading to a high rate of false positive predictions <abbrgrp>
<abbr bid="B4">4</abbr>
</abbrgrp>. Network reconstruction from microarray data is further limited by low data quality, noise and measurement errors <abbrgrp>
<abbr bid="B5">5</abbr>
</abbrgrp>.</p>
<p>Incorporating other types of data and existing knowledge of gene relationships into the network modeling process is a practical approach to overcome some of these problems. It has been shown that data integration, and biasing the model with relevant knowledge, can improve the accuracy of network prediction from gene expression data <abbrgrp>
<abbr bid="B6">6</abbr>
<abbr bid="B7">7</abbr>
</abbrgrp>. Among the various approaches of network modeling, Bayesian Networks (BN) have shown great promise and are receiving increasing attention <abbrgrp>
<abbr bid="B8">8</abbr>
</abbrgrp>. A BN is a probabilistic graphical model that describes multiple interacting quantities by a directed acyclic graph (DAG). The nodes in the network represent random variables (expression levels), and edges represent conditional dependencies between nodes <abbrgrp>
<abbr bid="B9">9</abbr>
</abbrgrp>. Learning a BN structure means finding a DAG that best matches the dataset, namely one that maximizes the posterior probability of the DAG given data D: <it>P </it>(DAG|D). The sound probabilistic semantics allow BN to deal with the inherent stochasticity in gene expression and the noise introduced by the microarray technology. Furthermore, BN is capable of integrating prior knowledge into the system in a natural way <abbrgrp>
<abbr bid="B9">9</abbr>
<abbr bid="B10">10</abbr>
</abbrgrp>.</p>
<p>A number of studies demonstrated that adding prior knowledge to BN improved the performance <abbrgrp>
<abbr bid="B4">4</abbr>
<abbr bid="B11">11</abbr>
<abbr bid="B12">12</abbr>
<abbr bid="B13">13</abbr>
<abbr bid="B14">14</abbr>
</abbrgrp>. Many sources of data and information are useful to supplement the gene expression data, and they can be incorporated at different steps of BN simulation, from prior structure definition to structure simulation and evaluation.</p>
<p>Known protein-DNA interactions or other clues to the relationships between transcription factors and their target genes are useful for transcription regulatory network inference. Hartemink <it>et al</it>. included data from the chromatin immunoprecipitation (ChIP) assay <abbrgrp>
<abbr bid="B15">15</abbr>
</abbrgrp>, and Tamada <it>et al </it>incorporated promoter sequence motif information <abbrgrp>
<abbr bid="B16">16</abbr>
</abbrgrp>, to define the prior probability of network structures. Information on other types of gene-pair relationships has also been explored. Steele <it>et al</it>. developed a gene-pair association score from the correlation of their concept profiles derived from literature, and utilized that to define the prior structure probabilities <abbrgrp>
<abbr bid="B12">12</abbr>
</abbrgrp>. Larsen <it>et al </it>defined a Likelihood of Interaction (LOI) score, which measures the statistical significance of two genes interacting with each other according to their shared Gene Ontology (GO) information. They then restricted the candidate network edges (interactions) to those with significant <it>p </it>-values of LOI during the BN structure learning iterations <abbrgrp>
<abbr bid="B17">17</abbr>
<abbr bid="B18">18</abbr>
</abbrgrp>. Because of this thresholding, the quantitative information in the likelihood is not fully utilized in the network modeling. Djebbari and Quackenbush utilized literature, high-throughput protein-protein interaction (PPI) data, or the combination of both to define the seed (initial) network structure. They observed an improved ability of the BN analysis to learn gene interaction networks from the expression data <abbrgrp>
<abbr bid="B19">19</abbr>
</abbrgrp>.</p>
<p>Imoto <it>et al </it>formulated a novel approach to incorporate prior biological knowledge within the BN framework by adopting the energy concepts from statistical physics <abbrgrp>
<abbr bid="B20">20</abbr>
<abbr bid="B21">21</abbr>
</abbrgrp>, which was later further extended by Husmeier and Werhli <abbrgrp>
<abbr bid="B22">22</abbr>
<abbr bid="B23">23</abbr>
</abbrgrp>. In this approach an energy function is first defined to measure the agreement between a candidate network and the prior biological knowledge, and the prior distribution of network structures is then calculated using the Gibbs distribution in a canonical ensemble. Using this approach, the two groups examined several types of prior knowledge, including PPI, protein-DNA interaction, binding site information, literature, and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways <abbrgrp>
<abbr bid="B22">22</abbr>
<abbr bid="B23">23</abbr>
<abbr bid="B24">24</abbr>
</abbrgrp>. The algorithms were validated using yeast gene expression data <abbrgrp>
<abbr bid="B20">20</abbr>
<abbr bid="B21">21</abbr>
</abbrgrp>, and synthetic data <abbrgrp>
<abbr bid="B22">22</abbr>
</abbrgrp>.</p>
<p>Existing studies often utilize prior knowledge to construct the prior distribution of network structures, or the initial network structure. It has been demonstrated that the sampling method during simulation also affects the performance of BN structure learning <abbrgrp>
<abbr bid="B25">25</abbr>
</abbrgrp>. Though prior knowledge has been utilized to bias the sampling step, this is normally done by restricting the search space to subregions, for instance by simulating only candidate structures whose significance is above a certain threshold according to prior knowledge <abbrgrp>
<abbr bid="B17">17</abbr>
<abbr bid="B18">18</abbr>
</abbrgrp>.</p>
<p>In searching for the network structure (DAG) that maximizes <it>P </it>(DAG|D), the Markov Chain Monte Carlo (MCMC) approach is regarded as better than greedy search algorithms, especially for microarray data with small sample size, where there is often no single structure that is prominently better than others <abbrgrp>
<abbr bid="B9">9</abbr>
</abbrgrp>. In this study we propose a new approach to incorporate prior knowledge in a quantitative way to bias the MCMC simulation of candidate structure. It utilizes information of functional linkage between gene pairs, assuming that functionally linked genes are likely to interact with each other. It is known that interacting proteins or genes often share similar function, and participate in the same biological pathways and processes <abbrgrp>
<abbr bid="B26">26</abbr>
</abbrgrp>. Interaction has been utilized to infer functional linkage and annotate gene functions <abbrgrp>
<abbr bid="B27">27</abbr>
</abbrgrp>. Increasing evidence suggests that the reverse is also frequently true <abbrgrp>
<abbr bid="B28">28</abbr>
</abbrgrp>. In our algorithm a probability score is first calculated that measures how likely two genes are to be functionally linked based on prior knowledge. A candidate edge reservoir is then constructed in which the number of copies of each edge is proportional to this probability score. The reservoir is in turn used for sampling candidate network structures during the MCMC simulation. In this way the quantitative information on the potential gene-pair link predicted by prior knowledge is retained.</p>
<p>We will consider two types of prior knowledge: co-citation in PubMed literature and similarity in ontological annotation according to GO (<url>http://www.geneontology.org/</url>). We will demonstrate that both contain information on functional linkage. The performance of the new algorithm is evaluated using a synthetic data set as well as data from two real microarray experiments: the yeast cell cycle study, and the mouse pancreas development/growth study. We will demonstrate that including the prior knowledge significantly improves the performance of BN modeling of gene expression data.</p>
</sec>
<sec>
<st>
<p>Results</p>
</st>
<sec>
<st>
<p>Algorithm</p>
</st>
<p>BN is a graphical model to capture complex relationships among a set of random variables {<it>X</it>
<sub>1</sub>, <it>X</it>
<sub>2</sub>,...,<it>X<sub>n</sub>
</it>} encoding the Markov assumption, each node representing a variable. In the context of gene network modeling, each node represents a gene, while gene interactions are represented by directed edges between nodes. Each variable <it>X<sub>i </sub>
</it>in the DAG is conditionally independent of its non-descendants given its set of parents. Mathematically the joint distribution of the DAG can be decomposed into a product form as:</p>
<p>
<display-formula id="M1">
<m:math name="1471-2105-12-359-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>P</m:mi>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:mstyle class="text">
            <m:mtext class="textsf" mathvariant="sans-serif">DAG</m:mtext>
         </m:mstyle>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:mi>P</m:mi>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>X</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>1</m:mn>
            </m:mrow>
         </m:msub>
         <m:mo class="MathClass-punc">,</m:mo>
         <m:msub>
            <m:mrow>
               <m:mi>X</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>2</m:mn>
            </m:mrow>
         </m:msub>
         <m:mo class="MathClass-punc">,</m:mo>
         <m:mo class="MathClass-op">&#8230;</m:mo>
         <m:mo class="MathClass-punc">,</m:mo>
         <m:msub>
            <m:mrow>
               <m:mi>X</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>n</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:munderover accentunder="false" accent="false">
      <m:mrow>
         <m:mo mathsize="big"> &#8719;</m:mo>
      </m:mrow>
      <m:mrow>
         <m:mi>i</m:mi>
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mn>1</m:mn>
      </m:mrow>
      <m:mrow>
         <m:mi>n</m:mi>
      </m:mrow>
   </m:munderover>
   <m:mi>P</m:mi>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>X</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
         <m:mo class="MathClass-rel">|</m:mo>
         <m:msub>
            <m:mrow>
               <m:mi>&#928;</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
</m:mrow>
</m:math>
</display-formula>
</p>
<p>where &#928;<it>
<sub>i </sub>
</it>denotes the parent set of the variable <it>X<sub>i</sub>
</it>. This is referred to as the chain rule for BNs <abbrgrp>
<abbr bid="B9">9</abbr>
<abbr bid="B10">10</abbr>
</abbrgrp>. As noted above, learning a BN structure means finding the DAG that maximizes the posterior probability <it>P </it>(DAG|D). Here we adopt the sampling-based approach to Bayesian inference, and sample network structures from a candidate edge reservoir with the MCMC network learning method. In the reservoir, the number of copies of each edge is proportional to the likelihood of the two genes being functionally linked according to prior knowledge. This way, the edges between strongly related gene pairs have a higher chance of being proposed as part of the candidate network. The overall design is given in Figure <figr fid="F1">1</figr>. The major steps are:</p>
<fig id="F1"><title><p>Figure 1</p></title><caption><p>The framework of our BN modeling to incorporate the quantitative information in prior knowledge</p></caption><text>
   <p><b>The framework of our BN modeling to incorporate the quantitative information in prior knowledge</b>.</p>
</text><graphic file="1471-2105-12-359-1" hint_layout="single"/></fig>
<p indent="1">1. Determine the probability of functional linkage <it>p</it>
<sub>link </sub>for each gene pair.</p>
<p indent="2">1.1 Calculate the GO semantic similarity.</p>
<p indent="2">1.2 Calculate the <it>p</it>-value of PubMed co-citation.</p>
<p indent="2">1.3 Integrate the GO and PubMed information using the Na&#239;ve BN to determine <it>p</it>
<sub>link</sub>.</p>
<p indent="1">2. Construct a candidate network edge reservoir in which the copy number of each edge is proportional to the <it>p</it>
<sub>link </sub>of the corresponding gene pair.</p>
<p indent="1">3. Learn network structure using the MCMC algorithm through sampling the candidate network edge reservoir.</p>
<p>At each step of the iteration, the proposed network is retained with an acceptance probability that is determined by the relative posterior of the proposed versus current network, penalized by the network complexity <abbrgrp>
<abbr bid="B29">29</abbr>
<abbr bid="B30">30</abbr>
</abbrgrp>. In calculating the posterior we use the BDe (Bayesian Dirichlet equivalence) scoring metric <abbrgrp>
<abbr bid="B10">10</abbr>
<abbr bid="B31">31</abbr>
</abbrgrp>. The prior distribution is assumed to be uniform.</p>
<p>To evaluate the performance of our BN algorithm, and the benefit of adding prior knowledge, we compare it to two alternative approaches: (1) Plain BN. In each iteration, a new network is proposed by randomly changing one edge in the current network. (2) The method developed by Husmeier and Werhli <abbrgrp>
<abbr bid="B22">22</abbr>
<abbr bid="B23">23</abbr>
</abbrgrp>.</p>
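<p>As a small illustration of the chain rule factorization in Equation 1, the following Python sketch evaluates the joint probability of a toy discrete network. The three-gene network, its conditional probability tables and the function name joint_probability are hypothetical and chosen only for illustration; they are not part of our implementation.</p>

```python
from itertools import product

def joint_probability(assignment, parents, cpts):
    """P(X_1, ..., X_n) as the product of P(X_i | parents of X_i)."""
    prob = 1.0
    for node, value in assignment.items():
        parent_values = tuple(assignment[p] for p in parents[node])
        prob *= cpts[node][parent_values][value]
    return prob

# Hypothetical three-gene network: A regulates B and C (binary states).
parents = {"A": (), "B": ("A",), "C": ("A",)}
cpts = {
    "A": {(): {0: 0.6, 1: 0.4}},
    "B": {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.2, 1: 0.8}},
    "C": {(0,): {0: 0.7, 1: 0.3}, (1,): {0: 0.5, 1: 0.5}},
}

# P(A=1) * P(B=1 | A=1) * P(C=0 | A=1) = 0.4 * 0.8 * 0.5
p_example = joint_probability({"A": 1, "B": 1, "C": 0}, parents, cpts)

# A valid factorization sums to 1 over all joint states.
total = sum(
    joint_probability(dict(zip("ABC", states)), parents, cpts)
    for states in product([0, 1], repeat=3)
)
```

<p>Because each factor involves only a node and its parents, a change to a single edge alters only the factors of the child nodes involved, which is why local scores suffice when comparing two neighbouring DAGs.</p>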
</sec>
<sec>
<st>
<p>GO semantic similarity and significance of PubMed co-citation</p>
</st>
<p>GO annotation and gene citation data (PubMed) were downloaded from <url>ftp://ftp.ncbi.nlm.nih.gov/gene/DATA</url>. Semantic similarity in the GO taxonomy was first calculated for each gene pair using the approach proposed by Cao <it>et al </it>
<abbrgrp>
<abbr bid="B32">32</abbr>
</abbrgrp>, which calculates the shared information content of the GO terms. The value of this measure lies in the range [0, 1], with 0 indicating no similarity and 1 indicating maximum similarity. The GO similarity of a gene pair is defined as the maximum semantic similarity over all the GO terms they share.</p>
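<p>The "maximum over GO terms" rule can be sketched in Python as follows. This is only an illustration under stated assumptions: the term-level similarity is passed in as a function, the toy scores and GO identifiers are made up, and the maximum is taken over all pairs of terms annotated to the two genes, which is one possible reading of the definition above. The information-content calculation itself is not reproduced here.</p>

```python
from itertools import product

def gene_pair_similarity(terms_a, terms_b, term_similarity):
    """Maximum term-pair similarity between the GO annotations of two genes."""
    if not terms_a or not terms_b:
        return 0.0  # assumption: an unannotated gene has no similarity
    return max(term_similarity(s, t) for s, t in product(terms_a, terms_b))

# Toy, hand-made term similarities in [0, 1] (illustrative values only).
toy_scores = {
    frozenset(["GO:A", "GO:B"]): 0.4,
    frozenset(["GO:A", "GO:C"]): 0.7,
    frozenset(["GO:B", "GO:C"]): 0.2,
}

def toy_term_similarity(s, t):
    """Hypothetical stand-in for an information-content based measure."""
    if s == t:
        return 1.0
    return toy_scores.get(frozenset([s, t]), 0.0)

# Gene 1 carries GO:A; gene 2 carries GO:B and GO:C.
similarity = gene_pair_similarity({"GO:A"}, {"GO:B", "GO:C"}, toy_term_similarity)
```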
<p>For a given pair of genes, we determined the total number of PubMed abstracts in which each gene appears (<it>n </it>and <it>m</it>, respectively) and the number in which both appear (<it>k</it>). The probability of observing at least <it>k </it>co-citations by random chance is calculated by</p>
<p>
<display-formula id="M2">
<m:math name="1471-2105-12-359-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:msub>
      <m:mrow>
         <m:mi>p</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mstyle class="text">
            <m:mtext class="textsf" mathvariant="sans-serif">PubMed</m:mtext>
         </m:mstyle>
      </m:mrow>
   </m:msub>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:mi>#</m:mi>
         <m:mspace width="2.77695pt" class="tmspace"/>
         <m:mstyle class="text">
            <m:mtext class="textsf" mathvariant="sans-serif">of</m:mtext>
         </m:mstyle>
         <m:mspace width="2.77695pt" class="tmspace"/>
         <m:mstyle class="text">
            <m:mtext class="textsf" mathvariant="sans-serif">co&#160;-&#160;citation</m:mtext>
         </m:mstyle>
         <m:mo class="MathClass-rel">&#8805;</m:mo>
         <m:mi>k</m:mi>
         <m:mo class="MathClass-rel">|</m:mo>
         <m:mi>n</m:mi>
         <m:mo class="MathClass-punc">,</m:mo>
         <m:mi>m</m:mi>
         <m:mo class="MathClass-punc">,</m:mo>
         <m:mi>N</m:mi>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:mn>1</m:mn>
   <m:mo class="MathClass-bin">-</m:mo>
   <m:munderover accentunder="false" accent="false">
      <m:mrow>
         <m:mo mathsize="big">&#8721;</m:mo>
      </m:mrow>
      <m:mrow>
         <m:mi>i</m:mi>
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mn>0</m:mn>
      </m:mrow>
      <m:mrow>
         <m:mi>k</m:mi>
         <m:mo class="MathClass-bin">-</m:mo>
         <m:mn>1</m:mn>
      </m:mrow>
   </m:munderover>
   <m:mi>p</m:mi>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:mi>i</m:mi>
         <m:mo class="MathClass-rel">|</m:mo>
         <m:mi>n</m:mi>
         <m:mo class="MathClass-punc">,</m:mo>
         <m:mi>m</m:mi>
         <m:mo class="MathClass-punc">,</m:mo>
         <m:mi>N</m:mi>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
</m:mrow>
</m:math>
</display-formula>
</p>
<p>where <inline-formula>
<m:math name="1471-2105-12-359-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>p</m:mi>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:mi>i</m:mi>
         <m:mo class="MathClass-rel">|</m:mo>
         <m:mi>n</m:mi>
         <m:mo class="MathClass-punc">,</m:mo>
         <m:mi>m</m:mi>
         <m:mo class="MathClass-punc">,</m:mo>
         <m:mi>N</m:mi>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:mfrac>
      <m:mrow>
         <m:mi>n</m:mi>
         <m:mo class="MathClass-punc">!</m:mo>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:mi>N</m:mi>
               <m:mo class="MathClass-bin">-</m:mo>
               <m:mi>n</m:mi>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
         </m:mrow>
         <m:mo class="MathClass-punc">!</m:mo>
         <m:mi>m</m:mi>
         <m:mo class="MathClass-punc">!</m:mo>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:mi>N</m:mi>
               <m:mo class="MathClass-bin">-</m:mo>
               <m:mi>m</m:mi>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
         </m:mrow>
         <m:mo class="MathClass-punc">!</m:mo>
      </m:mrow>
      <m:mrow>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:mi>n</m:mi>
               <m:mo class="MathClass-bin">-</m:mo>
               <m:mi>i</m:mi>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
         </m:mrow>
         <m:mo class="MathClass-punc">!</m:mo>
         <m:mi>i</m:mi>
         <m:mo class="MathClass-punc">!</m:mo>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:mi>m</m:mi>
               <m:mo class="MathClass-bin">-</m:mo>
               <m:mi>i</m:mi>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
         </m:mrow>
         <m:mo class="MathClass-punc">!</m:mo>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:mi>N</m:mi>
               <m:mo class="MathClass-bin">-</m:mo>
               <m:mi>n</m:mi>
               <m:mo class="MathClass-bin">-</m:mo>
               <m:mi>m</m:mi>
               <m:mo class="MathClass-bin">+</m:mo>
               <m:mi>i</m:mi>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
         </m:mrow>
         <m:mo class="MathClass-punc">!</m:mo>
         <m:mi>N</m:mi>
         <m:mo class="MathClass-punc">!</m:mo>
      </m:mrow>
   </m:mfrac>
</m:mrow>
</m:math>
</inline-formula>, and N is the total number of abstracts in PubMed <abbrgrp>
<abbr bid="B1">1</abbr>
<abbr bid="B2">2</abbr>
</abbrgrp>.</p>
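<p>The co-citation p-value of Equation 2 is the upper tail of a hypergeometric distribution. A minimal Python sketch is given below; it uses the binomial-coefficient form C(n,i)C(N-n,m-i)/C(N,m), which is algebraically identical to the factorial expression above. The function names are hypothetical, and exact integer arithmetic is used for clarity rather than speed, so for counts on the scale of the full PubMed database a log-space implementation would likely be preferable.</p>

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(i, n, m, N):
    """p(i | n, m, N): probability that exactly i abstracts cite both genes."""
    return Fraction(comb(n, i) * comb(N - n, m - i), comb(N, m))

def cocitation_pvalue(k, n, m, N):
    """Probability of at least k co-citations by chance (1 minus the lower tail)."""
    lower_tail = sum(hypergeom_pmf(i, n, m, N) for i in range(k))
    return float(1 - lower_tail)

# Toy example: 10 abstracts in total, gene 1 in 3 of them, gene 2 in 4 of them.
p_at_least_one = cocitation_pvalue(k=1, n=3, m=4, N=10)
p_at_least_two = cocitation_pvalue(k=2, n=3, m=4, N=10)
```

<p>In the toy example the probability of no co-citation is C(7,4)/C(10,4) = 35/210, so at least one co-citation occurs by chance with probability 5/6, and at least two with probability 1/3.</p>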
</sec>
<sec>
<st>
<p>Construction of the candidate network edge reservoir</p>
</st>
<p>We used a na&#239;ve Bayesian classifier to integrate the GO and co-citation information and to predict the functional linkage probability <it>p<sub>link </sub>
</it>for all gene pairs. Note that the prior knowledge of functional linkage is undirected, <it>i.e. p<sub>link </sub>
</it>(<it>i</it>, <it>j</it>) <it>= p<sub>link </sub>
</it>(<it>j</it>, <it>i</it>). An edge sampling reservoir was constructed, in which the number of replicates of the edge between genes <it>i </it>and <it>j</it>, <it>N </it>(Edge<it><sub>i</sub></it><sub>,</sub><it><sub>j</sub></it>), is in proportion to their <it>p</it>
<sub>link </sub>:</p>
<p>
<display-formula id="M3">
<m:math name="1471-2105-12-359-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>N</m:mi>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mstyle class="text">
                  <m:mtext class="textsf" mathvariant="sans-serif">Edge</m:mtext>
               </m:mstyle>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
               <m:mo class="MathClass-punc">,</m:mo>
               <m:mi>j</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:mstyle class="text">
      <m:mtext class="textsf" mathvariant="sans-serif">Ceil</m:mtext>
   </m:mstyle>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:mn>10</m:mn>
         <m:mo class="MathClass-bin">&#215;</m:mo>
         <m:msub>
            <m:mrow>
               <m:mi>p</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mstyle class="text">
                  <m:mtext class="textsf" mathvariant="sans-serif">link</m:mtext>
               </m:mstyle>
            </m:mrow>
         </m:msub>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:mi>i</m:mi>
               <m:mo class="MathClass-punc">,</m:mo>
               <m:mi>j</m:mi>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
         </m:mrow>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
</m:mrow>
</m:math>
</display-formula>
</p>
<p>where Ceil(<it>x</it>) is the smallest integer no less than <it>x</it>. Under this definition, any gene pair with a nonzero <it>p</it>
<sub>link </sub>is represented at least once and at most 10 times. The edges of gene pairs with higher <it>p</it>
<sub>link </sub>appear more frequently in the edge reservoir, and hence have a higher chance of being selected during the network structure learning.</p>
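<p>The reservoir construction of Equation 3 can be sketched in a few lines of Python. The gene names and linkage probabilities below are made up for illustration, and the function name build_edge_reservoir is hypothetical; the sketch only shows how the copy number Ceil(10 x p_link) translates into sampling frequency.</p>

```python
import math
import random

def build_edge_reservoir(p_link, scale=10):
    """Replicate each gene pair Ceil(scale * p_link) times in a flat list."""
    reservoir = []
    for pair, probability in p_link.items():
        copies = math.ceil(scale * probability)
        reservoir.extend([pair] * copies)
    return reservoir

# Made-up linkage probabilities for three gene pairs.
p_link = {
    ("geneA", "geneB"): 1.0,    # 10 copies
    ("geneA", "geneC"): 0.25,   # Ceil(2.5) = 3 copies
    ("geneB", "geneC"): 0.05,   # Ceil(0.5) = 1 copy
}
reservoir = build_edge_reservoir(p_link)

# Drawing uniformly from the reservoir favours strongly linked pairs.
rng = random.Random(0)
proposed_pair = rng.choice(reservoir)
```

<p>With these values the reservoir holds 14 entries, so the pair with the strongest prior evidence is proposed in 10 out of 14 draws on average, while the weakest pair is still proposed occasionally.</p>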
</sec>
<sec>
<st>
<p>Implementation</p>
</st>
<p>Our BN simulation algorithm is implemented in Matlab utilizing Kevin Murphy's BNT package <abbrgrp>
<abbr bid="B33">33</abbr>
</abbrgrp>(<url>bnt.googlecode.com</url>), and is summarized in Table <tblr tid="T1">1</tblr>. Note that steps 1 and 3(1) contain the features that distinguish our approach from others. The source code is available upon request. The networks were visualized with Cytoscape <abbrgrp>
<abbr bid="B34">34</abbr>
</abbrgrp>.</p>
<tbl id="T1"><title><p>Table 1</p></title><caption><p>Implementation of the new BN structure learning algorithm</p></caption><tblbdy cols="1">
      <r>
         <c ca="left">
            <p>
               <b>Input:</b>
            </p>
         </c>
      </r>
      <r>
         <c indent="1" ca="left">
            <p>n: number of nodes in the network.</p>
         </c>
      </r>
      <r>
         <c indent="1" ca="left">
            <p>D: discretized expression data matrix.</p>
         </c>
      </r>
      <r>
         <c indent="1" ca="left">
            <p>BurnIn: number of steps to take before drawing sample networks for evaluation. Default value: 50 times the size of the sampling reservoir.</p>
         </c>
      </r>
      <r>
         <c indent="1" ca="left">
            <p>n_iteration: number of iterations. Default value: 80 times the size of the sampling reservoir.</p>
         </c>
      </r>
      <r>
         <c indent="1" ca="left">
            <p>&#916;_samples: interval at which sample networks are collected from the chain after burn-in. Default value: 1000.</p>
         </c>
      </r>
      <r>
         <c indent="1" ca="left">
            <p>maxFanIn: maximum number of parents of a node.</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>Output:</b>
            </p>
         </c>
      </r>
      <r>
         <c indent="1" ca="left">
            <p>A set of DAGs after reaching the max iteration step.</p>
         </c>
      </r>
      <r>
         <c indent="1" ca="left">
            <p>An average DAG in the form of a matrix.</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>Steps</b>
            </p>
         </c>
      </r>
      <r>
         <c indent="1" ca="left">
            <p>1. Create a sampling edge reservoir based on <it>p<sub>link</sub></it>.</p>
         </c>
      </r>
      <r>
         <c indent="1" ca="left">
            <p>2. Set all elements of the adjacency matrix for the initial DAG to 0.</p>
         </c>
      </r>
      <r>
         <c indent="1" ca="left">
            <p>3. for loop_index = 1: n_iteration do</p>
         </c>
      </r>
      <r>
         <c indent="2" ca="left">
            <p>(1) randomly select an element edge(i,j) from the edge sampling reservoir, corresponding to gene pair (i,j).</p>
         </c>
      </r>
      <r>
         <c indent="2" ca="left">
            <p>(2) if edge(i,j) exists in the current DAG, delete the edge; else if edge(j,i) exists in the current DAG, reverse edge(j,i) to edge(i,j); else add edge(i,j). We name these operations "delete", "reverse" and "add", respectively.</p>
         </c>
      </r>
      <r>
         <c indent="2" ca="left">
            <p>(3) check whether the newly proposed DAG remains acyclic and satisfies the maxFanIn rule for nodes i and j. If not, keep the current DAG, discard the proposed DAG, and go to (1).</p>
         </c>
      </r>
      <r>
         <c indent="2" ca="left">
            <p>(4) calculate the log marginal likelihood (LL)* of the expression data D of node j and its parents given the current DAG (LL_old) and given the proposed DAG (LL_new), and define bf1 = exp(LL_new - LL_old).</p>
         </c>
      </r>
      <r>
         <c indent="2" ca="left">
            <p>(5) if the operation is "delete" or "add", bf2 = 1; if the operation is "reverse", calculate bf2 for node i in the same way as for node j in (4).</p>
         </c>
      </r>
      <r>
         <c indent="2" ca="left">
            <p>(6) calculate the prior probability* of the current DAG (prior_old) and the proposed DAG (prior_new); calculate the Metropolis-Hastings ratio (<it>R</it><sub>HM</sub>) of the two DAGs; generate a random number u between 0 and 1; if bf1*bf2*prior_new/prior_old&lt;u*<it>R</it><sub>HM</sub>, keep the current DAG, discard the proposed DAG, and go to (1).</p>
         </c>
      </r>
      <r>
         <c indent="2" ca="left">
            <p>(7) when loop_index>BurnIn and (loop_index-BurnIn) is exactly divisible by <b>&#916;</b>_samples, record the proposed DAG and its posterior probability.</p>
         </c>
      </r>
      <r>
         <c indent="1" ca="left">
            <p>4. End of loop, calculate the average DAG in the form of a matrix, where the elements are given by the averaged edges of all recorded DAGs weighted by their posterior probabilities.</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>*Details of the definition of marginal likelihood, and how to calculate LL, prior probability of DAG, can be found in <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B31">31</abbr></abbrgrp>.</p>
   </tblfn></tbl>
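<p>The loop in Table 1 can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: local_score (the log marginal likelihood of a node given its parents) and log_prior are hypothetical callables supplied by the caller, the Metropolis-Hastings ratio r_hm defaults to 1, the current DAG is recorded at every sampling step, and the recorded DAGs are averaged with equal weights rather than weighted by their posterior probabilities.</p>

```python
import math
import random


def has_cycle(adj):
    """Return True if the directed graph with adjacency matrix adj has a cycle."""
    n = len(adj)
    state = [0] * n  # 0 = unvisited, 1 = on the current DFS path, 2 = finished

    def visit(v):
        state[v] = 1
        for w in range(n):
            if adj[v][w]:
                if state[w] == 1 or (state[w] == 0 and visit(w)):
                    return True
        state[v] = 2
        return False

    return any(state[v] == 0 and visit(v) for v in range(n))


def parents(adj, j):
    """Parents of node j as a sorted tuple of node indices."""
    return tuple(k for k in range(len(adj)) if adj[k][j])


def structure_mcmc(n_genes, reservoir, local_score, log_prior, n_iteration,
                   burn_in, delta_samples, max_fan_in, rng, r_hm=1.0):
    """Sample DAGs as outlined in Table 1; return the averaged adjacency matrix."""
    adj = [[0] * n_genes for _ in range(n_genes)]      # step 2: empty DAG
    recorded = []
    for loop_index in range(1, n_iteration + 1):        # step 3
        i, j = rng.choice(reservoir)                    # (1) pick a candidate edge
        new = [row[:] for row in adj]
        if adj[i][j]:                                   # (2) "delete"
            new[i][j] = 0
            changed = [j]
        elif adj[j][i]:                                 # (2) "reverse" j->i to i->j
            new[j][i] = 0
            new[i][j] = 1
            changed = [j, i]
        else:                                           # (2) "add"
            new[i][j] = 1
            changed = [j]
        # (3) the proposal must stay acyclic and respect the fan-in limit
        valid = not has_cycle(new) and all(
            max_fan_in >= len(parents(new, k)) for k in (i, j))
        if valid:
            # (4)-(5) only nodes whose parent set changed contribute (bf1, bf2)
            log_ratio = sum(local_score(k, parents(new, k))
                            - local_score(k, parents(adj, k)) for k in changed)
            log_ratio += log_prior(new) - log_prior(adj)  # (6) prior ratio
            u = 1.0 - rng.random()                        # uniform on (0, 1]
            if log_ratio >= math.log(u * r_hm):           # accept the proposal
                adj = new
        # (7) thin the chain after burn-in
        if loop_index > burn_in and (loop_index - burn_in) % delta_samples == 0:
            recorded.append([row[:] for row in adj])
    if not recorded:
        return [[0.0] * n_genes for _ in range(n_genes)]
    # step 4 (simplified): unweighted average of the recorded DAGs
    return [[sum(g[a][b] for g in recorded) / len(recorded)
             for b in range(n_genes)] for a in range(n_genes)]
```

<p>In this sketch the reservoir is a plain list of (i, j) pairs, so a pair that should be proposed more often can simply be listed more than once.</p>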
</sec>
<sec>
<st>
<p>Validation</p>
</st>
<sec>
<st>
<p>Utility of GO similarity and PubMed co-citation in discovering functional linkage between gene pairs</p>
</st>
<p>Lee <it>et al </it>developed an approach to evaluate whether gene-pair functional relationships can be predicted by a certain type of high-throughput genomic data (gene expression, PPI, ChIP-chip, etc.) <abbrgrp>
<abbr bid="B35">35</abbr>
<abbr bid="B36">36</abbr>
</abbrgrp>. Assuming that <it>p</it>(<it>L</it>|<it>D</it>) and <it>p</it>(~ <it>L</it>|<it>D</it>) denote the probabilities that gene pairs share or do not share functional annotation given that they are linked by data <it>D </it>(for instance, co-expressed, sharing PPI, protein of one gene binds to the promoter of the other, etc.), and <it>p</it>(<it>L</it>) and <it>p</it>(~ <it>L</it>) represent the prior probabilities of sharing and not sharing functional annotation, they proposed a log likelihood score <abbrgrp>
<abbr bid="B35">35</abbr>
<abbr bid="B36">36</abbr>
</abbrgrp>:</p>
<p>
<display-formula id="M4">
<m:math name="1471-2105-12-359-i5" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>L</m:mi>
   <m:mi>L</m:mi>
   <m:mi>S</m:mi>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:mstyle class="text">
      <m:mtext class="textsf" mathvariant="sans-serif">ln</m:mtext>
   </m:mstyle>
   <m:mfenced separators="" open="(" close=")">
      <m:mrow>
         <m:mfrac>
            <m:mrow>
               <m:mi>P</m:mi>
               <m:mrow>
                  <m:mo class="MathClass-open">(</m:mo>
                  <m:mrow>
                     <m:mi>L</m:mi>
                     <m:mo class="MathClass-rel">|</m:mo>
                     <m:mi>D</m:mi>
                  </m:mrow>
                  <m:mo class="MathClass-close">)</m:mo>
               </m:mrow>
               <m:mo class="MathClass-bin">&#8725;</m:mo>
               <m:mi>P</m:mi>
               <m:mrow>
                  <m:mo class="MathClass-open">(</m:mo>
                  <m:mrow>
                     <m:mo class="MathClass-rel">~</m:mo>
                     <m:mi>L</m:mi>
                     <m:mo class="MathClass-bin">&#8725;</m:mo>
                     <m:mi>D</m:mi>
                  </m:mrow>
                  <m:mo class="MathClass-close">)</m:mo>
               </m:mrow>
            </m:mrow>
            <m:mrow>
               <m:mi>P</m:mi>
               <m:mrow>
                  <m:mo class="MathClass-open">(</m:mo>
                  <m:mrow>
                     <m:mi>L</m:mi>
                  </m:mrow>
                  <m:mo class="MathClass-close">)</m:mo>
               </m:mrow>
               <m:mo class="MathClass-bin">&#8725;</m:mo>
               <m:mi>P</m:mi>
               <m:mrow>
                  <m:mo class="MathClass-open">(</m:mo>
                  <m:mrow>
                     <m:mo class="MathClass-rel">~</m:mo>
                     <m:mi>L</m:mi>
                  </m:mrow>
                  <m:mo class="MathClass-close">)</m:mo>
               </m:mrow>
            </m:mrow>
         </m:mfrac>
      </m:mrow>
   </m:mfenced>
</m:mrow>
</m:math>
</display-formula>
</p>
<p>to describe the utility of data <it>D </it>in functional linkage inference. An LLS close to 0 suggests that the data are no more informative than random pairing, whilst higher positive values of LLS indicate that data <it>D </it>contain more information on functional linkage.</p>
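<p>As a rough illustration of equation (4), the score can be estimated from benchmark counts. The sketch below assumes, which is not spelled out in the text, that P(L|D)/P(~L|D) is estimated by the ratio of positive to negative benchmark pairs supported by the data, and P(L)/P(~L) by the ratio of the benchmark set sizes. The function and argument names are hypothetical.</p>

```python
import math


def log_likelihood_score(pos_linked, neg_linked, pos_total, neg_total):
    """Estimate LLS = ln[(P(L|D)/P(~L|D)) / (P(L)/P(~L))] from benchmark counts.

    pos_linked, neg_linked: positive / negative benchmark pairs supported by D.
    pos_total, neg_total: sizes of the full positive / negative benchmark sets.
    """
    if 0 in (pos_linked, neg_linked, pos_total, neg_total):
        raise ValueError("all counts must be non-zero to form the odds ratios")
    posterior_odds = pos_linked / neg_linked   # odds of linkage given D
    prior_odds = pos_total / neg_total         # odds of linkage before seeing D
    return math.log(posterior_odds / prior_odds)
```

<p>With equally sized benchmark sets, data that support three positive pairs for every negative pair give an LLS of ln 3, whereas data that support both sets equally give an LLS of 0.</p>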
<p>We adopted equation (4) to evaluate whether GO semantic similarity and PubMed co-citation were useful in identifying functional linkage. The KEGG <url>http://www.genome.ad.jp/KEGG</url> and Munich Information Center for Protein Sequences (MIPS, mips.gsf.de/) databases <abbrgrp>
<abbr bid="B1">1</abbr>
<abbr bid="B2">2</abbr>
</abbrgrp> were used to construct the benchmarks of functional linkage. These databases were chosen for their high quality <abbrgrp>
<abbr bid="B37">37</abbr>
</abbrgrp>. In this study we utilized yeast and mouse gene expression data to validate our algorithm. For each species, the positive control set consists of a randomly sampled 5% (43,761 pairs for yeast, and 35,424 for mouse) of all gene pairs that are in the same KEGG pathways <abbrgrp>
<abbr bid="B38">38</abbr>
</abbrgrp>. We sampled 5% rather than all pairs to lower the computational cost. The negative control set was constructed from gene pairs that encode proteins localized in different cellular compartments, with the underlying assumption that they are functionally unrelated and do not interact with each other. Four categories in the MIPS annotation <abbrgrp>
<abbr bid="B39">39</abbr>
</abbrgrp> were utilized: 70.03 cytoplasm, 70.10 nucleus, 70.16 mitochondrion, and 70.27 extracellular/secretion proteins.</p>
<p>Again we only kept 5% of all possible gene pairs, totaling 112,693 for yeast and 531,089 for mouse, respectively. The same benchmark sets were also utilized to train the Na&#239;ve Bayesian classifier when calculating <it>p</it>
<sub>link</sub>.</p>
<p>The LLS of co-citation in discovering functional linkage is then determined by:</p>
<p>
<display-formula id="M5">
<m:math name="1471-2105-12-359-i6" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>L</m:mi>
   <m:mi>L</m:mi>
   <m:msub>
      <m:mrow>
         <m:mi>S</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mstyle class="text">
            <m:mtext class="textsf" mathvariant="sans-serif">PubMed</m:mtext>
         </m:mstyle>
      </m:mrow>
   </m:msub>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:mstyle class="text">
      <m:mtext class="textsf" mathvariant="sans-serif">ln</m:mtext>
   </m:mstyle>
   <m:mfenced separators="" open="(" close=")">
      <m:mrow>
         <m:mfrac>
            <m:mrow>
               <m:mi>P</m:mi>
               <m:mrow>
                  <m:mo class="MathClass-open">(</m:mo>
                  <m:mrow>
                     <m:mi>L</m:mi>
                     <m:mo class="MathClass-rel">|</m:mo>
                     <m:msub>
                        <m:mrow>
                           <m:mi>p</m:mi>
                        </m:mrow>
                        <m:mrow>
                           <m:mstyle class="text">
                              <m:mtext class="textsf" mathvariant="sans-serif">pubMed</m:mtext>
                           </m:mstyle>
                        </m:mrow>
                     </m:msub>
                  </m:mrow>
                  <m:mo class="MathClass-close">)</m:mo>
               </m:mrow>
               <m:mo class="MathClass-bin">&#8725;</m:mo>
               <m:mi>P</m:mi>
               <m:mrow>
                  <m:mo class="MathClass-open">(</m:mo>
                  <m:mrow>
                     <m:mo class="MathClass-rel">~</m:mo>
                     <m:mi>L</m:mi>
                     <m:mo class="MathClass-bin">&#8725;</m:mo>
                     <m:msub>
                        <m:mrow>
                           <m:mi>p</m:mi>
                        </m:mrow>
                        <m:mrow>
                           <m:mstyle class="text">
                              <m:mtext class="textsf" mathvariant="sans-serif">pubMed</m:mtext>
                           </m:mstyle>
                        </m:mrow>
                     </m:msub>
                  </m:mrow>
                  <m:mo class="MathClass-close">)</m:mo>
               </m:mrow>
            </m:mrow>
            <m:mrow>
               <m:mi>P</m:mi>
               <m:mrow>
                  <m:mo class="MathClass-open">(</m:mo>
                  <m:mrow>
                     <m:mi>L</m:mi>
                  </m:mrow>
                  <m:mo class="MathClass-close">)</m:mo>
               </m:mrow>
               <m:mo class="MathClass-bin">&#8725;</m:mo>
               <m:mi>P</m:mi>
               <m:mrow>
                  <m:mo class="MathClass-open">(</m:mo>
                  <m:mrow>
                     <m:mo class="MathClass-rel">~</m:mo>
                     <m:mi>L</m:mi>
                  </m:mrow>
                  <m:mo class="MathClass-close">)</m:mo>
               </m:mrow>
            </m:mrow>
         </m:mfrac>
      </m:mrow>
   </m:mfenced>
</m:mrow>
</m:math>
</display-formula>
</p>
<p>The LLS of GO semantic similarity was calculated in a similar fashion. The LLS values for gene pair sets in different ranges of GO similarity and co-citation p-value are given in Table <tblr tid="T2">2</tblr>. Gene pair sets with higher GO similarity or PubMed co-citation significance have more positive LLS values, and vice versa. Note that a negative LLS means the gene pairs are less likely to be functionally linked than random pairs, which is expected if they share low GO similarity or co-citation. The results suggest that PubMed co-citation and GO similarity are effective at discriminating functionally linked gene pairs from unlinked ones.</p>
<tbl id="T2"><title><p>Table 2</p></title><caption><p>GO and PubMed citation contain information of functional linkage</p></caption><tblbdy cols="6">
      <r>
         <c ca="center">
            <p>
               <b>GO similarity interval</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>LLS, yeast</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>LLS, mouse</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>-log<sub>10</sub>(p<sub>PubMed</sub>) interval</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>LLS, yeast</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>LLS, mouse</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="6">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>[1, 1]</p>
         </c>
         <c ca="center">
            <p>1.51</p>
         </c>
         <c ca="center">
            <p>1.62</p>
         </c>
         <c ca="center">
            <p>(4, &#8734;)</p>
         </c>
         <c ca="center">
            <p>0.25</p>
         </c>
         <c ca="center">
            <p>0.37</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>[0.2, 1)</p>
         </c>
         <c ca="center">
            <p>-0.71</p>
         </c>
         <c ca="center">
            <p>-0.99</p>
         </c>
         <c ca="center">
            <p>(3, 4]</p>
         </c>
         <c ca="center">
            <p>0.13</p>
         </c>
         <c ca="center">
            <p>0.14</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>[0, 0.2)</p>
         </c>
         <c ca="center">
            <p>-1.61</p>
         </c>
         <c ca="center">
            <p>-2.2</p>
         </c>
         <c ca="center">
            <p>(1, 3]</p>
         </c>
         <c ca="center">
            <p>0.07</p>
         </c>
         <c ca="center">
            <p>0.19</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>[0, 1]</p>
         </c>
         <c ca="center">
            <p>-3.4</p>
         </c>
         <c ca="center">
            <p>-3.6</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Log likelihood scores of functional linkage in yeast and mouse, for gene pairs in different value intervals of GO similarity and PubMed co-citation significance. Gene pairs with higher GO similarity or significance of co-citation are more likely to be functionally linked.</p>
   </tblfn></tbl>
<p>We found that there is a marginal dependence between the GO similarity and PubMed co-citation (Fisher's Z test, p~0.1). Theoretically, the na&#239;ve Bayesian classifier is optimal when the attributes are independent given the class. However, empirical studies have shown that the classifier still performs well in many domains when there are moderate attribute dependences <abbrgrp>
<abbr bid="B40">40</abbr>
</abbrgrp>. The weak dependence between them indicates that the na&#239;ve Bayesian Network is an appropriate choice to integrate their information <abbrgrp>
<abbr bid="B41">41</abbr>
</abbrgrp>. Interestingly, the GO and MIPS categories, which are both functional annotations, also only depend weakly on each other. This may be because the present annotations are far from being perfect and complete <abbrgrp>
<abbr bid="B42">42</abbr>
</abbrgrp>.</p>
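<p>To illustrate why near-independence matters, note that by Bayes' rule the LLS of equation (4) equals the log likelihood ratio ln[P(D|L)/P(D|~L)]. Under the na&#239;ve Bayes assumption that the evidence sources are conditionally independent, their LLS values simply add to the prior log odds. The sketch below shows this combination only; it is not the authors' exact definition of the functional linkage probability, and the function name and the prior value used in the note are hypothetical.</p>

```python
import math


def naive_bayes_link_probability(prior_link, lls_values):
    """Posterior probability of linkage from a prior and per-source LLS values.

    Assumes the sources are conditionally independent given the class, so
    ln(posterior odds) = ln(prior odds) + sum of log likelihood ratios.
    """
    log_odds = math.log(prior_link / (1.0 - prior_link)) + sum(lls_values)
    return 1.0 / (1.0 + math.exp(-log_odds))
```

<p>For example, with an arbitrary illustrative prior of 0.5 and the yeast LLS values from the top row of Table 2 (1.51 for GO similarity and 0.25 for co-citation), the combined log odds are 1.76, corresponding to a posterior probability of roughly 0.85.</p>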
</sec>
</sec>
<sec>
<st>
<p>Utility of functional linkage information to interaction network modeling</p>
</st>
<p>The distribution of <it>p<sub>link </sub>
</it>for yeast gene pairs is given in Figure <figr fid="F2">2</figr>. Only a small proportion of gene pairs have high values of <it>p<sub>link</sub>
</it>; for about 92% of the gene pairs this value is less than 0.2. This indicates that most gene pairs share no functional linkage, consistent with the fact that gene networks are usually sparse. The candidate edge reservoir is constructed according to equation (3), and the MCMC samples this distribution to propose a new candidate network structure at each iteration. In Figure <figr fid="F2">2</figr> we have also included the distribution for gene pairs predicted to interact with each other, with and without the prior knowledge. Among all possible gene pairs, only ~8% have <it>p<sub>link </sub>
</it>&gt; 0.6. In contrast, this proportion increases to 28% among the predicted interactions, indicating that the prior knowledge did affect the outcome of the BN learning. The results from the other data sets are similar.</p>
<fig id="F2"><title><p>Figure 2</p></title><caption><p>Distribution of functional linkage probability for all possible gene pairs, and for predicted interactions with and without prior knowledge</p></caption><text>
   <p><b>Distribution of functional linkage probability for all possible gene pairs, and for predicted interactions with and without prior knowledge</b>.</p>
</text><graphic file="1471-2105-12-359-2" hint_layout="single"/></fig>
<p>The premise behind incorporating prior knowledge of functional linkage is that it can help network modeling. Existing data from yeast revealed that genes sharing the same GO attribute interact genetically more often than expected by chance (p &lt; 0.05) <abbrgrp>
<abbr bid="B43">43</abbr>
<abbr bid="B44">44</abbr>
</abbrgrp>. In a very conservative estimate, over 12% of the genetic interactions are between genes with identical GO annotation (a 12-fold enhancement over what is expected by chance, p &lt; 10<sup>-12</sup>); and over 27% are between genes with similar or identical GO annotations (an 8-fold enhancement, <it>p </it>&lt; 10<sup>-10</sup>).</p>
<p>We examined whether <it>p<sub>link </sub>
</it>can potentially discriminate interacting gene pairs from non-interacting ones, using the receiver operating characteristic (ROC) curve. The ROC curve is a plot of the sensitivity versus (1-specificity), namely the fraction of true positives versus the fraction of false positives, as the discrimination threshold of a classifier is varied. The area under the curve (AUC) summarizes the performance. The ROC of a random classifier would be a 45&#176; line with AUC = 0.5. Figure <figr fid="F3">3</figr> presents the ROC plot for the nine yeast cell cycle regulating transcription factors (TF): Fkh1, Fkh2, Ndd1, Mcm1, Ace2, Swi5, Mbp1, Swi4, and Swi6, and their targets identified using the ChIP-chip technology <abbrgrp>
<abbr bid="B45">45</abbr>
</abbrgrp>. The AUC of 0.6064 indicates that <it>p<sub>link</sub></it> is positively correlated with interaction and is therefore useful in interaction inference.</p>
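<p>For readers who want to reproduce this kind of evaluation, the AUC can be computed without drawing the curve, because it equals the probability that a randomly chosen positive pair scores higher than a randomly chosen negative pair (ties counting one half). The sketch below uses that rank-based identity; the function name and the toy inputs in the note are hypothetical and unrelated to the ChIP-chip data.</p>

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    scores: classifier scores (for example, functional linkage probabilities).
    labels: truthy for benchmark positives, falsy for negatives.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0      # positive ranked above negative
            elif p == q:
                wins += 0.5      # ties count as half a win
    return wins / (len(pos) * len(neg))
```

<p>A score that ranks every positive above every negative gives 1.0, and a constant score gives 0.5, matching the 45&#176; line of a random classifier.</p>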
<fig id="F3"><title><p>Figure 3</p></title><caption><p>ROC curve indicating that functional linkage contains information for interaction</p></caption><text>
   <p><b>ROC curve indicating that functional linkage contains information for interaction</b>. Plotted is the performance of <it>p<sub>link </sub></it>as a classifier to identify yeast TF-target pairs defined by ChIP-chip.</p>
</text><graphic file="1471-2105-12-359-3" hint_layout="single"/></fig>
</sec>
<sec>
<st>
<p>Convergence of simulation</p>
</st>
<p>In Figure <figr fid="F4">4A</figr> we plot the acceptance ratio versus the number of MCMC steps for the yeast cell cycle dataset. In the later steps, the probability of accepting a newly proposed DAG becomes small and levels off. The results from the other datasets are similar. In addition, the MCMC simulation was repeated 20 times with independent initializations, and consistency in the marginal posterior probabilities was examined. We found that they correlated between different runs: 0.83 &#177; 0.11 for the simulated dataset, 0.68 &#177; 0.10 for the yeast data set, and 0.51 &#177; 0.26 for the mouse pancreas dataset. Figure <figr fid="F4">4B</figr> presents the scatter plot of the edge posterior probabilities from two typical runs on the yeast dataset.</p>
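<p>A between-run consistency check of this kind could be sketched as follows. The text does not state which correlation coefficient was used; the sketch assumes Pearson correlation over the off-diagonal entries of two edge posterior matrices, and the function names are hypothetical.</p>

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def run_agreement(post_a, post_b):
    """Correlate the edge posterior probabilities of two MCMC runs.

    post_a, post_b: square matrices where entry [i][j] is the marginal
    posterior probability of edge i->j; self-loops (the diagonal) are skipped.
    """
    n = len(post_a)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    return pearson([post_a[i][j] for i, j in pairs],
                   [post_b[i][j] for i, j in pairs])
```

<p>Applying run_agreement to every pair of runs and summarizing the resulting coefficients would yield mean and spread values of the form reported above.</p>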
<fig id="F4"><title><p>Figure 4</p></title><caption><p>Convergence of simulation</p></caption><text>
   <p><b>Convergence of simulation</b>. (A) Acceptance ratio versus the number of MCMC steps. (B) Scatter plot of the marginal posterior probabilities of the edges, obtained from two separate MCMC simulations of the yeast cell cycle data.</p>
</text><graphic file="1471-2105-12-359-4" hint_layout="single"/></fig>
</sec>
<sec>
<st>
<p>Validation using simulated data</p>
</st>
<p>In our network inference, the MCMC learning simulation is repeated 20 times with independent initializations, and an interaction is included in the final network if it is observed more than 15 times. Our new BN algorithm was first tested on a simulated time course (50 time points) gene expression dataset of an artificial network generated using SynTReN <abbrgrp>
<abbr bid="B46">46</abbr>
</abbrgrp>. This network contains 76 genes, of which 24 act as regulators with a total of 124 regulatory relationships (<it>i.e</it>. 124 edges). The results are summarized in Table <tblr tid="T3">3</tblr>, 2<sup>nd</sup> column. Incorporating the functional linkage as prior knowledge allows the identification of a higher number, 21 versus 14, of the true gene-gene relationships compared with the plain BN modeling of gene expression data only. A random network with the same number of edges was also created for the 76 genes <abbrgrp>
<abbr bid="B47">47</abbr>
</abbrgrp>. The improvement of BN with prior knowledge over random is significant (p &lt; 0.01, Table <tblr tid="T3">3</tblr>), while without prior knowledge it is not (p~0.2, Table <tblr tid="T3">3</tblr>).</p>
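<p>The consensus rule described at the start of this section (keep an interaction observed in more than 15 of the 20 runs) can be sketched as a simple vote. How each run is reduced to a set of edges is not specified here, so the sketch takes one edge set per run as given; the function name and threshold argument are hypothetical.</p>

```python
def consensus_edges(runs, min_count=15):
    """Return the edges present in more than min_count of the per-run edge sets.

    runs: iterable of collections of (regulator, target) pairs, one per MCMC run.
    """
    counts = {}
    for edges in runs:
        for edge in set(edges):          # count each edge once per run
            counts[edge] = counts.get(edge, 0) + 1
    return {edge for edge, count in counts.items() if count > min_count}
```

<p>With 20 runs and the default threshold, an edge seen in 16 runs is kept, while an edge seen in exactly 15 runs is dropped.</p>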
<tbl id="T3"><title><p>Table 3</p></title><caption><p>The improvement in network modeling with the addition of prior knowledge</p></caption><tblbdy cols="5">
      <r>
         <c ca="left">
            <p>
               <b>Data set</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Simulated data</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Yeast cell cycle study, benchmark from BIND</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Yeast cell cycle study, benchmark from ChIP-chip</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Mouse pancreas study</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Number of genes</p>
         </c>
         <c ca="left">
            <p>76</p>
         </c>
         <c ca="left">
            <p>107</p>
         </c>
         <c ca="left">
            <p>107</p>
         </c>
         <c ca="left">
            <p>36</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Number of established regulations</p>
         </c>
         <c ca="left">
            <p>124</p>
         </c>
         <c ca="left">
            <p>114</p>
         </c>
         <c ca="left">
            <p>190</p>
         </c>
         <c ca="left">
            <p>24</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Number of possible regulations</p>
         </c>
         <c ca="left">
            <p>76*75 = 5700</p>
         </c>
         <c ca="left">
            <p>107*106/2 = 5671*</p>
         </c>
         <c ca="left">
            <p>9*106 = 954</p>
         </c>
         <c ca="left">
            <p>36*35 = 1260</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Number of known regulations recovered with (without) prior knowledge</p>
         </c>
         <c ca="left">
            <p>21 (14)</p>
         </c>
         <c ca="left">
            <p>26 (13)</p>
         </c>
         <c ca="left">
            <p>23 (11)</p>
         </c>
         <c ca="left">
            <p>12 (6)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Total number of regulations predicted, with (without) prior knowledge</p>
         </c>
         <c ca="left">
            <p>503 (440)</p>
         </c>
         <c ca="left">
            <p>436 (387)</p>
         </c>
         <c ca="left">
            <p>58 (33)</p>
         </c>
         <c ca="left">
            <p>322 (297)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Improvement over plain BN</p>
         </c>
         <c ca="left">
            <p>&#967; <sup>2 </sup>= 0.36,</p>
            <p>p~0.54</p>
         </c>
         <c ca="left">
            <p>&#967; <sup>2 </sup>= 2.28, p~0.13</p>
         </c>
         <c ca="left">
            <p>&#967; <sup>2 </sup>= 0.04, p~0.84</p>
         </c>
         <c ca="left">
            <p>&#967; <sup>2 </sup>= 0.98,</p>
            <p>p~0.32</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Improvement over random selection</p>
         </c>
         <c ca="left">
            <p>&#967; <sup>2 </sup>= 7.32,</p>
            <p>p &lt; 0.01</p>
         </c>
         <c ca="left">
            <p>&#967; <sup>2 </sup>= 24.5, p &lt; 0.001</p>
         </c>
         <c ca="left">
            <p>&#967; <sup>2</sup>= 6.71, p &lt; 0.01</p>
         </c>
         <c ca="left">
            <p>&#967; <sup>2 </sup>= 2.87,</p>
            <p>p~0.09</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Plain BN over random</p>
            <p>selection</p>
         </c>
         <c ca="left">
            <p>&#967; <sup>2 </sup>= 1.58,</p>
            <p>p~0.2</p>
         </c>
         <c ca="left">
            <p>&#967; <sup>2 </sup>= 2.42, p~0.11</p>
         </c>
         <c ca="left">
            <p>&#967; <sup>2 </sup>= 1.6, p~0.2</p>
         </c>
         <c ca="left">
            <p>&#967; <sup>2 </sup>= 0.01, p~0.8</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>* We ignored edge direction when comparing to BIND since it contains both directed and undirected interactions.</p>
   </tblfn></tbl>
</sec>
<sec>
<st>
<p>Validation using the yeast cell cycle data</p>
</st>
<p>Next, the new algorithm was applied to one of the Stanford yeast cell cycle datasets <url>http://genome-www.stanford.edu/cellcycle/</url>, in which cells from a cdc15 temperature-sensitive mutant were studied <abbrgrp>
<abbr bid="B48">48</abbr>
</abbrgrp>. To evaluate the performance, we compared the predicted interactions from our algorithm to the annotated interactions in BIND <url>http://bind.ca</url>
<abbrgrp>
<abbr bid="B49">49</abbr>
</abbrgrp>, and the transcription regulation predicted by the ChIP-chip data <abbrgrp>
<abbr bid="B45">45</abbr>
</abbrgrp>. Tables <tblr tid="T4">4</tblr> and <tblr tid="T5">5</tblr> list the benchmark interactions for the 107 yeast cell cycle genes that were recovered by the BN modeling. The statistical results are summarized in Table <tblr tid="T3">3</tblr>, columns 3-4.</p>
<tbl id="T4"><title><p>Table 4</p></title><caption><p>Predicted yeast gene regulatory relationships that are annotated in BIND</p></caption><tblbdy cols="4">
      <r>
         <c cspan="4" ca="center">
            <p>
               <b>BN with prior knowledge</b>
            </p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><b>HTA1</b>&#8594;<b>HHT1</b></p>
         </c>
         <c ca="left">
            <p><b>FUS1</b>&#8594;<b>FAR1</b></p>
         </c>
         <c ca="left">
            <p><b>FKH2</b>&#8594;<b>CLB2</b></p>
         </c>
         <c ca="left">
            <p><b>GAS1</b>&#8594;<b>SWI4</b></p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><b>SWI5</b>&#8594;<b>FKH1</b></p>
         </c>
         <c ca="left">
            <p><b>DPB3</b>&#8594;<b>CDC45</b></p>
         </c>
         <c ca="left">
            <p><b>DPB2</b>&#8594;<b>DPB3</b></p>
         </c>
         <c ca="left">
            <p>CLN2&#8594;CLN3</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>ASF1&#8594;HHF1</p>
         </c>
         <c ca="left">
            <p>GAS1&#8594;KRE6</p>
         </c>
         <c ca="left">
            <p>CLN3&#8594;CLB6</p>
         </c>
         <c ca="left">
            <p>CDC14&#8594;SIC1</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>SWI4&#8594;MBP1</p>
         </c>
         <c ca="left">
            <p>MSH6&#8594;POL30</p>
         </c>
         <c ca="left">
            <p>CLB6&#8594;CLN1</p>
         </c>
         <c ca="left">
            <p>SWI4&#8594;CHS3</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>KAR3&#8594;NUM1</p>
         </c>
         <c ca="left">
            <p>HHF1&#8594;HHT1</p>
         </c>
         <c ca="left">
            <p>MOB1&#8594;DBF2</p>
         </c>
         <c ca="left">
            <p>RFA1&#8594;RFA3</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>CLB1&#8594;CLB3</p>
         </c>
         <c ca="left">
            <p>CLN1&#8594;CLN3</p>
         </c>
         <c ca="left">
            <p>CDC45&#8594;CDC6</p>
         </c>
         <c ca="left">
            <p>CLB1&#8594;CLB5</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>HHF1&#8594;HTB2</p>
         </c>
         <c ca="left">
            <p>HPR5&#8594;RAD54</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c cspan="4" ca="center">
            <p>
               <b>BN without prior knowledge</b>
            </p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><b>HTA1</b>&#8594;<b>HHT1</b></p>
         </c>
         <c ca="left">
            <p><b>FUS1</b>&#8594;<b>FAR1</b></p>
         </c>
         <c ca="left">
            <p><b>FKH2</b>&#8594;<b>CLB2</b></p>
         </c>
         <c ca="left">
            <p><b>GAS1</b>&#8594;<b>SWI4</b></p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><b>SWI5</b>&#8594;<b>FKH1</b></p>
         </c>
         <c ca="left">
            <p><b>DPB3</b>&#8594;<b>CDC45</b></p>
         </c>
         <c ca="left">
            <p><b>DPB2</b>&#8594;<b>DPB3</b></p>
         </c>
         <c ca="left">
            <p>CLN3&#8594;CLN2</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>DBF4&#8594;CDC5</p>
         </c>
         <c ca="left">
            <p>CDC8&#8594;CIK1</p>
         </c>
         <c ca="left">
            <p>CDC6&#8594;CDC45</p>
         </c>
         <c ca="left">
            <p>CLB3&#8594;CDC6</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><b>SIC1</b>&#8594;<b>CDC14</b></p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Relationships in bold font are predicted both with and without prior knowledge.</p>
   </tblfn></tbl>
<tbl id="T5"><title><p>Table 5</p></title><caption><p>Predicted yeast gene regulatory relationships that are confirmed by ChIP-chip</p></caption><tblbdy cols="4">
      <r>
         <c cspan="4" ca="center">
            <p>
               <b>BN with prior knowledge</b>
            </p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><b>FKH2</b>&#8594;<b>HHF1</b></p>
         </c>
         <c ca="left">
            <p><b>FKH2</b>&#8594;<b>CLB2</b></p>
         </c>
         <c ca="left">
            <p><b>SWI6</b>&#8594;<b>CLN1</b></p>
         </c>
         <c ca="left">
            <p><b>FKH2</b>&#8594;<b>HHT1</b></p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><b>SWI5</b>&#8594;<b>FKH1</b></p>
         </c>
         <c ca="left">
            <p><b>SWI6</b>&#8594;<b>HO</b></p>
         </c>
         <c ca="left">
            <p><b>SWI6</b>&#8594;<b>POL30</b></p>
         </c>
         <c ca="left">
            <p><b>SWI4</b>&#8594;<b>MFA2</b></p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>FKH1&#8594;SWE1</p>
         </c>
         <c ca="left">
            <p>FKH2&#8594;CDC6</p>
         </c>
         <c ca="left">
            <p>FKH2&#8594;SWI4</p>
         </c>
         <c ca="left">
            <p>SWI4&#8594;PSA1</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>SWI6&#8594;HHT1</p>
         </c>
         <c ca="left">
            <p>SWI5&#8594;ASH1</p>
         </c>
         <c ca="left">
            <p>SWI6&#8594;CLN2</p>
         </c>
         <c ca="left">
            <p>FKH2&#8594;SWE1</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>FKH2&#8594;HPR5</p>
         </c>
         <c ca="left">
            <p>SWI6&#8594;RAD54</p>
         </c>
         <c ca="left">
            <p>FKH1&#8594;RAD51</p>
         </c>
         <c ca="left">
            <p>SWI6&#8594;HHF1</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>SWI6&#8594;AGA1</p>
         </c>
         <c ca="left">
            <p>SWI4&#8594;AGA1</p>
         </c>
         <c ca="left">
            <p>SWI4&#8594;MBP1</p>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c cspan="4" ca="center">
            <p>
               <b>BN without prior knowledge</b>
            </p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><b>SWI6</b>&#8594;<b>POL30</b></p>
         </c>
         <c ca="left">
            <p><b>SWI6</b>&#8594;<b>CLN1</b></p>
         </c>
         <c ca="left">
            <p><b>FKH2</b>&#8594;<b>HHT1</b></p>
         </c>
         <c ca="left">
            <p><b>SWI5</b>&#8594;<b>FKH1</b></p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><b>FKH2</b>&#8594;<b>HHF1</b></p>
         </c>
         <c ca="left">
            <p><b>SWI4</b>&#8594;<b>MFA2</b></p>
         </c>
         <c ca="left">
            <p><b>SWI6</b>&#8594;<b>HO</b></p>
         </c>
         <c ca="left">
            <p><b>FKH2</b>&#8594;<b>CLB2</b></p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><b>SWI4</b>&#8594;<b>TIR1</b></p>
         </c>
         <c ca="left">
            <p>FKH1&#8594;CDC6</p>
         </c>
         <c ca="left">
            <p>FKH1&#8594;CDC20</p>
         </c>
         <c>
            <p/>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Relationships in bold font are predicted both with and without prior knowledge.</p>
   </tblfn></tbl>
<p>Evidently, our method identifies more of the positive benchmarks than the plain BN without prior knowledge. When evaluated with the BIND annotation, the number of correctly identified interactions doubled from 13 to 26 (p~0.13, &#967;<sup>2</sup>~2.28). The plain BN did not perform better than random selection (p~0.11). In contrast, BN with prior knowledge performed significantly better than random selection (&#967;<sup>2</sup> = 24.5, p &lt; 0.001). When evaluated with the ChIP-chip data, the result is similar. The number of correctly identified gene regulatory relationships increased from 11 to 23 with the addition of prior knowledge (p &lt; 0.01, &#967;<sup>2</sup> = 6.71). Without the prior knowledge, the plain BN is not significantly different from random selection (p~0.1).</p>
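<p>As a consistency check on the values quoted above, the following minimal sketch converts each &#967;<sup>2</sup> statistic into a p-value. It assumes a test with one degree of freedom (as for a 2-by-2 contingency table); the degrees of freedom are not stated in the text, and the function name <it>chi2_sf_1df</it> is introduced here for illustration only.</p>

```python
import math

def chi2_sf_1df(stat):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    # ASSUMPTION: 1 degree of freedom. A chi-square variable with 1 d.o.f. is
    # the square of a standard normal, so P(X >= stat) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))

# Chi-square statistics quoted in the text.
quoted = [
    ("BIND, with vs. without prior knowledge", 2.28),
    ("BIND, with prior knowledge vs. random", 24.5),
    ("ChIP-chip, with vs. without prior knowledge", 6.71),
]
for label, stat in quoted:
    print(f"{label}: chi2 = {stat}, p = {chi2_sf_1df(stat):.2g}")
```

<p>Under this assumption the three statistics correspond to p of about 0.13, p below 0.001, and p below 0.01 respectively, in line with the reported values.</p>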
<p>Figure <figr fid="F5">5A-5C</figr> shows the ROC curves, which give a more quantitative view of the performance of BN with and without prior knowledge, and of Werhli and Husmeier's algorithm <abbrgrp>
<abbr bid="B22">22</abbr>
<abbr bid="B23">23</abbr>
</abbrgrp>, in detecting TF-target gene interactions. Incorporation of prior knowledge significantly improved performance, yielding a higher AUC. Our algorithm performed slightly better than Werhli and Husmeier's.</p>
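<p>For readers less familiar with the AUC, the sketch below computes it from a list of edge scores and benchmark labels using the rank-sum identity: the AUC equals the fraction of (benchmark edge, non-benchmark edge) pairs in which the benchmark edge receives the higher score. The function name and the toy scores are hypothetical and are not taken from the study.</p>

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    scores: confidence assigned to each candidate edge (higher = more likely).
    labels: 1 if the edge is in the benchmark, 0 otherwise.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative edge")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical toy edge scores, not values from the study.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0]
print(roc_auc(toy_scores, toy_labels))  # 8 of 9 pairs are ordered correctly
```

<p>An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of benchmark edges above all others.</p>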
<fig id="F5"><title><p>Figure 5</p></title><caption><p>ROC curves for the network modeling of the yeast cell cycle data using plain BN (A), Werhli and Husmeier's (B), and our algorithm (C)</p></caption><text>
   <p><b>ROC curves for the network modeling of the yeast cell cycle data using plain BN (A), Werhli and Husmeier's (B), and our algorithm (C)</b>. ChIP-chip binding data were used as benchmark. Adding prior knowledge significantly improved BN performance at identifying the TF-target pairs.</p>
</text><graphic file="1471-2105-12-359-5" hint_layout="single"/></fig>
</sec>
<sec>
<st>
<p>Validation using mouse pancreas development data</p>
</st>
<p>We also validated our algorithm using a mammalian dataset. The experiment profiled gene expression changes in the pancreas during embryonic development or during compensatory growth after partial pancreatectomy. Elucidating the networks is key to understanding the complex nature of pancreas development and function <abbrgrp>
<abbr bid="B50">50</abbr>
<abbr bid="B51">51</abbr>
</abbrgrp>. A number of efforts have been made to manually annotate the key transcription factors and the gene networks they regulate based on low-throughput data, nicely reviewed by Servitja and Ferrer <abbrgrp>
<abbr bid="B52">52</abbr>
</abbrgrp>. In Table <tblr tid="T6">6</tblr>, we list the 24 experimentally confirmed gene-gene regulatory relationships <abbrgrp>
<abbr bid="B52">52</abbr>
</abbrgrp>, and their network is depicted in Figure <figr fid="F6">6A</figr>. With prior knowledge, BN modeling of the expression data recovers half of them (12), as shown in Figure <figr fid="F6">6C</figr> and Table <tblr tid="T6">6</tblr>. In contrast, the plain BN identifies only 6 of them (Figure <figr fid="F6">6B</figr>). This is again a roughly two-fold enhancement. The ROC curves are presented in Figures <figr fid="F7">7A-7C</figr>. Incorporation of prior knowledge significantly improved the ability to detect known interactions. Our algorithm performed comparably to Werhli and Husmeier's.</p>
<tbl id="T6"><title><p>Table 6</p></title><caption><p>Established pancreas gene regulatory relationships that are identified by BN modeling</p></caption><tblbdy cols="3">
      <r>
         <c ca="left">
            <p>
               <b>Known regulatory relationship</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Identified by BN modeling with prior knowledge</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Identified by the plain BN without prior knowledge</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Hes1&#8594;Neurog3</p>
         </c>
         <c ca="left">
            <p>&#8730;</p>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Hnf4a&#8594;Tcf1</p>
         </c>
         <c ca="left">
            <p>&#8730;</p>
         </c>
         <c ca="left">
            <p>&#8730;</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Pdx1&#8594;Gck</p>
         </c>
         <c ca="left">
            <p>&#8730;</p>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Pdx1&#8594;Hnf4a</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Pdx1&#8594;Iapp</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Pdx1&#8594;Ins2</p>
         </c>
         <c ca="left">
            <p>&#8730;</p>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Pdx1&#8594;Nr5a2</p>
         </c>
         <c ca="left">
            <p>&#8730;</p>
         </c>
         <c ca="left">
            <p>&#8730;</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Mafb&#8594;Ins2</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Mafb&#8594;Pdx1</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Neurog3&#8594;Nkx2-2</p>
         </c>
         <c ca="left">
            <p>&#8730;</p>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Nkx2-2&#8594;Gck</p>
         </c>
         <c ca="left">
            <p>&#8730;</p>
         </c>
         <c ca="left">
            <p>&#8730;</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Nkx2-2&#8594;Iapp</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Nkx2-2&#8594;Ins2</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Onecut1&#8594;Pdx1</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Onecut1&#8594;Neurog3</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Onecut1&#8594;Tcf1</p>
         </c>
         <c ca="left">
            <p>&#8730;</p>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Pax6&#8594;Gck</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Pax6&#8594;Iapp</p>
         </c>
         <c ca="left">
            <p>&#8730;</p>
         </c>
         <c ca="left">
            <p>&#8730;</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Pax6&#8594;Ins2</p>
         </c>
         <c ca="left">
            <p>&#8730;</p>
         </c>
         <c ca="left">
            <p>&#8730;</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Pax6&#8594;Pdx1</p>
         </c>
         <c ca="left">
            <p>&#8730;</p>
         </c>
         <c ca="left">
            <p>&#8730;</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Tcf1&#8594;Hnf4a</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Tcf1&#8594;Pdx1</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Tcf1&#8594;Pklr</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Tcf1&#8594;Slc2a2</p>
         </c>
         <c ca="left">
            <p>&#8730;</p>
         </c>
         <c>
            <p/>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>BN with prior knowledge recovers half (12 of 24) of the experimentally confirmed transcriptional regulations during mouse pancreas development, twice as many as the plain BN without prior knowledge (6 of 24).</p>
   </tblfn></tbl>
<fig id="F6"><title><p>Figure 6</p></title><caption><p>The pancreas development network already established by existing experiments (A), predicted by the plain BN (B), and by BN + prior knowledge (C)</p></caption><text>
   <p><b>The pancreas development network already established by existing experiments (A), predicted by the plain BN (B), and by BN + prior knowledge (C)</b>. The bold edges in (B) and (C) are those that overlap with the edges in (A).</p>
</text><graphic file="1471-2105-12-359-6" hint_layout="single"/></fig>
<fig id="F7"><title><p>Figure 7</p></title><caption><p>ROC curves for the pancreas development data with plain BN (A), Werhli and Husmeier's algorithm (B), and our approach (C)</p></caption><text>
   <p><b>ROC curves for the pancreas development data with plain BN (A), Werhli and Husmeier's algorithm (B), and our approach (C)</b>. 24 experimentally confirmed interactions were used as benchmark.</p>
</text><graphic file="1471-2105-12-359-7" hint_layout="single"/></fig>
<p>In Additional file <supplr sid="S1">1</supplr>, we listed the GO similarity and PubMed co-citation of the gene pairs with known regulatory relationships that were missed by plain BN. Clearly, almost all of them have high GO similarity and share a significant number of co-citations. Adding the functional linkage as prior knowledge helped to recover them.</p>
<suppl id="S1">
<title>
<p>Additional file 1</p>
</title>
<text>
<p>
<b>Predicted regulatory relationships missed by the plain BN</b>. Most established regulatory relationships missed by the plain BN involve two genes that share significant GO similarity and PubMed co-citation.</p>
</text>
<file name="1471-2105-12-359-S1.DOCX">
   <p>Click here for file</p>
</file>
</suppl>
</sec>
</sec>
<sec>
<st>
<p>Discussion</p>
</st>
<p>In this study we proposed a new algorithm to quantitatively utilize prior biological knowledge in the network modeling of gene expression data. First, the functional linkage of gene pairs was assessed based on multiple data sources using the na&#239;ve Bayesian classifier. The result was then used to construct a candidate network edge reservoir, in which the number of replicate edges between each gene pair was proportional to their functional linkage probability. During simulation, a new candidate network structure was formed at each iteration by sampling from this reservoir. Since the edges of gene pairs with stronger functional linkage had more copies in the reservoir, these biologically meaningful edges received preferential treatment in network simulation. With both the simulated and real gene expression data, we demonstrated that incorporating the prior knowledge significantly improved the network modeling performance. More information about the gene interaction network could be extracted from the microarray data with higher accuracy. In contrast, in all datasets, without the prior knowledge, although the number of benchmark regulations recovered was higher than with random selection, the improvement was not statistically significant, demonstrating the necessity of supplementing the gene expression data with additional information. The finding that the plain BN did not perform significantly better than random selection was not unexpected; similar observations were recently reported for a number of publicly available reverse-engineering algorithms when gene expression data is the sole source of information <abbrgrp>
<abbr bid="B47">47</abbr>
</abbrgrp>.</p>
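<p>The edge reservoir described above can be illustrated with the following minimal sketch. It is not the authors' implementation: the function names, the <it>scale</it> parameter that sets how many copies a probability maps to, and the toy linkage probabilities are all assumptions made for illustration.</p>

```python
import random

def build_edge_reservoir(linkage_prob, scale=100):
    """Replicate each candidate edge in proportion to its linkage probability.

    linkage_prob: dict mapping (regulator, target) to a probability in [0, 1].
    scale: hypothetical resolution; an edge with probability p receives
           round(p * scale) copies in the reservoir.
    """
    reservoir = []
    for edge, prob in sorted(linkage_prob.items()):
        reservoir.extend([edge] * round(prob * scale))
    return reservoir

def propose_edge(reservoir, rng):
    """Draw one candidate edge; well-supported edges are drawn more often."""
    return rng.choice(reservoir)

# Hypothetical linkage probabilities for two candidate edges.
toy_linkage = {("geneA", "geneB"): 0.8, ("geneA", "geneC"): 0.2}
reservoir = build_edge_reservoir(toy_linkage)
rng = random.Random(0)
draws = [propose_edge(reservoir, rng) for _ in range(10000)]
print(draws.count(("geneA", "geneB")) / len(draws))  # close to 0.8
```

<p>In a full simulation each drawn edge would be used to propose a modified network structure, which is then accepted or rejected according to its score on the expression data; that step is omitted here.</p>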
<p>Our algorithm provides a practical way to integrate the probabilistic biological knowledge that is different from previous efforts by others <abbrgrp>
<abbr bid="B2">2</abbr>
</abbrgrp>. Its quantitative nature makes it capable of handling soft constraints. Compared with the approach by Werhli and Husmeier, for instance <abbrgrp>
<abbr bid="B22">22</abbr>
<abbr bid="B23">23</abbr>
</abbrgrp>, we differ in several key steps. First, they encode multiple sources of prior knowledge in a weighted sum via an energy function; we integrate information from multiple sources through a Bayesian classifier. Furthermore, in our approach the MCMC samples from a candidate edge distribution defined by the prior knowledge, rather than from the network posterior distribution where the network prior is defined by the prior knowledge. Our algorithm utilizes the prior knowledge at the interaction level, while theirs does so at the network level. Finally, the Werhli and Husmeier approach is more computationally intensive. To reduce the computational complexity, they sum over all parent configurations of each node and limit the number of parents of each node to 3 or fewer; the complexity of this operation is <inline-formula>
<m:math name="1471-2105-12-359-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mrow>
      <m:mo>(</m:mo>
      <m:mrow>
         <m:mtable>
            <m:mtr>
               <m:mtd>
                  <m:mrow>
                     <m:mi>N</m:mi>
                     <m:mo>&#8722;</m:mo>
                     <m:mn>1</m:mn>
                  </m:mrow>
               </m:mtd>
            </m:mtr>
            <m:mtr>
               <m:mtd>
                  <m:mi>m</m:mi>
               </m:mtd>
            </m:mtr>
         </m:mtable>
      </m:mrow>
      <m:mo>)</m:mo>
   </m:mrow>
</m:mrow>
</m:math>
</inline-formula> (where N is the size of the network and m is the maximum fan-in) <abbrgrp>
<abbr bid="B23">23</abbr>
</abbrgrp>. We find that it is still memory-consuming for networks of moderate or large size. For instance, a Dell Optiplex 755 with a 2 GHz Duo CPU and 3.25 GB of RAM ran out of memory when simulating the 107-gene yeast network. Our algorithm does not have this problem.</p>
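<p>To give a sense of scale, the short sketch below evaluates the binomial coefficient quoted above for the 107-gene yeast network with a maximum fan-in of 3. The function name is hypothetical, and the count covers only parent sets of exactly m genes, as in the quoted expression.</p>

```python
import math

def parent_set_count(n_genes, max_fan_in):
    """Number of parent sets of size max_fan_in for one node: C(N - 1, m)."""
    return math.comb(n_genes - 1, max_fan_in)

# Values taken from the text: the 107-gene yeast network, at most 3 parents.
per_node = parent_set_count(107, 3)
print(per_node)        # 192920 parent sets of size 3 for each node
print(107 * per_node)  # 20642440 when summed over all 107 nodes
```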
<p>We used two sources of prior evidence of functional linkage to assist network modeling: PubMed co-citation and GO semantic similarity. However, our framework by design allows the integration of other types of data or knowledge, for instance, high-throughput genomic data including PPI and ChIP-chip; gene-gene relationships derived from advanced methods including text mining <abbrgrp>
<abbr bid="B53">53</abbr>
</abbrgrp>, database curation, and computational modeling of sequence information; and many other sources. It has been demonstrated that the degree of improvement brought in by prior knowledge highly depends on the quality of the information being added <abbrgrp>
<abbr bid="B54">54</abbr>
</abbrgrp>. Low quality prior knowledge could even lower the performance of BN <abbrgrp>
<abbr bid="B54">54</abbr>
</abbrgrp>. Presently, most available sources of prior knowledge are, on their own, incomplete and suffer from a high false positive rate, which can limit their efficacy in network modeling. Integrating data from different sources and utilizing their consensus provides an effective means of dealing with this issue <abbrgrp>
<abbr bid="B1">1</abbr>
<abbr bid="B2">2</abbr>
</abbrgrp>. A caveat is that, when more sources of data are considered, the inter-dependency among them needs to be scrutinized more carefully, and a more sophisticated integration method than the na&#239;ve Bayesian classifier may be needed.</p>
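<p>The independence assumption behind this caveat can be made concrete with a minimal sketch of na&#239;ve Bayesian evidence integration. It is an illustration under stated assumptions, not the authors' code: the function name, the 10% prior and the likelihood ratios are hypothetical.</p>

```python
def naive_bayes_posterior(prior, likelihood_ratios):
    """Posterior probability of functional linkage from independent evidence.

    prior: prior probability that a gene pair is functionally linked.
    likelihood_ratios: one ratio P(evidence | linked) / P(evidence | not linked)
                       per data source. The naive assumption is that sources
                       are conditionally independent, so the ratios multiply.
    """
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)

# Hypothetical numbers: a 10% prior and two supporting evidence sources,
# each three times more likely for linked than for unlinked pairs.
print(naive_bayes_posterior(0.1, [3.0, 3.0]))
```

<p>If the two sources were strongly correlated, multiplying their ratios would count the same evidence twice and overstate the posterior, which is why inter-dependency matters as more sources are added.</p>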
<p>A number of different approaches have been developed to integrate multiple sources of prior information in the BN modeling of gene expression data, at the different steps of the simulation process <abbrgrp>
<abbr bid="B4">4</abbr>
<abbr bid="B11">11</abbr>
<abbr bid="B12">12</abbr>
<abbr bid="B13">13</abbr>
<abbr bid="B14">14</abbr>
</abbrgrp>. It would be of interest to compare the efficiency of the different approaches, to investigate whether the optimal approach depends on the type of prior knowledge, and to determine whether the different approaches can be combined for the most efficient utilization of prior knowledge in network modeling.</p>
</sec>
<sec>
<st>
<p>Conclusion</p>
</st>
<p>In this paper we proposed a new algorithm to integrate and utilize the prior biological knowledge in the BN modeling of gene expression data. Our study demonstrated that incorporating prior knowledge at the step of network structure simulation is an efficient way to preserve the quantitative information in it, and to improve the performance of network modeling.</p>
</sec>
<sec>
<st>
<p>Methods</p>
</st>
<sec>
<st>
<p>Preparation of gene expression data for algorithm validation</p>
</st>
<sec>
<st>
<p>Simulated data</p>
</st>
<p>The simulated time course gene expression dataset was generated using SynTReN <abbrgrp>
<abbr bid="B46">46</abbr>
</abbrgrp> for an artificial network with 76 genes, of which 24 act as regulators with a total of 124 regulatory relationships (<it>i.e</it>. 124 edges). The total number of time points is 50. All parameters of SynTReN were set to default values <abbrgrp>
<abbr bid="B46">46</abbr>
</abbrgrp>, except the number of correlated inputs, which was set to 50%. The topological structure and internal interaction relationships are sampled from the characteristics of the yeast transcriptional network; therefore, the results should be indicative of the algorithm's performance on real data.</p>
</sec>
<sec>
<st>
<p>Yeast cell cycle study</p>
</st>
<p>Yeast cell cycle gene expression data were downloaded from <url>http://genome-www.stanford.edu/cellcycle/</url>. These studies <abbrgrp>
<abbr bid="B48">48</abbr>
<abbr bid="B55">55</abbr>
</abbrgrp> profiled expression changes in 6178 genes at ~20 time points under each condition: following alpha factor arrest (18 time points from 0-119 minutes), elutriation (ELU; 14 time points from 0-390 minutes), and arrest of a cdc15 (24 time points from 10-290 minutes) and a cdc28 (28 time points from 0-160 minutes) temperature-sensitive mutant. Many genes have missing data points. The cdc28 data are the most severely affected: ~80% of genes contain at least one missing value. For the remaining three datasets, the proportion ranged from 6% to 27%. In this study, we chose the cdc15 dataset, as it contains the largest number of time points of the three <abbrgrp>
<abbr bid="B56">56</abbr>
</abbrgrp>. Network modeling was performed on the 107 known cell cycle genes <abbrgrp>
<abbr bid="B57">57</abbr>
</abbrgrp>. The list is given in Table <tblr tid="T7">7</tblr>. These are the genes that are most likely to have interesting interactions during the time course being studied.</p>
<tbl id="T7"><title><p>Table 7</p></title><caption><p>The 107 Yeast cell cycle genes that were simulated for their network structure</p></caption><tblbdy cols="5">
      <r>
         <c ca="left">
            <p>ACE2 (850822)</p>
         </c>
         <c ca="left">
            <p>CLB6 (853003)</p>
         </c>
         <c ca="left">
            <p>HHF2 (855701)</p>
         </c>
         <c ca="left">
            <p>MSH6 (851671)</p>
         </c>
         <c ca="left">
            <p>RFA3 (853266)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>AGA1 (855780)</p>
         </c>
         <c ca="left">
            <p>CLN1 (855239)</p>
         </c>
         <c ca="left">
            <p>HHT1 (852295)</p>
         </c>
         <c ca="left">
            <p>MST1 (853640)</p>
         </c>
         <c ca="left">
            <p>RME1 (852935)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>ASE1 (854223)</p>
         </c>
         <c ca="left">
            <p>CLN2 (855819)</p>
         </c>
         <c ca="left">
            <p>HHT1 (855700)</p>
         </c>
         <c ca="left">
            <p>NDD1 (854554)</p>
         </c>
         <c ca="left">
            <p>RNR1 (856801)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>ASF1 (853327)</p>
         </c>
         <c ca="left">
            <p>CLN3 (851191)</p>
         </c>
         <c ca="left">
            <p>HHT2 (852295)</p>
         </c>
         <c ca="left">
            <p>NUM1 (851727)</p>
         </c>
         <c ca="left">
            <p>RNR3 (854744)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>ASF2 (851330)</p>
         </c>
         <c ca="left">
            <p>CTS1 (850992)</p>
         </c>
         <c ca="left">
            <p>HHT2 (855700)</p>
         </c>
         <c ca="left">
            <p>PCL1 (855427)</p>
         </c>
         <c ca="left">
            <p>SED1 (851649)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>ASH1 (853650)</p>
         </c>
         <c ca="left">
            <p>CWP1 (853766)</p>
         </c>
         <c ca="left">
            <p>HO (851371)</p>
         </c>
         <c ca="left">
            <p>PCL2 (851430)</p>
         </c>
         <c ca="left">
            <p>SIC1 (850768)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>CDC14 (850585)</p>
         </c>
         <c ca="left">
            <p>CWP2 (853765)</p>
         </c>
         <c ca="left">
            <p>HSL1 (853760)</p>
         </c>
         <c ca="left">
            <p>PCL9 (851375)</p>
         </c>
         <c ca="left">
            <p>SPC42 (853824)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>CDC20 (852762)</p>
         </c>
         <c ca="left">
            <p>DBF2 (852984)</p>
         </c>
         <c ca="left">
            <p>HTA1 (851811)</p>
         </c>
         <c ca="left">
            <p>PDS1 (851691)</p>
         </c>
         <c ca="left">
            <p>SPO12 (856557)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>CDC21 (854241)</p>
         </c>
         <c ca="left">
            <p>DBF4 (851623)</p>
         </c>
         <c ca="left">
            <p>HTA2 (852283)</p>
         </c>
         <c ca="left">
            <p>PMS1 (855642)</p>
         </c>
         <c ca="left">
            <p>SST2 (851173)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>CDC45 (850793)</p>
         </c>
         <c ca="left">
            <p>DPB2 (856305)</p>
         </c>
         <c ca="left">
            <p>HTB1 (851810)</p>
         </c>
         <c ca="left">
            <p>POL1 (855621)</p>
         </c>
         <c ca="left">
            <p>STE2 (850518)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>CDC5 (855013)</p>
         </c>
         <c ca="left">
            <p>DPB3 (852580)</p>
         </c>
         <c ca="left">
            <p>HTB2 (852284)</p>
         </c>
         <c ca="left">
            <p>POL12 (852245)</p>
         </c>
         <c ca="left">
            <p>SWE1 (853252)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>CDC6 (853244)</p>
         </c>
         <c ca="left">
            <p>EGT2 (855389)</p>
         </c>
         <c ca="left">
            <p>KAR3 (856263)</p>
         </c>
         <c ca="left">
            <p>POL2 (855459)</p>
         </c>
         <c ca="left">
            <p>SWI4 (856847)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>CDC8 (853520)</p>
         </c>
         <c ca="left">
            <p>FAR1 (853283)</p>
         </c>
         <c ca="left">
            <p>KAR4 (850303)</p>
         </c>
         <c ca="left">
            <p>POL30 (852385)</p>
         </c>
         <c ca="left">
            <p>SWI5 (851724)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>CDC9 (851391)</p>
         </c>
         <c ca="left">
            <p>FKH1 (854675)</p>
         </c>
         <c ca="left">
            <p>KIN3 (851273)</p>
         </c>
         <c ca="left">
            <p>PRI1 (854825)</p>
         </c>
         <c ca="left">
            <p>SWI6 (850879)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>CHS1 (855529)</p>
         </c>
         <c ca="left">
            <p>FKH2 (855656)</p>
         </c>
         <c ca="left">
            <p>KRE6 (856287)</p>
         </c>
         <c ca="left">
            <p>PRI2 (853821)</p>
         </c>
         <c ca="left">
            <p>TEC1 (852377)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>CHS3 (852311)</p>
         </c>
         <c ca="left">
            <p>FKS1 (851055)</p>
         </c>
         <c ca="left">
            <p>MBP1 (851503)</p>
         </c>
         <c ca="left">
            <p>PSA1 (851504)</p>
         </c>
         <c ca="left">
            <p>TIP1 (852359)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>CIK1 (855238)</p>
         </c>
         <c ca="left">
            <p>FUS1 (850330)</p>
         </c>
         <c ca="left">
            <p>MCD1 (851561)</p>
         </c>
         <c ca="left">
            <p>RAD17 (854550)</p>
         </c>
         <c ca="left">
            <p>TIR1 (856729)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>CLB1 (853002)</p>
         </c>
         <c ca="left">
            <p>GAS1 (855355)</p>
         </c>
         <c ca="left">
            <p>MCM1 (855060)</p>
         </c>
         <c ca="left">
            <p>RAD27 (853747)</p>
         </c>
         <c ca="left">
            <p>UNG1 (854987)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>CLB2 (856236)</p>
         </c>
         <c ca="left">
            <p>GIC2 (851904)</p>
         </c>
         <c ca="left">
            <p>MFA2 (855577)</p>
         </c>
         <c ca="left">
            <p>RAD51 (856831)</p>
         </c>
         <c ca="left">
            <p>YRO2 (852343)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>CLB3 (851400)</p>
         </c>
         <c ca="left">
            <p>HHF1 (852294)</p>
         </c>
         <c ca="left">
            <p>MNN1 (856718)</p>
         </c>
         <c ca="left">
            <p>RAD54 (852713)</p>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>CLB4 (850907)</p>
         </c>
         <c ca="left">
            <p>HHF1 (855701)</p>
         </c>
         <c ca="left">
            <p>MOB1 (854700)</p>
         </c>
         <c ca="left">
            <p>RFA1 (851266)</p>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>CLB5 (856237)</p>
         </c>
         <c ca="left">
            <p>HHF2 (852294)</p>
         </c>
         <c ca="left">
            <p>MSH2 (854063)</p>
         </c>
         <c ca="left">
            <p>RFA2 (855404)</p>
         </c>
         <c>
            <p/>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>The corresponding gene IDs are given in parentheses.</p>
   </tblfn></tbl>
</sec>
<sec>
<st>
<p>Mouse pancreas development and regeneration after damage</p>
</st>
<p>The pancreas development and growth expression data were downloaded from the RNA Abundance Database <url>http://www.cbil.upenn.edu/RAD</url>, with study IDs 2 and 1790. Study 2 profiled mouse pancreas gene expression at six different developmental time points: embryonic day 14.5, 16.5, 18.5, at birth, at postnatal day 7, and at adulthood. There were 4 samples at E14.5 and 6 at each of the following time points, totaling 34 samples. Study 1790 profiled gene expression in mouse pancreas following partial pancreatectomy and Exendin-4 treatment. Exendin-4 is a glucagon-like peptide-1 receptor agonist that augments the pancreatic islet beta-cell mass by increasing beta-cell neogenesis and proliferation and by reducing apoptosis. Mice underwent 50% pancreatectomy or sham operation, and received Exendin-4 or vehicle every 24 hours. Three to four animals from each group were sacrificed at each time point of 12, 24 and 48 hr after operation, together with 4 animals that received no operation, totaling 46 samples. Because each of the two studies contains only a few time points, we combined their data for network modeling <abbrgrp>
<abbr bid="B58">58</abbr>
</abbrgrp>. Replicate samples under the same condition at the same time point were averaged.</p>
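<p>The replicate-averaging step can be sketched as follows. This is a minimal illustration under assumptions: the tuple layout of the input, the function name and the toy expression values are hypothetical and do not come from the two studies.</p>

```python
from collections import defaultdict

def average_replicates(samples):
    """Average expression vectors that share the same (condition, time point).

    samples: list of (condition, time_point, expression_values) tuples, where
             expression_values is a list with one entry per gene.
    """
    groups = defaultdict(list)
    for condition, time_point, values in samples:
        groups[(condition, time_point)].append(values)
    averaged = {}
    for key, replicates in groups.items():
        # zip(*replicates) regroups the values gene by gene.
        averaged[key] = [sum(gene) / len(gene) for gene in zip(*replicates)]
    return averaged

# Hypothetical toy values for two genes, not data from the studies.
toy = [
    ("development", "E14.5", [1.0, 4.0]),
    ("development", "E14.5", [3.0, 6.0]),
    ("development", "E16.5", [2.0, 2.0]),
]
print(average_replicates(toy))
```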
<p>The network modeling was performed on 36 genes manually collected from a recent review by Servitja and Ferrer <abbrgrp>
<abbr bid="B52">52</abbr>
</abbrgrp>, which are known to be important in pancreas development. They are listed in Table <tblr tid="T8">8</tblr>.</p>
<tbl id="T8"><title><p>Table 8</p></title><caption><p>The 36 mouse genes chosen to reconstruct interaction networks during pancreas development and growth</p></caption><tblbdy cols="4">
      <r>
         <c ca="left">
            <p>Acvr1 (11477)</p>
         </c>
         <c ca="left">
            <p>Hes1 (15205)</p>
         </c>
         <c ca="left">
            <p>Nfe2l2 (18024)</p>
         </c>
         <c ca="left">
            <p>Pdx1 (18609)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Anxa4 (11746)</p>
         </c>
         <c ca="left">
            <p>Hnf4a (15378)</p>
         </c>
         <c ca="left">
            <p>Nfkb1 (18033)</p>
         </c>
         <c ca="left">
            <p>Pklr (18770)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Bmp2 (12156)</p>
         </c>
         <c ca="left">
            <p>Iapp (15874)</p>
         </c>
         <c ca="left">
            <p>Nfkbia (18035)</p>
         </c>
         <c ca="left">
            <p>Ppib (19035)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Cbfb (12400)</p>
         </c>
         <c ca="left">
            <p>Ins2 (16334)</p>
         </c>
         <c ca="left">
            <p>Nkx2-2 (18088)</p>
         </c>
         <c ca="left">
            <p>Psen2 (19165)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Chuk (12675)</p>
         </c>
         <c ca="left">
            <p>Isl1 (16392)</p>
         </c>
         <c ca="left">
            <p>Npm1 (18148)</p>
         </c>
         <c ca="left">
            <p>Rps4x (20102)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Cryz (12972)</p>
         </c>
         <c ca="left">
            <p>Mafb (16658)</p>
         </c>
         <c ca="left">
            <p>Nr5a2 (26424)</p>
         </c>
         <c ca="left">
            <p>Slc2a2 (20526)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Foxa2 (15376)</p>
         </c>
         <c ca="left">
            <p>Myo6 (17920)</p>
         </c>
         <c ca="left">
            <p>Nrp1 (18186)</p>
         </c>
         <c ca="left">
            <p>Stat3 (20848)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Foxa3 (15377)</p>
         </c>
         <c ca="left">
            <p>Nckap1 (50884)</p>
         </c>
         <c ca="left">
            <p>Onecut1 (15379)</p>
         </c>
         <c ca="left">
            <p>Tcf1 (21405)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Gck (103988)</p>
         </c>
         <c ca="left">
            <p>Neurog3 (11925)</p>
         </c>
         <c ca="left">
            <p>Pax6 (18508)</p>
         </c>
         <c ca="left">
            <p>Ugcg (22234)</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Gene IDs are given in parentheses.</p>
   </tblfn></tbl>
</sec>
<sec>
<st>
<p>Discretization of gene expression data</p>
</st>
<p>Expression data were further discretized into three levels. In each data set, we calculated the mean (&#956;) and standard deviation (SD) of expression across all time points for each gene. Each expression value was then assigned a level of 0, 1 or 2 according to whether it was below &#956;-SD, between &#956;-SD and &#956;+SD, or above &#956;+SD.</p>
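<p>The three-level rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the original implementation: the function name discretize_three_levels is hypothetical, the population SD is used because the text does not say which SD estimator was applied, and values lying exactly on a boundary are assigned to the middle level.</p>

```python
from statistics import mean, pstdev


def discretize_three_levels(values):
    """Map one gene's expression values (one per time point) to 0, 1 or 2.

    0: below mu - SD, 1: between mu - SD and mu + SD, 2: above mu + SD.
    """
    mu = mean(values)
    # Assumption: population SD. The source text does not specify the estimator.
    sd = pstdev(values)
    low, high = mu - sd, mu + sd
    levels = []
    for v in values:
        if v > high:
            levels.append(2)
        elif low > v:
            levels.append(0)
        else:
            # Includes values exactly equal to a boundary (an assumption).
            levels.append(1)
    return levels


# Made-up profile: mean 5, SD about 3.16, so only 0 and 10 leave the middle band.
print(discretize_three_levels([0.0, 5.0, 5.0, 5.0, 10.0]))  # [0, 1, 1, 1, 2]
```

<p>Because the thresholds are computed per gene, a gene with constant expression has SD equal to zero and all of its values fall in the middle level.</p>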
</sec>
<sec>
<st>
<p>Prior data on gene interactions and transcription factor binding</p>
</st>
<p>Annotations of known yeast gene interactions were downloaded from the Biomolecular Interaction Network Database (BIND, <url>http://bind.ca</url>), a database designed to store full descriptions of interactions, molecular complexes and pathways <abbrgrp>
<abbr bid="B49">49</abbr>
</abbrgrp>. BIND includes both directed (such as protein-DNA) and undirected (such as protein-protein) interactions. Therefore, when comparing with BIND annotations, we ignored direction.</p>
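<p>One way to ignore direction in such a comparison is to treat every edge as an unordered pair of genes. The Python sketch below illustrates this idea; the function names and the example gene labels are hypothetical, and the sketch is not the comparison code used in the study.</p>

```python
def undirected(edges):
    """Convert (source, target) pairs into a set of unordered gene pairs."""
    return {frozenset(edge) for edge in edges}


def count_supported_edges(predicted, annotated):
    """Count predicted edges that match an annotated interaction,
    regardless of the direction recorded in either list."""
    return len(undirected(predicted).intersection(undirected(annotated)))


# Placeholder gene labels, for illustration only.
predicted = [("geneA", "geneB"), ("geneC", "geneD")]
annotated = [("geneB", "geneA"), ("geneD", "geneE")]
print(count_supported_edges(predicted, annotated))  # 1
```

<p>With this representation, geneA to geneB and geneB to geneA count as the same interaction, so a predicted edge is scored as supported whichever direction the annotation records.</p>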
<p>Simon <it>et al. </it>studied the transcriptional regulation of yeast genes by nine cell cycle-regulating transcription factors (TFs): Fkh1, Fkh2, Ndd1, Mcm1, Ace2, Swi5, Mbp1, Swi4, and Swi6, using ChIP-chip technology <abbrgrp>
<abbr bid="B45">45</abbr>
</abbrgrp>. These nine TFs are among the 107 cell cycle genes on which we performed network modeling. The data were downloaded from <url>http://staffa.wi.mit.edu/cgi-bin/young_public/navframe.cgi?s=17&amp;f=downloaddata</url>. For each TF, the study derived a binding p-value for each gene; the smaller the p-value, the stronger the evidence that the TF binds to the promoter of that gene. For each TF, we constructed a positive control target set consisting of genes with <it>p </it>&lt; 0.001 and a negative control target set consisting of genes with <it>p </it>&gt; 0.1. Note that the transcription factor binding data provide directed information.</p>
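<p>The construction of the two control sets for a single TF can be sketched in Python as follows. The function name control_target_sets, the dictionary layout, and the example p-values are assumptions made for illustration; only the two cutoffs (0.001 and 0.1) come from the text above.</p>

```python
def control_target_sets(pvalues, pos_cutoff=0.001, neg_cutoff=0.1):
    """Split the genes tested against one TF into control target sets.

    pvalues: dict mapping gene name to the binding p-value for this TF
             (hypothetical layout, not the format of the downloaded files).
    Returns (positives, negatives). Genes with p-values between the two
    cutoffs belong to neither set.
    """
    positives = {gene for gene, p in pvalues.items() if pos_cutoff > p}
    negatives = {gene for gene, p in pvalues.items() if p > neg_cutoff}
    return positives, negatives


# Made-up p-values, for illustration only.
example = {"geneA": 0.0002, "geneB": 0.05, "geneC": 0.6}
positives, negatives = control_target_sets(example)
print(sorted(positives), sorted(negatives))  # ['geneA'] ['geneC']
```

<p>Leaving a gap between the two cutoffs keeps genes with ambiguous binding evidence out of both sets, and each resulting pair is directed from the TF to the target gene.</p>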
</sec>
</sec>
</sec>
<sec>
<st>
<p>List of abbreviations used</p>
</st>
<p>AUC: area under curve; BN: Bayesian Network; DAG: directed acyclic graph; GO: Gene Ontology; MCMC: Markov Chain Monte Carlo; PPI: protein-protein interaction; ROC: receiver operating characteristic; TF: transcription factor.</p>
</sec>
<sec>
<st>
<p>Authors' contributions</p>
</st>
<p>SG and XW designed the study. SG wrote the algorithms, performed the analysis, and created the figures and tables. SG and XW wrote the manuscript. Both authors read and approved the final manuscript.</p>
</sec>
</bdy><bm>
<ack>
<sec>
<st>
<p>Acknowledgements</p>
</st>
<p>This work was supported in part by National Institute of Diabetes and Digestive and Kidney Diseases Grant R01DK080100 (XW).</p>
</sec>
</ack>
<refgrp><bibl id="B1"><title><p>A probabilistic view of gene function</p></title><aug><au><snm>Fraser</snm><fnm>AGME</fnm></au></aug><source>Nat Genet</source><pubdate>2004</pubdate><volume>6</volume><fpage>559</fpage><lpage>564</lpage></bibl><bibl id="B2"><title><p>A Probabilistic Functional Network of Yeast Genes</p></title><aug><au><snm>Lee</snm><fnm>IDS</fnm></au><au><snm>Adai</snm><fnm>AT</fnm></au><au><snm>Marcotte</snm><fnm>EM</fnm></au></aug><source>Science</source><pubdate>2004</pubdate><volume>306</volume><fpage>1555</fpage><lpage>1558</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1099511</pubid><pubid idtype="pmpid" link="fulltext">15567862</pubid></pubidlist></xrefbib></bibl><bibl id="B3"><title><p>An effective structure learning method for constructing gene networks</p></title><aug><au><snm>Xue-wen Chen</snm><fnm>GAaXW</fnm></au></aug><source>Bioinformatics</source><pubdate>2006</pubdate><volume>22</volume><fpage>1367</fpage><lpage>1374</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btl090</pubid><pubid idtype="pmpid" link="fulltext">16543279</pubid></pubidlist></xrefbib></bibl><bibl id="B4"><title><p>Combining Microarrays and Biological Knowledge for Estimating Gene Networks via Bayesian Networks</p></title><aug><au><snm>Imoto</snm><fnm>SHT</fnm></au><au><snm>Goto</snm><fnm>T</fnm></au><au><snm>Tashiro</snm><fnm>K</fnm></au><au><snm>Kuhara</snm><fnm>S</fnm></au><au><snm>Miyano</snm><fnm>S</fnm></au></aug><source>J Bioinform Comput Biol</source><pubdate>2004</pubdate><volume>2</volume><fpage>77</fpage><lpage>98</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1142/S021972000400048X</pubid><pubid idtype="pmpid" link="fulltext">15272434</pubid></pubidlist></xrefbib></bibl><bibl id="B5"><title><p>Quantitative quality control of microarray experiments: toward accurate gene expression measurements</p></title><aug><au><snm>Wang</snm><fnm>X</fnm></au><au><snm>Hessner</snm><fnm>MJ</fnm></au></aug><source>Gene expression 
profiling by microarrays - clinical implications</source><publisher>HW: Cambridge</publisher><editor>K</editor><pubdate>2006</pubdate></bibl><bibl id="B6"><title><p>A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)</p></title><aug><au><snm>Troyanskaya</snm><fnm>OG</fnm></au><au><snm>Dolinski</snm><fnm>K</fnm></au><au><snm>Owen</snm><fnm>AB</fnm></au><au><snm>Altman</snm><fnm>RB</fnm></au><au><snm>Botstein</snm><fnm>D</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>2003</pubdate><volume>100</volume><fpage>8348</fpage><lpage>8353</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.0832373100</pubid><pubid idtype="pmcid">166232</pubid><pubid idtype="pmpid" link="fulltext">12826619</pubid></pubidlist></xrefbib></bibl><bibl id="B7"><title><p>Diagnosis and clinical management of spinal muscular atrophy</p></title><aug><au><snm>Han</snm><fnm>JJ</fnm></au><au><snm>McDonald</snm><fnm>CM</fnm></au></aug><source>Phys Med Rehabil Clin N Am</source><pubdate>2008</pubdate><volume>19</volume><fpage>661</fpage><lpage>680</lpage><note>xii</note><xrefbib><pubid idtype="pmpid" link="fulltext">18625423</pubid></xrefbib></bibl><bibl id="B8"><title><p>Using Bayesian networks to analyze expression data</p></title><aug><au><snm>Friedman</snm><fnm>N</fnm></au><au><snm>Linial</snm><fnm>M</fnm></au><au><snm>Nachman</snm><fnm>I</fnm></au><au><snm>Pe&apos;er</snm><fnm>D</fnm></au></aug><source>J Comput Biol</source><pubdate>2000</pubdate><volume>7</volume><fpage>601</fpage><lpage>620</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1089/106652700750050961</pubid><pubid idtype="pmpid" link="fulltext">11108481</pubid></pubidlist></xrefbib></bibl><bibl id="B9"><title><p>A tutorial on learning with Bayesian networks</p></title><aug><au><snm>Heckerman</snm><fnm>D</fnm></au></aug><source>Learning in Graphical Models</source><publisher>Kluwer, Dordrecht</publisher><editor>Jordan 
MI</editor><pubdate>1998</pubdate></bibl><bibl id="B10"><title><p>A bayesian method for the induction of probabilistic networks from data</p></title><aug><au><snm>Cooper</snm><fnm>GF</fnm></au><au><snm>Herskovits</snm><fnm>EA</fnm></au></aug><source>Machine Learning</source><pubdate>1992</pubdate><volume>9</volume><fpage>309</fpage><lpage>347</lpage></bibl><bibl id="B11"><title><p>Using prior knowledge to improve genetic network reconstruction from microarray data</p></title><aug><au><snm>Le Phillip</snm><fnm>P</fnm></au><au><snm>Bahl</snm><fnm>A</fnm></au><au><snm>Ungar</snm><fnm>LH</fnm></au></aug><source>In Silico Biol</source><pubdate>2004</pubdate><volume>4</volume><fpage>335</fpage><lpage>353</lpage><xrefbib><pubid idtype="pmpid">15724284</pubid></xrefbib></bibl><bibl id="B12"><title><p>Literature-based priors for gene regulatory networks</p></title><aug><au><snm>Steele</snm><fnm>E</fnm></au><au><snm>Tucker</snm><fnm>A</fnm></au><au><snm>t Hoen</snm><fnm>PA</fnm></au><au><snm>Schuemie</snm><fnm>MJ</fnm></au></aug><source>Bioinformatics</source><pubdate>2009</pubdate><volume>25</volume><fpage>1768</fpage><lpage>1774</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btp277</pubid><pubid idtype="pmpid" link="fulltext">19389730</pubid></pubidlist></xrefbib></bibl><bibl id="B13"><title><p>A framework for elucidating regulatory networks based on prior information and expression data</p></title><aug><au><snm>Gevaert</snm><fnm>O</fnm></au><au><snm>Van Vooren</snm><fnm>S</fnm></au><au><snm>De Moor</snm><fnm>B</fnm></au></aug><source>Ann N Y Acad Sci</source><pubdate>2007</pubdate><volume>1115</volume><fpage>240</fpage><lpage>248</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1196/annals.1407.002</pubid><pubid idtype="pmpid" link="fulltext">17925352</pubid></pubidlist></xrefbib></bibl><bibl id="B14"><title><p>Using Prior Knowledge to Improve Genetic Network Reconstruction from Microarray Data</p></title><aug><au><snm>Le 
Phillip</snm><fnm>P</fnm></au><au><snm>ABA</snm><fnm>A</fnm></au><au><snm>Ungar</snm><mi>H</mi><fnm>Lyle</fnm></au></aug><source>In Silico Biology</source><pubdate>2004</pubdate><volume>4</volume><fpage>335</fpage><lpage>353</lpage><xrefbib><pubid idtype="pmpid">15724284</pubid></xrefbib></bibl><bibl id="B15"><title><p>Combining location and expression data for principled discovery of genetic regulatory network models</p></title><aug><au><snm>Hartemink</snm><fnm>AJ</fnm></au><au><snm>Gifford</snm><fnm>DK</fnm></au><au><snm>Jaakkola</snm><fnm>TS</fnm></au><au><snm>Young</snm><fnm>RA</fnm></au></aug><source>Pac Symp Biocomput</source><pubdate>2002</pubdate><fpage>437</fpage><lpage>449</lpage></bibl><bibl id="B16"><title><p>Estimating gene networks from gene expression data by combining Bayesian network model with promoter element detection</p></title><aug><au><snm>Tamada</snm><fnm>Y</fnm></au><au><snm>Kim</snm><fnm>S</fnm></au><au><snm>Bannai</snm><fnm>H</fnm></au><au><snm>Imoto</snm><fnm>S</fnm></au><au><snm>Tashiro</snm><fnm>K</fnm></au><au><snm>Kuhara</snm><fnm>S</fnm></au><au><snm>Miyano</snm><fnm>S</fnm></au></aug><source>Bioinformatics</source><pubdate>2003</pubdate><volume>19</volume><issue>Suppl 2</issue><fpage>ii227</fpage><lpage>236</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btg1082</pubid><pubid idtype="pmpid" link="fulltext">14534194</pubid></pubidlist></xrefbib></bibl><bibl id="B17"><title><p>A statistical method to incorporate biological knowledge for generating testable novel gene regulatory interactions from microarray experiments</p></title><aug><au><snm>Larsen</snm><fnm>P</fnm></au><au><snm>Almasri</snm><fnm>E</fnm></au><au><snm>Chen</snm><fnm>G</fnm></au><au><snm>Dai</snm><fnm>Y</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2007</pubdate><volume>8</volume><fpage>317</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-8-317</pubid><pubid idtype="pmcid">2082045</pubid><pubid idtype="pmpid" 
link="fulltext">17727721</pubid></pubidlist></xrefbib></bibl><bibl id="B18"><title><p>Incorporating Literature Knowledge in Bayesian Network for Inferring Gene Networks with Gene Expression Data</p></title><aug><au><snm>Almasri</snm><fnm>E</fnm></au><au><snm>Larsen</snm><fnm>P</fnm></au><au><snm>Chen</snm><fnm>G</fnm></au><au><snm>Dai</snm><fnm>Y</fnm></au></aug><source>Proceedings of the 4th International Symposium on Bioinformatics Research and Applications</source><pubdate>2008</pubdate><volume>4983</volume><fpage>184</fpage></bibl><bibl id="B19"><title><p>Seeded Bayesian Networks: constructing genetic networks from microarray data</p></title><aug><au><snm>Djebbari</snm><fnm>A</fnm></au><au><snm>Quackenbush</snm><fnm>J</fnm></au></aug><source>BMC Syst Biol</source><pubdate>2008</pubdate><volume>2</volume><fpage>57</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1752-0509-2-57</pubid><pubid idtype="pmcid">2474592</pubid><pubid idtype="pmpid" link="fulltext">18601736</pubid></pubidlist></xrefbib></bibl><bibl id="B20"><title><p>Estimation of genetic networks and functional structures between genes by using Bayesian network and nonparametric regression</p></title><aug><au><snm>Imoto</snm><fnm>S</fnm></au><au><snm>Goto</snm><fnm>T</fnm></au><au><snm>Miyano</snm><fnm>S</fnm></au></aug><source>Pac Symp Biocomput</source><pubdate>2002</pubdate><volume>7</volume><fpage>175</fpage><lpage>186</lpage></bibl><bibl id="B21"><title><p>Combining Microarrays and Biological Knowledge for Estimating Gene Networks via Bayesian Networks</p></title><aug><au><snm>Imoto</snm><fnm>S</fnm></au><au><snm>Higuchi</snm><fnm>T</fnm></au><au><snm>Goto</snm><fnm>T</fnm></au><au><snm>Tashiro</snm><fnm>K</fnm></au><au><snm>Kuhara</snm><fnm>S</fnm></au><au><snm>Miyano</snm><fnm>S</fnm></au></aug><source>J Bioinform Comput Biol</source><pubdate>2004</pubdate><volume>2</volume><fpage>77</fpage><lpage>98</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1142/S021972000400048X</pubid><pubid idtype="pmpid" 
link="fulltext">15272434</pubid></pubidlist></xrefbib></bibl><bibl id="B22"><title><p>Bayesian Integration of Biological Prior Knowledge into the Reconstruction of Gene Regulatory Networks with Bayesian Networks</p></title><aug><au><snm>Husmeier</snm><fnm>D</fnm></au><au><snm>Werhli</snm><fnm>AV</fnm></au></aug><source>Comput Syst Bioinformatics Conf</source><pubdate>2007</pubdate><volume>6</volume><fpage>85</fpage><lpage>95</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">17951815</pubid></xrefbib></bibl><bibl id="B23"><title><p>Reconstructing gene regulatory networks with bayesian networks by combining expression data with multiple sources of prior knowledge</p></title><aug><au><snm>Werhli</snm><fnm>AV</fnm></au><au><snm>Husmeier</snm><fnm>D</fnm></au></aug><source>Stat Appl Genet Mol Biol</source><pubdate>2007</pubdate><volume>6</volume><note>Article15</note></bibl><bibl id="B24"><title><p>Combining microarrays and biological knowledge for estimating gene networks via bayesian networks</p></title><aug><au><snm>Imoto</snm><fnm>S</fnm></au><au><snm>Higuchi</snm><fnm>T</fnm></au><au><snm>Goto</snm><fnm>T</fnm></au><au><snm>Tashiro</snm><fnm>K</fnm></au><au><snm>Kuhara</snm><fnm>S</fnm></au><au><snm>Miyano</snm><fnm>S</fnm></au></aug><source>J Bioinform Comput Biol</source><pubdate>2004</pubdate><volume>2</volume><fpage>77</fpage><lpage>98</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1142/S021972000400048X</pubid><pubid idtype="pmpid" link="fulltext">15272434</pubid></pubidlist></xrefbib></bibl><bibl id="B25"><title><p>Testing MCMC algorithms with randomly generated Bayesian networks</p></title><aug><au><snm>Ide</snm><fnm>JS</fnm></au><au><snm>Cozman</snm><fnm>FG</fnm></au></aug><source>Workshop de Teses e Disserta&#231;&#245;es em IA (WTDIA2002)</source><publisher>Recife, Pernambuco, Brazil</publisher><pubdate>2002</pubdate></bibl><bibl id="B26"><title><p>The modular nature of genetic 
diseases</p></title><aug><au><snm>Oti</snm><fnm>M</fnm></au><au><snm>Brunner</snm><fnm>HG</fnm></au></aug><source>Clin Genet</source><pubdate>2007</pubdate><volume>71</volume><fpage>1</fpage><lpage>11</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">17204041</pubid></xrefbib></bibl><bibl id="B27"><title><p>A probabilistic view of gene function</p></title><aug><au><snm>Fraser</snm><fnm>AG</fnm></au><au><snm>Marcotte</snm><fnm>EM</fnm></au></aug><source>Nat Genet</source><pubdate>2004</pubdate><volume>6</volume><fpage>559</fpage><lpage>564</lpage></bibl><bibl id="B28"><title><p>A Bayesian Networks Approach for Predicting Protein-Protein Interactions from Genomic Data</p></title><aug><au><snm>Jansen</snm><fnm>R</fnm></au><au><snm>Yu</snm><fnm>H</fnm></au><au><snm>Greenbaum</snm><fnm>D</fnm></au><au><snm>Kluger</snm><fnm>Y</fnm></au><au><snm>Krogan</snm><fnm>NJ</fnm></au><au><snm>Chung</snm><fnm>S</fnm></au><au><snm>Emili</snm><fnm>A</fnm></au><au><snm>Snyder</snm><fnm>M</fnm></au><au><snm>Greenblatt</snm><fnm>JF</fnm></au><au><snm>Gerstein</snm><fnm>M</fnm></au></aug><pubdate>2003</pubdate><volume>302</volume><fpage>449</fpage><lpage>453</lpage></bibl><bibl id="B29"><title><p>Bayesian Graphical Models for Discrete Data</p></title><aug><au><snm>Madigan</snm><fnm>D</fnm></au><au><snm>York</snm><fnm>J</fnm></au><au><snm>Allard</snm><fnm>D</fnm></au></aug><source>International Statistical Review</source><pubdate>1995</pubdate><volume>63</volume><fpage>215</fpage><lpage>232</lpage><xrefbib><pubid idtype="doi">10.2307/1403615</pubid></xrefbib></bibl><bibl id="B30"><title><p>A primer on learning in Bayesian networks for computational biology</p></title><aug><au><snm>Needham</snm><fnm>CJ</fnm></au><au><snm>Bradford</snm><fnm>JR</fnm></au><au><snm>Bulpitt</snm><fnm>AJ</fnm></au><au><snm>Westhead</snm><fnm>DR</fnm></au></aug><source>PLoS Comput Biol</source><pubdate>2007</pubdate><volume>3</volume><fpage>e129</fpage><xrefbib><pubidlist><pubid 
idtype="doi">10.1371/journal.pcbi.0030129</pubid><pubid idtype="pmcid">1963499</pubid><pubid idtype="pmpid" link="fulltext">17784779</pubid></pubidlist></xrefbib></bibl><bibl id="B31"><title><p>Learning Bayesian Networks: The Combination of Knowledge and Statistical Data</p></title><aug><au><snm>Heckerman</snm><fnm>D</fnm></au><au><snm>Geiger</snm><fnm>D</fnm></au><au><snm>Chickering</snm><fnm>DM</fnm></au></aug><source>Machine Learning</source><pubdate>1995</pubdate><volume>20</volume><fpage>197</fpage><lpage>243</lpage></bibl><bibl id="B32"><title><p>Semantic search among heterogeneous biological databases based on gene ontology</p></title><aug><au><snm>Cao</snm><fnm>SL</fnm></au><au><snm>Qin</snm><fnm>L</fnm></au><au><snm>He</snm><fnm>WZ</fnm></au><au><snm>Zhong</snm><fnm>Y</fnm></au><au><snm>Zhu</snm><fnm>YY</fnm></au><au><snm>Li</snm><fnm>YX</fnm></au></aug><source>Acta Biochim Biophys Sin (Shanghai)</source><pubdate>2004</pubdate><volume>36</volume><fpage>365</fpage><lpage>370</lpage><xrefbib><pubid idtype="doi">10.1093/abbs/36.5.365</pubid></xrefbib></bibl><bibl id="B33"><title><p>The bayes net toolbox for matlab</p></title><aug><au><snm>Murphy</snm><fnm>K</fnm></au></aug><source>Computing Science and Statistics</source><pubdate>2001</pubdate><volume>33</volume></bibl><bibl id="B34"><title><p>Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks</p></title><aug><au><snm>Shannon</snm><fnm>P</fnm></au><au><snm>Markiel</snm><fnm>A</fnm></au><au><snm>Ozier</snm><fnm>O</fnm></au><au><snm>Baliga</snm><fnm>NS</fnm></au><au><snm>Wang</snm><fnm>JT</fnm></au><au><snm>Ramage</snm><fnm>D</fnm></au><au><snm>Amin</snm><fnm>N</fnm></au><au><snm>Schwikowski</snm><fnm>B</fnm></au><au><snm>Ideker</snm><fnm>T</fnm></au></aug><source>Genome Res</source><pubdate>2003</pubdate><volume>13</volume><fpage>2498</fpage><lpage>2504</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.1239303</pubid><pubid idtype="pmcid">403769</pubid><pubid 
idtype="pmpid" link="fulltext">14597658</pubid></pubidlist></xrefbib></bibl><bibl id="B35"><title><p>A Probabilistic Functional Network of Yeast Genes</p></title><aug><au><snm>Lee</snm><fnm>I</fnm></au><au><snm>Date</snm><fnm>SV</fnm></au><au><snm>Adai</snm><fnm>AT</fnm></au><au><snm>Marcotte</snm><fnm>EM</fnm></au></aug><source>Science</source><pubdate>2004</pubdate><volume>306</volume><fpage>1555</fpage><lpage>1558</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1099511</pubid><pubid idtype="pmpid" link="fulltext">15567862</pubid></pubidlist></xrefbib></bibl><bibl id="B36"><title><p>An improved, bias-reduced probabilistic functional gene network of baker's yeast, Saccharomyces cerevisiae</p></title><aug><au><snm>Lee</snm><fnm>I</fnm></au><au><snm>Li</snm><fnm>Z</fnm></au><au><snm>Marcotte</snm><fnm>EM</fnm></au></aug><source>PloS one</source><pubdate>2007</pubdate><volume>2</volume><fpage>e988</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pone.0000988</pubid><pubid idtype="pmcid">1991590</pubid><pubid idtype="pmpid" link="fulltext">17912365</pubid></pubidlist></xrefbib></bibl><bibl id="B37"><title><p>Analysis and comparison of metabolic pathway databases</p></title><aug><au><snm>Wittig</snm><fnm>U</fnm></au><au><snm>De Beuckelaer</snm><fnm>A</fnm></au></aug><source>Briefings in bioinformatics</source><pubdate>2001</pubdate><volume>2</volume><fpage>126</fpage><lpage>142</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bib/2.2.126</pubid><pubid idtype="pmpid" link="fulltext">11465731</pubid></pubidlist></xrefbib></bibl><bibl id="B38"><title><p>Reconstruction of a functional human gene network, with an application for prioritizing positional candidate genes</p></title><aug><au><snm>Franke</snm><fnm>L</fnm></au><au><snm>Bakel</snm><fnm>H</fnm></au><au><snm>Fokkens</snm><fnm>L</fnm></au><au><snm>de Jong</snm><fnm>ED</fnm></au><au><snm>Egmont-Petersen</snm><fnm>M</fnm></au><au><snm>Wijmenga</snm><fnm>C</fnm></au></aug><source>Am J 
Hum Genet</source><pubdate>2006</pubdate><volume>78</volume><fpage>1011</fpage><lpage>1025</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1086/504300</pubid><pubid idtype="pmcid">1474084</pubid><pubid idtype="pmpid" link="fulltext">16685651</pubid></pubidlist></xrefbib></bibl><bibl id="B39"><title><p>MIPS: a database for genomes and protein sequences</p></title><aug><au><snm>Mewes</snm><fnm>HW</fnm></au><au><snm>Frishman</snm><fnm>D</fnm></au><au><snm>Guldener</snm><fnm>U</fnm></au><au><snm>Mannhaupt</snm><fnm>G</fnm></au><au><snm>Mayer</snm><fnm>K</fnm></au><au><snm>Mokrejs</snm><fnm>M</fnm></au><au><snm>Morgenstern</snm><fnm>B</fnm></au><au><snm>Munsterkotter</snm><fnm>M</fnm></au><au><snm>Rudd</snm><fnm>S</fnm></au><au><snm>Weil</snm><fnm>B</fnm></au></aug><source>Nucleic acids research</source><pubdate>2002</pubdate><volume>30</volume><fpage>31</fpage><lpage>34</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/30.1.31</pubid><pubid idtype="pmcid">99165</pubid><pubid idtype="pmpid" link="fulltext">11752246</pubid></pubidlist></xrefbib></bibl><bibl id="B40"><title><p>On the Optimality of the Simple Bayesian Classifier under Zero-One Loss</p></title><aug><au><snm>Domingos</snm><fnm>P</fnm></au><au><snm>Pazzani</snm><fnm>M</fnm></au></aug><source>Machine Learning</source><pubdate>1997</pubdate><volume>29</volume><fpage>103</fpage><lpage>130</lpage><xrefbib><pubid idtype="doi">10.1023/A:1007413511361</pubid></xrefbib></bibl><bibl id="B41"><title><p>Bayesian Network Classifiers</p></title><aug><au><snm>Friedman</snm><fnm>N</fnm></au><au><snm>Geiger</snm><fnm>D</fnm></au><au><snm>Goldszmidt</snm><fnm>M</fnm></au></aug><source>Machine Learning</source><pubdate>1997</pubdate><volume>29</volume><fpage>131</fpage><lpage>163</lpage><xrefbib><pubid idtype="doi">10.1023/A:1007465528199</pubid></xrefbib></bibl><bibl id="B42"><title><p>A Bayesian networks approach for predicting protein-protein interactions from genomic 
data</p></title><aug><au><snm>Jansen</snm><fnm>R</fnm></au><au><snm>Yu</snm><fnm>H</fnm></au><au><snm>Greenbaum</snm><fnm>D</fnm></au><au><snm>Kluger</snm><fnm>Y</fnm></au><au><snm>Krogan</snm><fnm>NJ</fnm></au><au><snm>Chung</snm><fnm>S</fnm></au><au><snm>Emili</snm><fnm>A</fnm></au><au><snm>Snyder</snm><fnm>M</fnm></au><au><snm>Greenblatt</snm><fnm>JF</fnm></au><au><snm>Gerstein</snm><fnm>M</fnm></au></aug><source>Science</source><pubdate>2003</pubdate><volume>302</volume><fpage>449</fpage><lpage>453</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1087361</pubid><pubid idtype="pmpid" link="fulltext">14564010</pubid></pubidlist></xrefbib></bibl><bibl id="B43"><title><p>Global mapping of the yeast genetic interaction network</p></title><aug><au><snm>Tong</snm><fnm>AH</fnm></au><au><snm>Lesage</snm><fnm>G</fnm></au><au><snm>Bader</snm><fnm>GD</fnm></au><au><snm>Ding</snm><fnm>H</fnm></au><au><snm>Xu</snm><fnm>H</fnm></au><au><snm>Xin</snm><fnm>X</fnm></au><au><snm>Young</snm><fnm>J</fnm></au><au><snm>Berriz</snm><fnm>GF</fnm></au><au><snm>Brost</snm><fnm>RL</fnm></au><au><snm>Chang</snm><fnm>M</fnm></au><etal/></aug><source>Science</source><pubdate>2004</pubdate><volume>303</volume><fpage>808</fpage><lpage>813</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1091317</pubid><pubid idtype="pmpid" link="fulltext">14764870</pubid></pubidlist></xrefbib></bibl><bibl id="B44"><title><p>Creating the gene ontology resource: design and implementation</p></title><aug><au><snm>Consortium</snm><fnm>GO</fnm></au></aug><source>Genome Res</source><pubdate>2001</pubdate><volume>11</volume><fpage>1425</fpage><lpage>1433</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.180801</pubid><pubid idtype="pmcid">311077</pubid><pubid idtype="pmpid" link="fulltext">11483584</pubid></pubidlist></xrefbib></bibl><bibl id="B45"><title><p>Serial regulation of transcriptional regulators in the yeast cell 
cycle</p></title><aug><au><snm>Simon</snm><fnm>I</fnm></au><au><snm>Barnett</snm><fnm>J</fnm></au><au><snm>Hannett</snm><fnm>N</fnm></au><au><snm>Harbison</snm><fnm>CT</fnm></au><au><snm>Rinaldi</snm><fnm>NJ</fnm></au><au><snm>Volkert</snm><fnm>TL</fnm></au><au><snm>Wyrick</snm><fnm>JJ</fnm></au><au><snm>Zeitlinger</snm><fnm>J</fnm></au><au><snm>Gifford</snm><fnm>DK</fnm></au><au><snm>Jaakkola</snm><fnm>TS</fnm></au><au><snm>Young</snm><fnm>RA</fnm></au></aug><source>Cell</source><pubdate>2001</pubdate><volume>106</volume><fpage>697</fpage><lpage>708</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S0092-8674(01)00494-9</pubid><pubid idtype="pmpid" link="fulltext">11572776</pubid></pubidlist></xrefbib></bibl><bibl id="B46"><title><p>SynTReN: a generator of synthetic gene expression data for design and analysis of structure learning algorithms</p></title><aug><au><snm>Van den Bulcke</snm><fnm>T</fnm></au><au><snm>Van Leemput</snm><fnm>K</fnm></au><au><snm>Naudts</snm><fnm>B</fnm></au><au><snm>van Remortel</snm><fnm>P</fnm></au><au><snm>Ma</snm><fnm>H</fnm></au><au><snm>Verschoren</snm><fnm>A</fnm></au><au><snm>De Moor</snm><fnm>B</fnm></au><au><snm>Marchal</snm><fnm>K</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2006</pubdate><volume>7</volume><fpage>43</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-7-43</pubid><pubid idtype="pmcid">1373604</pubid><pubid idtype="pmpid" link="fulltext">16438721</pubid></pubidlist></xrefbib></bibl><bibl id="B47"><title><p>How to infer gene networks from expression profiles</p></title><aug><au><snm>Bansal</snm><fnm>M</fnm></au><au><snm>Belcastro</snm><fnm>V</fnm></au><au><snm>Ambesi-Impiombato</snm><fnm>A</fnm></au><au><snm>di Bernardo</snm><fnm>D</fnm></au></aug><source>Mol Syst Biol</source><pubdate>2007</pubdate><volume>3</volume><fpage>122</fpage></bibl><bibl id="B48"><title><p>Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray 
hybridization</p></title><aug><au><snm>Spellman</snm><fnm>PT</fnm></au><au><snm>Sherlock</snm><fnm>G</fnm></au><au><snm>Zhang</snm><fnm>MQ</fnm></au><au><snm>Iyer</snm><fnm>VR</fnm></au><au><snm>Anders</snm><fnm>K</fnm></au><au><snm>Eisen</snm><fnm>MB</fnm></au><au><snm>Brown</snm><fnm>PO</fnm></au><au><snm>Botstein</snm><fnm>D</fnm></au><au><snm>Futcher</snm><fnm>B</fnm></au></aug><source>Mol Biol Cell</source><pubdate>1998</pubdate><volume>9</volume><fpage>3273</fpage><lpage>3297</lpage><xrefbib><pubidlist><pubid idtype="pmcid">25624</pubid><pubid idtype="pmpid" link="fulltext">9843569</pubid></pubidlist></xrefbib></bibl><bibl id="B49"><title><p>The Biomolecular Interaction Network Database and related tools 2005 update</p></title><aug><au><snm>Alfarano</snm><fnm>C</fnm></au><au><snm>Andrade</snm><fnm>CE</fnm></au><au><snm>Anthony</snm><fnm>K</fnm></au><au><snm>Bahroos</snm><fnm>N</fnm></au><au><snm>Bajec</snm><fnm>M</fnm></au><au><snm>Bantoft</snm><fnm>K</fnm></au><au><snm>Betel</snm><fnm>D</fnm></au><au><snm>Bobechko</snm><fnm>B</fnm></au><au><snm>Boutilier</snm><fnm>K</fnm></au><au><snm>Burgess</snm><fnm>E</fnm></au><etal/></aug><source>Nucleic Acids Res</source><pubdate>2005</pubdate><volume>33</volume><fpage>D418</fpage><lpage>424</lpage><xrefbib><pubidlist><pubid idtype="pmcid">540005</pubid><pubid idtype="pmpid" link="fulltext">15608229</pubid></pubidlist></xrefbib></bibl><bibl id="B50"><title><p>Stem/progenitor cells derived from adult tissues: potential for the treatment of diabetes mellitus</p></title><aug><au><snm>Lechner</snm><fnm>A</fnm></au><au><snm>Habener</snm><fnm>JF</fnm></au></aug><source>Am J Physiol Endocrinol Metab</source><pubdate>2003</pubdate><volume>284</volume><fpage>E259</fpage><lpage>266</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">12531740</pubid></xrefbib></bibl><bibl id="B51"><title><p>Stem cell therapy for diabetes: do we need to make beta 
cells?</p></title><aug><au><snm>Burns</snm><fnm>CJ</fnm></au><au><snm>Persaud</snm><fnm>SJ</fnm></au><au><snm>Jones</snm><fnm>PM</fnm></au></aug><source>J Endocrinol</source><pubdate>2004</pubdate><volume>183</volume><fpage>437</fpage><lpage>443</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1677/joe.1.05981</pubid><pubid idtype="pmpid" link="fulltext">15590970</pubid></pubidlist></xrefbib></bibl><bibl id="B52"><title><p>Transcriptional networks controlling pancreatic development and beta cell function</p></title><aug><au><snm>Servitja</snm><fnm>JM</fnm></au><au><snm>Ferrer</snm><fnm>J</fnm></au></aug><source>Diabetologia</source><pubdate>2004</pubdate><volume>47</volume><fpage>597</fpage><lpage>613</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1007/s00125-004-1368-9</pubid><pubid idtype="pmpid" link="fulltext">15298336</pubid></pubidlist></xrefbib></bibl><bibl id="B53"><title><p>A literature network of human genes for high-throughput analysis of gene expression</p></title><aug><au><snm>Jenssen</snm><fnm>TK</fnm></au><au><snm>Laegreid</snm><fnm>A</fnm></au><au><snm>Komorowski</snm><fnm>J</fnm></au><au><snm>Hovig</snm><fnm>E</fnm></au></aug><source>Nat Genet</source><pubdate>2001</pubdate><volume>28</volume><fpage>21</fpage><lpage>28</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">11326270</pubid></xrefbib></bibl><bibl id="B54"><title><p>Analyzing the Effect of Prior Knowledge in Genetic Regulatory Network Inference</p></title><aug><au><snm>Bastos</snm><fnm>G</fnm></au><au><snm>Guimaraes</snm><fnm>KS</fnm></au></aug><source>Pattern Recognition and Machine Intelligence, Lecture Notes in Computer Science</source><pubdate>2005</pubdate><volume>3776</volume><fpage>611</fpage><lpage>616</lpage><xrefbib><pubid idtype="doi">10.1007/11590316_97</pubid></xrefbib></bibl><bibl id="B55"><title><p>A genome-wide transcriptional analysis of the mitotic cell 
cycle</p></title><aug><au><snm>Cho</snm><fnm>RJ</fnm></au><au><snm>Campbell</snm><fnm>MJ</fnm></au><au><snm>Winzeler</snm><fnm>EA</fnm></au><au><snm>Steinmetz</snm><fnm>L</fnm></au><au><snm>Conway</snm><fnm>A</fnm></au><au><snm>Wodicka</snm><fnm>L</fnm></au><au><snm>Wolfsberg</snm><fnm>TG</fnm></au><au><snm>Gabrielian</snm><fnm>AE</fnm></au><au><snm>Landsman</snm><fnm>D</fnm></au><au><snm>Lockhart</snm><fnm>DJ</fnm></au><au><snm>Davis</snm><fnm>RW</fnm></au></aug><source>Mol Cell</source><pubdate>1998</pubdate><volume>2</volume><fpage>65</fpage><lpage>73</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S1097-2765(00)80114-8</pubid><pubid idtype="pmpid" link="fulltext">9702192</pubid></pubidlist></xrefbib></bibl><bibl id="B56"><title><p>Global analysis of phase locking in gene expression during cell cycle: the potential in network modeling</p></title><aug><au><snm>Gao</snm><fnm>S</fnm></au><au><snm>Hartman</snm><fnm>J</fnm></au><au><snm>Carter</snm><fnm>JL</fnm></au><au><snm>Hessner</snm><fnm>MJ</fnm></au><au><snm>Wang</snm><fnm>X</fnm></au></aug><source>BMC Syst Biol</source><pubdate>2010</pubdate><volume>4</volume><fpage>167</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1752-0509-4-167</pubid><pubid idtype="pmcid">3017040</pubid><pubid idtype="pmpid" link="fulltext">21129191</pubid></pubidlist></xrefbib></bibl><bibl id="B57"><title><p>A genome-wide transcriptional analysis of the mitotic cell cycle</p></title><aug><au><snm>Cho</snm><fnm>RJ</fnm></au><au><snm>Campbell</snm><fnm>MJ</fnm></au><au><snm>Winzeler</snm><fnm>EA</fnm></au><au><snm>Steinmetz</snm><fnm>L</fnm></au><au><snm>Conway</snm><fnm>A</fnm></au><au><snm>Wolfsberg</snm><fnm>TG</fnm></au></aug><source>Mol Cell</source><pubdate>1998</pubdate><volume>2</volume><fpage>65</fpage><lpage>73</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S1097-2765(00)80114-8</pubid><pubid idtype="pmpid" link="fulltext">9702192</pubid></pubidlist></xrefbib></bibl><bibl id="B58"><title><p>Characterizing 
Dynamic Changes in the Human Blood Transcriptional Network</p></title><aug><au><snm>Zhu</snm><fnm>J</fnm></au><au><snm>Chen</snm><fnm>Y</fnm></au><au><snm>Leonardson</snm><fnm>AS</fnm></au><au><snm>Wang</snm><fnm>K</fnm></au><au><snm>Lamb</snm><fnm>JR</fnm></au><etal/></aug><source>PLoS Comput Biol</source><pubdate>2010</pubdate><volume>6</volume></bibl></refgrp>
</bm></art>