<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-7-269</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>An integrated approach to the prediction of domain-domain interactions</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Lee</snm>
               <fnm>Hyunju</fnm>
               <insr iid="I1"/>
               <insr iid="I3"/>
               <email>hyunjul@usc.edu</email>
            </au>
            <au id="A2">
               <snm>Deng</snm>
               <fnm>Minghua</fnm>
               <insr iid="I2"/>
               <email>dengmh@math.pku.edu.cn</email>
            </au>
            <au id="A3" ca="yes">
               <snm>Sun</snm>
               <fnm>Fengzhu</fnm>
               <insr iid="I3"/>
               <email>fsun@usc.edu</email>
            </au>
            <au id="A4" ca="yes">
               <snm>Chen</snm>
               <fnm>Ting</fnm>
               <insr iid="I3"/>
               <email>tingchen@usc.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Computer Science, University of Southern California, Los Angeles, CA 90089, USA</p>
            </ins>
            <ins id="I2">
               <p>School of Mathematical Sciences and Center for Theoretical Biology, Peking University, Beijing 100871, P.R. China</p>
            </ins>
            <ins id="I3">
               <p>Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089-2910, USA</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2006</pubdate>
         <volume>7</volume>
         <issue>1</issue>
         <fpage>269</fpage>
         <url>http://www.biomedcentral.com/1471-2105/7/269</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">16725050</pubid>
               <pubid idtype="doi">10.1186/1471-2105-7-269</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>19</day>
               <month>12</month>
               <year>2005</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>25</day>
               <month>5</month>
               <year>2006</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>25</day>
               <month>5</month>
               <year>2006</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2006</year>
         <collab>Lee et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>The development of high-throughput technologies has produced several large scale protein interaction data sets for multiple species, and significant efforts have been made to analyze the data sets in order to understand protein activities. Considering that the basic units of protein interactions are domain interactions, it is crucial to understand protein interactions at the level of the domains. The availability of many diverse biological data sets provides an opportunity to discover the underlying domain interactions within protein interactions through an integration of these biological data sets.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We combine protein interaction data sets from multiple species, molecular sequences, and gene ontology to construct a set of high-confidence domain-domain interactions. First, we propose a new measure, the expected number of interactions for each pair of domains, to score domain interactions based on protein interaction data in one species and show that its performance is similar to that of the E-value defined by Riley <it>et al</it>. <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Our new measure is applied to the protein interaction data sets from yeast, worm, fruitfly and humans. Second, information on pairs of domains that coexist in known proteins and on pairs of domains with the same gene ontology function annotations is incorporated to construct a high-confidence set of domain-domain interactions using a Bayesian approach. Finally, we evaluate the set of domain-domain interactions by comparing predicted domain interactions with those defined in the iPfam database <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>, which were derived from protein structures. The accuracy of predicted domain interactions is also confirmed by comparison with experimentally obtained domain interactions from <it>H. pylori </it><abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. As a result, a total of 2,391 high-confidence domain interactions are obtained and these domain interactions are used to unravel detailed protein and domain interactions in several protein complexes.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>Our study shows that integration of multiple biological data sets based on the Bayesian approach provides a reliable framework to predict domain interactions. By integrating multiple data sources, the coverage and accuracy of predicted domain interactions can be significantly increased.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>With the completion of genome sequences of many species, comparative analysis of these organisms becomes increasingly important in understanding the function and evolution of genes and proteins. Comparison of the genome sequences between worm and yeast has revealed that most of the core biological functions were carried out by orthologous proteins, and that the multi-cellular worm had more diverse proteins than the unicellular yeast <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. In addition, more than 50 bacterial, archaeal, and eukaryotic genomes have been analyzed for protein function prediction, phylogenetic profiling of domains, and eukaryotic-signature domain organizations <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>.</p>
         <p>The development of high-throughput technologies such as yeast two-hybrid assays has produced large scale protein interaction data sets for several species, and significant efforts have been made to analyze them. By combining protein interaction data sets and orthology information on yeast protein sequences and a bacterial pathogen, Kelley <it>et al</it>. <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> and Sharan <it>et al</it>. <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> identified conserved protein interaction pathways and complexes. Further studies on conserved protein complexes and functional modules can be found in <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>.</p>
         <p>The basic units of proteins are domains, and proteins interact with each other through their domains. Therefore, it is crucial to understand protein interactions at the level of the domains <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. Several groups have developed methods to understand domain interactions based on protein interactions. Sprinzak and Margalit <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> selected domain interaction pairs based on the frequency of observed protein interactions that contain the pair of domains over its expected value. Deng <it>et al</it>. <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> developed a maximum likelihood estimation (MLE) method and an Expectation-Maximization (EM) algorithm to infer underlying domain interactions from protein interactions. Liu <it>et al</it>. <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> extended the MLE method to combine protein interactions from multiple species, and showed that the extension resulted in a higher accuracy in predicting protein interactions than using the yeast protein interactions alone. Liu <it>et al</it>. <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> also showed that, for a single species, the approach by Deng <it>et al</it>. <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> was comparable to that of Gomez <it>et al</it>. <abbrgrp><abbr bid="B15">15</abbr></abbrgrp> and outperformed those of Sprinzak and Margalit <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> and Gomez <it>et al</it>. <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> for predicting protein interactions. More recently, Riley <it>et al</it>. <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> modified the Deng <it>et al</it>. <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> approach to be applicable to all the protein interactions in DIP <abbrgrp><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp> assuming no false positives or false negatives. 
Most importantly, they presented a new score for domain interactions, the E-score, defined as the log likelihood ratio of the observed interactions under the assumption that the domain pair interacts versus the assumption that it does not. They showed that the E-score outperformed the Deng <it>et al</it>. <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> method in predicting domain interactions. Other approaches for predicting domain interactions using multiple data sources were developed in <abbrgrp><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr></abbrgrp>. In this study, we focus on the integration of multiple data sources from multiple species to predict high-confidence domain interactions. First, we calculate the probability of domain interactions separately for four species: yeast <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>, worm <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>, fruitfly <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> and humans <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. Using these probabilities, we compute the expected number of interactions for each pair of domains within a species. Second, we investigate information on protein fusion and the domain functions. Third, a Bayesian approach is used to integrate those data sources to predict high-confidence domain interactions. These predictions help us to unravel the domain interactions in protein complexes and protein interactions. Our study differs from previous studies in several significant ways. Compared to Liu <it>et al</it>. and Ng <it>et al</it>. <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr></abbrgrp>, our approach develops a new measure to score domain-domain interactions and validates it with experimentally derived domain interactions instead of using indirect approaches such as validating re-inferred protein interactions. Compared to Riley <it>et al</it>. 
<abbrgrp><abbr bid="B1">1</abbr></abbrgrp>, protein fusion and Gene Ontology (GO) <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> functions are also integrated using a Bayesian approach. We show that the integration significantly increases the accuracy of predicted domain-domain interactions.</p>
         <p>The paper is organized as follows. In the Methods section, we present the various data sources used in our analysis, followed by the methods for integrating and analyzing the different data sources. In the Results section, we present the results based on the various data sources separately, followed by the results based on integrated analysis. We evaluate our results by comparing them with the domain-domain interactions in iPfam. Finally, we discuss limitations of our approach and directions for further study.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Data sources</p>
            </st>
            <p>In this study, we collect protein interactions and protein domain information from various databases for yeast, worm, fruitfly, and humans. Protein domain information is based on the Pfam-A domains <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. Table <tblr tid="T1">1</tblr> shows the number of proteins and protein interactions used in this study. Because only a subset of proteins contain Pfam-A domains, we use this subset along with their protein interactions in this study.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Data sets. The characteristics of protein interaction data sets for yeast, worm, fruitfly and humans, the corresponding domain information, and the values of <it>fn </it>and <it>fp </it>used in the analysis. Only protein pairs with both proteins containing Pfam-A domains are included in the protein interaction data sets, and only proteins in those protein interactions are counted. The numbers in parentheses are the total numbers of available protein interactions.</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Yeast</p>
                     </c>
                     <c ca="center">
                        <p>Worm</p>
                     </c>
                     <c ca="center">
                        <p>Fruitfly</p>
                     </c>
                     <c ca="center">
                        <p>Humans</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Proteins</p>
                     </c>
                     <c ca="center">
                        <p>2,568</p>
                     </c>
                     <c ca="center">
                        <p>1,580</p>
                     </c>
                     <c ca="center">
                        <p>2,444</p>
                     </c>
                     <c ca="center">
                        <p>3,493</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Protein-protein interactions</p>
                     </c>
                     <c ca="center">
                        <p>7,985 (15,461)</p>
                     </c>
                     <c ca="center">
                        <p>2,193 (4,030)</p>
                     </c>
                     <c ca="center">
                        <p>3,944 (20,429)</p>
                     </c>
                     <c ca="center">
                        <p>10,906 (15,274)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Domains</p>
                     </c>
                     <c ca="center">
                        <p>1,386</p>
                     </c>
                     <c ca="center">
                        <p>888</p>
                     </c>
                     <c ca="center">
                        <p>1,195</p>
                     </c>
                     <c ca="center">
                        <p>1,401</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>False Negative (fn)</p>
                     </c>
                     <c ca="center">
                        <p>0.25</p>
                     </c>
                     <c ca="center">
                        <p>0.67</p>
                     </c>
                     <c ca="center">
                        <p>0.61</p>
                     </c>
                     <c ca="center">
                        <p>0.25</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>False Positive (fp)</p>
                     </c>
                     <c ca="center">
                        <p>0.0009</p>
                     </c>
                     <c ca="center">
                        <p>0.0007</p>
                     </c>
                     <c ca="center">
                        <p>0.0005</p>
                     </c>
                     <c ca="center">
                        <p>0.0007</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <sec>
               <st>
                  <p>Protein interactions for yeast and worm</p>
               </st>
               <p>We download the protein interaction data sets for yeast and worm from the DIP database <abbrgrp><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>. Each protein is associated with a DIP number, SWISSPROT ID, GI number, etc. We use the SWISSPROT accession numbers to associate domain information from the Pfam database <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> with the proteins in the DIP. We also use the GI numbers to obtain additional Pfam domain information from the National Center for Biotechnology Information <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. For worm, the domain information collected using the GI numbers increases the number of protein interactions with domain information.</p>
            </sec>
            <sec>
               <st>
                  <p>Protein interactions for fruitfly</p>
               </st>
               <p>We obtain the protein interaction data set for fruitfly from Giot <it>et al</it>. <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. In this data set, proteins are identified by CG numbers. To obtain the relationship between proteins and domains, we associate the CG numbers with the SWISSPROT accession numbers using the protein table Integr8 in EMBL-EBI <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. The compiled SWISSPROT accession numbers are used to extract protein-domain relationships from the Pfam database.</p>
            </sec>
            <sec>
               <st>
                  <p>Protein interactions for human</p>
               </st>
               <p>We obtain the human protein interaction data set from the Human Protein Reference Database (HPRD) <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>, which contains protein-protein interactions from individual small-scale experiments published in the literature. The proteins are identified by NP numbers. We associate the NP numbers in the HPRD with the SWISSPROT accession numbers using the protein table Integr8 in EMBL-EBI, and then extract protein-domain relationships from the Pfam database.</p>
            </sec>
            <sec>
               <st>
                  <p>Domain functions</p>
               </st>
               <p>We obtain domain functions (GO biological process annotations) using the mapping table from Pfam to GO on the Gene Ontology webpage <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> and use the domains in the table to compile domain pairs with the same function.</p>
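               <p>Compiling domain pairs with the same function can be sketched as follows. This is only an illustration under stated assumptions: "the same function" is taken here to mean that two domains share at least one GO biological process term, and the function name <it>same_function_pairs</it>, the domain names, and the GO terms in the example are hypothetical placeholders rather than data from this study.</p>
               <p><![CDATA[
```python
from itertools import combinations


def same_function_pairs(domain_to_go):
    """Return unordered domain pairs that share at least one GO term.

    domain_to_go: dict mapping a domain name to a set of GO term IDs.
    Assumption: sharing one or more terms counts as "the same function".
    """
    pairs = set()
    for dom_a, dom_b in combinations(sorted(domain_to_go), 2):
        if domain_to_go[dom_a] & domain_to_go[dom_b]:
            pairs.add((dom_a, dom_b))
    return pairs


# Hypothetical Pfam-to-GO mapping (placeholder names and annotations).
example_mapping = {
    "DomA": {"GO:0006468"},
    "DomB": {"GO:0006468", "GO:0007165"},
    "DomC": {"GO:0006810"},
}
shared = same_function_pairs(example_mapping)
```
]]></p>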
            </sec>
            <sec>
               <st>
                  <p>Domain fusion</p>
               </st>
               <p>We use protein-domain information in Pfam-A to identify pairs of domains co-existing in one protein. The method is referred to as domain fusion in the rest of the paper.</p>
            </sec>
            <sec>
               <st>
                  <p>Databases of domain interactions</p>
               </st>
               <p>We use two structure-based sets of domain interactions, iPfam <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> and Protein Quaternary Structure (PQS) <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>, to estimate the reliability of predicted domain-domain interactions. iPfam contains 2,580 domain interactions (July 2004 version). The domain interactions in iPfam are obtained by calculating all bonds between all pairs of residues between domains based on the protein structures in Protein Data Bank (PDB). PQS provides probable quaternary states for structures based on PDB. PQS attempts to distinguish biologically relevant interactions from crystal packing based on known properties such as hydrophobicity, shape analysis, and the size of the solvent-accessible surface area (asa). Note that biologically relevant domain interactions and crystal contacts are not distinguished in iPfam. As domains in PQS are annotated by SCOP superfamily, we associate them with the Pfam domains using the mapping table in the SCOP webpage <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. Finally, we obtain 36,439 domain interactions.</p>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Computational methods</p>
            </st>
            <p>In this subsection, we describe (1) the computational methods for calculating the probability of domain-domain interactions, (2) a new measure to evaluate the strength of domain-domain interactions, and (3) a Bayesian method for integrating different data sources to construct a high-confidence set of domain-domain interactions.</p>
            <sec>
               <st>
                  <p>The maximum likelihood estimation for probabilities of domain-domain interactions</p>
               </st>
               <p>The maximum likelihood estimation method proposed by Deng <it>et al</it>. <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> has been shown to have good performance in estimating the probabilities of domain-domain interactions. We adopt this method in this study and briefly describe the method as follows.</p>
               <p>The basic assumption of the MLE method is that two proteins interact if and only if at least one pair of domains, one from each of the two proteins, interacts. Given two proteins <it>P</it><sub><it>i </it></sub>and <it>P</it><sub><it>j</it></sub>, the probability that they interact is</p>
               <p>
                  <graphic file="1471-2105-7-269-i1.gif"/>
               </p>
               <p>where <it>P</it><sub><it>ij </it></sub>= 1 if they interact and 0 otherwise, and <it>D</it><sub><it>mn </it></sub>&#8712; <graphic file="1471-2105-7-269-i2.gif"/><sub><it>ij </it></sub>denotes that domains <it>D</it><sub><it>m </it></sub>and <it>D</it><sub><it>n </it></sub>belong to proteins <it>P</it><sub><it>i </it></sub>and <it>P</it><sub><it>j</it></sub>, respectively, and <it>D</it><sub><it>mn </it></sub>= 1 if domain <it>D</it><sub><it>m </it></sub>interacts with domain <it>D</it><sub><it>n</it></sub>. For an experiment in a species, the false positive rate (<it>fp</it>) is defined as the probability that two non-interacting proteins were observed to interact and the false negative rate (<it>fn</it>) is defined as the probability that two truly interacting proteins were not observed to interact in the experiment. Let <it>O</it><sub><it>ij </it></sub>= 1 if the interaction between proteins <it>P</it><sub><it>i </it></sub>and <it>P</it><sub><it>j </it></sub>is observed and <it>O</it><sub><it>ij </it></sub>= 0 otherwise. Thus, the probability for the observed protein interaction is</p>
               <p>Pr(<it>O</it><sub><it>ij </it></sub>= 1) = Pr(<it>P</it><sub><it>ij </it></sub>= 1)(1 - <it>fn</it>) + (1 - Pr(<it>P</it><sub><it>ij </it></sub>= 1))<it>fp</it>. &#160;&#160;&#160; (2)</p>
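               <p>Equations 1 and 2 can be illustrated with a short Python sketch. Equation 1 is assumed here to take the form Pr(<it>P</it><sub><it>ij </it></sub>= 1) = 1 minus the product of (1 - Pr(<it>D</it><sub><it>mn </it></sub>= 1)) over the domain pairs of the two proteins, which follows from the basic assumption above if domain pairs interact independently. The function names and the example domain-pair probabilities are hypothetical; the <it>fn </it>and <it>fp </it>values are those listed for yeast in Table 1.</p>
               <p><![CDATA[
```python
from math import prod


def protein_interaction_prob(domain_pair_probs):
    """Equation 1 (assumed form): 1 - prod(1 - Pr(D_mn = 1)).

    Assumes the candidate domain pairs interact independently.
    """
    return 1.0 - prod(1.0 - p for p in domain_pair_probs)


def observed_interaction_prob(p_interact, fn, fp):
    """Equation 2: Pr(O_ij = 1) given Pr(P_ij = 1), fn and fp."""
    return p_interact * (1.0 - fn) + (1.0 - p_interact) * fp


# Hypothetical protein pair with three candidate domain pairs.
domain_pair_probs = [0.5, 0.2, 0.0]
p_ij = protein_interaction_prob(domain_pair_probs)
o_ij = observed_interaction_prob(p_ij, fn=0.25, fp=0.0009)
```
]]></p>
               <p>In this toy case a domain pair with probability 0 leaves the result unchanged, while any domain pair with probability 1 forces Pr(<it>P</it><sub><it>ij </it></sub>= 1) = 1, which mirrors the "at least one pair" assumption.</p>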
               <p>The likelihood function, that is, the probability of the whole interaction data set, is</p>
               <p>
                  <m:math name="1471-2105-7-269-i3" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>L</m:mi>
                           <m:mo>=</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munder>
                                 <m:mo>&#8719;</m:mo>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                    <m:mi>j</m:mi>
                                 </m:mrow>
                              </m:munder>
                              <m:mrow>
                                 <m:msup>
                                    <m:mrow>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>Pr</m:mi>
                                       <m:mo>&#8289;</m:mo>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:msub>
                                          <m:mi>O</m:mi>
                                          <m:mrow>
                                             <m:mi>i</m:mi>
                                             <m:mi>j</m:mi>
                                          </m:mrow>
                                       </m:msub>
                                       <m:mo>=</m:mo>
                                       <m:mn>1</m:mn>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                    <m:mrow>
                                       <m:msub>
                                          <m:mi>O</m:mi>
                                          <m:mrow>
                                             <m:mi>i</m:mi>
                                             <m:mi>j</m:mi>
                                          </m:mrow>
                                       </m:msub>
                                    </m:mrow>
                                 </m:msup>
                              </m:mrow>
                           </m:mstyle>
                           <m:msup>
                              <m:mrow>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mn>1</m:mn>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mi>Pr</m:mi>
                                 <m:mo>&#8289;</m:mo>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:msub>
                                    <m:mi>O</m:mi>
                                    <m:mrow>
                                       <m:mi>i</m:mi>
                                       <m:mi>j</m:mi>
                                    </m:mrow>
                                 </m:msub>
                                 <m:mo>=</m:mo>
                                 <m:mn>1</m:mn>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                              <m:mrow>
                                 <m:mn>1</m:mn>
                                 <m:mo>&#8722;</m:mo>
                                 <m:msub>
                                    <m:mi>O</m:mi>
                                    <m:mrow>
                                       <m:mi>i</m:mi>
                                       <m:mi>j</m:mi>
                                    </m:mrow>
                                 </m:msub>
                              </m:mrow>
                           </m:msup>
                           <m:mo>.</m:mo>
                           <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                           <m:mrow>
                              <m:mo>(</m:mo>
                              <m:mn>3</m:mn>
                              <m:mo>)</m:mo>
                           </m:mrow>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGmbatcqGH9aqpdaqeqbqaaiabcIcaOiGbccfaqjabckhaYjabcIcaOiabd+eapnaaBaaaleaacqWGPbqAcqWGQbGAaeqaaOGaeyypa0JaeGymaeJaeiykaKIaeiykaKYaaWbaaSqabeaacqWGpbWtdaWgaaadbaGaemyAaKMaemOAaOgabeaaaaaaleaacqWGPbqAcqWGQbGAaeqaniabg+GivdGccqGGOaakcqaIXaqmcqGHsislcyGGqbaucqGGYbGCcqGGOaakcqWGpbWtdaWgaaWcbaGaemyAaKMaemOAaOgabeaakiabg2da9iabigdaXiabcMcaPiabcMcaPmaaCaaaleqabaGaeGymaeJaeyOeI0Iaem4ta80aaSbaaWqaaiabdMgaPjabdQgaQbqabaaaaOGaeiOla4IaaCzcaiaaxMaadaqadaqaaiabiodaZaGaayjkaiaawMcaaaaa@5C8C@</m:annotation>
                     </m:semantics>
                  </m:math>
               </p>
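               <p>For numerical work it is usually more convenient to evaluate the logarithm of Equation 3. The sketch below does so for a small data set; it again assumes the independent-domain-pair form of Equation 1, and the function name <it>log_likelihood</it>, the data layout, and the example values are hypothetical choices made for illustration.</p>
               <p><![CDATA[
```python
from math import log, prod


def log_likelihood(pairs, lam, fn, fp):
    """Log of Equation 3 summed over all protein pairs.

    pairs: list of (domain_pair_keys, observed), observed is 0 or 1
    lam:   dict mapping a domain-pair key to Pr(D_mn = 1)
    Requires 0 < fp and 0 < fn so that every term is finite.
    """
    total = 0.0
    for domain_pairs, observed in pairs:
        # Equation 1 (assumed independent domain pairs)
        p_int = 1.0 - prod(1.0 - lam[d] for d in domain_pairs)
        # Equation 2
        p_obs = p_int * (1.0 - fn) + (1.0 - p_int) * fp
        total += log(p_obs) if observed else log(1.0 - p_obs)
    return total


# Hypothetical data: two protein pairs, one observed to interact.
lam = {("A", "B"): 0.5, ("A", "C"): 0.2}
pairs = [
    ((("A", "B"),), 1),
    ((("A", "B"), ("A", "C")), 0),
]
ll = log_likelihood(pairs, lam, fn=0.25, fp=0.0009)
```
]]></p>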
               <p>Our objective is to maximize the likelihood <it>L</it>, which can be represented as a function of <it>P</it>(<it>D</it><sub><it>mn </it></sub>= 1) with fixed <it>fp </it>and <it>fn </it>by incorporating Equations 1, 2, and 3. <it>P</it>(<it>D</it><sub><it>mn </it></sub>= 1) can be estimated by an expectation-maximization (EM) algorithm <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. Deng <it>et al</it>. <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> presented a method to approximate the values of <it>fn </it>and <it>fp </it>based on the number of observed interactions. We combine this idea and the reliability of protein interaction data sets to approximate values of <it>fn </it>and <it>fp </it>in each species used in this study. The results are shown in Table <tblr tid="T1">1</tblr>. The details are presented in the <supplr sid="S1">additional file 1</supplr>.</p>
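               <p>One way such an EM iteration could look is sketched below. It is not taken from the authors' implementation: it assumes the independent-domain-pair form of Equation 1, treats the per-protein-pair interaction of each domain pair as the hidden variable, and uses a fixed number of iterations instead of a convergence test. The function name <it>em_domain_probs</it>, the starting value, and the toy data are hypothetical.</p>
               <p><![CDATA[
```python
from math import prod


def em_domain_probs(pairs, fn, fp, init=0.5, n_iter=100):
    """Estimate Pr(D_mn = 1) by EM (illustrative sketch).

    pairs: list of (domain_pair_keys, observed); each protein pair
           lists its distinct domain-pair keys, observed is 0 or 1.
    """
    keys = {d for dps, _ in pairs for d in dps}
    lam = {d: init for d in keys}
    # N_mn: number of protein pairs containing domain pair d
    n_mn = {d: sum(1 for dps, _ in pairs if d in dps) for d in keys}
    for _ in range(n_iter):
        acc = {d: 0.0 for d in keys}
        for dps, obs in pairs:
            p_int = 1.0 - prod(1.0 - lam[d] for d in dps)
            p_one = p_int * (1.0 - fn) + (1.0 - p_int) * fp
            p_obs = p_one if obs else 1.0 - p_one
            # Probability of this observation if the proteins interact
            given = (1.0 - fn) if obs else fn
            for d in dps:
                # E-step: posterior that d interacts in this protein pair
                acc[d] += lam[d] * given / p_obs
        # M-step: average the posteriors over the N_mn protein pairs
        lam = {d: acc[d] / n_mn[d] for d in keys}
    return lam


# Toy data: one domain pair seen in four protein pairs, two observed.
toy = [((("A", "B"),), 1), ((("A", "B"),), 1),
       ((("A", "B"),), 0), ((("A", "B"),), 0)]
est = em_domain_probs(toy, fn=0.25, fp=0.0009)
```
]]></p>
               <p>For this toy data set the iteration settles where Pr(<it>O</it><sub><it>ij </it></sub>= 1) equals the observed fraction of interactions, so the estimate is larger than the raw fraction because it corrects for false negatives.</p>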
               <suppl id="S1">
                  <title>
                     <p>Additional file 1</p>
                  </title>
                  <text>
                     <p><b>False positive (<it>fp</it>) and false negative (<it>fn</it>) of the observed protein interactions</b>. It contains equations to calculate <it>fp </it>and <it>fn </it>values for the protein interactions used in the study and the effects of various <it>fp </it>and <it>fn </it>values on the inference of domain interactions.</p>
                  </text>
                  <file name="1471-2105-7-269-S1.pdf">
                     <p>Click here for file</p>
                  </file>
               </suppl>
            </sec>
            <sec>
               <st>
                  <p>The expected number of occurrences of domain interactions</p>
               </st>
               <p>Deng <it>et al</it>. <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> used the estimated value of <it>P</it>(<it>D</it><sub><it>mn </it></sub>= 1) to rank domain-domain interactions. One problem with this approach is that the estimated value of <it>P</it>(<it>D</it><sub><it>mn </it></sub>= 1) is generally large if (1) each of the two domains appears only in one protein, (2) each of these two proteins contains only one domain, and (3) these two proteins interact. Another problem is that the value of <it>P</it>(<it>D</it><sub><it>mn </it></sub>= 1) is generally small if (1) both domains appear in many proteins and (2) only a small proportion of the protein pairs having these two domains interact.</p>
               <p>In order to overcome these problems, we score each domain pair by the expected number of occurrences of domain interactions:</p>
               <p>E(#<it>D</it><sub><it>mn</it></sub>) = <it>N</it><sub><it>mn </it></sub>Pr(<it>D</it><sub><it>mn </it></sub>= 1), &#160;&#160;&#160; (4)</p>
               <p>where <it>N</it><sub><it>mn </it></sub>is the number of protein pairs having domains <it>D</it><sub><it>m </it></sub>and <it>D</it><sub><it>n</it></sub>. Our intuition is that if a pair of domains is observed in multiple protein interactions, the pair is more likely to interact. We use E() as a feature in our integrative model.</p>
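               <p>As a minimal sketch of Equation 4, the snippet below contrasts the two problem cases described above. The function name and the numeric inputs are hypothetical and chosen only for illustration; they are not values from this study.</p>
               <p><![CDATA[
```python
def expected_interactions(n_pairs, prob):
    """Equation 4: E(#D_mn) = N_mn * Pr(D_mn = 1)."""
    if n_pairs < 0 or not 0.0 <= prob <= 1.0:
        raise ValueError("n_pairs must be non-negative and prob must lie in [0, 1]")
    return n_pairs * prob


# Hypothetical case 1: the domain pair occurs in a single protein pair
# and receives a large estimated probability.
rare = expected_interactions(1, 0.9)

# Hypothetical case 2: the domain pair occurs in many protein pairs
# but receives a small estimated probability.
common = expected_interactions(40, 0.1)

# Ranking by the expected count places the frequent pair above the singleton.
print(rare, common)
```
]]></p>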
            </sec>
            <sec>
               <st>
                  <p>Domain fusion</p>
               </st>
               <p>In addition to the protein interaction data, we also incorporate information on domain fusion and domain function to build a set of high-confidence domain-domain interactions. Enright <it>et al</it>. <abbrgrp><abbr bid="B34">34</abbr></abbrgrp> and Marcotte <it>et al</it>. <abbrgrp><abbr bid="B35">35</abbr></abbrgrp> showed that two proteins are more likely to interact if they are fused into one protein in another species. This idea can be extended to domains: if two domains are fused in one protein in any species, they are more likely to interact. Thus, we search for proteins having multiple Pfam-A domains and obtain 9,615 Pfam-A domain pairs that co-exist in the same proteins. We define CE(<it>D</it><sub><it>mn</it></sub>), where CE stands for Co-Existence, as the number of times that domain <it>D</it><sub><it>m </it></sub>and domain <it>D</it><sub><it>n </it></sub>co-exist in the same protein. We expect that the larger CE(<it>D</it><sub><it>mn</it></sub>) is, the more likely domain <it>D</it><sub><it>m </it></sub>and domain <it>D</it><sub><it>n </it></sub>are to interact. We use CE() as a feature in our integrative model.</p>
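               <p>The co-existence count can be sketched as follows. This is an illustration under stated assumptions: each protein contributes at most one count per unordered domain pair, pairs of a domain with itself are ignored, and the protein and domain identifiers are hypothetical placeholders.</p>
               <p><![CDATA[
```python
from collections import Counter
from itertools import combinations


def co_existence_counts(protein_domains):
    """Count, for each unordered domain pair, the proteins that contain both domains."""
    ce = Counter()
    for domains in protein_domains.values():
        # Assumption: repeated copies of a domain within one protein count once.
        for pair in combinations(sorted(set(domains)), 2):
            ce[pair] += 1
    return ce


# Hypothetical domain architectures.
proteins = {
    "protein1": ["domA", "domB"],
    "protein2": ["domB", "domA", "domC"],
    "protein3": ["domC"],
}
ce = co_existence_counts(proteins)
print(ce[("domA", "domB")])
```
]]></p>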
            </sec>
            <sec>
               <st>
                  <p>Domain functions</p>
               </st>
               <p>We obtain the gene ontology (GO) terms of domains and find 57,907 domain pairs having the same GO terms in the biological process category. The gene ontology has a hierarchical structure (a directed acyclic graph), in which parent terms denote more general functions and child terms denote more specific functions. We expect that two domains participating in the same GO function (biological process) are more likely to interact than two domains participating in different functions. Moreover, two domains sharing a more specific function are more likely to interact than two domains sharing a more general function. A more specific function generally covers a smaller number of domains. Assume that domain <it>D</it><sub><it>m </it></sub>and domain <it>D</it><sub><it>n </it></sub>have the same function <it>F</it><sub><it>f</it></sub>. We define SG(<it>D</it><sub><it>mn</it></sub>), where SG stands for the Same Gene ontology, as the number of domains having the function <it>F</it><sub><it>f</it></sub>. We use SG() as a feature in our integrative model.</p>
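               <p>A small sketch of the SG() feature is given below. The text does not state how a pair sharing several GO terms is handled, so the sketch assumes the most specific shared term (the one covering the fewest domains) is used, and it returns 0 when no term is shared. All identifiers are hypothetical.</p>
               <p><![CDATA[
```python
from collections import Counter


def same_go_score(domain_terms, dm, dn):
    """Number of domains annotated with the GO term shared by dm and dn."""
    # Number of domains annotated with each term.
    term_sizes = Counter(t for terms in domain_terms.values() for t in set(terms))
    shared = set(domain_terms[dm]) & set(domain_terms[dn])
    if not shared:
        return 0
    # Assumption: with several shared terms, use the one covering the fewest domains.
    return min(term_sizes[t] for t in shared)


# Hypothetical annotations.
domain_terms = {
    "domA": {"specific_term", "general_term"},
    "domB": {"specific_term", "general_term"},
    "domC": {"general_term"},
    "domD": {"other_term"},
}
print(same_go_score(domain_terms, "domA", "domB"))
```
]]></p>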
            </sec>
            <sec>
               <st>
                  <p>Integrating multiple data sources</p>
               </st>
               <p>The six information sources can be combined to construct a high-confidence set of domain-domain interactions. Several heuristic methods can be used for data integration. Here we consider three approaches: evidence counting, na&#239;ve Bayesian, and logistic regression.</p>
               <p>For each pair of domains, the six information sources are the expected numbers of domain interactions derived from the protein interactions of each of the four species, the number of co-occurrences in domain fusion, and the number of domains with the same GO annotation. We apply the three computational methods listed above to integrate these six sources of biological evidence and predict domain interactions. The methods are described as follows.</p>
               <sec>
                  <st>
                     <p>Evidence counting</p>
                  </st>
                  <p>The number of pieces of evidence supporting a domain interaction is used to score domain pairs for potential interactions. For a pair of domains <it>D</it><sub><it>m </it></sub>and <it>D</it><sub><it>n</it></sub>, we say that the interaction between <it>D</it><sub><it>m </it></sub>and <it>D</it><sub><it>n </it></sub>is supported by the yeast protein interactions if the expected number of occurrences of domain interactions is at least 1, i.e., <it>E</it>(#<it>D</it><sub><it>mn</it></sub>) &#8805; 1. We count this as one piece of evidence. A domain interaction can have a maximum of four pieces of evidence from the protein interactions of yeast, worm, fruitfly and humans. Similarly, we say that the interaction between <it>D</it><sub><it>m </it></sub>and <it>D</it><sub><it>n </it></sub>is supported by the domain fusion if <it>CE</it>(<it>D</it><sub><it>mn</it></sub>) &#8805; 1, and by the domain functions if <it>SG</it>(<it>D</it><sub><it>mn</it></sub>) &#8805; 1. The number of pieces of evidence for a pair of domains therefore ranges from 0 to 6.</p>
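                  <p>The counting rule above can be written as a short function. The function name and the example values are hypothetical; the threshold of 1 for each source follows the text.</p>
                  <p><![CDATA[
```python
def evidence_count(e_by_species, ce, sg):
    """Count supporting sources: one per species with E >= 1, plus CE >= 1 and SG >= 1."""
    species_support = sum(1 for e in e_by_species if e >= 1)
    return species_support + int(ce >= 1) + int(sg >= 1)


# Hypothetical values of E(#D_mn) for yeast, worm, fruitfly, and humans.
score = evidence_count([2.3, 0.4, 1.0, 0.0], ce=3, sg=0)
print(score)
```
]]></p>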
               </sec>
               <sec>
                  <st>
                     <p>Na&#239;ve Bayesian</p>
                  </st>
                  <p>The na&#239;ve Bayesian approach assumes the independence of data sources, and has been applied to the integration of multiple data sources for predicting protein interactions <abbrgrp><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr></abbrgrp>. The basic idea is to calculate the likelihood ratio for each of the six sources of evidence and then multiply these likelihood ratios. We define the set of observed interactions (Obs) as the interacting domain pairs in iPfam and the set of non-observed interactions (Nobs) as the domain pairs not present in iPfam. The likelihood ratios for the six data sources are calculated as follows. For each species, we split the values of E(#<it>D</it><sub><it>mn</it></sub>) into 7 intervals. We call each interval a bin and this process binning. Let <it>d </it>= E(#<it>D</it><sub><it>mn</it></sub>) and suppose that <it>d </it>falls into the <it>t</it>-th bin. Let Pr(<it>d</it>|<it>Obs</it>) be the fraction of the observed interactions in the <it>t</it>-th bin and let Pr(<it>d</it>|<it>Nobs</it>) be the fraction of the non-observed interactions in the <it>t</it>-th bin. Then, the likelihood ratio for the <it>t</it>-th bin is Pr(<it>d</it>|<it>Obs</it>)/Pr(<it>d</it>|<it>Nobs</it>). Similarly, we bin the values of CE(<it>D</it><sub><it>mn</it></sub>) and SG(<it>D</it><sub><it>mn</it></sub>) and then calculate the likelihood ratio for each of them. <supplr sid="S2">Additional file 2</supplr> shows the likelihood ratios for each data source. Let <it>d</it><sub>1</sub>,..., <it>d</it><sub>4 </sub>be the values of E(#<it>D</it><sub><it>mn</it></sub>) in yeast, worm, fruitfly, and humans, respectively, and let <it>d</it><sub>5 </sub>and <it>d</it><sub>6 </sub>be the values of CE(<it>D</it><sub><it>mn</it></sub>) and SG(<it>D</it><sub><it>mn</it></sub>), respectively. Then, the total likelihood ratio is</p>
                  <suppl id="S2">
                     <title>
                        <p>Additional file 2</p>
                     </title>
                     <text>
                        <p><b>The likelihood ratio of six data sources</b>. The values for domain interactions inferred from six data sources are binned into discrete intervals and the likelihood ratio is calculated.</p>
                     </text>
                     <file name="1471-2105-7-269-S2.pdf">
                        <p>Click here for file</p>
                     </file>
                  </suppl>
                  <p>
                     <m:math name="1471-2105-7-269-i4" xmlns:m="http://www.w3.org/1998/Math/MathML">
                        <m:semantics>
                           <m:mrow>
                              <m:mtext>L</m:mtext>
                              <m:mo stretchy="false">(</m:mo>
                              <m:msub>
                                 <m:mi>D</m:mi>
                                 <m:mrow>
                                    <m:mi>m</m:mi>
                                    <m:mi>n</m:mi>
                                 </m:mrow>
                              </m:msub>
                              <m:mo stretchy="false">)</m:mo>
                              <m:mo>=</m:mo>
                              <m:mstyle displaystyle="true">
                                 <m:munderover>
                                    <m:mo>&#8719;</m:mo>
                                    <m:mrow>
                                       <m:mi>i</m:mi>
                                       <m:mo>=</m:mo>
                                       <m:mn>1</m:mn>
                                    </m:mrow>
                                    <m:mn>6</m:mn>
                                 </m:munderover>
                                 <m:mrow>
                                    <m:mfrac>
                                       <m:mrow>
                                          <m:mi>Pr</m:mi>
                                          <m:mo>&#8289;</m:mo>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:msub>
                                             <m:mi>d</m:mi>
                                             <m:mi>i</m:mi>
                                          </m:msub>
                                          <m:mo>|</m:mo>
                                          <m:mi>O</m:mi>
                                          <m:mi>b</m:mi>
                                          <m:mi>s</m:mi>
                                          <m:mo stretchy="false">)</m:mo>
                                       </m:mrow>
                                       <m:mrow>
                                          <m:mi>Pr</m:mi>
                                          <m:mo>&#8289;</m:mo>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:msub>
                                             <m:mi>d</m:mi>
                                             <m:mi>i</m:mi>
                                          </m:msub>
                                          <m:mo>|</m:mo>
                                          <m:mi>N</m:mi>
                                          <m:mi>o</m:mi>
                                          <m:mi>b</m:mi>
                                          <m:mi>s</m:mi>
                                          <m:mo stretchy="false">)</m:mo>
                                       </m:mrow>
                                    </m:mfrac>
                                 </m:mrow>
                              </m:mstyle>
                              <m:mo>.</m:mo>
                           </m:mrow>
                           <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqqGmbatcqGGOaakcqWGebardaWgaaWcbaGaemyBa0MaemOBa4gabeaakiabcMcaPiabg2da9maarahabaWaaSaaaeaacyGGqbaucqGGYbGCcqGGOaakcqWGKbazdaWgaaWcbaGaemyAaKgabeaakiabcYha8jabd+eapjabdkgaIjabdohaZjabcMcaPaqaaiGbccfaqjabckhaYjabcIcaOiabdsgaKnaaBaaaleaacqWGPbqAaeqaaOGaeiiFaWNaemOta4Kaem4Ba8MaemOyaiMaem4CamNaeiykaKcaaaWcbaGaemyAaKMaeyypa0JaeGymaedabaGaeGOnaydaniabg+GivdGccqGGUaGlaaa@568B@</m:annotation>
                        </m:semantics>
                     </m:math>
                  </p>
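                  <p>The binning and the product of likelihood ratios can be sketched as follows. This is an illustration only: the bin edges, the toy values, the use of two sources instead of six, and the treatment of bins with no non-observed pairs (assigned an infinite ratio) are assumptions that are not taken from this study.</p>
                  <p><![CDATA[
```python
import math
from bisect import bisect_right


def bin_index(value, edges):
    """Return the index of the bin holding value; edges are sorted interior cut points."""
    return bisect_right(edges, value)


def likelihood_ratios(obs_values, nobs_values, edges):
    """Per-bin ratio Pr(d | Obs) / Pr(d | Nobs) for one data source."""
    n_bins = len(edges) + 1
    obs_counts = [0] * n_bins
    nobs_counts = [0] * n_bins
    for v in obs_values:
        obs_counts[bin_index(v, edges)] += 1
    for v in nobs_values:
        nobs_counts[bin_index(v, edges)] += 1
    ratios = []
    for o, n in zip(obs_counts, nobs_counts):
        p_obs = o / len(obs_values)
        p_nobs = n / len(nobs_values)
        # Assumption: a bin with no non-observed pairs gets an infinite ratio.
        ratios.append(p_obs / p_nobs if p_nobs > 0 else float("inf"))
    return ratios


def total_likelihood_ratio(values, ratios_per_source, edges_per_source):
    """Multiply the per-source ratios of the bins that the values fall into."""
    return math.prod(
        ratios[bin_index(v, edges)]
        for v, ratios, edges in zip(values, ratios_per_source, edges_per_source)
    )


# Toy example with two bins and two sources that share the same ratios.
edges = [1.0]
ratios = likelihood_ratios([0.5, 2.0, 3.0, 4.0], [0.1, 0.2, 0.3, 5.0], edges)
total = total_likelihood_ratio([2.0, 0.5], [ratios, ratios], [edges, edges])
print(ratios, total)
```
]]></p>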
               </sec>
               <sec>
                  <st>
                     <p>Logistic regression</p>
                  </st>
                  <p>Let <it>E</it><sub><it>y</it></sub>(#<it>D</it><sub><it>mn</it></sub>), <it>E</it><sub><it>w</it></sub>(#<it>D</it><sub><it>mn</it></sub>), <it>E</it><sub><it>f</it></sub>(#<it>D</it><sub><it>mn</it></sub>), and <it>E</it><sub><it>h</it></sub>(#<it>D</it><sub><it>mn</it></sub>) denote the expected number of occurrences of the domain interactions in yeast, worm, fruitfly and humans, respectively. Let I(<it>d</it>) be the indicator function: I(<it>d</it>) = 1 if <it>d </it>&#8805; 1 and 0 otherwise. Let <it>EV</it>(<it>D</it><sub><it>mn</it></sub>) be the number of pieces of evidence from the <it>evidence counting </it>method. We use the following model,</p>
                  <p>
                     <m:math name="1471-2105-7-269-i5" xmlns:m="http://www.w3.org/1998/Math/MathML">
                        <m:semantics>
                           <m:mrow>
                              <m:mtable columnalign="left">
                                 <m:mtr columnalign="left">
                                    <m:mtd columnalign="left">
                                       <m:mrow>
                                          <m:mi>log</m:mi>
                                          <m:mo>&#8289;</m:mo>
                                          <m:mfrac>
                                             <m:mrow>
                                                <m:mi>Pr</m:mi>
                                                <m:mo>&#8289;</m:mo>
                                                <m:mo stretchy="false">(</m:mo>
                                                <m:msub>
                                                   <m:mi>D</m:mi>
                                                   <m:mrow>
                                                      <m:mi>m</m:mi>
                                                      <m:mi>n</m:mi>
                                                   </m:mrow>
                                                </m:msub>
                                                <m:mo>=</m:mo>
                                                <m:mn>1</m:mn>
                                                <m:mo stretchy="false">)</m:mo>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:mn>1</m:mn>
                                                <m:mo>&#8722;</m:mo>
                                                <m:mi>Pr</m:mi>
                                                <m:mo>&#8289;</m:mo>
                                                <m:mo stretchy="false">(</m:mo>
                                                <m:msub>
                                                   <m:mi>D</m:mi>
                                                   <m:mrow>
                                                      <m:mi>m</m:mi>
                                                      <m:mi>n</m:mi>
                                                   </m:mrow>
                                                </m:msub>
                                                <m:mo>=</m:mo>
                                                <m:mn>1</m:mn>
                                                <m:mo stretchy="false">)</m:mo>
                                             </m:mrow>
                                          </m:mfrac>
                                       </m:mrow>
                                    </m:mtd>
                                    <m:mtd columnalign="left">
                                       <m:mo>=</m:mo>
                                    </m:mtd>
                                    <m:mtd columnalign="left">
                                       <m:mrow>
                                          <m:mi>&#945;</m:mi>
                                          <m:mo>+</m:mo>
                                          <m:msub>
                                             <m:mi>&#946;</m:mi>
                                             <m:mn>1</m:mn>
                                          </m:msub>
                                          <m:msub>
                                             <m:mi>E</m:mi>
                                             <m:mi>y</m:mi>
                                          </m:msub>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mo>#</m:mo>
                                          <m:msub>
                                             <m:mi>D</m:mi>
                                             <m:mrow>
                                                <m:mi>m</m:mi>
                                                <m:mi>n</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                          <m:mo stretchy="false">)</m:mo>
                                          <m:mo>+</m:mo>
                                          <m:msub>
                                             <m:mi>&#946;</m:mi>
                                             <m:mn>2</m:mn>
                                          </m:msub>
                                          <m:msub>
                                             <m:mi>E</m:mi>
                                             <m:mi>w</m:mi>
                                          </m:msub>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mo>#</m:mo>
                                          <m:msub>
                                             <m:mi>D</m:mi>
                                             <m:mrow>
                                                <m:mi>m</m:mi>
                                                <m:mi>n</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                          <m:mo stretchy="false">)</m:mo>
                                          <m:mo>+</m:mo>
                                          <m:msub>
                                             <m:mi>&#946;</m:mi>
                                             <m:mn>3</m:mn>
                                          </m:msub>
                                          <m:msub>
                                             <m:mi>E</m:mi>
                                             <m:mi>f</m:mi>
                                          </m:msub>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mo>#</m:mo>
                                          <m:msub>
                                             <m:mi>D</m:mi>
                                             <m:mrow>
                                                <m:mi>m</m:mi>
                                                <m:mi>n</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                          <m:mo stretchy="false">)</m:mo>
                                          <m:mo>+</m:mo>
                                          <m:msub>
                                             <m:mi>&#946;</m:mi>
                                             <m:mn>4</m:mn>
                                          </m:msub>
                                          <m:msub>
                                             <m:mi>E</m:mi>
                                             <m:mi>h</m:mi>
                                          </m:msub>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mo>#</m:mo>
                                          <m:msub>
                                             <m:mi>D</m:mi>
                                             <m:mrow>
                                                <m:mi>m</m:mi>
                                                <m:mi>n</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                          <m:mo stretchy="false">)</m:mo>
                                       </m:mrow>
                                    </m:mtd>
                                 </m:mtr>
                                 <m:mtr columnalign="left">
                                    <m:mtd columnalign="left">
                                       <m:mrow/>
                                    </m:mtd>
                                    <m:mtd columnalign="left">
                                       <m:mrow/>
                                    </m:mtd>
                                    <m:mtd columnalign="left">
                                       <m:mrow>
                                          <m:mo>+</m:mo>
                                          <m:msub>
                                             <m:mi>&#946;</m:mi>
                                             <m:mn>5</m:mn>
                                          </m:msub>
                                          <m:mi>I</m:mi>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mi>C</m:mi>
                                          <m:mi>E</m:mi>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:msub>
                                             <m:mi>D</m:mi>
                                             <m:mrow>
                                                <m:mi>m</m:mi>
                                                <m:mi>n</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                          <m:mo stretchy="false">)</m:mo>
                                          <m:mo stretchy="false">)</m:mo>
                                          <m:mo>+</m:mo>
                                          <m:msub>
                                             <m:mi>&#946;</m:mi>
                                             <m:mn>6</m:mn>
                                          </m:msub>
                                          <m:mi>I</m:mi>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mi>S</m:mi>
                                          <m:mi>G</m:mi>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:msub>
                                             <m:mi>D</m:mi>
                                             <m:mrow>
                                                <m:mi>m</m:mi>
                                                <m:mi>n</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                          <m:mo stretchy="false">)</m:mo>
                                          <m:mo stretchy="false">)</m:mo>
                                          <m:mo>+</m:mo>
                                          <m:msub>
                                             <m:mi>&#946;</m:mi>
                                             <m:mn>7</m:mn>
                                          </m:msub>
                                          <m:mi>E</m:mi>
                                          <m:mi>V</m:mi>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:msub>
                                             <m:mi>D</m:mi>
                                             <m:mrow>
                                                <m:mi>m</m:mi>
                                                <m:mi>n</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                          <m:mo stretchy="false">)</m:mo>
                                          <m:mo>.</m:mo>
                                       </m:mrow>
                                    </m:mtd>
                                 </m:mtr>
                              </m:mtable>
                           </m:mrow>
                           <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaafaqaaeGadaaabaGagiiBaWMaei4Ba8Maei4zaC2aaSaaaeaacyGGqbaucqGGYbGCcqGGOaakcqWGebardaWgaaWcbaGaemyBa0MaemOBa4gabeaakiabg2da9iabigdaXiabcMcaPaqaaiabigdaXiabgkHiTiGbccfaqjabckhaYjabcIcaOiabdseaenaaBaaaleaacqWGTbqBcqWGUbGBaeqaaOGaeyypa0JaeGymaeJaeiykaKcaaaqaaiabg2da9aqaaGGaciab=f7aHjabgUcaRiab=j7aInaaBaaaleaacqaIXaqmaeqaaOGaemyrau0aaSbaaSqaaiabdMha5bqabaGccqGGOaakcqGGJaWicqWGebardaWgaaWcbaGaemyBa0MaemOBa4gabeaakiabcMcaPiabgUcaRiab=j7aInaaBaaaleaacqaIYaGmaeqaaOGaemyrau0aaSbaaSqaaiabdEha3bqabaGccqGGOaakcqGGJaWicqWGebardaWgaaWcbaGaemyBa0MaemOBa4gabeaakiabcMcaPiabgUcaRiab=j7aInaaBaaaleaacqaIZaWmaeqaaOGaemyrau0aaSbaaSqaaiabdAgaMbqabaGccqGGOaakcqGGJaWicqWGebardaWgaaWcbaGaemyBa0MaemOBa4gabeaakiabcMcaPiabgUcaRiab=j7aInaaBaaaleaacqaI0aanaeqaaOGaemyrau0aaSbaaSqaaiabdIgaObqabaGccqGGOaakcqGGJaWicqWGebardaWgaaWcbaGaemyBa0MaemOBa4gabeaakiabcMcaPaqaaaqaaaqaaiabgUcaRiab=j7aInaaBaaaleaacqaI1aqnaeqaaOGaemysaKKaeiikaGIaem4qamKaemyrauKaeiikaGIaemiraq0aaSbaaSqaaiabd2gaTjabd6gaUbqabaGccqGGPaqkcqGGPaqkcqGHRaWkcqWFYoGydaWgaaWcbaGaeGOnaydabeaakiabdMeajjabcIcaOiabdofatjabdEeahjabcIcaOiabdseaenaaBaaaleaacqWGTbqBcqWGUbGBaeqaaOGaeiykaKIaeiykaKIaey4kaSIae8NSdi2aaSbaaSqaaiabiEda3aqabaGccqWGfbqrcqWGwbGvcqGGOaakcqWGebardaWgaaWcbaGaemyBa0MaemOBa4gabeaakiabcMcaPiabc6caUaaaaaa@A726@</m:annotation>
                        </m:semantics>
                     </m:math>
                  </p>
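                  <p>To make the structure of the model concrete, the sketch below assembles the seven predictors and converts the linear predictor into a probability. It assumes the natural logarithm in the log-odds; the coefficient values are placeholders, not fitted estimates from this study, and the function names are hypothetical.</p>
                  <p><![CDATA[
```python
import math


def indicator(d):
    """I(d) = 1 if d >= 1 and 0 otherwise."""
    return 1 if d >= 1 else 0


def features(e_y, e_w, e_f, e_h, ce, sg):
    """Predictors in the order of beta_1 .. beta_7 in the model."""
    ev = sum(indicator(e) for e in (e_y, e_w, e_f, e_h)) + indicator(ce) + indicator(sg)
    return [e_y, e_w, e_f, e_h, indicator(ce), indicator(sg), ev]


def interaction_probability(alpha, betas, x):
    """Invert the log-odds: Pr(D_mn = 1) = 1 / (1 + exp(-(alpha + sum_i beta_i x_i)))."""
    logit = alpha + sum(b * v for b, v in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical feature values and placeholder coefficients.
x = features(2.0, 0.0, 1.5, 0.0, ce=3, sg=0)
p_null = interaction_probability(0.0, [0.0] * 7, x)
p_pos = interaction_probability(0.0, [0.5] * 7, x)
print(x, p_null, p_pos)
```
]]></p>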
               </sec>
            </sec>
            <sec>
               <st>
                  <p>Validating the predicted domain interactions</p>
               </st>
               <p>To evaluate the reliability of the predicted domain interactions, we compare them with the domain interactions in iPfam. The interactions in iPfam are treated as the observed interactions. Although many domain interactions are not included in the database, a good scoring function for domain interactions should include a higher fraction of observed interactions among its highest-ranked predictions than a random scoring function does. Therefore, for a given scoring range, we calculate the fraction of the observed interactions among all domain pairs having scores within the range. We also calculate the ratio of this fraction to that from a random scoring function and refer to it as the fold value. For a good scoring function, the fold value should increase with the score.</p>
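               <p>The fold value can be sketched as below. The sketch assumes that a random scoring function yields, in any range, the overall fraction of observed interactions among all domain pairs, and that a scoring range is half-open; both choices, the function name, and the toy data are assumptions made for illustration.</p>
               <p><![CDATA[
```python
def fold_value(scores, observed, low, high):
    """Fraction of observed pairs with low <= score < high, relative to the overall fraction."""
    in_range = [o for s, o in zip(scores, observed) if low <= s < high]
    baseline = sum(observed) / len(observed)
    if not in_range or baseline == 0:
        return None  # undefined for an empty range or when nothing is observed
    fraction = sum(in_range) / len(in_range)
    return fraction / baseline


# Toy data: scores of five domain pairs and whether each pair is an observed interaction.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
observed = [True, False, True, False, False]
fold_high = fold_value(scores, observed, 0.5, 1.0)
fold_low = fold_value(scores, observed, 0.0, 0.5)
print(fold_high, fold_low)
```
]]></p>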
               <p>Another way to evaluate the reliability of predicted domain interactions is the Receiver Operating Characteristic (ROC) curve, which represents the relationship between the false positive rate (FPR) and the sensitivity (SN). As mentioned before, we use domain pairs in iPfam as the observed interactions and domain pairs not in iPfam as the non-observed interactions. Because this gives too many non-observed interactions (1,536,555), we randomly remove domain pairs without any evidence and finally obtain 84,385 domain pairs, about twice the number of domain pairs with at least one piece of evidence, for the non-observed set. For a given threshold value t, domain pairs with a score larger than t are predicted as interacting and the others as non-interacting. The results can be summarized by four counts: true positives (TP), observed interactions predicted as interacting; false positives (FP), non-observed interactions predicted as interacting; false negatives (FN), observed interactions predicted as non-interacting; and true negatives (TN), non-observed interactions predicted as non-interacting.</p>
               <p>The FPR and SN are defined as</p>
               <p>
                  <m:math name="1471-2105-7-269-i6" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mtable>
                              <m:mtr>
                                 <m:mtd>
                                    <m:mrow>
                                       <m:mi>F</m:mi>
                                       <m:mi>P</m:mi>
                                       <m:mi>R</m:mi>
                                       <m:mo>=</m:mo>
                                       <m:mfrac>
                                          <m:mrow>
                                             <m:mi>F</m:mi>
                                             <m:mi>P</m:mi>
                                          </m:mrow>
                                          <m:mrow>
                                             <m:mi>F</m:mi>
                                             <m:mi>P</m:mi>
                                             <m:mo>+</m:mo>
                                             <m:mi>T</m:mi>
                                             <m:mi>N</m:mi>
                                          </m:mrow>
                                       </m:mfrac>
                                       <m:mo>,</m:mo>
                                    </m:mrow>
                                 </m:mtd>
                                 <m:mtd>
                                    <m:mrow>
                                       <m:mi>S</m:mi>
                                       <m:mi>N</m:mi>
                                       <m:mo>=</m:mo>
                                       <m:mfrac>
                                          <m:mrow>
                                             <m:mi>T</m:mi>
                                             <m:mi>P</m:mi>
                                          </m:mrow>
                                          <m:mrow>
                                             <m:mi>T</m:mi>
                                             <m:mi>P</m:mi>
                                             <m:mo>+</m:mo>
                                             <m:mi>F</m:mi>
                                             <m:mi>N</m:mi>
                                          </m:mrow>
                                       </m:mfrac>
                                       <m:mo>.</m:mo>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                           </m:mtable>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaafaqabeqacaaabaGaemOrayKaemiuaaLaemOuaiLaeyypa0ZaaSaaaeaacqWGgbGrcqWGqbauaeaacqWGgbGrcqWGqbaucqGHRaWkcqWGubavcqWGobGtaaGaeiilaWcabaGaem4uamLaemOta4Kaeyypa0ZaaSaaaeaacqWGubavcqWGqbauaeaacqWGubavcqWGqbaucqGHRaWkcqWGgbGrcqWGobGtaaGaeiOla4caaaaa@45EC@</m:annotation>
                     </m:semantics>
                  </m:math>
               </p>
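               <p>For a single threshold, the two rates can be computed as in the following sketch. The function name and the toy scores are hypothetical; sweeping the threshold over the observed scores would give the points of the ROC curve.</p>
               <p><![CDATA[
```python
def fpr_and_sn(scores, observed, t):
    """Return (FPR, SN) when pairs with score > t are predicted as interacting."""
    tp = fp = fn = tn = 0
    for s, is_observed in zip(scores, observed):
        predicted = s > t
        if predicted and is_observed:
            tp += 1
        elif predicted:
            fp += 1
        elif is_observed:
            fn += 1
        else:
            tn += 1
    fpr = fp / (fp + tn) if fp + tn else 0.0
    sn = tp / (tp + fn) if tp + fn else 0.0
    return fpr, sn


# Toy data: two observed and three non-observed domain pairs.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
observed = [True, False, True, False, False]
fpr, sn = fpr_and_sn(scores, observed, 0.5)
print(fpr, sn)
```
]]></p>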
               <p>We use five-fold cross-validation to compare the performance. We use a subset of iPfam domain interactions for training to calculate the likelihood ratio of the Bayesian approach and the coefficients of the logistic regression. The remaining iPfam domain interactions are used for testing.</p>
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Conserved domain interactions across multiple species</p>
            </st>
            <p>We first show that domain interactions inferred from multiple species are reliable. The four species share many domains. Table <tblr tid="T1">1</tblr> shows the numbers of proteins, protein-protein interactions, and domains in each species. Figure <figr fid="F1">1</figr> shows the numbers of domains shared among the different species. Most domains appear in more than one species. For example, 953 out of 1,386 domains in yeast (69%) are found in at least one of the other three species. Similarly, 763 out of 888 domains in worm (86%) are found in other species. For fruitfly and humans, 82% and 68% of the domains, respectively, are found in other species.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>A Venn diagram for the numbers of domains in yeast, worm, fruitfly, and humans</p>
               </caption>
               <text>
                  <p>A Venn diagram for the numbers of domains in yeast, worm, fruitfly, and humans. (a) The numbers of domains in yeast, worm, and fruitfly. (b) The numbers of domains between humans and the other three species.</p>
               </text>
               <graphic file="1471-2105-7-269-1"/>
            </fig>
            <p>We apply the MLE method to calculate probabilities of domain interactions. The numbers of domain interactions obtained (probability > 0) for yeast, worm, fruitfly, and humans are 7,333, 2,397, 3,779, and 7,750, respectively. Figure <figr fid="F2">2</figr> shows the numbers of predicted domain interactions among the four species together with the overlaps. Of a total of 20,332 predicted domain interactions from the four species, 812 (4.0%) are present in at least two species; we call these predicted conserved domain interactions. Although this fraction is relatively small, it is still three times higher than that of random interactions [See <supplr sid="S3">additional file 3</supplr>]. This result is consistent with other studies <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B38">38</abbr></abbrgrp>, which found only a small percentage of conserved protein interactions across several species. We compare these 812 domain interactions with iPfam. Table <tblr tid="T2">2</tblr> shows that, surprisingly, 18.2% of the 812 conserved domain interactions are found in iPfam, compared to only 3.0% of all 20,332 predicted domain interactions. Furthermore, 50% (5 of 10) of the domain interactions present in all four species belong to iPfam. The results suggest that predicted domain interactions conserved in at least two species are considerably more reliable than those predicted from a single species. Similar results are obtained (Table <tblr tid="T2">2</tblr>) by comparing the predicted conserved domain interactions with domain interactions obtained from the Protein Quaternary Structure (PQS) database <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. The list of predicted conserved domain interactions from at least three species is presented in <supplr sid="S4">additional file 4</supplr>.</p>
            <suppl id="S3">
               <title>
                  <p>Additional file 3</p>
               </title>
               <text>
                  <p><b>Comparison of predicted conserved domain interactions with random interactions</b>. Table S2 shows the significance of the number of predicted conserved domain interactions compared to random interactions.</p>
               </text>
               <file name="1471-2105-7-269-S3.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S4">
               <title>
                  <p>Additional file 4</p>
               </title>
               <text>
                  <p><b>List of conserved domain interactions predicted from protein interactions of at least three species</b>. These conserved domain interactions have a 31% overlap with the domain interactions in iPfam.</p>
               </text>
               <file name="1471-2105-7-269-S4.htm">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>The numbers of predicted domain interactions using protein interactions. The predicted domain interactions are classified by the number of species in which they are found (1, 2, 3, and 4), and their overlaps with iPfam and PQS are shown.</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c ca="left">
                        <p>Species</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>All</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Predicted domain interactions</p>
                     </c>
                     <c ca="center">
                        <p>19,520</p>
                     </c>
                     <c ca="center">
                        <p>707</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>20,332</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Overlap with iPfam (Ratio)</p>
                     </c>
                     <c ca="center">
                        <p>468 (2.4%)</p>
                     </c>
                     <c ca="center">
                        <p>115 (16.2%)</p>
                     </c>
                     <c ca="center">
                        <p>28 (29.5%)</p>
                     </c>
                     <c ca="center">
                        <p>5 (50%)</p>
                     </c>
                     <c ca="center">
                        <p>616 (3.0%)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Overlap with PQS (Ratio)</p>
                     </c>
                     <c ca="center">
                        <p>883 (4.5%)</p>
                     </c>
                     <c ca="center">
                        <p>147 (20.8%)</p>
                     </c>
                     <c ca="center">
                        <p>31 (32.6%)</p>
                     </c>
                     <c ca="center">
                        <p>4 (40%)</p>
                     </c>
                     <c ca="center">
                        <p>1,065 (5.2%)</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>A Venn diagram for the numbers of predicted domain-domain interactions in yeast, worm, fruitfly, and humans</p>
               </caption>
               <text>
                  <p>A Venn diagram for the numbers of predicted domain-domain interactions in yeast, worm, fruitfly, and humans. (a) The numbers of predicted domain-domain interactions in yeast, worm, and fruitfly. (b) The numbers of predicted domain-domain interactions between humans and the other three species.</p>
               </text>
               <graphic file="1471-2105-7-269-2"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Contributions of each data source to the accuracy of predicted domain interactions</p>
            </st>
            <p>We first evaluate the contributions of each of the six information sources to the accuracy of predicted domain-domain interactions by comparing with the domain interactions in iPfam. To score domain interactions based on protein interactions, three measures are considered. The first measure is the estimated probability of domain interaction (<it>P</it>(<it>D</it><sub><it>mn </it></sub>= 1)). The second is the number of times the domain pair occurs in protein pairs (<it>N</it><sub><it>mn</it></sub>). The last is the product of the first two, <it>N</it><sub><it>mn</it></sub><it>P</it>(<it>D</it><sub><it>mn </it></sub>= 1). These measures are referred to as <it>probability</it>, <it>frequency</it>, and <it>expectation</it>, respectively. We also compare with another measure defined as <it>E-value </it>by <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. The performance of each score function is evaluated by the true positive rate (TP/(TP+FP)) among the top r ranked domain pairs. For each score function and rank value r, the domain pairs with the top r ranked scores are predicted as interacting, and these predictions are compared with the domain interactions in iPfam. Figure <figr fid="F3">3</figr> shows the relationship between the true positive rate and the rank r based on the four different score functions. For a given r, the fractions of observed interactions among the top r ranked domain pairs based on <it>expectation </it>and <it>E-value </it>are higher than those based on <it>probability </it>and <it>frequency</it>. Figure <figr fid="F3">3</figr> thus indicates that the scores based on <it>expectation </it>and <it>E-value </it>have similar performance and outperform the other two scores in evaluating domain interactions. As another way of comparison, we also draw ROC curves based on the four score functions; they are given in <supplr sid="S5">additional file 5</supplr>. The relative performance of the four score measures based on the ROC curves is similar to that described above.</p>
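            <p>The ranking procedure described above can be sketched as follows. The fragment computes the <it>expectation </it>score as the product of the frequency and the estimated probability, ranks the domain pairs, and reports TP/(TP+FP) among the top r pairs. The domain names, counts, probabilities, and the reference set are made-up toy values, and the function names are hypothetical; none of them comes from the actual data sets.</p>
            <p>
```python
def expectation_scores(frequency, probability):
    """expectation = N_mn * P(D_mn = 1) for every domain pair."""
    return {pair: frequency[pair] * probability[pair] for pair in frequency}


def true_positive_rate_at_rank(scores, reference, r):
    """TP / (TP + FP) among the r highest-scoring domain pairs (r >= 1)."""
    top = sorted(scores, key=scores.get, reverse=True)[:r]
    hits = sum(1 for pair in top if pair in reference)
    return hits / len(top)


# Toy inputs (illustrative assumptions only).
frequency = {("A", "B"): 10, ("A", "C"): 2, ("B", "C"): 5, ("C", "D"): 1}
probability = {("A", "B"): 0.5, ("A", "C"): 0.9, ("B", "C"): 0.1, ("C", "D"): 1.0}
reference = {("A", "B"), ("C", "D")}  # stands in for the iPfam pairs

scores = expectation_scores(frequency, probability)
for r in (1, 2, 3, 4):
    print(r, true_positive_rate_at_rank(scores, reference, r))
```
            </p>
            <p>In this toy example, the pair with probability 1.0 but a single occurrence is ranked below a frequently observed pair with probability 0.5, which illustrates how the <it>expectation </it>score balances the two quantities.</p>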
            <suppl id="S5">
               <title>
                  <p>Additional file 5</p>
               </title>
               <text>
                  <p><b>ROC curves of predicted domain interactions using yeast, worm, fruitfly and humans</b>. Figure S2 shows the comparison of performances of score functions to predict domain interactions for four species.</p>
               </text>
               <file name="1471-2105-7-269-S5.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>The relationship between rank and true positive rate (TP/(TP+FP)) compared to the iPfam for four species based on four score functions</p>
               </caption>
               <text>
                  <p>The relationship between rank and true positive rate (TP/(TP+FP)) compared to the iPfam for four species based on four score functions. "Expectation" ranks domain pairs according to the expected number of occurrences of domain pairs in protein interactions; "Probability" ranks domain pairs according to the estimated probability of interactions from the MLE method; "Frequency" ranks domain pairs according to the number of protein interactions containing the domain pair; "E-value" ranks domain pairs according to the E-value defined in [1].</p>
               </text>
               <graphic file="1471-2105-7-269-3"/>
            </fig>
            <p>We next consider the relationship between domain fusion and domain interactions. Similar ideas have been applied to <it>E. coli </it>and <it>S. cerevisiae </it>to infer protein interactions <abbrgrp><abbr bid="B34">34</abbr><abbr bid="B35">35</abbr></abbrgrp>. From Pfam, we collect 9,615 Pfam-A domain pairs that co-exist in the same proteins, among which 1,141 overlap with iPfam (Table <tblr tid="T3">3</tblr>). Of the domain pairs found through domain fusion, 859 are also predicted to interact in at least one species based on protein interaction data, and 283 of these (32.9%) overlap with iPfam. The results suggest that the co-existence of domain pairs is reliable evidence for domain interactions and that combining multiple evidences reduces the number of false positives.</p>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>The numbers of predicted domain interactions using domain fusion, domain function, and combining six data sets. The predicted domain interactions, the number of evidences, and the overlaps with iPfam. Numbers in the first column indicate the number of evidences for the domain interactions, and the second column is the number of interactions having the corresponding number of evidences. "PPI" represents the protein interaction data sets. "Fraction" indicates the fraction of domain interactions in a given set that are found in iPfam. "Fold" indicates the ratio of this fraction to the expected value (0.17%).</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>Evidence</p>
                     </c>
                     <c ca="center">
                        <p>Interactions</p>
                     </c>
                     <c ca="center">
                        <p>Overlap with iPfam</p>
                     </c>
                     <c ca="center">
                        <p>Fraction</p>
                     </c>
                     <c ca="center">
                        <p>Fold</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Random domain pairs</p>
                     </c>
                     <c ca="center">
                        <p>1,539,135</p>
                     </c>
                     <c ca="center">
                        <p>2,580</p>
                     </c>
                     <c ca="center">
                        <p>0.17%</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Domain fusion</p>
                     </c>
                     <c ca="center">
                        <p>9,615</p>
                     </c>
                     <c ca="center">
                        <p>1,141</p>
                     </c>
                     <c ca="center">
                        <p>11.8%</p>
                     </c>
                     <c ca="center">
                        <p>69</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Domain fusion &amp; PPI</p>
                     </c>
                     <c ca="center">
                        <p>859</p>
                     </c>
                     <c ca="center">
                        <p>283</p>
                     </c>
                     <c ca="center">
                        <p>32.9%</p>
                     </c>
                     <c ca="center">
                        <p>194</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Same GO terms</p>
                     </c>
                     <c ca="center">
                        <p>57,907</p>
                     </c>
                     <c ca="center">
                        <p>1,302</p>
                     </c>
                     <c ca="center">
                        <p>2.2%</p>
                     </c>
                     <c ca="center">
                        <p>13</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Same GO terms &amp; PPI</p>
                     </c>
                     <c ca="center">
                        <p>1,031</p>
                     </c>
                     <c ca="center">
                        <p>234</p>
                     </c>
                     <c ca="center">
                        <p>22.7%</p>
                     </c>
                     <c ca="center">
                        <p>134</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>&#8805; 1</p>
                     </c>
                     <c ca="center">
                        <p>23,606</p>
                     </c>
                     <c ca="center">
                        <p>2,071</p>
                     </c>
                     <c ca="center">
                        <p>8.8%</p>
                     </c>
                     <c ca="center">
                        <p>52</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>&#8805; 2</p>
                     </c>
                     <c ca="center">
                        <p>1,624</p>
                     </c>
                     <c ca="center">
                        <p>820</p>
                     </c>
                     <c ca="center">
                        <p>50.5%</p>
                     </c>
                     <c ca="center">
                        <p>297</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>&#8805; 3</p>
                     </c>
                     <c ca="center">
                        <p>307</p>
                     </c>
                     <c ca="center">
                        <p>200</p>
                     </c>
                     <c ca="center">
                        <p>65.1%</p>
                     </c>
                     <c ca="center">
                        <p>383</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>&#8805; 4</p>
                     </c>
                     <c ca="center">
                        <p>58</p>
                     </c>
                     <c ca="center">
                        <p>43</p>
                     </c>
                     <c ca="center">
                        <p>74.1%</p>
                     </c>
                     <c ca="center">
                        <p>436</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>&#8805; 5</p>
                     </c>
                     <c ca="center">
                        <p>13</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>76.9%</p>
                     </c>
                     <c ca="center">
                        <p>452</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>= 6</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>We also incorporate information on domain pairs with the same GO annotations. It is known that proteins having similar functions are more likely to interact <abbrgrp><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr></abbrgrp>. This observation holds for domains as well. We find 57,907 domain pairs having the same GO terms in the category of biological process. Of these, 1,031 domain pairs are also found among the domain interactions predicted from protein interaction data, and 234 of them (22.7%) are found in iPfam (Table <tblr tid="T3">3</tblr>).</p>
         </sec>
         <sec>
            <st>
               <p>Integration of multiple biological data sources</p>
            </st>
            <p>We integrate six data sources using the different methods described in the Methods section, and compare their performance using five-fold cross-validation. We first show the improvement obtained by integrating multiple biological data sources. Table <tblr tid="T3">3</tblr> shows the percentages of overlaps between iPfam and the predicted domain interactions with multiple evidences. The results indicate that a single evidence is not sufficient for predicting domain interactions, as only 8.8% of the domain interactions with one or more evidences overlap with iPfam. However, the percentage of overlap increases to 50.5% for domain interactions with two or more evidences. As the number of evidences increases, the predictions become more accurate, but the number of predictions decreases. Only 58 predicted domain interactions have four or more evidences, and 43 of these 58 (74.1%) belong to iPfam.</p>
            <p>Table <tblr tid="T4">4</tblr> shows the percentages of overlaps between iPfam and the predicted domain interactions based on the Bayesian approach. The fraction of domain pairs overlapping with iPfam increases as the likelihood ratio score increases. Of the 420 domain pairs with likelihood ratio scores greater than 50, 80.0% are found in iPfam, a 471-fold increase over that of random domain pairs. Comparing Table <tblr tid="T3">3</tblr> with Table <tblr tid="T4">4</tblr>, we conclude that the likelihood ratio score significantly increases the number of high-confidence domain interaction pairs.</p>
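            <p>The "Fraction" and "Fold" columns of Tables 3 and 4 can be recomputed with a few lines of Python, shown here as a sketch. It assumes that the fold value is the overlap fraction divided by the rounded expected fraction of 0.17% quoted in the Table 3 caption; the function and variable names are hypothetical.</p>
            <p>
```python
# Expected fraction quoted in the Table 3 caption (0.17%): 2,580 iPfam pairs
# among 1,539,135 random domain pairs, rounded. Using the rounded value is
# an assumption made for this sketch.
EXPECTED_FRACTION = 0.0017


def overlap_fraction(n_overlap, n_predicted):
    """Fraction of predicted domain interactions that are found in iPfam."""
    return n_overlap / n_predicted


def fold_enrichment(n_overlap, n_predicted, expected=EXPECTED_FRACTION):
    """Ratio of the observed overlap fraction to the expected fraction."""
    return overlap_fraction(n_overlap, n_predicted) / expected


# (overlap with iPfam, number of predicted interactions) from Tables 3 and 4.
rows = {
    "at least two evidences": (820, 1624),
    "likelihood ratio of 51 or more": (336, 420),
}
for label, (hits, total) in rows.items():
    fraction = overlap_fraction(hits, total)
    print(label, f"{fraction:.1%}", round(fold_enrichment(hits, total)))
```
            </p>
            <p>Under this assumption, the two rows give fractions of 50.5% and 80.0% and fold values of 297 and 471, in agreement with the corresponding table entries.</p>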
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>The likelihood ratio values of predicted domain interactions. The likelihood ratio values of predicted domain interactions, the numbers of predicted domain interactions, and the overlap with iPfam. Numbers in the first column indicate the likelihood ratio values for the domain interactions, and the second column is the number of interactions having the corresponding likelihood ratio values.</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>Likelihood ratio values</p>
                     </c>
                     <c ca="center">
                        <p>Interactions</p>
                     </c>
                     <c ca="center">
                        <p>Overlap with iPfam</p>
                     </c>
                     <c ca="center">
                        <p>Fraction</p>
                     </c>
                     <c ca="center">
                        <p>Fold</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Random domain pairs</p>
                     </c>
                     <c ca="center">
                        <p>1,539,135</p>
                     </c>
                     <c ca="center">
                        <p>2,580</p>
                     </c>
                     <c ca="center">
                        <p>0.17%</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>> 0</p>
                     </c>
                     <c ca="center">
                        <p>25,352</p>
                     </c>
                     <c ca="center">
                        <p>2,080</p>
                     </c>
                     <c ca="center">
                        <p>8.2%</p>
                     </c>
                     <c ca="center">
                        <p>48</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>&#8805; 1</p>
                     </c>
                     <c ca="center">
                        <p>6,386</p>
                     </c>
                     <c ca="center">
                        <p>1,641</p>
                     </c>
                     <c ca="center">
                        <p>25.7%</p>
                     </c>
                     <c ca="center">
                        <p>151</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>&#8805; 4</p>
                     </c>
                     <c ca="center">
                        <p>2,391</p>
                     </c>
                     <c ca="center">
                        <p>1,241</p>
                     </c>
                     <c ca="center">
                        <p>51.9%</p>
                     </c>
                     <c ca="center">
                        <p>305</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>&#8805; 6</p>
                     </c>
                     <c ca="center">
                        <p>2,044</p>
                     </c>
                     <c ca="center">
                        <p>1,142</p>
                     </c>
                     <c ca="center">
                        <p>55.9%</p>
                     </c>
                     <c ca="center">
                        <p>329</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>&#8805; 11</p>
                     </c>
                     <c ca="center">
                        <p>1,683</p>
                     </c>
                     <c ca="center">
                        <p>1,011</p>
                     </c>
                     <c ca="center">
                        <p>60.1%</p>
                     </c>
                     <c ca="center">
                        <p>353</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>&#8805; 21</p>
                     </c>
                     <c ca="center">
                        <p>886</p>
                     </c>
                     <c ca="center">
                        <p>634</p>
                     </c>
                     <c ca="center">
                        <p>71.6%</p>
                     </c>
                     <c ca="center">
                        <p>421</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>&#8805; 51</p>
                     </c>
                     <c ca="center">
                        <p>420</p>
                     </c>
                     <c ca="center">
                        <p>336</p>
                     </c>
                     <c ca="center">
                        <p>80.0%</p>
                     </c>
                     <c ca="center">
                        <p>471</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>Figure <figr fid="F4">4</figr> shows the ROC curves of the Bayesian method using multiple data sources. Combining all six data sources gives the highest accuracy. The figure also shows that adding the domain fusion and domain function information significantly improves the performance of the prediction. In addition, we compare the na&#239;ve Bayesian approach with the method of Liu <it>et al</it>. <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, who multiplied the likelihoods of the observed protein interactions from the four species to obtain a single likelihood function. Figure <figr fid="F4">4</figr> shows the ROC curves of the two approaches using the protein interaction data from the four species. In both approaches, the <it>expectation </it>score of domain interactions is used. Although both approaches have similar performance, one advantage of the Bayesian approach is that other information, such as domain fusion and domain function, can easily be incorporated.</p>
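            <p>To illustrate why additional sources are easy to incorporate in a na&#239;ve Bayesian scheme, the sketch below combines per-source likelihood ratios by multiplication, which is what the assumption of conditionally independent sources implies. The source letters follow Figure 4, but every numerical ratio, the two-valued (supported or not supported) treatment of each source, and the function name are hypothetical; the actual likelihood ratios are estimated from the training data as described in the Methods section.</p>
            <p>
```python
from math import prod

# Hypothetical likelihood ratios per data source, keyed by the letters used
# in Figure 4. Each entry is (ratio if the source supports the domain pair,
# ratio if it does not). All values are made up for illustration.
RATIOS = {
    "Y": (20.0, 0.9),
    "W": (15.0, 0.95),
    "C": (60.0, 0.8),
    "G": (10.0, 0.7),
}


def combined_likelihood_ratio(supporting_sources, ratios=RATIOS):
    """Multiply per-source ratios, assuming conditionally independent sources."""
    return prod(
        supported if source in supporting_sources else unsupported
        for source, (supported, unsupported) in ratios.items()
    )


print(combined_likelihood_ratio({"Y", "C"}))  # yeast PPI plus co-existence
print(combined_likelihood_ratio(set()))       # no supporting source
```
            </p>
            <p>Adding a further data source to such a scheme only requires one more entry in the table of ratios, whereas a joint likelihood over all protein interaction data would have to be re-derived.</p>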
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>The relationship between false positive rate and sensitivity for predicting domain interactions using the Bayesian method with different data sources</p>
               </caption>
               <text>
                  <p>The relationship between false positive rate and sensitivity for predicting domain interactions using the Bayesian method with different data sources. The letters Y, W, F, H, C, and G indicate domain interactions based on yeast, worm, fruitfly, humans, co-existence, and same GO function, respectively. YWFH.Liu shows the result of predicted domain interactions using the extended MLE method defined in Liu <it>et al</it>. [14] with protein interactions of yeast, worm, fruitfly, and humans.</p>
               </text>
               <graphic file="1471-2105-7-269-4"/>
            </fig>
            <p>We compare the power of the three methods for predicting domain interactions: evidence counting, na&#239;ve Bayesian, and logistic regression. Figure <figr fid="F5">5</figr> shows the ROC curves for the three methods. It is clear that the Bayesian approach outperforms the other two. The evidence counting method does not consider the quality of each data source, and the logistic regression method is limited by many missing values. Finally, we select a set of 2,391 high-confidence domain interactions having a likelihood ratio value of at least 4, among which more than half (51.9%) are found in iPfam. This set covers 48.1% of the data in iPfam with a false positive rate of 2.3%. We list the top 10 predicted domain interactions that are not found in iPfam (July 2004 version) in Table <tblr tid="T5">5</tblr>. Among them, three were later included in the updated version of iPfam (Oct. 2005 version), which supports the reliability of the high-confidence domain interactions. The list of the high-confidence domain interactions is given in <supplr sid="S6">additional file 6</supplr>, and the likelihood ratio values of 25,352 domain pairs are given in <supplr sid="S7">additional file 7</supplr>. In these tables, the domain pairs are sorted based on the Bayesian approach. The rankings by the three methods (the Bayesian approach, logistic regression, and evidence counting) are also presented to allow comparison of the three methods. We test the differences between the rankings of the 25,352 domain pairs by the three methods using the Wilcoxon rank sum test, with the null hypothesis of no difference between rankings. All three p-values are around 0.5, showing that the null hypothesis cannot be rejected. However, failure to reject the null hypothesis does not imply that the rankings by the three approaches are similar: the ROC curves in Figure <figr fid="F5">5</figr> show that the top-ranked domain pairs differ among the three methods.</p>
            <suppl id="S6">
               <title>
                  <p>Additional file 6</p>
               </title>
               <text>
                  <p><b>The 2,391 high-confidence domain interactions from the Bayesian approach</b>. Domain pairs are sorted by their rank under the Bayesian approach. Rankings by evidence counting (EV) and logistic regression (LR) are given together with the number of supporting pieces of evidence and the probability from LR.</p>
               </text>
               <file name="1471-2105-7-269-S6.txt">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S7">
               <title>
                  <p>Additional file 7</p>
               </title>
               <text>
                  <p><b>The likelihood ratio of all domain interactions</b>. Domain pairs with a likelihood ratio larger than zero are sorted by their rank under the Bayesian approach. Rankings by evidence counting (EV) and logistic regression (LR) are given together with the number of supporting pieces of evidence and the probability from LR.</p>
               </text>
               <file name="1471-2105-7-269-S7.txt">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <tbl id="T5">
               <title>
                  <p>Table 5</p>
               </title>
               <caption>
                  <p>The ten highest ranked domain-domain interactions. The ten highest ranked domain-domain interactions from the Bayesian approach that are not in iPfam (July 2004 version). An "x" in the iPfam_2005 column marks domain interactions found in the updated version of iPfam (Oct. 2005 version).</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>Domain 1</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Domain 2</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>iPfam_2005</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Pfam ID</p>
                     </c>
                     <c ca="center">
                        <p>Accession</p>
                     </c>
                     <c ca="left">
                        <p>Pfam ID</p>
                     </c>
                     <c ca="center">
                        <p>Accession</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>WD40</p>
                     </c>
                     <c ca="center">
                        <p>PF00400</p>
                     </c>
                     <c ca="left">
                        <p>Pkinase</p>
                     </c>
                     <c ca="center">
                        <p>PF00069</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>zf-C2H2</p>
                     </c>
                     <c ca="center">
                        <p>PF00096</p>
                     </c>
                     <c ca="left">
                        <p>Pkinase</p>
                     </c>
                     <c ca="center">
                        <p>PF00069</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>zf-C3HC4</p>
                     </c>
                     <c ca="center">
                        <p>PF00097</p>
                     </c>
                     <c ca="left">
                        <p>zf-C3HC4</p>
                     </c>
                     <c ca="center">
                        <p>PF00097</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>F-box</p>
                     </c>
                     <c ca="center">
                        <p>PF00646</p>
                     </c>
                     <c ca="left">
                        <p>Skp1_POZ</p>
                     </c>
                     <c ca="center">
                        <p>PF03931</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>zf-C4</p>
                     </c>
                     <c ca="center">
                        <p>PF00105</p>
                     </c>
                     <c ca="left">
                        <p>Hormone_recep</p>
                     </c>
                     <c ca="center">
                        <p>PF00104</p>
                     </c>
                     <c ca="center">
                        <p>x</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>SMC_hinge</p>
                     </c>
                     <c ca="center">
                        <p>PF06470</p>
                     </c>
                     <c ca="left">
                        <p>SMC_N</p>
                     </c>
                     <c ca="center">
                        <p>PF02463</p>
                     </c>
                     <c ca="center">
                        <p>x</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Cation_ATPase_N</p>
                     </c>
                     <c ca="center">
                        <p>PF00690</p>
                     </c>
                     <c ca="left">
                        <p>Cation_ATPase_C</p>
                     </c>
                     <c ca="center">
                        <p>PF00689</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>MutS_V</p>
                     </c>
                     <c ca="center">
                        <p>PF00488</p>
                     </c>
                     <c ca="left">
                        <p>MutS_I</p>
                     </c>
                     <c ca="center">
                        <p>PF01624</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Cadherin</p>
                     </c>
                     <c ca="center">
                        <p>PF00028</p>
                     </c>
                     <c ca="left">
                        <p>Cadherin_C</p>
                     </c>
                     <c ca="center">
                        <p>PF01049</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>dsrm</p>
                     </c>
                     <c ca="center">
                        <p>PF00035</p>
                     </c>
                     <c ca="left">
                        <p>dsrm</p>
                     </c>
                     <c ca="center">
                        <p>PF00035</p>
                     </c>
                     <c ca="center">
                        <p>x</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <tbl id="T6">
               <title>
                  <p>Table 6</p>
               </title>
               <caption>
                  <p/>
               </caption>
               <tblbdy cols="3">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2" ca="center">
                        <p>Prediction</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Interacting</p>
                     </c>
                     <c ca="center">
                        <p>Non-interacting</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Observed</p>
                     </c>
                     <c ca="center">
                        <p>TP</p>
                     </c>
                     <c ca="center">
                        <p>FN</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Non-observed</p>
                     </c>
                     <c ca="center">
                        <p>FP</p>
                     </c>
                     <c ca="center">
                        <p>TN</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
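            <p>The quantities in Table <tblr tid="T6">6</tblr> relate to the axes of Figure <figr fid="F5">5</figr> as follows: sensitivity is TP/(TP + FN) and the false positive rate is FP/(FP + TN). The Python sketch below shows how one point of such a curve could be computed per score threshold. It reads "Observed" as membership in a reference set such as iPfam, which is an assumption, and the scores, labels, and function names are invented for illustration.</p>

```python
def confusion_counts(scores, observed, threshold):
    """Return (TP, FN, FP, TN) when pairs scoring at least threshold are predicted."""
    tp = fn = fp = tn = 0
    for score, is_observed in zip(scores, observed):
        predicted = score >= threshold
        if predicted and is_observed:
            tp += 1
        elif is_observed:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def roc_points(scores, observed):
    """Return (false positive rate, sensitivity) for each distinct threshold."""
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp, fn, fp, tn = confusion_counts(scores, observed, threshold)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points


# Hypothetical scores for six domain pairs and their reference-set membership.
scores = [9.0, 6.5, 4.0, 2.0, 1.0, 0.5]
observed = [True, True, False, True, False, False]

print(confusion_counts(scores, observed, 4.0))
for fpr, sn in roc_points(scores, observed):
    print(round(fpr, 3), round(sn, 3))
```

            <p>Lowering the threshold moves along the curve from the lower left (few predictions) to the upper right (all pairs predicted as interacting).</p>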
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>The relationship between false positive rate and sensitivity for predicting domain interactions using different methods: evidence counting, logistic regression, and na&#239;ve Bayesian</p>
               </caption>
               <text>
                  <p>The relationship between false positive rate and sensitivity for predicting domain interactions using different methods: evidence counting, logistic regression, and na&#239;ve Bayesian.</p>
               </text>
               <graphic file="1471-2105-7-269-5"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Comparison with domain interactions in <it>H. pylori</it></p>
            </st>
            <p>Rain <it>et al</it>. <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> reported a protein-protein interaction data set for <it>H. pylori </it>obtained with yeast two-hybrid assays. This data set provides the sequence ranges of the prey proteins that interact with the bait proteins. We map a prey range to a Pfam-A domain when their overlap is larger than 50% of the Pfam domain. As we do not have such information for the baits, we assume that every domain in a bait interacts with the mapped site of the prey. We obtain a total of 1,101 interactions between Pfam-A domains. Note that the domain interactions from <it>H. pylori </it>may contain false positives, as the interacting domains in the baits are not known. We compare the domain interactions predicted from the six data sources using the Bayesian approach with the experimentally derived domain interactions from <it>H. pylori</it>. For this comparison, we use the subset of predicted domain interactions whose domains are involved in domain interactions in <it>H. pylori</it>. <supplr sid="S8">Additional file 8</supplr> shows the percentages of overlap between the domain interactions from <it>H. pylori </it>and the predicted domain interactions. The fraction of domain pairs that overlap with the domain interactions in <it>H. pylori </it>increases as the likelihood ratio score increases, which supports the accuracy of the predicted domain interactions.</p>
            <p>We also study our scoring algorithm using the <it>H. pylori </it>database. We infer domain interactions from <it>H. pylori </it>protein interactions using four scoring functions and compare the predicted domain interactions with the domain interactions from <it>H. pylori</it>. The number of domains in <it>H. pylori </it>is 848, giving 848 &#215; 849/2 = 359,976 potential interacting pairs (self-pairs included). From the <it>Expectation </it>scoring function, we obtain 1,150 predicted domain interactions (score larger than zero). Among them, 750 overlap with the 1,101 domain interactions in <it>H. pylori</it>. <supplr sid="S9">Additional file 9</supplr> shows that the true positive rate is around 0.8 among the 1,150 ranked domain interactions, which supports the accuracy of the scoring functions.</p>
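            <p>The mapping rule described above can be sketched in Python as follows. This is a minimal illustration rather than the code used in the study: coordinates are assumed to be 1-based and inclusive, and the domain names, coordinates, and function names are hypothetical.</p>

```python
def overlap_fraction(region, domain):
    """Fraction of the domain (start, end) that is covered by the prey region."""
    start = max(region[0], domain[0])
    end = min(region[1], domain[1])
    shared = max(0, end - start + 1)
    return shared / (domain[1] - domain[0] + 1)


def domains_in_region(region, domains, min_fraction=0.5):
    """Names of domains whose overlap with the region exceeds min_fraction."""
    return [name for name, span in domains.items()
            if overlap_fraction(region, span) > min_fraction]


def candidate_pairs(bait_domains, prey_domains):
    """Pair every bait domain with every mapped prey domain (unordered pairs)."""
    return {tuple(sorted((b, p))) for b in bait_domains for p in prey_domains}


# Hypothetical prey protein with three Pfam-A domains and one interacting range.
prey_domains = {"PF_A": (50, 149), "PF_B": (120, 260), "PF_C": (280, 400)}
interacting_range = (100, 300)

mapped = domains_in_region(interacting_range, prey_domains)
print(mapped)
print(sorted(candidate_pairs(["PF_X", "PF_Y"], mapped)))
```

            <p>Here PF_A is excluded because exactly half of it is covered, which is not larger than 50%. Pairing every bait domain with the mapped prey domains is what can introduce the false positives noted in the text.</p>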
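            <p>The count of potential pairs can be checked with a few lines of Python. The formula n(n + 1)/2 counts unordered pairs of n domains and includes pairs of a domain with itself. The function name and the three-domain toy list are illustrative only.</p>

```python
from itertools import combinations_with_replacement


def count_candidate_pairs(n_domains):
    """Number of unordered domain pairs, self-pairs included."""
    return n_domains * (n_domains + 1) // 2


# Enumerate the pairs for a toy set of three domains to check the formula.
toy_pairs = list(combinations_with_replacement(["d1", "d2", "d3"], 2))
print(len(toy_pairs), count_candidate_pairs(3))

# The value quoted for the 848 H. pylori domains.
print(count_candidate_pairs(848))
```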
            <suppl id="S8">
               <title>
                  <p>Additional file 8</p>
               </title>
               <text>
                  <p><b>The likelihood ratio values of predicted domain interactions, the numbers of predicted domain interactions, and the overlap with domain interactions from H. pylori</b>. We used 1,101 domain interactions in H. pylori involving 206 domains. Numbers in the first column indicate the likelihood ratio values for the domain interactions, and the second column is the number of interactions having the corresponding likelihood ratio values. "Fold" indicates the ratio of the fraction to the expected value (5.2%).</p>
               </text>
               <file name="1471-2105-7-269-S8.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S9">
               <title>
                  <p>Additional file 9</p>
               </title>
               <text>
                  <p><b>A ROC curve of predicted domain interactions using H. pylori</b>. Figure S3 shows the comparison of performances of score functions to predict domain interactions for <it>H. pylori</it>.</p>
               </text>
               <file name="1471-2105-7-269-S9.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>Domain interactions in yeast complexes</p>
            </st>
            <p>We apply the set of high-confidence domain interactions to examine the detailed protein and domain interactions in yeast complexes <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. Figure <figr fid="F6">6</figr> shows two examples of protein complexes. Figure <figr fid="F6">6(a)</figr> is the SCF (Skp1-Cdc53-F-box protein) complex. SCF is a multi-protein complex with Cdc53, Skp1, and at least three independent F-box proteins, Cdc4, Met30, and Grr1 <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>. This complex acts as a ubiquitin ligase, catalyzing the final ubiquitin-transfer reaction in the destruction of G1/S-cyclins. Our prediction of domain interactions is consistent with the literature in that only domain PF00646 (F-box domain) of F-box proteins such as Cdc4, Met30, and Grr1 interacts with domain PF01466 of protein Skp1. Domain PF00400 (WD domain, G-beta repeat) and domain PF00560 (Leucine Rich Repeat domain) do not participate in protein-protein interactions. Patton <it>et al</it>. <abbrgrp><abbr bid="B40">40</abbr></abbrgrp> suggested that Cdc53 is a scaffold protein for Cdc34 and Skp1 by showing that it has independent binding sites for Cdc34 and Skp1. Our result also shows that domain PF00888 in protein Cdc53 interacts with both domain PF00179 of protein Cdc34 and domain PF01466 of protein Skp1.</p>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Two examples of yeast complexes with predicted domain-domain interactions and MIPS physical protein interactions</p>
               </caption>
               <text>
                  <p>Two examples of yeast complexes with predicted domain-domain interactions and MIPS physical protein interactions. The black arrows are predicted DDIs, the grey arrows are DDIs in iPfam, and the red arrows are PPIs from DIP. (a) SCF (Skp1-Cdc53-F-box protein) complexes. Cdc53 controls G1/S transition. Cdc34 is E2 ubiquitin-conjugating enzyme. Skp1 is kinetochore protein complex Cbf3, subunit D. Cdc4, Met30, and Grr1 are the F-box proteins. (b) Pyruvate dehydrogenase complexes. Pdb1 is pyruvate dehydrogenase (lipoamide) beta chain precursor, Pda1 is pyruvate dehydrogenase (lipoamide) alpha chain precursor, Lpd1 is dihydrolipoamide dehydrogenase precursor, Pdx1 is pyruvate dehydrogenase complex protein X, and Lat1 is dihydrolipoamide S-acetyltransferase. For details, see the main text.</p>
               </text>
               <graphic file="1471-2105-7-269-6"/>
            </fig>
            <p>Figure <figr fid="F6">6(b)</figr> shows a pyruvate dehydrogenase (PDH) complex. This complex converts pyruvate to acetyl CoA. The interaction between protein Lat1 and protein Pdb1 is mainly due to the interaction between domain PF02817 and domain PF02780. Domain PF02817 is an E3 binding domain, and PF02780 is the C-terminal domain of transketolase, which has been proposed as a regulatory molecule binding site. The interaction between protein Lat1 and protein Lpd1 occurs through the interaction of domain PF02817 and domain PF02852, which is the pyridine nucleotide-disulphide oxidoreductase dimerisation domain.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>The basic units of proteins are domains. If two proteins interact, at least one pair of domains, one from each protein, interacts. However, current biotechnologies such as the yeast two-hybrid system can only detect protein interactions, and deriving domain interactions is tedious and labor-intensive. The prediction of domain interactions based on protein interactions from one species has been formulated as a missing value problem, and an EM algorithm has been developed to achieve this objective <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. The method has been modified to integrate protein interaction data sets from multiple species, and the results have been improved <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B14">14</abbr></abbrgrp>. In this study, we further explore the problem of domain-domain interactions using multiple data sources, including protein interactions from four species (yeast, worm, fruitfly, and human) as well as domain fusion and domain function information. We first provide a score function, the expected number of domain-domain interactions in the observed interactions, to infer the reliability of domain interactions. By comparing with domain interactions in iPfam, we show that the new score outperforms the score of Deng <it>et al</it>. <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> for predicting domain interactions. The true positive rate among highly ranked domain interactions predicted from the new score is higher than that from Deng <it>et al</it>. <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. We further show that, by including the domain fusion and gene ontology information, the accuracy of the predicted domain interactions can be significantly increased. We also show that the simple na&#239;ve Bayesian approach works well for combining multiple sources of biological information to predict high-confidence domain interactions. There are several limitations of this study. First, we did not include all the interaction data from all the species as Riley <it>et al</it>. <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> did. The reason is that the amount of data in other species is much smaller than that in the four species. Second, the protein interaction data sets used in this study are incomplete and contain many false positives. <supplr sid="S1">Additional file 1</supplr> shows the ROC curves of the prediction results using various values of the false positive (fp) and false negative (fn) rates. In particular, we compared the result based on the fp and fn values presented in Table <tblr tid="T1">1</tblr> with the result based on fp = fn = 0 used in Riley <it>et al</it>. <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Depending on the species, the former approach is better than, similar to, or worse than the latter. Third, although we have shown that the na&#239;ve Bayesian approach outperforms the evidence counting and the logistic regression methods, there is room to improve the prediction by considering the correlations between data sources.</p>
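         <p>To make the third limitation concrete, the following Python sketch shows the general form of a na&#239;ve Bayesian combination: per-source likelihood ratios are multiplied, which is valid only if the data sources are conditionally independent. It is a minimal illustration, not the implementation used in this study. The per-source values, the source names, the function names, and the treatment of a missing source as contributing no factor are all assumptions; only the cutoff of 4 is taken from the Results.</p>

```python
from math import prod


def combined_likelihood_ratio(source_ratios):
    """Multiply per-source likelihood ratios, skipping missing (None) sources."""
    return prod(r for r in source_ratios.values() if r is not None)


def high_confidence(pairs, threshold=4.0):
    """Domain pairs whose combined likelihood ratio is at least the threshold."""
    return sorted(pair for pair, ratios in pairs.items()
                  if combined_likelihood_ratio(ratios) >= threshold)


# Hypothetical per-source likelihood ratios for two domain pairs.
pairs = {
    ("PF_A", "PF_B"): {"yeast": 3.0, "fusion": 2.0, "go": None},
    ("PF_C", "PF_D"): {"yeast": 0.5, "fusion": None, "go": 1.5},
}

for pair, ratios in pairs.items():
    print(pair, combined_likelihood_ratio(ratios))
print(high_confidence(pairs))
```

         <p>If two sources are correlated, their shared evidence is counted twice in the product, which is the kind of overconfidence that modeling the correlations would aim to reduce.</p>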
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>We have shown that the likelihood ratio score provides a means of evaluating the reliability of domain interactions. Based on the likelihood ratio score, we have derived a set of high-confidence domain interactions. This set has important implications for understanding protein functions at the domain level as well as for understanding protein interactions.</p>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p>MLE &#8211; Maximum Likelihood Estimation</p>
         <p>EM &#8211; Expectation Maximization</p>
         <p>HPRD &#8211; Human Protein Reference Database</p>
         <p>GO &#8211; Gene Ontology</p>
         <p>ROC &#8211; Receiver Operating Characteristic</p>
         <p>FPR &#8211; False Positive Rate</p>
         <p>SN &#8211; Sensitivity</p>
         <p>PQS &#8211; Protein Quaternary Structure</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>HL developed and implemented methods for inferring domain interactions by combining multiple biological data sets, collected the biological data sets, and drafted the manuscript. MD provided the program for the expectation-maximization algorithm to infer domain interactions from protein interactions and helped with the data collection. FS and TC initiated and directed this research and helped in writing the manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank two anonymous reviewers for several helpful suggestions, which significantly improved the manuscript. One reviewer suggested the comparison with <it>H. pylori </it>data, which is now included in the manuscript. This research is supported by NIH/NSF joint mathematical biology initiative DMS-0241102. MH Deng is supported by grants from the National Key Basic Research Project of China (No. 2003CB715903) and the National Natural Science Foundation of China (No. 90208022, No. 30570425).</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Inferring protein domain interactions from databases of interacting proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Riley</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Sabatti</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Eisenberg</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <issue>10</issue>
            <fpage>R89</fpage>
            <xrefbib>
               <pubid idtype="doi">10.1186/gb-2005-6-10-r89</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>iPfam</p>
            </title>
            <url>http://www.sanger.ac.uk/Software/Pfam/iPfam/</url>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Visualisation of protein-protein interactions at domains and amino acid resolutions</p>
            </title>
            <aug>
               <au>
                  <snm>Finn</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Bateman</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>410</fpage>
            <lpage>412</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti011</pubid>
                  <pubid idtype="pmpid">15353450</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>The protein-protein interaction map of <it>Helicobacter pylori</it></p>
            </title>
            <aug>
               <au>
                  <snm>Rain</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Selig</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Reuse</snm>
                  <fnm>HD</fnm>
               </au>
               <au>
                  <snm>Battaglia</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Reverdy</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Simon</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lenzen</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Petel</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Wojcik</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Schachter</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Chemama</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Labigne</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Legrain</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2001</pubdate>
            <volume>409</volume>
            <fpage>211</fpage>
            <lpage>215</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35051615</pubid>
                  <pubid idtype="pmpid">11196647</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Comparison of the Complete Protein Sets of Worm and Yeast: Orthology and Divergence</p>
            </title>
            <aug>
               <au>
                  <snm>Chervitz</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Aravind</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Sherlock</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Ball</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
               <au>
                  <snm>Dwight</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Harris</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Dolinski</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Mohr</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Weng</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Cherry</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Botstein</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1998</pubdate>
            <volume>282</volume>
            <fpage>2022</fpage>
            <lpage>2028</lpage>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Comparative Analysis of Protein Domain Organization</p>
            </title>
            <aug>
               <au>
                  <snm>Ye</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Godzik</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <fpage>343</fpage>
            <lpage>353</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">535408</pubid>
                  <pubid idtype="pmpid">14993202</pubid>
                  <pubid idtype="doi">10.1101/gr.1610504</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Conserved pathways within bacteria and yeast as revealed by global protein network alignment</p>
            </title>
            <aug>
               <au>
                  <snm>Kelley</snm>
                  <fnm>BP</fnm>
               </au>
               <au>
                  <snm>Sharan</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Karp</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Sittler</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Root</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Stockwell</snm>
                  <fnm>BR</fnm>
               </au>
               <au>
                  <snm>Ideker</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2003</pubdate>
            <volume>100</volume>
            <issue>20</issue>
            <fpage>11394</fpage>
            <lpage>11399</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1073/pnas.1534710100</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Identification of protein complexes by comparative analysis of yeast and bacterial protein interaction data</p>
            </title>
            <aug>
               <au>
                  <snm>Sharan</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Ideker</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Kelley</snm>
                  <fnm>BP</fnm>
               </au>
               <au>
                  <snm>Shamir</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Karp</snm>
                  <fnm>RM</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>2005</pubdate>
            <volume>12</volume>
            <issue>6</issue>
            <fpage>835</fpage>
            <lpage>846</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1089/cmb.2005.12.835</pubid>
                  <pubid idtype="pmpid">16108720</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Interaction network containing conserved and essential protein complexes in Escherichia coli</p>
            </title>
            <aug>
               <au>
                  <snm>Butland</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Peregrin-Alvarez</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Canadien</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Starostine</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Richards</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Beattie</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Krogan</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Davey</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Parkinson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Greenblatt</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Emili</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2005</pubdate>
            <volume>433</volume>
            <fpage>531</fpage>
            <lpage>537</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature03239</pubid>
                  <pubid idtype="pmpid">15690043</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Novel specificities emerge by stepwise duplication of functional modules</p>
            </title>
            <aug>
               <au>
                  <snm>Pereira-Leal</snm>
                  <fnm>JB</fnm>
               </au>
               <au>
                  <snm>Teichmann</snm>
                  <fnm>SA</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <fpage>552</fpage>
            <lpage>559</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1074369</pubid>
                  <pubid idtype="pmpid">15805495</pubid>
                  <pubid idtype="doi">10.1101/gr.3102105</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Protein-protein interaction map inference using interaction domain profile pairs</p>
            </title>
            <aug>
               <au>
                  <snm>Wojcik</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Schachter</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <issue>Suppl 1</issue>
            <fpage>S296</fpage>
            <lpage>S305</lpage>
            <xrefbib>
               <pubid idtype="pmpid">11473021</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Correlated Sequence-signatures as Markers of Protein-Protein Interaction</p>
            </title>
            <aug>
               <au>
                  <snm>Sprinzak</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Margalit</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2001</pubdate>
            <volume>311</volume>
            <fpage>681</fpage>
            <lpage>692</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2001.4920</pubid>
                  <pubid idtype="pmpid">11518523</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Inferring domain-domain interactions from protein-protein interactions</p>
            </title>
            <aug>
               <au>
                  <snm>Deng</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Mehta</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sun</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>1540</fpage>
            <lpage>1548</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">187530</pubid>
                  <pubid idtype="pmpid">12368246</pubid>
                  <pubid idtype="doi">10.1101/gr.153002</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Inferring protein-protein interactions through high-throughput interaction data from diverse organisms</p>
            </title>
            <aug>
               <au>
                  <snm>Liu</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Zhao</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>15</issue>
            <fpage>3279</fpage>
            <lpage>3285</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti492</pubid>
                  <pubid idtype="pmpid">15905281</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Learning to predict protein-protein interactions from protein sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Gomez</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Noble</snm>
                  <fnm>WS</fnm>
               </au>
               <au>
                  <snm>Rzhetsky</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <issue>15</issue>
            <fpage>1875</fpage>
            <lpage>1881</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg352</pubid>
                  <pubid idtype="pmpid">14555619</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Probabilistic prediction of unknown metabolic and signal-transduction networks</p>
            </title>
            <aug>
               <au>
                  <snm>Gomez</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Lo</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Rzhetsky</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2001</pubdate>
            <volume>159</volume>
            <issue>3</issue>
            <fpage>1291</fpage>
            <lpage>1298</lpage>
            <xrefbib>
               <pubid idtype="pmpid">11729170</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>DIP</p>
            </title>
            <url>http://dip.doe-mbi.ucla.edu/</url>
         </bibl>
         <bibl id="B18">
            <title>
               <p>The Database of Interacting Proteins: 2004 update</p>
            </title>
            <aug>
               <au>
                  <snm>Salwinski</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>CS</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Pettit</snm>
                  <fnm>FK</fnm>
               </au>
               <au>
                  <snm>Bowie</snm>
                  <fnm>JU</fnm>
               </au>
               <au>
                  <snm>Eisenberg</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <issue>Database issue</issue>
            <fpage>D449</fpage>
            <lpage>D451</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308820</pubid>
                  <pubid idtype="pmpid">14681454</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh086</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Integrative approach for computationally inferring protein domain interactions</p>
            </title>
            <aug>
               <au>
                  <snm>Ng</snm>
                  <fnm>SK</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Tan</snm>
                  <fnm>SH</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <issue>8</issue>
            <fpage>923</fpage>
            <lpage>929</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg118</pubid>
                  <pubid idtype="pmpid">12761053</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>InterDom: a database of putative interacting protein domains for validating predicted protein interactions and complexes</p>
            </title>
            <aug>
               <au>
                  <snm>Ng</snm>
                  <fnm>SK</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Tan</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <issue>1</issue>
            <fpage>251</fpage>
            <lpage>254</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">165526</pubid>
                  <pubid idtype="pmpid">12519994</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg079</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>MIPS: a Database for Genomes and Protein Sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Mewes</snm>
                  <fnm>HW</fnm>
               </au>
               <au>
                  <snm>Frishman</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Guldener</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Mannhaupt</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Mayer</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Mokrejs</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Morgenstern</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Munsterkotter</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rudd</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Weil</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>31</fpage>
            <lpage>34</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">99165</pubid>
                  <pubid idtype="pmpid">11752246</pubid>
                  <pubid idtype="doi">10.1093/nar/30.1.31</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae</p>
            </title>
            <aug>
               <au>
                  <snm>Uetz</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Giot</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Cagney</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Mansfield</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>Judson</snm>
                  <fnm>RS</fnm>
               </au>
               <au>
                  <snm>Knight</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Lockshon</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Narayan</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Srinivasan</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Pochart</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Qureshi-Emili</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Godwin</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Conover</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Kalbfleisch</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Vijayadamodar</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Johnston</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Fields</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Rothberg</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2000</pubdate>
            <volume>403</volume>
            <fpage>623</fpage>
            <lpage>627</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35001009</pubid>
                  <pubid idtype="pmpid">10688190</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>A comprehensive two-hybrid analysis to explore the yeast protein interactome</p>
            </title>
            <aug>
               <au>
                  <snm>Ito</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Chiba</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Ozawa</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Yoshida</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hattori</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sakaki</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <fpage>4569</fpage>
            <lpage>4574</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">31875</pubid>
                  <pubid idtype="pmpid">11283351</pubid>
                  <pubid idtype="doi">10.1073/pnas.061034498</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>A map of the interactome network of the metazoan C. elegans</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Armstrong</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Bertin</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2004</pubdate>
            <volume>303</volume>
            <issue>5657</issue>
            <fpage>540</fpage>
            <lpage>543</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1126/science.1091403</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>A protein interaction map of Drosophila melanogaster</p>
            </title>
            <aug>
               <au>
                  <snm>Giot</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Bader</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Brouwer</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Chaudhuri</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2003</pubdate>
            <volume>302</volume>
            <issue>5651</issue>
            <fpage>1727</fpage>
            <lpage>1736</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1090289</pubid>
                  <pubid idtype="pmpid">14605208</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Development of human protein reference database as an initial platform for approaching systems biology in humans</p>
            </title>
            <aug>
               <au>
                  <snm>Peri</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Navarro</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Amanchy</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Kristiansen</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Jonnalagadda</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Surendranath</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Niranjan</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Muthusamy</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Gandhi</snm>
                  <fnm>TK</fnm>
               </au>
               <au>
                  <snm>Gronborg</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ibarrola</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Deshpande</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Shanker</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Shivashankar</snm>
                  <fnm>HN</fnm>
               </au>
               <au>
                  <snm>Rashmi</snm>
                  <fnm>BP</fnm>
               </au>
               <au>
                  <snm>Ramya</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Zhao</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Chandrika</snm>
                  <fnm>KN</fnm>
               </au>
               <au>
                  <snm>Padma</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Harsha</snm>
                  <fnm>HC</fnm>
               </au>
               <au>
                  <snm>Yatish</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Kavitha</snm>
                  <fnm>MP</fnm>
               </au>
               <au>
                  <snm>Menezes</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Choudhury</snm>
                  <fnm>DR</fnm>
               </au>
               <au>
                  <snm>Suresh</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ghosh</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Saravana</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Chandran</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Krishna</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Joy</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Anand</snm>
                  <fnm>SK</fnm>
               </au>
               <au>
                  <snm>Madavan</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Joseph</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>GW</fnm>
               </au>
               <au>
                  <snm>Schiemann</snm>
                  <fnm>WP</fnm>
               </au>
               <au>
                  <snm>Constantinescu</snm>
                  <fnm>SN</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Khosravi-Far</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Steen</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Tewari</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ghaffari</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Blobe</snm>
                  <fnm>GC</fnm>
               </au>
               <au>
                  <snm>Dang</snm>
                  <fnm>CV</fnm>
               </au>
               <au>
                  <snm>Garcia</snm>
                  <fnm>JG</fnm>
               </au>
               <au>
                  <snm>Pevsner</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Jensen</snm>
                  <fnm>ON</fnm>
               </au>
               <au>
                  <snm>Roepstorff</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Deshpande</snm>
                  <fnm>KS</fnm>
               </au>
               <au>
                  <snm>Chinnaiyan</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Hamosh</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Chakravarti</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Pandey</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>2363</fpage>
            <lpage>2371</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">403728</pubid>
                  <pubid idtype="pmpid">14525934</pubid>
                  <pubid idtype="doi">10.1101/gr.1680803</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Gene Ontology</p>
            </title>
            <url>http://www.geneontology.org/</url>
         </bibl>
         <bibl id="B28">
            <title>
               <p>The Pfam Protein Families Database</p>
            </title>
            <aug>
               <au>
                  <snm>Bateman</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Cerruti</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Etwiller</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Griffiths-Jones</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Howe</snm>
                  <fnm>KL</fnm>
               </au>
               <au>
                  <snm>Marshall</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sonnhammer</snm>
                  <fnm>EL</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>276</fpage>
            <lpage>280</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">99071</pubid>
                  <pubid idtype="pmpid">11752314</pubid>
                  <pubid idtype="doi">10.1093/nar/30.1.276</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Pfam</p>
            </title>
            <url>http://www.sanger.ac.uk/Software/Pfam/</url>
         </bibl>
         <bibl id="B30">
            <title>
               <p>NCBI</p>
            </title>
            <url>http://www.ncbi.nlm.nih.gov/</url>
         </bibl>
         <bibl id="B31">
            <title>
               <p>EMBL-EBI</p>
            </title>
            <url>http://www.ebi.ac.uk/integr8/</url>
         </bibl>
         <bibl id="B32">
            <title>
               <p>PQS: a protein quaternary structure file server</p>
            </title>
            <aug>
               <au>
                  <snm>Henrick</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Trends Biochem Sci</source>
            <pubdate>1998</pubdate>
            <volume>23</volume>
            <issue>9</issue>
            <fpage>358</fpage>
            <lpage>361</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0968-0004(98)01253-5</pubid>
                  <pubid idtype="pmpid">9787643</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>SCOP</p>
            </title>
            <url>http://scop.berkeley.edu/</url>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Protein interaction maps for complete genomes based on gene fusion events</p>
            </title>
            <aug>
               <au>
                  <snm>Enright</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Iliopoulos</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Kyrpides</snm>
                  <fnm>NC</fnm>
               </au>
               <au>
                  <snm>Ouzounis</snm>
                  <fnm>CA</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1999</pubdate>
            <volume>402</volume>
            <issue>6757</issue>
            <fpage>86</fpage>
            <lpage>90</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/47056</pubid>
                  <pubid idtype="pmpid">10573422</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Detecting Protein Function and Protein-protein Interactions from Genome Sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Marcotte</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Pellegrini</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ng</snm>
                  <fnm>HL</fnm>
               </au>
               <au>
                  <snm>Rice</snm>
                  <fnm>DW</fnm>
               </au>
               <au>
                  <snm>Yeates</snm>
                  <fnm>TO</fnm>
               </au>
               <au>
                  <snm>Eisenberg</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1999</pubdate>
            <volume>285</volume>
            <fpage>751</fpage>
            <lpage>753</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.285.5428.751</pubid>
                  <pubid idtype="pmpid">10427000</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>A Bayesian networks approach for predicting protein-protein interactions from genomic data</p>
            </title>
            <aug>
               <au>
                  <snm>Jansen</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Greenbaum</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Kluger</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Krogan</snm>
                  <fnm>NJ</fnm>
               </au>
               <au>
                  <snm>Chung</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Emili</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Snyder</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Greenblatt</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Gerstein</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2003</pubdate>
            <volume>302</volume>
            <fpage>449</fpage>
            <lpage>453</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1087361</pubid>
                  <pubid idtype="pmpid">14564010</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>A probabilistic functional network of yeast genes</p>
            </title>
            <aug>
               <au>
                  <snm>Lee</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Date</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Adai</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Marcotte</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2004</pubdate>
            <volume>306</volume>
            <fpage>1555</fpage>
            <lpage>1558</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1099511</pubid>
                  <pubid idtype="pmpid">15567862</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>A first-draft human protein-interaction map</p>
            </title>
            <aug>
               <au>
                  <snm>Lehner</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Fraser</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>R63</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">522870</pubid>
                  <pubid idtype="pmpid">15345047</pubid>
                  <pubid idtype="doi">10.1186/gb-2004-5-9-r63</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Prediction of Protein Function Using Protein-protein Interaction Data</p>
            </title>
            <aug>
               <au>
                  <snm>Deng</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Mehta</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Sun</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>2003</pubdate>
            <volume>10</volume>
            <issue>6</issue>
            <fpage>947</fpage>
            <lpage>960</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1089/106652703322756168</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Cdc53 is a scaffold protein for multiple Cdc34/Skp1/F-box protein complexes that regulate cell division and methionine biosynthesis in yeast</p>
            </title>
            <aug>
               <au>
                  <snm>Patton</snm>
                  <fnm>EE</fnm>
               </au>
               <au>
                  <snm>Willems</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Sa</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Kuras</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Thomas</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Craig</snm>
                  <fnm>KL</fnm>
               </au>
               <au>
                  <snm>Tyers</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genes Dev</source>
            <pubdate>1998</pubdate>
            <volume>12</volume>
            <issue>5</issue>
            <fpage>692</fpage>
            <lpage>705</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">316590</pubid>
                  <pubid idtype="pmpid">9499404</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
