<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1752-0509-2-80</ui>
   <ji>1752-0509</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>The use of Gene Ontology terms for predicting highly-connected 'hub' nodes in protein-protein interaction networks</p>
         </title>
         <aug>
            <au id="A1" ca="yes" ce="yes">
               <snm>Hsing</snm>
               <fnm>Michael</fnm>
               <insr iid="I1"/>
               <email>mhsing@interchange.ubc.ca</email>
            </au>
            <au id="A2" ce="yes">
               <snm>Byler</snm>
               <mnm>Grant</mnm>
               <fnm>Kendall</fnm>
               <insr iid="I2"/>
               <email>kbyler@interchange.ubc.ca</email>
            </au>
            <au id="A3">
               <snm>Cherkasov</snm>
               <fnm>Artem</fnm>
               <insr iid="I2"/>
               <email>artc@interchange.ubc.ca</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Bioinformatics Graduate Program, Faculty of Graduate Studies, University of British Columbia. 100-570 West 7th Avenue. Vancouver, BC, V5T 4S6, Canada. </p>
            </ins>
            <ins id="I2">
               <p>Division of Infectious Diseases, Department of Medicine, Faculty of Medicine, University of British Columbia. D 452 HP, VGH. 2733 Heather Street. Vancouver, BC, V5Z 3J5, Canada. </p>
            </ins>
         </insg>
         <source>BMC Systems Biology</source>
         <issn>1752-0509</issn>
         <pubdate>2008</pubdate>
         <volume>2</volume>
         <issue>1</issue>
         <fpage>80</fpage>
         <url>http://www.biomedcentral.com/1752-0509/2/80</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18796161</pubid>
               <pubid idtype="doi">10.1186/1752-0509-2-80</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>01</day>
               <month>5</month>
               <year>2008</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>16</day>
               <month>9</month>
               <year>2008</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>16</day>
               <month>9</month>
               <year>2008</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2008</year>
         <collab>Hsing et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Protein-protein interactions mediate a wide range of cellular functions and responses and have been studied rigorously through recent large-scale proteomics experiments and bioinformatics analyses. One of the most important findings of those endeavours was the observation that 'hub' proteins participate in significant numbers of protein interactions and play critical roles in the organization and function of cellular protein interaction networks (PINs) <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>. It has also been demonstrated that such hub proteins may constitute an important pool of attractive drug targets.</p>
               <p>Thus, it is crucial to be able to identify hub proteins not only from experimental data but also through bioinformatics predictions.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>A hub protein classifier has been developed based on the available interaction data and Gene Ontology (GO) annotations for proteins in the <it>Escherichia coli</it>, <it>Saccharomyces cerevisiae</it>, <it>Drosophila melanogaster </it>and <it>Homo sapiens </it>genomes. In particular, by utilizing the machine learning method of boosting trees, we were able to create a predictive bioinformatics tool for the identification of proteins that are likely to play the role of a hub in protein interaction networks. Testing the developed hub classifier on external sets of experimental protein interaction data in Methicillin-resistant <it>Staphylococcus aureus </it>(MRSA) 252 and <it>Caenorhabditis elegans </it>demonstrated that our approach can predict hub proteins with a high degree of accuracy.</p>
               <p>A practical application of the developed bioinformatics method is illustrated by effective protein bait selection for large-scale pull-down experiments that aim to map complete protein-protein interaction networks for several species.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>The successful development of an accurate hub classifier demonstrated that highly-connected proteins tend to share certain relevant functional properties reflected in their Gene Ontology annotations. It is anticipated that the developed bioinformatics hub classifier will represent a useful tool for the theoretical prediction of highly-interacting proteins, the study of cellular network organizations, and the identification of prospective drug targets &#8211; even in those organisms that currently lack large-scale protein interaction data.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>A broad range of cellular functions is mediated through complex protein-protein interactions, which are commonly visualized as two-dimensional networks connecting thousands of proteins by their physical interactions. Such a network perspective suggests that the cellular effects and functions of proteins can only be fully understood in the context of their interacting partners in a protein interaction network (PIN).</p>
         <p>The study of PINs has been made possible through recent advancements in high-throughput proteomics that have detected protein-protein interactions on a genome-wide scale and have generated large amounts of interaction data for several species including <it>Saccharomyces cerevisiae </it><abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>, <it>Escherichia coli </it><abbrgrp><abbr bid="B8">8</abbr></abbrgrp>, <it>Drosophila melanogaster </it><abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, <it>Caenorhabditis elegans </it><abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, and <it>Homo sapiens </it><abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp>. The corresponding protein interaction networks have been made publicly accessible through open access databases such as IntAct <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> and DIP <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>.</p>
         <p>The accumulated protein interaction data have further supported recent protein network analyses that demonstrated the scale-free organization of PINs, where the majority of proteins have a low number of interactions in the network, with a few highly-connected proteins (also called <it>hubs</it>) having a significant number of interacting partners <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>. Such inhomogeneous network topology allows a PIN to be robust against random removal of protein nodes, but vulnerable to targeted removal of network hubs <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. In addition, previous studies have shown defined relationships between the degree of connectivity of proteins in PINs, their sequence conservation, and cellular essentiality properties <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>. Those studies indicated that highly-connected proteins (or hubs) represent very attractive subjects for understanding cellular functions, identifying novel drug targets, and for use in the rational design of large-scale pull-down experiments.</p>
         <p>Although large-scale PINs have already been experimentally determined for several species (and thus represent suitable training sets for hub-predicting bioinformatics approaches), in general, protein interaction data are still lacking for many organisms. Thus, several computational approaches have been developed to predict protein-protein interactions utilizing existing bioinformatics data such as gene proximity information <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>, gene fusion events <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp>, gene co-expression data <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>, phylogenetic profiling <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>, orthologous protein interactions <abbrgrp><abbr bid="B26">26</abbr></abbrgrp> and identification of interacting protein domains <abbrgrp><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr></abbrgrp>. Several bioinformatics approaches have also been developed to identify hypothetical interactions between proteins based on their three-dimensional structures <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr></abbrgrp> or by applying text-mining techniques <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp>. Traditionally, such computational predictions have focused on the identification of pairwise protein-protein interactions with varying degrees of accuracy <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>; however, none of them has focused explicitly on predicting hub proteins.</p>
         <p>At the same time, it is reasonable to hypothesize that hub proteins should share certain common sequence or structural features that not only enable them to participate in multitudes of protein interactions, but also can be utilized for the theoretical identification of such hub proteins without prior knowledge of the corresponding PINs. Therefore, the goal of this study is to develop such a 'hub predictor' (or classifier), capitalizing on experimental and bioinformatics data available to date for proteins in several model organisms with already-determined PINs.</p>
         <p>We have focused the construction of the hub classifier on Gene Ontology (GO) data, which provide functional annotations for individual proteins using an expert knowledge base <abbrgrp><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr><abbr bid="B38">38</abbr></abbrgrp>. The advantage of applying GO annotation to hub prediction lies in the readily available information for proteins in hundreds of species. Importantly, the GO annotations have been shown to reflect certain properties that can mediate protein-protein interactions <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>, but the annotation itself does not rely on the availability of corresponding experimental data. Thus, the GO-based hub classifier should be suitable for predicting highly-connected proteins, even in organisms that lack protein interaction data.</p>
         <p>Here we present the development of such a hub protein classifier, trained on the existing GO and protein-protein interaction data for <it>Escherichia coli</it>, <it>Saccharomyces cerevisiae</it>, <it>Drosophila melanogaster </it>and <it>Homo sapiens</it>. The generated models were cross-validated and tested on two external protein interaction data sets: Methicillin-resistant <it>Staphylococcus aureus </it>(MRSA) 252 and <it>Caenorhabditis elegans</it>. The developed bioinformatics approach has not only demonstrated improved accuracy in identifying highly-connected PIN nodes (as compared to homology- or protein domain-based prediction methods), but has also shown improved speed and a lower demand on computational resources.</p>
         <p>To illustrate a possible application of the developed tool, we have used it for rationalizing a bait selection strategy for a large-scale protein complex pull-down experiment.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Data acquisition</p>
            </st>
            <sec>
               <st>
                  <p>Protein-protein interaction data</p>
               </st>
               <p>Protein interaction data used for the training and testing of the hub protein classifier were obtained from the IntAct database <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> for the following species: <it>Escherichia coli </it>K-12 (taxonomy ID: 83333), <it>Saccharomyces cerevisiae </it>(taxonomy ID: 4932), <it>Drosophila melanogaster </it>(taxonomy ID: 7227), and <it>Homo sapiens </it>(taxonomy ID: 9606) (acquisition date: Sep. 25<sup>th</sup>, 2007). Two external validation data sets were collected for protein interactions in MRSA252 (provided by the PREPARE project in Vancouver, B.C., Canada <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>) and <it>Caenorhabditis elegans </it>(obtained from the IntAct database on Sep. 25<sup>th</sup>, 2007). Table <tblr tid="T1">1</tblr> lists the total numbers of proteins and interactions for the four species used in training and testing, which were combined into a single data set for the subsequent analyses. Similar information on the external validation sets is shown in Table <tblr tid="T2">2</tblr>.</p>
               <tbl id="T1">
                  <title>
                     <p>Table 1</p>
                  </title>
                  <caption>
                     <p>A summary of protein interaction and GO annotation data used in the training and testing of the hub classifiers.</p>
                  </caption>
                  <tblbdy cols="6">
                     <r>
                        <c ca="left">
                           <p>Training/Testing set</p>
                        </c>
                        <c ca="right">
                           <p>E. coli</p>
                        </c>
                        <c ca="right">
                           <p>S. cerevisiae</p>
                        </c>
                        <c ca="right">
                           <p>D. melanogaster</p>
                        </c>
                        <c ca="right">
                           <p>H. sapiens</p>
                        </c>
                        <c ca="right">
                           <p>total of 4 species</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p># of proteins</p>
                        </c>
                        <c ca="right">
                           <p>2860</p>
                        </c>
                        <c ca="right">
                           <p>5397</p>
                        </c>
                        <c ca="right">
                           <p>6935</p>
                        </c>
                        <c ca="right">
                           <p>6592</p>
                        </c>
                        <c ca="right">
                           <p>21784</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p># of hubs (10% of total proteins)</p>
                        </c>
                        <c ca="right">
                           <p>286</p>
                        </c>
                        <c ca="right">
                           <p>535</p>
                        </c>
                        <c ca="right">
                           <p>628</p>
                        </c>
                        <c ca="right">
                           <p>620</p>
                        </c>
                        <c ca="right">
                           <p>2069</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p># of non-hubs (90% of total proteins)</p>
                        </c>
                        <c ca="right">
                           <p>2574</p>
                        </c>
                        <c ca="right">
                           <p>4862</p>
                        </c>
                        <c ca="right">
                           <p>6307</p>
                        </c>
                        <c ca="right">
                           <p>5972</p>
                        </c>
                        <c ca="right">
                           <p>19715</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p># of protein interactions</p>
                        </c>
                        <c ca="right">
                           <p>13888</p>
                        </c>
                        <c ca="right">
                           <p>37167</p>
                        </c>
                        <c ca="right">
                           <p>19994</p>
                        </c>
                        <c ca="right">
                           <p>19115</p>
                        </c>
                        <c ca="right">
                           <p>90164</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>minimum # of interactions per hub</p>
                        </c>
                        <c ca="right">
                           <p>20</p>
                        </c>
                        <c ca="right">
                           <p>33</p>
                        </c>
                        <c ca="right">
                           <p>16</p>
                        </c>
                        <c ca="right">
                           <p>13</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p># of proteins with at least one GO term</p>
                        </c>
                        <c ca="right">
                           <p>1378</p>
                        </c>
                        <c ca="right">
                           <p>4738</p>
                        </c>
                        <c ca="right">
                           <p>5931</p>
                        </c>
                        <c ca="right">
                           <p>5097</p>
                        </c>
                        <c ca="right">
                           <p>17144</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p># of proteins without any GO term</p>
                        </c>
                        <c ca="right">
                           <p>1482</p>
                        </c>
                        <c ca="right">
                           <p>659</p>
                        </c>
                        <c ca="right">
                           <p>1004</p>
                        </c>
                        <c ca="right">
                           <p>1495</p>
                        </c>
                        <c ca="right">
                           <p>4640</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>% of proteins with at least one GO term</p>
                        </c>
                        <c ca="right">
                           <p>48.18%</p>
                        </c>
                        <c ca="right">
                           <p>87.79%</p>
                        </c>
                        <c ca="right">
                           <p>85.52%</p>
                        </c>
                        <c ca="right">
                           <p>77.32%</p>
                        </c>
                        <c ca="right">
                           <p>78.70%</p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p># of different GO terms &#8211; process</p>
                        </c>
                        <c ca="right">
                           <p>30</p>
                        </c>
                        <c ca="right">
                           <p>41</p>
                        </c>
                        <c ca="right">
                           <p>48</p>
                        </c>
                        <c ca="right">
                           <p>49</p>
                        </c>
                        <c ca="right">
                           <p>50</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p># of different GO terms &#8211; function</p>
                        </c>
                        <c ca="right">
                           <p>21</p>
                        </c>
                        <c ca="right">
                           <p>37</p>
                        </c>
                        <c ca="right">
                           <p>38</p>
                        </c>
                        <c ca="right">
                           <p>37</p>
                        </c>
                        <c ca="right">
                           <p>40</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p># of different GO terms &#8211; component</p>
                        </c>
                        <c ca="right">
                           <p>4</p>
                        </c>
                        <c ca="right">
                           <p>27</p>
                        </c>
                        <c ca="right">
                           <p>31</p>
                        </c>
                        <c ca="right">
                           <p>29</p>
                        </c>
                        <c ca="right">
                           <p>35</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p># of different GO terms &#8211; total</p>
                        </c>
                        <c ca="right">
                           <p>55</p>
                        </c>
                        <c ca="right">
                           <p>105</p>
                        </c>
                        <c ca="right">
                           <p>117</p>
                        </c>
                        <c ca="right">
                           <p>115</p>
                        </c>
                        <c ca="right">
                           <p>125</p>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p>The top part of the table lists the protein interactions and hubs in each of the four species, and the bottom part lists the number of unique GO terms for each annotation category.</p>
                  </tblfn>
               </tbl>
               <tbl id="T2">
                  <title>
                     <p>Table 2</p>
                  </title>
                  <caption>
                     <p>A summary of protein interaction and GO annotation data used in the external validation of the hub classifiers.</p>
                  </caption>
                  <tblbdy cols="3">
                     <r>
                        <c ca="left">
                           <p>External validation set</p>
                        </c>
                        <c ca="right">
                           <p>MRSA252</p>
                        </c>
                        <c ca="right">
                           <p>C. elegans</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p># of proteins</p>
                        </c>
                        <c ca="right">
                           <p>133</p>
                        </c>
                        <c ca="right">
                           <p>2890</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p># of hubs (10% of total proteins)</p>
                        </c>
                        <c ca="right">
                           <p>13</p>
                        </c>
                        <c ca="right">
                           <p>276</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p># of non-hubs (90% of total proteins)</p>
                        </c>
                        <c ca="right">
                           <p>120</p>
                        </c>
                        <c ca="right">
                           <p>2614</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p># of protein interactions</p>
                        </c>
                        <c ca="right">
                           <p>2401</p>
                        </c>
                        <c ca="right">
                           <p>4594</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>minimum # of interactions per hub</p>
                        </c>
                        <c ca="right">
                           <p>45</p>
                        </c>
                        <c ca="right">
                           <p>7</p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p># of proteins with at least one GO term</p>
                        </c>
                        <c ca="right">
                           <p>109</p>
                        </c>
                        <c ca="right">
                           <p>2403</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p># of proteins without any GO term</p>
                        </c>
                        <c ca="right">
                           <p>24</p>
                        </c>
                        <c ca="right">
                           <p>487</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>% of proteins with at least one GO term</p>
                        </c>
                        <c ca="right">
                           <p>81.95%</p>
                        </c>
                        <c ca="right">
                           <p>83.15%</p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p># of different GO terms &#8211; process</p>
                        </c>
                        <c ca="right">
                           <p>27</p>
                        </c>
                        <c ca="right">
                           <p>46</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p># of different GO terms &#8211; function</p>
                        </c>
                        <c ca="right">
                           <p>19</p>
                        </c>
                        <c ca="right">
                           <p>34</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p># of different GO terms &#8211; component</p>
                        </c>
                        <c ca="right">
                           <p>5</p>
                        </c>
                        <c ca="right">
                           <p>22</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p># of different GO terms &#8211; total</p>
                        </c>
                        <c ca="right">
                           <p>51</p>
                        </c>
                        <c ca="right">
                           <p>102</p>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p>The top table lists the protein interactions and hubs in each of the two species, and the bottom part of the table lists the number of unique GO terms for each annotation category.</p>
                  </tblfn>
               </tbl>
               <p>Hub proteins were identified based on their numbers of protein interactions and their percentile ranking relative to other proteins in the same species. Proteins of the same species were sorted by their number of protein-protein interactions and divided into percentile groups (i.e., higher-percentile proteins have more interactions than lower-percentile proteins). Hub proteins clearly have more interactions than non-hubs, but there is currently no consensus on exactly how many interactions a hub protein should have. Often, hubs are defined arbitrarily as having at least a certain number of interactions <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>. In our study, the hub selection criterion was based on the position of a sharp turn (or inflection point) on the accumulative protein interaction distribution plot of each of the four species. As shown in Figure <figr fid="F1">1</figr>, the protein interactions followed a power law distribution, such that a sharp turn is visible around the 90<sup>th </sup>protein percentile position on the interaction plots.</p>
               <fig id="F1">
                  <title>
                     <p>Figure 1</p>
                  </title>
                  <caption>
                     <p>Accumulative protein interaction distribution plots</p>
                  </caption>
                  <text>
                     <p><b>Accumulative protein interaction distribution plots</b>. a) <it>E. coli</it>, b) <it>S. cerevisiae</it>, c) <it>D. melanogaster</it>, d) <it>H. sapiens</it>. On each plot, the (x, y) coordinate of the sharp turn or the inflection point is shown.</p>
                  </text>
                  <graphic file="1752-0509-2-80-1"/>
               </fig>
               <p>To achieve a consistent hub definition across the four studied species, hub proteins were defined as the proteins at or above the 90<sup>th </sup>percentile of interactors; in other words, the hubs represented the top 10 percent of highly-connected interactors, and the non-hubs consisted of the bottom 90 percent of the proteins. Using this definition, hub proteins were determined from each of the four PINs individually. At the 90<sup>th </sup>protein percentile, <it>E. coli </it>hubs have at least 20 protein interactions, <it>S. cerevisiae </it>hubs have at least 33 protein interactions, <it>D. melanogaster </it>hubs have at least 16 protein interactions, and <it>H. sapiens </it>hubs have at least 13 protein interactions. The numbers of proteins classified as hubs and non-hubs are shown in Table <tblr tid="T1">1</tblr>.</p>
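               <p>The percentile-based hub definition can be illustrated with a minimal Python sketch. The function name 'assign_hubs', the nearest-rank percentile convention, and the toy interaction counts are assumptions made for illustration only; this is not the code used in the study.</p>

```python
def assign_hubs(degree, percentile=90):
    """Label proteins at or above the given percentile of interaction counts as hubs.

    degree: dict mapping a protein id to its number of interactions.
    The nearest-rank percentile is used here (an assumption; other conventions exist).
    """
    counts = sorted(degree.values())
    # Nearest rank = ceil(percentile * n / 100), computed with integer arithmetic.
    rank = max(1, (percentile * len(counts) + 99) // 100)
    threshold = counts[rank - 1]
    hubs = {p for p, k in degree.items() if k >= threshold}
    return threshold, hubs

# Toy data: ten hypothetical proteins with a skewed degree distribution.
toy = {"P%02d" % i: k for i, k in enumerate([1, 1, 2, 2, 3, 3, 4, 5, 8, 40])}
threshold, hubs = assign_hubs(toy)
```

               <p>Because the threshold is inclusive, the hub set can be slightly larger than 10 percent of the proteins when the data set is small or when many proteins are tied at the threshold.</p>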
               <p>Figure <figr fid="F2">2</figr> illustrates the subsequent steps involved in the development of the hub protein classifiers and their corresponding bioinformatics analyses.</p>
               <fig id="F2">
                  <title>
                     <p>Figure 2</p>
                  </title>
                  <caption>
                     <p>A flow chart of the development of the hub protein classifiers and their corresponding bioinformatics analyses</p>
                  </caption>
                  <text>
                     <p>
                        <b>A flow chart of the development of the hub protein classifiers and their corresponding bioinformatics analyses.</b>
                     </p>
                  </text>
                  <graphic file="1752-0509-2-80-2"/>
               </fig>
            </sec>
            <sec>
               <st>
                  <p>Gene Ontology (GO) data</p>
               </st>
               <p>Each protein obtained from the IntAct database was identified by a unique UniProt accession number, which enabled a fast collection of GO annotation data from the UniProt Retrieval System <abbrgrp><abbr bid="B37">37</abbr><abbr bid="B41">41</abbr></abbrgrp> (UniProt protein data obtained on Oct. 1st, 2007). The complete UniProt protein annotation pages were downloaded as flat text files, which were then parsed by Perl scripts to extract the GO annotations in the three categories: biological process, molecular function, and cellular component. Because each GO term could be assigned to a different level of the annotation hierarchy, we established a fixed general GO level that represented all of the specific GO terms of the proteins in the study. This general GO annotation level was determined based on the GO slim project, which provides a list of generic GO terms on which many bioinformatics analyses can be performed <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>. Importantly, the GO slim generic terms provided a reasonable number of protein 'predictors' for a machine learning method to operate effectively. The tool 'map2slim' <abbrgrp><abbr bid="B43">43</abbr></abbrgrp> was used to map specific GO terms to the 'GO slim' generic terms (GO annotation files were obtained from <abbrgrp><abbr bid="B44">44</abbr></abbrgrp> on Oct. 17<sup>th</sup>, 2007; GO format-version: 1.2, GO date: 16:10:2007 16:19, GO revision: 5.514; GO slim format-version: 1.2, GO slim date: 01:10:2007 16:53, GO slim revision: 1.682). This generic version of GO slim contained 53 [biological process] terms, 42 [molecular function] terms and 37 [cellular component] terms.</p>
               <p>Tables <tblr tid="T1">1</tblr> and <tblr tid="T2">2</tblr> list the number of GO slim terms used to annotate the proteins in each species and the number of proteins with or without a GO annotation term.</p>
               <p>All protein interaction data and GO annotations were stored in a local MySQL database for fast data searching and reporting.</p>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Hub protein classification by boosting trees</p>
            </st>
            <p>To train models that classify a protein as a hub or a non-hub, the protein interaction data from the four species were combined into a single data set (90,164 interactions involving 2,069 hubs and 19,715 non-hubs). A four-fold cross-validation strategy was used, in which four non-overlapping testing sets (each 25% of the total protein set) and the four corresponding training sets (each 75% of the total protein set) were utilized for building the hub classifiers. Each training and testing set maintained the same hub to non-hub (1:9) ratio. In addition, the proteins in the training sets maintained the same distribution of GO annotation terms as the proteins in the testing sets. Figure <figr fid="F3">3</figr> illustrates the distribution of each of the 125 GO terms, represented by the percentage of proteins with this term in the training sets vs. the testing sets of the four cross-validation samples. High R<sup>2 </sup>values of 0.9981 to 0.9983 indicated an equal GO distribution between the training and testing sets. The figure also shows that the majority of the GO terms were associated with less than 10% of the proteins in a given data set.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Distribution of GO annotation terms between the training and testing sets in the four cross-validation samples</p>
               </caption>
               <text>
                  <p><b>Distribution of GO annotation terms between the training and testing sets in the four cross-validation samples</b>. Each point on a graph represents the percentage of proteins annotated with a given GO term in the training set (x-axis), and the percentage of proteins annotated with the same GO term in the testing set (y-axis). All four plots were fitted with linear regression lines, with high R<sup>2 </sup>values of 0.998. This indicates an equal distribution of the GO terms between the training and testing sets of the four samples.</p>
               </text>
               <graphic file="1752-0509-2-80-3"/>
            </fig>
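            <p>A four-fold split that preserves the hub to non-hub ratio can be sketched in Python as follows. The function name 'stratified_folds', the round-robin assignment, and the toy labels are assumptions for illustration; the sketch stratifies on the hub label only and does not attempt to balance the GO term distribution, which was checked separately in this study.</p>

```python
import random

def stratified_folds(labels, k=4, seed=0):
    """Split protein ids into k non-overlapping folds, preserving the hub ratio.

    labels: dict mapping a protein id to True (hub) or False (non-hub).
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (True, False):
        members = sorted(p for p, is_hub in labels.items() if is_hub == cls)
        rng.shuffle(members)
        # Deal members round-robin so each fold gets a near-equal share of the class.
        for i, p in enumerate(members):
            folds[i % k].append(p)
    return folds

# Toy data: 200 hypothetical proteins, every tenth one labelled as a hub (1:9 ratio).
labels = {"p%03d" % i: (i % 10 == 0) for i in range(200)}
folds = stratified_folds(labels)
# Each fold serves once as the testing set; the other three form the training set.
splits = [(sorted(set(labels) - set(f)), f) for f in folds]
```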
            <p>We focused the machine-learning effort on hub classification by applying boosting trees, which is one of the best methods for classifying complex data and providing interpretable results <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>. The training and testing of the hub-predicting classification trees were performed on 125 GO terms as predictor variables by using the boosting tree application as implemented in STATISTICA version 8 <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>. The input data were formatted as tables of binary data, where each column represented a GO term variable (1 = present, 0 = absent) and each row represented a sample protein.</p>
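            <p>The binary input format described above can be sketched in Python as follows. The function name 'build_binary_table', the protein names, and the three example columns are hypothetical; the GO identifiers serve only as example column labels.</p>

```python
def build_binary_table(annotations, go_terms):
    """Return one row per protein: 1 if the GO term is present, 0 if absent."""
    rows = {}
    for protein, terms in annotations.items():
        present = set(terms)
        rows[protein] = [1 if t in present else 0 for t in go_terms]
    return rows

# Hypothetical proteins; each column corresponds to one GO term variable.
go_terms = ["GO:0006810", "GO:0003677", "GO:0005634"]
annotations = {
    "protA": ["GO:0003677", "GO:0005634"],
    "protB": [],
    "protC": ["GO:0006810"],
}
table = build_binary_table(annotations, go_terms)
```

            <p>Proteins without any GO annotation, such as 'protB' in this toy example, yield a row of zeros.</p>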
            <p>Four classifiers were built (one for each of the four training sets) and compiled in the C++ language under Linux. In addition to the four testing sets in the cross-validation study, the best of the four hub classifiers was validated on two external data sets, which consisted of experimentally-determined PINs in MRSA252 and <it>C. elegans</it>. The classifier predicted each protein in the data sets as either a hub or a non-hub, and the classification statistics were calculated as follows:</p>
            <p>
               <display-formula>Sensitivity = TP/(TP + FN)</display-formula>
            </p>
            <p>
               <display-formula>Specificity = TN/(TN + FP)</display-formula>
            </p>
            <p>
               <display-formula>Accuracy = (TP + TN)/(TP + TN + FP + FN)</display-formula>
            </p>
            <p>
               <display-formula>PPV (Positive Predictive Value) = TP/(TP + FP)</display-formula>
            </p>
            <p>
               <display-formula>NPV (Negative Predictive Value) = TN/(TN + FN)</display-formula>
            </p>
            <p>where TP = True Positive, FP = False Positive, TN = True Negative, and FN = False Negative.</p>
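            <p>As a worked example, the five statistics defined above can be computed from the four confusion-matrix counts. The Python sketch below uses the counts reported for the testing set in Table 3 (TP = 145, FP = 514, TN = 4415, FN = 371); the function name 'classification_stats' is an assumption for illustration.</p>

```python
def classification_stats(tp, fp, tn, fn):
    """Return the five statistics defined above as fractions between 0 and 1."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Counts from the testing rows of Table 3.
stats = classification_stats(tp=145, fp=514, tn=4415, fn=371)
for name, value in stats.items():
    print("%s = %.2f%%" % (name, 100 * value))
```

            <p>With these counts the sketch reproduces the testing-set values listed in Table 3: sensitivity 28.10%, specificity 89.57%, accuracy 83.75%, PPV 22.00% and NPV 92.25%.</p>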
            <p>A useful output feature of the boosting tree method is the relative predictor importance, which measures the average influence of a predictor variable on the prediction outcome over all of the trees <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>. The most important predictor is assigned a value of 100, and the other variables are scaled accordingly.</p>
         </sec>
         <sec>
            <st>
               <p>Comparison of the hub classifiers with other existing protein interaction prediction approaches</p>
            </st>
            <p>To further assess the performance of the hub classifier against other existing approaches for predicting hub proteins, we applied three different types of bioinformatics methods to construct hypothetical PINs in MRSA252, where hub proteins were determined by the number of predicted pairwise protein-protein interactions.</p>
            <sec>
               <st>
                  <p>Hypothetical PIN &#8211; pathway maps</p>
               </st>
               <p>The first type of hypothetical PIN represented the known protein-protein interactions available for MRSA252. A total of 513 protein interactions were manually extracted from the pathway maps in the KEGG database <abbrgrp><abbr bid="B47">47</abbr></abbrgrp> (acquisition date: May 3<sup>rd</sup>, 2006).</p>
            </sec>
            <sec>
               <st>
                  <p>Hypothetical PIN &#8211; orthologous interactions</p>
               </st>
               <p>The second type of PIN was constructed based on known protein-protein interactions between orthologs from three other species: <it>Helicobacter pylori</it>, <it>Saccharomyces cerevisiae</it>, and <it>Escherichia coli</it>. The experimental PIN in <it>H. pylori </it>was obtained from the BIND database <abbrgrp><abbr bid="B48">48</abbr></abbrgrp> (acquisition date: Aug. 11<sup>th</sup>, 2005). Two sources were used to build the <it>S. cerevisiae </it>PIN: the BIND database (acquisition date: Aug. 11<sup>th</sup>, 2005) and Gavin's study <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> (acquisition date from the IntAct database <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>: Feb. 7<sup>th</sup>, 2006). We extracted the <it>E. coli </it>PIN in Butland's study <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> from the IntAct database <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> (acquisition date: Apr. 13<sup>th</sup>, 2006).</p>
               <p>A total of 2656 protein sequences in MRSA252 were obtained from the RefSeq databases at NCBI <abbrgrp><abbr bid="B49">49</abbr></abbrgrp> (acquisition date: Feb. 4<sup>th</sup>, 2006). The orthologs of the interacting proteins from each of the above species were identified in MRSA252 by using the program InParanoid <abbrgrp><abbr bid="B50">50</abbr></abbrgrp> (version 1.35). If the orthologs of a pair of MRSA252 proteins interacted in one of the three species, the pair was assigned as an interacting protein pair. A total of 3258 protein interactions were predicted for this type of MRSA252 PIN reconstruction.</p>
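               <p>The ortholog-based transfer rule can be sketched in Python as follows. The function name 'transfer_interactions', the protein identifiers, and the decision to skip self-pairs are assumptions for illustration; in the study the ortholog mapping was produced by InParanoid rather than supplied by hand.</p>

```python
from itertools import product

def transfer_interactions(source_interactions, orthologs):
    """Predict target-species interactions from interactions between orthologs.

    source_interactions: iterable of (a, b) protein pairs in a source species.
    orthologs: dict mapping a source protein to a set of target-species orthologs.
    """
    predicted = set()
    for a, b in source_interactions:
        for x, y in product(orthologs.get(a, ()), orthologs.get(b, ())):
            if x != y:  # self-pairs are skipped here (an assumption)
                predicted.add(frozenset((x, y)))
    return predicted

# Hypothetical identifiers for a source species ("eco") and for MRSA252 ("sar").
source = [("eco1", "eco2"), ("eco2", "eco3"), ("eco4", "eco5")]
orthologs = {"eco1": {"sar1"}, "eco2": {"sar2"}, "eco3": {"sar3"}}
pairs = transfer_interactions(source, orthologs)
```

               <p>Source interactions in which either partner lacks an ortholog, such as the third pair in this toy example, contribute no predicted interaction.</p>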
            </sec>
            <sec>
               <st>
                  <p>Hypothetical PIN &#8211; interacting domains</p>
               </st>
               <p>The third type of MRSA252 PIN was predicted based on protein domain-domain interactions. First, the presence of Pfam domains <abbrgrp><abbr bid="B51">51</abbr></abbrgrp> in each of the 2656 MRSA252 proteins was determined by scanning the Pfam domain profiles (version 19.0) with the program HMMER <abbrgrp><abbr bid="B52">52</abbr></abbrgrp> (version 2.3.2). Second, domain-domain interaction data were acquired from two sources: InterDom <abbrgrp><abbr bid="B53">53</abbr></abbrgrp> (version: 1.2) and iPfam <abbrgrp><abbr bid="B54">54</abbr></abbrgrp> (version: 19.0). If a pair of MRSA252 proteins contained interacting domains according to one of the two sources, the pair was assigned as an interacting protein pair. A total of 11,608 protein interactions were predicted by this method.</p>
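               <p>The domain-based assignment rule can be sketched in Python as follows. The function name 'predict_by_domains' and the placeholder domain labels are assumptions for illustration; they are not real Pfam accessions, and the sketch does not reproduce the InterDom or iPfam data formats.</p>

```python
from itertools import combinations

def predict_by_domains(protein_domains, interacting_domains):
    """Pair proteins that carry at least one pair of interacting domains.

    protein_domains: dict mapping a protein id to a set of domain labels.
    interacting_domains: set of frozensets, each holding the labels of one
    interacting domain pair (a one-element frozenset is a self-interacting domain).
    """
    predicted = set()
    for p, q in combinations(sorted(protein_domains), 2):
        for d1 in protein_domains[p]:
            if any(frozenset((d1, d2)) in interacting_domains
                   for d2 in protein_domains[q]):
                predicted.add((p, q))
                break
    return predicted

# Hypothetical proteins and placeholder domain labels.
protein_domains = {"sar1": {"domA"}, "sar2": {"domB", "domC"}, "sar3": {"domD"}}
interacting_domains = {frozenset(("domA", "domB")), frozenset(("domC", "domD"))}
pairs = predict_by_domains(protein_domains, interacting_domains)
```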
            </sec>
            <sec>
               <st>
                  <p>Validating the prediction on an experimental MRSA252 PIN</p>
               </st>
               <p>The experimental MRSA252 PIN provided by the PREPARE project contained interaction data for 133 proteins and was used as the external validation set for measuring the prediction performance of the hub classifier and the different types of hypothetical PINs.</p>
               <p>We compared the prediction results in two different ways. In the first type of comparison, both the hub classifier and the combined hypothetical PINs classified the 133 MRSA252 proteins as hubs or non-hubs, while the same 133 proteins were also classified as hubs or non-hubs based on the experimental results provided by PREPARE. In the case of the hub classifier, hubs and non-hubs were reported explicitly by the prediction program. In the cases of the hypothetical and experimental PINs, hubs were defined as the proteins at or above the 90<sup>th </sup>percentile when ranked by the number of interactions (the same criterion as used for the hub classifier). The following classification statistics were calculated: sensitivity, specificity, accuracy, PPV and NPV.</p>
               <p>In the second type of comparison, we compared ranked lists of proteins based on their 'hub-likeness' property. In the case of the hub classifier, the proteins were ranked based on the differences between predicted hub probabilities and non-hub probabilities as computed by the boosting tree method. In the case of the hypothetical and experimental PINs, the proteins were ranked by their numbers of protein interactions. The ranked lists were compared to the list of proteins ranked by the number of experimental interactions in MRSA252 by using a Spearman rank order correlation as implemented in STATISTICA 8.</p>
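               <p>The rank-based comparison can be illustrated with a small self-contained Python sketch of the Spearman rank order correlation, computed as the Pearson correlation of tie-averaged ranks. The study itself used the implementation in STATISTICA 8; the function names and the five toy scores below are assumptions for illustration.</p>

```python
def average_ranks(values):
    """Rank values from 1 to n, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i != len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 != len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical scores for five proteins.
hub_score = [0.8, 0.1, -0.3, 0.5, -0.6]   # P(hub) minus P(non-hub)
experimental_degree = [12, 3, 4, 9, 1]    # observed interaction counts
rho = spearman(hub_score, experimental_degree)
```

               <p>In this toy example only two neighbouring proteins swap ranks between the two lists, which gives a correlation of 0.9.</p>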
            </sec>
            <sec>
               <st>
                  <p>Validating the prediction on an experimental <it>C. elegans </it>PIN</p>
               </st>
               <p>In addition to MRSA252, we have tested the hub protein classifier on an external set of protein interaction data in <it>C. elegans</it>. The same procedure was applied to determine hub prediction statistics, as described above.</p>
            </sec>
            <sec>
               <st>
                  <p>Test of significance</p>
               </st>
               <p>To test the hub protein classifier against the null hypothesis that the GO term distribution does not differ between hubs and non-hubs, we randomized the protein interaction data as follows. First, the same 5445 proteins in the testing set (25% of the total protein set from the four species) for the hub classifier were used in the construction of a randomized data set. Second, 10% of those proteins were randomly assigned as hubs, while the other 90% of the proteins were randomly assigned as non-hubs. Third, the GO terms originally associated with those proteins were randomly redistributed within the data set. The combination of the two randomization steps (random hub assignment and random GO term redistribution) ensured that there was no significant difference in GO term distribution between the hub and non-hub proteins. Finally, the hub classifier was used to predict hubs and non-hubs in the randomized data set, and prediction statistics were obtained.</p>
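               <p>The randomization procedure can be sketched in Python as follows. The function name 'randomize' and the toy annotations are hypothetical, and the sketch assumes that whole GO annotation sets are shuffled among proteins; redistributing individual GO terms would be an equally plausible reading of the procedure.</p>

```python
import random

def randomize(go_rows, hub_fraction=0.10, seed=0):
    """Build a randomized data set from a dict of protein id to GO term set.

    Hub labels are assigned at random, and the GO term sets are shuffled
    among the proteins, so labels and annotations become independent.
    """
    rng = random.Random(seed)
    proteins = sorted(go_rows)
    n_hubs = round(hub_fraction * len(proteins))
    hubs = set(rng.sample(proteins, n_hubs))
    shuffled_terms = [go_rows[p] for p in proteins]
    rng.shuffle(shuffled_terms)
    labels = {p: (p in hubs) for p in proteins}
    terms = dict(zip(proteins, shuffled_terms))
    return labels, terms

# Toy data: 20 hypothetical proteins annotated with placeholder term labels.
go_rows = {"p%02d" % i: {"term%d" % (i % 3)} for i in range(20)}
labels, terms = randomize(go_rows)
```

               <p>The shuffling preserves the overall frequency of each GO term in the data set while removing any association between a term and the hub label.</p>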
            </sec>
         </sec>
         <sec>
            <st>
               <p>Simulation of protein bait selections and network coverage</p>
            </st>
            <p>The effectiveness of protein bait selections assisted by the hub classifier has been simulated by using yeast protein-protein interaction data determined by protein-complex pull-down and mass spectrometry experiments, available from Gavin's study <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. One major goal of such large-scale experiments is to maximize the number of protein interactions identified by using a small set of proteins as 'baits' to pull down their interactors (preys). Therefore, it is crucial to select protein baits based on properties that will produce the best network coverage, as measured by the ratio between the number of protein interactions identified by an experiment and the total number of interactions in an organism.</p>
            <p>In our simulation experiments, 18,028 interactions involving 2551 proteins from Gavin's yeast data set (acquisition date from the IntAct database <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>: Feb. 7<sup>th</sup>, 2006) were hypothetically treated as the total set of protein interactions in <it>Saccharomyces cerevisiae</it>. To simulate the bait selection process, we selected a subset of proteins (ranging from 5% up to 100% of the 2551 yeast proteins) as baits, calculated the number of interactions such baits would "pull out" from the yeast interaction data set, and computed the overall network coverage. Two selection criteria were used. In one simulation, the baits were randomly selected from the total pool of the yeast proteins. In the other simulation, the baits were selected from the pool of hub proteins predicted by the hub classifier.</p>
            <p>In addition to the bait selection strategy described above (referred to as <it>one-round selection</it>), we simulated the network coverage results by applying a second round of selections. In this type of selection, baits were divided into two sets: one-third as the first round of baits, and two-thirds as the second round of baits. The first-round baits were chosen by either random selection or by hub prediction. The second round of baits was selected from the most abundant preys pulled down by the first round of baits. Such an approach is also referred to as the "name your friend" method and has been applied to maximize the effectiveness in vaccinations against infectious diseases <abbrgrp><abbr bid="B55">55</abbr><abbr bid="B56">56</abbr></abbrgrp>, as well as in some protein complex experiments <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>.</p>
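            <p>The one-round and two-round bait selection simulations can be sketched in Python as follows. The function names, the toy network, and the rule that an interaction counts as recovered when at least one of its partners is a bait are assumptions for illustration, not the simulation code used in this study.</p>

```python
import random

def coverage(interactions, baits):
    """Fraction of interactions recovered when at least one partner is a bait."""
    baits = set(baits)
    found = sum(1 for a, b in interactions if a in baits or b in baits)
    return found / len(interactions)

def two_round_baits(interactions, first_round, n_second):
    """Add second-round baits: the preys pulled down most often in round one."""
    first = set(first_round)
    prey_counts = {}
    for a, b in interactions:
        for bait, prey in ((a, b), (b, a)):
            if bait in first and prey not in first:
                prey_counts[prey] = prey_counts.get(prey, 0) + 1
    # Most frequent preys first; ties are broken alphabetically.
    ranked = sorted(prey_counts, key=lambda p: (-prey_counts[p], p))
    return first | set(ranked[:n_second])

# Toy network: a hub "h" with five partners plus a short chain (hypothetical data).
interactions = [("h", "a"), ("h", "b"), ("h", "c"), ("h", "d"), ("h", "e"),
                ("a", "x"), ("x", "y"), ("y", "z")]
proteins = sorted({p for pair in interactions for p in pair})
random_bait = random.Random(0).sample(proteins, 1)
cov_random = coverage(interactions, random_bait)
cov_hub = coverage(interactions, ["h"])
cov_two_round = coverage(interactions, two_round_baits(interactions, ["h"], 2))
```

            <p>In this toy network, using the hub as the single bait recovers five of the eight interactions, and adding two second-round baits chosen from its preys recovers six of the eight.</p>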
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results and Discussion</p>
         </st>
         <sec>
            <st>
               <p>Prediction performance of the hub prediction classifier</p>
            </st>
            <p>One prediction model was constructed for each of the four cross-validation samples; therefore, a total of four hub classifiers were generated. The executable files of the classifiers were compiled with the GNU C++ compiler in Linux. The classifier programs took a list of query proteins and their corresponding GO term occurrences as the input file, and produced the same list of proteins with hub prediction results and probability scores. The running time was only a few seconds for predicting hubs from over 21,000 proteins on a 3.0 GHz Pentium D personal computer.</p>
            <p>Overall, the classification statistics were consistent between the training and testing sets for the four classifiers. Within the training sets, the sensitivity of the classifiers ranged from 33.33% to 36.51%, the specificity ranged from 90.50% to 90.94%, and the accuracy ranged from 85.21% to 85.58%; PPV (positive predictive value) varied from 27.40% to 29.12%, and NPV (negative predictive value) varied from 92.86% to 93.14%. Within the testing sets, the sensitivity ranged from 25.87% to 30.89%, the specificity ranged from 89.45% to 91.09%, and the accuracy ranged from 83.75% to 85.37%; PPV varied from 21.51% to 26.71% and NPV varied from 92.04% to 92.61%. The classification statistics for the best of the four hub classifiers are shown in Table <tblr tid="T3">3</tblr>.</p>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Prediction performance of the hub classifier in the combined data set of the four species</p>
               </caption>
               <tblbdy cols="8">
                  <r>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c cspan="8" ca="left">
                        <p>Hub classifier (# of nodes in each tree = 15, FN: FP penalty = 1:1.9, total # of trees = 187)</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Training</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>observed</p>
                     </c>
                     <c ca="right">
                        <p>predicted non-hub</p>
                     </c>
                     <c ca="right">
                        <p>predicted hub</p>
                     </c>
                     <c ca="right">
                        <p>sensitivity</p>
                     </c>
                     <c ca="right">
                        <p>specificity</p>
                     </c>
                     <c ca="right">
                        <p>accuracy</p>
                     </c>
                     <c ca="right">
                        <p>PPV</p>
                     </c>
                     <c ca="right">
                        <p>NPV</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>non-hub</p>
                     </c>
                     <c ca="right">
                        <p>13381</p>
                     </c>
                     <c ca="right">
                        <p>1405</p>
                     </c>
                     <c ca="right">
                        <p>36.51%</p>
                     </c>
                     <c ca="right">
                        <p>90.50%</p>
                     </c>
                     <c ca="right">
                        <p>85.37%</p>
                     </c>
                     <c ca="right">
                        <p>28.75%</p>
                     </c>
                     <c ca="right">
                        <p>93.14%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>hub</p>
                     </c>
                     <c ca="right">
                        <p>986</p>
                     </c>
                     <c ca="right">
                        <p>567</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Testing</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>observed</p>
                     </c>
                     <c ca="right">
                        <p>predicted non-hub</p>
                     </c>
                     <c ca="right">
                        <p>predicted hub</p>
                     </c>
                     <c ca="right">
                        <p>sensitivity</p>
                     </c>
                     <c ca="right">
                        <p>specificity</p>
                     </c>
                     <c ca="right">
                        <p>accuracy</p>
                     </c>
                     <c ca="right">
                        <p>PPV</p>
                     </c>
                     <c ca="right">
                        <p>NPV</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>non-hub</p>
                     </c>
                     <c ca="right">
                        <p>4415</p>
                     </c>
                     <c ca="right">
                        <p>514</p>
                     </c>
                     <c ca="right">
                        <p>28.10%</p>
                     </c>
                     <c ca="right">
                        <p>89.57%</p>
                     </c>
                     <c ca="right">
                        <p>83.75%</p>
                     </c>
                     <c ca="right">
                        <p>22.00%</p>
                     </c>
                     <c ca="right">
                        <p>92.25%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>hub</p>
                     </c>
                     <c ca="right">
                        <p>371</p>
                     </c>
                     <c ca="right">
                        <p>145</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>All</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>observed</p>
                     </c>
                     <c ca="right">
                        <p>predicted non-hub</p>
                     </c>
                     <c ca="right">
                        <p>predicted hub</p>
                     </c>
                     <c ca="right">
                        <p>sensitivity</p>
                     </c>
                     <c ca="right">
                        <p>specificity</p>
                     </c>
                     <c ca="right">
                        <p>accuracy</p>
                     </c>
                     <c ca="right">
                        <p>PPV</p>
                     </c>
                     <c ca="right">
                        <p>NPV</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>non-hub</p>
                     </c>
                     <c ca="right">
                        <p>17796</p>
                     </c>
                     <c ca="right">
                        <p>1919</p>
                     </c>
                     <c ca="right">
                        <p>34.41%</p>
                     </c>
                     <c ca="right">
                        <p>90.27%</p>
                     </c>
                     <c ca="right">
                        <p>84.96%</p>
                     </c>
                     <c ca="right">
                        <p>27.06%</p>
                     </c>
                     <c ca="right">
                        <p>92.91%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>hub</p>
                     </c>
                     <c ca="right">
                        <p>1357</p>
                     </c>
                     <c ca="right">
                        <p>712</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>The observed vs. predicted hubs and non-hubs and their corresponding classification statistics are shown for the best classifier based on the training, testing and all (training + testing) data sets.</p>
               </tblfn>
            </tbl>
            <p>We further validated the best hub classifier on the external MRSA252 data set. As indicated in Table <tblr tid="T4">4</tblr>, the hub classifier achieved the highest sensitivity (30.77%), PPV (26.67%) and NPV (92.37%) of all the prediction methods compared, with 90.83% specificity and 84.96% accuracy. The next best hub prediction was achieved by the hypothetical MRSA PIN based on orthologous interactions, which matched the classifier's accuracy but recovered fewer true hubs (3 versus 4 of 13). In contrast, the hypothetical PINs based on pathway maps and on interacting domains performed poorly: neither produced any true positives.</p>
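            <p>The five statistics reported for each method follow directly from the four counts of its confusion matrix. The sketch below recomputes them for the hub classifier row of Table <tblr tid="T4">4</tblr>; the function name <it>classification_stats</it> and its argument names are illustrative assumptions and do not come from the original analysis.</p>

```python
def classification_stats(tn, fp, fn, tp):
    """Return the five reported statistics as fractions.

    tn, fp: observed non-hubs predicted as non-hub / hub
    fn, tp: observed hubs predicted as non-hub / hub
    """
    total = tn + fp + fn + tp
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / total,
        # Guard against methods that never predict a given class.
        "ppv": tp / (tp + fp) if (tp + fp) else 0.0,
        "npv": tn / (tn + fn) if (tn + fn) else 0.0,
    }


# Counts taken from the hub classifier block of Table 4 (MRSA252).
stats = classification_stats(tn=109, fp=11, fn=9, tp=4)
for name, value in stats.items():
    print(f"{name}: {value:.2%}")
```

            <p>The same function can be applied to the counts of each hypothetical PIN to reproduce the remaining rows of the table.</p>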
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>Hub prediction comparison of the classifier and the hypothetical PINs in MRSA252.</p>
               </caption>
               <tblbdy cols="8">
                  <r>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Hub classifier</b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>observed</p>
                     </c>
                     <c ca="right">
                        <p>predicted non-hub</p>
                     </c>
                     <c ca="right">
                        <p>predicted hub</p>
                     </c>
                     <c ca="right">
                        <p>sensitivity</p>
                     </c>
                     <c ca="right">
                        <p>specificity</p>
                     </c>
                     <c ca="right">
                        <p>accuracy</p>
                     </c>
                     <c ca="right">
                        <p>PPV</p>
                     </c>
                     <c ca="right">
                        <p>NPV</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>non-hub</p>
                     </c>
                     <c ca="right">
                        <p>109</p>
                     </c>
                     <c ca="right">
                        <p>11</p>
                     </c>
                     <c ca="right">
                        <p>30.77%</p>
                     </c>
                     <c ca="right">
                        <p>90.83%</p>
                     </c>
                     <c ca="right">
                        <p>84.96%</p>
                     </c>
                     <c ca="right">
                        <p>26.67%</p>
                     </c>
                     <c ca="right">
                        <p>92.37%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>hub</p>
                     </c>
                     <c ca="right">
                        <p>9</p>
                     </c>
                     <c ca="right">
                        <p>4</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c cspan="8" ca="left">
                        <p>
                           <b>Hypothetical PIN &#8211; pathway maps</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>observed</p>
                     </c>
                     <c ca="right">
                        <p>predicted non-hub</p>
                     </c>
                     <c ca="right">
                        <p>predicted hub</p>
                     </c>
                     <c ca="right">
                        <p>sensitivity</p>
                     </c>
                     <c ca="right">
                        <p>specificity</p>
                     </c>
                     <c ca="right">
                        <p>accuracy</p>
                     </c>
                     <c ca="right">
                        <p>PPV</p>
                     </c>
                     <c ca="right">
                        <p>NPV</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>non-hub</p>
                     </c>
                     <c ca="right">
                        <p>111</p>
                     </c>
                     <c ca="right">
                        <p>9</p>
                     </c>
                     <c ca="right">
                        <p>0.00%</p>
                     </c>
                     <c ca="right">
                        <p>92.50%</p>
                     </c>
                     <c ca="right">
                        <p>83.46%</p>
                     </c>
                     <c ca="right">
                        <p>0.00%</p>
                     </c>
                     <c ca="right">
                        <p>89.52%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>hub</p>
                     </c>
                     <c ca="right">
                        <p>13</p>
                     </c>
                     <c ca="right">
                        <p>0</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c cspan="8" ca="left">
                        <p>
                           <b>Hypothetical PIN &#8211; orthologous interactions</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>observed</p>
                     </c>
                     <c ca="right">
                        <p>predicted non-hub</p>
                     </c>
                     <c ca="right">
                        <p>predicted hub</p>
                     </c>
                     <c ca="right">
                        <p>sensitivity</p>
                     </c>
                     <c ca="right">
                        <p>specificity</p>
                     </c>
                     <c ca="right">
                        <p>accuracy</p>
                     </c>
                     <c ca="right">
                        <p>PPV</p>
                     </c>
                     <c ca="right">
                        <p>NPV</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>non-hub</p>
                     </c>
                     <c ca="right">
                        <p>110</p>
                     </c>
                     <c ca="right">
                        <p>10</p>
                     </c>
                     <c ca="right">
                        <p>23.08%</p>
                     </c>
                     <c ca="right">
                        <p>91.67%</p>
                     </c>
                     <c ca="right">
                        <p>84.96%</p>
                     </c>
                     <c ca="right">
                        <p>23.08%</p>
                     </c>
                     <c ca="right">
                        <p>91.67%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>hub</p>
                     </c>
                     <c ca="right">
                        <p>10</p>
                     </c>
                     <c ca="right">
                        <p>3</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c cspan="8" ca="left">
                        <p>
                           <b>Hypothetical PIN &#8211; interacting domains</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>observed</p>
                     </c>
                     <c ca="right">
                        <p>predicted non-hub</p>
                     </c>
                     <c ca="right">
                        <p>predicted hub</p>
                     </c>
                     <c ca="right">
                        <p>sensitivity</p>
                     </c>
                     <c ca="right">
                        <p>specificity</p>
                     </c>
                     <c ca="right">
                        <p>accuracy</p>
                     </c>
                     <c ca="right">
                        <p>PPV</p>
                     </c>
                     <c ca="right">
                        <p>NPV</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>non-hub</p>
                     </c>
                     <c ca="right">
                        <p>117</p>
                     </c>
                     <c ca="right">
                        <p>3</p>
                     </c>
                     <c ca="right">
                        <p>0.00%</p>
                     </c>
                     <c ca="right">
                        <p>97.50%</p>
                     </c>
                     <c ca="right">
                        <p>87.97%</p>
                     </c>
                     <c ca="right">
                        <p>0.00%</p>
                     </c>
                     <c ca="right">
                        <p>90.00%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>hub</p>
                     </c>
                     <c ca="right">
                        <p>13</p>
                     </c>
                     <c ca="right">
                        <p>0</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c cspan="8" ca="left">
                        <p>
                           <b>Combined hypothetical PIN &#8211; (pathway maps + orthologous interactions)</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>observed</p>
                     </c>
                     <c ca="right">
                        <p>predicted non-hub</p>
                     </c>
                     <c ca="right">
                        <p>predicted hub</p>
                     </c>
                     <c ca="right">
                        <p>sensitivity</p>
                     </c>
                     <c ca="right">
                        <p>specificity</p>
                     </c>
                     <c ca="right">
                        <p>accuracy</p>
                     </c>
                     <c ca="right">
                        <p>PPV</p>
                     </c>
                     <c ca="right">
                        <p>NPV</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>non-hub</p>
                     </c>
                     <c ca="right">
                        <p>110</p>
                     </c>
                     <c ca="right">
                        <p>10</p>
                     </c>
                     <c ca="right">
                        <p>23.08%</p>
                     </c>
                     <c ca="right">
                        <p>91.67%</p>
                     </c>
                     <c ca="right">
                        <p>84.96%</p>
                     </c>
                     <c ca="right">
                        <p>23.08%</p>
                     </c>
                     <c ca="right">
                        <p>91.67%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>hub</p>
                     </c>
                     <c ca="right">
                        <p>10</p>
                     </c>
                     <c ca="right">
                        <p>3</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c cspan="8" ca="left">
                        <p>
                           <b>Combined hypothetical PIN &#8211; (pathway maps + orthologous interactions + interacting domains)</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>observed</p>
                     </c>
                     <c ca="right">
                        <p>predicted non-hub</p>
                     </c>
                     <c ca="right">
                        <p>predicted hub</p>
                     </c>
                     <c ca="right">
                        <p>sensitivity</p>
                     </c>
                     <c ca="right">
                        <p>specificity</p>
                     </c>
                     <c ca="right">
                        <p>accuracy</p>
                     </c>
                     <c ca="right">
                        <p>PPV</p>
                     </c>
                     <c ca="right">
                        <p>NPV</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>non-hub</p>
                     </c>
                     <c ca="right">
                        <p>108</p>
                     </c>
                     <c ca="right">
                        <p>12</p>
                     </c>
                     <c ca="right">
                        <p>7.69%</p>
                     </c>
                     <c ca="right">
                        <p>90.00%</p>
                     </c>
                     <c ca="right">
                        <p>81.95%</p>
                     </c>
                     <c ca="right">
                        <p>7.69%</p>
                     </c>
                     <c ca="right">
                        <p>90.00%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>hub</p>
                     </c>
                     <c ca="right">
                        <p>12</p>
                     </c>
                     <c ca="right">
                        <p>1</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>The prediction performance of the hub classifier is compared to that of the hypothetical PINs in MRSA252. The classification statistics are reported.</p>
               </tblfn>
            </tbl>
            <p>In a second comparison, we correlated the list of proteins ranked by 'hub-likeness' (determined from either the hub classifier or the hypothetical PINs) with the corresponding ranking from the experimental MRSA PIN. As shown in Table <tblr tid="T5">5</tblr>, the hub classifier had a correlation coefficient of 0.32, the highest among all methods. The next best correlation (0.27) was achieved by the hypothetical PIN of orthologous interactions.</p>
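            <p>The ranked-list comparison uses the Spearman rank order correlation, which is the Pearson correlation computed on ranks. The following is a minimal sketch of that calculation, assuming tied values receive the mean of their ranks; the helper names and the example scores and degrees are hypothetical and are not data from this study.</p>

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start != len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 != len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical example: predicted hub-likeness scores versus the number of
# interactions observed for the same five proteins.
predicted_scores = [0.9, 0.2, 0.5, 0.7, 0.1]
observed_degrees = [12, 3, 1, 8, 2]
rho = spearman(predicted_scores, observed_degrees)
print(f"Spearman correlation: {rho:.2f}")
```
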
            <tbl id="T5">
               <title>
                  <p>Table 5</p>
               </title>
               <caption>
                  <p>Comparing ranked lists of hub-likeness properties between the classifier and the hypothetical PINs in MRSA252.</p>
               </caption>
               <tblbdy cols="2">
                  <r>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Hub prediction methods</p>
                     </c>
                     <c ca="right">
                        <p>correlation coefficient</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Hub classifier</p>
                     </c>
                     <c ca="right">
                        <p>0.320523</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Hypothetical PIN &#8211; pathway maps</p>
                     </c>
                     <c ca="right">
                        <p>0.108682</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Hypothetical PIN &#8211; orthologous interactions</p>
                     </c>
                     <c ca="right">
                        <p>0.27396</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Hypothetical PIN &#8211; interacting domains</p>
                     </c>
                     <c ca="right">
                        <p>-0.291846</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Combined hypothetical PIN &#8211; (pathway maps + orthologous interactions)</p>
                     </c>
                     <c ca="right">
                        <p>0.23882</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Combined hypothetical PIN &#8211; (pathway maps + orthologous interactions + interacting domains)</p>
                     </c>
                     <c ca="right">
                        <p>-0.011494</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>The ranked protein lists based on hub-likeness properties, produced by either the classifier or the hypothetical PINs, have been compared to that of the experimental PIN in MRSA252. The coefficient of Spearman rank order correlation is reported with p-value &lt; 0.05.</p>
               </tblfn>
            </tbl>
            <p>In addition to MRSA252, the hub protein classifier achieved comparable prediction results on the <it>C. elegans </it>validation data set, with 32.97% sensitivity, 86.84% specificity, 81.70% accuracy, 20.92% PPV and 92.46% NPV, as shown in Table <tblr tid="T6">6</tblr>.</p>
            <tbl id="T6">
               <title>
                  <p>Table 6</p>
               </title>
               <caption>
                  <p>Hub prediction result in <it>C. elegans</it>.</p>
               </caption>
               <tblbdy cols="8">
                  <r>
                     <c ca="left">
                        <p>observed</p>
                     </c>
                     <c ca="right">
                        <p>predicted non-hub</p>
                     </c>
                     <c ca="right">
                        <p>predicted hub</p>
                     </c>
                     <c ca="right">
                        <p>sensitivity</p>
                     </c>
                     <c ca="right">
                        <p>specificity</p>
                     </c>
                     <c ca="right">
                        <p>accuracy</p>
                     </c>
                     <c ca="right">
                        <p>PPV</p>
                     </c>
                     <c ca="right">
                        <p>NPV</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>non-hub</p>
                     </c>
                     <c ca="right">
                        <p>2270</p>
                     </c>
                     <c ca="right">
                        <p>344</p>
                     </c>
                     <c ca="right">
                        <p>32.97%</p>
                     </c>
                     <c ca="right">
                        <p>86.84%</p>
                     </c>
                     <c ca="right">
                        <p>81.70%</p>
                     </c>
                     <c ca="right">
                        <p>20.92%</p>
                     </c>
                     <c ca="right">
                        <p>92.46%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>hub</p>
                     </c>
                     <c ca="right">
                        <p>185</p>
                     </c>
                     <c ca="right">
                        <p>91</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>The prediction performance of the hub classifier was validated, based on the experimental PIN in <it>C. elegans</it>.</p>
               </tblfn>
            </tbl>
            <p>The prediction statistics of the hub classifier on the randomized data set are summarized in Table <tblr tid="T7">7</tblr>. The hub classifier was unable to achieve a significant hub prediction when the GO terms and protein hubs were randomly assigned: it reached only 11.43% sensitivity and 8.39% PPV in the randomized set, compared to 28.10% sensitivity and 22.00% PPV in the testing set before randomization. The specificity and NPV were comparable before and after randomization because of the inherent 1:9 ratio between hubs and non-hubs, which makes it easier to predict non-hub proteins correctly than hub proteins. The comparison between the testing set and the randomized set indicates that hub proteins have a distinct distribution of GO terms, which underlies the predictive power of the hub classifier.</p>
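            <p>The effect of the 1:9 class ratio can be illustrated with a small calculation. The sketch below is not part of the original analysis: it assumes a predictor whose calls are independent of the true labels, a hub prevalence of 10% (from the 1:9 ratio), and a hypothetical rate of 13% at which proteins are called hubs. The function name and the data set size are likewise illustrative.</p>

```python
def expected_stats_under_independence(n, prevalence, call_rate):
    """Expected statistics when predictions are independent of true labels.

    n          : number of proteins (cancels out; kept for readability)
    prevalence : fraction of proteins that are true hubs
    call_rate  : fraction of proteins the predictor labels as hubs
    """
    tp = n * prevalence * call_rate
    fn = n * prevalence * (1 - call_rate)
    fp = n * (1 - prevalence) * call_rate
    tn = n * (1 - prevalence) * (1 - call_rate)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Assumed values: 10% hubs (1:9 ratio) and a hypothetical 13% call rate.
chance = expected_stats_under_independence(n=1000, prevalence=0.10, call_rate=0.13)
for name, value in chance.items():
    print(f"{name}: {value:.2%}")
```

            <p>Under these assumptions the expected specificity and NPV remain high (the complements of the call rate and of the hub prevalence), while the expected sensitivity and PPV fall to the call rate and the prevalence. This is broadly the pattern reported for the randomized set.</p>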
            <tbl id="T7">
               <title>
                  <p>Table 7</p>
               </title>
               <caption>
                  <p>Hub prediction results in the randomized data set.</p>
               </caption>
               <tblbdy cols="8">
                  <r>
                     <c ca="left">
                        <p>observed</p>
                     </c>
                     <c ca="right">
                        <p>predicted non-hub</p>
                     </c>
                     <c ca="right">
                        <p>predicted hub</p>
                     </c>
                     <c ca="right">
                        <p>sensitivity</p>
                     </c>
                     <c ca="right">
                        <p>specificity</p>
                     </c>
                     <c ca="right">
                        <p>accuracy</p>
                     </c>
                     <c ca="right">
                        <p>PPV</p>
                     </c>
                     <c ca="right">
                        <p>NPV</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>non-hub</p>
                     </c>
                     <c ca="right">
                        <p>4285</p>
                     </c>
                     <c ca="right">
                        <p>644</p>
                     </c>
                     <c ca="right">
                        <p>11.43%</p>
                     </c>
                     <c ca="right">
                        <p>86.93%</p>
                     </c>
                     <c ca="right">
                        <p>79.78%</p>
                     </c>
                     <c ca="right">
                        <p>8.39%</p>
                     </c>
                     <c ca="right">
                        <p>90.36%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>hub</p>
                     </c>
                     <c ca="right">
                        <p>457</p>
                     </c>
                     <c ca="right">
                        <p>59</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>The prediction performance of the hub classifier was tested under the null hypothesis that the distribution of GO terms does not differ between hubs and non-hubs.</p>
               </tblfn>
            </tbl>
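            <p>The five statistics in Table <tblr tid="T7">7</tblr> follow directly from the four counts in its confusion matrix. The minimal sketch below recomputes them, treating hubs as the positive class; the function name confusion_metrics is illustrative and is not part of the published hub classifier.</p>

```python
def confusion_metrics(tn, fp, fn, tp):
    """Return the summary statistics as fractions, with hubs as the positive class."""
    total = tn + fp + fn + tp
    return {
        "sensitivity": tp / (tp + fn),   # observed hubs predicted as hubs
        "specificity": tn / (tn + fp),   # observed non-hubs predicted as non-hubs
        "accuracy": (tp + tn) / total,   # all correct predictions
        "ppv": tp / (tp + fp),           # predicted hubs that are observed hubs
        "npv": tn / (tn + fn),           # predicted non-hubs that are observed non-hubs
    }


# Counts taken from Table 7 (randomized data set)
table7 = confusion_metrics(tn=4285, fp=644, fn=457, tp=59)

for name, value in table7.items():
    print(f"{name}: {value:.2%}")
```

            <p>With the counts from Table <tblr tid="T7">7</tblr>, the sketch gives 11.43% sensitivity, 86.93% specificity, 79.78% accuracy, 8.39% PPV and 90.36% NPV, in agreement with the reported values.</p>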
            <p>Overall, the hub classifier built on Gene Ontology annotations achieved high specificity and NPV, but its sensitivity and PPV were lower than expected. We attribute this to missing GO annotations for some proteins in the training sets, since annotation coverage varied among the four species. For instance, <it>S. cerevisiae </it>had the highest percentage of proteins with GO annotations (87.8%), whereas only 48.2% of the proteins in <it>E. coli </it>had any GO annotation. The performance of the current hub classifier therefore depends largely on the number of GO annotations available for each species. We expect the sensitivity of the hub classifier to improve as more annotation data become available for the four species in the training sets.</p>
         </sec>
         <sec>
            <st>
               <p>GO term predictor importance</p>
            </st>
            <p>The <it>relative importance of predictors </it>reported in the training output indicates how much each GO term contributed to the boosted trees classifiers. The importance value ranges from 0 to 100, where 100 marks the predictor with the most influence on the hub prediction outcome and 0 the predictor with the least. The 20 most important GO annotation terms, which are likely to be shared among hub proteins, are listed in Table <tblr tid="T8">8</tblr>.</p>
            <tbl id="T8">
               <title>
                  <p>Table 8</p>
               </title>
               <caption>
                  <p>Top 20 important GO term predictors.</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="left">
                        <p>GO ID</p>
                     </c>
                     <c ca="left">
                        <p>GO name</p>
                     </c>
                     <c ca="left">
                        <p>GO Type</p>
                     </c>
                     <c ca="right">
                        <p>predictor importance</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0005730</p>
                     </c>
                     <c ca="left">
                        <p>nucleolus</p>
                     </c>
                     <c ca="left">
                        <p>cellular component</p>
                     </c>
                     <c ca="right">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0003723</p>
                     </c>
                     <c ca="left">
                        <p>RNA binding</p>
                     </c>
                     <c ca="left">
                        <p>molecular function</p>
                     </c>
                     <c ca="right">
                        <p>97</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0005515</p>
                     </c>
                     <c ca="left">
                        <p>protein binding</p>
                     </c>
                     <c ca="left">
                        <p>molecular function</p>
                     </c>
                     <c ca="right">
                        <p>96</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0006412</p>
                     </c>
                     <c ca="left">
                        <p>translation</p>
                     </c>
                     <c ca="left">
                        <p>biological process</p>
                     </c>
                     <c ca="right">
                        <p>95</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0006139</p>
                     </c>
                     <c ca="left">
                        <p>nucleobase, nucleoside, nucleotide and nucleic acid metabolic process</p>
                     </c>
                     <c ca="left">
                        <p>biological process</p>
                     </c>
                     <c ca="right">
                        <p>90</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0006996</p>
                     </c>
                     <c ca="left">
                        <p>organelle organization and biogenesis</p>
                     </c>
                     <c ca="left">
                        <p>biological process</p>
                     </c>
                     <c ca="right">
                        <p>89</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0030246</p>
                     </c>
                     <c ca="left">
                        <p>carbohydrate binding</p>
                     </c>
                     <c ca="left">
                        <p>molecular function</p>
                     </c>
                     <c ca="right">
                        <p>87</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0005840</p>
                     </c>
                     <c ca="left">
                        <p>ribosome</p>
                     </c>
                     <c ca="left">
                        <p>cellular component</p>
                     </c>
                     <c ca="right">
                        <p>86</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0005777</p>
                     </c>
                     <c ca="left">
                        <p>peroxisome</p>
                     </c>
                     <c ca="left">
                        <p>cellular component</p>
                     </c>
                     <c ca="right">
                        <p>85</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0009719</p>
                     </c>
                     <c ca="left">
                        <p>response to endogenous stimulus</p>
                     </c>
                     <c ca="left">
                        <p>biological process</p>
                     </c>
                     <c ca="right">
                        <p>82</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0007049</p>
                     </c>
                     <c ca="left">
                        <p>cell cycle</p>
                     </c>
                     <c ca="left">
                        <p>biological process</p>
                     </c>
                     <c ca="right">
                        <p>81</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0004871</p>
                     </c>
                     <c ca="left">
                        <p>signal transducer activity</p>
                     </c>
                     <c ca="left">
                        <p>molecular function</p>
                     </c>
                     <c ca="right">
                        <p>77</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0005654</p>
                     </c>
                     <c ca="left">
                        <p>nucleoplasm</p>
                     </c>
                     <c ca="left">
                        <p>cellular component</p>
                     </c>
                     <c ca="right">
                        <p>77</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0008219</p>
                     </c>
                     <c ca="left">
                        <p>cell death</p>
                     </c>
                     <c ca="left">
                        <p>biological process</p>
                     </c>
                     <c ca="right">
                        <p>75</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0006118</p>
                     </c>
                     <c ca="left">
                        <p>electron transport</p>
                     </c>
                     <c ca="left">
                        <p>biological process</p>
                     </c>
                     <c ca="right">
                        <p>73</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0006259</p>
                     </c>
                     <c ca="left">
                        <p>DNA metabolic process</p>
                     </c>
                     <c ca="left">
                        <p>biological process</p>
                     </c>
                     <c ca="right">
                        <p>73</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0050789</p>
                     </c>
                     <c ca="left">
                        <p>regulation of biological process</p>
                     </c>
                     <c ca="left">
                        <p>biological process</p>
                     </c>
                     <c ca="right">
                        <p>73</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0006950</p>
                     </c>
                     <c ca="left">
                        <p>response to stress</p>
                     </c>
                     <c ca="left">
                        <p>biological process</p>
                     </c>
                     <c ca="right">
                        <p>72</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0005811</p>
                     </c>
                     <c ca="left">
                        <p>lipid particle</p>
                     </c>
                     <c ca="left">
                        <p>cellular component</p>
                     </c>
                     <c ca="right">
                        <p>71</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GO:0008135</p>
                     </c>
                     <c ca="left">
                        <p>translation factor activity, nucleic acid binding</p>
                     </c>
                     <c ca="left">
                        <p>molecular function</p>
                     </c>
                     <c ca="right">
                        <p>70</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
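            <p>The 0 to 100 scale of predictor importance can be illustrated with a minimal sketch. It assumes the common boosted-trees convention in which the most influential predictor is assigned 100 and all others are scaled in proportion; whether the software used in this study scales its output in exactly this way is an assumption. The function name scale_importance, the term names and the raw scores are hypothetical and are not taken from the training output.</p>

```python
def scale_importance(raw):
    """Rescale raw importance scores so that the largest becomes 100."""
    top = max(raw.values())
    return {term: round(100 * score / top) for term, score in raw.items()}


# Hypothetical raw scores, chosen only to illustrate the scaling
raw_scores = {"term_a": 40, "term_b": 30, "term_c": 10}
scaled = scale_importance(raw_scores)

# Print the predictors from most to least important
for term, value in sorted(scaled.items(), key=lambda item: -item[1]):
    print(term, value)
```

            <p>Under this convention the values express rank and relative weight among the predictors only; they are not probabilities and do not by themselves show whether a term is enriched or depleted among hubs.</p>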
            <p>The top GO terms included several annotations, such as 'RNA binding', 'translation', and 'ribosome', that are commonly used to annotate ribosomal proteins; these proteins were often identified as the top interacting proteins in other experiments <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B8">8</abbr></abbrgrp>. The list of important predictors indicates that hub proteins tend to participate in several common cellular processes, including translation, nucleotide metabolism, organelle biogenesis, cell cycle, signal transduction, cell death, and electron transport.</p>
         </sec>
         <sec>
            <st>
               <p>Applying hub classifier to protein bait selection</p>
            </st>
            <p>The bait selection strategy, assisted by the hub classifier, was simulated in the experimental PIN of <it>Saccharomyces cerevisiae</it>. In the one-round selection, choosing baits that the classifier predicted as hubs greatly increased the network coverage in comparison to random selection. For instance, as illustrated in Figure <figr fid="F4">4</figr>, when 15% of the total proteins were selected as baits based on the result of the hub classifier, a network coverage of 42.39% was achieved. In contrast, random bait selection produced a network coverage of only 26.53%.</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Network coverage of different bait selection strategies in protein complex pull-down experiments, simulated in <it>Saccharomyces cerevisiae</it></p>
               </caption>
               <text>
                  <p>Network coverage of different bait selection strategies in protein complex pull-down experiments, simulated in <it>Saccharomyces cerevisiae</it>.</p>
               </text>
               <graphic file="1752-0509-2-80-4"/>
            </fig>
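            <p>The one-round comparison can be sketched on a toy network. The sketch assumes that a bait recovers every interaction in which it takes part and that network coverage is the fraction of all interactions recovered; both definitions are assumptions made for illustration. The function name network_coverage, the protein labels and the edge list are hypothetical, and the bait list stands in for the proteins that a classifier would predict as hubs.</p>

```python
import random


def network_coverage(edges, baits):
    """Fraction of interactions that involve at least one bait protein."""
    baits = set(baits)
    covered = sum(1 for a, b in edges if a in baits or b in baits)
    return covered / len(edges)


# Toy interaction network (hypothetical): P1 is a hub with four partners
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P1", "P5"),
         ("P5", "P6"), ("P7", "P8")]
proteins = sorted({p for edge in edges for p in edge})

# Stand-in for the baits that a hub classifier would propose
hub_baits = ["P1"]

# Random selection of the same number of baits, for comparison
random.seed(0)
random_baits = random.sample(proteins, k=len(hub_baits))

print("hub-guided coverage:", network_coverage(edges, hub_baits))
print("random coverage:", network_coverage(edges, random_baits))
```

            <p>In this toy network the single hub bait recovers four of the six interactions, whereas no other single bait recovers more than two. A realistic simulation would repeat the random selection many times and average the resulting coverage.</p>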
            <p>In the two-round selection, the network coverage produced by both random and hub bait selection improved greatly over the one-round selection. The hub bait selection performed only slightly better than random selection in the two-round case.</p>
            <p>The results suggest that the hub classifier is a useful tool for selecting baits and prioritizing proteins for protein interaction experiments. Although it was not explored in the present study, we expect that the hub classifier can also assist in the identification of highly-interacting proteins in pathogens as potential drug targets.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>We have studied the available interaction and Gene Ontology data for proteins in the <it>Escherichia coli</it>, <it>Saccharomyces cerevisiae</it>, <it>Drosophila melanogaster </it>and <it>Homo sapiens </it>genomes. Using the boosting trees classification method, we have shown that highly-connected proteins in the studied PINs share certain common GO terms; this observation enabled the development of a hub classifier capable of distinguishing hub proteins from non-hubs. The classifier predicts hubs more accurately than traditional approaches to protein interaction prediction. We anticipate that the hub classifier can serve as a useful tool for identifying highly-interacting proteins in species for which no protein interaction data are available, with potential applications in optimizing protein pull-down experiments and identifying new drug targets against pathogens.</p>
      </sec>
      <sec>
         <st>
            <p>Availability</p>
         </st>
         <p>The source code and executable program of the hub classifier are freely available for download at: <url>http://www.cnbi2.com/hub/</url></p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>MH acquired and analyzed protein interaction and Gene Ontology data, designed and developed the hub classifiers, built the hypothetical PINs, simulated the protein bait selection experiments, and drafted and revised the manuscript. KGB analyzed the statistical models and tools of boosting trees, and revised the manuscript. AC conceived and designed the study, and revised the manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>MH was supported by the Michael Smith Foundation for Health Research (MSFHR) and the Natural Sciences and Engineering Research Council (NSERC). KGB and AC were funded by Genome Canada and Genome BC through the PRoteomics for Emerging PAthogen REsponse (PREPARE) project.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Network biology: understanding the cell's functional organization</p>
            </title>
            <aug>
               <au>
                  <snm>Barabasi</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Oltvai</snm>
                  <fnm>ZN</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <issue>2</issue>
            <fpage>101</fpage>
            <lpage>113</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrg1272</pubid>
                  <pubid idtype="pmpid" link="fulltext">14735121</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Scale-free networks in cell biology</p>
            </title>
            <aug>
               <au>
                  <snm>Albert</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>J Cell Sci</source>
            <pubdate>2005</pubdate>
            <volume>118</volume>
            <issue>Pt 21</issue>
            <fpage>4947</fpage>
            <lpage>4957</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1242/jcs.02714</pubid>
                  <pubid idtype="pmpid" link="fulltext">16254242</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae</p>
            </title>
            <aug>
               <au>
                  <snm>Uetz</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Giot</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Cagney</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Mansfield</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>Judson</snm>
                  <fnm>RS</fnm>
               </au>
               <au>
                  <snm>Knight</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Lockshon</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Narayan</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Srinivasan</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Pochart</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Qureshi-Emili</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Godwin</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Conover</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Kalbfleisch</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Vijayadamodar</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Johnston</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Fields</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Rothberg</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2000</pubdate>
            <volume>403</volume>
            <issue>6770</issue>
            <fpage>623</fpage>
            <lpage>627</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35001009</pubid>
                  <pubid idtype="pmpid" link="fulltext">10688190</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>A comprehensive two-hybrid analysis to explore the yeast protein interactome</p>
            </title>
            <aug>
               <au>
                  <snm>Ito</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Chiba</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Ozawa</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Yoshida</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hattori</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sakaki</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <issue>8</issue>
            <fpage>4569</fpage>
            <lpage>4574</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">31875</pubid>
                  <pubid idtype="pmpid" link="fulltext">11283351</pubid>
                  <pubid idtype="doi">10.1073/pnas.061034498</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry</p>
            </title>
            <aug>
               <au>
                  <snm>Ho</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Gruhler</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Heilbut</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bader</snm>
                  <fnm>GD</fnm>
               </au>
               <au>
                  <snm>Moore</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Adams</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Millar</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Bennett</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Boutilier</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Wolting</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Donaldson</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Schandorff</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Shewnarane</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Vo</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Taggart</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Goudreault</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Muskat</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Alfarano</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Dewar</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Michalickova</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Willems</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Sassi</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Rasmussen</snm>
                  <fnm>KJ</fnm>
               </au>
               <au>
                  <snm>Andersen</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Johansen</snm>
                  <fnm>LE</fnm>
               </au>
               <au>
                  <snm>Hansen</snm>
                  <fnm>LH</fnm>
               </au>
               <au>
                  <snm>Jespersen</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Podtelejnikov</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Crawford</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Poulsen</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Sorensen</snm>
                  <fnm>BD</fnm>
               </au>
               <au>
                  <snm>Matthiesen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hendrickson</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>Gleeson</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Pawson</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Moran</snm>
                  <fnm>MF</fnm>
               </au>
               <au>
                  <snm>Durocher</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Mann</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hogue</snm>
                  <fnm>CW</fnm>
               </au>
               <au>
                  <snm>Figeys</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Tyers</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>415</volume>
            <issue>6868</issue>
            <fpage>180</fpage>
            <lpage>183</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/415180a</pubid>
                  <pubid idtype="pmpid" link="fulltext">11805837</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Proteome survey reveals modularity of the yeast cell machinery</p>
            </title>
            <aug>
               <au>
                  <snm>Gavin</snm>
                  <fnm>AC</fnm>
               </au>
               <au>
                  <snm>Aloy</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Grandi</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Krause</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Boesche</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Marzioch</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rau</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Jensen</snm>
                  <fnm>LJ</fnm>
               </au>
               <au>
                  <snm>Bastuck</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Dumpelfeld</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Edelmann</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Heurtier</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Hoffman</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Hoefert</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Klein</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Hudak</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Michon</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Schelder</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Schirle</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Remor</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rudi</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Hooper</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bauer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bouwmeester</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Casari</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Drewes</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Neubauer</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Rick</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Kuster</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Russell</snm>
                  <fnm>RB</fnm>
               </au>
               <au>
                  <snm>Superti-Furga</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2006</pubdate>
            <volume>440</volume>
            <issue>7084</issue>
            <fpage>631</fpage>
            <lpage>636</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature04532</pubid>
                  <pubid idtype="pmpid" link="fulltext">16429126</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Global landscape of protein complexes in the yeast Saccharomyces cerevisiae</p>
            </title>
            <aug>
               <au>
                  <snm>Krogan</snm>
                  <fnm>NJ</fnm>
               </au>
               <au>
                  <snm>Cagney</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Zhong</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Guo</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Ignatchenko</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Pu</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Datta</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Tikuisis</snm>
                  <fnm>AP</fnm>
               </au>
               <au>
                  <snm>Punna</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Peregrin-Alvarez</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Shales</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Davey</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Robinson</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Paccanaro</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bray</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Sheung</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Beattie</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Richards</snm>
                  <fnm>DP</fnm>
               </au>
               <au>
                  <snm>Canadien</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Lalev</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Mena</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Starostine</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Canete</snm>
                  <fnm>MM</fnm>
               </au>
               <au>
                  <snm>Vlasblom</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Orsi</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Collins</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Chandran</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Haw</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Rilstone</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Gandi</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Thompson</snm>
                  <fnm>NJ</fnm>
               </au>
               <au>
                  <snm>Musso</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>St Onge</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Ghanny</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lam</snm>
                  <fnm>MH</fnm>
               </au>
               <au>
                  <snm>Butland</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Altaf-Ul</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Kanaya</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Shilatifard</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>O'Shea</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Weissman</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Ingles</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Hughes</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Parkinson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Gerstein</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Wodak</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Emili</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Greenblatt</snm>
                  <fnm>JF</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2006</pubdate>
            <volume>440</volume>
            <issue>7084</issue>
            <fpage>637</fpage>
            <lpage>643</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature04670</pubid>
                  <pubid idtype="pmpid" link="fulltext">16554755</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Interaction network containing conserved and essential protein complexes in Escherichia coli</p>
            </title>
            <aug>
               <au>
                  <snm>Butland</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Peregrin-Alvarez</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Canadien</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Starostine</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Richards</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Beattie</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Krogan</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Davey</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Parkinson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Greenblatt</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Emili</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2005</pubdate>
            <volume>433</volume>
            <issue>7025</issue>
            <fpage>531</fpage>
            <lpage>537</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature03239</pubid>
                  <pubid idtype="pmpid" link="fulltext">15690043</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>A protein interaction map of Drosophila melanogaster</p>
            </title>
            <aug>
               <au>
                  <snm>Giot</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Bader</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Brouwer</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Chaudhuri</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kuang</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Hao</snm>
                  <fnm>YL</fnm>
               </au>
               <au>
                  <snm>Ooi</snm>
                  <fnm>CE</fnm>
               </au>
               <au>
                  <snm>Godwin</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Vitols</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Vijayadamodar</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Pochart</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Machineni</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Welsh</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kong</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Zerhusen</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Malcolm</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Varrone</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Collis</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Minto</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Burgess</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>McDaniel</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Stimpson</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Spriggs</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Williams</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Neurath</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ioime</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Agee</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Voss</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Furtak</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Renzulli</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Aanensen</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Carrolla</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bickelhaupt</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Lazovatsky</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>DaSilva</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Zhong</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Stanyon</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Finley</snm>
                  <fnm>RL</fnm>
                  <suf>Jr</suf>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>KP</fnm>
               </au>
               <au>
                  <snm>Braverman</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Jarvie</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Gold</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Leach</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Knight</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Shimkets</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>McKenna</snm>
                  <fnm>MP</fnm>
               </au>
               <au>
                  <snm>Chant</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Rothberg</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2003</pubdate>
            <volume>302</volume>
            <issue>5651</issue>
            <fpage>1727</fpage>
            <lpage>1736</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1090289</pubid>
                  <pubid idtype="pmpid" link="fulltext">14605208</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>A map of the interactome network of the metazoan C. elegans</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Armstrong</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Bertin</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Ge</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Milstein</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Boxem</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Vidalain</snm>
                  <fnm>PO</fnm>
               </au>
               <au>
                  <snm>Han</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Chesneau</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hao</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Goldberg</snm>
                  <fnm>DS</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Martinez</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rual</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Lamesch</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Xu</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Tewari</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>LV</fnm>
               </au>
               <au>
                  <snm>Berriz</snm>
                  <fnm>GF</fnm>
               </au>
               <au>
                  <snm>Jacotot</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Vaglio</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Reboul</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hirozane-Kishikawa</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Gabel</snm>
                  <fnm>HW</fnm>
               </au>
               <au>
                  <snm>Elewa</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Baumgartner</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Rose</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Bosak</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sequerra</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Fraser</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Mango</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Saxton</snm>
                  <fnm>WM</fnm>
               </au>
               <au>
                  <snm>Strome</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>van den Heuvel</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Piano</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Vandenhaute</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Sardet</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Gerstein</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Doucette-Stamm</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Gunsalus</snm>
                  <fnm>KC</fnm>
               </au>
               <au>
                  <snm>Harper</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Cusick</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Roth</snm>
                  <fnm>FP</fnm>
               </au>
               <au>
                  <snm>Hill</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Vidal</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2004</pubdate>
            <volume>303</volume>
            <issue>5657</issue>
            <fpage>540</fpage>
            <lpage>543</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1698949</pubid>
                  <pubid idtype="pmpid" link="fulltext">14704431</pubid>
                  <pubid idtype="doi">10.1126/science.1091403</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Towards a proteome-scale map of the human protein-protein interaction network</p>
            </title>
            <aug>
               <au>
                  <snm>Rual</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Venkatesan</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Hao</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Hirozane-Kishikawa</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Dricot</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Berriz</snm>
                  <fnm>GF</fnm>
               </au>
               <au>
                  <snm>Gibbons</snm>
                  <fnm>FD</fnm>
               </au>
               <au>
                  <snm>Dreze</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ayivi-Guedehoussou</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Klitgord</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Simon</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Boxem</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Milstein</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Rosenberg</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Goldberg</snm>
                  <fnm>DS</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>LV</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Franklin</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Albala</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Lim</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Fraughton</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Llamosas</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Cevik</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bex</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Lamesch</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Sikorski</snm>
                  <fnm>RS</fnm>
               </au>
               <au>
                  <snm>Vandenhaute</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zoghbi</snm>
                  <fnm>HY</fnm>
               </au>
               <au>
                  <snm>Smolyar</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bosak</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sequerra</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Doucette-Stamm</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Cusick</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Hill</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Roth</snm>
                  <fnm>FP</fnm>
               </au>
               <au>
                  <snm>Vidal</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2005</pubdate>
            <volume>437</volume>
            <issue>7062</issue>
            <fpage>1173</fpage>
            <lpage>1178</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature04209</pubid>
                  <pubid idtype="pmpid" link="fulltext">16189514</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>A human protein-protein interaction network: a resource for annotating the proteome</p>
            </title>
            <aug>
               <au>
                  <snm>Stelzl</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Worm</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Lalowski</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Haenig</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Brembeck</snm>
                  <fnm>FH</fnm>
               </au>
               <au>
                  <snm>Goehler</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Stroedicke</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Zenkner</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Schoenherr</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Koeppen</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Timm</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Mintzlaff</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Abraham</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Bock</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Kietzmann</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Goedde</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Toksoz</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Droege</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Krobitsch</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Korn</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Birchmeier</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Lehrach</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Wanker</snm>
                  <fnm>EE</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2005</pubdate>
            <volume>122</volume>
            <issue>6</issue>
            <fpage>957</fpage>
            <lpage>968</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.cell.2005.08.029</pubid>
                  <pubid idtype="pmpid" link="fulltext">16169070</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>IntAct: an open source molecular interaction database</p>
            </title>
            <aug>
               <au>
                  <snm>Hermjakob</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Montecchi-Palazzi</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Lewington</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Mudali</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kerrien</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Orchard</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Vingron</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Roechert</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Roepstorff</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Valencia</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Margalit</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Armstrong</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Bairoch</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Cesareni</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Sherman</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <issue>Database issue</issue>
            <fpage>D452</fpage>
            <lpage>455</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308786</pubid>
                  <pubid idtype="pmpid" link="fulltext">14681455</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh052</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>The Database of Interacting Proteins: 2004 update</p>
            </title>
            <aug>
               <au>
                  <snm>Salwinski</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>CS</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Pettit</snm>
                  <fnm>FK</fnm>
               </au>
               <au>
                  <snm>Bowie</snm>
                  <fnm>JU</fnm>
               </au>
               <au>
                  <snm>Eisenberg</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <issue>Database issue</issue>
            <fpage>D449</fpage>
            <lpage>451</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308820</pubid>
                  <pubid idtype="pmpid" link="fulltext">14681454</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh086</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Error and attack tolerance of complex networks</p>
            </title>
            <aug>
               <au>
                  <snm>Albert</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Jeong</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Barabasi</snm>
                  <fnm>AL</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2000</pubdate>
            <volume>406</volume>
            <issue>6794</issue>
            <fpage>378</fpage>
            <lpage>382</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35019019</pubid>
                  <pubid idtype="pmpid" link="fulltext">10935628</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Lethality and centrality in protein networks</p>
            </title>
            <aug>
               <au>
                  <snm>Jeong</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Mason</snm>
                  <fnm>SP</fnm>
               </au>
               <au>
                  <snm>Barabasi</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Oltvai</snm>
                  <fnm>ZN</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2001</pubdate>
            <volume>411</volume>
            <issue>6833</issue>
            <fpage>41</fpage>
            <lpage>42</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35075138</pubid>
                  <pubid idtype="pmpid" link="fulltext">11333967</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Why do hubs tend to be essential in protein networks?</p>
            </title>
            <aug>
               <au>
                  <snm>He</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>PLoS Genet</source>
            <pubdate>2006</pubdate>
            <volume>2</volume>
            <issue>6</issue>
            <fpage>e88</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1473040</pubid>
                  <pubid idtype="pmpid" link="fulltext">16751849</pubid>
                  <pubid idtype="doi">10.1371/journal.pgen.0020088</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Conservation of gene order: a fingerprint of proteins that physically interact</p>
            </title>
            <aug>
               <au>
                  <snm>Dandekar</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Snel</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Huynen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Trends Biochem Sci</source>
            <pubdate>1998</pubdate>
            <volume>23</volume>
            <issue>9</issue>
            <fpage>324</fpage>
            <lpage>328</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0968-0004(98)01274-2</pubid>
                  <pubid idtype="pmpid" link="fulltext">9787636</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>The use of gene clusters to infer functional coupling</p>
            </title>
            <aug>
               <au>
                  <snm>Overbeek</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Fonstein</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>D'Souza</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Pusch</snm>
                  <fnm>GD</fnm>
               </au>
               <au>
                  <snm>Maltsev</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1999</pubdate>
            <volume>96</volume>
            <issue>6</issue>
            <fpage>2896</fpage>
            <lpage>2901</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">15866</pubid>
                  <pubid idtype="pmpid" link="fulltext">10077608</pubid>
                  <pubid idtype="doi">10.1073/pnas.96.6.2896</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Detecting protein function and protein-protein interactions from genome sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Marcotte</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Pellegrini</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ng</snm>
                  <fnm>HL</fnm>
               </au>
               <au>
                  <snm>Rice</snm>
                  <fnm>DW</fnm>
               </au>
               <au>
                  <snm>Yeates</snm>
                  <fnm>TO</fnm>
               </au>
               <au>
                  <snm>Eisenberg</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1999</pubdate>
            <volume>285</volume>
            <issue>5428</issue>
            <fpage>751</fpage>
            <lpage>753</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.285.5428.751</pubid>
                  <pubid idtype="pmpid" link="fulltext">10427000</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Protein interaction maps for complete genomes based on gene fusion events</p>
            </title>
            <aug>
               <au>
                  <snm>Enright</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Iliopoulos</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Kyrpides</snm>
                  <fnm>NC</fnm>
               </au>
               <au>
                  <snm>Ouzounis</snm>
                  <fnm>CA</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1999</pubdate>
            <volume>402</volume>
            <issue>6757</issue>
            <fpage>86</fpage>
            <lpage>90</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/47056</pubid>
                  <pubid idtype="pmpid" link="fulltext">10573422</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Correlation between transcriptome and interactome mapping data from Saccharomyces cerevisiae</p>
            </title>
            <aug>
               <au>
                  <snm>Ge</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Church</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Vidal</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <issue>4</issue>
            <fpage>482</fpage>
            <lpage>486</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng776</pubid>
                  <pubid idtype="pmpid" link="fulltext">11694880</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>A relationship between gene expression and protein interactions on the proteome scale: analysis of the bacteriophage T7 and the yeast Saccharomyces cerevisiae</p>
            </title>
            <aug>
               <au>
                  <snm>Grigoriev</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <issue>17</issue>
            <fpage>3513</fpage>
            <lpage>3519</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">55876</pubid>
                  <pubid idtype="pmpid" link="fulltext">11522820</pubid>
                  <pubid idtype="doi">10.1093/nar/29.17.3513</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Relating whole-genome expression data with protein-protein interactions</p>
            </title>
            <aug>
               <au>
                  <snm>Jansen</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Greenbaum</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Gerstein</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <issue>1</issue>
            <fpage>37</fpage>
            <lpage>46</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">155252</pubid>
                  <pubid idtype="pmpid" link="fulltext">11779829</pubid>
                  <pubid idtype="doi">10.1101/gr.205602</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Assigning protein functions by comparative genome analysis: protein phylogenetic profiles</p>
            </title>
            <aug>
               <au>
                  <snm>Pellegrini</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Marcotte</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Thompson</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Eisenberg</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Yeates</snm>
                  <fnm>TO</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1999</pubdate>
            <volume>96</volume>
            <issue>8</issue>
            <fpage>4285</fpage>
            <lpage>4288</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">16324</pubid>
                  <pubid idtype="pmpid" link="fulltext">10200254</pubid>
                  <pubid idtype="doi">10.1073/pnas.96.8.4285</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Identification of potential interaction networks using sequence-based searches for conserved protein-protein interactions or "interologs"</p>
            </title>
            <aug>
               <au>
                  <snm>Matthews</snm>
                  <fnm>LR</fnm>
               </au>
               <au>
                  <snm>Vaglio</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Reboul</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Ge</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>BP</fnm>
               </au>
               <au>
                  <snm>Garrels</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Vincent</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Vidal</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2001</pubdate>
            <volume>11</volume>
            <issue>12</issue>
            <fpage>2120</fpage>
            <lpage>2126</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">311221</pubid>
                  <pubid idtype="pmpid" link="fulltext">11731503</pubid>
                  <pubid idtype="doi">10.1101/gr.205301</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Towards the prediction of complete protein-protein interaction networks</p>
            </title>
            <aug>
               <au>
                  <snm>Gomez</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Rzhetsky</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Pac Symp Biocomput</source>
            <pubdate>2002</pubdate>
            <fpage>413</fpage>
            <lpage>424</lpage>
            <xrefbib>
               <pubid idtype="pmpid">11928495</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Integrative approach for computationally inferring protein domain interactions</p>
            </title>
            <aug>
               <au>
                  <snm>Ng</snm>
                  <fnm>SK</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Tan</snm>
                  <fnm>SH</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <issue>8</issue>
            <fpage>923</fpage>
            <lpage>929</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg118</pubid>
                  <pubid idtype="pmpid" link="fulltext">12761053</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Computational prediction of protein-protein interactions</p>
            </title>
            <aug>
               <au>
                  <snm>Obenauer</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Yaffe</snm>
                  <fnm>MB</fnm>
               </au>
            </aug>
            <source>Methods Mol Biol</source>
            <pubdate>2004</pubdate>
            <volume>261</volume>
            <fpage>445</fpage>
            <lpage>468</lpage>
            <xrefbib>
               <pubid idtype="pmpid">15064475</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Predicting protein-peptide interactions via a network-based motif sampler</p>
            </title>
            <aug>
               <au>
                  <snm>Reiss</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Schwikowski</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <issue>Suppl 1</issue>
            <fpage>I274</fpage>
            <lpage>282</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bth922</pubid>
                  <pubid idtype="pmpid" link="fulltext">15262809</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>MULTIPROSPECTOR: an algorithm for the prediction of protein-protein interactions by multimeric threading</p>
            </title>
            <aug>
               <au>
                  <snm>Lu</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Lu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Skolnick</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>2002</pubdate>
            <volume>49</volume>
            <issue>3</issue>
            <fpage>350</fpage>
            <lpage>364</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/prot.10222</pubid>
                  <pubid idtype="pmpid" link="fulltext">12360525</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Interrogating protein interaction networks through structural biology</p>
            </title>
            <aug>
               <au>
                  <snm>Aloy</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Russell</snm>
                  <fnm>RB</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2002</pubdate>
            <volume>99</volume>
            <issue>9</issue>
            <fpage>5896</fpage>
            <lpage>5901</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">122873</pubid>
                  <pubid idtype="pmpid" link="fulltext">11972061</pubid>
                  <pubid idtype="doi">10.1073/pnas.092147999</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Extracting human protein interactions from MEDLINE using a full-sentence parser</p>
            </title>
            <aug>
               <au>
                  <snm>Daraselia</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Yuryev</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Egorov</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Novichkova</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Nikitin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Mazo</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <issue>5</issue>
            <fpage>604</fpage>
            <lpage>611</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg452</pubid>
                  <pubid idtype="pmpid" link="fulltext">15033866</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Text mining for metabolic pathways, signaling cascades, and protein networks</p>
            </title>
            <aug>
               <au>
                  <snm>Hoffmann</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Krallinger</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Andres</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Tamames</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Blaschke</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Valencia</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Sci STKE</source>
            <pubdate>2005</pubdate>
            <volume>2005</volume>
            <issue>283</issue>
            <fpage>pe21</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/stke.2832005pe21</pubid>
                  <pubid idtype="pmpid" link="fulltext">15886388</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Evaluation of different biological data and computational classification methods for use in protein interaction prediction</p>
            </title>
            <aug>
               <au>
                  <snm>Qi</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Bar-Joseph</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Klein-Seetharaman</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>2006</pubdate>
            <volume>63</volume>
            <issue>3</issue>
            <fpage>490</fpage>
            <lpage>500</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/prot.20865</pubid>
                  <pubid idtype="pmpid" link="fulltext">16450363</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Gene ontology: tool for the unification of biology. The Gene Ontology Consortium</p>
            </title>
            <aug>
               <au>
                  <snm>Ashburner</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ball</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Blake</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Botstein</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Butler</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Cherry</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>AP</fnm>
               </au>
               <au>
                  <snm>Dolinski</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Dwight</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Eppig</snm>
                  <fnm>JT</fnm>
               </au>
               <au>
                  <snm>Harris</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Hill</snm>
                  <fnm>DP</fnm>
               </au>
               <au>
                  <snm>Issel-Tarver</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Kasarskis</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lewis</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Matese</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Richardson</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Ringwald</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Sherlock</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2000</pubdate>
            <volume>25</volume>
            <issue>1</issue>
            <fpage>25</fpage>
            <lpage>29</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/75556</pubid>
                  <pubid idtype="pmpid" link="fulltext">10802651</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology</p>
            </title>
            <aug>
               <au>
                  <snm>Camon</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Magrane</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Barrell</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Dimmer</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Maslen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Binns</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Harte</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Lopez</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <issue>32 Database</issue>
            <fpage>D262</fpage>
            <lpage>266</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308756</pubid>
                  <pubid idtype="pmpid" link="fulltext">14681408</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh021</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Use and misuse of the gene ontology annotations</p>
            </title>
            <aug>
               <au>
                  <snm>Rhee</snm>
                  <fnm>SY</fnm>
               </au>
               <au>
                  <snm>Wood</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Dolinski</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Draghici</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2008</pubdate>
            <volume>9</volume>
            <issue>7</issue>
            <fpage>509</fpage>
            <lpage>515</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrg2363</pubid>
                  <pubid idtype="pmpid" link="fulltext">18475267</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>PRoteomics for Emerging PAthogen REsponse (PREPARE)</p>
            </title>
            <url>http://www.prepare.med.ubc.ca/</url>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Intrinsic disorder is a common feature of hub proteins from four eukaryotic interactomes</p>
            </title>
            <aug>
               <au>
                  <snm>Haynes</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Oldfield</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Ji</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Klitgord</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Cusick</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Radivojac</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Uversky</snm>
                  <fnm>VN</fnm>
               </au>
               <au>
                  <snm>Vidal</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Iakoucheva</snm>
                  <fnm>LM</fnm>
               </au>
            </aug>
            <source>PLoS Comput Biol</source>
            <pubdate>2006</pubdate>
            <volume>2</volume>
            <issue>8</issue>
            <fpage>e100</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1526461</pubid>
                  <pubid idtype="pmpid" link="fulltext">16884331</pubid>
                  <pubid idtype="doi">10.1371/journal.pcbi.0020100</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>UniProt batch retrieval system</p>
            </title>
            <url>http://beta.uniprot.org/?tab=batch</url>
         </bibl>
         <bibl id="B42">
            <title>
               <p>GO Slim</p>
            </title>
            <url>http://www.geneontology.org/GO.slims.shtml</url>
         </bibl>
         <bibl id="B43">
            <title>
               <p>map2slim</p>
            </title>
            <url>http://search.cpan.org/~cmungall/go-perl/scripts/map2slim</url>
         </bibl>
         <bibl id="B44">
            <title>
               <p>The Gene Ontology</p>
            </title>
            <url>http://www.geneontology.org/</url>
         </bibl>
         <bibl id="B45">
            <title>
               <p>The elements of statistical learning: data mining, inference, and prediction</p>
            </title>
            <aug>
               <au>
                  <snm>Hastie</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Tibshirani</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Friedman</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <publisher>New York: Springer</publisher>
            <pubdate>2001</pubdate>
         </bibl>
         <bibl id="B46">
            <title>
               <p>STATISTICA</p>
            </title>
            <url>http://www.statsoft.com/</url>
         </bibl>
         <bibl id="B47">
            <title>
               <p>KEGG for linking genomes to life and the environment</p>
            </title>
            <aug>
               <au>
                  <snm>Kanehisa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Araki</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Goto</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hattori</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hirakawa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Itoh</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Katayama</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Kawashima</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Okuda</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Tokimatsu</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Yamanishi</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2008</pubdate>
            <issue>36 Database</issue>
            <fpage>D480</fpage>
            <lpage>484</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2238879</pubid>
                  <pubid idtype="pmpid" link="fulltext">18077471</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>The Biomolecular Interaction Network Database and related tools 2005 update</p>
            </title>
            <aug>
               <au>
                  <snm>Alfarano</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Andrade</snm>
                  <fnm>CE</fnm>
               </au>
               <au>
                  <snm>Anthony</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Bahroos</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Bajec</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bantoft</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Betel</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Bobechko</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Boutilier</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Burgess</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Buzadzija</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Cavero</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>D'Abreo</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Donaldson</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Dorairajoo</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Dumontier</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Dumontier</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Earles</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Farrall</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Feldman</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Garderman</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Gong</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Gonzaga</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Grytsan</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Gryz</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Gu</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Haldorsen</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Halupa</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Haw</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hrvojic</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hurrell</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Isserlin</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Jack</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Juma</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Khan</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kon</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Konopinsky</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Le</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Ling</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Magidin</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Moniakis</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Montojo</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Moore</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Muskat</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Ng</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Paraiso</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Parker</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Pintilie</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Pirone</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Salama</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Sgro</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Shan</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Shu</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Siew</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Skinner</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Snyder</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Stasiuk</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Strumpf</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Tuekam</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Tao</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Willis</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Wolting</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wrong</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Xin</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Yao</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Yates</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Zheng</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Pawson</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Ouellette</snm>
                  <fnm>BF</fnm>
               </au>
               <au>
                  <snm>Hogue</snm>
                  <fnm>CW</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <issue>33 Database</issue>
            <fpage>D418</fpage>
            <lpage>424</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">540005</pubid>
                  <pubid idtype="pmpid" link="fulltext">15608229</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Pruitt</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Tatusova</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Maglott</snm>
                  <fnm>DR</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2007</pubdate>
            <issue>35 Database</issue>
            <fpage>D61</fpage>
            <lpage>65</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1716718</pubid>
                  <pubid idtype="pmpid" link="fulltext">17130148</pubid>
                  <pubid idtype="doi">10.1093/nar/gkl842</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>Automatic clustering of orthologs and in-paralogs from pairwise species comparisons</p>
            </title>
            <aug>
               <au>
                  <snm>Remm</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Storm</snm>
                  <fnm>CE</fnm>
               </au>
               <au>
                  <snm>Sonnhammer</snm>
                  <fnm>EL</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2001</pubdate>
            <volume>314</volume>
            <issue>5</issue>
            <fpage>1041</fpage>
            <lpage>1052</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2000.5197</pubid>
                  <pubid idtype="pmpid" link="fulltext">11743721</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>Pfam: clans, web tools and services</p>
            </title>
            <aug>
               <au>
                  <snm>Finn</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Mistry</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Schuster-Bockler</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Griffiths-Jones</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hollich</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Lassmann</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Moxon</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Marshall</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Khanna</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Sonnhammer</snm>
                  <fnm>EL</fnm>
               </au>
               <au>
                  <snm>Bateman</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <issue>34 Database</issue>
            <fpage>D247</fpage>
            <lpage>251</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1347511</pubid>
                  <pubid idtype="pmpid" link="fulltext">16381856</pubid>
                  <pubid idtype="doi">10.1093/nar/gkj149</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>HMMER</p>
            </title>
            <url>http://hmmer.janelia.org/</url>
         </bibl>
         <bibl id="B53">
            <title>
               <p>InterDom: a database of putative interacting protein domains for validating predicted protein interactions and complexes</p>
            </title>
            <aug>
               <au>
                  <snm>Ng</snm>
                  <fnm>SK</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Tan</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <issue>1</issue>
            <fpage>251</fpage>
            <lpage>254</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">165526</pubid>
                  <pubid idtype="pmpid" link="fulltext">12519994</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg079</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B54">
            <title>
               <p>iPfam: visualization of protein-protein interactions in PDB at domain and amino acid resolutions</p>
            </title>
            <aug>
               <au>
                  <snm>Finn</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Marshall</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bateman</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>3</issue>
            <fpage>410</fpage>
            <lpage>412</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti011</pubid>
                  <pubid idtype="pmpid" link="fulltext">15353450</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B55">
            <title>
               <p>Modeling prevention strategies for gonorrhea and Chlamydia using stochastic network simulations</p>
            </title>
            <aug>
               <au>
                  <snm>Kretzschmar</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>van Duynhoven</snm>
                  <fnm>YT</fnm>
               </au>
               <au>
                  <snm>Severijnen</snm>
                  <fnm>AJ</fnm>
               </au>
            </aug>
            <source>Am J Epidemiol</source>
            <pubdate>1996</pubdate>
            <volume>144</volume>
            <issue>3</issue>
            <fpage>306</fpage>
            <lpage>317</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8686700</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B56">
            <title>
               <p>Ring vaccination</p>
            </title>
            <aug>
               <au>
                  <snm>Muller</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Schonfisch</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Kirkilionis</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Math Biol</source>
            <pubdate>2000</pubdate>
            <volume>41</volume>
            <issue>2</issue>
            <fpage>143</fpage>
            <lpage>171</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s002850070003</pubid>
                  <pubid idtype="pmpid" link="fulltext">11039695</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
