<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2007-8-4-r51</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Evolution of protein complexes by duplication of homomeric interactions</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Pereira-Leal</snm>
               <mi>B</mi>
               <fnm>Jose</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>jleal@igc.gulbenkian.pt</email>
            </au>
            <au id="A2">
               <snm>Levy</snm>
               <mi>D</mi>
               <fnm>Emmanuel</fnm>
               <insr iid="I2"/>
               <email>elevy@mrc-lmb.cam.ac.uk</email>
            </au>
            <au id="A3">
               <snm>Kamp</snm>
               <fnm>Christel</fnm>
               <insr iid="I3"/>
               <email>mail@christelkamp.de</email>
            </au>
            <au id="A4">
               <snm>Teichmann</snm>
               <mi>A</mi>
               <fnm>Sarah</fnm>
               <insr iid="I2"/>
               <email>sat@mrc-lmb.cam.ac.uk</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Instituto Gulbenkian de Ci&#234;ncia, Apartado 14, P-2781-901 Oeiras, Portugal</p>
            </ins>
            <ins id="I2">
               <p>MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, UK</p>
            </ins>
            <ins id="I3">
               <p>Paul-Ehrlich-Institut, Federal Agency for Sera and Vaccines, Paul-Ehrlich-Stra&#223;e, 63225 Langen, Germany</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>4</issue>
         <fpage>R51</fpage>
         <url>http://genomebiology.com/2007/8/4/R51</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17411433</pubid>
               <pubid idtype="doi">10.1186/gb-2007-8-4-r51</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>3</day>
               <month>10</month>
               <year>2006</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>15</day>
               <month>1</month>
               <year>2007</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>5</day>
               <month>4</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>05</day>
               <month>04</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Pereira-Leal et al.; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <shorttitle>
         <p>Protein complex evolution</p>
      </shorttitle>
      <shortabs>
         <p>A study of yeast protein complexes, complexes of known three-dimensional structure in the Protein Data Bank and clusters of pairwise protein interactions in the networks of several organisms revealed that duplication of homomeric interactions often results in the formation of complexes of paralogous proteins.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Cellular functions are accomplished by the concerted actions of functional modules. The mechanisms driving the emergence and evolution of these modules are still unclear. Here we investigate the evolutionary origins of protein complexes, modules in physical protein-protein interaction networks.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We studied protein complexes in <it>Saccharomyces cerevisiae</it>, complexes of known three-dimensional structure in the Protein Data Bank and clusters of pairwise protein interactions in the networks of several organisms. We found that duplication of homomeric interactions, a large class of protein interactions, frequently results in the formation of complexes of paralogous proteins. This route is a common mechanism for the evolution of complexes and clusters of protein interactions. Our conclusions are further confirmed by theoretical modelling of network evolution. We propose reasons why this route is favourable in terms of the structure and function of protein complexes.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>Our study provides the first insight into the evolution of functional modularity in protein-protein interaction networks, and the origins of a large class of protein complexes.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010008">Evolution</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010001">Biochemistry and structural biology</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>The success of genome sequencing projects has resulted in the accumulation of catalogues of genes for hundreds of genomes. Within each genome, the genes and their proteins interact to form complex networks with properties that transcend those of individual genes. One such network is formed by the totality of physical protein-protein interactions in the cell: the protein interaction network (PIN). These networks, like many other naturally occurring networks, such as the transcriptional <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp> and metabolic networks <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, have a modular organization <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>. They are organized into a number of functional modules, which are sets of interacting proteins accomplishing discrete biological functions in relative spatial, temporal or chemical isolation from other modules in the network <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. Protein complexes are functional modules in the sense that the protein subunits of the complex are sufficient for its function, even when isolated from the system, as has been demonstrated by <it>in vitro </it>reconstitution of active protein complexes in a variety of studies (for example, <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>).</p>
         <p>The mechanisms that drive the emergence and subsequent evolution of modularity in cellular networks are unclear. This is in part due to the fuzzy nature of the concept and the difficulty in identifying functional modules in cellular networks. Here, we focus on the evolution of one specific type of functional module, protein complexes, and propose an evolutionary route that accounts for the origin and evolution of a proportion of these modules. We hypothesize that duplication of self-interacting proteins (homomers) is critical for the establishment and evolution of a proportion of protein complexes, and hence of functional modularity in protein interaction networks (Figure <figr fid="F1">1</figr>). Our hypothesis was based on the following considerations.</p>
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>A hypothesis for the origins and evolution of protein complexes</p>
            </caption>
            <text>
               <p>A hypothesis for the origins and evolution of protein complexes. Gene duplication with conservation of protein interactions is frequent [9]. Self-interactions (homomeric interactions) have special structural properties (see text for details) that are conserved into the duplicated interaction between paralogous proteins (light-dark interaction). Interactions between paralogous proteins are more versatile functionally and structurally, and are systematically selected for in evolution. These interactions are central in the establishment and evolution of clusters in PINs and protein complexes.</p>
            </text>
            <graphic file="gb-2007-8-4-r51-1"/>
         </fig>
         <p>First, gene duplication and divergence is the most important force driving the expansion of eukaryotic proteomes (for example, <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>). Conservation of protein interactions is frequent after duplication, so paralogous genes frequently share interaction partners <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. Mathematical models of network evolution based on this principle of duplication and divergence result in networks that display topological properties comparable to those of biological protein interaction networks, in particular high clustering coefficients <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>. Clusters in protein networks are frequently part of protein complexes <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr></abbrgrp>. The clustering coefficient of a network (<it>C</it>) quantifies how densely the interaction partners of each protein are connected to one another <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, and thus partly reflects the modularity of the network. In theoretical simulations of network evolution, therefore, duplication followed by conservation of protein interactions is linked with modularity.</p>
         <p>A second piece of evidence is that the oligomerization of multiple, identical subunits is a simple way of forming large, functional structures in a genetically economical manner. Smaller component subunits will fold more readily than a single large protein and are less prone to translational errors <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>. Multiple copies of the same protein will tend to be co-localized in the cell as they can be synthesized from the same mRNA. This may promote oligomerization, for example, by domain swapping <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> or other mechanisms. Furthermore, evolution of a homomeric interface by incremental mutation is aided by the fact that the effect of one advantageous mutation applies to all subunits of a homo-oligomer; a homomeric interface is thus, <it>a priori</it>, the most likely type of interface to arise <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. In protein interaction networks, homomeric interactions are indeed over-represented <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. They are also very abundant in complexes of known three-dimensional structure, being present in 85% of all complexes in the Protein Quaternary Structure (PQS) database (Table <tblr tid="T1">1</tblr>).</p>
         <tbl id="T1">
            <title>
               <p>Table 1</p>
            </title>
            <caption>
               <p>Data sets investigated in this study</p>
            </caption>
            <tblbdy cols="10">
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c cspan="6" ca="center">
                     <p>Pairwise interactions (%)</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c cspan="6">
                     <hr/>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Dataset</p>
                  </c>
                  <c ca="center">
                     <p>PPIs/Complexes</p>
                  </c>
                  <c ca="center">
                     <p>No. of proteins</p>
                  </c>
                  <c ca="center">
                     <p>HD</p>
                  </c>
                  <c ca="center">
                     <p>PD</p>
                  </c>
                  <c ca="center">
                     <p>F(HD)</p>
                  </c>
                  <c ca="center">
                     <p>F(PD)</p>
                  </c>
                  <c ca="center">
                     <p>F(HI)</p>
                  </c>
                  <c ca="center">
                     <p>F(PI)</p>
                  </c>
                  <c ca="left">
                     <p>Description</p>
                  </c>
               </r>
               <r>
                  <c cspan="10">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c cspan="10" ca="left">
                     <p>
                        <b>Pairwise interactions</b>
                     </p>
                  </c>
               </r>
               <r>
                  <c cspan="10">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Yeast [36]</p>
                  </c>
                  <c ca="center">
                     <p>1,011</p>
                  </c>
                  <c ca="center">
                     <p>753</p>
                  </c>
                  <c ca="center">
                     <p>1.9</p>
                  </c>
                  <c ca="center">
                     <p>13.4</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Manual curation of small scale data (does not include yeast two hybrid data)</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Yeast-large [37]</p>
                  </c>
                  <c ca="center">
                     <p>15,393</p>
                  </c>
                  <c ca="center">
                     <p>4,741</p>
                  </c>
                  <c ca="center">
                     <p>1.8</p>
                  </c>
                  <c ca="center">
                     <p>6.2</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Compilation of small- and large-scale data</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Worm [39]</p>
                  </c>
                  <c ca="center">
                     <p>2,422</p>
                  </c>
                  <c ca="center">
                     <p>1,726</p>
                  </c>
                  <c ca="center">
                     <p>1.6</p>
                  </c>
                  <c ca="center">
                     <p>3.3</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>High-throughput (yeast two-hybrid)</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Fly [38]</p>
                  </c>
                  <c ca="center">
                     <p>3,384</p>
                  </c>
                  <c ca="center">
                     <p>2,877</p>
                  </c>
                  <c ca="center">
                     <p>2.9</p>
                  </c>
                  <c ca="center">
                     <p>9.1</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>High-throughput (yeast two-hybrid)</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c cspan="10" ca="left">
                     <p>
                        <b>Complexes</b>
                     </p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>MIPS [36]</p>
                  </c>
                  <c ca="center">
                     <p>216</p>
                  </c>
                  <c ca="center">
                     <p>1,185</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>32</p>
                  </c>
                  <c ca="center">
                     <p>27</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>Manual curation</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>TAP [40]</p>
                  </c>
                  <c ca="center">
                     <p>589</p>
                  </c>
                  <c ca="center">
                     <p>1,474</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>31</p>
                  </c>
                  <c ca="center">
                     <p>30</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>High-throughput tagging and mass spectrometry</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>HMS-PCI [41]</p>
                  </c>
                  <c ca="center">
                     <p>741</p>
                  </c>
                  <c ca="center">
                     <p>1,758</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>33</p>
                  </c>
                  <c ca="center">
                     <p>27</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>High-throughput tagging and mass spectrometry</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>PQS [29]</p>
                  </c>
                  <c ca="center">
                     <p>2,509</p>
                  </c>
                  <c ca="center">
                     <p>3,124</p>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>85</p>
                  </c>
                  <c ca="center">
                     <p>11</p>
                  </c>
                  <c ca="left">
                     <p>Three-dimensional structures of protein complexes</p>
                  </c>
               </r>
            </tblbdy>
            <tblfn>
               <p>PPIs/Complexes gives the number of protein-protein interactions (for the pairwise interaction datasets) or the number of protein complexes (for the complex datasets). HD, homodimers; PD, dimers of paralogous proteins. F(HD) and F(PD) are the frequencies at which the complexes contain homodimers or dimers of paralogous proteins, respectively, found in either of the two <it>S. cerevisiae </it>protein interaction datasets. These numbers were obtained by computationally superimposing the PINs onto the complex data and are significantly higher than expected by chance at <it>p </it>&lt; 10<sup>-4 </sup>in all cases. F(HI) and F(PI) are the frequencies of complexes with homomeric or paralogous interactions, respectively.</p>
            </tblfn>
         </tbl>
         <p>The third consideration is that when genes coding for proteins that form homodimers duplicate, conservation of interactions will generate dimers of paralogous proteins. In these, the stability associated with the homodimer is maintained, while at the same time asymmetry is introduced into the interaction. This asymmetry provides more degrees of evolutionary freedom and represents a source of functional novelty (discussed in <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>). This is illustrated by anecdotal examples such as photosystem I (Figure <figr fid="F1">1</figr>), in which there is asymmetry in terms of the subunits bound to PsaA and PsaB, the two paralogous proteins at its core <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp>.</p>
         <p>These considerations suggest the following evolutionary scenario (see Figure <figr fid="F1">1</figr>), which we test in the work presented here. An initial interaction is established between two (or more) copies of the same protein (homomeric interactions; Figure <figr fid="F1">1</figr>, left). This is the stable 'seed' of a new complex, and functional and structural factors will contribute to this interaction being selected for conservation. Gene duplication and divergence with conservation of the interactions will then follow. This initially results in multiple homomeric and heteromeric complexes with different numbers of the two duplicates (Figure <figr fid="F1">1</figr>, middle), permitting functional and structural diversification. Over time, sequence divergence will produce distinct complexes with distinct functionalities. The complexes containing paralogous proteins will frequently be selected in evolution due to the advantages of asymmetry, and accretion of new interactions may follow. This evolutionary process is illustrated by the related complexes of the RecA recombinase homohexamer and the F1 ATP synthase &#945;3:&#946;3 hexamer (discussed below). These two functionally distinct complexes are likely to have evolved from a common homomeric ancestor <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <p>We test the evolutionary scenario hypothesized above by investigating the following corollaries: whether duplication of genes coding for homodimers is frequently accompanied by conservation of protein interactions in protein interaction networks; whether interactions between paralogous proteins are associated with high clustering in protein interaction networks; whether these interactions are over-represented in protein complexes obtained in large-scale proteomic experiments; whether interactions between paralogous proteins are over-represented in protein complexes of known three-dimensional structure; whether these interactions are older than other interactions, such that paralogous dimerization precedes accretion of further interactions; and whether the establishment of dimers of paralogues is associated with asymmetry of protein interactions.</p>
         <sec>
            <st>
               <p>Duplication of homodimers with conservation of interactions</p>
            </st>
            <p>It is known from previous work that gene duplication accompanied by conservation of interactions is common in PINs, both for homomers and for interactions between non-homologous proteins <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B19">19</abbr></abbrgrp>. We calculated the frequency of interactions between paralogues in the four independent protein interaction network datasets studied here (Table <tblr tid="T1">1</tblr>). We used structural assignments to detect homology, thus considering even very distant evolutionary relationships, as described in Materials and methods. We found that interactions between paralogues are significantly more frequent than expected by chance (Figure <figr fid="F2">2</figr>). To investigate the evolutionary origins of interactions between paralogues, we determined the conditional probability that a protein that forms a paralogous dimer also forms a homodimer, and vice versa. The observation that interactions in homodimers and paralogous dimers are not independent (Table <tblr tid="T2">2</tblr>) supports an evolutionary link between these two types of dimers, such that paralogous dimers evolved by duplication of homodimers. These observations support the corollary that duplication of homomers is frequently accompanied by conservation of interactions.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Evolutionary origin of dimers of paralogues</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>P(HD)</p>
                     </c>
                     <c ca="center">
                        <p>P(PD)</p>
                     </c>
                     <c ca="center">
                        <p>P(HD|PD)</p>
                     </c>
                     <c ca="center">
                        <p>P(PD|HD)</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Yeast</p>
                     </c>
                     <c ca="center">
                        <p>0.034 &#177; 0.006</p>
                     </c>
                     <c ca="center">
                        <p>0.134 &#177; 0.011</p>
                     </c>
                     <c ca="center">
                        <p>0.043 &#177; 0.006</p>
                     </c>
                     <c ca="center">
                        <p>0.17 &#177; 0.012</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Yeast-large</p>
                     </c>
                     <c ca="center">
                        <p>0.027 &#177; 0.001</p>
                     </c>
                     <c ca="center">
                        <p>0.062 &#177; 0.002</p>
                     </c>
                     <c ca="center">
                        <p>0.203 &#177; 0.003</p>
                     </c>
                     <c ca="center">
                        <p>0.466 &#177; 0.004</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Fly</p>
                     </c>
                     <c ca="center">
                        <p>0.047 &#177; 0.004</p>
                     </c>
                     <c ca="center">
                        <p>0.091 &#177; 0.006</p>
                     </c>
                     <c ca="center">
                        <p>0.082 &#177; 0.006</p>
                     </c>
                     <c ca="center">
                        <p>0.257 &#177; 0.008</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Worm</p>
                     </c>
                     <c ca="center">
                        <p>0.031 &#177; 0.003</p>
                     </c>
                     <c ca="center">
                        <p>0.033 &#177; 0.003</p>
                     </c>
                     <c ca="center">
                        <p>0.355 &#177; 0.008</p>
                     </c>
                     <c ca="center">
                        <p>0.379 &#177; 0.008</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Dimers of paralogues are frequently also homodimers. HD, homodimer; PD, dimer of paralogues. P(HD|PD) should be read as the conditional probability of a polypeptide forming a homodimer given that it also participates in forming a dimer of paralogues. The standard deviation of each probability is calculated as &#8730;(p(1 - p)/n), where p is the estimated probability and n the number of observations. The enrichment observed with the conditional probabilities is significant for all interaction datasets except the small yeast network.</p>
               </tblfn>
            </tbl>
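            <p>As a minimal sketch of how quantities of the kind reported in Table 2 could be computed, the Python fragment below estimates P(HD), P(PD), P(HD|PD) and P(PD|HD) from sets of proteins and attaches to each the standard deviation given by the square root of p(1 - p)/n. The function names, the toy protein identifiers and the choice of n (all proteins for the marginal probabilities, the size of the conditioning set for the conditional ones) are illustrative assumptions and are not taken from the original analysis.</p>
            <p>
```python
import math


def binomial_sd(p, n):
    """Standard deviation of an estimated proportion: sqrt(p(1 - p)/n)."""
    return math.sqrt(p * (1.0 - p) / n)


def dimer_probabilities(proteins, homodimers, paralogous_dimers):
    """Estimate marginal and conditional probabilities with their SDs.

    proteins: set of all proteins in the network (hypothetical input).
    homodimers: proteins that interact with themselves.
    paralogous_dimers: proteins that interact with a paralogue.
    The choice of n for each estimate is an assumption of this sketch.
    """
    n = len(proteins)
    both = homodimers.intersection(paralogous_dimers)
    estimates = {
        "P(HD)": (len(homodimers) / n, n),
        "P(PD)": (len(paralogous_dimers) / n, n),
        "P(HD|PD)": (len(both) / len(paralogous_dimers), len(paralogous_dimers)),
        "P(PD|HD)": (len(both) / len(homodimers), len(homodimers)),
    }
    return {name: (p, binomial_sd(p, m)) for name, (p, m) in estimates.items()}


# Toy data, invented purely for illustration.
proteins = {"p%d" % i for i in range(20)}
homodimers = {"p0", "p1", "p2", "p3"}
paralogous_dimers = {"p2", "p3", "p4", "p5", "p6"}
result = dimer_probabilities(proteins, homodimers, paralogous_dimers)
for name, (p, sd) in result.items():
    print("%s = %.3f +/- %.3f" % (name, p, sd))
```
            </p>
            <p>In this toy example the conditional probabilities exceed the corresponding marginal ones, which is the pattern of non-independence that Table 2 is used to argue for.</p>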
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Dimers of paralogues in the protein-protein interaction network</p>
               </caption>
               <text>
                  <p>Dimers of paralogues in the protein-protein interaction network. On top is a cartoon illustrating how paralogous dimers result from the duplication of proteins that form homodimers. The bar chart shows the fraction of paralogous dimers (gray bars) in four protein interaction networks compared with random expectation levels, obtained by 10,000 network randomizations by shuffling evolutionary relationships (<it>p </it>&lt; 10<sup>-4</sup>; see Materials and methods for details).</p>
               </text>
               <graphic file="gb-2007-8-4-r51-2"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Duplication of homodimers and network clusters</p>
            </st>
            <p>We next investigated whether the duplication of homodimers is associated with protein complexes in PINs. We consider the average clustering coefficient (<it>C</it>) of a network as a descriptor of the extent of modularity of that network. Clusters frequently correspond to known protein complexes, as shown by <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr></abbrgrp> and by us (Supplementary material S1 in Additional data file 1; illustrated in Figure <figr fid="F3">3a</figr>). This parameter captures the frequency of densely connected subgraphs in the network <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, and allows us to measure modularity in a PIN without specific knowledge of the identity of the protein complexes. We show here that PINs are more clustered than randomized networks with exactly the same broad degree distributions as the real yeast PIN (Figure <figr fid="F3">3b</figr>), extending previous analysis where it was shown that protein interaction networks were more clustered than random Erd&#337;s-R&#233;nyi networks and random power-law networks <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. This provides strong support for the biological significance of clusters in networks.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Clusters in PINs</p>
               </caption>
               <text>
                  <p>Clusters in PINs. <b>(a) </b>A small section of a PIN in <it>S. cerevisiae </it>is represented as a graph where nodes correspond to proteins and edges to physical interactions between pairs of proteins. One definition of a module in this work is a highly connected subgraph, such as that shaded in the figure (left), in which the central (green) node has a maximum clustering coefficient (<it>C </it>= 1). A clustering coefficient can be calculated for each protein in the network and measures the number of interactions between neighbors of that protein, divided by the total number of possible interactions between those neighbors. In this example, the green node and its fully connected neighborhood correspond to the protein complex AP-2 [49]. Fully connected subgraphs can also represent interactions that are dissociated in time and/or in space. For example, the shaded cluster on the right represents members of the basic helix-loop-helix transcriptional regulator family, in which duplication of a homodimeric protein with inheritance of interactions resulted in Max existing as a homodimer, as well as distinct dimers of paralogous proteins (c-Myc and Mad1) [34,35]. <b>(b) </b>Cumulative frequency distribution of the clustering coefficients in the Yeast PIN and in randomized networks with exactly the same degree distribution (scale-free random; see the Randomization by link shuffling section in Materials and methods for details). This shows that high clustering of real PINs, and thus their modularity, is a characteristic of their biology and not of the degree distribution. <b>(c) </b>Cartoon illustrating the consequences of duplication with conservation of interactions for the clustering coefficient of node (protein) <it>i </it>(<it>C</it><sub><it>i</it></sub>). In each case the network is shown before and after duplication of a protein that either interacts with itself or does not. 
The bottom part of the cartoon summarizes the effect on the clustering coefficient of the protein. <b>(d) </b>Cumulative frequency distribution of clustering coefficients in the simulated networks, with varying proportions of self-interactors at the start of the simulation. The fraction of proteins with higher clustering coefficients increases with the proportion of self-interactors.</p>
               </text>
               <graphic file="gb-2007-8-4-r51-3"/>
            </fig>
            <p>We investigated whether duplication plays a role in determining the clustering levels of the network. The duplication and conservation scenarios in Figure <figr fid="F3">3c</figr> suggest that only duplication of proteins that form homodimers, and not other proteins, will lead to an increase in the clustering coefficient of the network. To investigate this, we implemented a theoretical model of network growth by duplication-divergence <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr></abbrgrp> and asked whether inclusion of self-interactions in the model increases the global clustering coefficient.</p>
            <p>As shown in Figure <figr fid="F3">3d</figr>, the presence of self-interacting proteins increases the clustering of the network in this model. In particular, the higher the initial proportion of self-interactors, the higher the clustering of the resulting networks (see Materials and methods and Supplementary material S2 in Additional data file 1 for details of the modeling procedure). This is consistent with the result obtained in a previous theoretical study of network evolution by duplication-divergence <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. The increases in clustering levels in this simplified model are modest, suggesting that additional mechanisms must operate in the evolution of real networks, and that only a subset of protein complexes are derived by the mechanism proposed here.</p>
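            <p>The scenario of Figure 3c can be illustrated with a toy calculation. The sketch below is not the duplication-divergence model used in this work (it has no divergence step and no growth dynamics); it only duplicates one protein with full inheritance of interactions and reports the clustering coefficient of that protein. The function names and the two-protein network are assumptions made for illustration.</p>

```python
from itertools import combinations

def clustering(adj, node):
    """Local clustering coefficient of node, ignoring self-interactions."""
    nb = adj[node] - {node}
    k = len(nb)
    if 2 > k:
        return 0.0
    links = sum(1 for u, v in combinations(nb, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))

def duplicate(adj, node, copy):
    """Return a new network in which copy inherits every interaction of node."""
    new = {n: set(nb) for n, nb in adj.items()}
    new[copy] = set()
    for partner in adj[node]:
        if partner == node:
            # An inherited self-interaction gives copy:copy and node:copy
            new[copy].add(copy)
            new[copy].add(node)
            new[node].add(copy)
        else:
            new[copy].add(partner)
            new[partner].add(copy)
    return new

# Toy networks: protein i binds j; in the second case i also binds itself
without_self = {"i": {"j"}, "j": {"i"}}
with_self = {"i": {"i", "j"}, "j": {"i"}}

for label, net in (("no self-interaction", without_self),
                   ("self-interaction", with_self)):
    grown = duplicate(net, "i", "i2")
    print(label, clustering(net, "i"), "->", clustering(grown, "i"))
```

            <p>In this toy case only the duplication of the self-interacting protein closes a triangle (i, its copy and j), so only that case raises the clustering coefficient of i.</p>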
            <p>Conversely, we asked the opposite question for the four real PINs (Table <tblr tid="T1">1</tblr>): does selective removal of interactions between paralogous proteins reduce the global clustering of the network? We find that it does: clustering levels are reduced by between 7% and 15% (Supplementary material S3 in Additional data file 1). This is significantly more than the reduction obtained by removing other interactions, which has negligible effects on the global clustering of the network. These small but significant reductions are consistent with the modeling results, further supporting the view that this mechanism operates in the evolution of a subset of protein complexes.</p>
            <p>This result is subject to the following caveats. First, in some cases a cluster reflects not a single multi-protein complex but many alternative ones, which may not co-exist in time and in space. This has been described in transcription factor families <abbrgrp><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp>, and is illustrated in Figure <figr fid="F3">3a</figr>. Second, the graph representation we use for PINs is, in itself, limited; in particular, it ignores the stoichiometry of the different subunits within protein complexes. For example, a protein complex composed of six identical subunits forming a ring would be depicted as a single self-interacting node, and not captured as a cluster in the PIN. Thus, although PINs give us a network perspective on protein complexes, together with large numbers of interactions and increased statistical power, we need to consider alternative definitions of protein complexes to substantiate the above result. We therefore next investigated experimentally derived protein complexes.</p>
         </sec>
         <sec>
            <st>
               <p>Paralogous subunits in protein complexes</p>
            </st>
            <p>We tested the corollary that there is an over-representation of interactions between paralogous proteins within protein complexes. We considered two distinct types of protein complex data. The first is composed of three independent data sets of protein complexes in <it>S. cerevisiae </it>(Table <tblr tid="T1">1</tblr>) and is discussed in this section; the second is composed of protein complexes of known three-dimensional structure, and will be considered in the next section.</p>
            <p>First, we found that in all three <it>S. cerevisiae </it>datasets, about one-third of the protein complexes contain duplicated proteins, which is more than expected by chance (Figure <figr fid="F4">4a</figr>). We then wanted to check whether the duplicates physically contact each other within these complexes. However, the three <it>S. cerevisiae </it>protein complex datasets lack information on the physical interactions, or interfaces, formed between the constituent proteins of a complex, as well as the stoichiometry of the complexes, that is, how many copies of each protein are present. Therefore, we computationally overlaid all the protein interactions (Yeast and Yeast-large) onto the protein complexes and asked whether the paralogous subunits of complexes physically interact. Of the complexes for which protein interactions can be superimposed, 27% to 30% have interacting homologous proteins (Table <tblr tid="T1">1</tblr>). The TAP and HMS-PCI datasets are the result of large-scale experiments, and some redundancy may exist, deriving from multiple baits picking up the same complex. For the TAP dataset, the authors provide a smaller set of predicted complexes based on bioinformatics methods. We repeated the calculation on this set of predicted TAP complexes and found that the frequencies at which the complexes contain homodimers (F(HD)) or dimers of paralogous proteins (F(PD)) are 43% and 34%, respectively. In all cases we found that this enrichment is highly significant at <it>p </it>&lt; 10<sup>-4</sup>.</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Duplication of subunits in protein complexes</p>
               </caption>
               <text>
                  <p>Duplication of subunits in protein complexes. <b>(a) </b>Nearly 40% of the protein complexes have homologous subunits (gray bars). These levels are higher than expected by chance (white bars). Random expectation levels are the averages of 10,000 randomized protein complex datasets, where the complex size distribution is kept constant. While the MIPS dataset is the result of manual curation (see Table 1), both TAP and HMS-PCI are the result of large-scale experiments, and some redundancy may exist from multiple baits picking up the same complex. For the TAP dataset, the authors provide a smaller set of predicted complexes based on bioinformatics methods. We repeated the calculation on this set of predicted TAP complexes and found that 47% of the complexes have duplicated subunits, while 18 &#177; 2% would be expected at random. The significance level remains the same for this predicted set of complexes as for the raw purification data. <b>(b) </b>Percentage of complexes of known three-dimensional structure that have duplicated subunits, as a function of complex size. Gray bars are for the complete data set, whereas black bars are from a dataset that excludes purely homomeric complexes, as these dominate the dataset (see Table 1) and may distort the results. On average, between 9% and 30% of the complexes display duplicated subunits (including and excluding purely homomeric complexes, respectively). This is not an artifact introduced by the complex size distribution.</p>
               </text>
               <graphic file="gb-2007-8-4-r51-4"/>
            </fig>
            <p>Thus, analysis of the three sets of <it>S. cerevisiae </it>protein complex datasets supports the corollary that interacting paralogues are over-represented amongst protein complexes.</p>
         </sec>
         <sec>
            <st>
               <p>Paralogous subunits in protein complexes of known 3D structure</p>
            </st>
            <p>Next we concentrated on the set of protein complexes with known three-dimensional structure (Table <tblr tid="T1">1</tblr>) to further test the corollary that there is an over-representation of interfaces between paralogues within each protein complex. This dataset, obtained from the PQS database, is an automatically generated subset of the PDB containing solely oligomers <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. In PQS, the proportion of complexes with paralogues is comparable to the <it>S. cerevisiae </it>complex datasets, at 30% (Figure <figr fid="F4">4b</figr>). The advantage of studying this dataset is that it can provide stoichiometry and interaction maps for complexes, that is, we can test directly whether paralogues interact.</p>
            <p>Consistent with our hypothesis, we found that the frequency of interactions between paralogues within a protein complex is higher than would be expected by chance, while that of homomeric interactions is lower (Figure <figr fid="F5">5a</figr>; see Materials and methods for an explanation of how the expected values are calculated). One example is the mitochondrial F1/Fo ATP synthase complex (Figure <figr fid="F5">5b</figr>), which contains interacting paralogous subunits <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. Although these subunits could potentially establish homomeric contacts, no such contacts exist in the complex, illustrating how homomeric interactions can be under-represented compared to the random scenario. Thus, we have shown that paralogues not only frequently interact within protein complexes, but also appear to interact preferentially compared to homomeric interactions and interactions between evolutionarily unrelated proteins.</p>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Duplicated subunits in complexes interact</p>
               </caption>
               <text>
                  <p>Duplicated subunits in complexes interact. <b>(a) </b>Interactions between paralogous subunits (red) are more frequent than expected given the stoichiometry of subunits within protein complexes. Chains from PQS complexes were binned according to probability of forming a homomeric interaction or interactions between paralogous or different chains (see Materials and methods). The frequencies at which these chains form homodimers and paralogous dimers (averaged for each bin) are shown as blue and red bars, respectively. In a random scenario, all the points lie within the range shown in the black lines. <b>(b) </b>Possible arrangements of two distinct subunits in a hexameric ring like that of the F1 complex. The actual F1 complex is shown on the left. Bars of different colors correspond to different inter-subunit interfaces. <b>(c) </b>If there are multiple identical and paralogous chains within a protein complex, the chains tend to be arranged in three-dimensional space so that the paralogous chains rather than identical chains are contacting each other, corresponding to the scenario shown on the left. Note that when there is a choice, interactions between paralogous proteins are always preferred. This experiment is similar to that described in (a), but considering only the two types of interaction in the calculations. <b>(d) </b>The role of oligomers of paralogues in generating structural diversity. n is the number of protein complexes found in PQS that have identical chains (left) or paralogous chains (right), which contact the same (top) or distinct binding partners (bottom). Hetero-oligomers that contain paralogous dimers are more frequently asymmetrical (10/31) than those containing homomers (6/210), that is, paralogues tend to bind different partners. The complexes shown illustrate the four possible situations. 
Top left is the tryptophan synthase from <it>Salmonella typhimurium</it>, in which the homomeric &#945;:&#945; dimer (blue) binds one &#946; subunit on each side (yellow) [50], which represents symmetry in binding partners of homomers. Top right is the photosynthetic reaction centre from <it>Rhodopseudomonas viridis</it>, in which both paralogous L and M chains (blue and purple) bind to the H and C subunits (shown in yellow) [51], which illustrates symmetry in the binding partners of a dimer of paralogues. Bottom left is the structure of the Rac1 small GTPase bound to the arfaptin-1 homodimer [52] from <it>Homo sapiens</it>, in which Rac1 binds only one of the arfaptin chains, but occupies a volume that excludes the possibility of additional Rac1 molecules binding the other arfaptin chain; this illustrates the rare cases of asymmetry in the binding partners of homomers. Bottom right is the RNA polymerase from <it>S. cerevisiae </it>[53], in which many peripheral subunits decorate the central core formed by the dimer of paralogues A:B, which illustrates the creation of asymmetry by the duplication of the ancestral homodimer [32].</p>
               </text>
               <graphic file="gb-2007-8-4-r51-5"/>
            </fig>
            <p>To further investigate this, we repeated the experiment shown in Figure <figr fid="F5">5a</figr>, but considering only subunits that can establish homo-interactions as well as interactions between paralogues. This is equivalent to determining what choice is made in a situation such as that represented in Figure <figr fid="F5">5b</figr>. We found that, given a choice, interactions between paralogues are preferred in almost all cases, as shown in Figure <figr fid="F5">5c</figr>. The reason for this is likely to be that this type of geometrical arrangement of proteins within complexes requires the smallest number of different interfaces to be formed, and so is the most parsimonious evolutionary scenario. In the F1 sub-complex, the three &#945; and three &#946; subunits alternate within the hexameric ring <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>, so that only two different interfaces are formed (&#945;:&#946; and &#946;:&#945;; Figure <figr fid="F5">5b</figr>, left).</p>
         </sec>
         <sec>
            <st>
               <p>Evolutionary cores of protein complexes and asymmetry</p>
            </st>
            <p>Our hypothesis is that many protein complexes start with homomeric interactions that duplicate and diversify, and serve as a seed for the coalescence of further subunits. The photosystem I shown in Figure <figr fid="F1">1</figr> illustrates this concept. In <it>Heliobacteria</it>, the complex contains a homodimer at its core (PshA<sub>2</sub>), whereas the eukaryotic complex contains a dimer of paralogues (PsaA:PsaB). These two paralogous polypeptide chains are each decorated by different peripheral subunits, suggesting that in this class of photosystem (Type-I RC), the core was established prior to the accretion of further subunits <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp>. Another example is RNA polymerase II, which contains at its core a large dimer of homologous subunits, and is believed to have evolved from an ancestral generic nucleic acid binding homodimer <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr></abbrgrp>.</p>
            <p>To investigate whether this is a frequent mechanism of evolution of complexes, we tested the fifth corollary and asked whether homomeric interactions and interactions between homologous proteins precede interactions between unrelated proteins in evolution. Then we tested whether paralogues within complexes of known three-dimensional structure have asymmetric interactions.</p>
            <p>To answer the first question, we compared PINs in different organisms and asked whether homomeric interactions and interactions between paralogues in one organism are likely to be conserved in the PINs in other organisms, that is, whether the protein(s) have orthologues that interact. Such pairs of interactions have been termed interologs in <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. In Table <tblr tid="T3">3</tblr> we show that self-interactions and dimers of paralogues are three to seven times more likely to be conserved from yeast to fly and worm than interactions between unrelated proteins. However, due to the small number of conserved interactions, these results are not definitive. To gain a larger coverage of the yeast proteins, we tested whether proteins that establish interactions with identical and/or with homologous proteins are older than other proteins. We estimated the likely time of origin of each gene in <it>S. cerevisiae </it>by phylogenetic profiling and analyzed both the Yeast and Yeast-large PINs as described in Materials and methods. Homomeric proteins and those that interact with paralogues are significantly older than other proteins, tending to be present in all Eukaryota and either in Bacteria or Archaea (Yeast) or all Eukaryota (Yeast-large). Other proteins tend to be present only in Fungi and Metazoa, but not in other Eukaryota. The difference in evolutionary conservation is statistically significant in both data sets (<it>p </it>= 0.003 for Yeast and <it>p </it>&lt; 0.001 for Yeast-large, Wilcoxon-Mann-Whitney test). These two results support the corollary that the establishment of homomeric interactions and the duplication to dimers of paralogous chains are early events in the emergence of a protein complex.</p>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Conservation of yeast protein interactions</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>P(Fly)</p>
                     </c>
                     <c ca="center">
                        <p>P(Fly|HPD)</p>
                     </c>
                     <c ca="center">
                        <p>P(Worm)</p>
                     </c>
                     <c ca="center">
                        <p>P(Worm|HPD)</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Yeast</p>
                     </c>
                     <c ca="center">
                        <p>0.020 (10/409)</p>
                     </c>
                     <c ca="center">
                        <p>0.051 (4/79)</p>
                     </c>
                     <c ca="center">
                        <p>0.004 (2/457)</p>
                     </c>
                     <c ca="center">
                        <p>0.027 (2/75)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Yeast-large</p>
                     </c>
                     <c ca="center">
                        <p>0.009 (56/6113)</p>
                     </c>
                     <c ca="center">
                        <p>0.061 (34/559)</p>
                     </c>
                     <c ca="center">
                        <p>0.001 (3/5823)</p>
                     </c>
                     <c ca="center">
                        <p>0.005 (3/547)</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>We consider a protein interaction in yeast to be conserved in another organism if both interacting proteins in yeast have orthologues in the other organism, and the orthologous proteins interact. P(Fly) and P(Worm), probability that a protein interaction in an <it>S. cerevisiae </it>PIN is conserved in fly and worm, respectively. P(Fly|HPD) and P(Worm|HPD), probability that a protein interaction in an <it>S. cerevisiae </it>PIN is conserved in fly and worm, respectively, given that it is a homomeric interaction or an interaction between paralogous proteins.</p>
               </tblfn>
            </tbl>
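            <p>The probabilities in Table 3 are simple proportions of conserved interactions, and the enrichment is the ratio of the conditional to the overall proportion. A minimal sketch, using the Yeast-large fly counts from the table as the example; the function names are hypothetical. Note that the ratio here is taken relative to all interactions, as tabulated, and not relative to interactions between unrelated proteins only.</p>

```python
def proportion(conserved, total):
    """Fraction of interactions that are conserved in the other organism."""
    return conserved / total

def fold_enrichment(conditional, overall):
    """Ratio of the conditional conservation rate to the overall rate.

    Both arguments are (conserved, total) pairs of counts.
    """
    return proportion(*conditional) / proportion(*overall)

# (conserved, total) counts copied from the Yeast-large row of Table 3
fly_all = (56, 6113)    # P(Fly)
fly_hpd = (34, 559)     # P(Fly|HPD)

print(f"P(Fly)      = {proportion(*fly_all):.3f}")
print(f"P(Fly|HPD)  = {proportion(*fly_hpd):.3f}")
print(f"enrichment  = {fold_enrichment(fly_hpd, fly_all):.1f}-fold")
```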
            <p>To test whether paralogous proteins break the symmetry of a complex and allow accretion of different types of subunits, we considered the protein complexes of known three-dimensional structure again. We compared the set of complexes that contain paralogues to the complexes that contain homomeric interactions and no paralogues. As shown in Figure <figr fid="F5">5d</figr>, we found that 32% of paralogues have asymmetrical interactions, while only 3% of the homomers do, a significant difference (<it>p </it>&lt; 0.001). Thus, the hypothesis that duplication of homomers results in new asymmetrical complexes is supported by the data. This may represent part of the selective advantage for conservation of such duplications.</p>
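            <p>The text does not name the test behind this comparison. As one assumed possibility, a one-sided Fisher's exact test can be applied to the counts given in the legend of Figure 5d (10 of 31 complexes with paralogous dimers and 6 of 210 with homomers are asymmetrical). The sketch below is an illustration under that assumption, not a reconstruction of the original analysis, and the function name is hypothetical.</p>

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """P(X >= a) under the hypergeometric null for the table [[a, b], [c, d]]."""
    row1 = a + b                 # complexes in the first group
    col1 = a + c                 # asymmetrical complexes in total
    total = a + b + c + d
    upper = min(row1, col1)
    favourable = sum(comb(col1, k) * comb(total - col1, row1 - k)
                     for k in range(a, upper + 1))
    return favourable / comb(total, row1)

# Counts from the legend of Figure 5d: asymmetrical / symmetrical
paralogues = (10, 31 - 10)
homomers = (6, 210 - 6)

p_value = fisher_one_sided(*paralogues, *homomers)
print(f"one-sided p = {p_value:.2e}")
```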
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>We present here a genome-wide, cross-species analysis of the origins and evolution of protein complexes. At the beginning, we hypothesized that duplication of self-interacting proteins (homomers) is an evolutionary path leading to the establishment and evolution of many complexes. To substantiate this hypothesis, we tested five corollaries that arise from such an evolutionary scenario.</p>
         <p>The first corollary is that duplication of genes coding for homodimers is frequently accompanied by conservation of protein interactions. Conservation of protein interactions after gene duplication has been shown to be frequent <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B19">19</abbr></abbrgrp>. We show here that between 4% and 13% of interactions in PINs are between paralogous proteins.</p>
         <p>Next we tested the association between clustering of the network and interactions between paralogous proteins. Clusters in protein interaction networks frequently represent protein complexes. We have shown that removal of interactions between paralogues causes a small but highly significant decrease in the global clustering level of the network. This is consistent with our theoretical modeling results.</p>
         <p>We then observed that about 30% of protein complexes from proteomics experiments contain duplicated subunits. In protein complexes of known three-dimensional structure, a similar proportion of complexes have duplicated subunits, and more importantly, there is preferential binding of paralogous subunits. This supports the corollary that interactions between paralogues are frequent in complexes.</p>
         <p>We observed that proteins involved in homomeric interactions and interactions between paralogues were more conserved than other proteins: more than half of them had orthologues in all eukaryotes and either archaea or bacteria, whereas more than half of the other yeast proteins had orthologues only in fungi and animals. Homomeric interactions and those between paralogous proteins were also three to seven times more likely to be conserved than other interactions. This supports the corollary that homomers and oligomers of paralogues represent the first steps in the evolution of new protein complexes, with other subunits added later.</p>
         <p>Finally, we showed that amongst three-dimensional structures of complexes, 32% of dimers of paralogues establish asymmetric interactions with other proteins whereas only 3% of homodimers show such asymmetry, further substantiating that the duplication of homomeric interactions helps to create asymmetry in protein interactions, and allows the coalescence of other subunits in the complex.</p>
         <p>Altogether, our data suggest an evolutionary route to the formation and specialization of many extant protein complexes. On this route, homomers and oligomers of paralogous subunits represent an ancestral core around which further subunits can coalesce in evolution. Sequence divergence of the paralogous subunits creates the asymmetry that permits the accretion and diversification of interactions. In addition, divergence of paralogues may be involved in functional specialization of complexes. The biases inherent in each data type make it difficult to determine the exact fraction of protein complexes that evolved via the proposed route. An upper bound is about one-third, estimated by the fraction of proteomics complexes that display duplicated subunits. A lower bound is less than one-tenth, estimated by the fraction of dimers of paralogues in one of the yeast two-hybrid data sets (Table <tblr tid="T1">1</tblr>).</p>
         <p>Another issue that at this stage is difficult to ascertain is the nature of the complexes that emerged by the proposed route. If we assume that both the proteomics data and the crystallographic data represent an enrichment for stable protein complexes, then our proposed evolutionary route appears to be more prevalent in stable complexes. In fact, most examples discussed in the text are stable complexes. They also appear to be complexes that were established very early in evolution, which is illustrated by the ages of the proteins that establish homomeric interactions and interactions between duplicates.</p>
         <p>We have shown previously that duplication of protein interactions and of entire protein complexes is accompanied by specialization of function <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. Inspection of the effects of duplication of homo-interactions suggests a similar outcome. In other words, the main function is established when the homomer is first formed, and subsequent duplications serve to specialize this function. For example, in Figure <figr fid="F3">3a</figr> the transition from homodimer to dimers of paralogous proteins of the helix-loop-helix transcription factors results in specialization of the function of the complex, that is, distinct but overlapping specificities in DNA binding <abbrgrp><abbr bid="B34">34</abbr><abbr bid="B35">35</abbr></abbrgrp>. Other examples of functional specialization are in the ATP synthase and proteasome families, as discussed in Additional data file 1.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>Our investigations of protein interactions and protein complexes, as well as theoretical modeling, reveal that many protein complexes evolved by the initial establishment of self-interactions followed by duplication of these self-interacting proteins. Our study provides the first insight into the evolution of functional modularity in protein-protein interaction networks, and the origins of a large class of protein complexes.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>Datasets of protein interactions and protein complexes</p>
            </st>
            <p>Binary physical protein-protein interactions for <it>S. cerevisiae </it><abbrgrp><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr></abbrgrp>, <it>Drosophila melanogaster </it>(high confidence interactions) <abbrgrp><abbr bid="B38">38</abbr></abbrgrp> and <it>Caenorhabditis elegans </it><abbrgrp><abbr bid="B39">39</abbr></abbrgrp>, as well as protein complex datasets for <it>S. cerevisiae </it><abbrgrp><abbr bid="B36">36</abbr><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr></abbrgrp> and complexes of known three-dimensional structure used in this study <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> are summarized in Table <tblr tid="T1">1</tblr>.</p>
            <p>A non-redundant set of protein complexes of known structure, based on the PQS database as of June 2005, was prepared by considering complexes as graphs where nodes are the protein subunits (labeled by the domain architecture and chain identity) and edges are contacts between these subunits: two complexes were considered identical when they had the same subunits (same domain architectures, that is, identical or homologous chains) and the same contact topology between subunits. Details of this procedure can be found in <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Detection of gene duplication and contacts between chains</p>
            </st>
            <p>We used domain architectures as defined in the SUPERFAMILY database <abbrgrp><abbr bid="B43">43</abbr><abbr bid="B44">44</abbr></abbrgrp> to identify paralogous proteins in PINs, that is, proteins resulting from duplication of the corresponding genes. The SUPERFAMILY database provides protein domain assignments, at the SCOP 'superfamily' level <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>, for the predicted protein sequences in completed genomes. Domain assignments were generated using a curated set of profile hidden Markov models. In this work, two proteins are considered paralogous if they display the same amino- to carboxy-terminal domain architecture, ignoring gaps and tandem domain repetitions, as described in <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. Domain assignments were based on SUPERFAMILY release 1.63 <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>.</p>
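            <p>The paralogy criterion can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the procedure of the cited work: the function names, the representation of an architecture as an ordered list of superfamily identifiers, the '_gap_' placeholder for unassigned regions, and the choice to drop gaps before collapsing tandem repeats are all hypothetical.</p>

```python
from itertools import groupby


def normalise_architecture(domains, gap="_gap_"):
    """Drop gap placeholders, then collapse tandem repeats of a domain.

    `domains` is an N- to C-terminal list of superfamily identifiers.
    The gap marker and the order of the two steps are assumptions.
    """
    no_gaps = [d for d in domains if d != gap]
    # groupby merges runs of equal neighbours, so A-A-B becomes A-B.
    return tuple(key for key, _ in groupby(no_gaps))


def are_paralogous(domains_a, domains_b):
    """True when both proteins share the same normalised architecture."""
    arch_a = normalise_architecture(domains_a)
    arch_b = normalise_architecture(domains_b)
    # Proteins without any assigned domain are never called paralogous.
    return len(arch_a) > 0 and arch_a == arch_b
```

            <p>Under these assumptions, a protein with architecture a.4.5, c.37.1 and one with a.4.5, gap, c.37.1, c.37.1 would be grouped as paralogs.</p>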
            <p>In the analysis of protein complexes from PQS we considered two chains to be identical when strict sequence identity was found, accepting gaps at the amino and carboxyl termini of the sequences. Two chains were considered homologous when they displayed the same amino- to carboxy-terminal SCOP superfamily domain architecture, and different when they satisfied neither of the above criteria. Two chains were considered to be in contact when at least five amino acids had atoms within their van der Waals radii plus 0.5 &#197;. The expected frequency for a given chain to form a homo- or a paralogous contact (P<sub>h </sub>and P<sub>p</sub>, respectively) within a complex was calculated by counting the number of times the given chain made one or more homo- or paralogous contacts (N<sub>h </sub>and N<sub>p</sub>, respectively) in a set resulting from 500 randomizations of that protein complex. Each randomization kept the topology of the complex fixed and shuffled the position of each chain within the complex. The expected frequencies were estimated by P<sub>h </sub>= N<sub>h</sub>/500 and P<sub>p </sub>= N<sub>p</sub>/500.</p>
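            <p>The randomization of a single complex can be sketched as follows. This is a minimal Python illustration, assuming that a complex is given as a list of contacts between numbered positions and one class label per chain; the function name and data layout are hypothetical. Chains that share a label stand for identical chains when estimating the homo-contact frequency; labelling chains by domain architecture instead would give the corresponding estimate for paralogous contacts.</p>

```python
import random


def expected_contact_frequency(edges, labels, chain, n_rand=500, seed=0):
    """Fraction of shuffles in which `chain` touches a same-label chain.

    edges  : contacts between positions 0..n-1 (the fixed topology)
    labels : labels[i] is the class of chain i
    chain  : index of the chain being followed
    """
    rng = random.Random(seed)
    n = len(labels)
    neighbours = {pos: set() for pos in range(n)}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    hits = 0
    for _ in range(n_rand):
        occupant = list(range(n))  # occupant[pos] = chain placed at pos
        rng.shuffle(occupant)
        pos = occupant.index(chain)
        # One or more contacts with a chain of the same class count once.
        if any(labels[occupant[q]] == labels[chain] for q in neighbours[pos]):
            hits += 1
    return hits / n_rand
```

            <p>For a four-membered ring with two chains of each of two classes, this estimate should approach 2/3, because two of the three remaining positions are adjacent to any given chain.</p>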
         </sec>
         <sec>
            <st>
               <p>Network randomization</p>
            </st>
            <p>To investigate the effect of correlations in the network in terms of evolutionary relationships or topological organization, the following randomization schemes were applied.</p>
            <sec>
               <st>
                  <p>Randomization by domain architecture shuffling</p>
               </st>
               <p>To test for statistical significance of the measured parameters, we performed 10,000 network randomizations in which the topology of the network was kept constant and the evolutionary relationships between proteins, that is, their SUPERFAMILY domain assignments, were shuffled.</p>
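               <p>A minimal Python sketch of this kind of test is given here. The statistic used, the number of links joining two distinct proteins with the same domain architecture, is chosen only for illustration, and the function names and the one-sided empirical p-value are assumptions rather than the exact procedure of this study.</p>

```python
import random


def count_same_architecture_links(edges, arch):
    """Number of links between distinct proteins sharing an architecture."""
    return sum(
        1
        for a, b in edges
        if a != b and arch.get(a) is not None and arch.get(a) == arch.get(b)
    )


def architecture_shuffle_pvalue(edges, arch, n_rand=10000, seed=0):
    """Keep the links fixed, shuffle architectures among the proteins.

    Returns the observed count and the fraction of randomized networks
    whose count is at least as large (an empirical one-sided p-value).
    """
    rng = random.Random(seed)
    proteins = sorted(arch)
    values = [arch[p] for p in proteins]
    observed = count_same_architecture_links(edges, arch)
    at_least = 0
    for _ in range(n_rand):
        rng.shuffle(values)
        shuffled = dict(zip(proteins, values))
        if count_same_architecture_links(edges, shuffled) >= observed:
            at_least += 1
    return observed, at_least / n_rand
```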
            </sec>
            <sec>
               <st>
                  <p>Randomization by link shuffling</p>
               </st>
               <p>To measure the influence of local organization of network structure, link shuffling was used <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>. Repeated swapping of interaction partners among pairs of interacting proteins preserves the degree of each individual node in the network but destroys higher order topological correlations and structures such as clustering.</p>
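               <p>The partner-swapping step can be sketched in Python as follows. This is an illustrative implementation of degree-preserving link shuffling, assuming a simple undirected network with at least two links, no duplicate links and no self-links (self-interactions would have to be handled separately); the function name and the cap on attempts are hypothetical.</p>

```python
import random


def shuffle_links(edges, n_swaps, seed=0):
    """Degree-preserving randomization by repeated partner swaps."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done, attempts = 0, 0
    max_attempts = 100 * n_swaps  # guard against networks that cannot swap
    while done != n_swaps and attempts != max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() > 0.5:
            c, d = d, c  # pick the orientation of the second link at random
        # Proposed links a-d and c-b; reject self-links and duplicates.
        if a == d or c == b:
            continue
        new_1, new_2 = frozenset((a, d)), frozenset((c, b))
        if new_1 in present or new_2 in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new_1, new_2}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges
```

               <p>Each accepted swap replaces links a-b and c-d by a-d and c-b, so every protein keeps its number of partners while clustering and other higher order structure are progressively erased.</p>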
            </sec>
         </sec>
         <sec>
            <st>
               <p>Modeling of the growth of the network by gene duplication</p>
            </st>
            <p>We implemented a theoretical model of network evolution based on the concepts proposed in <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr></abbrgrp>. In this model we started with x = 340 proteins, representing the 241 protein families and the 29% of unassigned proteins in the yeast dataset. We randomly introduced an interaction between any pair of proteins with probability 2/339 &#8776; 0.0059, leading to a classic random graph with a Poissonian degree distribution and an average degree of two. The network was then allowed to grow until it reached the same size as the yeast network (neglecting isolated nodes generated during the simulation). The parameters &#948; (the probability of deleting a link upon duplication) and &#945; (random re-linking of a new node to older nodes in the network) were chosen with the aim of obtaining realistic network features (that is, degree distribution) in the final network, namely, &#948; = 0.9 and &#945; = 0 or &#945; = 0.1. For more details, see Supplementary information S2 in Additional data file 1.</p>
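            <p>A rough Python sketch of a duplication-divergence simulation of this type follows. It is not the implementation used in this study: the function name, the target size argument, the use of alpha divided by the current number of nodes as the per-node re-linking probability, and the immediate discarding of duplicates left without links are all assumptions, and self-interactions are not modelled.</p>

```python
import random


def grow_by_duplication(n_target, n_start=340, p_link=2 / 339,
                        delta=0.9, alpha=0.1, max_steps=1000000, seed=0):
    """Grow a random graph by node duplication until `n_target`
    non-isolated nodes exist; returns an adjacency dictionary."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n_start)}
    # Seed network: classic random graph with link probability p_link.
    for i in range(n_start):
        for j in range(i + 1, n_start):
            if p_link > rng.random():
                adj[i].add(j)
                adj[j].add(i)
    next_id = n_start
    for _ in range(max_steps):
        if sum(1 for nbrs in adj.values() if nbrs) >= n_target:
            break
        parent = rng.choice(list(adj))
        # Each inherited link is deleted with probability delta.
        kept = {v for v in sorted(adj[parent]) if rng.random() >= delta}
        # Assumed re-linking rule: each older node with probability alpha / N.
        n_nodes = len(adj)
        extra = {v for v in adj if alpha / n_nodes > rng.random()}
        links = kept | extra
        if not links:
            continue  # isolated duplicates are neglected
        adj[next_id] = set(links)
        for v in links:
            adj[v].add(next_id)
        next_id += 1
    return adj
```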
         </sec>
         <sec>
            <st>
               <p>Phylogenetic profiling</p>
            </st>
            <p>We used Smith-Waterman alignments to identify orthologs of yeast genes in the genomes of 40 organisms, representing the three branches of the tree of life and the major taxonomic groups within each of the branches. We used the Smith-Waterman implementation on TimeLogic's DeCypher<sup>&#174; </sup>accelerated hardware. The significance of each hit is based on a PSCORE statistic, where the <it>p </it>value is a real number between 0 and 1 describing the probability of a hit being random. The significance is based on the histogram fitting method and we used a cutoff of <it>p </it>&lt; 0.01. A complete list of the organisms studied is shown in Additional data file 1 (Supplementary material S5). We considered two proteins to be orthologous if they were bidirectional best hits. The 'age' groups that can be defined based on the available genomes are, starting from the most recent, '<it>S. cerevisiae </it>specific'; 'Saccharomyceta'; 'Fungi'; 'Fungi/Metazoa'; 'Fungi/Metazoa/Amoebozoa'; 'Eukaryota'; 'Eukaryota+Archaea' or 'Eukaryota+Bacteria'; 'universal'. The eukaryotic tree used as reference is that in <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>.</p>
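            <p>The bidirectional best hit criterion can be illustrated with a small Python sketch. The input format, a list of (query, subject, p value) tuples for each search direction, and the function name are assumptions made for the example; ties are resolved by keeping the first hit encountered.</p>

```python
def bidirectional_best_hits(hits_ab, hits_ba, p_cutoff=0.01):
    """Pairs (a, b) where b is the best hit of a and a the best hit of b.

    hits_ab : (query_in_A, subject_in_B, p_value) tuples
    hits_ba : (query_in_B, subject_in_A, p_value) tuples
    Hits with p_value at or above `p_cutoff` are ignored.
    """
    def best(hits):
        best_hit = {}
        for query, subject, p in hits:
            if p >= p_cutoff:
                continue
            if query not in best_hit or best_hit[query][1] > p:
                best_hit[query] = (subject, p)
        return {q: s for q, (s, _) in best_hit.items()}

    best_ab = best(hits_ab)
    best_ba = best(hits_ba)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)
```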
         </sec>
      </sec>
      <sec>
         <st>
            <p>Additional data files</p>
         </st>
         <p>The following additional data are available with the online version of this paper. Additional data file <supplr sid="S1">1</supplr> contains additional figures as well as raw data for the plots in Figures <figr fid="F4">4</figr> and <figr fid="F5">5</figr>. The data used and results from this study can be accessed from the companion website <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>.</p>
         <suppl id="S1">
            <title>
               <p>Additional data file 1</p>
            </title>
            <caption>
               <p>Additional figures and raw data for the plots in Figures <figr fid="F4">4</figr> and <figr fid="F5">5</figr></p>
            </caption>
            <text>
               <p>Additional figures and raw data for the plots in Figures <figr fid="F4">4</figr> and <figr fid="F5">5</figr></p>
            </text>
            <file name="gb-2007-8-4-r51-S1.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We are grateful for the hospitality and scientific discussions on networks that CK experienced with the members of the physics department at Imperial College, London, and the University of Oslo. We wish to thank Joel Janin, Daniela Stock, Kiyoshi Nagai, Tony Crowther, Cyrus Chothia, Benjamin Audit and the members of the Theoretical and Computational Biology group at the MRC-LMB for useful discussions. We are grateful to Nick Luscombe, Madan Babu, Christine Vogel, Valerie Hindie, Siarhei Maslau and Patrick Aloy for critical reading of the manuscript. We thank the MRC, EMBO, and the postdoctoral program of the German Academic Exchange Service (DAAD) for funding.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Computational discovery of gene modules and regulatory networks.</p>
            </title>
            <aug>
               <au>
                  <snm>Bar-Joseph</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Gerber</snm>
                  <fnm>GK</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>TI</fnm>
               </au>
               <au>
                  <snm>Rinaldi</snm>
                  <fnm>NJ</fnm>
               </au>
               <au>
                  <snm>Yoo</snm>
                  <fnm>JY</fnm>
               </au>
               <au>
                  <snm>Robert</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Gordon</snm>
                  <fnm>DB</fnm>
               </au>
               <au>
                  <snm>Fraenkel</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Jaakkola</snm>
                  <fnm>TS</fnm>
               </au>
               <au>
                  <snm>Young</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Gifford</snm>
                  <fnm>DK</fnm>
               </au>
            </aug>
            <source>Nat Biotechnol</source>
            <pubdate>2003</pubdate>
            <volume>21</volume>
            <fpage>1337</fpage>
            <lpage>1342</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nbt890</pubid>
                  <pubid idtype="pmpid" link="fulltext">14555958</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Revealing modular organization in the yeast transcriptional network.</p>
            </title>
            <aug>
               <au>
                  <snm>Ihmels</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Friedlander</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Bergmann</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sarig</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Ziv</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Barkai</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2002</pubdate>
            <volume>31</volume>
            <fpage>370</fpage>
            <lpage>377</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12134151</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Hierarchical organization of modularity in metabolic networks.</p>
            </title>
            <aug>
               <au>
                  <snm>Ravasz</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Somera</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Mongru</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Oltvai</snm>
                  <fnm>ZN</fnm>
               </au>
               <au>
                  <snm>Barabasi</snm>
                  <fnm>AL</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2002</pubdate>
            <volume>297</volume>
            <fpage>1551</fpage>
            <lpage>1555</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1073374</pubid>
                  <pubid idtype="pmpid" link="fulltext">12202830</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Detection of functional modules from protein interaction networks.</p>
            </title>
            <aug>
               <au>
                  <snm>Pereira-Leal</snm>
                  <fnm>JB</fnm>
               </au>
               <au>
                  <snm>Enright</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Ouzounis</snm>
                  <fnm>CA</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>2004</pubdate>
            <volume>54</volume>
            <fpage>49</fpage>
            <lpage>57</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/prot.10505</pubid>
                  <pubid idtype="pmpid" link="fulltext">14705023</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Modular organization of cellular networks.</p>
            </title>
            <aug>
               <au>
                  <snm>Rives</snm>
                  <fnm>AW</fnm>
               </au>
               <au>
                  <snm>Galitski</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2003</pubdate>
            <volume>100</volume>
            <fpage>1128</fpage>
            <lpage>1133</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">298738</pubid>
                  <pubid idtype="pmpid" link="fulltext">12538875</pubid>
                  <pubid idtype="doi">10.1073/pnas.0237338100</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>From molecular to modular cell biology.</p>
            </title>
            <aug>
               <au>
                  <snm>Hartwell</snm>
                  <fnm>LH</fnm>
               </au>
               <au>
                  <snm>Hopfield</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Leibler</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Murray</snm>
                  <fnm>AW</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1999</pubdate>
            <volume>402</volume>
            <issue>Suppl</issue>
            <fpage>C47</fpage>
            <lpage>52</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1038/35011540</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Reconstitution of <it>Saccharomyces cerevisiae </it>prereplicative complex assembly <it>in vitro</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Kawasaki</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>HD</fnm>
               </au>
               <au>
                  <snm>Kojima</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Seki</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Sugino</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genes Cells</source>
            <pubdate>2006</pubdate>
            <volume>11</volume>
            <fpage>745</fpage>
            <lpage>756</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1365-2443.2006.00975.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">16824194</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Structural assignments to the <it>Mycoplasma genitalium </it>proteins show extensive gene duplications and domain rearrangements.</p>
            </title>
            <aug>
               <au>
                  <snm>Teichmann</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Park</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Chothia</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1998</pubdate>
            <volume>95</volume>
            <fpage>14658</fpage>
            <lpage>14663</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">24505</pubid>
                  <pubid idtype="pmpid" link="fulltext">9843945</pubid>
                  <pubid idtype="doi">10.1073/pnas.95.25.14658</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Novel specificities emerge by stepwise duplication of functional modules.</p>
            </title>
            <aug>
               <au>
                  <snm>Pereira-Leal</snm>
                  <fnm>JB</fnm>
               </au>
               <au>
                  <snm>Teichmann</snm>
                  <fnm>SA</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <fpage>552</fpage>
            <lpage>559</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1074369</pubid>
                  <pubid idtype="pmpid" link="fulltext">15805495</pubid>
                  <pubid idtype="doi">10.1101/gr.3102105</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Modeling of protein interaction networks.</p>
            </title>
            <aug>
               <au>
                  <snm>Vazquez</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Flammini</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Maritan</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Vespignani</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Complexus</source>
            <pubdate>2002</pubdate>
            <volume>1</volume>
            <fpage>38</fpage>
            <lpage>44</lpage>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Evolving protein interaction networks through gene duplication.</p>
            </title>
            <aug>
               <au>
                  <snm>Pastor-Satorras</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Sole</snm>
                  <fnm>RV</fnm>
               </au>
            </aug>
            <source>J Theor Biol</source>
            <pubdate>2003</pubdate>
            <volume>222</volume>
            <fpage>199</fpage>
            <lpage>210</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0022-5193(03)00028-6</pubid>
                  <pubid idtype="pmpid" link="fulltext">12727455</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>An automated method for finding molecular complexes in large protein interaction networks.</p>
            </title>
            <aug>
               <au>
                  <snm>Bader</snm>
                  <fnm>GD</fnm>
               </au>
               <au>
                  <snm>Hogue</snm>
                  <fnm>CW</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <fpage>2</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">149346</pubid>
                  <pubid idtype="pmpid" link="fulltext">12525261</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-4-2</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Predicting interactions in protein networks by completing defective cliques.</p>
            </title>
            <aug>
               <au>
                  <snm>Yu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Paccanaro</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Trifonov</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Gerstein</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <fpage>823</fpage>
            <lpage>829</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btl014</pubid>
                  <pubid idtype="pmpid" link="fulltext">16455753</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Collective dynamics of 'small-world' networks.</p>
            </title>
            <aug>
               <au>
                  <snm>Watts</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Strogatz</snm>
                  <fnm>SH</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1998</pubdate>
            <volume>393</volume>
            <fpage>440</fpage>
            <lpage>442</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/30918</pubid>
                  <pubid idtype="pmpid" link="fulltext">9623998</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Structural symmetry and protein function.</p>
            </title>
            <aug>
               <au>
                  <snm>Goodsell</snm>
                  <fnm>DS</fnm>
               </au>
               <au>
                  <snm>Olson</snm>
                  <fnm>AJ</fnm>
               </au>
            </aug>
            <source>Annu Rev Biophys Biomol Struct</source>
            <pubdate>2000</pubdate>
            <volume>29</volume>
            <fpage>105</fpage>
            <lpage>153</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.biophys.29.1.105</pubid>
                  <pubid idtype="pmpid" link="fulltext">10940245</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>The power of two: protein dimerization in biology.</p>
            </title>
            <aug>
               <au>
                  <snm>Marianayagam</snm>
                  <fnm>NJ</fnm>
               </au>
               <au>
                  <snm>Sunde</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Matthews</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Trends Biochem Sci</source>
            <pubdate>2004</pubdate>
            <volume>29</volume>
            <fpage>618</fpage>
            <lpage>625</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tibs.2004.09.006</pubid>
                  <pubid idtype="pmpid" link="fulltext">15501681</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Domain swapping: entangling alliances between proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Bennett</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Choe</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Eisenberg</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1994</pubdate>
            <volume>91</volume>
            <fpage>3127</fpage>
            <lpage>3131</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">43528</pubid>
                  <pubid idtype="pmpid" link="fulltext">8159715</pubid>
                  <pubid idtype="doi">10.1073/pnas.91.8.3127</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Structural similarity enhances interaction propensity of proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Lukatsky</snm>
                  <fnm>DB</fnm>
               </au>
               <au>
                  <snm>Shakhnovich</snm>
                  <fnm>BE</fnm>
               </au>
               <au>
                  <snm>Mintseris</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Shakhnovich</snm>
                  <fnm>EI</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2007</pubdate>
            <volume>365</volume>
            <fpage>1596</fpage>
            <lpage>1606</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.jmb.2006.11.020</pubid>
                  <pubid idtype="pmpid" link="fulltext">17141268</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Binding properties and evolution of homodimers in protein-protein interaction networks.</p>
            </title>
            <aug>
               <au>
                  <snm>Ispolatov</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Yuryev</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Mazo</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Maslov</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <fpage>3629</fpage>
            <lpage>3635</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1160523</pubid>
                  <pubid idtype="pmpid" link="fulltext">15983135</pubid>
                  <pubid idtype="doi">10.1093/nar/gki678</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Evolution of protein fold in the presence of functional constraints.</p>
            </title>
            <aug>
               <au>
                  <snm>Andreeva</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Murzin</snm>
                  <fnm>AG</fnm>
               </au>
            </aug>
            <source>Curr Opin Struct Biol</source>
            <pubdate>2006</pubdate>
            <volume>16</volume>
            <fpage>399</fpage>
            <lpage>408</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.sbi.2006.04.003</pubid>
                  <pubid idtype="pmpid" link="fulltext">16650981</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>A common ancestor for oxygenic and anoxygenic photosynthetic systems: a comparison based on the structural model of photosystem I.</p>
            </title>
            <aug>
               <au>
                  <snm>Schubert</snm>
                  <fnm>WD</fnm>
               </au>
               <au>
                  <snm>Klukas</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Saenger</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Witt</snm>
                  <fnm>HT</fnm>
               </au>
               <au>
                  <snm>Fromme</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Krauss</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1998</pubdate>
            <volume>280</volume>
            <fpage>297</fpage>
            <lpage>314</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1998.1824</pubid>
                  <pubid idtype="pmpid" link="fulltext">9654453</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Evolution of photosystem I - from symmetry through pseudo-symmetry to asymmetry.</p>
            </title>
            <aug>
               <au>
                  <snm>Ben-Shem</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Frolow</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>FEBS Lett</source>
            <pubdate>2004</pubdate>
            <volume>564</volume>
            <fpage>274</fpage>
            <lpage>280</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0014-5793(04)00360-6</pubid>
                  <pubid idtype="pmpid" link="fulltext">15111109</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>The RecA hexamer is a structural homologue of ring helicases.</p>
            </title>
            <aug>
               <au>
                  <snm>Yu</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Egelman</snm>
                  <fnm>EH</fnm>
               </au>
            </aug>
            <source>Nat Struct Biol</source>
            <pubdate>1997</pubdate>
            <volume>4</volume>
            <fpage>101</fpage>
            <lpage>104</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nsb0297-101</pubid>
                  <pubid idtype="pmpid">9033586</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>The yeast protein interaction network evolves rapidly and contains few redundant duplicate genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Wagner</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2001</pubdate>
            <volume>18</volume>
            <fpage>1283</fpage>
            <lpage>1292</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11420367</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Some asymptotic properties of duplication graphs.</p>
            </title>
            <aug>
               <au>
                  <snm>Raval</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Phys Rev E Stat Nonlin Soft Matter Phys</source>
            <pubdate>2003</pubdate>
            <volume>68</volume>
            <fpage>066119</fpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">14754281</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Infinite-order percolation and giant fluctuations in a protein interaction network.</p>
            </title>
            <aug>
               <au>
                  <snm>Kim</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Krapivsky</snm>
                  <fnm>PL</fnm>
               </au>
               <au>
                  <snm>Kahng</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Redner</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Phys Rev E Stat Nonlin Soft Matter Phys</source>
            <pubdate>2002</pubdate>
            <volume>66</volume>
            <fpage>055101</fpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12513542</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Shaping the nuclear action of NF-kappaB.</p>
            </title>
            <aug>
               <au>
                  <snm>Chen</snm>
                  <fnm>LF</fnm>
               </au>
               <au>
                  <snm>Greene</snm>
                  <fnm>WC</fnm>
               </au>
            </aug>
            <source>Nat Rev Mol Cell Biol</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>392</fpage>
            <lpage>401</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrm1368</pubid>
                  <pubid idtype="pmpid" link="fulltext">15122352</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Convergent evolution of gene networks by single-gene duplications in higher eukaryotes.</p>
            </title>
            <aug>
               <au>
                  <snm>Amoutzias</snm>
                  <fnm>GD</fnm>
               </au>
               <au>
                  <snm>Robertson</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Oliver</snm>
                  <fnm>SG</fnm>
               </au>
               <au>
                  <snm>Bornberg-Bauer</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>EMBO Rep</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>274</fpage>
            <lpage>279</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1299007</pubid>
                  <pubid idtype="pmpid" link="fulltext">14968135</pubid>
                  <pubid idtype="doi">10.1038/sj.embor.7400096</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>PQS: a protein quaternary structure file server.</p>
            </title>
            <aug>
               <au>
                  <snm>Henrick</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Trends Biochem Sci</source>
            <pubdate>1998</pubdate>
            <volume>23</volume>
            <fpage>358</fpage>
            <lpage>361</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0968-0004(98)01253-5</pubid>
                  <pubid idtype="pmpid" link="fulltext">9787643</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Molecular architecture of the rotary motor in ATP synthase.</p>
            </title>
            <aug>
               <au>
                  <snm>Stock</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Leslie</snm>
                  <fnm>AG</fnm>
               </au>
               <au>
                  <snm>Walker</snm>
                  <fnm>JE</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1999</pubdate>
            <volume>286</volume>
            <fpage>1700</fpage>
            <lpage>1705</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.286.5445.1700</pubid>
                  <pubid idtype="pmpid" link="fulltext">10576729</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>The RNA polymerase II machinery: structure illuminates function.</p>
            </title>
            <aug>
               <au>
                  <snm>Woychik</snm>
                  <fnm>NA</fnm>
               </au>
               <au>
                  <snm>Hampsey</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2002</pubdate>
            <volume>108</volume>
            <fpage>453</fpage>
            <lpage>463</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0092-8674(02)00646-3</pubid>
                  <pubid idtype="pmpid" link="fulltext">11909517</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Evolutionary connection between the catalytic subunits of DNA-dependent RNA polymerases and eukaryotic RNA-dependent RNA polymerases and the origin of RNA polymerases.</p>
            </title>
            <aug>
               <au>
                  <snm>Iyer</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
               <au>
                  <snm>Aravind</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>BMC Struct Biol</source>
            <pubdate>2003</pubdate>
            <volume>3</volume>
            <fpage>1</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">151600</pubid>
                  <pubid idtype="pmpid" link="fulltext">12553882</pubid>
                  <pubid idtype="doi">10.1186/1472-6807-3-1</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Annotation transfer between genomes: protein-protein interologs and protein-DNA regulogs.</p>
            </title>
            <aug>
               <au>
                  <snm>Yu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Luscombe</snm>
                  <fnm>NM</fnm>
               </au>
               <au>
                  <snm>Lu</snm>
                  <fnm>HX</fnm>
               </au>
               <au>
                  <snm>Zhu</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Xia</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Han</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Bertin</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Chung</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Vidal</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Gerstein</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <fpage>1107</fpage>
            <lpage>1118</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">419789</pubid>
                  <pubid idtype="pmpid" link="fulltext">15173116</pubid>
                  <pubid idtype="doi">10.1101/gr.1774904</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Recognition by Max of its cognate DNA through a dimeric b/HLH/Z domain.</p>
            </title>
            <aug>
               <au>
                  <snm>Ferre-D'Amare</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Prendergast</snm>
                  <fnm>GC</fnm>
               </au>
               <au>
                  <snm>Ziff</snm>
                  <fnm>EB</fnm>
               </au>
               <au>
                  <snm>Burley</snm>
                  <fnm>SK</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1993</pubdate>
            <volume>363</volume>
            <fpage>38</fpage>
            <lpage>45</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/363038a0</pubid>
                  <pubid idtype="pmpid" link="fulltext">8479534</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>X-ray structures of Myc-Max and Mad-Max recognizing DNA. Molecular bases of regulation by proto-oncogenic transcription factors.</p>
            </title>
            <aug>
               <au>
                  <snm>Nair</snm>
                  <fnm>SK</fnm>
               </au>
               <au>
                  <snm>Burley</snm>
                  <fnm>SK</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2003</pubdate>
            <volume>112</volume>
            <fpage>193</fpage>
            <lpage>205</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0092-8674(02)01284-9</pubid>
                  <pubid idtype="pmpid" link="fulltext">12553908</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>MIPS: a database for genomes and protein sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Mewes</snm>
                  <fnm>HW</fnm>
               </au>
               <au>
                  <snm>Frishman</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Guldener</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Mannhaupt</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Mayer</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Mokrejs</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Morgenstern</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Munsterkotter</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rudd</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Weil</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>31</fpage>
            <lpage>34</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">99165</pubid>
                  <pubid idtype="pmpid" link="fulltext">11752246</pubid>
                  <pubid idtype="doi">10.1093/nar/30.1.31</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions.</p>
            </title>
            <aug>
               <au>
                  <snm>Xenarios</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Salwinski</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Duan</snm>
                  <fnm>XJ</fnm>
               </au>
               <au>
                  <snm>Higney</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Eisenberg</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>303</fpage>
            <lpage>305</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">99070</pubid>
                  <pubid idtype="pmpid" link="fulltext">11752321</pubid>
                  <pubid idtype="doi">10.1093/nar/30.1.303</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>A protein interaction map of <it>Drosophila melanogaster</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Giot</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Bader</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Brouwer</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Chaudhuri</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kuang</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Hao</snm>
                  <fnm>YL</fnm>
               </au>
               <au>
                  <snm>Ooi</snm>
                  <fnm>CE</fnm>
               </au>
               <au>
                  <snm>Godwin</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Vitols</snm>
                  <fnm>E</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2003</pubdate>
            <volume>302</volume>
            <fpage>1727</fpage>
            <lpage>1736</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1090289</pubid>
                  <pubid idtype="pmpid" link="fulltext">14605208</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>A map of the interactome network of the metazoan <it>C. elegans</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Armstrong</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Bertin</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Ge</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Milstein</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Boxem</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Vidalain</snm>
                  <fnm>PO</fnm>
               </au>
               <au>
                  <snm>Han</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Chesneau</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hao</snm>
                  <fnm>T</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2004</pubdate>
            <volume>303</volume>
            <fpage>540</fpage>
            <lpage>543</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1698949</pubid>
                  <pubid idtype="pmpid" link="fulltext">14704431</pubid>
                  <pubid idtype="doi">10.1126/science.1091403</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Functional organization of the yeast proteome by systematic analysis of protein complexes.</p>
            </title>
            <aug>
               <au>
                  <snm>Gavin</snm>
                  <fnm>AC</fnm>
               </au>
               <au>
                  <snm>Bosche</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Krause</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Grandi</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Marzioch</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bauer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Schultz</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Rick</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Michon</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Cruciat</snm>
                  <fnm>CM</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>415</volume>
            <fpage>141</fpage>
            <lpage>147</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/415141a</pubid>
                  <pubid idtype="pmpid" link="fulltext">11805826</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Systematic identification of protein complexes in <it>Saccharomyces cerevisiae</it> by mass spectrometry.</p>
            </title>
            <aug>
               <au>
                  <snm>Ho</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Gruhler</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Heilbut</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bader</snm>
                  <fnm>GD</fnm>
               </au>
               <au>
                  <snm>Moore</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Adams</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Millar</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Bennett</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Boutilier</snm>
                  <fnm>K</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>415</volume>
            <fpage>180</fpage>
            <lpage>183</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/415180a</pubid>
                  <pubid idtype="pmpid" link="fulltext">11805837</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>3D complex: a structural classification of protein complexes.</p>
            </title>
            <aug>
               <au>
                  <snm>Levy</snm>
                  <fnm>ED</fnm>
               </au>
               <au>
                  <snm>Pereira-Leal</snm>
                  <fnm>JB</fnm>
               </au>
               <au>
                  <snm>Chothia</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Teichmann</snm>
                  <fnm>SA</fnm>
               </au>
            </aug>
            <source>PLoS Comput Biol</source>
            <pubdate>2006</pubdate>
            <volume>2</volume>
            <fpage>e155</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1636673</pubid>
                  <pubid idtype="pmpid" link="fulltext">17112313</pubid>
                  <pubid idtype="doi">10.1371/journal.pcbi.0020155</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>SUPERFAMILY: HMMs representing all proteins of known structure. SCOP sequence searches, alignments and genome assignments.</p>
            </title>
            <aug>
               <au>
                  <snm>Gough</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Chothia</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>268</fpage>
            <lpage>272</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">99153</pubid>
                  <pubid idtype="pmpid" link="fulltext">11752312</pubid>
                  <pubid idtype="doi">10.1093/nar/30.1.268</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>The SUPERFAMILY database in 2004: additions and improvements.</p>
            </title>
            <aug>
               <au>
                  <snm>Madera</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Vogel</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Kummerfeld</snm>
                  <fnm>SK</fnm>
               </au>
               <au>
                  <snm>Chothia</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Gough</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32 (Database issue)</volume>
            <fpage>D235</fpage>
            <lpage>239</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308851</pubid>
                  <pubid idtype="pmpid" link="fulltext">14681402</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh117</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>SCOP database in 2004: refinements integrate structure and sequence family data.</p>
            </title>
            <aug>
               <au>
                  <snm>Andreeva</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Howorth</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Brenner</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Hubbard</snm>
                  <fnm>TJ</fnm>
               </au>
               <au>
                  <snm>Chothia</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Murzin</snm>
                  <fnm>AG</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32 (Database issue)</volume>
            <fpage>D226</fpage>
            <lpage>229</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308773</pubid>
                  <pubid idtype="pmpid" link="fulltext">14681400</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh039</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Specificity and stability in topology of protein networks.</p>
            </title>
            <aug>
               <au>
                  <snm>Maslov</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sneppen</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2002</pubdate>
            <volume>296</volume>
            <fpage>910</fpage>
            <lpage>913</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1065103</pubid>
                  <pubid idtype="pmpid" link="fulltext">11988575</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>The deep roots of eukaryotes.</p>
            </title>
            <aug>
               <au>
                  <snm>Baldauf</snm>
                  <fnm>SL</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2003</pubdate>
            <volume>300</volume>
            <fpage>1703</fpage>
            <lpage>1706</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1085544</pubid>
                  <pubid idtype="pmpid" link="fulltext">12805537</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Emergence of Modularity in Protein Interaction Networks</p>
            </title>
            <url>http://www.mrc-lmb.cam.ac.uk/genomes/jleal/modules/self.html</url>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Molecular architecture and functional model of the endocytic AP2 complex.</p>
            </title>
            <aug>
               <au>
                  <snm>Collins</snm>
                  <fnm>BM</fnm>
               </au>
               <au>
                  <snm>McCoy</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Kent</snm>
                  <fnm>HM</fnm>
               </au>
               <au>
                  <snm>Evans</snm>
                  <fnm>PR</fnm>
               </au>
               <au>
                  <snm>Owen</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2002</pubdate>
            <volume>109</volume>
            <fpage>523</fpage>
            <lpage>535</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0092-8674(02)00735-3</pubid>
                  <pubid idtype="pmpid" link="fulltext">12086608</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>Loop closure and intersubunit communication in tryptophan synthase.</p>
            </title>
            <aug>
               <au>
                  <snm>Schneider</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Gerhardt</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Liang</snm>
                  <fnm>PH</fnm>
               </au>
               <au>
                  <snm>Anderson</snm>
                  <fnm>KS</fnm>
               </au>
               <au>
                  <snm>Schlichting</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Biochemistry</source>
            <pubdate>1998</pubdate>
            <volume>37</volume>
            <fpage>5394</fpage>
            <lpage>5406</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/bi9728957</pubid>
                  <pubid idtype="pmpid" link="fulltext">9548921</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>Crystallographic refinement at 2.3 &#197; resolution and refined model of the photosynthetic reaction centre from <it>Rhodopseudomonas viridis</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Deisenhofer</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Epp</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Sinning</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Michel</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1995</pubdate>
            <volume>246</volume>
            <fpage>429</fpage>
            <lpage>457</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1994.0097</pubid>
                  <pubid idtype="pmpid" link="fulltext">7877166</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>The structural basis of Arfaptin-mediated cross-talk between Rac and Arf signalling pathways.</p>
            </title>
            <aug>
               <au>
                  <snm>Tarricone</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Xiao</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Justin</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Walker</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Rittinger</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Gamblin</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>