<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1472-6807-6-7</ui>
   <ji>1472-6807</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Structural proteomics of minimal organisms: Conservation of protein fold usage and evolutionary implications</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Chandonia</snm>
               <fnm>John-Marc</fnm>
               <insr iid="I1"/>
               <email>JMChandonia@lbl.gov</email>
            </au>
            <au id="A2" ca="yes">
               <snm>Kim</snm>
               <fnm>Sung-Hou</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>shkim@cchem.berkeley.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA</p>
            </ins>
            <ins id="I2">
               <p>Department of Chemistry, University of California, Berkeley, CA 94720, USA</p>
            </ins>
         </insg>
         <source>BMC Structural Biology</source>
         <issn>1472-6807</issn>
         <pubdate>2006</pubdate>
         <volume>6</volume>
         <issue>1</issue>
         <fpage>7</fpage>
         <url>http://www.biomedcentral.com/1472-6807/6/7</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">16566839</pubid>
               <pubid idtype="doi">10.1186/1472-6807-6-7</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>20</day>
               <month>10</month>
               <year>2005</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>28</day>
               <month>3</month>
               <year>2006</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>28</day>
               <month>3</month>
               <year>2006</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2006</year>
         <collab>Chandonia and Kim; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Determining the complete repertoire of protein structures for all soluble, globular proteins in a single organism has been one of the major goals of several structural genomics projects in recent years.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We report that this goal has nearly been reached for several "minimal organisms" &#8211; parasites or symbionts with reduced genomes &#8211; for which over 95% of the soluble, globular proteins may now be assigned folds (overall 3-D backbone structures). We analyze the structures of these proteins as they relate to cellular functions, and compare conservation of fold usage between functional categories. We also compare patterns in the conservation of folds among minimal organisms with those observed between minimal organisms and other bacteria.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>We find that proteins performing essential cellular functions closely related to transcription and translation exhibit a higher degree of conservation in fold usage than proteins in other functional categories. Folds related to transcription and translation functional categories were also overrepresented in minimal organisms compared to other bacteria.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>The availability of complete genome sequences opened up a new era in biology, providing a global and systems view of the range of genome sizes in different organisms, the presence or absence of genes involved in various cellular functions, the genes involved in particular cellular functions, and the relative abundance of different gene families. This new global view is creating major new areas of research such as functional genomics <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. At the time of this writing, over 224 prokaryotic genomes and over 22 complete eukaryotic genomes have been sequenced <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. Just as the field of sequence genomics has yielded complete genome sequences for a variety of organisms, the field of structural genomics aims to provide structures for the complete array of biological macromolecules found in nature <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. The first phase of structural genomics focused only on proteins (not RNAs), and has proven to be an efficient means of providing structural information for new protein families <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>.</p>
         <p>After the first sequencing of a complete genome of <it>Haemophilus influenzae </it><abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, some of the earliest subsequent genomes sequenced were from the "minimal organisms" <it>Mycoplasma genitalium </it>and <it>M. pneumoniae </it><abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr></abbrgrp>. Minimal organisms have been the subject of numerous experimental and computational genomic studies because of the possibility of identifying the minimal complement of genes necessary for sustaining life <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>. Because of their small size, organisms with minimal genomes have also been popular for structure and function prediction <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>. The minimal organisms <it>M. genitalium </it>(~486 protein-encoding genes) and <it>M. pneumoniae </it>(~690 genes) have also been the focus of structural genomics research at the Berkeley Structural Genomics Center <abbrgrp><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr></abbrgrp>.</p>
         <p>Other minimal organisms that have been sequenced more recently include the aphid symbiont <it>Buchnera aphidicola </it>(~572 genes) <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>, the ant symbiont <it>Candidatus Blochmannia floridanus </it>(~583 genes) <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>, the tsetse fly symbiont <it>Wigglesworthia glossinidia brevipalpis </it>(~612 genes) <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>, and the Whipple's disease parasite <it>Tropheryma whipplei </it>(~781 genes) <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. Comparative analysis of the first three symbiont genomes and <it>M. genitalium </it>has demonstrated that the symbionts are closely related, sharing 313 orthologous genes (51&#8211;55% of each genome), and that they share 179 genes with <it>M. genitalium </it><abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. However, a broader comparison of all five species, including <it>T. whipplei</it>, indicated significant variability in the functional repertoire of proteins in these organisms, suggesting that minimal genomes are not the result of a unique reductive evolutionary pathway, but the products of reductive evolution in specific environments <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>.</p>
         <p>A recent survey of proteins from 238 complete genomes revealed that fold assignments (approximate 3-D backbone structures) can be made for the majority of non-membrane proteins of minimal organisms <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. Statistically significant sequence similarity to a protein of known structure allows homology (evolutionary relatedness) to be inferred, thus enabling the fold of the homologous proteins to be assigned even in cases where the degree of sequence similarity is insufficiently high to allow accurate modeling <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>.</p>
         <p>Fold assignment of a protein has implications for functional annotation, because the link between molecular function and structure is well known. Todd and colleagues showed that while the majority of superfamilies display variation in enzyme function (i.e., molecular function), the biochemical mechanisms (as represented by the Enzyme Commission [EC] number) are almost always conserved between proteins with 40% sequence identity or above <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. More recent work has shown that conserved domain combinations, or supradomains, are more likely to maintain a conserved molecular function even at lower sequence identity <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. A study in two proteomes (yeast and <it>Escherichia coli</it>) found clear tendencies for fold-function association across a broad range of molecular functions <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. The latter study also found the fold distributions in the two proteomes surveyed did not vary significantly from the average across all sequenced proteomes, although the study was based on fold assignments for less than 10% of the total number of proteins.</p>
         <p>We now report that recent efforts in structural biology and structural genomics have succeeded in enabling fold assignments for approximately 90% of soluble, globular proteins in the five minimal organisms described above. In this report, we survey the classes of protein folds found in each organism, and examine the conservation in fold usage of proteins in several broad categories of cellular function. We find that the degree of conservation of fold usage varies among cellular functional categories, with the most conserved categories of proteins performing essential cellular functions closely related to transcription and translation. Finally, we compare the degree of conservation in cellular functions and fold usage among the five minimal organisms and <it>E. coli</it>, a non-minimal organism.</p>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <sec>
            <st>
               <p>Near-complete coverage of soluble, globular proteomes of "minimal" organisms</p>
            </st>
            <p>In Table <tblr tid="T1">1</tblr>, we show the percentage of proteomes that may be assigned folds for five minimal organisms and for <it>E. coli</it>, an example of a well-studied organism that is not "minimal." For the minimal organisms considered in this study, nearly all proteins annotated as soluble and globular may be assigned to a known fold. The ant symbiont <it>B. floridanus </it>has the highest coverage, at 96% of soluble, globular proteins (431 of 451 proteins). Of the remaining proteins in the proteome, 58 (10% of the proteome) have unknown structure but are predicted to have at least one transmembrane helix. Three additional proteins have unknown structure and no predicted transmembrane helices, but 20% or more of their residues are in predicted low complexity or coiled coil regions, and they are thus not easily tractable in experimental structural studies. Overall, the folds of 502 of 583 <it>B. floridanus </it>proteins (86%) may be annotated by sequence similarity to a protein of known structure. Other minimal organisms also have high structural coverage: 95% of soluble, globular <it>W. glossinidia </it>proteins, 94% of soluble, globular <it>B. aphidicola </it>proteins, 87% of soluble, globular <it>M. genitalium </it>proteins, and 87% of soluble, globular <it>T. whipplei </it>proteins can reliably be assigned folds. In contrast, only 78% of soluble, globular <it>E. coli </it>proteins can reliably be assigned folds. The low numbers of predicted transmembrane proteins in several of the minimal organisms (e.g., only 87 of 572 <it>B. aphidicola </it>proteins) are also notable; previous analyses suggest that some transmembrane proteins (e.g., proteins with a role in cell defense or transporters of diverse nutritional sources) are less important to intracellular symbionts than to free-living bacteria <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Status of near-complete structural proteomes as of 22 February 2005. How many proteins may be assigned folds in near-complete proteomes? The status of five near-complete prokaryotic proteomes is shown. <it>E. coli</it>, a well-studied bacterium that is not considered a minimal organism, is included for comparison.</p>
               </caption>
               <tblbdy cols="11">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Organism</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Total # of proteins</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b># of soluble, globular proteins</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b># of soluble, non-globular proteins</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b># of membrane proteins</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b># of folds assigned</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>% folds assigned (of total)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>% folds assigned (of soluble, globular)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b># of remaining soluble, globular proteins</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b># of remaining soluble, non-globular proteins</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b># of remaining membrane proteins</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="11">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Candidatus Blochmannia floridanus</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>583</p>
                     </c>
                     <c ca="center">
                        <p>451</p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="center">
                        <p>120</p>
                     </c>
                     <c ca="center">
                        <p>502</p>
                     </c>
                     <c ca="center">
                        <p>86.1%</p>
                     </c>
                     <c ca="center">
                        <p>95.6%</p>
                     </c>
                     <c ca="center">
                        <p>20</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>58</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Wigglesworthia glossinidia brevipalpis</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>612</p>
                     </c>
                     <c ca="center">
                        <p>536</p>
                     </c>
                     <c ca="center">
                        <p>28</p>
                     </c>
                     <c ca="center">
                        <p>217</p>
                     </c>
                     <c ca="center">
                        <p>508</p>
                     </c>
                     <c ca="center">
                        <p>83.0%</p>
                     </c>
                     <c ca="center">
                        <p>94.8%</p>
                     </c>
                     <c ca="center">
                        <p>28</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>69</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Buchnera aphidicola (subsp. Acyrthosiphon pisum)</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>572</p>
                     </c>
                     <c ca="center">
                        <p>446</p>
                     </c>
                     <c ca="center">
                        <p>39</p>
                     </c>
                     <c ca="center">
                        <p>87</p>
                     </c>
                     <c ca="center">
                        <p>495</p>
                     </c>
                     <c ca="center">
                        <p>86.5%</p>
                     </c>
                     <c ca="center">
                        <p>94.4%</p>
                     </c>
                     <c ca="center">
                        <p>25</p>
                     </c>
                     <c ca="center">
                        <p>9</p>
                     </c>
                     <c ca="center">
                        <p>43</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Mycoplasma genitalium</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>486</p>
                     </c>
                     <c ca="center">
                        <p>341</p>
                     </c>
                     <c ca="center">
                        <p>34</p>
                     </c>
                     <c ca="center">
                        <p>111</p>
                     </c>
                     <c ca="center">
                        <p>350</p>
                     </c>
                     <c ca="center">
                        <p>72.0%</p>
                     </c>
                     <c ca="center">
                        <p>87.1%</p>
                     </c>
                     <c ca="center">
                        <p>44</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>82</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Tropheryma whipplei (strain TW08/27)</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>781</p>
                     </c>
                     <c ca="center">
                        <p>430</p>
                     </c>
                     <c ca="center">
                        <p>55</p>
                     </c>
                     <c ca="center">
                        <p>127</p>
                     </c>
                     <c ca="center">
                        <p>556</p>
                     </c>
                     <c ca="center">
                        <p>71.2%</p>
                     </c>
                     <c ca="center">
                        <p>87.0%</p>
                     </c>
                     <c ca="center">
                        <p>56</p>
                     </c>
                     <c ca="center">
                        <p>15</p>
                     </c>
                     <c ca="center">
                        <p>154</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Escherichia coli</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>4338</p>
                     </c>
                     <c ca="center">
                        <p>3130</p>
                     </c>
                     <c ca="center">
                        <p>146</p>
                     </c>
                     <c ca="center">
                        <p>1062</p>
                     </c>
                     <c ca="center">
                        <p>2945</p>
                     </c>
                     <c ca="center">
                        <p>67.9%</p>
                     </c>
                     <c ca="center">
                        <p>78.0%</p>
                     </c>
                     <c ca="center">
                        <p>688</p>
                     </c>
                     <c ca="center">
                        <p>76</p>
                     </c>
                     <c ca="center">
                        <p>629</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
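            <p>The two percentage columns in Table 1 can be recomputed from the counts. The following is a minimal sketch, not the original analysis code: the names <it>TABLE_1 </it>and <it>coverage </it>are illustrative assumptions, and the soluble, globular percentage is taken to be the fraction of soluble, globular proteins that are not in the "remaining" column, which matches the 431 of 451 <it>B. floridanus </it>proteins cited in the text.</p>

```python
# Counts copied from Table 1, in the order:
# (total proteins, soluble globular proteins,
#  proteins with folds assigned, soluble globular proteins still unassigned)
TABLE_1 = {
    "B. floridanus": (583, 451, 502, 20),
    "W. glossinidia": (612, 536, 508, 28),
    "B. aphidicola": (572, 446, 495, 25),
    "M. genitalium": (486, 341, 350, 44),
    "T. whipplei": (781, 430, 556, 56),
    "E. coli": (4338, 3130, 2945, 688),
}


def coverage(total, soluble_globular, assigned, remaining_soluble_globular):
    """Return (% of total, % of soluble globular) proteins with folds assigned."""
    pct_total = 100.0 * assigned / total
    solved_soluble = soluble_globular - remaining_soluble_globular
    pct_soluble = 100.0 * solved_soluble / soluble_globular
    return round(pct_total, 1), round(pct_soluble, 1)


for organism, counts in TABLE_1.items():
    pct_total, pct_soluble = coverage(*counts)
    print(f"{organism}: {pct_total}% of total, {pct_soluble}% of soluble globular")
```

            <p>Under these assumptions, the rounded values reproduce both percentage columns of Table 1 for all six proteomes.</p>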
         </sec>
         <sec>
            <st>
               <p>&#945;/&#946; fold class is the most common category of fold</p>
            </st>
            <p>For the proteins that could be reliably assigned folds, we examined their structural classification in the SCOP database <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. SCOP is a widely used, manually curated database in which protein structures are divided into domains, which are classified in a hierarchy indicating different types of structural and evolutionary relationships between the domains. Domains classified together in a single "family" or "superfamily" are hypothesized to have a common evolutionary origin on the basis of sequence or structural evidence. Superfamilies that share similar secondary structural features and topology, but for which there is little or no evidence to suggest a common evolutionary origin, are classified together at the "fold" level. SCOP folds are grouped together in seven major "classes" (all-&#945;, all-&#946;, &#945;/&#946;, &#945;+&#946;, multi-domain, membrane, and small), based on common physical characteristics such as the predominant type of secondary structure or the order of connection of the different secondary structures (Figure <figr fid="F1">1</figr>). Note that the SCOP "multi-domain" class encompasses folds that are composed of multiple domains that individually would belong to different classes; individual domains from multi-domain proteins are not classified in the "multi-domain" class. Although we use the term "fold" to refer to a protein's overall 3D backbone structure, we use the term "SCOP fold" to refer to a specific fold classification within the SCOP database.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Four major SCOP classes</p>
               </caption>
               <text>
                  <p><b>Four major SCOP classes</b>. The predominant form of secondary structure in each of the first four SCOP classes is shown. Alpha helices are shown as red cylinders, and beta strands as yellow ribbons.</p>
               </text>
               <graphic file="1472-6807-6-7-1"/>
            </fig>
            <p>The fraction of proteins found in each organism belonging to each of these SCOP classes is shown in Figure <figr fid="F2">2</figr>. Those proteins that could not reliably be assigned folds, and those that were assigned a fold based on homology to a protein not yet classified in SCOP, are described as "Unsolved" and "Unclassified," respectively. For all organisms, the highest proportion of SCOP folds is in the &#945;/&#946; class, and those in the &#945;/&#946; and &#945;+&#946; classes together comprise over half of the assigned SCOP folds. This is consistent with the observation that the &#945;/&#946; class contains some of the most functionally diverse "superfolds" that act as scaffolds for a wide array of molecular or chemical functions <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>SCOP class distribution in near-complete proteomes</p>
               </caption>
               <text>
                  <p><b>SCOP class distribution in near-complete proteomes</b>. The fraction of domains in each proteome belonging to each of the first 7 SCOP classes is shown. "Unclassified" domains are from proteins annotated as homologous to a known structure using Pfam, but not classified in one of the first 7 classes of SCOP (e.g., due to being in a superfamily solved since the SCOP cutoff date of 15 May 2004). "Unsolved" domains are from proteins not annotated as homologous to a known structure. For statistical analysis, each ORF in the latter two categories was treated as containing exactly one domain. "Unsolved" domains are further divided into three categories based on predicted tractability in high-throughput experiments: "Unsolved, TM" are predicted to contain at least one transmembrane helix, "Unsolved, LCCC" have no predicted transmembrane helices but at least 20% of the sequence in low complexity or coiled coil regions, and "Unsolved, Soluble Globular" are predicted to be tractable in high-throughput experiments due to having neither of these features.</p>
               </text>
               <graphic file="1472-6807-6-7-2"/>
            </fig>
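            <p>The three "Unsolved" categories defined in the Figure 2 legend amount to a simple decision rule. The sketch below illustrates that rule only. The function name, the constant, and the example inputs are hypothetical, and the two inputs (a predicted transmembrane helix count and the fraction of residues in low complexity or coiled coil regions) are assumed to come from upstream prediction tools that are not reproduced here.</p>

```python
# Threshold from the Figure 2 legend: "at least 20% of the sequence in
# low complexity or coiled coil regions".
LCCC_THRESHOLD = 0.20


def unsolved_category(n_tm_helices, lccc_fraction):
    """Assign an unsolved protein to one of the three tractability categories.

    n_tm_helices  -- predicted number of transmembrane helices (assumed input)
    lccc_fraction -- fraction of residues (0.0 to 1.0) in predicted low
                     complexity or coiled coil regions (assumed input)
    """
    if n_tm_helices >= 1:
        return "Unsolved, TM"
    if lccc_fraction >= LCCC_THRESHOLD:
        return "Unsolved, LCCC"
    return "Unsolved, Soluble Globular"


# Hypothetical example proteins: (predicted TM helices, LCCC fraction)
examples = [(2, 0.05), (0, 0.35), (0, 0.10)]
for n_tm, fraction in examples:
    print(n_tm, fraction, unsolved_category(n_tm, fraction))
```

            <p>The transmembrane test is applied first, so a protein with a predicted transmembrane helix is counted as "Unsolved, TM" regardless of its low complexity or coiled coil content, in line with the legend's wording.</p>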
         </sec>
         <sec>
            <st>
               <p>Usage of protein fold classes is conserved for key cellular processes</p>
            </st>
            <p>In order to analyze how the annotated cellular function of each protein correlates with its structure, we examined the "functional role" annotation for each protein as provided in the TIGR database <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>. We found that the distribution of proteins among SCOP fold classes was highly conserved within some roles and showed much more variability in others.</p>
            <p>Figure <figr fid="F3">3</figr> shows the fold class distribution of proteins in the "Protein Synthesis" functional category across all 6 proteomes. The fraction of these proteins in each structural class shows little variability, with no more than a 4% difference between proteomes. Furthermore, the proteins in this functional category comprise a relatively large fraction of the proteins in each proteome (99 proteins on average, or 8% of the proteome). The extremely low variability is consistent with the idea that these proteins have been a fundamental part of cellular biochemistry since early evolution, and are thus essential to any organism regardless of its environment.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>SCOP class distribution of proteins with "Protein Synthesis" function</p>
               </caption>
               <text>
                  <p><b>SCOP class distribution of proteins with "Protein Synthesis" function</b>. The fraction of domains in each proteome from the TIGR role category "Protein Synthesis" belonging to each of the first 7 SCOP classes is shown. "Unclassified" and "Unsolved" domains were counted as described in Figure 2.</p>
               </text>
               <graphic file="1472-6807-6-7-3"/>
            </fig>
            <p>In contrast, Figure <figr fid="F4">4</figr> shows the fold class distribution of proteins in the "Cell Envelope" functional category across all 6 proteomes. This functional category is also highly represented in each proteome (73.8 proteins on average), but the proteins show a much higher degree of variation in fold usage. This category contains the highest proportion of unassigned folds, as well as a diverse array of assigned SCOP folds: for example, 6% and 4% of domains from <it>W. glossinidia </it>and <it>E. coli </it>cell envelope proteins, respectively, belong to the all-&#945; structural class, while cell envelope proteins from the other proteomes contain few or no all-&#945; structures. <it>E. coli </it>also contains a number of solved transmembrane structures, while other proteomes contain significant numbers of predicted transmembrane proteins that are not detectably homologous to any protein with a known 3D structure. <it>M. genitalium </it>and <it>T. whipplei </it>contain the largest fractions of cell envelope proteins that could not be reliably assigned a fold at this time, although most of these <it>M. genitalium </it>proteins are expected to be soluble and globular, while the majority of such proteins from <it>T. whipplei </it>are predicted to contain at least one transmembrane helix. The high degree of variability suggests that proteins in the "Cell Envelope" category evolve rapidly in response to specific pressures in an organism's environment, and that different sets of these proteins remain after reductive evolution in the different environments occupied by the different species of minimal organisms.</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>SCOP class distribution of proteins with "Cell Envelope" function</p>
               </caption>
               <text>
                  <p><b>SCOP class distribution of proteins with "Cell Envelope" function</b>. The fraction of domains in each proteome from the TIGR role category "Cell Envelope" belonging to each of the first 7 SCOP classes is shown. "Unclassified" and "Unsolved" domains were counted as described in Figure 2.</p>
               </text>
               <graphic file="1472-6807-6-7-4"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Cellular functions with most conserved SCOP fold usage</p>
            </st>
            <p>Previous comparative genomic sequence analyses of symbionts have shown that the number of proteins in most cellular function categories varies little between symbiont proteomes, and that many of the most highly conserved proteins have cellular functions related to information storage and processing, particularly translation and ribosomal structure <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. We calculated the coefficient of variation (CV) in the number of proteins in each functional role category (N<sub>1 </sub>for the first species, N<sub>2 </sub>for the second species, etc.), as shown in Equation 1.</p>
            <p>
               <m:math name="1472-6807-6-7-i1" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mi>C</m:mi>
                        <m:msub>
                           <m:mi>V</m:mi>
                           <m:mrow>
                              <m:mi>s</m:mi>
                              <m:mi>e</m:mi>
                              <m:mi>q</m:mi>
                              <m:mi>u</m:mi>
                              <m:mi>e</m:mi>
                              <m:mi>n</m:mi>
                              <m:mi>c</m:mi>
                              <m:mi>e</m:mi>
                           </m:mrow>
                        </m:msub>
                        <m:mo>=</m:mo>
                        <m:mfrac>
                           <m:mrow>
                              <m:mi>S</m:mi>
                              <m:mi>t</m:mi>
                              <m:mi>d</m:mi>
                              <m:mi>e</m:mi>
                              <m:mi>v</m:mi>
                              <m:mo stretchy="false">(</m:mo>
                              <m:msub>
                                 <m:mi>N</m:mi>
                                 <m:mn>1</m:mn>
                              </m:msub>
                              <m:mo>&#8230;</m:mo>
                              <m:msub>
                                 <m:mi>N</m:mi>
                                 <m:mn>6</m:mn>
                              </m:msub>
                              <m:mo stretchy="false">)</m:mo>
                           </m:mrow>
                           <m:mrow>
                              <m:mi>M</m:mi>
                              <m:mi>e</m:mi>
                              <m:mi>a</m:mi>
                              <m:mi>n</m:mi>
                              <m:mo stretchy="false">(</m:mo>
                              <m:msub>
                                 <m:mi>N</m:mi>
                                 <m:mn>1</m:mn>
                              </m:msub>
                              <m:mo>&#8230;</m:mo>
                              <m:msub>
                                 <m:mi>N</m:mi>
                                 <m:mn>6</m:mn>
                              </m:msub>
                              <m:mo stretchy="false">)</m:mo>
                           </m:mrow>
                        </m:mfrac>
                        <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                        <m:mrow>
                           <m:mo>(</m:mo>
                           <m:mn>1</m:mn>
                           <m:mo>)</m:mo>
                        </m:mrow>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGdbWqcqWGwbGvdaWgaaWcbaGaem4CamNaemyzauMaemyCaeNaemyDauNaemyzauMaemOBa4Maem4yamMaemyzaugabeaakiabg2da9maalaaabaGaem4uamLaemiDaqNaemizaqMaemyzauMaemODayNaeiikaGIaemOta40aaSbaaSqaaiabigdaXaqabaGccqWIMaYscqWGobGtdaWgaaWcbaGaeGOnaydabeaakiabcMcaPaqaaiabd2eanjabdwgaLjabdggaHjabd6gaUjabcIcaOiabd6eaonaaBaaaleaacqaIXaqmaeqaaOGaeSOjGSKaemOta40aaSbaaSqaaiabiAda2aqabaGccqGGPaqkaaGaaCzcaiaaxMaadaqadaqaaiabigdaXaGaayjkaiaawMcaaaaa@59BA@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
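            <p>As a minimal sketch of Equation 1, the calculation for a single functional role category can be written as below. The protein counts are hypothetical placeholders rather than values from this study, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the form of the standard deviation is not stated in the text.</p>

```python
import statistics


def coefficient_of_variation(counts):
    """Return stdev(counts) / mean(counts), following Equation 1."""
    mean = statistics.mean(counts)
    if mean == 0:
        raise ValueError("CV is undefined when the mean count is zero")
    # Assumption: sample standard deviation (n - 1 denominator).
    return statistics.stdev(counts) / mean


# Hypothetical protein counts (N1 ... N6) for one role category
# in the six proteomes; these are illustrative values only.
example_counts = [90, 95, 100, 100, 105, 110]
cv_sequence = coefficient_of_variation(example_counts)
print(round(cv_sequence, 3))
```

            <p>A category whose protein count is nearly the same in every proteome gives a value close to zero, while a category dominated by one large proteome gives a value near or above one.</p>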
            <p>Results are shown in Table <tblr tid="T2">2</tblr>. As expected, the category with the lowest variation in the number of proteins is "Protein synthesis," and the top three categories are all closely related to transcription or translation.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Variation within functional categories based on sequence and structure. Which functional categories show the most variation in fold usage between organisms? The first column lists 17 TIGR cellular function categories, and an additional category composed of all proteins in each proteome. The "fold-based variation" column is based on a calculation of the coefficient of variation in the number of structurally characterized domains in each functional role in each of the first 7 SCOP classes (all-&#945;, all-&#946;, &#945;/&#946;, &#945;+&#946;, multi-domain, membrane, small). As described in Equation 2, the coefficient of variation is calculated separately for each of the 7 classes, and then averaged across all 7 classes to produce CV<sub>structure</sub>. The "sequence-based variation" column gives the coefficient of variation in the number of proteins in each category (CV<sub>sequence</sub>, Equation 1). The "fold-based rank" and "sequence-based rank" show the ranking of functional categories based on the amount of fold-based and sequence-based variation, from the lowest amount of variation to the highest. Cellular function categories are ordered in the table according to their fold-based rank.</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Category</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Average # of Proteins</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Fold-based variation (CV<sub>structure</sub>)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Sequence-based variation (CV<sub>sequence</sub>)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Fold-based Rank/Sequence-based Rank</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Protein synthesis</p>
                     </c>
                     <c ca="center">
                        <p>99.0</p>
                     </c>
                     <c ca="center">
                        <p>0.141</p>
                     </c>
                     <c ca="center">
                        <p>0.100</p>
                     </c>
                     <c ca="center">
                        <p>1/1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Transcription</p>
                     </c>
                     <c ca="center">
                        <p>20.8</p>
                     </c>
                     <c ca="center">
                        <p>0.286</p>
                     </c>
                     <c ca="center">
                        <p>0.409</p>
                     </c>
                     <c ca="center">
                        <p>2/2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Purines, pyrimidines, nucleosides, and nucleotides</p>
                     </c>
                     <c ca="center">
                        <p>36.8</p>
                     </c>
                     <c ca="center">
                        <p>0.462</p>
                     </c>
                     <c ca="center">
                        <p>0.570</p>
                     </c>
                     <c ca="center">
                        <p>3/3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>DNA metabolism</p>
                     </c>
                     <c ca="center">
                        <p>46.8</p>
                     </c>
                     <c ca="center">
                        <p>0.586</p>
                     </c>
                     <c ca="center">
                        <p>0.753</p>
                     </c>
                     <c ca="center">
                        <p>4/6</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Protein fate</p>
                     </c>
                     <c ca="center">
                        <p>48.3</p>
                     </c>
                     <c ca="center">
                        <p>0.731</p>
                     </c>
                     <c ca="center">
                        <p>0.723</p>
                     </c>
                     <c ca="center">
                        <p>5/4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Amino acid biosynthesis</p>
                     </c>
                     <c ca="center">
                        <p>44.7</p>
                     </c>
                     <c ca="center">
                        <p>0.935</p>
                     </c>
                     <c ca="center">
                        <p>0.972</p>
                     </c>
                     <c ca="center">
                        <p>6/8</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>All Proteins</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1228.7</p>
                     </c>
                     <c ca="center">
                        <p>1.061</p>
                     </c>
                     <c ca="center">
                        <p>1.242</p>
                     </c>
                     <c ca="center">
                        <p>7/12</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Cell envelope</p>
                     </c>
                     <c ca="center">
                        <p>73.8</p>
                     </c>
                     <c ca="center">
                        <p>1.099</p>
                     </c>
                     <c ca="center">
                        <p>0.971</p>
                     </c>
                     <c ca="center">
                        <p>8/7</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Central intermediary metabolism</p>
                     </c>
                     <c ca="center">
                        <p>27.5</p>
                     </c>
                     <c ca="center">
                        <p>1.228</p>
                     </c>
                     <c ca="center">
                        <p>1.113</p>
                     </c>
                     <c ca="center">
                        <p>9/10</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Energy metabolism</p>
                     </c>
                     <c ca="center">
                        <p>116.7</p>
                     </c>
                     <c ca="center">
                        <p>1.276</p>
                     </c>
                     <c ca="center">
                        <p>1.220</p>
                     </c>
                     <c ca="center">
                        <p>10/11</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Fatty acid and phospholipid metabolism</p>
                     </c>
                     <c ca="center">
                        <p>20.0</p>
                     </c>
                     <c ca="center">
                        <p>1.328</p>
                     </c>
                     <c ca="center">
                        <p>1.014</p>
                     </c>
                     <c ca="center">
                        <p>11/9</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Biosynthesis of cofactors, prosthetic groups, and carriers</p>
                     </c>
                     <c ca="center">
                        <p>50.3</p>
                     </c>
                     <c ca="center">
                        <p>1.332</p>
                     </c>
                     <c ca="center">
                        <p>0.731</p>
                     </c>
                     <c ca="center">
                        <p>12/5</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Cellular processes</p>
                     </c>
                     <c ca="center">
                        <p>62.0</p>
                     </c>
                     <c ca="center">
                        <p>1.364</p>
                     </c>
                     <c ca="center">
                        <p>1.301</p>
                     </c>
                     <c ca="center">
                        <p>13/13</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Regulatory functions</p>
                     </c>
                     <c ca="center">
                        <p>34.5</p>
                     </c>
                     <c ca="center">
                        <p>1.427</p>
                     </c>
                     <c ca="center">
                        <p>1.940</p>
                     </c>
                     <c ca="center">
                        <p>14/18</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Unknown function</p>
                     </c>
                     <c ca="center">
                        <p>115.6</p>
                     </c>
                     <c ca="center">
                        <p>1.659</p>
                     </c>
                     <c ca="center">
                        <p>1.865</p>
                     </c>
                     <c ca="center">
                        <p>15/17</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Transport and binding proteins</p>
                     </c>
                     <c ca="center">
                        <p>81.8</p>
                     </c>
                     <c ca="center">
                        <p>1.809</p>
                     </c>
                     <c ca="center">
                        <p>1.638</p>
                     </c>
                     <c ca="center">
                        <p>16/15</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Hypothetical proteins</p>
                     </c>
                     <c ca="center">
                        <p>205.8</p>
                     </c>
                     <c ca="center">
                        <p>1.984</p>
                     </c>
                     <c ca="center">
                        <p>1.631</p>
                     </c>
                     <c ca="center">
                        <p>17/14</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Unclassified</p>
                     </c>
                     <c ca="center">
                        <p>118.0</p>
                     </c>
                     <c ca="center">
                        <p>2.020</p>
                     </c>
                     <c ca="center">
                        <p>1.835</p>
                     </c>
                     <c ca="center">
                        <p>18/16</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
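            <p>The two rank columns of Table 2 follow directly from sorting the CV values in ascending order. The sketch below reproduces them from the values printed in the table; the variable and function names are illustrative only, and the code is not the procedure used to build the table.</p>

```python
# (CV_structure, CV_sequence) pairs copied from Table 2.
TABLE_2 = {
    "Protein synthesis": (0.141, 0.100),
    "Transcription": (0.286, 0.409),
    "Purines, pyrimidines, nucleosides, and nucleotides": (0.462, 0.570),
    "DNA metabolism": (0.586, 0.753),
    "Protein fate": (0.731, 0.723),
    "Amino acid biosynthesis": (0.935, 0.972),
    "All Proteins": (1.061, 1.242),
    "Cell envelope": (1.099, 0.971),
    "Central intermediary metabolism": (1.228, 1.113),
    "Energy metabolism": (1.276, 1.220),
    "Fatty acid and phospholipid metabolism": (1.328, 1.014),
    "Biosynthesis of cofactors, prosthetic groups, and carriers": (1.332, 0.731),
    "Cellular processes": (1.364, 1.301),
    "Regulatory functions": (1.427, 1.940),
    "Unknown function": (1.659, 1.865),
    "Transport and binding proteins": (1.809, 1.638),
    "Hypothetical proteins": (1.984, 1.631),
    "Unclassified": (2.020, 1.835),
}


def rank_by(column):
    """Rank categories from lowest (1) to highest CV in the given column."""
    ordered = sorted(TABLE_2, key=lambda name: TABLE_2[name][column])
    return {name: position for position, name in enumerate(ordered, start=1)}


fold_rank = rank_by(0)      # ranks based on CV_structure
sequence_rank = rank_by(1)  # ranks based on CV_sequence

# Category whose fold-based and sequence-based ranks differ the most.
largest_shift = max(
    TABLE_2, key=lambda name: abs(fold_rank[name] - sequence_rank[name])
)
print(largest_shift, fold_rank[largest_shift], sequence_rank[largest_shift])
```

            <p>The category with the largest difference between the two rankings is the biosynthesis of cofactors, prosthetic groups, and carriers (fold-based rank 12, sequence-based rank 5).</p>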
            <p>We also calculated the coefficient of variation in the number of protein domains assigned to each SCOP class (N<sub>1,all-&#945; </sub>for the first species in the all-&#945; class, N<sub>2,all-&#945; </sub>for the second species in the all-&#945; class, etc), then averaged that data across all 7 structural classes, as shown in Equation 2.</p>
            <p>
               <m:math name="1472-6807-6-7-i2" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mi>C</m:mi>
                        <m:msub>
                           <m:mi>V</m:mi>
                           <m:mrow>
                              <m:mi>s</m:mi>
                              <m:mi>t</m:mi>
                              <m:mi>r</m:mi>
                              <m:mi>u</m:mi>
                              <m:mi>c</m:mi>
                              <m:mi>t</m:mi>
                              <m:mi>u</m:mi>
                              <m:mi>r</m:mi>
                              <m:mi>e</m:mi>
                           </m:mrow>
                        </m:msub>
                        <m:mo>=</m:mo>
                        <m:mfrac>
                           <m:mrow>
                              <m:mstyle displaystyle="true">
                                 <m:msubsup>
                                    <m:mo>&#8721;</m:mo>
                                    <m:mrow>
                                       <m:mi>c</m:mi>
                                       <m:mi>l</m:mi>
                                       <m:mi>a</m:mi>
                                       <m:mi>s</m:mi>
                                       <m:mi>s</m:mi>
                                       <m:mo>=</m:mo>
                                       <m:mn>1</m:mn>
                                    </m:mrow>
                                    <m:mn>7</m:mn>
                                 </m:msubsup>
                                 <m:mrow>
                                    <m:mfrac>
                                       <m:mrow>
                                          <m:mi>s</m:mi>
                                          <m:mi>t</m:mi>
                                          <m:mi>d</m:mi>
                                          <m:mi>e</m:mi>
                                          <m:mi>v</m:mi>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:msub>
                                             <m:mi>N</m:mi>
                                             <m:mrow>
                                                <m:mn>1</m:mn>
                                                <m:mo>,</m:mo>
                                                <m:mi>c</m:mi>
                                                <m:mi>l</m:mi>
                                                <m:mi>a</m:mi>
                                                <m:mi>s</m:mi>
                                                <m:mi>s</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                          <m:mo>&#8230;</m:mo>
                                          <m:msub>
                                             <m:mi>N</m:mi>
                                             <m:mrow>
                                                <m:mn>6</m:mn>
                                                <m:mo>,</m:mo>
                                                <m:mi>c</m:mi>
                                                <m:mi>l</m:mi>
                                                <m:mi>a</m:mi>
                                                <m:mi>s</m:mi>
                                                <m:mi>s</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                          <m:mo stretchy="false">)</m:mo>
                                       </m:mrow>
                                       <m:mrow>
                                          <m:mi>M</m:mi>
                                          <m:mi>e</m:mi>
                                          <m:mi>a</m:mi>
                                          <m:mi>n</m:mi>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:msub>
                                             <m:mi>N</m:mi>
                                             <m:mrow>
                                                <m:mn>1</m:mn>
                                                <m:mo>,</m:mo>
                                                <m:mi>c</m:mi>
                                                <m:mi>l</m:mi>
                                                <m:mi>a</m:mi>
                                                <m:mi>s</m:mi>
                                                <m:mi>s</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                          <m:mo>&#8230;</m:mo>
                                          <m:msub>
                                             <m:mi>N</m:mi>
                                             <m:mrow>
                                                <m:mn>6</m:mn>
                                                <m:mo>,</m:mo>
                                                <m:mi>c</m:mi>
                                                <m:mi>l</m:mi>
                                                <m:mi>a</m:mi>
                                                <m:mi>s</m:mi>
                                                <m:mi>s</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                          <m:mo stretchy="false">)</m:mo>
                                       </m:mrow>
                                    </m:mfrac>
                                 </m:mrow>
                              </m:mstyle>
                           </m:mrow>
                           <m:mn>7</m:mn>
                        </m:mfrac>
                        <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                        <m:mrow>
                           <m:mo>(</m:mo>
                           <m:mn>2</m:mn>
                           <m:mo>)</m:mo>
                        </m:mrow>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGdbWqcqWGwbGvdaWgaaWcbaGaem4CamNaemiDaqNaemOCaiNaemyDauNaem4yamMaemiDaqNaemyDauNaemOCaiNaemyzaugabeaakiabg2da9maalaaabaWaaabmaeaadaWcaaqaaiabdohaZjabdsha0jabdsgaKjabdwgaLjabdAha2jabcIcaOiabd6eaonaaBaaaleaacqaIXaqmcqGGSaalcqWGJbWycqWGSbaBcqWGHbqycqWGZbWCcqWGZbWCaeqaaOGaeSOjGSKaemOta40aaSbaaSqaaiabiAda2iabcYcaSiabdogaJjabdYgaSjabdggaHjabdohaZjabdohaZbqabaGccqGGPaqkaeaacqWGnbqtcqWGLbqzcqWGHbqycqWGUbGBcqGGOaakcqWGobGtdaWgaaWcbaGaeGymaeJaeiilaWIaem4yamMaemiBaWMaemyyaeMaem4CamNaem4CamhabeaakiablAciljabd6eaonaaBaaaleaacqaI2aGncqGGSaalcqWGJbWycqWGSbaBcqWGHbqycqWGZbWCcqWGZbWCaeqaaOGaeiykaKcaaaWcbaGaem4yamMaemiBaWMaemyyaeMaem4CamNaem4CamNaeyypa0JaeGymaedabaGaeG4naCdaniabggHiLdaakeaacqaI3aWnaaGaaCzcaiaaxMaadaqadaqaaiabikdaYaGaayjkaiaawMcaaaaa@877C@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
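            <p>Equation 2 can be sketched in the same way: a coefficient of variation is computed for each of the 7 SCOP classes and the results are averaged. The domain counts below are hypothetical, the sample standard deviation is again an assumption, and the text does not state how a class with no domains in any proteome is handled, so this sketch simply rejects that case.</p>

```python
import statistics

SCOP_CLASSES = [
    "all-alpha", "all-beta", "alpha/beta", "alpha+beta",
    "multi-domain", "membrane", "small",
]


def coefficient_of_variation(counts):
    """Return stdev(counts) / mean(counts) for one SCOP class."""
    mean = statistics.mean(counts)
    if mean == 0:
        # Assumption: the source does not say how empty classes are treated.
        raise ValueError("CV is undefined when the mean count is zero")
    # Assumption: sample standard deviation (n - 1 denominator).
    return statistics.stdev(counts) / mean


def cv_structure(domain_counts):
    """Average the per-class CV over the 7 SCOP classes (Equation 2).

    domain_counts maps each class name to a list of domain counts,
    one entry per proteome.
    """
    per_class = [coefficient_of_variation(domain_counts[name])
                 for name in SCOP_CLASSES]
    return sum(per_class) / len(SCOP_CLASSES)


# Hypothetical domain counts for one role category in six proteomes.
varying = [8, 10, 12, 10, 8, 12]
constant = [5, 5, 5, 5, 5, 5]
example = {
    "all-alpha": varying,
    "all-beta": varying,
    "alpha/beta": varying,
    "alpha+beta": varying,
    "multi-domain": constant,
    "membrane": constant,
    "small": constant,
}
cv_structure_value = cv_structure(example)
print(round(cv_structure_value, 3))
```

            <p>Because each class contributes equally to the average, a sparsely populated class with erratic counts can raise CV<sub>structure </sub>as much as a heavily populated one.</p>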
            <p>CV<sub>structure </sub>was calculated separately for each functional role category, and these data are shown in Table <tblr tid="T2">2</tblr> and Figure <figr fid="F5">5A</figr>. The functional category with the lowest variation in the number of domains in each structural class is "Protein Synthesis," as would be expected from Figure <figr fid="F3">3</figr>. However, there are some interesting differences between the rankings based only on CV<sub>sequence </sub>and the rankings based on CV<sub>structure</sub>. For example, fold usage of proteins involved in biosynthesis of cofactors, carriers, and prosthetic groups varies more than the total number of these proteins in each proteome (fold-based rank 12 versus sequence-based rank 5). This implies that the repertoire of specific functions in this broad category is specialized to the particular needs of each organism, even though the overall number of such proteins varies relatively little. As expected, the distribution of structures in "catch-all" categories such as hypothetical and unclassified proteins is more varied than the distribution of structures found in more well-defined functional categories.</p>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Variation in fold usage between organisms differs between functional categories</p>
               </caption>
               <text>
                  <p><b>Variation in fold usage between organisms differs between functional categories</b>. A) Variation in fold usage (CV<sub>structure</sub>) between organisms within each TIGR role category is shown for each category that represents a cellular function. The data are also given in the "fold-based variation" column in Table 2. B) Variation in fold usage between minimal organisms only, excluding <it>E. coli </it>data as per Table 3.</p>
               </text>
               <graphic file="1472-6807-6-7-5"/>
            </fig>
            <p>We also analyzed the degree of variation using data from only the five near-complete minimal organisms, excluding data from <it>E. coli</it>. Results are shown in Table <tblr tid="T3">3</tblr> and Figure <figr fid="F5">5B</figr>. As before, fold usage of proteins in the "Protein synthesis" category shows the least variance of all functional categories. The total genome size also shows relatively little variation among minimal organisms, as has been observed previously <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. However, some functional categories show relatively more variation between minimal organisms than between minimal organisms and <it>E. coli</it>. For example, the cellular function categories "Cell envelope," "Central intermediary metabolism," and "Amino acid biosynthesis" all drop in rank (the relative degree of conservation in fold usage among functional categories) by 7 positions relative to Table <tblr tid="T2">2</tblr>, indicating higher diversity of folds in these functional categories among minimal organisms. In contrast, fold usage of proteins in the "Regulatory functions" category shows relatively less variation among minimal organisms than between minimal organisms and <it>E. coli</it>. This suggests that although the minimal organisms have lost many of the regulatory pathways unnecessary for survival in their relatively unchanging environments, they maintain a relatively conserved set of proteins responsible for common regulatory functions. A more thorough phylogenetic analysis of these proteins would be necessary to test this hypothesis.</p>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Variation within functional categories in minimal organisms. Which functional categories show the most variation in fold usage between minimal organisms? The data are calculated as in Table 2, but exclude data from <it>E. coli</it>. The structure-based variation when <it>E. coli </it>data are included (from Table 2) is provided for comparison.</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Category</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Average # of Proteins</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Fold-based variation (CV<sub>structure</sub>)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Fold-based variation, including <it>E. coli</it></b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Sequence-based variation (CV<sub>sequence</sub>)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Fold-based Rank/Sequence-based Rank</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Protein synthesis</p>
                     </c>
                     <c ca="center">
                        <p>95.2</p>
                     </c>
                     <c ca="center">
                        <p>0.108</p>
                     </c>
                     <c ca="center">
                        <p>0.141</p>
                     </c>
                     <c ca="center">
                        <p>0.039</p>
                     </c>
                     <c ca="center">
                        <p>1/1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Transcription</p>
                     </c>
                     <c ca="center">
                        <p>17.6</p>
                     </c>
                     <c ca="center">
                        <p>0.200</p>
                     </c>
                     <c ca="center">
                        <p>0.286</p>
                     </c>
                     <c ca="center">
                        <p>0.199</p>
                     </c>
                     <c ca="center">
                        <p>2/3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>All Proteins</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>606.8</p>
                     </c>
                     <c ca="center">
                        <p>0.210</p>
                     </c>
                     <c ca="center">
                        <p>1.061</p>
                     </c>
                     <c ca="center">
                        <p>0.178</p>
                     </c>
                     <c ca="center">
                        <p>3/2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>DNA metabolism</p>
                     </c>
                     <c ca="center">
                        <p>33.0</p>
                     </c>
                     <c ca="center">
                        <p>0.314</p>
                     </c>
                     <c ca="center">
                        <p>0.586</p>
                     </c>
                     <c ca="center">
                        <p>0.328</p>
                     </c>
                     <c ca="center">
                        <p>4/6</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Fatty acid and phospholipid metabolism</p>
                     </c>
                     <c ca="center">
                        <p>12.0</p>
                     </c>
                     <c ca="center">
                        <p>0.358</p>
                     </c>
                     <c ca="center">
                        <p>1.328</p>
                     </c>
                     <c ca="center">
                        <p>0.486</p>
                     </c>
                     <c ca="center">
                        <p>5/9</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Regulatory functions</p>
                     </c>
                     <c ca="center">
                        <p>7.2</p>
                     </c>
                     <c ca="center">
                        <p>0.402</p>
                     </c>
                     <c ca="center">
                        <p>1.427</p>
                     </c>
                     <c ca="center">
                        <p>0.465</p>
                     </c>
                     <c ca="center">
                        <p>6/8</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Purines, pyrimidines, nucleosides, and nucleotides</p>
                     </c>
                     <c ca="center">
                        <p>28.8</p>
                     </c>
                     <c ca="center">
                        <p>0.405</p>
                     </c>
                     <c ca="center">
                        <p>0.462</p>
                     </c>
                     <c ca="center">
                        <p>0.284</p>
                     </c>
                     <c ca="center">
                        <p>7/4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Protein fate</p>
                     </c>
                     <c ca="center">
                        <p>34.6</p>
                     </c>
                     <c ca="center">
                        <p>0.560</p>
                     </c>
                     <c ca="center">
                        <p>0.731</p>
                     </c>
                     <c ca="center">
                        <p>0.303</p>
                     </c>
                     <c ca="center">
                        <p>8/5</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Unknown function</p>
                     </c>
                     <c ca="center">
                        <p>27.8</p>
                     </c>
                     <c ca="center">
                        <p>0.776</p>
                     </c>
                     <c ca="center">
                        <p>1.659</p>
                     </c>
                     <c ca="center">
                        <p>0.555</p>
                     </c>
                     <c ca="center">
                        <p>9/12</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Transport and binding proteins</p>
                     </c>
                     <c ca="center">
                        <p>27.4</p>
                     </c>
                     <c ca="center">
                        <p>0.796</p>
                     </c>
                     <c ca="center">
                        <p>1.809</p>
                     </c>
                     <c ca="center">
                        <p>0.574</p>
                     </c>
                     <c ca="center">
                        <p>10/13</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Energy metabolism</p>
                     </c>
                     <c ca="center">
                        <p>59.4</p>
                     </c>
                     <c ca="center">
                        <p>0.799</p>
                     </c>
                     <c ca="center">
                        <p>1.276</p>
                     </c>
                     <c ca="center">
                        <p>0.454</p>
                     </c>
                     <c ca="center">
                        <p>11/7</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Biosynthesis of cofactors, prosthetic groups, and carriers</p>
                     </c>
                     <c ca="center">
                        <p>38.6</p>
                     </c>
                     <c ca="center">
                        <p>0.816</p>
                     </c>
                     <c ca="center">
                        <p>1.332</p>
                     </c>
                     <c ca="center">
                        <p>0.666</p>
                     </c>
                     <c ca="center">
                        <p>12/15</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Amino acid biosynthesis</p>
                     </c>
                     <c ca="center">
                        <p>29.0</p>
                     </c>
                     <c ca="center">
                        <p>0.844</p>
                     </c>
                     <c ca="center">
                        <p>0.935</p>
                     </c>
                     <c ca="center">
                        <p>0.782</p>
                     </c>
                     <c ca="center">
                        <p>13/17</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Cellular processes</p>
                     </c>
                     <c ca="center">
                        <p>29.8</p>
                     </c>
                     <c ca="center">
                        <p>0.853</p>
                     </c>
                     <c ca="center">
                        <p>1.364</p>
                     </c>
                     <c ca="center">
                        <p>0.636</p>
                     </c>
                     <c ca="center">
                        <p>14/14</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Cell envelope</p>
                     </c>
                     <c ca="center">
                        <p>45.8</p>
                     </c>
                     <c ca="center">
                        <p>0.893</p>
                     </c>
                     <c ca="center">
                        <p>1.099</p>
                     </c>
                     <c ca="center">
                        <p>0.506</p>
                     </c>
                     <c ca="center">
                        <p>15/10</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Central intermediary metabolism</p>
                     </c>
                     <c ca="center">
                        <p>15.4</p>
                     </c>
                     <c ca="center">
                        <p>0.952</p>
                     </c>
                     <c ca="center">
                        <p>1.228</p>
                     </c>
                     <c ca="center">
                        <p>0.552</p>
                     </c>
                     <c ca="center">
                        <p>16/11</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Unclassified</p>
                     </c>
                     <c ca="center">
                        <p>30.0</p>
                     </c>
                     <c ca="center">
                        <p>1.006</p>
                     </c>
                     <c ca="center">
                        <p>2.020</p>
                     </c>
                     <c ca="center">
                        <p>0.749</p>
                     </c>
                     <c ca="center">
                        <p>17/16</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Hypothetical proteins</p>
                     </c>
                     <c ca="center">
                        <p>70.6</p>
                     </c>
                     <c ca="center">
                        <p>1.125</p>
                     </c>
                     <c ca="center">
                        <p>1.984</p>
                     </c>
                     <c ca="center">
                        <p>0.871</p>
                     </c>
                     <c ca="center">
                        <p>18/18</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
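            <p>The rank column of Table 3 can be reproduced from the two CV columns. The following is a minimal sketch, assuming that ranks are assigned in ascending order of CV (rank 1 is the least variable category) and that the "All Proteins" row is ranked together with the individual categories; both assumptions are inferred from the tabulated values rather than stated as the method. The helper names <b>rank_by</b> and <b>rank_pairs</b> are illustrative only.</p>

```python
# Rows copied from Table 3: (category, CV_structure, CV_sequence).
TABLE3 = [
    ("Protein synthesis", 0.108, 0.039),
    ("Transcription", 0.200, 0.199),
    ("All Proteins", 0.210, 0.178),
    ("DNA metabolism", 0.314, 0.328),
    ("Fatty acid and phospholipid metabolism", 0.358, 0.486),
    ("Regulatory functions", 0.402, 0.465),
    ("Purines, pyrimidines, nucleosides, and nucleotides", 0.405, 0.284),
    ("Protein fate", 0.560, 0.303),
    ("Unknown function", 0.776, 0.555),
    ("Transport and binding proteins", 0.796, 0.574),
    ("Energy metabolism", 0.799, 0.454),
    ("Biosynthesis of cofactors, prosthetic groups, and carriers", 0.816, 0.666),
    ("Amino acid biosynthesis", 0.844, 0.782),
    ("Cellular processes", 0.853, 0.636),
    ("Cell envelope", 0.893, 0.506),
    ("Central intermediary metabolism", 0.952, 0.552),
    ("Unclassified", 1.006, 0.749),
    ("Hypothetical proteins", 1.125, 0.871),
]


def rank_by(rows, column):
    """Map each category to its 1-based rank, lowest value in `column` first."""
    ordered = sorted(rows, key=lambda row: row[column])
    return {row[0]: position for position, row in enumerate(ordered, start=1)}


def rank_pairs(rows):
    """Return 'fold rank/sequence rank' strings keyed by category name."""
    fold_rank = rank_by(rows, 1)  # column 1 holds CV_structure
    seq_rank = rank_by(rows, 2)   # column 2 holds CV_sequence
    return {row[0]: f"{fold_rank[row[0]]}/{seq_rank[row[0]]}" for row in rows}


if __name__ == "__main__":
    for name, pair in rank_pairs(TABLE3).items():
        print(pair.rjust(6), name)
```

            <p>No two categories share a CV value in either column, so the sketch does not need a tie-breaking rule; one would have to be chosen before applying it to other data.</p>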
         </sec>
         <sec>
            <st>
               <p>Common and overrepresented folds in minimal organisms</p>
            </st>
            <p>We examined the most common protein folds (as defined in SCOP 1.67) in minimal organisms; results are shown in Table <tblr tid="T4">4</tblr>. Four of the eleven most common SCOP folds (TIM barrel, nucleoside triphosphate hydrolase, flavodoxin-like, and ferredoxin-like) are among the nine superfolds originally described by Orengo and colleagues as scaffolds that can support a wide array of molecular functions <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. However, all eleven folds have fewer copies per genome, on average, in minimal organisms than in <it>E. coli</it>: every ratio in Table <tblr tid="T4">4</tblr> is below 1.</p>
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>Most common SCOP folds in minimal organisms. Which SCOP folds are most common in minimal organisms? The first column gives the name and SCOP sccs identifier for folds classified in SCOP 1.67. The second column gives the total number of domains assigned to each fold among the five minimal organisms. The third column is calculated as the average number of domains among the five minimal organisms studied that were assigned to each fold, divided by the number of domains in <it>E. coli</it> assigned to the same fold.</p>
               </caption>
               <tblbdy cols="3">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Fold Name</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Number</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Ratio</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>P-loop containing nucleoside triphosphate hydrolases (c.37)</p>
                     </c>
                     <c ca="center">
                        <p>319</p>
                     </c>
                     <c ca="center">
                        <p>0.23</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>TIM beta/alpha-barrel (c.1)</p>
                     </c>
                     <c ca="center">
                        <p>115</p>
                     </c>
                     <c ca="center">
                        <p>0.14</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>OB (Oligonucleotide/oligosaccharide-binding) fold (b.40)</p>
                     </c>
                     <c ca="center">
                        <p>108</p>
                     </c>
                     <c ca="center">
                        <p>0.34</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Ferredoxin-like (d.58)</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>0.15</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Adenine nucleotide alpha hydrolase-like (c.26)</p>
                     </c>
                     <c ca="center">
                        <p>92</p>
                     </c>
                     <c ca="center">
                        <p>0.40</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Ribonuclease H-like motif (c.55)</p>
                     </c>
                     <c ca="center">
                        <p>79</p>
                     </c>
                     <c ca="center">
                        <p>0.16</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>NAD(P)-binding Rossmann-fold domains (c.2)</p>
                     </c>
                     <c ca="center">
                        <p>75</p>
                     </c>
                     <c ca="center">
                        <p>0.12</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Class II aaRS and biotin synthetases (d.104)</p>
                     </c>
                     <c ca="center">
                        <p>56</p>
                     </c>
                     <c ca="center">
                        <p>0.75</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>DNA/RNA-binding 3-helical bundle (a.4)</p>
                     </c>
                     <c ca="center">
                        <p>53</p>
                     </c>
                     <c ca="center">
                        <p>0.04</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Reductase/isomerase/elongation factor common domain (b.43)</p>
                     </c>
                     <c ca="center">
                        <p>51</p>
                     </c>
                     <c ca="center">
                        <p>0.43</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Flavodoxin-like (c.23)</p>
                     </c>
                     <c ca="center">
                        <p>51</p>
                     </c>
                     <c ca="center">
                        <p>0.11</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
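            <p>The Ratio column described in the table caption is the mean number of domains per minimal organism divided by the number of domains with the same fold in <it>E. coli</it>. The sketch below illustrates that arithmetic. The function name <b>fold_ratio</b> is hypothetical, and the <it>E. coli</it> counts passed in the usage lines are assumptions chosen to be consistent with the published ratios, because the tables report only the totals and the ratios.</p>

```python
def fold_ratio(total_in_minimal, ecoli_count, n_organisms=5):
    """Mean domains per minimal organism divided by the E. coli domain count.

    total_in_minimal: domains assigned to the fold, summed over the minimal
        organisms (the "Number" column).
    ecoli_count: domains assigned to the same fold in E. coli (not tabulated
        in the article; supplied by the caller).
    n_organisms: number of minimal organisms averaged over (five in this study).
    """
    if ecoli_count == 0:
        # Folds absent from E. coli have no defined ratio.
        raise ValueError("ratio is undefined when the E. coli count is zero")
    return (total_in_minimal / n_organisms) / ecoli_count


if __name__ == "__main__":
    # A total of 7 domains with an assumed E. coli count of 1.
    print(round(fold_ratio(7, 1), 2))
    # A total of 16 domains with an assumed E. coli count of 3.
    print(round(fold_ratio(16, 3), 2))
```

            <p>Because the published ratios are rounded, the <it>E. coli</it> count cannot always be recovered exactly by inverting this calculation.</p>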
            <p>Table <tblr tid="T5">5</tblr> shows SCOP folds that are found in both the minimal organisms and <it>E. coli</it> and that are represented at equal or greater levels in the minimal organisms. Proteins with these folds are presumably important for the survival of the organisms and so were not eliminated during reductive evolution. Five SCOP folds are present in slightly greater numbers, on average, in minimal organisms than in <it>E. coli</it> (ratios of 1.1 to 1.4). For example, the DNA primase core fold (e.13) has 3 representatives in <it>M. genitalium</it>: the DNA primase protein itself (dnaE) and two conserved hypothetical proteins (NP_072670 and NP_072719). All five folds are involved in the critical functions of transcription, translation, or DNA replication. Forty-two other SCOP folds are present in the same numbers in each minimal genome as in <it>E. coli</it>. The five of these with the largest number of copies per genome are shown in Table <tblr tid="T5">5</tblr>. Some appear to be key metabolic enzymes, while others are involved in transcription, translation, or DNA replication.</p>
            <tbl id="T5">
               <title>
                  <p>Table 5</p>
               </title>
               <caption>
                  <p>Over-represented SCOP folds in minimal organisms. Which SCOP folds are most over-represented in minimal organisms, relative to <it>E. coli</it>? The first column gives the name and SCOP sccs identifier for folds from SCOP 1.67. The second column gives the total number of domains with each fold among the five minimal organisms. The third column is calculated as the average number of domains among the five minimal organisms studied that were assigned to each fold, divided by the number of domains in <it>E. coli</it> assigned to the same fold. Thirty-seven other folds also have a ratio of 1.0, with one representative in each minimal organism.</p>
               </caption>
               <tblbdy cols="3">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Fold Name</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Number</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Ratio</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>DNA primase core (e.13)</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>1.4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>An anticodon-binding domain of class I aminoacyl-tRNA synthetases (a.97)</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>1.2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Head domain of nucleotide exchange factor GrpE (b.73)</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>1.2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Ribosomal proteins L23 and L15e (d.12)</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>1.2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>DNA clamp (d.131)</p>
                     </c>
                     <c ca="center">
                        <p>16</p>
                     </c>
                     <c ca="center">
                        <p>1.1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>ValRS/IleRS/LeuRS editing domain (b.51)</p>
                     </c>
                     <c ca="center">
                        <p>15</p>
                     </c>
                     <c ca="center">
                        <p>1.0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>S-adenosylmethionine synthetase (d.130)</p>
                     </c>
                     <c ca="center">
                        <p>15</p>
                     </c>
                     <c ca="center">
                        <p>1.0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Dihydrofolate reductases (c.71)</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>1.0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Ribosomal protein L6 (d.141)</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>1.0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>beta and beta-prime subunits of DNA dependent RNA-polymerase (e.29)</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>1.0</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>Interestingly, all 47 SCOP folds that are present in every minimal organism in numbers equal to or greater than those in <it>E. coli</it> are also folds for which only a single superfamily is characterized in SCOP; i.e., all proteins sharing the fold are also annotated as evolutionarily related to each other. Multiple superfamilies sharing one fold may arise from two alternative causes: convergent evolution of two or more families to one fold, or a single family that has diverged so far that homology between different branches of the family is no longer evident even from structure (in this case, each branch would be classified as a different superfamily in SCOP). These data imply that proteins important enough to escape elimination during reductive evolution have also diverged less than other protein families, presumably because of the same evolutionary pressure.</p>
            <p>An additional set of SCOP folds found only in minimal organisms and not in <it>E. coli</it> is given in Table <tblr tid="T6">6</tblr>. None of these folds is found in all five minimal organisms, and the corresponding proteins are generally not related to essential cellular functions such as transcription, translation, or replication. Some are presumably adaptations to the specific environment of the organism, and several (e.g., viral coat and capsid proteins, and the MHC antigen-recognition domain) are not typically found in bacteria. These may represent lateral gene transfers or erroneous annotations.</p>
            <tbl id="T6">
               <title>
                  <p>Table 6</p>
               </title>
               <caption>
                  <p>SCOP folds in minimal organisms but not <it>E. coli</it>. Which SCOP folds are found in minimal organisms, but not <it>E. coli</it>? The total number of domains from all five minimal organisms that were assigned to each fold is given in the second column.</p>
               </caption>
               <tblbdy cols="2">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Fold Name</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Number</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>alpha-2-Macroglobulin receptor associated protein (RAP) domain (a.13)</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>STAT-like (a.47)</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Annexin (a.65)</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>DBL homology domain (DH-domain) (a.87)</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Non-globular all-alpha subunits of globular proteins (a.137)</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>GatB/YqeY domain (a.182)</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>gamma-Crystallin-like (b.11)</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>SMAD/FHA domain (b.26)</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Sortase (b.100)</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>C-terminal autoproteolytic domain of nucleoporin nup98 (b.119)</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Nucleoplasmin-like/VP (viral coat and capsid proteins) (b.121)</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Hypothetical protein TM1070 (b.123)</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Hypothetical protein YojF (b.128)</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Amidase signature (AS) enzymes (c.117)</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>DegV-like (c.119)</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Urease, gamma-subunit (d.8)</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Penicillin-binding protein 2x (pbp-2x), c-terminal domain (d.11)</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>MHC antigen-recognition domain (d.19)</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Thymidylate synthase-complementing protein Thy1 (d.207)</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Smc hinge domain (d.215)</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Polo-box domain (d.223)</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>After five years of progress in structural genomics, near-complete structural complements of the soluble proteins of several "minimal organisms" are now known. A complete set of fold assignments for nearly all soluble, globular proteins in a proteome is providing a global view of how minimal organisms are using various protein fold classes for different cellular functions and how the fold usage in each class is conserved.</p>
         <p>Data from near-complete structural proteomes can yield hypotheses on protein evolution at a global level. Simple statistical analyses of the variation in numbers of structures in each structural and functional category can shed light on which functional categories are more or less conserved in minimal organisms. For example, the functional categories that showed the least variability in both sequence- and structure-based analyses were involved in essential cellular functions such as transcription and translation. Furthermore, every SCOP fold identified in equal or greater numbers in minimal organisms as in <it>E. coli </it>was the product of a single protein family, indicating that the proteins retained during reductive evolution of minimal organisms also tend to be from slow-evolving families. The latter observation was expected, as essential genes in other species have previously been shown to evolve more slowly than non-essential genes <abbrgrp><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr></abbrgrp>.</p>
         <p>Such observations may be followed up with more detailed studies based on phylogenetic modeling of protein families <abbrgrp><abbr bid="B43">43</abbr></abbrgrp> or the construction of atomic models of proteins in those categories. Detailed atomic modeling of all proteins in a biochemical pathway will be useful to study the plasticity of these pathways in response to evolutionary pressures imposed by different organisms' environments <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Databases</p>
            </st>
            <p>Our database of known protein structures, knownstr, was created on 22 Feb 2005. This database contained sequences of every protein chain released by the PDB <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>, including those of obsolete entries, sequences of proteins deposited in the PDB and made available while the structures were still on hold, and sequences from TargetDB <abbrgrp><abbr bid="B46">46</abbr></abbrgrp> for which a structure had been solved by a participating structural genomics center.</p>
            <p>Pfam <abbrgrp><abbr bid="B47">47</abbr></abbrgrp> classification of known structures was evaluated using Pfam version 16.0. The HMMER tool (version 2.3.2) <abbrgrp><abbr bid="B48">48</abbr></abbrgrp> was used to compare the Pfam_ls library of hidden Markov models to the knownstr database, using the family-specific "trusted cutoff" score as a cutoff for assigning significance.</p>
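            <p>As a minimal sketch of the family-specific cutoff step, assuming the HMMER output has already been parsed into simple records, the filtering can be written as below. The names Hit, significant_hits and trusted_cutoff are hypothetical and are not part of HMMER or Pfam; the example only illustrates comparing each score with the trusted cutoff of its own family.</p>

```python
from typing import Dict, List, NamedTuple


class Hit(NamedTuple):
    """One parsed match between a Pfam model and a protein chain (hypothetical record)."""
    chain_id: str   # identifier of the protein chain in knownstr
    family: str     # Pfam family accession
    score: float    # bit score reported for the match


def significant_hits(hits: List[Hit], trusted_cutoff: Dict[str, float]) -> List[Hit]:
    """Keep hits whose score meets the trusted cutoff of their own family."""
    kept = []
    for hit in hits:
        cutoff = trusted_cutoff.get(hit.family)
        # Families without a known cutoff are skipped rather than guessed.
        if cutoff is not None and hit.score >= cutoff:
            kept.append(hit)
    return kept
```

            <p>Because the cutoff is looked up per family, a score that is significant for one family may be rejected for another.</p>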
            <p>Integr8 version 12 <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> was used for sequence data. The Integr8 database contains data for 238 complete proteomes, including 19 eukaryotes. The proteome for each organism is composed of proteins curated from the Swiss-Prot and TrEMBL databases. All proteins were annotated with hidden Markov models <abbrgrp><abbr bid="B48">48</abbr><abbr bid="B49">49</abbr></abbrgrp> from the InterPro <abbrgrp><abbr bid="B50">50</abbr></abbrgrp> database. Since InterPro includes models from Pfam, we used the supplied InterPro annotations to map Pfam domains onto each protein. The version of InterPro used to annotate Integr8 version 12 includes Pfam 16.0.</p>
            <p>SUPERFAMILY <abbrgrp><abbr bid="B51">51</abbr></abbrgrp> version 1.67 contains hidden Markov models based on superfamilies from the SCOP database <abbrgrp><abbr bid="B38">38</abbr><abbr bid="B52">52</abbr></abbrgrp>, also version 1.67. Recent versions of SUPERFAMILY <abbrgrp><abbr bid="B53">53</abbr></abbrgrp> provide pre-calculated annotations of genomes downloaded from NCBI with all the superfamily models. We used these pre-calculated annotations to assign SCOP domains to sequences from minimal organisms and <it>E. coli</it>, as described below. The false positive rate for SUPERFAMILY annotations is estimated to be less than 1% <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>.</p>
            <p>The Comprehensive Microbial Resource <abbrgrp><abbr bid="B40">40</abbr></abbrgrp> contains annotations of TIGR role categories in its OMNIOME database. We obtained TIGR role annotations from the version of OMNIOME downloaded on 12 May 2005. Of 19 TIGR role categories, two ("signal transduction" and "other categories") were found in low average abundance in the proteomes we analyzed (averaging 0.7 and 9.0 proteins per proteome, respectively), and these categories were excluded from our analysis. The remaining 17 categories are listed in Table <tblr tid="T2">2</tblr>.</p>
         </sec>
         <sec>
            <st>
               <p>Mapping annotations</p>
            </st>
            <p>To use annotations from the SUPERFAMILY and OMNIOME databases, we mapped proteins from the Integr8 database onto corresponding proteins in the NCBI and CMR Locus databases, respectively. In most cases, this was done by mapping identical sequences from the corresponding genome. However, in some cases, the gene or ORF annotations of the same genome varied between the databases, resulting in different protein sequences. In these cases, we used BLAST <abbrgrp><abbr bid="B55">55</abbr></abbrgrp> version 2.2.9 to map each Integr8 protein that could not be mapped by direct sequence match to its most significant BLAST hit in the other database, provided the BLAST E-value of the hit was at least as significant as an empirically chosen threshold of 10<sup>-10</sup>. An average of 16.3 proteins in each proteome could not be mapped to any of the functional categories in OMNIOME and were not included in this analysis.</p>
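            <p>The two-stage mapping described above can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: the function map_protein and its arguments are hypothetical, and the BLAST search is assumed to have been run beforehand, with the most significant hit for each query stored in a dictionary.</p>

```python
from typing import Dict, Optional, Tuple

E_VALUE_THRESHOLD = 1e-10  # empirically chosen threshold quoted in the text


def map_protein(
    query_id: str,
    query_seq: str,
    target_by_seq: Dict[str, str],
    best_blast_hit: Dict[str, Tuple[str, float]],
) -> Optional[str]:
    """Return the target identifier for one Integr8 protein, or None if unmapped.

    target_by_seq maps a target sequence to its identifier (hypothetical index).
    best_blast_hit maps a query identifier to (target identifier, E-value) for
    its most significant precomputed BLAST hit (hypothetical input).
    """
    # Step 1: direct match on an identical sequence.
    exact = target_by_seq.get(query_seq)
    if exact is not None:
        return exact
    # Step 2: fall back to the most significant BLAST hit, if there is one.
    hit = best_blast_hit.get(query_id)
    if hit is None:
        return None
    target_id, evalue = hit
    # Accept the hit only if it is at least as significant as the threshold.
    return target_id if E_VALUE_THRESHOLD >= evalue else None
```

            <p>Proteins for which the function returns None correspond to those that could not be mapped and were left out of the analysis.</p>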
         </sec>
         <sec>
            <st>
               <p>Predicting tractability in high-throughput experiments</p>
            </st>
            <p>We identified all proteins with a predicted transmembrane helix, or with 20% or more residues in low complexity regions, or with 20% or more residues in coiled coil regions, as likely to be intractable in high-throughput experiments. Other proteins were annotated as soluble, globular proteins. The 20% thresholds were used in more recent target selection rounds at the Berkeley Structural Genomics Center <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. Similar thresholds have also been justified by recent comprehensive crystallization trials on the <it>Thermotoga maritima </it>proteome <abbrgrp><abbr bid="B56">56</abbr></abbrgrp>.</p>
            <p>The "seg" program <abbrgrp><abbr bid="B57">57</abbr></abbrgrp> (version dated 5/24/2000) was run on all sequences in Integr8 to identify putative low complexity regions. The "ccp" program <abbrgrp><abbr bid="B58">58</abbr></abbrgrp> (version dated 6/14/1998) was used to predict coiled coil regions in all sequences, and TMHMM 2.0a <abbrgrp><abbr bid="B59">59</abbr></abbrgrp> was used to predict the locations of transmembrane helices. TMHMM can distinguish between soluble and membrane proteins with both specificity and sensitivity greater than 99%, but frequently produces false positive predictions when signal peptides are present. Default options were used for all programs.</p>
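            <p>The tractability rule in this section can be illustrated with a short sketch. It assumes the per-protein counts have already been extracted from the output of the three prediction programs; the ProteinFeatures record and the is_tractable function are hypothetical names introduced only for this example.</p>

```python
from dataclasses import dataclass

PERCENT_THRESHOLD = 20  # the 20% cutoff stated in the text


@dataclass
class ProteinFeatures:
    """Per-protein summary of predictions (hypothetical record)."""
    length: int                   # number of residues
    tm_helices: int               # predicted transmembrane helices
    low_complexity_residues: int  # residues in low complexity regions
    coiled_coil_residues: int     # residues in predicted coiled coil regions


def is_tractable(p: ProteinFeatures) -> bool:
    """Return True if the protein would be annotated as soluble and globular."""
    if p.length == 0:
        raise ValueError("protein has no residues")
    # Any predicted transmembrane helix marks the protein as intractable.
    if p.tm_helices >= 1:
        return False
    # Integer arithmetic avoids rounding issues exactly at the 20% boundary.
    if 100 * p.low_complexity_residues >= PERCENT_THRESHOLD * p.length:
        return False
    if 100 * p.coiled_coil_residues >= PERCENT_THRESHOLD * p.length:
        return False
    return True
```

            <p>Each criterion is tested independently, so a protein with 19% low complexity and 19% coiled coil residues is still classified as tractable in this sketch.</p>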
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>JMC designed the study, carried out the analyses, and drafted the manuscript. SHK and JMC jointly developed the conceptual design of the study and interpreted the results, and SHK helped draft the manuscript. Both authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>This work is supported by grants from the NIH (1-P50-GM62412) and the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Protein function in the post-genomic era</p>
            </title>
            <aug>
               <au>
                  <snm>Eisenberg</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Marcotte</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Xenarios</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Yeates</snm>
                  <fnm>TO</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2000</pubdate>
            <volume>405</volume>
            <fpage>823</fpage>
            <lpage>826</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35015694</pubid>
                  <pubid idtype="pmpid" link="fulltext">10866208</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Integr8 and Genome Reviews: integrated views of complete genomes and proteomes</p>
            </title>
            <aug>
               <au>
                  <snm>Kersey</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Bower</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Morris</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Horne</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Petryszak</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Kanz</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Kanapin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Das</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Michoud</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Phan</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Gattiker</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kulikova</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Faruque</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Duggan</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>McLaren</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Reimholz</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Duret</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Penel</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Reuter</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <fpage>D297</fpage>
            <lpage>302</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">539993</pubid>
                  <pubid idtype="pmpid" link="fulltext">15608201</pubid>
                  <pubid idtype="doi">10.1093/nar/gki039</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Structural genomics</p>
            </title>
            <aug>
               <au>
                  <snm>Burley</snm>
                  <fnm>SK</fnm>
               </au>
               <au>
                  <snm>Bonanno</snm>
                  <fnm>JB</fnm>
               </au>
            </aug>
            <source>Methods Biochem Anal</source>
            <pubdate>2003</pubdate>
            <volume>44</volume>
            <fpage>591</fpage>
            <lpage>612</lpage>
            <xrefbib>
               <pubid idtype="pmpid">12647406</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Structural genomics: an overview</p>
            </title>
            <aug>
               <au>
                  <snm>Blundell</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Mizuguchi</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Prog Biophys Mol Biol</source>
            <pubdate>2000</pubdate>
            <volume>73</volume>
            <fpage>289</fpage>
            <lpage>295</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0079-6107(00)00008-0</pubid>
                  <pubid idtype="pmpid" link="fulltext">11063776</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>A tour of structural genomics</p>
            </title>
            <aug>
               <au>
                  <snm>Brenner</snm>
                  <fnm>SE</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2001</pubdate>
            <volume>2</volume>
            <fpage>801</fpage>
            <lpage>809</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35093574</pubid>
                  <pubid idtype="pmpid" link="fulltext">11584296</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Structural genomics: an approach to the protein folding problem</p>
            </title>
            <aug>
               <au>
                  <snm>Montelione</snm>
                  <fnm>GT</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <fpage>13488</fpage>
            <lpage>13489</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">61067</pubid>
                  <pubid idtype="pmpid" link="fulltext">11717420</pubid>
                  <pubid idtype="doi">10.1073/pnas.261549098</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Structural genomics: a pipeline for providing structures for the biologist</p>
            </title>
            <aug>
               <au>
                  <snm>Chance</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Bresnick</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Burley</snm>
                  <fnm>SK</fnm>
               </au>
               <au>
                  <snm>Jiang</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Lima</snm>
                  <fnm>CD</fnm>
               </au>
               <au>
                  <snm>Sali</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Almo</snm>
                  <fnm>SC</fnm>
               </au>
               <au>
                  <snm>Bonanno</snm>
                  <fnm>JB</fnm>
               </au>
               <au>
                  <snm>Buglino</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Boulton</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Eswar</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>He</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Ilyin</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>McMahan</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Pieper</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Ray</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Vidal</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>LK</fnm>
               </au>
            </aug>
            <source>Protein Sci</source>
            <pubdate>2002</pubdate>
            <volume>11</volume>
            <fpage>723</fpage>
            <lpage>738</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1110/ps.4570102</pubid>
                  <pubid idtype="pmpid" link="fulltext">11910018</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Progress of structural genomics initiatives: an analysis of solved target structures</p>
            </title>
            <aug>
               <au>
                  <snm>Todd</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Marsden</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Orengo</snm>
                  <fnm>CA</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2005</pubdate>
            <volume>348</volume>
            <fpage>1235</fpage>
            <lpage>1260</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.jmb.2005.03.037</pubid>
                  <pubid idtype="pmpid" link="fulltext">15854658</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Implications of structural genomics target selection strategies: Pfam5000, whole genome, and random approaches</p>
            </title>
            <aug>
               <au>
                  <snm>Chandonia</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Brenner</snm>
                  <fnm>SE</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>2005</pubdate>
            <volume>58</volume>
            <fpage>166</fpage>
            <lpage>179</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/prot.20298</pubid>
                  <pubid idtype="pmpid" link="fulltext">15521074</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>ANDY: a general, fault-tolerant tool for database searching on computer clusters</p>
            </title>
            <aug>
               <au>
                  <snm>Smith</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Chandonia</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Brenner</snm>
                  <fnm>SE</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16397008</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Whole-genome random sequencing and assembly of Haemophilus influenzae Rd</p>
            </title>
            <aug>
               <au>
                  <snm>Fleischmann</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Adams</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Clayton</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Kirkness</snm>
                  <fnm>EF</fnm>
               </au>
               <au>
                  <snm>Kerlavage</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Bult</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Tomb</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Dougherty</snm>
                  <fnm>BA</fnm>
               </au>
               <au>
                  <snm>Merrick</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Sutton</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>FitzHugh</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Fields</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Gocayne</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Scott</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Shirley</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>LI</fnm>
               </au>
               <au>
                  <snm>Glodek</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kelley</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Weidman</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Phillips</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Spriggs</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Hedblom</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Cotton</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Utterback</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Hanna</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Nguyen</snm>
                  <fnm>DT</fnm>
               </au>
               <au>
                  <snm>Saudek</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Brandon</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>Fine</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>Fritchman</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Fuhrmann</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Geoghagen</snm>
                  <fnm>NSM</fnm>
               </au>
               <au>
                  <snm>Gnehm</snm>
                  <fnm>LC</fnm>
               </au>
               <au>
                  <snm>McDonald</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Small</snm>
                  <fnm>KV</fnm>
               </au>
               <au>
                  <snm>Fraser</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>HO</fnm>
               </au>
               <au>
                  <snm>Venter</snm>
                  <fnm>JC</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1995</pubdate>
            <volume>269</volume>
            <fpage>496</fpage>
            <lpage>512</lpage>
            <xrefbib>
               <pubid idtype="pmpid">7542800</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Complete sequence analysis of the genome of the bacterium Mycoplasma pneumoniae</p>
            </title>
            <aug>
               <au>
                  <snm>Himmelreich</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hilbert</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Plagens</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Pirkl</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>BC</fnm>
               </au>
               <au>
                  <snm>Herrmann</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1996</pubdate>
            <volume>24</volume>
            <fpage>4420</fpage>
            <lpage>4449</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">146264</pubid>
                  <pubid idtype="pmpid" link="fulltext">8948633</pubid>
                  <pubid idtype="doi">10.1093/nar/24.22.4420</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>The minimal gene complement of Mycoplasma genitalium</p>
            </title>
            <aug>
               <au>
                  <snm>Fraser</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Gocayne</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Adams</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Clayton</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Fleischmann</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Bult</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Kerlavage</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Sutton</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Kelley</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Fritchman</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Weidman</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Small</snm>
                  <fnm>KV</fnm>
               </au>
               <au>
                  <snm>Sandusky</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Fuhrman</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Utterback</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Saudek</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Phillips</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Merrick</snm>
                  <fnm>JN</fnm>
               </au>
               <au>
                  <snm>Tomb</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Dougherty</snm>
                  <fnm>BA</fnm>
               </au>
               <au>
                  <snm>Bott</snm>
                  <fnm>KF</fnm>
               </au>
               <au>
                  <snm>Hu</snm>
                  <fnm>PC</fnm>
               </au>
               <au>
                  <snm>Lucier</snm>
                  <fnm>TS</fnm>
               </au>
               <au>
                  <snm>Peterson</snm>
                  <fnm>SN</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>HO</fnm>
               </au>
               <au>
                  <snm>Hutchison</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Venter</snm>
                  <fnm>JC</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1995</pubdate>
            <volume>270</volume>
            <fpage>397</fpage>
            <lpage>403</lpage>
            <xrefbib>
               <pubid idtype="pmpid">7569993</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>How many genes can make a cell: the minimal-gene-set concept</p>
            </title>
            <aug>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Annu Rev Genomics Hum Genet</source>
            <pubdate>2000</pubdate>
            <volume>1</volume>
            <fpage>99</fpage>
            <lpage>116</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.genom.1.1.99</pubid>
                  <pubid idtype="pmpid" link="fulltext">11701626</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>A survey of the Mycoplasma genitalium genome by using random sequencing</p>
            </title>
            <aug>
               <au>
                  <snm>Peterson</snm>
                  <fnm>SN</fnm>
               </au>
               <au>
                  <snm>Hu</snm>
                  <fnm>PC</fnm>
               </au>
               <au>
                  <snm>Bott</snm>
                  <fnm>KF</fnm>
               </au>
               <au>
                  <snm>Hutchison</snm>
                  <fnm>CA</fnm>
                  <suf>3rd</suf>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>1993</pubdate>
            <volume>175</volume>
            <fpage>7918</fpage>
            <lpage>7930</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">206970</pubid>
                  <pubid idtype="pmpid">8253680</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Global transposon mutagenesis and a minimal Mycoplasma genome</p>
            </title>
            <aug>
               <au>
                  <snm>Hutchison</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Peterson</snm>
                  <fnm>SN</fnm>
               </au>
               <au>
                  <snm>Gill</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Cline</snm>
                  <fnm>RT</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Fraser</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>HO</fnm>
               </au>
               <au>
                  <snm>Venter</snm>
                  <fnm>JC</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1999</pubdate>
            <volume>286</volume>
            <fpage>2165</fpage>
            <lpage>2169</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.286.5447.2165</pubid>
                  <pubid idtype="pmpid" link="fulltext">10591650</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Sequencing and analysis of bacterial genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
               <au>
                  <snm>Mushegian</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Rudd</snm>
                  <fnm>KE</fnm>
               </au>
            </aug>
            <source>Curr Biol</source>
            <pubdate>1996</pubdate>
            <volume>6</volume>
            <fpage>404</fpage>
            <lpage>416</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0960-9822(02)00508-0</pubid>
                  <pubid idtype="pmpid" link="fulltext">8723345</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Novelties from the complete genome of Mycoplasma genitalium</p>
            </title>
            <aug>
               <au>
                  <snm>Ouzounis</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Casari</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Valencia</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sander</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Mol Microbiol</source>
            <pubdate>1996</pubdate>
            <volume>20</volume>
            <fpage>898</fpage>
            <lpage>900</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8793887</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Sources of systematic error in functional annotation of genomes: domain rearrangement, non-orthologous gene displacement and operon disruption</p>
            </title>
            <aug>
               <au>
                  <snm>Galperin</snm>
                  <fnm>MY</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>In Silico Biol</source>
            <pubdate>1998</pubdate>
            <volume>1</volume>
            <fpage>55</fpage>
            <lpage>67</lpage>
            <xrefbib>
               <pubid idtype="pmpid">11471243</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Errors in genome annotation</p>
            </title>
            <aug>
               <au>
                  <snm>Brenner</snm>
                  <fnm>SE</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>1999</pubdate>
            <volume>15</volume>
            <fpage>132</fpage>
            <lpage>133</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(99)01706-0</pubid>
                  <pubid idtype="pmpid" link="fulltext">10203816</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Proteomics of Mycoplasma genitalium: identification and characterization of unannotated and atypical proteins in a small model genome</p>
            </title>
            <aug>
               <au>
                  <snm>Balasubramanian</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Schneider</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Gerstein</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Regan</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2000</pubdate>
            <volume>28</volume>
            <fpage>3075</fpage>
            <lpage>3082</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">108442</pubid>
                  <pubid idtype="pmpid" link="fulltext">10931922</pubid>
                  <pubid idtype="doi">10.1093/nar/28.16.3075</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Jones</snm>
                  <fnm>DT</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1999</pubdate>
            <volume>287</volume>
            <fpage>797</fpage>
            <lpage>815</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1999.2583</pubid>
                  <pubid idtype="pmpid" link="fulltext">10191147</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Fold and function predictions for Mycoplasma genitalium proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Rychlewski</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Godzik</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Fold Des</source>
            <pubdate>1998</pubdate>
            <volume>3</volume>
            <fpage>229</fpage>
            <lpage>238</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S1359-0278(98)00034-0</pubid>
                  <pubid idtype="pmpid">9710568</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>New local potential useful for genome annotation and 3D modeling</p>
            </title>
            <aug>
               <au>
                  <snm>Chandonia</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Cohen</snm>
                  <fnm>FE</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2003</pubdate>
            <volume>332</volume>
            <fpage>835</fpage>
            <lpage>850</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0022-2836(03)00990-2</pubid>
                  <pubid idtype="pmpid" link="fulltext">12972255</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Target Selection and Deselection at the Berkeley Structural Genomics Center</p>
            </title>
            <aug>
               <au>
                  <snm>Chandonia</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Brenner</snm>
                  <fnm>SE</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>2006</pubdate>
            <volume>62</volume>
            <fpage>356</fpage>
            <lpage>370</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/prot.20674</pubid>
                  <pubid idtype="pmpid" link="fulltext">16276528</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Structural genomics of minimal organisms and protein fold space</p>
            </title>
            <aug>
               <au>
                  <snm>Kim</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Shin</snm>
                  <fnm>DH</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Oganesyan</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Xu</snm>
                  <fnm>QS</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Das</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Schulze-Gahmen</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Holbrook</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Holbrook</snm>
                  <fnm>EL</fnm>
               </au>
               <au>
                  <snm>Martinez</snm>
                  <fnm>BA</fnm>
               </au>
               <au>
                  <snm>Oganesyan</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Degiovanni</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lou</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Henriquez</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Jancarik</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Pufan</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Choi</snm>
                  <fnm>IG</fnm>
               </au>
               <au>
                  <snm>Chandonia</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Hou</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Gold</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Yokota</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Brenner</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Adams</snm>
                  <fnm>PD</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>J Struct Funct Genomics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>63</fpage>
            <lpage>70</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s10969-005-2651-9</pubid>
                  <pubid idtype="pmpid" link="fulltext">16211501</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Genome sequence of the endocellular bacterial symbiont of aphids Buchnera sp. APS</p>
            </title>
            <aug>
               <au>
                  <snm>Shigenobu</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Watanabe</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Hattori</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sakaki</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Ishikawa</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2000</pubdate>
            <volume>407</volume>
            <fpage>81</fpage>
            <lpage>86</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35024074</pubid>
                  <pubid idtype="pmpid" link="fulltext">10993077</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Small genome of Candidatus Blochmannia, the bacterial endosymbiont of Camponotus, implies irreversible specialization to an intracellular lifestyle</p>
            </title>
            <aug>
               <au>
                  <snm>Wernegreen</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Lazarus</snm>
                  <fnm>AB</fnm>
               </au>
               <au>
                  <snm>Degnan</snm>
                  <fnm>PH</fnm>
               </au>
            </aug>
            <source>Microbiology</source>
            <pubdate>2002</pubdate>
            <volume>148</volume>
            <fpage>2551</fpage>
            <lpage>2556</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12177348</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Genome sequence of the endocellular obligate symbiont of tsetse flies, Wigglesworthia glossinidia</p>
            </title>
            <aug>
               <au>
                  <snm>Akman</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Yamashita</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Watanabe</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Oshima</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Shiba</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Hattori</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Aksoy</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2002</pubdate>
            <volume>32</volume>
            <fpage>402</fpage>
            <lpage>407</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng986</pubid>
                  <pubid idtype="pmpid" link="fulltext">12219091</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Sequencing and analysis of the genome of the Whipple's disease bacterium Tropheryma whipplei</p>
            </title>
            <aug>
               <au>
                  <snm>Bentley</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Maiwald</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Murphy</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>Pallen</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Yeats</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Dover</snm>
                  <fnm>LG</fnm>
               </au>
               <au>
                  <snm>Norbertczak</snm>
                  <fnm>HT</fnm>
               </au>
               <au>
                  <snm>Besra</snm>
                  <fnm>GS</fnm>
               </au>
               <au>
                  <snm>Quail</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Harris</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>von Herbay</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Goble</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Rutter</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Squares</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Squares</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Barrell</snm>
                  <fnm>BG</fnm>
               </au>
               <au>
                  <snm>Parkhill</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Relman</snm>
                  <fnm>DA</fnm>
               </au>
            </aug>
            <source>Lancet</source>
            <pubdate>2003</pubdate>
            <volume>361</volume>
            <fpage>637</fpage>
            <lpage>644</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0140-6736(03)12597-4</pubid>
                  <pubid idtype="pmpid" link="fulltext">12606174</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>The genome sequence of Blochmannia floridanus: comparative analysis of reduced genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Gil</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Silva</snm>
                  <fnm>FJ</fnm>
               </au>
               <au>
                  <snm>Zientz</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Delmotte</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Gonzalez-Candelas</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Latorre</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Rausell</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Kamerbeek</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Gadau</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Holldobler</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>van Ham</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>Gross</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Moya</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2003</pubdate>
            <volume>100</volume>
            <fpage>9388</fpage>
            <lpage>9393</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">170928</pubid>
                  <pubid idtype="pmpid" link="fulltext">12886019</pubid>
                  <pubid idtype="doi">10.1073/pnas.1533499100</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Tropheryma whipplei Twist: a human pathogenic Actinobacteria with a reduced genome</p>
            </title>
            <aug>
               <au>
                  <snm>Raoult</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Ogata</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Audic</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Robert</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Suhre</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Drancourt</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Claverie</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>1800</fpage>
            <lpage>1809</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">403771</pubid>
                  <pubid idtype="pmpid" link="fulltext">12902375</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Update on the Pfam5000 Strategy for Selection of Structural Genomics Targets</p>
            </title>
            <aug>
               <au>
                  <snm>Chandonia</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Brenner</snm>
                  <fnm>SE</fnm>
               </au>
            </aug>
            <source>Proceedings of the 2005 IEEE Engineering in Medicine and Biology 27th Annual Conference, Shanghai, China</source>
            <pubdate>2005</pubdate>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Protein structure prediction and structural genomics</p>
            </title>
            <aug>
               <au>
                  <snm>Baker</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Sali</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2001</pubdate>
            <volume>294</volume>
            <fpage>93</fpage>
            <lpage>96</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1065659</pubid>
                  <pubid idtype="pmpid" link="fulltext">11588250</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Evolution of function in protein superfamilies, from a structural perspective</p>
            </title>
            <aug>
               <au>
                  <snm>Todd</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Orengo</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2001</pubdate>
            <volume>307</volume>
            <fpage>1113</fpage>
            <lpage>1143</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2001.4513</pubid>
                  <pubid idtype="pmpid" link="fulltext">11286560</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Supra-domains: evolutionary units larger than single protein domains</p>
            </title>
            <aug>
               <au>
                  <snm>Vogel</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Berzuini</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Bashton</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Gough</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Teichmann</snm>
                  <fnm>SA</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2004</pubdate>
            <volume>336</volume>
            <fpage>809</fpage>
            <lpage>823</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.jmb.2003.12.026</pubid>
                  <pubid idtype="pmpid" link="fulltext">15095989</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>The relationship between protein structure and function: a comprehensive survey with application to the yeast genome</p>
            </title>
            <aug>
               <au>
                  <snm>Hegyi</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Gerstein</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1999</pubdate>
            <volume>288</volume>
            <fpage>147</fpage>
            <lpage>164</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1999.2661</pubid>
                  <pubid idtype="pmpid" link="fulltext">10329133</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>SCOP: a structural classification of proteins database for the investigation of sequences and structures</p>
            </title>
            <aug>
               <au>
                  <snm>Murzin</snm>
                  <fnm>AG</fnm>
               </au>
               <au>
                  <snm>Brenner</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Hubbard</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Chothia</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1995</pubdate>
            <volume>247</volume>
            <fpage>536</fpage>
            <lpage>540</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1995.0159</pubid>
                  <pubid idtype="pmpid" link="fulltext">7723011</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>From protein structure to function</p>
            </title>
            <aug>
               <au>
                  <snm>Orengo</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Todd</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Curr Opin Struct Biol</source>
            <pubdate>1999</pubdate>
            <volume>9</volume>
            <fpage>374</fpage>
            <lpage>382</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0959-440X(99)80051-7</pubid>
                  <pubid idtype="pmpid" link="fulltext">10361094</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>The Comprehensive Microbial Resource</p>
            </title>
            <aug>
               <au>
                  <snm>Peterson</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Umayam</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Dickinson</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Hickey</snm>
                  <fnm>EK</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <fpage>123</fpage>
            <lpage>125</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">29848</pubid>
                  <pubid idtype="pmpid" link="fulltext">11125067</pubid>
                  <pubid idtype="doi">10.1093/nar/29.1.123</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Do essential genes evolve slowly?</p>
            </title>
            <aug>
               <au>
                  <snm>Hurst</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>NG</fnm>
               </au>
            </aug>
            <source>Curr Biol</source>
            <pubdate>1999</pubdate>
            <volume>9</volume>
            <fpage>747</fpage>
            <lpage>750</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0960-9822(99)80334-0</pubid>
                  <pubid idtype="pmpid" link="fulltext">10421576</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Biochemical evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Wilson</snm>
                  <fnm>AC</fnm>
               </au>
               <au>
                  <snm>Carlson</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>TJ</fnm>
               </au>
            </aug>
            <source>Annu Rev Biochem</source>
            <pubdate>1977</pubdate>
            <volume>46</volume>
            <fpage>573</fpage>
            <lpage>639</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.bi.46.070177.003041</pubid>
                  <pubid idtype="pmpid" link="fulltext">409339</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Assessing evolutionary relationships among microbes from whole-genome analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Eisen</snm>
                  <fnm>JA</fnm>
               </au>
            </aug>
            <source>Curr Opin Microbiol</source>
            <pubdate>2000</pubdate>
            <volume>3</volume>
            <fpage>475</fpage>
            <lpage>480</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S1369-5274(00)00125-9</pubid>
                  <pubid idtype="pmpid" link="fulltext">11050445</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Evolution of the protein repertoire</p>
            </title>
            <aug>
               <au>
                  <snm>Chothia</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Gough</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Vogel</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Teichmann</snm>
                  <fnm>SA</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2003</pubdate>
            <volume>300</volume>
            <fpage>1701</fpage>
            <lpage>1703</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1085371</pubid>
                  <pubid idtype="pmpid" link="fulltext">12805536</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>The Protein Data Bank</p>
            </title>
            <aug>
               <au>
                  <snm>Berman</snm>
                  <fnm>HM</fnm>
               </au>
               <au>
                  <snm>Westbrook</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Feng</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Gilliland</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Bhat</snm>
                  <fnm>TN</fnm>
               </au>
               <au>
                  <snm>Weissig</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Shindyalov</snm>
                  <fnm>IN</fnm>
               </au>
               <au>
                  <snm>Bourne</snm>
                  <fnm>PE</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2000</pubdate>
            <volume>28</volume>
            <fpage>235</fpage>
            <lpage>242</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">102472</pubid>
                  <pubid idtype="pmpid" link="fulltext">10592235</pubid>
                  <pubid idtype="doi">10.1093/nar/28.1.235</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>TargetDB: a target registration database for structural genomics projects</p>
            </title>
            <aug>
               <au>
                  <snm>Chen</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Oughtred</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Berman</snm>
                  <fnm>HM</fnm>
               </au>
               <au>
                  <snm>Westbrook</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <fpage>2860</fpage>
            <lpage>2862</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bth300</pubid>
                  <pubid idtype="pmpid" link="fulltext">15130928</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>The Pfam protein families database</p>
            </title>
            <aug>
               <au>
                  <snm>Bateman</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Coin</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Finn</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Hollich</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Griffiths-Jones</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Khanna</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Marshall</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Moxon</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sonnhammer</snm>
                  <fnm>EL</fnm>
               </au>
               <au>
                  <snm>Studholme</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Yeats</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <issue>32 Database</issue>
            <fpage>D138</fpage>
            <lpage>141</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308855</pubid>
                  <pubid idtype="pmpid" link="fulltext">14681378</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh121</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Profile hidden Markov models</p>
            </title>
            <aug>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>1998</pubdate>
            <volume>14</volume>
            <fpage>755</fpage>
            <lpage>763</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/14.9.755</pubid>
                  <pubid idtype="pmpid" link="fulltext">9918945</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Hidden Markov models in computational biology. Applications to protein modeling</p>
            </title>
            <aug>
               <au>
                  <snm>Krogh</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Mian</snm>
                  <fnm>IS</fnm>
               </au>
               <au>
                  <snm>Sjolander</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1994</pubdate>
            <volume>235</volume>
            <fpage>1501</fpage>
            <lpage>1531</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1994.1104</pubid>
                  <pubid idtype="pmpid" link="fulltext">8107089</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>The InterPro Database, 2003 brings increased coverage and new features</p>
            </title>
            <aug>
               <au>
                  <snm>Mulder</snm>
                  <fnm>NJ</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Attwood</snm>
                  <fnm>TK</fnm>
               </au>
               <au>
                  <snm>Bairoch</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Barrell</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Bateman</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Binns</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Biswas</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bradley</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Bucher</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Copley</snm>
                  <fnm>RR</fnm>
               </au>
               <au>
                  <snm>Courcelle</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Das</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Falquet</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Fleischmann</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Griffiths-Jones</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Haft</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Harte</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Hulo</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Kahn</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Kanapin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Krestyaninova</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lopez</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Letunic</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Lonsdale</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Silventoinen</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Orchard</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Pagni</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Peyruc</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Ponting</snm>
                  <fnm>CP</fnm>
               </au>
               <au>
                  <snm>Selengut</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Servant</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Sigrist</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Vaughan</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Zdobnov</snm>
                  <fnm>EM</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <fpage>315</fpage>
            <lpage>318</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">165493</pubid>
                  <pubid idtype="pmpid" link="fulltext">12520011</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg046</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure</p>
            </title>
            <aug>
               <au>
                  <snm>Gough</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Karplus</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Hughey</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Chothia</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2001</pubdate>
            <volume>313</volume>
            <fpage>903</fpage>
            <lpage>919</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2001.5080</pubid>
                  <pubid idtype="pmpid" link="fulltext">11697912</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>SCOP database in 2004: refinements integrate structure and sequence family data</p>
            </title>
            <aug>
               <au>
                  <snm>Andreeva</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Howorth</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Brenner</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Hubbard</snm>
                  <fnm>TJ</fnm>
               </au>
               <au>
                  <snm>Chothia</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Murzin</snm>
                  <fnm>AG</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <issue>32 Database</issue>
            <fpage>D226</fpage>
            <lpage>229</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308773</pubid>
                  <pubid idtype="pmpid" link="fulltext">14681400</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh039</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p>The SUPERFAMILY database in 2004: additions and improvements</p>
            </title>
            <aug>
               <au>
                  <snm>Madera</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Vogel</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Kummerfeld</snm>
                  <fnm>SK</fnm>
               </au>
               <au>
                  <snm>Chothia</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Gough</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <issue>32 Database</issue>
            <fpage>D235</fpage>
            <lpage>239</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308851</pubid>
                  <pubid idtype="pmpid" link="fulltext">14681402</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh117</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B54">
            <title>
               <p>SUPERFAMILY: HMMs representing all proteins of known structure. SCOP sequence searches, alignments and genome assignments</p>
            </title>
            <aug>
               <au>
                  <snm>Gough</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Chothia</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>268</fpage>
            <lpage>272</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">99153</pubid>
                  <pubid idtype="pmpid" link="fulltext">11752312</pubid>
                  <pubid idtype="doi">10.1093/nar/30.1.268</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B55">
            <title>
               <p>Basic local alignment search tool</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Gish</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>EW</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1990</pubdate>
            <volume>215</volume>
            <fpage>403</fpage>
            <lpage>410</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1990.9999</pubid>
                  <pubid idtype="pmpid" link="fulltext">2231712</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B56">
            <title>
               <p>Protein biophysical properties that correlate with crystallization success in Thermotoga maritima: maximum clustering strategy for structural genomics</p>
            </title>
            <aug>
               <au>
                  <snm>Canaves</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Page</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Wilson</snm>
                  <fnm>IA</fnm>
               </au>
               <au>
                  <snm>Stevens</snm>
                  <fnm>RC</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2004</pubdate>
            <volume>344</volume>
            <fpage>977</fpage>
            <lpage>991</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.jmb.2004.09.076</pubid>
                  <pubid idtype="pmpid" link="fulltext">15544807</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B57">
            <title>
               <p>Non-globular domains in protein sequences: automated segmentation using complexity measures</p>
            </title>
            <aug>
               <au>
                  <snm>Wootton</snm>
                  <fnm>JC</fnm>
               </au>
            </aug>
            <source>Comput Chem</source>
            <pubdate>1994</pubdate>
            <volume>18</volume>
            <fpage>269</fpage>
            <lpage>285</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0097-8485(94)85023-2</pubid>
                  <pubid idtype="pmpid">7952898</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B58">
            <title>
               <p>Prediction and analysis of coiled-coil structures</p>
            </title>
            <aug>
               <au>
                  <snm>Lupas</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Methods Enzymol</source>
            <pubdate>1996</pubdate>
            <volume>266</volume>
            <fpage>513</fpage>
            <lpage>525</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8743703</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B59">
            <title>
               <p>Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Krogh</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Larsson</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>von Heijne</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Sonnhammer</snm>
                  <fnm>EL</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2001</pubdate>
            <volume>305</volume>
            <fpage>567</fpage>
            <lpage>580</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2000.4315</pubid>
                  <pubid idtype="pmpid" link="fulltext">11152613</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>

