<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1472-6807-9-34</ui>
   <ji>1472-6807</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Universal partitioning of the hierarchical fold network of 50-residue segments in proteins</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Ito</snm>
               <fnm>Jun-ichi</fnm>
               <insr iid="I1"/>
               <email>junichiito333@gmail.com</email>
            </au>
            <au id="A2">
               <snm>Sonobe</snm>
               <fnm>Yuki</fnm>
               <insr iid="I2"/>
               <email>velvet_morning5@yahoo.co.jp</email>
            </au>
            <au id="A3">
               <snm>Ikeda</snm>
               <fnm>Kazuyoshi</fnm>
               <insr iid="I2"/>
               <insr iid="I3"/>
               <insr iid="I4"/>
               <email>ikeda@pharmadesign.co.jp</email>
            </au>
            <au id="A4">
               <snm>Tomii</snm>
               <fnm>Kentaro</fnm>
               <insr iid="I3"/>
               <email>k-tomii@aist.go.jp</email>
            </au>
            <au id="A5" ca="yes">
               <snm>Higo</snm>
               <fnm>Junichi</fnm>
               <insr iid="I5"/>
               <email>higo@protein.osaka-u.ac.jp</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Graduate School of Frontier Science, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-8561, Japan</p>
            </ins>
            <ins id="I2">
               <p>School of Life Sciences, Tokyo University of Pharmacy and Life Sciences, 1432-1 Horinouchi, Hachioji, Tokyo, 192-0392, Japan</p>
            </ins>
            <ins id="I3">
               <p>Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-42 Aomi, Koto-ku, Tokyo 135-0064, Japan</p>
            </ins>
            <ins id="I4">
               <p>PharmaDesign, Inc., 2-19-8 Hacchobori, Chuo-ku, Tokyo 104-0032, Japan</p>
            </ins>
            <ins id="I5">
               <p>The Center for Advanced Medical Engineering and Informatics, Osaka University, Open Laboratories for Advanced Bioscience and Biotechnology, 6-2-3, Furuedai, Suita, Osaka 565-0874, Japan</p>
            </ins>
         </insg>
         <source>BMC Structural Biology</source>
         <issn>1472-6807</issn>
         <pubdate>2009</pubdate>
         <volume>9</volume>
         <issue>1</issue>
         <fpage>34</fpage>
         <url>http://www.biomedcentral.com/1472-6807/9/34</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">19454039</pubid>
               <pubid idtype="doi">10.1186/1472-6807-9-34</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>07</day>
               <month>10</month>
               <year>2008</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>20</day>
               <month>5</month>
               <year>2009</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>20</day>
               <month>5</month>
               <year>2009</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2009</year>
         <collab>Ito et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Several studies have demonstrated that protein fold space is structured hierarchically and that power-law statistics hold for the relation between the numbers of protein families and protein folds (or superfamilies). We examined the internal structure and statistics of the fold space of 50 amino-acid residue segments taken from various protein folds. We used inter-residue contact patterns to measure the tertiary structural similarity among segments. Using this similarity measure, the segments were classified into a number (<it>K</it><sub>c</sub>) of clusters. We examined various <it>K</it><sub>c </sub>values for the clustering. The spatial resolution to differentiate the segment tertiary structures increases with increasing <it>K</it><sub>c</sub>. Furthermore, we constructed networks by linking structurally similar clusters.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>The network was partitioned persistently into four regions for <it>K</it><sub>c </sub>&#8805; 1000. This main partitioning is consistent with results of earlier studies, where similar partitioning was reported in classifying protein domain structures. Furthermore, the network was partitioned naturally into several dozen sub-networks (i.e., communities): clusters within a sub-network were mutually connected by numerous links, whereas clusters in different sub-networks were connected by only a few links. For <it>K</it><sub>c </sub>&#8805; 1000, there were about 40 major sub-networks, and the contents of the major sub-networks were conserved. This sub-partitioning is a novel finding, suggesting that the network is structured hierarchically: segments construct a cluster, clusters form a sub-network, and sub-networks constitute a region. Additionally, the network was characterized by non-power-law statistics, which is also a novel finding.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>The main findings are: (1) The universe of 50-residue segments found here was characterized by non-power-law statistics. Therefore, the universe differs from those previously reported for protein domains. (2) The 50-residue segments were partitioned persistently and universally into some dozens (ca. 40) of major sub-networks, irrespective of the number of clusters. (3) These major sub-networks encompassed 90% of all segments. Consequently, protein tertiary structures are constructed from a few dozen elements (sub-networks).</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Despite the vast number of amino-acid sequences, protein folds (or superfamilies) are limited in number <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. Consequently, protein fold classification is an important subject for elucidating the construction of protein tertiary structures. A key word to characterize protein folds is "hierarchy". Well-known databases &#8211; SCOP <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> and CATH <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> &#8211; have classified the tertiary structures of protein domains hierarchically. Similarly, a tree diagram was produced to classify the folds <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>.</p>
         <p>When the tertiary structures of full-length protein domains are mapped to a conformational space, a structure distribution is generated: a so-called protein fold universe <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>. A key word to characterize the fold universe is "space partitioning". A two-dimensional (2D) representation of the fold universe was proposed in earlier reports <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr></abbrgrp>, where the universe was partitioned into three fold (&#945;, &#946;, and &#945;/&#946;) regions. A three-dimensional (3D) fold universe was partitioned into four fold regions: all-&#945;, all-&#946;, &#945;/&#946;, and &#945;+&#946; <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. Software that is accessible on a web site, PDBj <url>http://eprots.protein.osaka-u.ac.jp/globe.cgi</url>, displays the distribution on a global surface <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>.</p>
         <p>The structures of short protein segments have also been studied. Segments a few (2&#8211;3) amino-acid residues long were projected into a two-dimensional (2D) space, where some typical combinations frequently appeared <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. Fold universes of segments 4&#8211;9 residues long <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> and 10&#8211;20 residues long <abbrgrp><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp> showed several clearly distinguishable structural clusters. A systematic survey of 10&#8211;50 residue segments has shown that the fold universe is classifiable into segment universes of three types: short (10&#8211;22 residues), medium (23&#8211;26 residues), and long (27&#8211;50 residues) <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. In that work, the 3D shape of the universe varied abruptly at lengths of 23 and 27 residues. A sequence-structure correlation found in short segments supports the tertiary structure prediction of full-length proteins <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>.</p>
         <p>These studies of protein segments and domains exemplify some structural clusters existing in the low-dimensional (2D or 3D) conformational space. The benefit of the low-dimensional expression is that one can readily imagine the shape of the universe. With increasing segment length, however, lowering the dimensionality of the space hides the internal architecture of the structure distribution. Consequently, the internal architecture of the distribution for 50-residue segments (or longer segments) is unclear <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. To supplement the low-dimensional expression with full-dimensional information, a network is helpful in which two structures close to each other in the full-dimensional conformational space are connected.</p>
         <p>Consider an ensemble of points (nodes); inter-node links form a network. The network concept has been applied recently to biological systems <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr></abbrgrp>. For the segment fold universe, structurally similar segments can be linked. The structural similarity is computed for the overall structures of two segments (i.e., all coordinates of the segments). Therefore, the similarity is a quantity defined in the full-dimensional space. Consequently, a 2D or 3D universe consisting of linked nodes carries full-dimensional information. To assign inter-node links in the ensemble, a score that quantifies the structural similarity between two tertiary structures is needed. Inter-residue contact (native contact) patterns have been used as reaction coordinates in protein folding studies <abbrgrp><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr></abbrgrp>. When two structures have similar native contact patterns, they exhibit similar inter-residue packing. Results of several studies indicate that the native contacts are useful indicators to assess the protein folding process <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr><abbr bid="B43">43</abbr></abbrgrp> and folding time scale <abbrgrp><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr><abbr bid="B43">43</abbr></abbrgrp>.</p>
         <p>Herein, we constructed a fold network of 50-residue segments taken from four major structural classes of protein domains. We used the inter-residue contact pattern for the similarity score. The resultant networks showed the main partitioning, as expected. Furthermore, as a new finding, the network of the segment structures was partitioned into dozens of universal communities (sub-networks). From these observations, we propose a novel protein structure hierarchy in which communities constitute one hierarchy level. The novelty of the currently identified hierarchy is supported by the non-power-law statistics of the hierarchy, which differ from the power-law statistics characterizing the hierarchies previously reported for protein tertiary structures.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <p>As described in <it>Methods</it>, 50-residue segments were taken from representative proteins and classified into <it>K</it><sub>c </sub>clusters, each of which consists of structurally similar segments. We calculated the native contact patterns that are common in each cluster, and constructed networks by connecting the clusters according to their contact pattern similarity. In <it>Results</it>, we first examine the general aspects of the obtained clusters. Second, we check the conformational distribution using a 3D map. Finally, we characterize the 50-residue segment universe using network analysis.</p>
         <p>Throughout this paper, indices <it>i </it>and <it>j </it>are used for specifying residue positions in a 50-residue segment, <it>s </it>and <it>t </it>for segment ordinal numbers, <it>u </it>and <it>v </it>for cluster ordinal numbers, and <it>w </it>for a community ordinal number.</p>
         <sec>
            <st>
               <p>General aspects for clusters</p>
            </st>
            <p>Figure <figr fid="F1">1A</figr> portrays the dependence of the average cluster size &lt;<it>S </it>> (Eq. 3) on the number <it>K</it><sub>c </sub>of clusters. In effect, <it>K</it><sub>c </sub>determines the spatial resolution at which the universe of the 50-residue segments is viewed: with decreasing <it>K</it><sub>c</sub>, &lt;<it>S </it>> increases because structurally different segments are fused into a cluster. The change of &lt;<it>S </it>> was rapid for small <it>K</it><sub>c </sub>and slow for larger <it>K</it><sub>c</sub>.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>&lt;<it>S </it>> and &lt;<it>O </it>> as a function of <it>K</it><sub>c</sub></p>
               </caption>
               <text>
                  <p><b>&lt;<it>S </it>> and &lt;<it>O </it>> as a function of <it>K</it><sub>c</sub></b>. (A) &lt;<it>S </it>> is the average cluster size (Eq. 3). The error bar shows the standard deviation over clusters. (B) &lt;<it>O </it>> is the average number of segments supplied by a protein to a cluster (see the text for a detailed definition of &lt;<it>O </it>>).</p>
               </text>
               <graphic file="1472-6807-9-34-1"/>
            </fig>
            <p>The segments were generated by sliding a 50-residue window one residue at a time along the domain sequences (see <it>Methods</it>). Consequently, two segments taken from the same protein domain and adjacent to each other in the sequence might have similar structures and might therefore be involved in the same cluster. We did the following analysis to assess this possibility quantitatively. Suppose that a cluster <it>u </it>involves <it>n</it><sub><it>m </it></sub>segments originating from a protein <it>m</it>. We introduced a quantity: <inline-formula><graphic file="1472-6807-9-34-i1.gif"/></inline-formula>, where the summation is taken over proteins that supply segment(s) to the cluster <it>u</it>, and <it>N</it><sub>p </sub>is the number of those proteins. Figure <figr fid="F1">1B</figr> presents a plot of the average of <it>O</it><sub><it>u </it></sub>as a function of <it>K</it><sub>c</sub>: <inline-formula><graphic file="1472-6807-9-34-i2.gif"/></inline-formula>. For <it>K</it><sub>c </sub>= 1000, &lt;<it>O </it>> converged to 2.2. Consequently, a protein supplies only two or three segments to a cluster on average: i.e., a cluster does not contain excessive segments derived from a single protein for <it>K</it><sub>c </sub>&#8805; 1000.</p>
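            <p>The quantity <it>O</it><sub><it>u </it></sub>can be illustrated with a short Python sketch. It assumes, from the description above, that <it>O</it><sub><it>u </it></sub>is the mean of <it>n</it><sub><it>m </it></sub>over the <it>N</it><sub>p </sub>proteins that supply segments to cluster <it>u</it>, and that &lt;<it>O </it>> is an unweighted average over clusters. The function names and the toy data are hypothetical and are not taken from the original analysis.</p>
            <p><![CDATA[
```python
from collections import Counter


def mean_segments_per_protein(segment_proteins):
    """O_u for one cluster (assumed form): (1/N_p) * sum over proteins of n_m.

    segment_proteins lists, for every segment in the cluster, the label of
    the protein the segment was taken from.
    """
    counts = Counter(segment_proteins)          # n_m for each supplying protein m
    return sum(counts.values()) / len(counts)   # len(counts) is N_p


def average_over_clusters(clusters):
    """<O>: plain (unweighted, assumed) average of O_u over all clusters."""
    values = [mean_segments_per_protein(c) for c in clusters]
    return sum(values) / len(values)


# Hypothetical toy data: each cluster is a list of source-protein labels.
toy_clusters = [
    ["p1", "p1", "p2", "p3"],   # n_m = 2, 1, 1 over three proteins
    ["p4", "p4", "p4", "p5"],   # n_m = 3, 1 over two proteins
]
o_values = [mean_segments_per_protein(c) for c in toy_clusters]
o_mean = average_over_clusters(toy_clusters)
```
]]></p>
            <p>In this toy example, the first cluster gives <it>O</it><sub><it>u </it></sub>= 4/3 and the second gives <it>O</it><sub><it>u </it></sub>= 2. A value close to 1 would mean that almost every segment in a cluster comes from a different protein.</p>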
            <p>Figure <figr fid="F2">2</figr> depicts the number (<it>n</it><sub>u</sub>) of segments involved in a cluster as a function of the cluster ordinal number for <it>K</it><sub>c </sub>= 1000. The decay of <it>n</it><sub>u </sub>is non-exponential. It is particularly interesting that even cluster #950 involves more than 100 segments, which means that the cluster comprises segments from more than 40 (= 100/2.5) different proteins (&lt;<it>O </it>> &#8776; 2.5 for <it>K</it><sub><it>c </it></sub>= 1000). In the last 50 clusters, <it>n</it><sub>u </sub>decreased quickly. These clusters consist of randomly structured segments. Although segments were taken from all-&#945;, all-&#946;, &#945;/&#946;, and &#945;+&#946; SCOP classes, the structures can be random.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Number <it>n</it><sub><it>u </it></sub>of segments in a cluster as a function of the ordinal number of the cluster</p>
               </caption>
               <text>
                  <p><b>Number <it>n</it><sub><it>u </it></sub>of segments in a cluster as a function of the ordinal number of the cluster</b>.</p>
               </text>
               <graphic file="1472-6807-9-34-2"/>
            </fig>
            <p>Figure <figr fid="F3">3</figr> depicts &lt;<it>f </it>><sub><it>K</it>c </sub>(Eq. 9) as a function of <it>K</it><sub>c</sub>. The value of &lt;<it>f </it>><sub><it>K</it>c </sub>was 0.60&#8211;0.65 for <it>K</it><sub>c </sub>&#8805; 1000. The similarity threshold <it>f</it><sub>0 </sub>for assigning the inter-cluster linkage (Eq. 7) was 0.7. Figure <figr fid="F3">3</figr> shows that the inter-cluster similarity threshold is comparable to the intra-cluster similarity.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Averaged correlation coefficient &lt;<it>f </it>><sub><it>K</it>c </sub>(Eq. 9) for intra-cluster segments as a function of <it>K</it><sub>c</sub></p>
               </caption>
               <text>
                  <p><b>Averaged correlation coefficient &lt;<it>f </it>><sub><it>K</it>c </sub>(Eq. 9) for intra-cluster segments as a function of <it>K</it><sub>c</sub></b>.</p>
               </text>
               <graphic file="1472-6807-9-34-3"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Fold universe and network of clusters</p>
            </st>
            <p>The inter-cluster (inter-node) links were assigned to the <it>K</it><sub><it>c </it></sub>clusters according to the adjacency matrix <it>a</it><sub><it>uv</it></sub>. Directly connected clusters have mutually similar inter-residue contact patterns. Internal architectures of the networks were investigated by dividing the networks into communities (sub-networks) using Newman's method <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>. In parallel, we projected the networks into a 3D space to obtain positions in the conformational space (see Additional file <supplr sid="S1">1</supplr> for details). Although the clusters were embedded in the 3D space, the inter-cluster links were given to clusters that are mutually close in the full-dimensional space.</p>
            <suppl id="S1">
               <title>
                  <p>Additional file 1</p>
               </title>
               <text>
                  <p><b>Supplementary Methods and Supplementary Results</b>. There are three sections in the Supplementary Methods as follows: (1) The method of embedding the inter-cluster network into 3D space. (2) The definition of F-measure. (3) The coloring method for clusters in the 3D network. In the Supplementary Results, tertiary structures of fragments in the same cluster and those in the same community are discussed.</p>
               </text>
               <file name="1472-6807-9-34-S1.doc">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>Each community was characterized by five biophysical structural features: the &#945;, &#946;, &#945;&#946; secondary-structure elements, the radius of gyration, and the number of inter-residue contacts, denoted respectively as <it>n</it><sub><it>&#945;</it></sub>, <it>n</it><sub><it>&#946;</it></sub>, <it>n</it><sub><it>&#945;&#946;</it></sub>, <it>R</it><sub>g</sub>, and <it>N</it><sub>contact</sub>. Then, the communities were classified into four types (&#945;, &#946;, &#945;&#946;, and randomly structured communities) depending on the five structural features (see <it>Methods </it>for details).</p>
            <p>Figure <figr fid="F4">4</figr> portrays the 3D cluster distributions at <it>K</it><sub>c </sub>= 1000, 2000, and 3000, where a single color was assigned to a community depending on secondary-structure elements <it>n</it><sub><it>&#945;</it></sub>, <it>n</it><sub><it>&#946;</it></sub>, and <it>n</it><sub><it>&#945;&#946; </it></sub>(see Additional file <supplr sid="S1">1</supplr> for details). This figure clearly illustrates that the 3D cluster network is partitioned into four fold-regions (mainly &#945;, mainly &#946;, &#945;&#946;, and randomly structured regions) independent of <it>K</it><sub>c</sub>, which respectively consist of &#945;, &#946;, &#945;&#946;, and randomly structured communities. We term this partitioning the "main partitioning". Figure <figr fid="F5">5</figr> shows that the overall shape of the network adopted a three-leaf clover shape (mainly &#945;, mainly &#946;, and &#945;&#946; regions surrounding the randomly structured region). We checked quantitatively whether the 3D distribution reflected the original full-dimensional distribution by calculating the F-measure <inline-formula><graphic file="1472-6807-9-34-i3.gif"/></inline-formula> (see Additional file <supplr sid="S1">1</supplr> for the definition of <inline-formula><graphic file="1472-6807-9-34-i3.gif"/></inline-formula>). The value of <inline-formula><graphic file="1472-6807-9-34-i3.gif"/></inline-formula> was 0.804 for <it>K</it><sub>c </sub>= 1000, 0.673 for <it>K</it><sub>c </sub>= 2000, and 0.593 for <it>K</it><sub>c </sub>= 3000. The large value of <inline-formula><graphic file="1472-6807-9-34-i3.gif"/></inline-formula> for <it>K</it><sub>c </sub>= 1000 indicates that the 3D cluster distribution reflects the full-dimensional distribution fairly well. The <inline-formula><graphic file="1472-6807-9-34-i3.gif"/></inline-formula> value decreased with increasing <it>K</it><sub>c</sub>. However, the three-leaf clover shape of the distribution was conserved at various <it>K</it><sub>c</sub>, which strongly suggests that the main partitioning exists in the 50-residue segment universe.</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Networked 3D distribution of clusters for <it>K</it><sub>c </sub>= 1000 (A), 2000 (B), and 3000 (C)</p>
               </caption>
               <text>
                  <p><b>Networked 3D distribution of clusters for <it>K</it><sub>c </sub>= 1000 (A), 2000 (B), and 3000 (C)</b>. In this figure, a sphere represents a cluster. The larger the sphere, the more segments the cluster involves. The coloring method for clusters and inter-cluster links is explained briefly below (see Additional file <supplr sid="S1">1</supplr> for details): The &#945;, &#946;, and &#945;&#946; communities are, respectively, red, blue, and green. The larger the secondary-structure contents in a community, the greater the color strength. All randomly structured communities are shown in black. Colors assigned to cluster-cluster links are as follows: red for links within &#945; communities, blue for those within &#946; communities, green for those within &#945;&#946; communities, and black for other links.</p>
               </text>
               <graphic file="1472-6807-9-34-4"/>
            </fig>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Main and sub-partitioning of the cluster network</p>
               </caption>
               <text>
                  <p><b>Main and sub-partitioning of the cluster network</b>.</p>
               </text>
               <graphic file="1472-6807-9-34-5"/>
            </fig>
            <p>Figure <figr fid="F6">6</figr> displays segment tertiary structures picked from clusters. This figure shows that the structure classification by the five structural features correlates well with the visual secondary-structure constitution. Most segments originating in the all-&#945; SCOP fold class were assigned to the &#945; communities (see a-1 and a-2 in Figure <figr fid="F6">6</figr>). Those that originated in the all-&#946; SCOP fold class were assigned to the &#946; communities (see b-1 &#8211; b-3). The majority of segments taken from the &#945;/&#946; SCOP fold class were assigned to the &#945;&#946; communities (see c-1 &#8211; c-4), although some were involved in other fold regions. In contrast, segments from the &#945;+&#946; SCOP fold class were scattered across all the fold regions because the &#945;+&#946; proteins are a mixture of helices, strands, and randomly structured fragments, where the &#945; and &#946; secondary-structure elements are not necessarily adjacent to each other in the sequence. Consequently, the 50-residue segments from the &#945;+&#946; proteins can involve various structural features. The randomly structured region contained clusters with a few secondary-structure elements (see r-1 &#8211; r-4 in Figure <figr fid="F6">6</figr>). However, their polypeptide packing was loose, as shown in Figure <figr fid="F7">7</figr>, where the randomly structured clusters had large <it>R</it><sub>g</sub>.</p>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Tertiary structures picked from 3D distribution for <it>K</it><sub>c </sub>= 1000</p>
               </caption>
               <text>
                  <p><b>Tertiary structures picked from 3D distribution for <it>K</it><sub>c </sub>= 1000</b>. Colors of clusters are the same as those depicted in Figure 4. Inter-cluster links are not shown. This figure is presented with the same orientation as that of Figure 4.</p>
               </text>
               <graphic file="1472-6807-9-34-6"/>
            </fig>
            <fig id="F7">
               <title>
                  <p>Figure 7</p>
               </title>
               <caption>
                  <p>Radius of gyration <it>R</it><sub>g </sub>of clusters</p>
               </caption>
               <text>
                  <p><b>Radius of gyration <it>R</it><sub>g </sub>of clusters</b>. The larger the <it>R</it><sub>g</sub>, the redder the cluster color. This figure is presented with the same orientation as that of Figure 4.</p>
               </text>
               <graphic file="1472-6807-9-34-7"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Non-power-law statistics</p>
            </st>
            <p>The protein-domain universe is known to be an extremely biased distribution <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B45">45</abbr></abbrgrp>. Many studies have suggested a power-law statistic to represent the relation between the number of families and the number of folds <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B46">46</abbr><abbr bid="B47">47</abbr></abbrgrp>. For instance, Shakhnovich and co-workers created a protein-domain universe graph (PDUG) using the DALI Z-score as the similarity score, and showed that the domain universe followed a power-law distribution <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. Consequently, it is interesting to check whether the currently produced network of the 50-residue segments follows a power-law distribution.</p>
            <p>First, we calculated the number (<it>n</it><sub>seg</sub>) of segments involved in each cluster. Figures <figr fid="F8">8A, B</figr>, and <figr fid="F8">8C</figr> portray the relation between <it>n</it><sub>seg </sub>and the number of clusters that involve <it>n</it><sub>seg </sub>segments at <it>K</it><sub>c </sub>= 1000, 2000, and 3000, respectively. The distributions were nearly symmetric on the X-axis, log(<it>n</it><sub>seg</sub>) (the skewness was 0.138 for <it>K</it><sub><it>c </it></sub>= 1000, 0.006 for <it>K</it><sub><it>c </it></sub>= 2000, and -0.066 for <it>K</it><sub><it>c </it></sub>= 3000), and far from power-law statistics. Therefore, the currently obtained universe differs from those previously reported. Additionally, we calculated the number (<it>n</it>'<sub><it>seg</it></sub>) of segments involved in each community, and examined the relation between <it>n</it>'<sub><it>seg </it></sub>and the number of communities that involve <it>n</it>'<sub><it>seg </it></sub>segments for <it>K</it><sub>c </sub>= 1000, 2000, and 3000. We again obtained non-power-law statistics in the relation (data not shown).</p>
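            <p>The symmetry check on the logarithmic axis can be sketched in Python as follows. The sketch assumes the moment-based skewness (third central moment divided by the 1.5 power of the second central moment); the estimator actually used is not stated in the text, and the cluster sizes below are hypothetical.</p>
            <p><![CDATA[
```python
import math


def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 with population central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Hypothetical cluster sizes n_seg, chosen to be symmetric on a log axis.
cluster_sizes = [10, 100, 1000]
log_sizes = [math.log10(n) for n in cluster_sizes]

skew_log = skewness(log_sizes)       # symmetric in log(n_seg): near zero
skew_raw = skewness(cluster_sizes)   # long right tail in n_seg: positive
```
]]></p>
            <p>A heavy-tailed (for example, power-law) size distribution would instead give a clearly positive skewness. Because skewness is unchanged by rescaling, the choice of logarithm base does not affect the result.</p>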
            <fig id="F8">
               <title>
                  <p>Figure 8</p>
               </title>
               <caption>
                  <p>Relation between number (<it>n</it><sub>seg</sub>) of segments involved in a cluster and number of clusters for <it>K</it><sub>c </sub>= 1000 (A), 2000 (B), and 3000 (C)</p>
               </caption>
               <text>
                  <p><b>Relation between number (<it>n</it><sub>seg</sub>) of segments involved in a cluster and number of clusters for <it>K</it><sub>c </sub>= 1000 (A), 2000 (B), and 3000 (C)</b>.</p>
               </text>
               <graphic file="1472-6807-9-34-8"/>
            </fig>
            <p>Next, we calculated a connectivity distribution, <it>P</it>(<it>k</it>), of the networks to investigate details of the cluster network <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>. <it>P</it>(<it>k</it>) is defined as a distribution function of clusters that have <it>k </it>links to other clusters. Figures <figr fid="F9">9A, B</figr>, and <figr fid="F9">9C</figr> respectively present <it>P</it>(<it>k</it>) at <it>K</it><sub><it>c </it></sub>= 1000, 2000, and 3000. In each case, <it>P</it>(<it>k</it>) decays exponentially with increasing <it>k</it>. Therefore, these distributions are exponential ones (or possibly truncated power-law distributions). Consequently, the current networks are again found to be non-power-law (i.e., non-scale-free) networks.</p>
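            <p>A minimal Python sketch of how <it>P</it>(<it>k</it>) and an exponential fit might be computed from a symmetric 0/1 adjacency matrix is given here. The least-squares fit of ln <it>P</it>(<it>k</it>) against <it>k</it> is an assumption, because the fitting procedure used for Figure 9 is not described in this section; the function names and the example data are hypothetical.</p>
            <p><![CDATA[
```python
import math
from collections import Counter


def degree_distribution(adjacency):
    """P(k): fraction of nodes that have k links, from a 0/1 adjacency matrix."""
    degrees = [sum(row) for row in adjacency]
    counts = Counter(degrees)
    n = len(degrees)
    return {k: c / n for k, c in sorted(counts.items())}


def fit_exponential(pk):
    """Fit P(k) = A * exp(-rate * k) by least squares on (k, ln P(k)).

    Returns (rate, A). Requires at least two distinct k values.
    """
    ks = list(pk)
    ys = [math.log(p) for p in pk.values()]
    n = len(ks)
    k_mean = sum(ks) / n
    y_mean = sum(ys) / n
    sxy = sum((k - k_mean) * (y - y_mean) for k, y in zip(ks, ys))
    sxx = sum((k - k_mean) ** 2 for k in ks)
    slope = sxy / sxx
    return -slope, math.exp(y_mean - slope * k_mean)


# Hypothetical 4-node path graph: two end nodes (k = 1), two inner nodes (k = 2).
path_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
pk_toy = degree_distribution(path_graph)

# Synthetic, exactly exponential values to check that the fit recovers them.
synthetic = {k: 2.0 * math.exp(-0.5 * k) for k in range(1, 11)}
rate, prefactor = fit_exponential(synthetic)
```
]]></p>
            <p>For a scale-free network, ln <it>P</it>(<it>k</it>) would instead be linear in ln <it>k</it>, so comparing the two fits is one simple way to distinguish the two cases.</p>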
            <fig id="F9">
               <title>
                  <p>Figure 9</p>
               </title>
               <caption>
                  <p>Connectivity distribution <it>P</it>(<it>k</it>) of cluster network at <it>K</it><sub>c </sub>= 1000 (A), 2000 (B), and 3000 (C)</p>
               </caption>
               <text>
                  <p><b>Connectivity distribution <it>P</it>(<it>k</it>) of cluster network at <it>K</it><sub>c </sub>= 1000 (A), 2000 (B), and 3000 (C)</b>. The X-axis <it>k </it>shows the number of links of a cluster connected to other clusters. Solid lines are the best-fit curves drawn assuming that <it>P</it>(<it>k</it>) decays with <it>k </it>exponentially.</p>
               </text>
               <graphic file="1472-6807-9-34-9"/>
            </fig>
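            <p>As a minimal sketch of how a connectivity distribution can be obtained from an un-weighted adjacency matrix and compared with an exponential decay, consider the following. The function names are hypothetical, and the least-squares fit of log <it>P</it>(<it>k</it>) against <it>k</it> is an assumed fitting procedure; the procedure used for the best-fit curves in Figure 9 is not specified.</p>
            <p><![CDATA[
```python
import numpy as np

def connectivity_distribution(adjacency):
    """Return P(k) as an array indexed by k (fraction of clusters with k links)."""
    a = np.asarray(adjacency, dtype=int)
    degrees = a.sum(axis=1)               # number of links per cluster
    counts = np.bincount(degrees)         # counts[k] = clusters with k links
    return counts / counts.sum()

def fit_exponential(p_k):
    """Fit P(k) ~ A * exp(-rate * k) by linear least squares on log P(k).

    Only k values with P(k) > 0 are used. Assumed procedure for illustration.
    """
    p = np.asarray(p_k, dtype=float)
    k = np.nonzero(p)[0]
    slope, intercept = np.polyfit(k, np.log(p[k]), 1)
    return -slope, float(np.exp(intercept))   # (decay rate, prefactor A)
```
]]></p>
            <p>A straight line in a plot of log <it>P</it>(<it>k</it>) against <it>k</it> is consistent with exponential decay, whereas a straight line in a log-log plot would indicate a power law.</p>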
         </sec>
         <sec>
            <st>
               <p>Robustness of communities</p>
            </st>
            <p>We conducted modularity analysis to study cluster networks from another perspective. First, the networks were divided into communities (see <it>Methods</it>). A modularity <it>Q</it><sub>mod </sub>is an index to assess how well the network is divided into communities <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>: 0 &#8804; <it>Q</it><sub>mod </sub>&#8804; 1. A network with a large <it>Q</it><sub>mod </sub>is characterized by numerous intra-community links and a few inter-community links. Figure <figr fid="F10">10A</figr> portrays the <it>K</it><sub>c </sub>dependence of <it>Q</it><sub>mod</sub>, which has the maximum at <it>K</it><sub>c </sub>= 200, indicating that the communities were highly isolated at <it>K</it><sub>c </sub>= 200. For <it>K</it><sub>c </sub>> 200, the communities were connected gradually by links, thereby decreasing <it>Q</it><sub>mod</sub>. For <it>K</it><sub>c </sub>&#8805; 1000, <it>Q</it><sub>mod </sub>converged to a value (0.63), which indicates that the 50-residue segment network is characterized by high modularity.</p>
            <fig id="F10">
               <title>
                  <p>Figure 10</p>
               </title>
               <caption>
                  <p><it>K</it><sub>c </sub>dependence of <it>N</it><sub>com </sub>and <it>Q</it><sub>mod</sub></p>
               </caption>
               <text>
                  <p><b><it>K</it><sub>c </sub>dependence of <it>N</it><sub>com </sub>and <it>Q</it><sub>mod</sub></b>. (A) The <it>K</it><sub>c </sub>dependence of modularity <it>Q</it><sub>mod </sub>(Eq. 10). (B) The bar graph shows the <it>K</it><sub>c </sub>dependence of number, <it>N</it><sub>com</sub>, of communities assigned to the left y-axis. The line with filled circles represents the ratio (assigned to right y-axis) of clusters in major communities to all clusters.</p>
               </text>
               <graphic file="1472-6807-9-34-10"/>
            </fig>
            <p>We next calculated the number of communities at various <it>K</it><sub>c</sub>. We classified the communities into major and minor communities. Major ones are communities consisting of more than three clusters. Minor ones are small isolated communities consisting of only one or two clusters without links to other communities. No community involves only one cluster linked to another community. The <it>K</it><sub>c </sub>dependence of the number (<it>N</it><sub>com</sub>) of the major communities is presented in Figure <figr fid="F10">10B</figr>. The minor communities do not characterize the overall property of the network because only 10% of clusters belong to the minor communities at any <it>K</it><sub>c</sub>. The increase in <it>N</it><sub>com </sub>with increasing <it>K</it><sub>c </sub>was rapid for 100 &#8804; <it>K</it><sub>c </sub>&#8804; 1000 and slow for <it>K</it><sub>c </sub>&#8805; 1000. The values of <it>N</it><sub>com </sub>were, respectively, 36, 38, and 38 at <it>K</it><sub>c </sub>= 1000, 2000, and 3000. This result shows that the number of communities was approximately conserved for <it>K</it><sub>c </sub>&#8805; 1000.</p>
            <p>In addition to the analysis presented above, we checked whether the contents (i.e., segments) involved in the communities are conserved with variation of <it>K</it><sub>c</sub>. To this end, we assigned a single color to communities common to the universes at <it>K</it><sub>c </sub>= 1000 (Figure <figr fid="F11">11A</figr>), 2000 (Figure <figr fid="F11">11B</figr>), and 3000 (Figure <figr fid="F11">11C</figr>). For instance, the majority of segments in the orange community of Figure <figr fid="F11">11A</figr> were involved in the orange ones in Figures <figr fid="F11">11B</figr> and <figr fid="F11">11C</figr>. Consequently, the communities are well conserved in the universes at different <it>K</it><sub>c</sub>. In other words, the network partitioning into communities is universal, independent of the spatial resolution (i.e., <it>K</it><sub>c</sub>). We term this inter-community partitioning "sub-partitioning", whereas the main partitioning is inter-regional partitioning (Figure <figr fid="F5">5</figr>).</p>
            <fig id="F11">
               <title>
                  <p>Figure 11</p>
               </title>
               <caption>
                  <p>Communities at <it>K</it><sub>c </sub>= 1000 (A), 2000 (B), and 3000 (C)</p>
               </caption>
               <text>
                  <p><b>Communities at <it>K</it><sub>c </sub>= 1000 (A), 2000 (B), and 3000 (C)</b>. For each universe, only the top 13 communities by the number of involved clusters are shown. A single color is assigned to communities that are common to the three universes. Communities that are not common among the three are not shown, nor are minor communities.</p>
               </text>
               <graphic file="1472-6807-9-34-11"/>
            </fig>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>Herein, we described universal partitioning of two types in the 50-residue segment networks (Figure <figr fid="F5">5</figr>) based on the network analysis. The main partitioning (the network separation by fold regions) resembles that in the classification scheme of existing databases such as CATH and SCOP. The mainly &#945;, mainly &#946;, &#945;&#946;, and randomly structured regions consist respectively of &#945;, &#946;, &#945;&#946;, and randomly structured communities. However, for the first time, we found communities in the segment fold universe: this sub-partitioning (network separation by communities) is a novel finding. High modularity ensures persistently existing communities, where the intra-community clusters are linked tightly and the inter-community clusters are linked weakly. The universality of the sub-partitioning was remarkable for <it>f</it><sub>0 </sub>(0.65 &#8804; <it>f</it><sub>0 </sub>&#8804; 0.75). Nevertheless, outside this range, the universality vanishes gradually. Our results reveal a hierarchically structured universe for 50-residue segments, as depicted in Figure <figr fid="F12">12</figr>. This hierarchy is robust because the main and sub-partitionings are independent of <it>K</it><sub>c </sub>for <it>K</it><sub>c </sub>&#8805; 1000.</p>
         <fig id="F12">
            <title>
               <p>Figure 12</p>
            </title>
            <caption>
               <p>Hierarchy in the segment universe proposed from the current study</p>
            </caption>
            <text>
               <p><b>Hierarchy in the segment universe proposed from the current study</b>.</p>
            </text>
            <graphic file="1472-6807-9-34-12"/>
         </fig>
         <p>Figure <figr fid="F10">10B</figr> shows that the current universe for the 50-residue segments consists of several dozen (ca. 40) major communities. Kihara and Skolnick reported that the current PDB database might cover almost all structures of small proteins <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>. Crippen and Maiorov generated many self-avoiding conformations of a chain and suggested that the possible structures of a 50-residue chain are classifiable roughly into a small number of types, although the secondary-structure formation was not incorporated in their model <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>. A study proposed the conjecture that tertiary-structure evolution of proteins might be achieved using limited repertoires of basic units such as supersecondary structure elements <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>. Results of such studies are consistent with our results because we have shown that protein tertiary structures can be decomposed into the dozens of major communities of 50-residue segments. Indeed, 90% of clusters belong to the major communities. To link those studies with our study more closely, detailed contents of each major community should be investigated. Such a study is currently under way. Moreover, the role of the minor communities in the protein structure construction should be studied.</p>
         <p>The currently observed 50-residue segment universe was characterized by a non-power-law distribution. Our result apparently differs from the power-law distribution widely known for the hierarchical protein domain universe <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B46">46</abbr><abbr bid="B47">47</abbr><abbr bid="B53">53</abbr></abbrgrp>. The emergence of the non-power-law statistics might be related to the use of the inter-residue contact, which is a more relaxed similarity measure than widely used ones such as RMSD or the DALI Z-score. It is known that, under power-law statistics, the fraction of isolated clusters among all clusters is high <abbrgrp><abbr bid="B53">53</abbr></abbrgrp>. Under our non-power-law statistics, this fraction was low because the relaxed measure provided linkages between clusters. Thus, the two statistics complement each other in surveying the fold universe. From the non-power-law universe, we could show a novel hierarchy (Figure <figr fid="F12">12</figr>) in the universe and the existence of 40 repertoires (Figure <figr fid="F10">10</figr>) to construct the protein tertiary structures, which have not been reported from the power-law universe. These results were also found in the 60- and 70-residue segment universes (data not shown). This suggests that the non-power law is likely to be a general property of segment universes.</p>
         <p>The current network helps to trace conformational changes of segments along the network linkages. <it>Supplementary Results </it>shows that the conformation changes gradually when shifting the view from one cluster to another (see Additional file <supplr sid="S1">1</supplr>).</p>
         <p>The inter-residue contact (native contact) has been widely used as a reaction coordinate in protein folding (see <it>Introduction</it>). We intend to use the currently obtained networks for protein folding study. The networks of fixed-length segments are readily applicable for conformational sampling in protein folding, where the chain length is usually fixed. The randomly structured clusters are located at the root of the distribution (Figure <figr fid="F4">4</figr> and Figure <figr fid="F5">5</figr>), from which the segment conformation can diversify to mainly &#945;, mainly &#946;, or &#945;&#946; regions with increased compactness (Figure <figr fid="F7">7</figr>).</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>We constructed a 50-residue segment network for investigating the protein local structure universe. The network was partitioned into several dozen (ca. 40) major communities with high modularity (0.60 &lt;<it>Q</it><sub>mod </sub>&lt; 0.65), independent of the spatial resolution (<it>K</it><sub>c</sub>). The major communities existed universally and persistently in the universe. Surprisingly, 90% of all segments were covered by the major communities. Consequently, numerous similarities exist among local regions (i.e., 50-residue segments) of proteins. Furthermore, the currently constructed segment networks are characterized by non-power-law (non-scale-free) statistics, which apparently differ from reported characteristics for the fold universe of full-length proteins.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <p>This section includes six subsections. The first three &#8211; "Generation of 50-residue segment library", "Clustering segments", and "Computation of inter-residue contact patterns" &#8211; are preparative subsections describing construction of the 50-residue segment fold universe. In the subsection titled "Construction of a universe and network", construction of the fold universe and the network is described. "Modularity analysis" presents analyses used to examine the network. The subsection "Characterization of communities by structural features" describes a method to characterize communities depending on five structural features. Specification of indices <it>i</it>, <it>j</it>, <it>s</it>, <it>t</it>, <it>u</it>, <it>v</it>, and <it>w </it>is given at the beginning of <it>Results</it>.</p>
         <sec>
            <st>
               <p>Generation of 50-residue segment library</p>
            </st>
            <p>We generated a structure library of 50-residue segments with reference to the all-&#945;, all-&#946;, &#945;/&#946;, and &#945;+&#946; fold classes defined in the SCOP database (release 1.69) <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. The SCOP database presents a list that provides a representative for each protein family. We selected tertiary structures of the representative domains from the PDB database <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>, eliminating multi-chain domains, those involving structurally undetermined regions, and those shorter than 50 residues. Furthermore, we eliminated domains consisting of 400 residues or more, which might involve structurally repeating units. We thereby obtained 1803 domains (456 from all-&#945;, 393 from all-&#946;, 393 from &#945;/&#946;, and 561 from &#945;+&#946;). A domain that is <it>n</it><sub>r </sub>amino-acid residues long produces <it>n</it><sub>r </sub>- 49 segments when a 50-residue window is slid along the sequence one residue at a time. Finally, we obtained an ensemble of 186 821 segments (32 040 from all-&#945;, 39 375 from all-&#946;, 63 177 from &#945;/&#946;, and 52 229 from &#945;+&#946;). The residue site of each segment was re-numbered from 1 to 50 in our study.</p>
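            <p>The sliding-window step can be sketched as follows. This is an illustration only, with hypothetical function names; it shows why a domain of <it>n</it><sub>r </sub>residues yields <it>n</it><sub>r </sub>- 49 segments, and it represents a domain simply as a list of residue records.</p>
            <p><![CDATA[
```python
def count_segments(n_r, window=50):
    """Number of windows of length `window` in a chain of n_r residues."""
    return max(n_r - window + 1, 0)

def sliding_segments(residues, window=50):
    """Return every contiguous window, shifting one residue at a time.

    Positions inside each returned segment are implicitly renumbered
    (index 0 of the list corresponds to residue site 1 of the segment).
    """
    n_windows = count_segments(len(residues), window)
    return [residues[start:start + window] for start in range(n_windows)]
```
]]></p>
            <p>For example, a 120-residue domain gives 120 - 49 = 71 segments, and a domain shorter than 50 residues gives none.</p>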
         </sec>
         <sec>
            <st>
               <p>Clustering segments</p>
            </st>
            <p>We classified the collected segments into clusters as follows. First, the inter-C<sub>&#945; </sub>atomic distances were calculated for segment <it>s</it>, where the distance between residues <it>i </it>and <it>j </it>is denoted as <it>r</it><sub><it>s</it></sub>(<it>i</it>, <it>j</it>). We eliminated residue pairs with |<it>i </it>- <it>j</it>| &lt; 3 because the distances for these pairs are similar for all segments; in other words, those distances are insensitive to structural differences between segments. The number (<it>N</it><sub>pair</sub>) of the remaining C<sub>&#945;</sub>-atomic pairs in a 50-residue segment is then 1128: <it>N</it><sub>pair </sub>= 1128. The set of distances is expressed as an <it>N</it><sub>pair</sub>-dimensional vector: <inline-formula><graphic file="1472-6807-9-34-i4.gif"/></inline-formula> = [<it>r</it><sub><it>s</it></sub>(1, 4), <it>r</it><sub><it>s</it></sub>(1, 5), ..., <it>r</it><sub><it>s</it></sub>(47, 50)]. We define the root mean square distance (<it>rmsd</it><sub><it>st</it></sub>) between <inline-formula><graphic file="1472-6807-9-34-i4.gif"/></inline-formula> and <inline-formula><graphic file="1472-6807-9-34-i5.gif"/></inline-formula> in the <it>N</it><sub>pair</sub>-dimensional Cartesian space as: <inline-formula><graphic file="1472-6807-9-34-i6.gif"/></inline-formula>.</p>
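            <p>A minimal sketch of the distance vector and of a root mean square distance between two such vectors is given below. The function names are hypothetical, and the normalization of the <it>rmsd</it> (a mean over the <it>N</it><sub>pair </sub>elements) is an assumption, because the defining formula is given only as a graphic. The pair enumeration reproduces <it>N</it><sub>pair </sub>= 1128 for 50 residues, in the order (1, 4), (1, 5), ..., (47, 50).</p>
            <p><![CDATA[
```python
import numpy as np

def pair_indices(n_res=50, min_sep=3):
    """0-based residue pairs (i, j) with j - i >= min_sep, in row-major order."""
    return np.triu_indices(n_res, k=min_sep)

def distance_vector(ca_coords, min_sep=3):
    """Inter-C-alpha distances for all retained pairs of one segment."""
    ca = np.asarray(ca_coords, dtype=float)       # shape (n_res, 3)
    i, j = pair_indices(len(ca), min_sep)
    return np.linalg.norm(ca[i] - ca[j], axis=1)

def rmsd(vec_s, vec_t):
    """Root mean square difference between two distance vectors.

    ASSUMPTION: the squared differences are averaged over N_pair.
    """
    diff = np.asarray(vec_s, dtype=float) - np.asarray(vec_t, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```
]]></p>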
            <p>For classifying the 186 821 segments into <it>K</it><sub>c </sub>clusters, we applied Lloyd's K-means algorithm <abbrgrp><abbr bid="B55">55</abbr></abbrgrp> to the set of <it>rmsd</it><sub><it>st </it></sub>values, where <it>s</it>, <it>t </it>= 1, ..., 186821. One should set <it>K</it><sub>c </sub>in advance in the K-means algorithm. We examined various values for <it>K</it><sub>c </sub>(<it>K</it><sub>c </sub>&#8804; 5000). In Lloyd's method, the <it>K</it><sub>c </sub>clusters are set randomly at the beginning. The finally converged clusters are output. We have checked that the main results are independent of the initial set of clusters.</p>
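            <p>For orientation, a toy version of Lloyd's iteration is sketched below. It is not the implementation used in this study: it works directly on the distance vectors with the Euclidean distance (which is assumed to be proportional to <it>rmsd</it><sub><it>st</it></sub>), it stores all vector-to-center distances in memory and is therefore suitable only for small data sets, and the function name and the optional <it>init </it>argument are hypothetical.</p>
            <p><![CDATA[
```python
import numpy as np

def lloyd_kmeans(vectors, k_c, n_iter=100, seed=0, init=None):
    """Toy Lloyd's K-means. Returns (labels, centers).

    If `init` is None, k_c distinct data points are drawn at random as
    the initial centers; otherwise `init` supplies the initial centers.
    """
    x = np.asarray(vectors, dtype=float)
    if init is None:
        rng = np.random.default_rng(seed)
        centers = x[rng.choice(len(x), size=k_c, replace=False)].copy()
    else:
        centers = np.array(init, dtype=float)
    labels = np.full(len(x), -1)
    for _ in range(n_iter):
        # Squared Euclidean distance from every vector to every center.
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                  # assignments converged
        labels = new_labels
        for u in range(k_c):
            members = x[labels == u]
            if len(members) > 0:                   # keep old center if empty
                centers[u] = members.mean(axis=0)
    return labels, centers
```
]]></p>
            <p>Because Lloyd's method converges to a local optimum that depends on the starting centers, repeating the run with different seeds is a simple way to check that the main results do not depend on the initial set of clusters.</p>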
            <p>We calculated the center (<inline-formula><graphic file="1472-6807-9-34-i7.gif"/></inline-formula>) of a cluster <it>u </it>in the <it>N</it><sub>pair</sub>-dimensional space as <inline-formula><graphic file="1472-6807-9-34-i8.gif"/></inline-formula>, where the element <inline-formula><graphic file="1472-6807-9-34-i9.gif"/></inline-formula> is given as</p>
            <p>
               <display-formula id="M1">
                  <graphic file="1472-6807-9-34-i10.gif"/>
               </display-formula>
            </p>
            <p>The <it>n</it><sub><it>u </it></sub>is the number of constituent segments of the cluster <it>u</it>.</p>
            <p>We defined a size <it>S</it><sub><it>u </it></sub>of the cluster <it>u </it>as</p>
            <p>
               <display-formula id="M2">
                  <graphic file="1472-6807-9-34-i11.gif"/>
               </display-formula>
            </p>
            <p>This equation simply quantifies the average distance from the cluster center <inline-formula><graphic file="1472-6807-9-34-i7.gif"/></inline-formula> to segments belonging to the cluster <it>u </it>in the <it>N</it><sub>pair</sub>-dimensional space. The average cluster size is defined simply as</p>
            <p>
               <display-formula id="M3">
                  <graphic file="1472-6807-9-34-i12.gif"/>
               </display-formula>
            </p>
            <p>where the summation is taken over all the <it>K</it><sub>c </sub>clusters.</p>
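            <p>The cluster center, cluster size, and average cluster size can be sketched as follows. The function names are hypothetical. The center is the element-wise mean of the member vectors, as stated above; the size is taken here as the mean of the root mean square deviations of the members from the center, which is an assumed reading of Eq. 2, because that equation is given only as a graphic.</p>
            <p><![CDATA[
```python
import numpy as np

def cluster_center(vectors):
    """Element-wise mean of the distance vectors of the member segments."""
    return np.asarray(vectors, dtype=float).mean(axis=0)

def cluster_size(vectors):
    """Average distance from the cluster center to its member segments.

    ASSUMPTION: per-segment distance is the root mean square deviation
    over vector elements, and the size is the plain mean over segments.
    """
    x = np.asarray(vectors, dtype=float)
    center = x.mean(axis=0)
    per_segment = np.sqrt(((x - center) ** 2).mean(axis=1))
    return float(per_segment.mean())

def average_cluster_size(clusters):
    """Mean of the cluster sizes over all K_c clusters."""
    return float(np.mean([cluster_size(c) for c in clusters]))
```
]]></p>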
         </sec>
         <sec>
            <st>
               <p>Computation of inter-residue contact patterns</p>
            </st>
            <p>In this subsection, we present computation of the inter-cluster and intra-cluster structural similarity based on the inter-residue contact patterns. The inter-residue contacts in segment <it>s </it>were defined as follows: Calculating all the inter-heavy atomic distances between residues <it>i </it>and <it>j </it>for the segment, their minimum distance was registered as the inter-residue distance <it>q</it><sub><it>s</it></sub>(<it>i</it>, <it>j</it>). Then, if <it>q</it><sub><it>s</it></sub>(<it>i</it>, <it>j</it>) &lt; 6.0 &#197;, we judged that the residues <it>i </it>and <it>j </it>were contacting and set a quantity <it>c</it><sub><it>s</it></sub>(<it>i</it>, <it>j</it>) to 1 (otherwise, <it>c</it><sub><it>s</it></sub>(<it>i</it>, <it>j</it>) = 0). Here, we again eliminated residue pairs of |<it>i </it>- <it>j</it>| &lt; 3 in the calculation of <it>c</it><sub><it>s</it></sub>(<it>i</it>, <it>j</it>). The set of <it>c</it><sub><it>s</it></sub>(<it>i</it>, <it>j</it>) constructs a matrix <it>C</it><sub><it>s</it></sub>, where element (<it>i</it>, <it>j</it>) is <it>c</it><sub><it>s</it></sub>(<it>i</it>, <it>j</it>).</p>
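            <p>The contact definition above can be sketched as follows. The function name and the input layout (one array of heavy-atom coordinates per residue) are assumptions made for illustration; the 6.0 &#197; cutoff and the exclusion of pairs with |<it>i </it>- <it>j</it>| &lt; 3 follow the text.</p>
            <p><![CDATA[
```python
import numpy as np

def contact_matrix(residue_atoms, cutoff=6.0, min_sep=3):
    """Binary inter-residue contact matrix C_s for one segment.

    residue_atoms: sequence with one (n_atoms, 3) coordinate array of
    heavy atoms per residue. Pairs with |i - j| < min_sep are left at 0.
    """
    n = len(residue_atoms)
    c = np.zeros((n, n), dtype=int)
    for i in range(n):
        a = np.asarray(residue_atoms[i], dtype=float)
        for j in range(i + min_sep, n):
            b = np.asarray(residue_atoms[j], dtype=float)
            # q_s(i, j): minimum distance over all heavy-atom pairs.
            q = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2).min()
            if q < cutoff:
                c[i, j] = c[j, i] = 1
    return c
```
]]></p>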
            <p>The upper limit (6.0 &#197;) for <it>q</it><sub><it>s</it></sub>(<it>i</it>, <it>j</it>) allows no penetration of a water molecule between residues <it>i </it>and <it>j</it>: At <it>q</it><sub><it>s</it></sub>(<it>i</it>, <it>j</it>) = 6.0 &#197;, the substantial space for water penetration between the residues is approximately 2.0 &#197; (= 6.0 - 2 &#215; 2.0) assuming that radii of segment heavy atoms are 2.0 &#197;. This space of 2.0 &#197; is smaller than the diameter of a water molecule (2.8 &#197;).</p>
            <p>A structural similarity between segments <it>s </it>and <it>t </it>might be measured by comparing <it>C</it><sub><it>s </it></sub>and <it>C</it><sub><it>t</it></sub>. However, a strict comparison overlooks the similarity in the following case: Presume that <it>c</it><sub><it>s</it></sub>(<it>i</it>, <it>j</it>) = 1 and <it>c</it><sub><it>s</it></sub>(<it>i </it>+ 1, <it>j</it>) = 0 in segment <it>s</it>, and <it>c</it><sub><it>t</it></sub>(<it>i</it>, <it>j</it>) = 0 and <it>c</it><sub><it>t</it></sub>(<it>i </it>+ 1, <it>j</it>) = 1 in segment <it>t</it>. The inter-residue contacts in these segments differ, but they are similar. The strict comparison does not count such a similarity. To incorporate such similarity, smoothing of <it>C</it><sub><it>s </it></sub>was performed as</p>
            <p>
               <display-formula id="M4">
                  <graphic file="1472-6807-9-34-i13.gif"/>
               </display-formula>
            </p>
            <p>This smoothing (see Figure <figr fid="F13">13</figr>) was done only when residues <it>i' </it>and <it>j' </it>are not contacting and the residues <it>i </it>and <it>j </it>are contacting in the segment. If Eq. 4 produces a negative value, then <it>c</it><sub><it>s</it></sub>(<it>i'</it>, <it>j'</it>) is set to zero. If a non-contacting residue pair (<it>i'</it>, <it>j'</it>) has multiple values for <it>c</it><sub><it>s</it></sub>(<it>i'</it>, <it>j'</it>) attributable to contributions of some contacting pairs around (<it>i'</it>, <it>j'</it>), then the largest value is assigned to the non-contacting pair. Hereafter in this paper, the inter-residue contact matrix <it>C</it><sub><it>s </it></sub>refers to the matrix after the smoothing.</p>
            <fig id="F13">
               <title>
                  <p>Figure 13</p>
               </title>
               <caption>
                  <p>Smoothed inter-residue contacts <it>c</it>(<it>i</it>, <it>j</it>) (Eq. 4)</p>
               </caption>
               <text>
                  <p><b>Smoothed inter-residue contacts <it>c</it>(<it>i</it>, <it>j</it>) (Eq. 4)</b>. It is presumed that residue pair (<it>i</it>, <it>j</it>) is in contact (i.e., <it>c</it>(<it>i</it>, <it>j</it>) = 1), and that the other pairs are non-contacting. Equation 4 provides negative <it>c</it><sub><it>s</it></sub>(<it>i'</it>, <it>j'</it>) at sites where an inequality, |<it>i </it>- <it>i'</it>| + |<it>j </it>- <it>j'</it>| + |(|<it>i </it>- <it>i'</it>| - |<it>j </it>- <it>j'</it>|)| > 5, is satisfied. Besides, this inequality is satisfied without exception when any one of the three inequalities, |<it>i </it>- <it>i'</it>| > 2, |<it>j </it>- <it>j'</it>| > 2, or ||<it>i </it>- <it>i'</it>| - |<it>j </it>- <it>j'</it>|| > 2, is met. Those negative <it>c</it><sub><it>s</it></sub>(<it>i'</it>, <it>j'</it>) are reset to zero (see text).</p>
               </text>
               <graphic file="1472-6807-9-34-13"/>
            </fig>
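            <p>The three smoothing rules (only non-contacting pairs are modified, negative values are reset to zero, and the largest contribution wins) can be illustrated with the sketch below. The decay law in the code is an assumption: Eq. 4 is available only as a graphic, so a linear decay, 1 - (|<it>i </it>- <it>i'</it>| + |<it>j </it>- <it>j'</it>| + ||<it>i </it>- <it>i'</it>| - |<it>j </it>- <it>j'</it>||)/5, was chosen merely because it becomes negative exactly where the inequality in the caption of Figure 13 holds. The function name is hypothetical, and the actual values given by Eq. 4 may differ.</p>
            <p><![CDATA[
```python
import numpy as np

def smooth_contacts(c, reach=2):
    """Spread each contact (value 1) onto nearby non-contacting pairs.

    ASSUMED decay: 1 - (di + dj + |di - dj|) / 5, which is negative when
    the sum exceeds 5 (i.e. when max(di, dj) > 2), hence reach = 2.
    """
    c = np.asarray(c, dtype=float)
    n = c.shape[0]
    out = c.copy()
    for i, j in zip(*np.nonzero(c == 1)):
        for ip in range(max(i - reach, 0), min(i + reach, n - 1) + 1):
            for jp in range(max(j - reach, 0), min(j + reach, n - 1) + 1):
                if c[ip, jp] == 1:
                    continue                      # true contacts stay at 1
                di, dj = abs(i - ip), abs(j - jp)
                value = 1.0 - (di + dj + abs(di - dj)) / 5.0
                # Keep the largest contribution; never go below zero.
                out[ip, jp] = max(out[ip, jp], value, 0.0)
    return out
```
]]></p>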
            <p>Here, we calculate the contact patterns which are specific to a cluster. For this purpose, we averaged <it>C </it>over the entire segment library and over all segments in cluster <it>u</it>. We denote these averaged matrices as <inline-formula><graphic file="1472-6807-9-34-i14.gif"/></inline-formula> and <inline-formula><graphic file="1472-6807-9-34-i15.gif"/></inline-formula>, respectively. Then, we defined a quantity <inline-formula><graphic file="1472-6807-9-34-i16.gif"/></inline-formula>, where element (<it>i</it>, <it>j</it>) is denoted as <inline-formula><graphic file="1472-6807-9-34-i17.gif"/></inline-formula>. The similarity between clusters <it>u </it>and <it>v </it>was measured using the following correlation coefficient:</p>
            <p>
               <display-formula id="M5">
                  <graphic file="1472-6807-9-34-i18.gif"/>
               </display-formula>
            </p>
            <p>where</p>
            <p>
               <display-formula id="M6">
                  <graphic file="1472-6807-9-34-i19.gif"/>
               </display-formula>
            </p>
            <p>The term <inline-formula><graphic file="1472-6807-9-34-i20.gif"/></inline-formula> in Eq. 5 is defined by setting <it>u </it>= <it>v </it>in Eq. 6, and the term <inline-formula><graphic file="1472-6807-9-34-i21.gif"/></inline-formula> by setting <inline-formula><graphic file="1472-6807-9-34-i22.gif"/></inline-formula> = 1. A large correlation coefficient indicates similar inter-residue contact patterns between the clusters.</p>
            <p>The coefficient <inline-formula><graphic file="1472-6807-9-34-i23.gif"/></inline-formula> is useful as a distance between clusters <it>u </it>and <it>v </it>in a multi-dimensional space. Consequently, the set of coefficients defines a multi-dimensional weighted graph (i.e., weighted network). In this work, we must convert this weighted graph into an un-weighted one to perform community analysis, which deals only with un-weighted graphs. Therefore, we introduce an adjacency matrix <it>a</it><sub><it>uv </it></sub>in which element (<it>u</it>, <it>v</it>) is given as follows.</p>
            <p>
               <display-formula id="M7">
                  <graphic file="1472-6807-9-34-i24.gif"/>
               </display-formula>
            </p>
            <p>The inter-residue contact patterns are similar between clusters <it>u </it>and <it>v </it>only when <inline-formula><graphic file="1472-6807-9-34-i25.gif"/></inline-formula>. Herein, we set <it>f</it><sub>0 </sub>to 0.7. The meaning of 0.7 is explained in the <it>Results </it>section.</p>
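            <p>The conversion from correlation coefficients to an un-weighted network can be sketched as follows. Several points are assumptions made for illustration: the coefficient is computed here as an ordinary Pearson correlation over the matrix elements with |<it>i </it>- <it>j</it>| of 3 or more, the threshold test is taken as "greater than or equal to <it>f</it><sub>0</sub>", and the inputs are assumed to be the per-cluster matrices from which the library-wide average has already been subtracted. The function names are hypothetical, and Eqs. 5-7 themselves are given only as graphics.</p>
            <p><![CDATA[
```python
import numpy as np

def contact_correlation(delta_u, delta_v, min_sep=3):
    """Pearson correlation between two contact-pattern matrices.

    Only elements (i, j) with j - i >= min_sep are compared.
    ASSUMPTION: stands in for the coefficient of Eq. 5.
    """
    du = np.asarray(delta_u, dtype=float)
    dv = np.asarray(delta_v, dtype=float)
    i, j = np.triu_indices(du.shape[0], k=min_sep)
    return float(np.corrcoef(du[i, j], dv[i, j])[0, 1])

def adjacency(deltas, f0=0.7):
    """Un-weighted adjacency matrix: a_uv = 1 if the coefficient >= f0."""
    k = len(deltas)
    a = np.zeros((k, k), dtype=int)
    for u in range(k):
        for v in range(u + 1, k):
            if contact_correlation(deltas[u], deltas[v]) >= f0:
                a[u, v] = a[v, u] = 1
    return a
```
]]></p>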
            <p>We next assessed the intra-cluster similarity. First, we defined a quantity <inline-formula><graphic file="1472-6807-9-34-i26.gif"/></inline-formula> for a segment <it>s</it>, where element (<it>i</it>, <it>j</it>) of &#916;<it>C</it><sub><it>s </it></sub>is denoted as &#916;<it>C</it><sub><it>s</it></sub>(<it>i</it>, <it>j</it>). Then, we averaged &#916;<it>C</it><sub><it>s</it></sub>(<it>i</it>, <it>j</it>) over the segments in cluster <it>u</it>:</p>
            <p>
               <display-formula id="M8">
                  <graphic file="1472-6807-9-34-i27.gif"/>
               </display-formula>
            </p>
            <p>We define a matrix <it>G</it><sub><it>u </it></sub>whose element (<it>i</it>, <it>j</it>) is <it>g</it><sub><it>u</it></sub>(<it>i</it>, <it>j</it>). Then, we calculated the correlation coefficient <it>f</it>(<it>G</it><sub><it>u</it></sub>, &#916;<it>C</it><sub><it>s</it></sub>) between <it>G</it><sub><it>u </it></sub>and &#916;<it>C</it><sub><it>s </it></sub>for segments in cluster <it>u</it>, using the same definition as that in Eq. 5. Subsequently, we calculated an averaged correlation coefficient &lt;<it>f </it>><sub><it>u </it></sub>over <it>f</it>(<it>G</it><sub><it>u</it></sub>,&#916;<it>C</it><sub><it>s</it></sub>) of the segments in the cluster <it>u</it>. This quantity measures the similarity of the inter-residue contact patterns among the segments in cluster <it>u</it>. Finally, &lt;<it>f </it>><sub><it>u </it></sub>was averaged over all clusters.</p>
            <p>
               <display-formula id="M9">
                  <graphic file="1472-6807-9-34-i28.gif"/>
               </display-formula>
            </p>
            <p>The larger the value of <inline-formula><graphic file="1472-6807-9-34-i29.gif"/></inline-formula>, the more similar the inter-residue contact patterns in each cluster are, on average.</p>
         </sec>
         <sec>
            <st>
               <p>Construction of a universe and network</p>
            </st>
            <p>We constructed a distribution (i.e., fold universe) of <it>K</it><sub>c </sub>clusters in a 3D conformational space by embedding the clusters into 3D. Details are presented in Additional file <supplr sid="S1">1</supplr>. As explained in the <it>Introduction</it>, lowering of the space dimensionality hides the internal architecture of the fold universe. To supplement the 3D distribution with the full-dimensional information, links were assigned between clusters with similar inter-residue contact patterns (<it>a</it><sub><it>uv </it></sub>= 1). The generated networks were subjected to the modularity analysis described in the next subsection.</p>
         </sec>
         <sec>
            <st>
               <p>Modularity analysis</p>
            </st>
            <p>To investigate a property of the cluster network, we divided the network into communities (i.e., sub-networks) using an efficient method <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>. An example of a network is presented in Figure <figr fid="F14">14</figr>, where two communities (Com 1 and Com 2) exist. A modularity <it>Q</it><sub>mod </sub>is an index to assess how well the network is divided into communities <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>:</p>
            <fig id="F14">
               <title>
                  <p>Figure 14</p>
               </title>
               <caption>
                  <p>Two network types</p>
               </caption>
               <text>
                  <p><b>Two network types</b>. Network (A) has larger modularity <it>Q</it><sub>mod </sub>than (B) does. Filled circles form a community (Com 1); open ones construct the other community (Com 2). Lines between circles represent links.</p>
               </text>
               <graphic file="1472-6807-9-34-14"/>
            </fig>
            <p>
               <display-formula id="M10">
                  <graphic file="1472-6807-9-34-i30.gif"/>
               </display-formula>
            </p>
            <p>where <it>I</it><sub><it>w </it></sub>is the number of links connecting clusters within a community <it>w</it>, <it>N</it><sub>com </sub>is the number of communities existing in the entire network, and <it>I </it>is the number of links existing in the entire network. The quantity <it>d</it><sub><it>w </it></sub>is called the "total degree", which is defined for each community as <it>d</it><sub><it>w </it></sub>= 2<it>I</it><sub>w </sub>+ <it>I</it><sub>w-other</sub>, where <it>I</it><sub>w-other </sub>is the number of links connecting clusters in the community <it>w </it>and clusters outside the community. The value of <it>Q</it><sub>mod </sub>is 0&#8211;1: <it>Q</it><sub>mod </sub>approaches 1 when the number of links connecting different communities decreases. For instance, the network in Figure <figr fid="F14">14A</figr> has <it>Q</it><sub>mod </sub>of 0.466 (<it>I </it>= 34, <it>I</it><sub>1 </sub>= 18, <it>I</it><sub>2 </sub>= 15, <it>d</it><sub>1 </sub>= 37, and <it>d</it><sub>2 </sub>= 31). That of Figure <figr fid="F14">14B</figr> has <it>Q</it><sub>mod </sub>of 0.388 (<it>I </it>= 37, <it>I</it><sub>1 </sub>= 18, <it>I</it><sub>2 </sub>= 15, <it>d</it><sub>1 </sub>= 40, and <it>d</it><sub>2 </sub>= 34). The two networks are equivalent except for the inter-community links.</p>
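            <p>The two worked values for Figure 14 can be checked with the short sketch below. It assumes that Eq. 10 has the standard form of the modularity, i.e., the sum over communities of <it>I</it><sub><it>w</it></sub>/<it>I </it>- (<it>d</it><sub><it>w</it></sub>/2<it>I</it>)<sup>2</sup>; the equation itself is given only as a graphic, but this form reproduces the quoted values. The function names are hypothetical.</p>
            <p><![CDATA[
```python
def total_degree(i_w, i_w_other):
    """d_w = 2 * I_w + I_w-other for one community."""
    return 2 * i_w + i_w_other

def modularity(total_links, communities):
    """Q_mod from the total link count I and a list of (I_w, d_w) pairs.

    ASSUMED form: sum over w of I_w / I - (d_w / (2 I))**2.
    """
    two_i = 2.0 * total_links
    return sum(i_w / total_links - (d_w / two_i) ** 2
               for i_w, d_w in communities)

# Network of Figure 14A: one inter-community link (I = 34).
q_a = modularity(34, [(18, total_degree(18, 1)), (15, total_degree(15, 1))])
# Network of Figure 14B: four inter-community links (I = 37).
q_b = modularity(37, [(18, total_degree(18, 4)), (15, total_degree(15, 4))])
```
]]></p>
            <p>With these inputs, <it>q_a </it>is approximately 0.4667 and <it>q_b </it>is approximately 0.3886, in agreement with the values 0.466 and 0.388 quoted above (which correspond to truncation at the third decimal place).</p>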
         </sec>
         <sec>
            <st>
               <p>Characterization of communities by structural features</p>
            </st>
            <p>The manner of differentiating the communities is important. Herein, we characterize the communities depending on five biophysical structural features: radius of gyration (<it>R</it><sub>g</sub>), number of inter-residue contacts (<inline-formula><graphic file="1472-6807-9-34-i31.gif"/></inline-formula> with removal of pairs of |<it>i </it>- <it>j</it>| &lt; 3), number of &#945;-helical residues (<it>n</it><sub><it>&#945;</it></sub>), number of &#946;-strand residues (<it>n</it><sub><it>&#946;</it></sub>), and the sum of <it>n</it><sub><it>&#945; </it></sub>and <it>n</it><sub><it>&#946; </it></sub>(i.e., <it>n</it><sub><it>&#945;&#946; </it></sub>= <it>n</it><sub><it>&#945; </it></sub>+ <it>n</it><sub><it>&#946;</it></sub>).</p>
            <p>First, we calculated the five quantities for each segment. The secondary-structure assignment to each residue in a segment was done using software available at the STRIDE web site <url>http://webclu.bio.wzw.tum.de/stride/</url><abbrgrp><abbr bid="B56">56</abbr></abbrgrp>. Next, we took the average for each of the five quantities over segments in a community. We designate the average quantities in a community <it>w </it>as <it>R</it><sub>g</sub>(<it>w</it>), <it>N</it><sub>contact</sub>(<it>w</it>), <it>n</it><sub><it>&#945;</it></sub>(<it>w</it>), <it>n</it><sub><it>&#946;</it></sub>(<it>w</it>), and <it>n</it><sub><it>&#945;&#946;</it></sub>(<it>w</it>). Then, we classified the communities into &#945;, &#946;, &#945;&#946;, and randomly structured ones according to the five quantities: Randomly structured communities are those with <it>R</it><sub>g</sub>(<it>w</it>) > 14 &#197; and <it>N</it><sub>contact</sub>(<it>w</it>) &lt; 100 or those with <it>n</it><sub><it>&#945;&#946;</it></sub>(<it>w</it>) &lt; 15. In the remaining communities, &#945; communities are those with <it>n</it><sub><it>&#945;</it></sub>(<it>w</it>) > 0.7 &#215; <it>n</it><sub><it>&#945;&#946;</it></sub>(<it>w</it>). In the remaining communities, &#946; communities are those with <it>n</it><sub><it>&#946;</it></sub>(<it>w</it>) > 0.7 &#215; <it>n</it><sub><it>&#945;&#946;</it></sub>(<it>w</it>). The finally remaining communities are classified as &#945;&#946; communities. Each segment in the &#945;&#946; communities significantly involves both an &#945; helix and a &#946; strand.</p>
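            <p>The sequential classification rules can be written compactly as in the sketch below. Two readings are assumed here: the rule for randomly structured communities is parsed as "(large radius of gyration and few contacts) or few secondary-structure residues", and the &#946; rule is taken to compare the number of &#946;-strand residues with 0.7 times the sum of &#945; and &#946; residues. The function name and the returned labels are hypothetical.</p>
            <p><![CDATA[
```python
def classify_community(r_g, n_contact, n_alpha, n_beta):
    """Classify a community from its averaged structural features.

    r_g in angstroms; n_contact, n_alpha, n_beta are community averages.
    The rules are applied in order, so later rules see only the
    communities left over by the earlier ones.
    """
    n_ab = n_alpha + n_beta
    # ASSUMED precedence: (R_g > 14 and N_contact < 100) or n_ab < 15.
    if (r_g > 14.0 and n_contact < 100) or n_ab < 15:
        return "random"
    if n_alpha > 0.7 * n_ab:
        return "alpha"
    if n_beta > 0.7 * n_ab:       # ASSUMED: beta rule uses n_beta
        return "beta"
    return "alpha-beta"
```
]]></p>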
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>This study was conceived and carried out by JI, who also developed the main part of the methodology. YS participated in some analyses. KI participated in discussions. KT participated in the coordination of the study and helped to write the manuscript. JH participated in developing the methodology, designed the study, and wrote the manuscript. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>KI and JH were partly supported by <b>BIRD </b>of Japan Science and Technology Agency (<b>JST</b>). JH was also partly supported by New Energy and Industrial Technology Development Organization (<b>NEDO</b>).</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Proteins. One thousand families for the molecular biologist</p>
            </title>
            <aug>
               <au>
                  <snm>Chothia</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1992</pubdate>
            <volume>357</volume>
            <fpage>543</fpage>
            <lpage>544</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/357543a0</pubid>
                  <pubid idtype="pmpid" link="fulltext">1608464</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Surprising similarities in structure comparison</p>
            </title>
            <aug>
               <au>
                  <snm>Gibrat</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Madej</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Bryant</snm>
                  <fnm>SH</fnm>
               </au>
            </aug>
            <source>Curr Opin Struct Biol</source>
            <pubdate>1996</pubdate>
            <volume>6</volume>
            <fpage>377</fpage>
            <lpage>385</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0959-440X(96)80058-3</pubid>
                  <pubid idtype="pmpid" link="fulltext">8804824</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>A unifold, mesofold, and superfold model of protein fold use</p>
            </title>
            <aug>
               <au>
                  <snm>Coulson</snm>
                  <fnm>AFW</fnm>
               </au>
               <au>
                  <snm>Moult</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>2002</pubdate>
            <volume>46</volume>
            <fpage>61</fpage>
            <lpage>71</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/prot.10011</pubid>
                  <pubid idtype="pmpid" link="fulltext">11746703</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>The number of protein folds and their distribution over families in nature</p>
            </title>
            <aug>
               <au>
                  <snm>Liu</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Fan</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>2004</pubdate>
            <volume>54</volume>
            <fpage>491</fpage>
            <lpage>499</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/prot.10514</pubid>
                  <pubid idtype="pmpid" link="fulltext">14747997</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>SCOP: a structural classification of proteins database for the investigation of sequences and structures</p>
            </title>
            <aug>
               <au>
                  <snm>Murzin</snm>
                  <fnm>AG</fnm>
               </au>
               <au>
                  <snm>Brenner</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Hubbard</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Chothia</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1995</pubdate>
            <volume>247</volume>
            <fpage>536</fpage>
            <lpage>540</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">7723011</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>CATH &#8211; a hierarchic classification of protein domain structures</p>
            </title>
            <aug>
               <au>
                  <snm>Orengo</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Michie</snm>
                  <fnm>AD</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>DT</fnm>
               </au>
               <au>
                  <snm>Swindells</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Structure</source>
            <pubdate>1997</pubdate>
            <volume>5</volume>
            <fpage>1093</fpage>
            <lpage>1108</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0969-2126(97)00260-8</pubid>
                  <pubid idtype="pmpid">9309224</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Structural trees for protein superfamilies</p>
            </title>
            <aug>
               <au>
                  <snm>Efimov</snm>
                  <fnm>AV</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>1997</pubdate>
            <volume>28</volume>
            <fpage>241</fpage>
            <lpage>260</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/(SICI)1097-0134(199706)28:2&lt;241::AID-PROT12>3.0.CO;2-I</pubid>
                  <pubid idtype="pmpid" link="fulltext">9188741</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Mapping the protein universe</p>
            </title>
            <aug>
               <au>
                  <snm>Holm</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Sander</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1996</pubdate>
            <volume>273</volume>
            <fpage>595</fpage>
            <lpage>602</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.273.5275.595</pubid>
                  <pubid idtype="pmpid" link="fulltext">8662544</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Expanding protein universe and its origin from the biological Big Bang</p>
            </title>
            <aug>
               <au>
                  <snm>Dokholyan</snm>
                  <fnm>NV</fnm>
               </au>
               <au>
                  <snm>Shakhnovich</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Shakhnovich</snm>
                  <fnm>EI</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2002</pubdate>
            <volume>99</volume>
            <fpage>14132</fpage>
            <lpage>14136</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">137849</pubid>
                  <pubid idtype="pmpid" link="fulltext">12384571</pubid>
                  <pubid idtype="doi">10.1073/pnas.202497999</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>A global representation of the protein fold space</p>
            </title>
            <aug>
               <au>
                  <snm>Hou</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Sims</snm>
                  <fnm>GE</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>S-H</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2003</pubdate>
            <volume>100</volume>
            <fpage>2386</fpage>
            <lpage>2390</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">151350</pubid>
                  <pubid idtype="pmpid" link="fulltext">12606708</pubid>
                  <pubid idtype="doi">10.1073/pnas.2628030100</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Global mapping of the protein structure space and application in structure-based inference of protein function</p>
            </title>
            <aug>
               <au>
                  <snm>Hou</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Jun</snm>
                  <fnm>S-R</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>S-H</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2005</pubdate>
            <volume>102</volume>
            <fpage>3651</fpage>
            <lpage>3656</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">548596</pubid>
                  <pubid idtype="pmpid" link="fulltext">15705717</pubid>
                  <pubid idtype="doi">10.1073/pnas.0409772102</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Protein structure comparison by alignment of distance matrices</p>
            </title>
            <aug>
               <au>
                  <snm>Holm</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Sander</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1993</pubdate>
            <volume>233</volume>
            <fpage>123</fpage>
            <lpage>138</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1993.1489</pubid>
                  <pubid idtype="pmpid" link="fulltext">8377180</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Identification and classification of protein fold families</p>
            </title>
            <aug>
               <au>
                  <snm>Orengo</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Flores</snm>
                  <fnm>TP</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>WR</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Protein Eng</source>
            <pubdate>1993</pubdate>
            <volume>6</volume>
            <fpage>485</fpage>
            <lpage>500</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/protein/6.5.485</pubid>
                  <pubid idtype="pmpid" link="fulltext">8415576</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Protein structure databases with new web services for structural biology and biomedical research</p>
            </title>
            <aug>
               <au>
                  <snm>Standley</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Kinjo</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Kinoshita</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Nakamura</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Brief Bioinform</source>
            <pubdate>2008</pubdate>
            <volume>9</volume>
            <fpage>276</fpage>
            <lpage>285</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1093/bib/bbn015</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Conformational classification of short backbone fragments in globular proteins and its use for coding backbone conformations</p>
            </title>
            <aug>
               <au>
                  <snm>Takahashi</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Go</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>Biophys Chem</source>
            <pubdate>1993</pubdate>
            <volume>47</volume>
            <fpage>163</fpage>
            <lpage>178</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/0301-4622(93)85034-F</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Systematic detection of protein structural motifs</p>
            </title>
            <aug>
               <au>
                  <snm>Tomii</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Kanehisa</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Pattern discovery in biomolecular data</source>
            <publisher>New York: Oxford University Press</publisher>
            <editor>Wang JTL, Shapiro BA, Shasha D</editor>
            <pubdate>1999</pubdate>
            <fpage>97</fpage>
            <lpage>110</lpage>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Local feature frequency profile: A method to measure structural similarity in proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Choi</snm>
                  <fnm>IG</fnm>
               </au>
               <au>
                  <snm>Kwon</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>S-H</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2004</pubdate>
            <volume>101</volume>
            <fpage>3797</fpage>
            <lpage>3802</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">374324</pubid>
                  <pubid idtype="pmpid" link="fulltext">14985506</pubid>
                  <pubid idtype="doi">10.1073/pnas.0308656100</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Visualization of conformational distribution of short to medium size segments in globular proteins and identification of local structural motifs</p>
            </title>
            <aug>
               <au>
                  <snm>Ikeda</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Tomii</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Yokomizo</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Mitomo</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Maruyama</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Suzuki</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Higo</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Protein Sci</source>
            <pubdate>2005</pubdate>
            <volume>14</volume>
            <fpage>1253</fpage>
            <lpage>1265</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2253271</pubid>
                  <pubid idtype="pmpid" link="fulltext">15802651</pubid>
                  <pubid idtype="doi">10.1110/ps.04956305</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Structural diversity of protein segments follows a power-law distribution</p>
            </title>
            <aug>
               <au>
                  <snm>Sawada</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Honda</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Biophys J</source>
            <pubdate>2006</pubdate>
            <volume>91</volume>
            <fpage>1213</fpage>
            <lpage>1223</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1518648</pubid>
                  <pubid idtype="pmpid" link="fulltext">16731566</pubid>
                  <pubid idtype="doi">10.1529/biophysj.105.076661</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Protein-segment universe exhibiting transitions at intermediate segment length in conformational subspaces</p>
            </title>
            <aug>
               <au>
                  <snm>Ikeda</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Hirokawa</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Higo</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Tomii</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>BMC Structural Biology</source>
            <pubdate>2008</pubdate>
            <volume>8</volume>
            <fpage>37</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2529298</pubid>
                  <pubid idtype="pmpid" link="fulltext">18700043</pubid>
                  <pubid idtype="doi">10.1186/1472-6807-8-37</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions</p>
            </title>
            <aug>
               <au>
                  <snm>Simons</snm>
                  <fnm>KT</fnm>
               </au>
               <au>
                  <snm>Kooperberg</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Baker</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1997</pubdate>
            <volume>268</volume>
            <fpage>209</fpage>
            <lpage>225</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1997.0959</pubid>
                  <pubid idtype="pmpid" link="fulltext">9149153</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>De novo prediction of three-dimensional structures for major protein families</p>
            </title>
            <aug>
               <au>
                  <snm>Bonneau</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Strauss</snm>
                  <fnm>CE</fnm>
               </au>
               <au>
                  <snm>Rohl</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Chivian</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Bradley</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Malmstr&#246;m</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Robertson</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Baker</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2002</pubdate>
            <volume>322</volume>
            <fpage>65</fpage>
            <lpage>78</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0022-2836(02)00698-8</pubid>
                  <pubid idtype="pmpid" link="fulltext">12215415</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>A reversible fragment assembly method for de novo protein structure prediction</p>
            </title>
            <aug>
               <au>
                  <snm>Chikenji</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Fujitsuka</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Takada</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>J Chem Phys</source>
            <pubdate>2003</pubdate>
            <volume>119</volume>
            <fpage>6895</fpage>
            <lpage>6903</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1063/1.1597474</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Lethality and centrality in protein networks</p>
            </title>
            <aug>
               <au>
                  <snm>Jeong</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Mason</snm>
                  <fnm>SP</fnm>
               </au>
               <au>
                  <snm>Barab&#225;si</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Oltvai</snm>
                  <fnm>ZN</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2001</pubdate>
            <volume>411</volume>
            <fpage>41</fpage>
            <lpage>42</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35075138</pubid>
                  <pubid idtype="pmpid" link="fulltext">11333967</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Subnetwork hierarchies of biochemical pathways</p>
            </title>
            <aug>
               <au>
                  <snm>Holme</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Huss</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Jeong</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>532</fpage>
            <lpage>538</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg033</pubid>
                  <pubid idtype="pmpid" link="fulltext">12611809</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Functional cartography of complex metabolic networks</p>
            </title>
            <aug>
               <au>
                  <snm>Guimer&#224;</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Amaral</snm>
                  <fnm>LAN</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2005</pubdate>
            <volume>433</volume>
            <fpage>895</fpage>
            <lpage>900</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2175124</pubid>
                  <pubid idtype="pmpid" link="fulltext">15729348</pubid>
                  <pubid idtype="doi">10.1038/nature03288</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Uncovering the overlapping community structure of complex networks in nature and society</p>
            </title>
            <aug>
               <au>
                  <snm>Palla</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Der&#233;nyi</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Farkas</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Vicsek</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2005</pubdate>
            <volume>435</volume>
            <fpage>814</fpage>
            <lpage>818</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature03607</pubid>
                  <pubid idtype="pmpid" link="fulltext">15944704</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Theoretical studies of protein folding</p>
            </title>
            <aug>
               <au>
                  <snm>Go</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>Annu Rev Biophys Bioeng</source>
            <pubdate>1983</pubdate>
            <volume>12</volume>
            <fpage>183</fpage>
            <lpage>210</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.bb.12.060183.001151</pubid>
                  <pubid idtype="pmpid" link="fulltext">6347038</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Randomness of the process of protein folding</p>
            </title>
            <aug>
               <au>
                  <snm>Go</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Abe</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Int J Pept Protein Res</source>
            <pubdate>1983</pubdate>
            <volume>22</volume>
            <fpage>622</fpage>
            <lpage>632</lpage>
            <xrefbib>
               <pubid idtype="pmpid">6654607</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Navigating the folding routes</p>
            </title>
            <aug>
               <au>
                  <snm>Wolynes</snm>
                  <fnm>PG</fnm>
               </au>
               <au>
                  <snm>Onuchic</snm>
                  <fnm>JN</fnm>
               </au>
               <au>
                  <snm>Thirumalai</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1995</pubdate>
            <volume>267</volume>
            <fpage>1619</fpage>
            <lpage>1620</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.7886447</pubid>
                  <pubid idtype="pmpid" link="fulltext">7886447</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>A theoretical search for folding/unfolding nuclei in three-dimensional protein structures</p>
            </title>
            <aug>
               <au>
                  <snm>Galzitskaya</snm>
                  <fnm>OV</fnm>
               </au>
               <au>
                  <snm>Finkelstein</snm>
                  <fnm>AV</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1999</pubdate>
            <volume>96</volume>
            <fpage>11299</fpage>
            <lpage>11304</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">18016</pubid>
                  <pubid idtype="pmpid" link="fulltext">10500159</pubid>
                  <pubid idtype="doi">10.1073/pnas.96.20.11299</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>A simple model for calculating the kinetics of protein folding from three-dimensional structures</p>
            </title>
            <aug>
               <au>
                  <snm>Mu&#241;oz</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Eaton</snm>
                  <fnm>WA</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1999</pubdate>
            <volume>96</volume>
            <fpage>11311</fpage>
            <lpage>11316</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">18030</pubid>
                  <pubid idtype="pmpid" link="fulltext">10500173</pubid>
                  <pubid idtype="doi">10.1073/pnas.96.20.11311</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>From folding theories to folding proteins: a review and assessment of simulation studies of protein folding and unfolding</p>
            </title>
            <aug>
               <au>
                  <snm>Shea</snm>
                  <fnm>J-E</fnm>
               </au>
               <au>
                  <snm>Brooks</snm>
                  <fnm>CL</fnm>
                  <suf>III</suf>
               </au>
            </aug>
            <source>Annu Rev Phys Chem</source>
            <pubdate>2001</pubdate>
            <volume>52</volume>
            <fpage>499</fpage>
            <lpage>535</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.physchem.52.1.499</pubid>
                  <pubid idtype="pmpid" link="fulltext">11326073</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Roles of native topology and chain-length scaling in protein folding: A simulation study with a Go-like model</p>
            </title>
            <aug>
               <au>
                  <snm>Koga</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Takada</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2001</pubdate>
            <volume>313</volume>
            <fpage>171</fpage>
            <lpage>180</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2001.5037</pubid>
                  <pubid idtype="pmpid" link="fulltext">11601854</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>How the folding rate constant of simple, single-domain proteins depends on the number of native contacts</p>
            </title>
            <aug>
               <au>
                  <snm>Makarov</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Keller</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Plaxco</snm>
                  <fnm>KW</fnm>
               </au>
               <au>
                  <snm>Metiu</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2002</pubdate>
            <volume>99</volume>
            <fpage>3535</fpage>
            <lpage>3539</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1073/pnas.052713599</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Theory for the rate of contact formation in a polymer chain with local conformational transitions</p>
            </title>
            <aug>
               <au>
                  <snm>Zhou</snm>
                  <fnm>HX</fnm>
               </au>
            </aug>
            <source>J Chem Phys</source>
            <pubdate>2003</pubdate>
            <volume>118</volume>
            <fpage>2010</fpage>
            <lpage>2015</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1063/1.1531588</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Scrutinizing the squeezed exponential kinetics observed in the folding simulation of an off-lattice Go-like protein model</p>
            </title>
            <aug>
               <au>
                  <snm>Nakamura</snm>
                  <fnm>HK</fnm>
               </au>
               <au>
                  <snm>Sasai</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Takano</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Chem Phys</source>
            <pubdate>2004</pubdate>
            <volume>307</volume>
            <fpage>259</fpage>
            <lpage>267</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/j.chemphys.2004.07.011</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Transition state of a SH3 domain detected with principle component analysis and a charge-neutralized all-atom protein model</p>
            </title>
            <aug>
               <au>
                  <snm>Mitomo</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Nakamura</snm>
                  <fnm>HK</fnm>
               </au>
               <au>
                  <snm>Ikeda</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Yamagishi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Higo</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>2006</pubdate>
            <volume>64</volume>
            <fpage>883</fpage>
            <lpage>894</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/prot.21069</pubid>
                  <pubid idtype="pmpid" link="fulltext">16807919</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Conformational sampling of a 40-residue protein consisting of &#945; and &#946; secondary-structure elements in explicit solvent</p>
            </title>
            <aug>
               <au>
                  <snm>Ikebe</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kamiya</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Shindo</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Nakamura</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Higo</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Chem Phys Lett</source>
            <pubdate>2007</pubdate>
            <volume>443</volume>
            <fpage>364</fpage>
            <lpage>368</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/j.cplett.2007.06.102</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Folding of the 25 residue Abeta(12&#8211;36) peptide in TFE/water: temperature-dependent transition from a funneled free-energy landscape to a rugged one</p>
            </title>
            <aug>
               <au>
                  <snm>Kamiya</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Mitomo</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Shea</snm>
                  <fnm>J-E</fnm>
               </au>
               <au>
                  <snm>Higo</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>J Phys Chem B</source>
            <pubdate>2007</pubdate>
            <volume>111</volume>
            <fpage>5351</fpage>
            <lpage>5356</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/jp067075v</pubid>
                  <pubid idtype="pmpid" link="fulltext">17439167</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>A surprising simplicity to protein folding</p>
            </title>
            <aug>
               <au>
                  <snm>Baker</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2000</pubdate>
            <volume>405</volume>
            <fpage>39</fpage>
            <lpage>42</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35011000</pubid>
                  <pubid idtype="pmpid" link="fulltext">10811210</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Unification of the folding mechanisms of non-two-state and two-state proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Kamagata</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Arai</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kuwajima</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2004</pubdate>
            <volume>339</volume>
            <fpage>951</fpage>
            <lpage>965</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.jmb.2004.04.015</pubid>
                  <pubid idtype="pmpid" link="fulltext">15165862</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Surprisingly high correlation between early and late stages in non-two-state protein folding</p>
            </title>
            <aug>
               <au>
                  <snm>Kamagata</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Kuwajima</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2006</pubdate>
            <volume>357</volume>
            <fpage>1647</fpage>
            <lpage>1654</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.jmb.2006.01.072</pubid>
                  <pubid idtype="pmpid" link="fulltext">16490205</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Finding community structure in networks using the eigenvectors of matrices</p>
            </title>
            <aug>
               <au>
                  <snm>Newman</snm>
                  <fnm>MEJ</fnm>
               </au>
            </aug>
            <source>Phys Rev E</source>
            <pubdate>2006</pubdate>
            <volume>74</volume>
            <fpage>036104</fpage>
            <xrefbib>
               <pubid idtype="doi">10.1103/PhysRevE.74.036104</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Progress towards mapping the universe of protein folds</p>
            </title>
            <aug>
               <au>
                  <snm>Grant</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Orengo</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>107</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">416458</pubid>
                  <pubid idtype="pmpid" link="fulltext">15128436</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>The structure of the protein universe and genome evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
               <au>
                  <snm>Wolf</snm>
                  <fnm>YI</fnm>
               </au>
               <au>
                  <snm>Karev</snm>
                  <fnm>GP</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>420</volume>
            <fpage>218</fpage>
            <lpage>223</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature01256</pubid>
                  <pubid idtype="pmpid" link="fulltext">12432406</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Protein family and fold occurrence in genomes: power-law behaviour and evolutionary model</p>
            </title>
            <aug>
               <au>
                  <snm>Qian</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Luscombe</snm>
                  <fnm>NM</fnm>
               </au>
               <au>
                  <snm>Gerstein</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2001</pubdate>
            <volume>313</volume>
            <fpage>673</fpage>
            <lpage>681</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2001.5079</pubid>
                  <pubid idtype="pmpid" link="fulltext">11697896</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Emergence of scaling in random networks</p>
            </title>
            <aug>
               <au>
                  <snm>Barab&#225;si</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Albert</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1999</pubdate>
            <volume>286</volume>
            <fpage>509</fpage>
            <lpage>512</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.286.5439.509</pubid>
                  <pubid idtype="pmpid" link="fulltext">10521342</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Fast algorithm for detecting community structure in networks</p>
            </title>
            <aug>
               <au>
                  <snm>Newman</snm>
                  <fnm>MEJ</fnm>
               </au>
               <au>
                  <snm>Girvan</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Phys Rev E</source>
            <pubdate>2004</pubdate>
            <volume>69</volume>
            <fpage>026113</fpage>
            <xrefbib>
               <pubid idtype="doi">10.1103/PhysRevE.69.026113</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>The PDB is a covering set of small protein structures</p>
            </title>
            <aug>
               <au>
                  <snm>Kihara</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Skolnick</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2003</pubdate>
            <volume>334</volume>
            <fpage>793</fpage>
            <lpage>802</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.jmb.2003.10.027</pubid>
                  <pubid idtype="pmpid" link="fulltext">14636603</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>How many protein folding motifs are there?</p>
            </title>
            <aug>
               <au>
                  <snm>Crippen</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Maiorov</snm>
                  <fnm>VN</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1995</pubdate>
            <volume>252</volume>
            <fpage>144</fpage>
            <lpage>151</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1995.0481</pubid>
                  <pubid idtype="pmpid" link="fulltext">7666426</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>More than the sum of their parts: on the evolution of proteins from peptides</p>
            </title>
            <aug>
               <au>
                  <snm>S&#246;ding</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lupas</snm>
                  <fnm>AN</fnm>
               </au>
            </aug>
            <source>BioEssays</source>
            <pubdate>2003</pubdate>
            <volume>25</volume>
            <fpage>837</fpage>
            <lpage>846</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1002/bies.10321</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p>A graph spectral analysis of the structural similarity of protein chains</p>
            </title>
            <aug>
               <au>
                  <snm>Krishnadev</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Brinda</snm>
                  <fnm>KV</fnm>
               </au>
               <au>
                  <snm>Vishveshwara</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>2005</pubdate>
            <volume>61</volume>
            <fpage>152</fpage>
            <lpage>163</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/prot.20532</pubid>
                  <pubid idtype="pmpid" link="fulltext">16080147</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B54">
            <title>
               <p>The Protein Data Bank</p>
            </title>
            <aug>
               <au>
                  <snm>Berman</snm>
                  <fnm>HM</fnm>
               </au>
               <au>
                  <snm>Westbrook</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Feng</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Gilliland</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Bhat</snm>
                  <fnm>TN</fnm>
               </au>
               <au>
                  <snm>Weissig</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Shindyalov</snm>
                  <fnm>IN</fnm>
               </au>
               <au>
                  <snm>Bourne</snm>
                  <fnm>PE</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2000</pubdate>
            <volume>28</volume>
            <fpage>235</fpage>
            <lpage>242</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">102472</pubid>
                  <pubid idtype="pmpid" link="fulltext">10592235</pubid>
                  <pubid idtype="doi">10.1093/nar/28.1.235</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B55">
            <title>
               <p>Least squares quantization in PCM</p>
            </title>
            <aug>
               <au>
                  <snm>Lloyd</snm>
                  <fnm>SP</fnm>
               </au>
            </aug>
            <source>IEEE Trans Inf Theory</source>
            <pubdate>1982</pubdate>
            <volume>28</volume>
            <fpage>129</fpage>
            <lpage>137</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1109/TIT.1982.1056489</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B56">
            <title>
               <p>Knowledge-based protein secondary structure assignment</p>
            </title>
            <aug>
               <au>
                  <snm>Frishman</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Argos</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>1995</pubdate>
            <volume>23</volume>
            <fpage>566</fpage>
            <lpage>579</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/prot.340230412</pubid>
                  <pubid idtype="pmpid">8749853</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
