<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-8-106</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Discovering structural motifs using a structural alphabet: Application to magnesium-binding sites</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Dudev</snm>
               <fnm>Minko</fnm>
               <insr iid="I1"/>
               <email>frater_ia@yahoo.com</email>
            </au>
            <au id="A2" ca="yes">
               <snm>Lim</snm>
               <fnm>Carmay</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>carmay@gate.sinica.edu.tw</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan</p>
            </ins>
            <ins id="I2">
               <p>Department of Chemistry, National Tsing-Hua University, Hsinchu 300, Taiwan</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>1</issue>
         <fpage>106</fpage>
         <url>http://www.biomedcentral.com/1471-2105/8/106</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17389049</pubid>
               <pubid idtype="doi">10.1186/1471-2105-8-106</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>27</day>
               <month>10</month>
               <year>2006</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>28</day>
               <month>3</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>28</day>
               <month>3</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Dudev and Lim; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>For many metalloproteins, sequence motifs characteristic of metal-binding sites have not been found or are so short that they would not be expected to be metal-specific. Striking examples of such metalloproteins are those containing Mg<sup>2+</sup>, one of the most versatile metal cofactors in cellular biochemistry. Even when Mg<sup>2+</sup>-proteins share insufficient sequence homology to identify Mg<sup>2+</sup>-specific sequence motifs, they may still share similarity in the Mg<sup>2+</sup>-binding site structure. However, no structural motifs characteristic of Mg<sup>2+</sup>-binding sites have been reported. Thus, our aims are (i) to develop a general method for discovering structural patterns/motifs characteristic of ligand-binding sites, given the 3D protein structures, and (ii) to apply it to Mg<sup>2+</sup>-proteins sharing &lt;30% sequence identity. Our motif discovery method employs structural alphabet encoding to convert 3D structures to the corresponding 1D structural letter sequences, where the Mg<sup>2+</sup>-structural motifs are identified as recurring structural patterns.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>The structural alphabet-based motif discovery method has revealed the structural preference of Mg<sup>2+</sup>-binding sites for certain local/secondary structures: compared to all residues in the Mg<sup>2+</sup>-proteins, both first- and second-shell Mg<sup>2+</sup>-ligands prefer loops to helices. Even when the Mg<sup>2+</sup>-proteins share no significant sequence homology, some of them share a similar Mg<sup>2+</sup>-binding site structure: 4 Mg<sup>2+</sup>-structural motifs, comprising 21% of the binding sites, were found. In particular, one of the Mg<sup>2+</sup>-structural motifs found maps to a specific functional group, namely, hydrolases. Furthermore, 2 of the motifs were not found in non-metalloproteins or in Ca<sup>2+</sup>-binding proteins. The structural motifs discovered thus capture some essential biochemical and/or evolutionary properties, and hence may be useful for discovering proteins where Mg<sup>2+</sup> plays an important biological role.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>The structural motif discovery method presented herein is general and can be applied to any set of proteins with known 3D structures. This new method is timely considering the increasing number of structures for proteins with unknown function that are being solved from structural genomics initiatives. For such proteins, which share no significant sequence homology with proteins of known function, the presence of a structural motif that maps to a specific protein function in the structure would suggest likely active/binding sites and a particular biological function.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Magnesium is one of the most versatile metal cofactors in cellular biochemistry, serving both intra- and extracellular, catalytic and/or structural roles <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. It is used to stabilize a variety of protein structures; e.g., the interface of the ribonucleotide reductase subunits <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. It is also used to stabilize nucleic acids by alleviating electrostatic repulsion between negatively charged phosphates. Furthermore, Mg<sup>2+</sup>, together with Ca<sup>2+</sup>, stabilizes biological membranes by charge neutralization after binding to the carboxylated and phosphorylated headgroups of lipids. It also activates enzymes that regulate the biochemistry of nucleic acids such as restriction nucleases, ligases, and topoisomerases, and is essential for the fidelity of DNA replication <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Divalent Mg<sup>2+</sup> is a "hard" ion and prefers "hard" ligands of low polarizability like oxygen. It tends to bind directly to the amino acid residues, primarily to the Asp/Glu carboxylic side chains, followed by the Asn/Gln side chains or peptide backbone carbonyl groups <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. The rest of the metal coordination sphere, which is usually octahedral, is completed by water ligand(s).</p>
         <p>Unlike Zn<sup>2+</sup>- and Ca<sup>2+</sup>-binding sites, only a few, relatively short, sequence motifs have been discovered for Mg<sup>2+</sup> proteins with close sequence homology. These include the -NA<b>D</b>F<b>D</b>G<b>D</b>- motif, found in different RNA polymerases, DNA Pol I and HIV reverse transcriptase, and the -YX<b>DD</b>- or -LX<b>DD</b>- motifs in reverse transcriptase and telomerase, where residues in bold are the Mg<sup>2+</sup> ligands <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. As the known Mg<sup>2+</sup> sequence motifs are short, they could easily be found in other non-Mg<sup>2+</sup>-proteins and would <it>not</it> be expected to be Mg<sup>2+</sup>-specific. Interestingly, some homology in the 3D structure of the Mg<sup>2+</sup>-binding sites has been observed for certain classes of enzymes such as restriction enzymes, bacterial and viral RNase H domains, and viral integrases <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. However, systematic studies of the structural preference/conservation of Mg<sup>2+</sup>-binding sites in nonhomologous proteins have not been reported; hence, no structural motifs of the Mg<sup>2+</sup>-binding sites have been extracted.</p>
         <p>The aims of this work are to address the following intriguing questions: (1) Do Mg<sup>2+</sup>-binding sites exhibit any preference for certain local/secondary structures? If so, which types of local/secondary structures are favored and which are disfavored? (2) Even when the Mg<sup>2+</sup>-proteins share no significant sequence homology, do they share a similar Mg<sup>2+</sup>-binding site structure? (3) If structural motifs of the Mg<sup>2+</sup>-binding sites exist, do they map to specific protein functions? (4) Are the structural motifs Mg<sup>2+</sup>-specific? In particular, are they found in proteins that do not bind metal ions or in proteins that bind Ca<sup>2+</sup>, which, like Mg<sup>2+</sup>, is also a divalent "hard" ion, binding preferentially to "hard" oxygen-containing ligands?</p>
         <p>To address the aforementioned questions, we have developed a general strategy for discovering 3D motifs that are hidden in the local structure of the active/binding site, based on the fact that the local structure is generally more evolutionarily conserved than the amino acid sequence <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. The 3D motifs of the metal-binding sites were obtained by encoding the 3D representation based on Cartesian coordinates into a 1D representation based on a 16-letter structural alphabet <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. The structural alphabet represents recurring short structural prototypes and has been used to (i) compare/analyze 3D structures <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>, (ii) predict protein 3D structures from amino acid sequences <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B11">11</abbr></abbrgrp>, (iii) reconstruct the protein backbone <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, and (iv) model loops <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. However, it has not been used to discover structural motifs of metal/ligand-binding sites in proteins. First, the structural alphabet-based motif discovery approach was validated by using it to "rediscover" the structural motif of Cys<sub>4</sub> Zn-finger domains, which are known to adopt a specific structure. Next, it was used to discover structural motifs of Mg<sup>2+</sup>-binding sites in a set of nonredundant Mg<sup>2+</sup>-proteins sharing &lt;30% sequence identity. The results reveal clear trends in the structural composition of Mg<sup>2+</sup>-binding sites, 4 Mg<sup>2+</sup>-structural motifs, and important relationships between these motifs and other features of the proteins. The specificity of the structural motifs discovered for certain Mg<sup>2+</sup>-binding sites was assessed by determining their occurrence in a set of nonredundant non-metal-containing protein structures and in a set of nonredundant Ca<sup>2+</sup>-bound protein structures.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Validation against Proteins with known Structural Motifs</p>
            </st>
            <p>To test the structural alphabet-based strategy for discovering metal-binding site structural motifs, a database of 42 structural zinc sites from 29 proteins from previous work <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> was searched for proteins containing the <b>C</b>(2)<b>C</b>(13&#8211;15)<b>C</b>(2)<b>C</b> sequence motif, where the number in parentheses indicates the number of amino acid residues separating the conserved Zn-binding cysteines. Proteins with such a sequence motif belong to the Zn-finger family of the nuclear receptor type, having a Cys<sub>4</sub> Zn-binding site <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, which is known to adopt a specific structure. Each of the Zn-proteins containing the <b>C</b>(2)<b>C</b>(13&#8211;15)<b>C</b>(2)<b>C</b> sequence motif was represented by a 1D structural alphabet, as described in Methods and illustrated in Figure <figr fid="F1">1</figr>. All of these proteins were found to possess an <it>f(2)o(13&#8211;15)f(2)m</it> structural motif of the Zn-binding site (see Figure <figr fid="F1">1</figr>). This result suggests that the structural alphabet-based approach is a promising way to discover new structural motifs.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Zn-binding site structural motifs derived from the structural alphabet representation of 3 Zn-finger proteins</p>
               </caption>
               <text>
                  <p><b>Zn-binding site structural motifs derived from the structural alphabet representation of 3 Zn-finger proteins</b>. For each protein, the PDB entry and chain are given, followed below by its amino acid sequence (in capital letters), aligned with the corresponding structural alphabet representation (lower-case letters); '<it>Z</it>' means that a letter cannot be assigned to this residue (see Methods). Zn<sup>2+</sup>-binding residues are underlined and in bold. Only the first 30 amino acid residues are shown. The C<sub>&#945;</sub> root-mean-square deviations (RMSD) of 1LAT and 2NLL from 1HCQ are 1.73 and 1.33 &#197;, respectively, whereas that of 1LAT from 2NLL is 1.25 &#197;.</p>
               </text>
               <graphic file="1471-2105-8-106-1"/>
            </fig>
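            <p>Once a protein has been encoded as a 1D structural letter string, a motif written in the gap notation used above can be searched for with an ordinary regular expression. The following is a minimal sketch under stated assumptions, not the search procedure used in this work: the regular expression, the helper name find_motif, and the toy letter string are all hypothetical and were made up for illustration.</p>

```python
import re

# Assumed translation of the f(2)o(13-15)f(2)m notation: the four ligand
# positions carry the letters f, o, f, m and are separated by gaps of
# 2, 13 to 15, and 2 arbitrary structural letters.
MOTIF = re.compile(r"f.{2}o.{13,15}f.{2}m")


def find_motif(structural_seq):
    """Return (start, matched_text) for each non-overlapping motif match."""
    return [(m.start(), m.group()) for m in MOTIF.finditer(structural_seq)]


# Made-up structural letter string, not taken from any PDB entry.
toy = "dd" + "f" + "kl" + "o" + "m" * 13 + "f" + "kl" + "m" + "nop"
hits = find_motif(toy)
print(hits)
```

            <p>In this toy string the single match starts at index 2 and spans 21 letters. A string that lacks the pattern yields an empty list.</p>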
         </sec>
         <sec>
            <st>
               <p>Structural Preference of Mg<sup>2+</sup>-Binding Sites</p>
            </st>
            <p>Although the 70 Mg<sup>2+</sup>-proteins used herein share &lt;30% sequence identity, do their Mg<sup>2+</sup>-binding sites prefer certain local structures? To answer this question, the 3D structure of each of the 70 nonredundant Mg<sup>2+</sup> proteins was represented by a 16-letter structural alphabet (see Methods and Additional file <supplr sid="S1">1</supplr>), and the frequencies of the letters in all the first and second shells as well as in the entire Mg<sup>2+</sup> dataset were compared (see Figure <figr fid="F2">2</figr>). The results reveal a clear preference towards certain types of local structures in the Mg<sup>2+</sup>-binding sites. The '<it>b</it>', '<it>d</it>', '<it>f</it>', and '<it>h</it>' frequencies of first-shell Mg<sup>2+</sup>-ligands and the '<it>d</it>', '<it>e</it>', '<it>f</it>', and '<it>k</it>' frequencies of second-shell Mg<sup>2+</sup>-ligands are statistically significantly higher than the respective frequencies of all the amino acid residues in the dataset (see Table <tblr tid="T1">1</tblr>). Both first- and second-shell Mg<sup>2+</sup>-ligands favor the '<it>d</it>' and '<it>f</it>' structures. Furthermore, the first-shell (but not the second-shell) Mg<sup>2+</sup>-ligands <it>strongly</it> prefer the local structure '<it>h</it>', whose frequency among first-shell ligands is 5.3-fold greater than that among all residues in Mg<sup>2+</sup> proteins. However, compared to all amino acid residues in the Mg<sup>2+</sup> proteins, both first- and second-shell Mg<sup>2+</sup>-ligands disfavor certain local protein structures such as the '<it>c</it>' and '<it>m</it>' structures: the '<it>c</it>', '<it>i</it>', '<it>m</it>', and '<it>p</it>' frequencies of first-shell Mg<sup>2+</sup>-ligands and the '<it>a</it>', '<it>c</it>', '<it>m</it>', and '<it>o</it>' frequencies of second-shell Mg<sup>2+</sup>-ligands are statistically significantly lower than the respective frequencies of all the amino acid residues in the dataset (see Table <tblr tid="T1">1</tblr>).</p>
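            <p>The fold-preference values discussed above are ratios of a letter's frequency among ligand residues to its frequency among all residues. The sketch below illustrates that ratio on made-up data. It assumes that a frequency is simply a letter count divided by the total number of letters; the function names and the toy strings are hypothetical and are not part of the dataset or the procedure used in this work.</p>

```python
from collections import Counter


def letter_frequencies(letters):
    """Map each structural letter to its fraction of the given sequence."""
    counts = Counter(letters)
    total = sum(counts.values())
    return {letter: n / total for letter, n in counts.items()}


def enrichment(shell_letters, all_letters):
    """Shell frequency divided by overall frequency, per letter seen overall."""
    shell = letter_frequencies(shell_letters)
    overall = letter_frequencies(all_letters)
    return {x: shell.get(x, 0.0) / overall[x] for x in overall}


# Made-up toy data, not the dataset analysed in this work.
all_letters = "mmmmmmmmddddffhhcccc"  # 20 residues in total
shell_letters = "hhdf"                # 4 hypothetical ligand residues
ratios = enrichment(shell_letters, all_letters)
print(ratios)
```

            <p>A ratio above 1 marks a letter that is over-represented among the ligand residues, a ratio below 1 marks one that is under-represented, and a ratio of 0 means that the letter never occurs among them. The ratio alone says nothing about statistical significance, which requires a separate test.</p>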
            <suppl id="S1">
               <title>
                  <p>Additional file 1</p>
               </title>
               <text>
                  <p>The Mg<sup>2+</sup>-dataset containing 77 metal-binding sites in 70 nonredundant Mg<sup>2+</sup>-proteins. A table listing the PDB entries, protein description, native metal-cofactors (if known), EC code, metal-bound amino acid residues, and first-shell structural representation of the 70 nonredundant Mg<sup>2+</sup>-proteins.</p>
               </text>
               <file name="1471-2105-8-106-S1.doc">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>The letter and secondary structural element (SSE) frequency distributions and 2-sample T-tests of first- and second-shell amino acid residues vs. all amino acid residues in the Mg<sup>2+</sup>-proteins</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3" ca="center">
                        <p>1<sup>st</sup>-shell vs. all residues</p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>2<sup>nd</sup>-shell vs. all residues</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Letter, <it>x</it><sup>a</sup></p>
                     </c>
                     <c ca="center">
                        <p>&#957;<sub><it>x</it>,1</sub>/&#957;<sub><it>x</it>, all</sub><sup>b</sup></p>
                     </c>
                     <c ca="center">
                        <p>T-test<sup>c</sup></p>
                     </c>
                     <c ca="center">
                        <p>p-value<sup>c,d</sup></p>
                     </c>
                     <c ca="center">
                        <p>&#957;<sub><it>x</it>,2</sub>/&#957;<sub><it>x</it>, all</sub><sup>e</sup></p>
                     </c>
                     <c ca="center">
                        <p>T-test<sup>c</sup></p>
                     </c>
                     <c ca="center">
                        <p>p-value<sup>c,d</sup></p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>a</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1.47</p>
                     </c>
                     <c ca="center">
                        <p>1.4037</p>
                     </c>
                     <c ca="center">
                        <p>0.0802</p>
                     </c>
                     <c ca="center">
                        <p>0.57</p>
                     </c>
                     <c ca="center">
                        <p>2.4731</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.0067</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>b</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1.86</p>
                     </c>
                     <c ca="center">
                        <p>2.7909</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.0027</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1.20</p>
                     </c>
                     <c ca="center">
                        <p>1.2200</p>
                     </c>
                     <c ca="center">
                        <p>0.1113</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>c</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.56</p>
                     </c>
                     <c ca="center">
                        <p>2.0160</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.0219</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.50</p>
                     </c>
                     <c ca="center">
                        <p>4.3510</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>&lt;0.0001</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>d</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1.23</p>
                     </c>
                     <c ca="center">
                        <p>1.7376</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.0412</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1.23</p>
                     </c>
                     <c ca="center">
                        <p>3.1829</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.0008</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>e</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1.46</p>
                     </c>
                     <c ca="center">
                        <p>1.0111</p>
                     </c>
                     <c ca="center">
                        <p>0.1560</p>
                     </c>
                     <c ca="center">
                        <p>2.03</p>
                     </c>
                     <c ca="center">
                        <p>4.1825</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>&lt;0.0001</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>f</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1.47</p>
                     </c>
                     <c ca="center">
                        <p>1.9389</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.0263</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1.70</p>
                     </c>
                     <c ca="center">
                        <p>5.4060</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>&lt;0.0001</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>g</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1.15</p>
                     </c>
                     <c ca="center">
                        <p>0.2494</p>
                     </c>
                     <c ca="center">
                        <p>0.4015</p>
                     </c>
                     <c ca="center">
                        <p>1.18</p>
                     </c>
                     <c ca="center">
                        <p>0.5381</p>
                     </c>
                     <c ca="center">
                        <p>0.2953</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>h</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>5.29</p>
                     </c>
                     <c ca="center">
                        <p>9.3752</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>&lt;0.0001</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1.19</p>
                     </c>
                     <c ca="center">
                        <p>0.7921</p>
                     </c>
                     <c ca="center">
                        <p>0.2142</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>i</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>1.8928</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.0292</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1.34</p>
                     </c>
                     <c ca="center">
                        <p>1.1910</p>
                     </c>
                     <c ca="center">
                        <p>0.1168</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>j</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>2.21</p>
                     </c>
                     <c ca="center">
                        <p>1.6156</p>
                     </c>
                     <c ca="center">
                        <p>0.0531</p>
                     </c>
                     <c ca="center">
                        <p>1.54</p>
                     </c>
                     <c ca="center">
                        <p>1.3401</p>
                     </c>
                     <c ca="center">
                        <p>0.0901</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>k</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1.40</p>
                     </c>
                     <c ca="center">
                        <p>1.4992</p>
                     </c>
                     <c ca="center">
                        <p>0.0669</p>
                     </c>
                     <c ca="center">
                        <p>1.60</p>
                     </c>
                     <c ca="center">
                        <p>4.1820</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>&lt;0.0001</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>l</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.76</p>
                     </c>
                     <c ca="center">
                        <p>0.9209</p>
                     </c>
                     <c ca="center">
                        <p>0.1786</p>
                     </c>
                     <c ca="center">
                        <p>1.08</p>
                     </c>
                     <c ca="center">
                        <p>0.5978</p>
                     </c>
                     <c ca="center">
                        <p>0.275</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>m</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.52</p>
                     </c>
                     <c ca="center">
                        <p>2.9377</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.0017</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.74</p>
                     </c>
                     <c ca="center">
                        <p>5.2192</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>&lt;0.0001</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>n</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.53</p>
                     </c>
                     <c ca="center">
                        <p>1.1306</p>
                     </c>
                     <c ca="center">
                        <p>0.1291</p>
                     </c>
                     <c ca="center">
                        <p>0.88</p>
                     </c>
                     <c ca="center">
                        <p>0.5208</p>
                     </c>
                     <c ca="center">
                        <p>0.3013</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>o</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1.52</p>
                     </c>
                     <c ca="center">
                        <p>1.4066</p>
                     </c>
                     <c ca="center">
                        <p>0.0798</p>
                     </c>
                     <c ca="center">
                        <p>0.35</p>
                     </c>
                     <c ca="center">
                        <p>3.3637</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.0004</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>p</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>3.1174</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.0009</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.77</p>
                     </c>
                     <c ca="center">
                        <p>1.3204</p>
                     </c>
                     <c ca="center">
                        <p>0.0934</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>SSE, <it>x</it></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Loop</p>
                     </c>
                     <c ca="center">
                        <p>1.56</p>
                     </c>
                     <c ca="center">
                        <p>2.5575</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.0053</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1.47</p>
                     </c>
                     <c ca="center">
                        <p>2.1874</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.0144</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>&#946;-strands</p>
                     </c>
                     <c ca="center">
                        <p>1.30</p>
                     </c>
                     <c ca="center">
                        <p>1.0780</p>
                     </c>
                     <c ca="center">
                        <p>0.1405</p>
                     </c>
                     <c ca="center">
                        <p>1.34</p>
                     </c>
                     <c ca="center">
                        <p>1.2170</p>
                     </c>
                     <c ca="center">
                        <p>0.1118</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>&#945;-helices</p>
                     </c>
                     <c ca="center">
                        <p>0.47</p>
                     </c>
                     <c ca="center">
                        <p>3.6454</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.0002</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.51</p>
                     </c>
                     <c ca="center">
                        <p>3.3621</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.0004</b>
                        </p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p><sup>a</sup>16-letter structural alphabet defined by de Brevern and co-workers (see Methods and original reference) [6]. <sup>b</sup>The ratio of the letter/SSE '<it>x</it>' frequency of first-shell amino acid residues to that of all amino acid residues in the 70 Mg<sup>2+</sup>-proteins. <sup>c</sup>The statistical analyses were carried out using the SAS/STAT package, version 8 (SAS Institute, NC). <sup>d</sup>P-values &lt;0.05 are highlighted in bold. <sup>e</sup>The ratio of the letter/SSE '<it>x</it>' frequency of second-shell amino acid residues to that of all amino acid residues in the 70 Mg<sup>2+</sup>-proteins.</p>
               </tblfn>
            </tbl>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>The percentage letter frequency distributions of first-shell amino acid residues (gray), second-shell amino acid residues (white), and all amino acid residues (black) in the Mg<sup>2+</sup>-proteins</p>
               </caption>
               <text>
                  <p><b>The percentage letter frequency distributions of first-shell amino acid residues (gray), second-shell amino acid residues (white), and all amino acid residues (black) in the Mg<sup>2+</sup>-proteins</b>. There are 25,406 amino acid residues in total in the Mg<sup>2+</sup>-proteins, of which 250 are in the first shell and 898 are in the second shell.</p>
               </text>
               <graphic file="1471-2105-8-106-2"/>
            </fig>
            <p>To relate the observed bias of the first-shell Mg<sup>2+</sup>-ligands for <it>certain </it>structures to standard regular and irregular secondary structures, the percentage frequency distributions of first-shell, second-shell, and all amino acid residues found in &#945;-helices, &#946;-strands, or loops in the Mg<sup>2+</sup>-proteins were computed from the secondary structure information in the Protein Data Bank <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> (PDB) files (see Figure <figr fid="F3">3</figr>). The <it>loop </it>occurrence frequency of the first- or second-shell Mg<sup>2+</sup>-residues (47&#8211;50%) is significantly higher than that of all residues (~32%), with p-values &#8804; 0.014 (see Table <tblr tid="T1">1</tblr>). However, the <it>&#946;-strand </it>occurrence frequency of the first- or second-shell Mg<sup>2+</sup>-residues (~29%) is <it>not </it>significantly higher than that of all residues (~22%). In contrast, the &#945;-<it>helix </it>occurrence frequency of the first- or second-shell Mg<sup>2+</sup>-residues (22&#8211;23%) is roughly half that of all residues (~46%), with p-values &#8804; 0.0004.</p>
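            <p>As a rough check on how the percentages quoted above relate to the ratios in Table <tblr tid="T1">1</tblr>, the following Python sketch divides a shell's loop occurrence frequency by that of all residues. The function name frequency_ratio is hypothetical, and the inputs are the rounded percentages from the text rather than the underlying residue counts, which are not reproduced here; the significance tests are not part of the sketch.</p>

```python
def frequency_ratio(shell_percent, all_percent):
    """Ratio of a shell's occurrence frequency to that of all residues.

    Both arguments are percentages; the result is dimensionless.
    """
    if all_percent == 0:
        raise ValueError("background frequency must be non-zero")
    return shell_percent / all_percent


# Rounded loop frequencies quoted in the text (assumed inputs, not raw counts).
LOOP_ALL_RESIDUES = 32.0
for shell_percent in (50.0, 47.0):
    ratio = frequency_ratio(shell_percent, LOOP_ALL_RESIDUES)
    print(shell_percent, round(ratio, 2))
```

            <p>With the two ends of the quoted 47&#8211;50% range set against ~32%, the sketch yields ratios of 1.56 and 1.47, in line with the loop entries in Table <tblr tid="T1">1</tblr>. Because the inputs are rounded, ratios recomputed this way for other structures may differ from the tabulated values in the second decimal place.</p>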
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>The percentage secondary structure frequency distributions of first-shell amino acid residues (gray), second-shell amino acid residues (white), and all amino acid residues (black) in the Mg<sup>2+</sup>-proteins</p>
               </caption>
               <text>
                  <p><b>The percentage secondary structure frequency distributions of first-shell amino acid residues (gray), second-shell amino acid residues (white), and all amino acid residues (black) in the Mg<sup>2+</sup>-proteins</b>. The amino acid residues found in &#945;-helices, &#946;-strands, or loops are assigned according to the secondary structure information in the PDB files.</p>
               </text>
               <graphic file="1471-2105-8-106-3"/>
            </fig>
            <p>In summary, the Mg<sup>2+</sup>-binding sites generally prefer certain local structures: compared to all amino acid residues in the Mg<sup>2+</sup>-proteins, both first- and second-shell ligands tend to prefer loops to helices. This may be due to the need to position not only the first- and second-shell ligands, but also the helix dipole, in a proper orientation for metal binding.</p>
         </sec>
         <sec>
            <st>
               <p>Structural Motifs of Mg<sup>2+</sup>-Binding Sites</p>
            </st>
            <p>Even when the Mg<sup>2+</sup>-proteins share no significant sequence homology (&lt;30% sequence identity), do any of them share a common structure of the metal-binding site? In this work, a structural motif is defined to exist if 3 or more Mg<sup>2+</sup>-binding sites have the same first-shell letters and similar interletter spacing (see Methods and Additional file <supplr sid="S1">1</supplr>). These structural motifs are listed in Table <tblr tid="T2">2</tblr> and illustrated in Figure <figr fid="F4">4</figr>, while first-shell structural patterns that are common to only 2 Mg<sup>2+</sup>-binding sites are listed in Additional file <supplr sid="S2">2</supplr>. For the first shell, 4 structural motifs, representing about a fifth (16/77 or 21%) of all Mg<sup>2+</sup>-binding sites, were discovered. All 4 motifs occur in proteins whose functions are Mg<sup>2+</sup>-dependent or whose native co-factors are Mg<sup>2+</sup>, according to UniProt and/or the literature. Consistent with the above finding that the <it>'h' </it>structure is preferred by the first-shell Mg<sup>2+</sup>-ligands, it is in the middle of all 4 motifs, and the partial motif '<it>f(1&#8211;2)h</it>' accounts for half of the Mg<sup>2+</sup>-proteins with structural motifs. For the second shell, too many residues define the Mg<sup>2+</sup>-binding site; hence, not enough Mg<sup>2+</sup>-binding sites possess the same second-shell letters and similar interletter spacing. However, 5 partial motifs for the second shell were found: <it>f(1)lm, kl(0&#8211;1)m, d(1&#8211;2)ff, d(1)e(1)i(0&#8211;5)l</it>, and <it>f(1)l(18&#8211;25)d</it>, with occurrence frequencies of 21, 12, 11, 8, and 6%, respectively.</p>
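            <p>The motif notation used here, e.g. <it>f(1)h(109&#8211;349)b</it>, can be read as a pattern over a 1D structural-letter sequence: letters a to p separated by a number (or range) of intervening residues. The Python sketch below illustrates one way to scan a sequence for such a pattern. It assumes that the spacing ranges act as hard bounds, which may differ from the similarity criterion described in Methods and Additional file <supplr sid="S1">1</supplr>, and it matches letters at any sequence position rather than only those of Mg<sup>2+</sup>-bound residues. The helper names motif_to_regex and find_motif and the toy sequence are hypothetical and are not taken from this work.</p>

```python
import re

# One motif element: a structural letter (a-p), optionally followed by a
# spacing "(n)" or "(n-m)". A valid motif must end with a bare letter.
_ELEMENT = r"[a-p](?:\(\d+(?:-\d+)?\))?"
_MOTIF = re.compile(r"(?:%s)*[a-p]" % _ELEMENT)
_TOKEN = re.compile(r"([a-p])(?:\((\d+)(?:-(\d+))?\))?")


def motif_to_regex(motif):
    """Translate e.g. 'f(1)h(109-349)b' into the regex 'f.{1,1}h.{109,349}b'."""
    motif = motif.replace("\u2013", "-")  # accept an en dash in ranges
    if not _MOTIF.fullmatch(motif):
        raise ValueError("unrecognised motif: %r" % motif)
    parts = []
    for token in _TOKEN.finditer(motif):
        letter, low, high = token.group(1), token.group(2), token.group(3)
        parts.append(letter)
        if low is not None:
            high = high or low  # "(n)" means exactly n intervening residues
            if int(low) > int(high):
                raise ValueError("empty spacing range in %r" % motif)
            parts.append(".{%s,%s}" % (low, high))
    return "".join(parts)


def find_motif(sequence, motif):
    """Return 0-based start positions (overlaps allowed) of motif in sequence."""
    # A zero-width lookahead reports every start position, even if matches overlap.
    pattern = re.compile("(?=%s)" % motif_to_regex(motif))
    return [match.start() for match in pattern.finditer(sequence)]


# Toy structural-letter sequence (hypothetical, for illustration only).
print(find_motif("ddfkhiabmm", "f(1)h(2-3)b"))
```

            <p>In the toy sequence, the letter <it>f</it> at position 2 is followed one residue later by <it>h</it> and, after 2 further residues, by <it>b</it>, so a single match starting at position 2 is reported. Adjacent letters without a spacing, as in the partial motif <it>d(1&#8211;2)ff</it>, are treated as consecutive residues.</p>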
            <suppl id="S2">
               <title>
                  <p>Additional file 2</p>
               </title>
               <text>
                  <p>1<sup>st</sup>-shell patterns common to two Mg<sup>2+</sup>-proteins. A table listing 1<sup>st</sup>-shell structural patterns that are common to only 2 Mg<sup>2+</sup>-binding sites.</p>
               </text>
               <file name="1471-2105-8-106-S2.doc">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>1<sup>st</sup>-shell structural motifs in Mg<sup>2+</sup>-proteins</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c ca="left">
                        <p>
                           <it>Motif</it>
                           <sup>
                              <it>a</it>
                           </sup>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>PDB code</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Mg<sup>2+ </sup>-Ligands</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>CATH number</it>
                           <sup>
                              <it>b</it>
                           </sup>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Functional Group</it>
                           <sup>
                              <it>c</it>
                           </sup>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>EC code</it>
                           <sup>
                              <it>d</it>
                           </sup>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>e(24&#8211;47)h(24)k</p>
                     </c>
                     <c ca="left">
                        <p>1SJC</p>
                     </c>
                     <c ca="left">
                        <p>D<sup>189</sup>, E<sup>214</sup>, D<sup>239</sup></p>
                     </c>
                     <c ca="left">
                        <p>3.20.20.120</p>
                     </c>
                     <c ca="left">
                        <p>Lyase<sup>e</sup>, Isomerase<sup>f</sup></p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1TKK</p>
                     </c>
                     <c ca="left">
                        <p>D<sup>191</sup>, E<sup>219</sup>, D<sup>244</sup></p>
                     </c>
                     <c ca="left">
                        <p>3.20.20.120</p>
                     </c>
                     <c ca="left">
                        <p>Isomerase<sup>f</sup></p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>2AKZ</p>
                     </c>
                     <c ca="left">
                        <p>D<sup>244</sup>, E<sup>292</sup>, D<sup>317</sup></p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                     <c ca="left">
                        <p>Lyase<sup>e</sup></p>
                     </c>
                     <c ca="left">
                        <p>4.2.1.11</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>f(1)h(109&#8211;349)b</p>
                     </c>
                     <c ca="left">
                        <p>1O08</p>
                     </c>
                     <c ca="left">
                        <p>D<sup>1008</sup>, D<sup>1010</sup>, D<sup>1170</sup></p>
                     </c>
                     <c ca="left">
                        <p>3.40.50.1000</p>
                     </c>
                     <c ca="left">
                        <p>Isomerase<sup>f</sup></p>
                     </c>
                     <c ca="left">
                        <p>5.4.2.6</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1U7P</p>
                     </c>
                     <c ca="left">
                        <p>D<sup>11</sup>, D<sup>13</sup>, D<sup>123</sup></p>
                     </c>
                     <c ca="left">
                        <p>NYC</p>
                     </c>
                     <c ca="left">
                        <p>Hydrolase<sup>g</sup></p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1WPG</p>
                     </c>
                     <c ca="left">
                        <p>D<sup>351</sup>, T<sup>353</sup>, D<sup>703</sup></p>
                     </c>
                     <c ca="left">
                        <p>3.40.50.1000</p>
                     </c>
                     <c ca="left">
                        <p>Hydrolase<sup>g</sup></p>
                     </c>
                     <c ca="left">
                        <p>3.6.3.8</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>2B82</p>
                     </c>
                     <c ca="left">
                        <p>D<sup>44</sup>, D<sup>46</sup>, D<sup>167</sup></p>
                     </c>
                     <c ca="left">
                        <p>3.40.50.1000</p>
                     </c>
                     <c ca="left">
                        <p>Hydrolase<sup>g</sup></p>
                     </c>
                     <c ca="left">
                        <p>3.1.3.2</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>2C4N</p>
                     </c>
                     <c ca="left">
                        <p>D<sup>9</sup>, D<sup>11</sup>, D<sup>201</sup></p>
                     </c>
                     <c ca="left">
                        <p>NYC</p>
                     </c>
                     <c ca="left">
                        <p>Hydrolase<sup>g</sup></p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>f(2)h(126&#8211;158)m</p>
                     </c>
                     <c ca="left">
                        <p>1KA1</p>
                     </c>
                     <c ca="left">
                        <p>D<sup>142</sup>, D<sup>145</sup>, D<sup>294</sup></p>
                     </c>
                     <c ca="left">
                        <p>3.30.540.10</p>
                     </c>
                     <c ca="left">
                        <p>Hydrolase<sup>g</sup></p>
                     </c>
                     <c ca="left">
                        <p>3.1.3.7</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1NUY</p>
                     </c>
                     <c ca="left">
                        <p>D<sup>1118</sup>, D<sup>1121</sup>, E<sup>1280</sup></p>
                     </c>
                     <c ca="left">
                        <p>3.30.540.10+ 3.40.190.80</p>
                     </c>
                     <c ca="left">
                        <p>Hydrolase<sup>g</sup></p>
                     </c>
                     <c ca="left">
                        <p>3.1.3.11</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>2BJI</p>
                     </c>
                     <c ca="left">
                        <p>E<sup>1090</sup>, D<sup>1093</sup>, D<sup>1220</sup></p>
                     </c>
                     <c ca="left">
                        <p>3.30.540.10+ 3.40.190.80</p>
                     </c>
                     <c ca="left">
                        <p>Hydrolase<sup>g</sup></p>
                     </c>
                     <c ca="left">
                        <p>3.1.3.25</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>k(26&#8211;29)h(1)a</p>
                     </c>
                     <c ca="left">
                        <p>1ITZ</p>
                     </c>
                     <c ca="left">
                        <p>D<sup>168</sup>, N<sup>198</sup>, I<sup>200</sup></p>
                     </c>
                     <c ca="left">
                        <p>3.40.50.970</p>
                     </c>
                     <c ca="left">
                        <p>Transferase<sup>h</sup></p>
                     </c>
                     <c ca="left">
                        <p>2.2.1.1</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1POX</p>
                     </c>
                     <c ca="left">
                        <p>D<sup>447</sup>, N<sup>474</sup>, Q<sup>476</sup></p>
                     </c>
                     <c ca="left">
                        <p>3.40.50.970+ 3.40.50.1220</p>
                     </c>
                     <c ca="left">
                        <p>Oxidoreductase<sup>i</sup></p>
                     </c>
                     <c ca="left">
                        <p>1.2.3.3</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1UMD</p>
                     </c>
                     <c ca="left">
                        <p>D<sup>175</sup>, N<sup>204</sup>, Y<sup>206</sup></p>
                     </c>
                     <c ca="left">
                        <p>3.40.50.970</p>
                     </c>
                     <c ca="left">
                        <p>Oxidoreductase<sup>i</sup></p>
                     </c>
                     <c ca="left">
                        <p>1.2.4.4</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1ZPD</p>
                     </c>
                     <c ca="left">
                        <p>D<sup>440</sup>, N<sup>467</sup>, G<sup>469</sup></p>
                     </c>
                     <c ca="left">
                        <p>3.40.50.970</p>
                     </c>
                     <c ca="left">
                        <p>Lyase<sup>e</sup></p>
                     </c>
                     <c ca="left">
                        <p>4.1.1.1</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>2C3M</p>
                     </c>
                     <c ca="left">
                        <p>D<sup>963</sup>, T<sup>991</sup>, V<sup>993</sup></p>
                     </c>
                     <c ca="left">
                        <p>3.40.50.970</p>
                     </c>
                     <c ca="left">
                        <p>Oxidoreductase<sup>i</sup></p>
                     </c>
                     <c ca="left">
                        <p>1.2.7.1</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p><sup>a</sup>The number in parentheses indicates the number of residues separating the letters corresponding to the Mg<sup>2+</sup>-bound residues. <sup>b</sup>The CATH code of the domain containing the Mg<sup>2+</sup>-ligands; a dash implies that no domain could be assigned to the PDB entry, while NYC means the protein has not yet been chopped. <sup>c</sup>The functional group from the PDB header. <sup>d</sup>The enzyme class from PDBsum [25]; a dash means no EC code was found. <sup>e</sup>Lyases (EC4---) catalyze C-C/O/N and other bond cleavage; e.g., RCOCOOH &#8594; RCOH + CO<sub>2</sub>. <sup>f</sup>Isomerases (EC5---) catalyze geometric changes within a molecule. <sup>g</sup>Hydrolases (EC3---) catalyze hydrolytic bond cleavage: AB + H<sub>2</sub>O &#8594; AOH + BH. <sup>h</sup>Transferases (EC2---) catalyze AB + C &#8594; A + BC. <sup>i</sup>Oxidoreductases (EC1---) catalyze oxido-reductions: AH + B &#8594; A + BH (reduced) and A + O &#8594; AO (oxidized).</p>
               </tblfn>
            </tbl>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>The conserved local structures of the 4 Mg<sup>2+</sup>-structural motifs</p>
               </caption>
               <text>
                  <p><b>The conserved local structures of the 4 Mg<sup>2+</sup>-structural motifs</b>. (a) <it>e(24&#8211;47)h(24)k</it>, (b) <it>f(1)h(109&#8211;349)b</it>, (c) <it>f(2)h(126&#8211;158)m</it>, and (d) <it>k(26&#8211;29)h(1)a</it>.</p>
               </text>
               <graphic file="1471-2105-8-106-4"/>
            </fig>
            <p>Each of the 4 motifs in Table <tblr tid="T2">2</tblr> is found in proteins whose Mg<sup>2+</sup>-binding domains belong to the same superfamily: wherever CATH numbers have been assigned, proteins with the same Mg<sup>2+</sup>-structural motif have Mg<sup>2+</sup>-binding domains with the same CATH number (Table <tblr tid="T2">2</tblr>), implying structurally homologous domains. For example, all 3 proteins with the <it>f(2)h(126&#8211;158)m </it>motif possess in common a Mg<sup>2+</sup>-binding domain belonging to the fructose-1,6-bisphosphatase, subunit A, domain 1 superfamily (CATH number 3.30.540.10). Likewise, all 5 proteins with the <it>k(26&#8211;29)h(1)a </it>motif possess Mg<sup>2+</sup>-binding domains with the same CATH number, 3.40.50.970. The fact that the motifs are found in structurally homologous Mg<sup>2+</sup>-binding domains further supports the use of the structural alphabet to discover motifs.</p>
            <p>The first-shell motifs discovered herein can also help to uncover relationships between proteins with unassigned CATH numbers. For example, 2 of the 3 proteins with the <it>e(24&#8211;47)h(24)k </it>motif (1SJC and 1TKK) possess Mg<sup>2+</sup>-binding domains belonging to the enolase superfamily (CATH number 3.20.20.120), whereas the third protein (2AKZ) has not yet been assigned a domain and therefore has no CATH number. Although the N-acylamino acid racemase (1SJC) and gamma enolase (2AKZ) proteins share neither significant sequence homology (only 15.4% identity) nor overall structural similarity (protein backbone rmsd = 17.5 &#197;), they possess similar Mg<sup>2+</sup>-binding site structures (backbone rmsd of the first-shell letters = 0.5 &#197;), as shown in Figure <figr fid="F5">5</figr>. Similarly, 3 of the 5 proteins with the <it>f(1)h(109&#8211;349)b </it>motif (1O08, 1WPG, and 2B82) possess Mg<sup>2+</sup>-binding domains with the same CATH number (3.40.50.1000), whereas the other 2 proteins (1U7P and 2C4N) have not yet been chopped into domains and therefore have not been assigned CATH numbers. The results in Table <tblr tid="T2">2</tblr> predict that the Mg<sup>2+</sup>-dependent phosphatase (1U7P) and NagD (2C4N) proteins are likely to possess Mg<sup>2+</sup>-binding domains that are structurally homologous to those assigned the CATH number 3.40.50.1000.</p>
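            <p>The reasoning above, in which proteins lacking a CATH number inherit a hypothesis from the other members of their motif group, can be sketched in Python as follows. The PDB codes and CATH numbers are copied from Table <tblr tid="T2">2</tblr>; the function name propose_cath and the rule that a proposal is made only when all assigned members agree are assumptions made for illustration, and the output is a hypothesis rather than a CATH assignment.</p>

```python
# CATH numbers from Table 2; None marks entries with no assigned domain
# (a dash or "NYC" in the table).
TABLE2 = {
    "e(24-47)h(24)k": {
        "1SJC": "3.20.20.120",
        "1TKK": "3.20.20.120",
        "2AKZ": None,
    },
    "f(1)h(109-349)b": {
        "1O08": "3.40.50.1000",
        "1U7P": None,
        "1WPG": "3.40.50.1000",
        "2B82": "3.40.50.1000",
        "2C4N": None,
    },
}


def propose_cath(members):
    """Map each unassigned PDB code to the CATH number shared by all
    assigned members of the same motif group.

    Returns an empty dict if no member is assigned or if the assigned
    members disagree (assumed rule: no proposal without a consensus).
    """
    assigned = {cath for cath in members.values() if cath is not None}
    if len(assigned) != 1:
        return {}
    consensus = next(iter(assigned))
    return {pdb: consensus for pdb, cath in members.items() if cath is None}


for motif, members in TABLE2.items():
    print(motif, propose_cath(members))
```

            <p>For the <it>f(1)h(109&#8211;349)b </it>group, this proposes 3.40.50.1000 for 1U7P and 2C4N, matching the prediction stated above; for the <it>e(24&#8211;47)h(24)k </it>group, it proposes 3.20.20.120 for 2AKZ, which is a suggestion of the sketch and would need to be checked against the structural superposition in Figure <figr fid="F5">5</figr>.</p>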
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>The conserved binding site of 2 nonhomologous Mg<sup>2+</sup>-proteins</p>
               </caption>
               <text>
                  <p><b>The conserved binding site of 2 nonhomologous Mg<sup>2+</sup>-proteins</b>. (a) Cartoon diagram of the metal-binding domain in N-acylamino acid racemase (1SJC). (b) Cartoon diagram of the metal-binding domain in gamma enolase (2AKZ). (c) Superposition of the first-shell structural letters of 1SJC (blue) and 2AKZ (yellow).</p>
               </text>
               <graphic file="1471-2105-8-106-5"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Relation between Mg<sup>2+</sup>-Structural Motifs and PROSITE Sequence Motifs</p>
            </st>
            <p>To see if any of the Mg<sup>2+</sup>-proteins containing structural motifs match sequence motifs stored in the PROSITE database <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, the sequences of the proteins listed in Table <tblr tid="T2">2</tblr> were scanned for the occurrence of PROSITE sequence motifs. With one exception, none of the proteins matched any PROSITE sequence motif encompassing residues involved in Mg<sup>2+</sup>-binding. The exception is the halotolerance protein hal 2 (1KA1) containing the <it>f(2)h(126&#8211;158)m </it>motif, which matched 2 inositol monophosphatase family signatures (PROSITE PDOC00547) containing conserved metal-binding residues. This supports the '<it>f(2)h(126&#8211;158)m</it>' motif as a signature of Mg<sup>2+</sup>-binding sites.</p>
         </sec>
         <sec>
            <st>
               <p>Relation between Mg<sup>2+</sup>-Structural Motifs and Protein Function</p>
            </st>
            <p>Do any of the structural motifs found for the Mg<sup>2+</sup>-proteins map to specific protein functions? To answer this question, the functional group from the PDB header and, where applicable, the enzyme classification (EC) code of each Mg<sup>2+</sup>-protein found with a structural motif are listed in Table <tblr tid="T2">2</tblr>. Note that proteins belonging to the same functional group have the same first EC number. The results in Table <tblr tid="T2">2</tblr> show that most of the structural motifs found for the Mg<sup>2+</sup>-proteins map to certain protein functions. For example, proteins with the partial <it>f(1&#8211;2)h </it>motif are hydrolases, catalyzing the hydrolytic cleavage of mostly ester bonds (EC3.1.-.-), with the exception of beta-phosphoglucomutase (1O08), which is an isomerase converting beta-D-glucose 1-phosphate to beta-D-glucose 6-phosphate. Interestingly, although class b acid phosphatase (2B82) and the halotolerance protein hal 2 (1KA1) contain structurally <it>nonhomologous </it>Mg<sup>2+</sup>-binding domains with different CATH numbers, both are phosphoric monoester hydrolases (EC3.1.3.-). Proteins with the <it>e(24&#8211;47)h(24)k </it>motif are lyases and/or isomerases, whereas proteins with the <it>k(26&#8211;29)h(1)a </it>motif have even more diverse functions: 3 are oxidoreductases (1POX, 1UMD, 2C3M), 1 is a lyase (1ZPD), and 1 is a transferase (1ITZ). This shows that even if proteins share structurally homologous domains (CATH number 3.40.50.970) and structurally similar Mg<sup>2+</sup>-binding sites, as represented by the <it>k(26&#8211;29)h(1)a </it>motif, they can perform different functions.</p>
         </sec>
         <sec>
            <st>
               <p>Statistical Significance of the Mg<sup>2+</sup>-Structural Motifs</p>
            </st>
            <p>Do the structural motifs found for Mg<sup>2+</sup>-proteins in Table <tblr tid="T2">2</tblr> occur in other proteins that do not bind metal ions? To address this question, de Brevern's databank of protein structures that have been encoded into 1D structural sequences was searched for the occurrence of each of the 4 structural motifs listed in Table <tblr tid="T2">2</tblr>. Consistent with the Mg<sup>2+ </sup>and Ca<sup>2+ </sup>datasets, proteins in de Brevern's databank sharing &#8805; 30% sequence identity, or whose X-ray structures have a resolution of &#8805; 2.5 &#197;, were removed. Proteins in de Brevern's databank whose structures are complexed with metal ions were also removed, yielding a set of 385 non-homologous test proteins. In order to eliminate those matched structural letters that cannot spatially bind Mg<sup>2+</sup>, the maximum C<sub>&#945;</sub>-C<sub>&#945; </sub>distance between any pair of Mg<sup>2+</sup>-ligands belonging to proteins of a given structural motif in Table <tblr tid="T2">2</tblr> was extracted; this distance is 9.32 &#197; for the <it>e(24&#8211;47)h(24)k </it>motif, 8.32 &#197; for <it>f(1)h(109&#8211;349)b</it>, 8.44 &#197; for <it>f(2)h(126&#8211;158)m</it>, and 7.86 &#197; for <it>k(26&#8211;29)h(1)a</it>. For a given structural motif in Table <tblr tid="T2">2</tblr>, matched letters in the test proteins whose C<sub>&#945;</sub>-C<sub>&#945; </sub>distances exceed the respective maximum distance were eliminated. This resulted in no matches for the <it>e(24&#8211;47)h(24)k </it>and <it>f(2)h(126&#8211;158)m </it>motifs, whereas 2 proteins (1C3K, 1GPE) contained the <it>f(1)h(109&#8211;349)b </it>motif, and another 2 proteins (1A7U, 1JFR) contained the <it>k(26&#8211;29)h(1)a </it>motif. A check of the literature confirmed that these 4 proteins (1C3K, 1GPE, 1A7U, 1JFR) do not bind metal ions. This suggests that (i) the 4 Mg<sup>2+</sup>-structural motifs discovered are statistically significant, and (ii) the <it>e(24&#8211;47)h(24)k </it>and <it>f(2)h(126&#8211;158)m </it>motifs could be used to predict metal-binding sites.</p>
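            <p>The spatial filter described above can be sketched in Python as follows. This is a minimal illustration, not the authors' program: the function name, the dictionary layout, and the representation of each matched residue by the (x, y, z) coordinates of its C<sub>&#945;</sub> atom are assumptions made for this example. Only the 4 maximum distances are taken from the text.</p>

```python
import math
from itertools import combinations

# Maximum C-alpha to C-alpha distances (angstrom) quoted in the text.
# Motif names use a plain hyphen here instead of an en dash.
MAX_CA_DIST = {
    "e(24-47)h(24)k": 9.32,
    "f(1)h(109-349)b": 8.32,
    "f(2)h(126-158)m": 8.44,
    "k(26-29)h(1)a": 7.86,
}

def spatially_compatible(motif, ca_coords):
    """Return True if no pair of matched residues is farther apart than
    the maximum distance recorded for the motif.

    ca_coords is a hypothetical input: one (x, y, z) tuple per matched residue.
    """
    limit = MAX_CA_DIST[motif]
    return all(limit >= math.dist(a, b) for a, b in combinations(ca_coords, 2))
```

            <p>A match in a test protein would be kept only if this check returns True for the residues carrying the matched letters.</p>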
         </sec>
         <sec>
            <st>
               <p>Metal Preference of the Mg<sup>2+</sup>-Structural Motifs</p>
            </st>
            <p>To check the specificity of the 4 structural motifs in Table <tblr tid="T2">2</tblr> for Mg<sup>2+</sup>, the same procedure used to represent the Mg<sup>2+</sup>-binding sites in terms of their local structure was repeated for Ca<sup>2+</sup>, which is the metal ion most similar to Mg<sup>2+</sup>. Both Mg<sup>2+ </sup>and Ca<sup>2+ </sup>are closed-shell divalent cations belonging to the same group (IIA) with similar chemical properties: They are both "hard" dications that prefer to bind directly to "hard" oxygen-containing anions, and are hence often found to bind in the same protein cavity <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. Thus, the 3D structure of each of the 177 nonredundant Ca<sup>2+ </sup>proteins was represented by a 16-letter structural alphabet (see Methods), and the 1D structural letter representations of the 230 Ca<sup>2+</sup>-binding sites are listed in Additional file <supplr sid="S3">3</supplr>.</p>
            <suppl id="S3">
               <title>
                  <p>Additional file 3</p>
               </title>
               <text>
                  <p>The Ca<sup>2+</sup>-dataset containing 230 metal-binding sites in 177 nonredundant Ca<sup>2+</sup>-proteins. A table listing the PDB entries, protein description, native metal-cofactors (if known), EC code, metal-bound amino acid residues, and first-shell structural representation of the 177 nonredundant Ca<sup>2+</sup>-proteins.</p>
               </text>
               <file name="1471-2105-8-106-S3.doc">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>None of the structural motifs in Table <tblr tid="T2">2</tblr> or Additional file <supplr sid="S2">2</supplr> were found in 3 or more Ca<sup>2+</sup>-binding sites, so none can be classified as a Ca<sup>2+</sup>-structural motif according to our definition. The <it>f(1)h(109&#8211;349)b </it>motif is found in the Ca<sup>2+</sup>-binding site of the hydrolase from the haloacid dehalogenase family (2FI1), which appears to utilize Mg<sup>2+ </sup>as a natural co-factor <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. Although the <it>k(26&#8211;29)h(1)a </it>motif is found in the calcium-binding sites of the transketolase protein (1TRK) and benzoylformate decarboxylase (1Q6Z), the latter is a Mg<sup>2+</sup>-dependent enzyme <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. The <it>e(24&#8211;47)h(24)k </it>and <it>f(2)h(126&#8211;158)m </it>motifs did not match any first-shell structural letters of the Ca<sup>2+</sup>-binding sites, indicating that they seem to favor Mg<sup>2+ </sup>over its competitor, Ca<sup>2+</sup>.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion and conclusion</p>
         </st>
         <sec>
            <st>
               <p>Comparison with Previous Structural Motif Discovery Methods</p>
            </st>
            <p>Assuming that similarity in the local active site structure implies similarity in biological function, 3D patterns/templates of key active sites have been used to suggest the function of a protein whose structure is known. The 3D patterns/templates have been constructed either manually or automatically using various methods, which have been reviewed recently by Watson et al. <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. Recently, 3D templates in the absence of experimental data have been constructed using the evolutionary trace method to identify evolutionarily important, solvent accessible residues that cluster in the protein structure <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. Furthermore, structural motifs for the metal-binding sites of 3 distinct metalloenzyme families, viz., DNase 1 homologs, dimetallic phosphatases, and dioxygenases, have been obtained by first identifying physicochemical property-based sequence motifs in homologous protein sequences, and subsequently identifying those motifs whose structures are conserved in members of a family/superfamily <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>. However, to the best of our knowledge, 3D patterns of key active sites and recurrent patterns (structural motifs) have not been identified using the structural alphabet to convert 3D structures to the respective 1D letter sequences. Also, systematic studies of the structural preference or conservation of Mg<sup>2+</sup>-binding sites in nonhomologous proteins and Mg<sup>2+</sup>-specific structural motifs have not been reported.</p>
         </sec>
         <sec>
            <st>
               <p>Advantages of the Structural-Alphabet Based Motif Discovery Approach</p>
            </st>
            <p>This work presents the first application of the structural alphabet approach to define the 3D patterns of metal active sites and to identify recurrent patterns (structural motifs). The method requires as input only the 3D protein structure to define a 1D structural representation of the respective active site. The structural alphabet-based approach has several advantages: (i) It is efficient at handling many structures as it takes less than a minute on a present-day PC to convert a 3D structure to the corresponding 1D structural sequence. (ii) It requires no sequence comparisons, no parameters or scoring functions and can thus produce consistent structural motifs, whose structures are readily visualized, as illustrated in Figures <figr fid="F4">4</figr> and <figr fid="F5">5</figr>. (iii) It is general and can be used to define 3D patterns not only in metal-binding sites, but also in enzyme active sites, ligand-binding clefts and interacting regions between proteins and their respective partners. The 3D patterns/motifs discovered could be incorporated in methods to detect metal/ligand-binding sites to improve their prediction accuracy.</p>
         </sec>
         <sec>
            <st>
               <p>Secondary Structure Preference of Mg<sup>2+</sup>-Binding Residues</p>
            </st>
            <p>In this work, the structural alphabet-based approach has been used to reveal the structural preference of Mg<sup>2+</sup>-binding sites. Even though the helix-like segment represented by the letter 'm' is the most common building block of the Mg<sup>2+</sup>-proteins in the dataset, residues that bind Mg<sup>2+ </sup>disfavor helices and favor loops. The similarity in the structural preference of the first and second-shell residues supports previous conclusions regarding the relationship between these 2 layers; namely, the structure and properties of the 2<sup>nd</sup> shell are dictated by those of the 1<sup>st</sup> shell <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Similar Mg<sup>2+</sup>-Binding Site Structures in Dissimilar Protein Sequences</p>
            </st>
            <p>The motif discovery method herein has revealed 4 structural motifs, comprising 21% of the Mg<sup>2+</sup>-binding sites. The 3D structural motifs discovered seem to have more predictive utility in identifying Mg<sup>2+</sup>-binding sites than 1D sequence motifs: A scan of the Mg<sup>2+</sup>-protein sequences in our dataset for the occurrence of sequence motifs stored in the PROSITE <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> database yielded only a single positive match, 1WC1, which contains a PROSITE sequence motif predicting the protein to bind Mg<sup>2+</sup>. However, the ScanProsite results did not identify any of the Mg<sup>2+ </sup>proteins with structural motifs.</p>
         </sec>
         <sec>
            <st>
               <p>Functional Preference of the Mg<sup>2+</sup>-Structural Motifs</p>
            </st>
            <p>The structural motifs discovered generally relate to the biological role of Mg<sup>2+ </sup>and the function of the respective proteins. They capture some essential biochemical and/or evolutionary properties, as proteins found to contain a specific structural motif possess structurally homologous Mg<sup>2+</sup>-binding domains, even though they share no significant sequence homology. Furthermore, the <it>f(2)h(126&#8211;158)m </it>motif maps to a specific functional group, namely, hydrolases, indicating the apparent importance of the local Mg<sup>2+</sup>-binding site structure for the function of these hydrolases. As the <it>f(2)h(126&#8211;158)m </it>motif was <it>not </it>found in non-metalloproteins or in Ca<sup>2+</sup>-binding proteins, the presence of this motif in a novel protein structure may suggest a likely Mg<sup>2+</sup>-binding site and the protein function. On the other hand, the other 3 motifs map to more than one functional group, suggesting that the local Mg<sup>2+</sup>-binding site structure is <it>not </it>the only determinant of the protein's function.</p>
         </sec>
         <sec>
            <st>
               <p>Why Mg<sup>2+</sup>-Specific Structural Motifs are Not Found For Most Mg<sup>2+</sup>-Proteins</p>
            </st>
            <p>Out of the 70 nonhomologous Mg<sup>2+</sup>-proteins, only 16 have first-shell structural motifs, while the rest do not seem to possess any metal-binding site structural motifs. Why? One reason might be that some Mg<sup>2+</sup>-structural motifs may have been missed in this work. As the dataset included only proteins with Mg<sup>2+</sup>-bound structures (see Database subsection below), some PDB structures complexed with heavier metal ions such as Mn<sup>2+ </sup>may in fact correspond to native Mg<sup>2+</sup>-binding site(s); moreover, not all structures of proteins whose native co-factor is Mg<sup>2+ </sup>have been solved. A second reason is that for native Mg<sup>2+</sup>-binding sites that can accommodate other metal ions such as Ca<sup>2+ </sup>or Mn<sup>2+</sup>, the binding-site structure need not be conserved in order to recognize a specific metal co-factor. A third reason is that although Mg<sup>2+ </sup>occupies the binding site in the 3D structure, it is not the native cofactor. Although all 70 proteins are bound to Mg<sup>2+ </sup>in our dataset, according to PDBSUM <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> and from the UniProt annotation and references therein, 14 proteins do not employ Mg<sup>2+ </sup>as the native co-factor, while for 6 proteins, the native metal-cofactor is apparently not known (see Additional file <supplr sid="S1">1</supplr>). For example, calbindin D9k is a vitamin D-dependent calcium-binding protein, but in the 1IG5 structure, the native cofactor Ca<sup>2+ </sup>has been replaced by Mg<sup>2+</sup>. In Mg<sup>2+</sup>-proteins with multiple Mg<sup>2+</sup>-binding sites, one or more sites may be non-native, as they have been artificially induced by the high Mg<sup>2+ </sup>concentration used during crystallization. In these cases, the local structure of the non-native metal-binding site would not be expected to share any similarity with that of a native Mg<sup>2+</sup>-binding site, where the conserved local structure (as in the <it>f(2)h(126&#8211;158)m </it>motif) plays an important role in the protein's function. The absence of structural motifs for non-native Mg<sup>2+</sup>-binding sites indirectly supports our strategy.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Database</p>
            </st>
            <p>A set of 70 nonredundant Mg<sup>2+ </sup>proteins was created by searching the PDB <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> for Mg<sup>2+</sup>-containing structures with resolution &lt; 2.5 &#197; and sharing &lt;30% sequence identity. Structures with residues missing in the middle of the sequence were excluded because such gaps in the structure could alter the spacing in the binding-site motifs (see below). Structures with Mg<sup>2+ </sup>bound to &lt;3 protein ligands were also excluded, as 2-residue motifs cannot be considered specific enough for any practical use. The resulting Mg<sup>2+ </sup>dataset comprises 77 binding sites in 70 proteins. Note that although most Mg<sup>2+</sup>-proteins have only one binding site, some proteins have more than one Mg<sup>2+</sup>-binding site (PDB entries 1MXG, 1NUY, 1VCL, 1WL6, 2BJI, and 2BVC). A set of nonredundant Ca<sup>2+ </sup>proteins was created following the same procedure used to create the Mg<sup>2+ </sup>dataset. This resulted in 230 Ca<sup>2+</sup>-binding sites in 177 proteins. The PDB entries, EC code, and amino acid residues bound to the metal ion in the 77 Mg<sup>2+ </sup>and 230 Ca<sup>2+ </sup>sites are given in Additional files <supplr sid="S1">1</supplr> and <supplr sid="S3">3</supplr>, respectively.</p>
         </sec>
         <sec>
            <st>
               <p>The Structural Alphabet</p>
            </st>
            <p>Each metalloprotein structure was encoded into its 1D structural sequence according to the original structural alphabet defined by de Brevern and co-workers <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. We refer the reader to the original work <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> for details of how this alphabet was devised, and briefly outline the procedure here. The backbone of each protein from a nonredundant protein structure database was represented by consecutive 5-residue segments, each described by a vector of 8 backbone dihedral angles <b><it>V</it></b>(&#968;<sub>n-2</sub>, &#966;<sub>n-1</sub>, &#968;<sub>n-1</sub>, &#966;<sub>n</sub>, &#968;<sub>n</sub>, &#966;<sub>n+1</sub>, &#968;<sub>n+1</sub>, &#966;<sub>n+2</sub>). The dissimilarity between 2 vectors <b><it>V</it></b><sub>1 </sub>and <b><it>V</it></b><sub>2 </sub>of dihedral angles is measured by the root-mean-square deviations of the dihedral angle values (rmsda), which is defined as the Euclidean distance among the 4 links:</p>
            <p>
               <m:math name="1471-2105-8-106-i1" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mtext>rmsda&#160;</m:mtext>
                        <m:mo stretchy="false">(</m:mo>
                        <m:msub>
                           <m:mi>V</m:mi>
                           <m:mn>1</m:mn>
                        </m:msub>
                        <m:mo>,</m:mo>
                        <m:msub>
                           <m:mi>V</m:mi>
                           <m:mn>2</m:mn>
                        </m:msub>
                        <m:mo stretchy="false">)</m:mo>
                        <m:mo>=</m:mo>
                        <m:msqrt>
                           <m:mrow>
                              <m:mfrac>
                                 <m:mrow>
                                    <m:mstyle displaystyle="true">
                                       <m:munderover>
                                          <m:mo>&#8721;</m:mo>
                                          <m:mrow>
                                             <m:mi>i</m:mi>
                                             <m:mo>=</m:mo>
                                             <m:mn>1</m:mn>
                                          </m:mrow>
                                          <m:mn>4</m:mn>
                                       </m:munderover>
                                       <m:mrow>
                                          <m:msup>
                                             <m:mrow>
                                                <m:mo stretchy="false">[</m:mo>
                                                <m:msub>
                                                   <m:mi>&#968;</m:mi>
                                                   <m:mi>i</m:mi>
                                                </m:msub>
                                                <m:mo stretchy="false">(</m:mo>
                                                <m:msub>
                                                   <m:mrow>
                                                      <m:mover accent="true">
                                                         <m:mi>V</m:mi>
                                                         <m:mo stretchy="true">&#8594;</m:mo>
                                                      </m:mover>
                                                   </m:mrow>
                                                   <m:mn>1</m:mn>
                                                </m:msub>
                                                <m:mo stretchy="false">)</m:mo>
                                                <m:mo>&#8722;</m:mo>
                                                <m:msub>
                                                   <m:mi>&#968;</m:mi>
                                                   <m:mi>i</m:mi>
                                                </m:msub>
                                                <m:mo stretchy="false">(</m:mo>
                                                <m:msub>
                                                   <m:mrow>
                                                      <m:mover accent="true">
                                                         <m:mi>V</m:mi>
                                                         <m:mo stretchy="true">&#8594;</m:mo>
                                                      </m:mover>
                                                   </m:mrow>
                                                   <m:mn>2</m:mn>
                                                </m:msub>
                                                <m:mo stretchy="false">)</m:mo>
                                                <m:mo stretchy="false">]</m:mo>
                                             </m:mrow>
                                             <m:mn>2</m:mn>
                                          </m:msup>
                                       </m:mrow>
                                    </m:mstyle>
                                    <m:mo>+</m:mo>
                                    <m:msup>
                                       <m:mrow>
                                          <m:mo stretchy="false">[</m:mo>
                                          <m:msub>
                                             <m:mi>&#966;</m:mi>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mo>+</m:mo>
                                                <m:mn>1</m:mn>
                                             </m:mrow>
                                          </m:msub>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:msub>
                                             <m:mrow>
                                                <m:mover accent="true">
                                                   <m:mi>V</m:mi>
                                                   <m:mo stretchy="true">&#8594;</m:mo>
                                                </m:mover>
                                             </m:mrow>
                                             <m:mn>1</m:mn>
                                          </m:msub>
                                          <m:mo stretchy="false">)</m:mo>
                                          <m:mo>&#8722;</m:mo>
                                          <m:msub>
                                             <m:mi>&#966;</m:mi>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mo>+</m:mo>
                                                <m:mn>1</m:mn>
                                             </m:mrow>
                                          </m:msub>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:msub>
                                             <m:mrow>
                                                <m:mover accent="true">
                                                   <m:mi>V</m:mi>
                                                   <m:mo stretchy="true">&#8594;</m:mo>
                                                </m:mover>
                                             </m:mrow>
                                             <m:mn>2</m:mn>
                                          </m:msub>
                                          <m:mo stretchy="false">)</m:mo>
                                          <m:mo stretchy="false">]</m:mo>
                                       </m:mrow>
                                       <m:mn>2</m:mn>
                                    </m:msup>
                                 </m:mrow>
                                 <m:mn>8</m:mn>
                              </m:mfrac>
                           </m:mrow>
                        </m:msqrt>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqqGYbGCcqqGTbqBcqqGZbWCcqqGKbazcqqGHbqycqqGGaaicqGGOaakieWacqWFwbGvdaWgaaWcbaGaeGymaedabeaakiabcYcaSiab=zfawnaaBaaaleaacqaIYaGmaeqaaOGaeiykaKIaeyypa0ZaaOaaaeaadaWcaaqaamaaqahabaGaei4waSLaeqiYdK3aaSbaaSqaaiabdMgaPbqabaGccqGGOaakdaWhcaqaaiabdAfawbGaay51GaWaaSbaaSqaaiabigdaXaqabaGccqGGPaqkcqGHsislcqaHipqEdaWgaaWcbaGaemyAaKgabeaakiabcIcaOmaaFiaabaGaemOvayfacaGLxdcadaWgaaWcbaGaeGOmaidabeaakiabcMcaPiabc2faDnaaCaaaleqabaGaeGOmaidaaaqaaiabdMgaPjabg2da9iabigdaXaqaaiabisda0aqdcqGHris5aOGaey4kaSIaei4waSLaeqOXdy2aaSbaaSqaaiabdMgaPjabgUcaRiabigdaXaqabaGccqGGOaakdaWhcaqaaiabdAfawbGaay51GaWaaSbaaSqaaiabigdaXaqabaGccqGGPaqkcqGHsislcqaHgpGzdaWgaaWcbaGaemyAaKMaey4kaSIaeGymaedabeaakiabcIcaOmaaFiaabaGaemOvayfacaGLxdcadaWgaaWcbaGaeGOmaidabeaakiabcMcaPiabc2faDnaaCaaaleqabaGaeGOmaidaaaGcbaGaeGioaGdaaaWcbeaaaaa@764B@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>Using an unsupervised cluster analyzer based on the above rmsda of the segments, 16 letters (also called protein blocks) were identified, which in turn comprise the structural 'alphabet'.</p>
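            <p>As an illustration only, the rmsda between two 5-residue segments can be sketched in Python as follows. The function names, the representation of each segment as a list of 8 dihedral angles in degrees, and the wrapping of each angle difference into the range from -180 to 180 degrees are assumptions made for this sketch; they are not taken from the original work on the structural alphabet.</p>

```python
import math

def angle_diff(a, b):
    """Signed difference a - b in degrees, wrapped into [-180, 180).

    The periodic wrap is an assumption of this sketch.
    """
    return (a - b + 180.0) % 360.0 - 180.0

def rmsda(v1, v2):
    """Root-mean-square deviation over the 8 dihedral angles of two segments.

    v1 and v2 list the angles in the same order:
    psi(n-2), phi(n-1), psi(n-1), phi(n), psi(n), phi(n+1), psi(n+1), phi(n+2).
    """
    if len(v1) != 8 or len(v2) != 8:
        raise ValueError("each segment needs exactly 8 dihedral angles")
    total = sum(angle_diff(a, b) ** 2 for a, b in zip(v1, v2))
    return math.sqrt(total / 8.0)
```

            <p>With this definition, two identical segments have an rmsda of 0, and two segments whose angles all differ by 10 degrees have an rmsda of 10 degrees.</p>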
         </sec>
         <sec>
            <st>
               <p>Converting 3D Structure to 1D Structural Alphabet</p>
            </st>
            <p>The 3D structures of the 70 Mg<sup>2+ </sup>and 177 Ca<sup>2+ </sup>proteins were converted into strings of structural letters using the program PBE published in ref. 9. For a given <it>n</it>-residue protein, <it>n</it> &#8722; 4 letter assignments were obtained by scanning the protein chain with a 5-residue sliding window. The structure of each 5-residue segment was compared with that of each of the 16 letters, and the letter with the closest structure (as measured by the rmsda) was assigned to the middle residue of that segment. This process is illustrated in Figure <figr fid="F6">6</figr>: The first letter is assigned to the 3<sup>rd </sup>residue, Val, representing the first 5-residue segment. Its structure is closest to that of the structural letter '<it>d</it>'; therefore, Val 3 is assigned '<it>d</it>'. Note that no letters can be assigned to the first 2 and last 2 residues of each protein.</p>
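            <p>The sliding-window assignment can be sketched in Python as follows. This is not the PBE program: the function names are hypothetical, and the 2-letter prototype table uses invented, idealized angle values solely to make the example runnable (the real alphabet has 16 prototypes with published angle values). Unassignable terminal residues are marked '<it>Z</it>', as in Figure <figr fid="F6">6</figr>.</p>

```python
import math

def rmsda(v1, v2):
    """RMS deviation over 8 dihedral angles (degrees), with periodic wrap (assumed)."""
    diffs = [(a - b + 180.0) % 360.0 - 180.0 for a, b in zip(v1, v2)]
    return math.sqrt(sum(d * d for d in diffs) / 8.0)

# Toy prototypes, NOT the published protein blocks.
# Order: psi(n-2), phi(n-1), psi(n-1), phi(n), psi(n), phi(n+1), psi(n+1), phi(n+2)
TOY_PROTOTYPES = {
    "m": [-45.0, -60.0] * 4,    # idealized helix-like segment
    "d": [130.0, -120.0] * 4,   # idealized strand-like segment
}

def encode(phi, psi, prototypes=TOY_PROTOTYPES):
    """Assign one letter to the middle residue of every 5-residue window.

    phi and psi are per-residue dihedral angles in degrees (0-based lists).
    The first 2 and last 2 residues receive 'Z' (no assignment).
    """
    n = len(phi)
    letters = ["Z"] * n
    for c in range(2, n - 2):
        window = [psi[c - 2], phi[c - 1], psi[c - 1], phi[c],
                  psi[c], phi[c + 1], psi[c + 1], phi[c + 2]]
        letters[c] = min(prototypes, key=lambda name: rmsda(window, prototypes[name]))
    return "".join(letters)
```

            <p>For a 7-residue chain, this yields 7 &#8722; 4 = 3 assigned letters flanked by 2 unassigned positions at each end.</p>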
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Conversion of the 3D protein backbone into a 1D structural alphabet representation</p>
               </caption>
               <text>
                  <p><b>Conversion of the 3D protein backbone into a 1D structural alphabet representation</b>. The first 2 and the last 2 residues are assigned '<it>Z</it>', meaning a letter cannot be assigned at these residues. The first valid assignment is '<it>d</it>', at Val 3 and spanning residues 1 to 5. The next is assigned to Asp 4 and spans residues 2 to 6.</p>
               </text>
               <graphic file="1471-2105-8-106-6"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Definition of 1<sup>st </sup>and 2<sup>nd</sup>-Shell Metal Ligands</p>
            </st>
            <p>Analyses of high-resolution X-ray structures with crystallographic <it>R </it>factor &#8804; 0.065 of small metal complexes in the Cambridge Structural Database <abbrgrp><abbr bid="B26">26</abbr></abbrgrp> have shown that the mean 1<sup>st</sup>-shell Mg-O<sub>water</sub>, Mg-O<sub>carboxylate</sub>, and Mg-O<sub>alcohol </sub>distances do not exceed 2.11 &#197;, while the Ca-O<sub>water</sub>, Ca-O<sub>carboxylate</sub>, Ca-O<sub>alcohol</sub>, and Ca-N<sub>imidazole </sub>bond distances do not exceed 2.55 &#197; <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. To account for the lower resolution of the PDB structures, a slightly larger cutoff was used to locate the 1<sup>st</sup>-shell amino acid ligands. Thus, the Mg<sup>2+ </sup>and Ca<sup>2+ </sup>ligands were defined as residues with a donor atom within 2.5 &#197; and 2.7 &#197; of the metal ion, respectively. The heavy atoms of the metal-bound amino acid residues were then selected as centers to search for the 2<sup>nd</sup>-shell protein ligands using a hydrogen-bonding cutoff of 3.5 &#197; <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. Note that water molecules in the first and second shells were not identified, as they were not used to define a structural motif.</p>
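            <p>The first-shell cutoff can be illustrated with a short Python sketch. The function name, the input layout, and the choice to treat the cutoff as inclusive are assumptions made for this example; the caller is assumed to pass only candidate donor atoms of protein residues (no water molecules). Only the 2 cutoff values come from the text.</p>

```python
import math

# First-shell distance cutoffs in angstrom, as stated in the text.
FIRST_SHELL_CUTOFF = {"MG": 2.5, "CA": 2.7}

def first_shell_residues(metal, metal_xyz, donor_atoms):
    """Return the sorted residue numbers having a donor atom within the cutoff.

    donor_atoms is a hypothetical input format: an iterable of
    (residue_number, atom_name, (x, y, z)) tuples for protein donor atoms.
    """
    cutoff = FIRST_SHELL_CUTOFF[metal]
    residues = set()
    for residue_number, _atom_name, xyz in donor_atoms:
        if cutoff >= math.dist(metal_xyz, xyz):
            residues.add(residue_number)
    return sorted(residues)
```

            <p>The same donor atom can therefore count as a ligand for Ca<sup>2+ </sup>but not for Mg<sup>2+ </sup>if it lies between 2.5 &#197; and 2.7 &#197; from the metal ion.</p>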
         </sec>
         <sec>
            <st>
               <p>Definition of 1<sup>st </sup>and 2<sup>nd</sup>-Shell Structural Representation/Pattern</p>
            </st>
            <p>Since the 3D structure of each metalloprotein has been converted into the respective 1D letter sequence as described above, the letters that correspond to the metal-bound amino acid residues yield a structural representation of the first shell, as shown in the last columns of Additional files <supplr sid="S1">1</supplr> and <supplr sid="S3">3</supplr> for each metal-binding site. For example, in the case of the human/chicken estrogen receptor (1HCQ), the letters corresponding to the Zn-binding Cys residues at positions 7, 10, 24 and 27 are <it>f, o, f</it>, and <it>m</it>, respectively, yielding an <it>f(2)o(13)f(2)m </it>representation of the first shell for 1HCQ (see Figure <figr fid="F1">1</figr>).</p>
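            <p>The notation can be made concrete with a small Python sketch. The function name and input format are hypothetical; the spacing convention (the number of residues lying between 2 consecutive metal-bound residues) is inferred from the 1HCQ example, where positions 7, 10, 24 and 27 give spacings of 2, 13 and 2.</p>

```python
def first_shell_pattern(ligands):
    """Build a first-shell pattern string such as 'f(2)o(13)f(2)m'.

    ligands is a list of (residue_position, structural_letter) pairs.
    The number in parentheses is the count of residues between two
    consecutive metal-bound residues (convention inferred from the text).
    """
    if not ligands:
        raise ValueError("at least one metal-bound residue is required")
    ordered = sorted(ligands)
    parts = [ordered[0][1]]
    for (prev_pos, _prev_letter), (pos, letter) in zip(ordered, ordered[1:]):
        parts.append("(%d)%s" % (pos - prev_pos - 1, letter))
    return "".join(parts)
```

            <p>Applied to the 1HCQ ligands, the pairs (7, f), (10, o), (24, f) and (27, m) reproduce the representation quoted in the text.</p>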
         </sec>
         <sec>
            <st>
               <p>Definition of Structural Motifs</p>
            </st>
            <p>In previous work <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>, all values of <it>k </it>between 2 and 20 were used to define a structural motif, where <it>k </it>is the number of first-shell structural patterns with the same structural letters and similar interletter spacing. Here, <it>k </it>&#8805; 3 was used to define a structural motif. Thus, if 3 or more proteins possess first-shell structural patterns with the same structural letters and similar interletter spacing, these proteins are assumed to share a common structural motif. For example, transketolase (1ITZ), pyruvate oxidase (1POX), 2-oxo acid dehydrogenase alpha subunit (1UMD), pyruvate decarboxylase (1ZPD), and pyruvate-ferredoxin oxidoreductase (2C3M) share the first-shell structural pattern, <it>k(26&#8211;29)h(1)a</it>, which thus defines a structural motif.</p>
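            <p>One possible reading of this definition is sketched below in Python. The text does not quantify "similar interletter spacing", so the optional <it>max_spread</it> parameter (the largest allowed difference between spacings at the same position) is purely an assumption of this sketch, as are the function name, the input format, and the toy protein identifiers used to exercise it. Spacing ranges are written with a plain hyphen.</p>

```python
from collections import defaultdict

def find_motifs(patterns, k_min=3, max_spread=None):
    """Group first-shell patterns that share the same letters.

    patterns maps a protein identifier to (letters, spacings), for example
    ("kha", (26, 1)) for the pattern k(26)h(1)a. A group becomes a motif if
    it has at least k_min members and, when max_spread is given, every
    spacing position varies by no more than max_spread (hypothetical rule).
    Returns a dict mapping the motif string to the sorted member identifiers.
    """
    groups = defaultdict(list)
    for protein_id, (letters, spacings) in patterns.items():
        groups[tuple(letters)].append((protein_id, tuple(spacings)))

    motifs = {}
    for letters, members in groups.items():
        if k_min > len(members):
            continue
        # Transpose: one tuple of observed spacings per gap position.
        gaps = list(zip(*(spacings for _pid, spacings in members)))
        if max_spread is not None and any(max(g) - min(g) > max_spread for g in gaps):
            continue
        text = letters[0]
        for gap, letter in zip(gaps, letters[1:]):
            low, high = min(gap), max(gap)
            span = str(low) if low == high else "%d-%d" % (low, high)
            text += "(%s)%s" % (span, letter)
        motifs[text] = sorted(pid for pid, _spacings in members)
    return motifs
```

            <p>With 3 toy patterns k(26)h(1)a, k(29)h(1)a and k(27)h(1)a, the sketch reports a single motif, k(26-29)h(1)a; a pattern seen in fewer than 3 proteins is not reported.</p>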
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>MD carried out all the calculations, including writing programs, and drafted the manuscript. CL conceived of the study and participated in its design, in the analysis/interpretation of data, and in writing/revising the manuscript. Both authors have read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank anonymous reviewers for constructive comments/suggestions. We are grateful to Steven Wu, Michael J. B. Lin, and Backy Chen for assistance in the statistical analyses, and Leon Li, Todor Dudev, and Gopi Kuppuraj for literature assistance. This work was supported by NSC contract no. NSC 94-2113-M-001-018 to CL.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Biological Chemistry of Magnesium</p>
            </title>
            <aug>
               <au>
                  <snm>Cowan</snm>
                  <fnm>JA</fnm>
               </au>
            </aug>
            <publisher>New York, VCH</publisher>
            <pubdate>1995</pubdate>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Three-dimensional structure of the free radical protein of ribonucleotide reductase</p>
            </title>
            <aug>
               <au>
                  <snm>Nordlund</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Sjoberg</snm>
                  <fnm>BM</fnm>
               </au>
               <au>
                  <snm>Eklund</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1990</pubdate>
            <volume>345</volume>
            <fpage>593</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/345593a0</pubid>
                  <pubid idtype="pmpid" link="fulltext">2190093</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Competitive Binding in Magnesium Coordination Chemistry: Water versus Ligands of Biological Interest</p>
            </title>
            <aug>
               <au>
                  <snm>Dudev</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Cowan</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Lim</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>J Am Chem Soc</source>
            <pubdate>1999</pubdate>
            <volume>121</volume>
            <fpage>7665</fpage>
            <lpage>7673</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1021/ja984470t</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Metal activation of enzymes in nucleic acid biochemistry</p>
            </title>
            <aug>
               <au>
                  <snm>Cowan</snm>
                  <fnm>JA</fnm>
               </au>
            </aug>
            <source>Chem Rev</source>
            <pubdate>1998</pubdate>
            <volume>98</volume>
            <fpage>1067</fpage>
            <lpage>1087</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/cr960436q</pubid>
                  <pubid idtype="pmpid" link="fulltext">11848925</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>The relation between the divergence of sequence and structure in proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Chothia</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Lesk</snm>
                  <fnm>AM</fnm>
               </au>
            </aug>
            <source>EMBO J</source>
            <pubdate>1986</pubdate>
            <volume>5</volume>
            <fpage>823</fpage>
            <lpage>826</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1166865</pubid>
                  <pubid idtype="pmpid">3709526</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Bayesian probabilistic approach for predicting backbone structures in terms of protein blocks</p>
            </title>
            <aug>
               <au>
                  <snm>de Brevern</snm>
                  <fnm>AG</fnm>
               </au>
               <au>
                  <snm>Etchebest</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Hazout</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Proteins: Struct Funct Genet</source>
            <pubdate>2000</pubdate>
            <volume>41</volume>
            <fpage>271</fpage>
            <lpage>287</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1002/1097-0134(20001115)41:3&lt;271::AID-PROT10>3.0.CO;2-Z</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>New assessment of a structural alphabet</p>
            </title>
            <aug>
               <au>
                  <snm>de Brevern</snm>
                  <fnm>AG</fnm>
               </au>
            </aug>
            <source>In Silico Biol</source>
            <pubdate>2005</pubdate>
            <volume>5</volume>
            <fpage>26</fpage>
         </bibl>
         <bibl id="B8">
            <title>
               <p>The importance of short structural motifs in protein structure analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Unger</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Sussman</snm>
                  <fnm>JL</fnm>
               </au>
            </aug>
            <source>J Comput Aided Mol Des</source>
            <pubdate>1993</pubdate>
            <volume>7</volume>
            <fpage>457</fpage>
            <lpage>472</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/BF02337561</pubid>
                  <pubid idtype="pmpid">8229095</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Protein block expert (PBE): a web-based protein structure analysis server using a structural alphabet</p>
            </title>
            <aug>
               <au>
                  <snm>Tyagi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sharma</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Swamy</snm>
                  <fnm>CS</fnm>
               </au>
               <au>
                  <snm>Cadet</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Srinivasan</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>de Brevern</snm>
                  <fnm>AG</fnm>
               </au>
               <au>
                  <snm>Offmann</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <fpage>W119</fpage>
            <lpage>W123</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1538797</pubid>
                  <pubid idtype="pmpid" link="fulltext">16844973</pubid>
                  <pubid idtype="doi">10.1093/nar/gkl199</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>A substitution matrix for structural alphabet based on structural alignment of homologous proteins and its applications</p>
            </title>
            <aug>
               <au>
                  <snm>Tyagi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Gowri</snm>
                  <fnm>VS</fnm>
               </au>
               <au>
                  <snm>Srinivasan</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>de Brevern</snm>
                  <fnm>AG</fnm>
               </au>
               <au>
                  <snm>Offmann</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Proteins: Structure, Function and Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>65</volume>
            <issue>1</issue>
            <fpage>32</fpage>
            <lpage>39</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1002/prot.21087</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Prediction of local structure in proteins using a library of sequence-structure motifs</p>
            </title>
            <aug>
               <au>
                  <snm>Bystroff</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Baker</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1998</pubdate>
            <volume>281</volume>
            <issue>3</issue>
            <fpage>565</fpage>
            <lpage>577</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1998.1943</pubid>
                  <pubid idtype="pmpid" link="fulltext">9698570</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Small libraries of protein fragments model native structures accurately</p>
            </title>
            <aug>
               <au>
                  <snm>Kolodny</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Koehl</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Guibas</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Levitt</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2002</pubdate>
            <volume>323</volume>
            <fpage>297</fpage>
            <lpage>307</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0022-2836(02)00942-7</pubid>
                  <pubid idtype="pmpid" link="fulltext">12381322</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Use of a structural alphabet for analysis of short loops connecting repetitive structures</p>
            </title>
            <aug>
               <au>
                  <snm>Fourrier</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Benros</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>de Brevern</snm>
                  <fnm>AG</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>58</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">450294</pubid>
                  <pubid idtype="pmpid" link="fulltext">15140270</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-5-58</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>First-Second Shell Interactions in Metal Binding Sites in Proteins: A PDB Survey and DFT/CDM Calculations</p>
            </title>
            <aug>
               <au>
                  <snm>Dudev</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>YL</fnm>
               </au>
               <au>
                  <snm>Dudev</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lim</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>J Am Chem Soc</source>
            <pubdate>2003</pubdate>
            <volume>125</volume>
            <fpage>3168</fpage>
            <lpage>3180</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/ja0209722</pubid>
                  <pubid idtype="pmpid" link="fulltext">12617685</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>A human retinoic acid receptor which belongs to the family of nuclear receptors</p>
            </title>
            <aug>
               <au>
                  <snm>Petkovich</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Brand</snm>
                  <fnm>NJ</fnm>
               </au>
               <au>
                  <snm>Krust</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Chambon</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1987</pubdate>
            <volume>330</volume>
            <fpage>444</fpage>
            <lpage>450</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/330444a0</pubid>
                  <pubid idtype="pmpid" link="fulltext">2825025</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>The Protein Data Bank</p>
            </title>
            <aug>
               <au>
                  <snm>Berman</snm>
                  <fnm>HM</fnm>
               </au>
               <au>
                  <snm>Battistuz</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Bhat</snm>
                  <fnm>TN</fnm>
               </au>
               <au>
                  <snm>Bluhm</snm>
                  <fnm>WF</fnm>
               </au>
               <au>
                  <snm>Bourne</snm>
                  <fnm>PE</fnm>
               </au>
               <au>
                  <snm>Burkhardt</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Iype</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Jain</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Fagan</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Marvin</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Padilla</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Ravichandran</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Schneider</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Thanki</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Weissig</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Westbrook</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Zardecki</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Acta Crystallogr D</source>
            <pubdate>2002</pubdate>
            <volume>58</volume>
            <fpage>899</fpage>
            <lpage>907</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1107/S0907444902003451</pubid>
                  <pubid idtype="pmpid" link="fulltext">12037327</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>PROSITE: a documented database using patterns and profiles as motif descriptors</p>
            </title>
            <aug>
               <au>
                  <snm>Sigrist</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Cerutti</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Hulo</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Gattiker</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Falquet</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Pagni</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bairoch</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bucher</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Brief Bioinform</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <fpage>265</fpage>
            <lpage>274</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bib/3.3.265</pubid>
                  <pubid idtype="pmpid" link="fulltext">12230035</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Principles Governing Mg, Ca, and Zn Binding and Selectivity in Proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Dudev</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Lim</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Chem Rev</source>
            <pubdate>2003</pubdate>
            <volume>103</volume>
            <fpage>773</fpage>
            <lpage>787</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/cr020467n</pubid>
                  <pubid idtype="pmpid" link="fulltext">12630852</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Diversification of function in the haloacid dehalogenase enzyme superfamily: The role of the cap domain in hydrolytic phosphorus-carbon bond cleavage</p>
            </title>
            <aug>
               <au>
                  <snm>Lahiri</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>GF</fnm>
               </au>
               <au>
                  <snm>Dunaway-Mariano</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Allen</snm>
                  <fnm>KN</fnm>
               </au>
            </aug>
            <source>Bioorg Chem</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <fpage>394</fpage>
            <lpage>409</lpage>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Benzoylformate decarboxylase from Pseudomonas putida as stable catalyst for the synthesis of chiral 2-hydroxy ketones</p>
            </title>
            <aug>
               <au>
                  <snm>Iding</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Dunnwald</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Greiner</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Liese</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Muller</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Siegert</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Grotzinger</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Demir</snm>
                  <fnm>AS</fnm>
               </au>
               <au>
                  <snm>Pohl</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Chemistry - A Eur J</source>
            <pubdate>2000</pubdate>
            <volume>6</volume>
            <fpage>1483</fpage>
            <lpage>1495</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1002/(SICI)1521-3765(20000417)6:8&lt;1483::AID-CHEM1483>3.0.CO;2-S</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Predicting protein function from sequence and structural data</p>
            </title>
            <aug>
               <au>
                  <snm>Watson</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Laskowski</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Curr Opin Struct Biol</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <fpage>275</fpage>
            <lpage>284</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/j.sbi.2005.04.003</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Recurrent use of evolutionary importance for functional annotation of proteins based on local structural similarity</p>
            </title>
            <aug>
               <au>
                  <snm>Kristensen</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>BY</fnm>
               </au>
               <au>
                  <snm>Fofanov</snm>
                  <fnm>VY</fnm>
               </au>
               <au>
                  <snm>Ward</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Lisewski</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Kimmel</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kavraki</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Lichtarge</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>Protein Sci</source>
            <pubdate>2006</pubdate>
            <volume>15</volume>
            <fpage>1530</fpage>
            <lpage>1536</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1110/ps.062152706</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Identifying property based sequence motifs in protein families and superfamilies: application to DNase-1 related endonucleases</p>
            </title>
            <aug>
               <au>
                  <snm>Mathura</snm>
                  <fnm>VS</fnm>
               </au>
               <au>
                  <snm>Schein</snm>
                  <fnm>CH</fnm>
               </au>
               <au>
                  <snm>Braun</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>1381</fpage>
            <lpage>1390</lpage>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Molego-based definition of the architecture and specificity of metal-binding sites</p>
            </title>
            <aug>
               <au>
                  <snm>Schein</snm>
                  <fnm>CH</fnm>
               </au>
               <au>
                  <snm>Zhou</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Oezguen</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Mathura</snm>
                  <fnm>VS</fnm>
               </au>
               <au>
                  <snm>Braun</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Proteins: Structure, Function and Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>58</volume>
            <fpage>200</fpage>
            <lpage>210</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1002/prot.20253</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>PDBsum: summaries and analyses of PDB structures</p>
            </title>
            <aug>
               <au>
                  <snm>Laskowski</snm>
                  <fnm>RA</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <issue>1</issue>
            <fpage>221</fpage>
            <lpage>222</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">29784</pubid>
                  <pubid idtype="pmpid" link="fulltext">11125097</pubid>
                  <pubid idtype="doi">10.1093/nar/29.1.221</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>The Cambridge structural database: a quarter of a million crystal structures and rising</p>
            </title>
            <aug>
               <au>
                  <snm>Allen</snm>
                  <fnm>FH</fnm>
               </au>
            </aug>
            <source>Acta Cryst</source>
            <pubdate>2002</pubdate>
            <volume>B58</volume>
            <fpage>380</fpage>
            <lpage>388</lpage>
         </bibl>
         <bibl id="B27">
            <title>
               <p>The geometry of metal-ligand interactions relevant to proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Harding</snm>
                  <fnm>MM</fnm>
               </au>
            </aug>
            <source>Acta Cryst</source>
            <pubdate>1999</pubdate>
            <volume>D55</volume>
            <fpage>1432</fpage>
            <lpage>1443</lpage>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Satisfying hydrogen bonding potential in proteins</p>
            </title>
            <aug>
               <au>
                  <snm>McDonald</snm>
                  <fnm>IK</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1994</pubdate>
            <volume>238</volume>
            <issue>5</issue>
            <fpage>777</fpage>
            <lpage>793</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1994.1334</pubid>
                  <pubid idtype="pmpid" link="fulltext">8182748</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Structure motif discovery and mining the PDB</p>
            </title>
            <aug>
               <au>
                  <snm>Jonassen</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Eidhammer</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Conklin</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>WR</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <fpage>362</fpage>
            <lpage>367</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1093/bioinformatics/18.2.362</pubid>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
