<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-8-321</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Automatic generation of 3D motifs for classification of protein binding sites</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Nebel</snm>
               <fnm>Jean-Christophe</fnm>
               <insr iid="I1"/>
               <email>J.Nebel@kingston.ac.uk</email>
            </au>
            <au id="A2">
               <snm>Herzyk</snm>
               <fnm>Pawel</fnm>
               <insr iid="I2"/>
               <insr iid="I3"/>
               <email>ph53d@udcf.gla.ac.uk</email>
            </au>
            <au id="A3">
               <snm>Gilbert</snm>
               <mi>R</mi>
               <fnm>David</fnm>
               <insr iid="I2"/>
               <email>drg@dcs.gla.ac.uk</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Faculty of Computing, Information Systems &amp; Mathematics, Kingston University, Kingston-upon-Thames, KT1 2EE, UK</p>
            </ins>
            <ins id="I2">
               <p>Bioinformatics Research Centre, University of Glasgow, Glasgow, G12 8QQ, UK</p>
            </ins>
            <ins id="I3">
               <p>The Sir Henry Wellcome Functional Genomics Facility, Institute of Biomedical and Life Sciences, University of Glasgow, Glasgow, G12 8QQ, UK</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>1</issue>
         <fpage>321</fpage>
         <url>http://www.biomedcentral.com/1471-2105/8/321</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17760982</pubid>
               <pubid idtype="doi">10.1186/1471-2105-8-321</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>16</day>
               <month>2</month>
               <year>2007</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>30</day>
               <month>8</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>30</day>
               <month>8</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Nebel et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Since many of the new protein structures delivered by high-throughput processes do not have any known function, there is a need for structure-based prediction of protein function. Protein 3D structures can be clustered according to their fold or secondary structures to produce classes of some functional significance. A recent alternative has been to detect specific 3D motifs, which are often associated with active sites. Unfortunately, very few 3D motifs are known compared to the number of known sequence motifs, and those that exist are usually the result of a manual process. In this paper, we report a method to automatically generate 3D motifs of binding sites in protein structures based on consensus atom positions, and we evaluate it on a set of adenine-based ligands.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Our new approach was validated by automatically generating 3D patterns for the main adenine-based ligands, i.e. AMP, ADP and ATP. Of the 18 detected patterns, only one, the ADP4 pattern, is not associated with well-defined structural patterns. Moreover, most of the patterns could be classified as binding site 3D motifs. A literature search revealed that the ADP4 pattern actually corresponds to structural features which show complex evolutionary links between ligases and transferases. Therefore, all of the generated patterns turn out to be meaningful. Each pattern was used to query all PDB proteins which bind either purine-based or guanine-based ligands, in order to evaluate the classification and annotation properties of the pattern. Overall, our 3D patterns matched 31% of proteins with adenine-based ligands, and 95.5% of these were classified correctly.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>A new metric has been introduced that allows proteins to be classified according to the similarity of the atomic environments of their binding sites, and a methodology has been developed to automatically produce 3D patterns from that classification. A study of proteins binding adenine-based ligands showed that these 3D patterns are not only biochemically meaningful, but can also be used for protein classification and annotation.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
          <p>Structural genomics projects aim at the high-throughput delivery of protein structures regardless of the state of their functional annotation. Moreover, roughly half of the gene products of completely sequenced genomes of various organisms show no sequence homology to existing proteins of known function. Therefore, structure-based prediction of protein molecular function is essential. Protein 3D structures can be clustered according to their fold into classes which usually have some functional significance, e.g. SCOP <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>, FSSP <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> and CATH <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. More recently, researchers have investigated the detection of functional 3D patterns associated with active sites and/or atom interactions (see the Methods section for an explanation of the terminology used throughout this paper). These patterns may be based on secondary structures, such as the EF-hand domain and zinc fingers, or on sets of 3D positions of atoms involved either in H-bonds <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> or in ligand binding <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>. Moreover, tools have been developed which allow the detection of residue-based 3D patterns within a protein structure <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp> and the comparison of the 3D structure of binding sites with those of other proteins <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. Unfortunately, many patterns do not correspond to any specific function, and the number of known 3D motifs is rather small compared to the number of sequence motifs captured in databases such as PROSITE <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> and InterPro <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. So far, 3D motifs have mainly been the result of manual and experimental processes, e.g. the Catalytic Site Atlas <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. Moreover, patterns are usually constructed on the basis of residues and are represented by their C&#945; atoms, by the atoms interacting with ligands, or by all their atoms.</p>
          <p>Although proteins are composed of amino acids, which are very convenient and useful structural units for the analysis of proteins, chemical interactions ultimately happen at the atomic level. The grouping of residues into physicochemical classes that are not mutually exclusive implicitly acknowledges the limitations of residue-centred approaches <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. We propose a new approach in which 3D motifs are generated automatically and are based only on consensus atom positions, without explicit reference to the residues to which the atoms belong or to the direct interactions they may have with ligands. In this work, the ligand binding sites of proteins are compared by superimposing the corresponding ligands. The similarity between ligand environments is then evaluated by counting the atoms of the same type which occupy equivalent spatial positions. By converting that similarity measure into a normalised metric, a similarity matrix can be generated for a given set of proteins in order to cluster their ligand binding sites. Consensus 3D patterns can then be produced to represent each cluster. Because the clusters can be shown to be associated with specific biochemical functions, protein structures can be compared to these 3D motifs in order to predict their function.</p>
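          <p>The atom-counting similarity described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Each binding site is taken to be a list of hypothetical (element, (x, y, z)) records whose coordinates have already been superimposed through the ligand. Atoms of the same element are matched one-to-one, closest pairs first, when they lie closer than a distance cutoff. The normalisation (matched atoms divided by the size of the smaller site) is a placeholder assumption, because the actual metric is defined in the Methods section. The names shared_atoms, similarity and similarity_matrix are illustrative only.</p>
          <p>
```python
import math


def shared_atoms(site_a, site_b, cutoff=1.25):
    """Count same-element atoms of two superimposed sites closer than cutoff.

    Sites are lists of (element, (x, y, z)) tuples. Each atom is matched at
    most once, accepting the closest candidate pairs first (greedy matching).
    """
    pairs = []
    for i, (elem_a, pos_a) in enumerate(site_a):
        for j, (elem_b, pos_b) in enumerate(site_b):
            if elem_a == elem_b:
                d = math.dist(pos_a, pos_b)
                if cutoff > d:  # strictly closer than the cutoff (angstroms)
                    pairs.append((d, i, j))
    pairs.sort()
    used_a, used_b = set(), set()
    for _, i, j in pairs:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
    return len(used_a)


def similarity(site_a, site_b, cutoff=1.25):
    """Hypothetical normalisation: shared atoms over the smaller site size."""
    n = min(len(site_a), len(site_b))
    return shared_atoms(site_a, site_b, cutoff) / n if n else 0.0


def similarity_matrix(sites, cutoff=1.25):
    """Pairwise similarity matrix for a list of binding sites."""
    return [[similarity(a, b, cutoff) for b in sites] for a in sites]


site1 = [("C", (0.0, 0.0, 0.0)), ("N", (1.5, 0.0, 0.0)), ("O", (3.0, 0.0, 0.0))]
site2 = [("C", (0.3, 0.0, 0.0)), ("N", (1.5, 2.0, 0.0)), ("O", (3.0, 0.5, 0.0))]
print(shared_atoms(site1, site2))          # 2
print(round(similarity(site1, site2), 3))  # 0.667
```
          </p>
          <p>The greedy one-to-one matching is a simplifying choice that prevents one atom from being counted against several neighbours. An optimal assignment would be a reasonable alternative.</p>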
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <sec>
            <st>
               <p>3D pattern generation for adenine phosphate</p>
            </st>
             <p>We evaluated our method by automatically generating 3D patterns for the main adenine-based ligands, i.e. AMP, ADP and ATP. These ligands were selected because they are relatively common, are key to many biochemical reactions and contain rigid groups which make their superimposition meaningful. The patterns produced were subsequently tested against other adenine-based molecules, i.e. ACP and ANP, and against guanine-based ligands, i.e. GMP, GDP and GTP. Figure <figr fid="F1">1</figr> shows the main chemical structures involved in the ligands of interest.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Chemical structures of adenine, ATP and guanine</p>
               </caption>
               <text>
                  <p>Chemical structures of adenine, ATP and guanine.</p>
               </text>
               <graphic file="1471-2105-8-321-1"/>
            </fig>
             <p>For each of the main adenine-based ligands, a binding site similarity matrix was generated using ligand-specific training sets (see Methods section). In this work, two atoms of the same chemical type are considered to share a similar position if the distance between them is less than 1.25 &#197;. This value was chosen because previous work <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> showed it to be a good compromise between accuracy and flexibility. The binding sites were then clustered, and outliers were removed using a similarity threshold <it>S</it><sub><it>T</it></sub> = 0.6 (see Methods section). Consensus 3D patterns were generated for valid clusters, i.e. those containing at least three binding sites. Table <tblr tid="T1">1</tblr> presents statistics for the 18 valid clusters identified for the three main adenine-based ligands, AMP, ADP and ATP, including the number of proteins from which they were generated and their core sizes.</p>
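             <p>The cluster-validity criteria above can be sketched as follows. This is a hypothetical illustration only. The actual clustering and outlier-removal procedure is described in the Methods section. Here we assume a precomputed symmetric similarity matrix sim (a list of lists) and candidate clusters given as lists of indices. We also assume an outlier rule that repeatedly drops the member with the lowest mean similarity to the rest of its cluster while that mean falls below <it>S</it><sub><it>T</it></sub> = 0.6. Clusters that keep at least three binding sites are reported as valid. The names remove_outliers and valid_clusters are illustrative.</p>
             <p>
```python
S_T = 0.6       # similarity threshold quoted in the text
MIN_SITES = 3   # a valid cluster needs at least three binding sites


def remove_outliers(cluster, sim, threshold=S_T):
    """Assumed rule: iteratively drop the least similar member of a cluster."""
    core = list(cluster)
    while len(core) > 1:
        # Mean similarity of each member to the other members of the core.
        means = {
            i: sum(sim[i][j] for j in core if j != i) / (len(core) - 1)
            for i in core
        }
        worst = min(core, key=means.get)
        if means[worst] >= threshold:
            break
        core.remove(worst)
    return core


def valid_clusters(clusters, sim, threshold=S_T, min_sites=MIN_SITES):
    """Keep clusters that still hold min_sites members after outlier removal."""
    result = []
    for cluster in clusters:
        core = remove_outliers(cluster, sim, threshold)
        if len(core) >= min_sites:
            result.append(core)
    return result


sim = [
    [1.0, 0.8, 0.7, 0.1],
    [0.8, 1.0, 0.9, 0.2],
    [0.7, 0.9, 1.0, 0.1],
    [0.1, 0.2, 0.1, 1.0],
]
print(valid_clusters([[0, 1, 2, 3], [3]], sim))  # [[0, 1, 2]]
```
             </p>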
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Initial clusters for each ligand</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="center">
                        <p>
                           <b>Ligand</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Proteins in PDB50</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Valid clusters</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Core cluster sizes</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>AMP</p>
                     </c>
                     <c ca="center">
                        <p>75</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>5-4-3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>ADP</p>
                     </c>
                     <c ca="center">
                        <p>185</p>
                     </c>
                     <c ca="center">
                        <p>9</p>
                     </c>
                     <c ca="center">
                        <p>10-9-8-7-6-5-5-4-4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>ATP</p>
                     </c>
                     <c ca="center">
                        <p>133</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>10-9-4-3-3-3</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
             <p>The 3D patterns corresponding to the above 18 clusters were then compared in an all-against-all manner. Clusters with highly similar 3D patterns were merged into new 3D patterns, reducing the number of clusters from 18 to 13 [see cluster composition in Additional file <supplr sid="S1">1</supplr>]. Merging occurred between AMP and ATP clusters and between ADP and ATP clusters, but not between AMP and ADP clusters. The similarity threshold associated with each pattern was defined as the lowest similarity score between any pair of binding sites belonging to the cluster represented by that pattern. We refer to these values as automatic similarity thresholds.</p>
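             <p>The automatic similarity threshold follows directly from its definition: it is the minimum similarity over all pairs of binding sites in a cluster. The sketch below assumes a hypothetical list-of-lists similarity matrix and index-based clusters. The matches_pattern helper is also an assumption. It accepts a query whose score reaches the threshold and shows how such a threshold might be applied. It is not a rule stated in this section.</p>
             <p>
```python
from itertools import combinations


def automatic_threshold(cluster, sim):
    """Lowest pairwise similarity between the binding sites of a cluster."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        raise ValueError("a cluster needs at least two binding sites")
    # Take both orientations in case the similarity matrix is not symmetric.
    return min(min(sim[i][j], sim[j][i]) for i, j in pairs)


def matches_pattern(score, threshold):
    """Hypothetical query rule: accept a site scoring at least the threshold."""
    return score >= threshold


sim = [
    [1.0, 0.8, 0.7],
    [0.8, 1.0, 0.9],
    [0.7, 0.9, 1.0],
]
t = automatic_threshold([0, 1, 2], sim)
print(t)                         # 0.7
print(matches_pattern(0.75, t))  # True
```
             </p>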
            <suppl id="S1">
               <title>
                  <p>Additional file 1</p>
               </title>
               <text>
                  <p>Cluster composition. File providing the list of PDB proteins defining each cluster.</p>
               </text>
               <file name="1471-2105-8-321-S1.xls">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>From 3D pattern to 3D motif</p>
            </st>
             <p>Table <tblr tid="T2">2</tblr> presents the characteristics of the final 3D patterns in the form of cluster sizes and numbers of consensus atoms, as well as the consensus information collected from PDBsum <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> for all the proteins contributing to the patterns. Pattern sizes vary widely, from 6 to 71 atoms. Since a binding site contains 154 atoms on average, these patterns constitute between 4% and 46% of an average binding site [see 3D patterns in Additional files <supplr sid="S2">2</supplr>, <supplr sid="S3">3</supplr>, <supplr sid="S4">4</supplr>, <supplr sid="S5">5</supplr>, <supplr sid="S6">6</supplr>, <supplr sid="S7">7</supplr>, <supplr sid="S8">8</supplr>, <supplr sid="S9">9</supplr>, <supplr sid="S10">10</supplr>, <supplr sid="S11">11</supplr>, <supplr sid="S12">12</supplr>, <supplr sid="S13">13</supplr>, <supplr sid="S14">14</supplr>].</p>
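             <p>The percentages quoted above are simple ratios of pattern size to the average binding site size of 154 atoms. The short Python check below is a sketch that reproduces them. It gives 3.9% and 46.1%, which round to the 4% and 46% stated in the text. The function name coverage_percent is illustrative.</p>
             <p>
```python
AVERAGE_SITE_ATOMS = 154  # average binding site size quoted in the text


def coverage_percent(pattern_atoms, site_atoms=AVERAGE_SITE_ATOMS):
    """Share of an average binding site covered by a pattern, in percent."""
    return 100 * pattern_atoms / site_atoms


for n in (6, 71):  # smallest and largest pattern sizes quoted in the text
    print(f"{n} atoms: {coverage_percent(n):.1f}%")
```
             </p>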
            <suppl id="S2">
               <title>
                  <p>Additional file 2</p>
               </title>
               <text>
                  <p>ADP0. File containing the coordinates and types of atoms belonging to the 3D motif named ADP0. It also includes atoms belonging to the ligand.</p>
               </text>
               <file name="1471-2105-8-321-S2.pdb">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S3">
               <title>
                  <p>Additional file 3</p>
               </title>
               <text>
                  <p>ADP1. File containing the coordinates and types of atoms belonging to the 3D motif named ADP1. It also includes atoms belonging to the ligand.</p>
               </text>
               <file name="1471-2105-8-321-S3.pdb">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S4">
               <title>
                  <p>Additional file 4</p>
               </title>
               <text>
                  <p>ADP2. File containing the coordinates and types of atoms belonging to the 3D motif named ADP2. It also includes atoms belonging to the ligand.</p>
               </text>
               <file name="1471-2105-8-321-S4.pdb">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S5">
               <title>
                  <p>Additional file 5</p>
               </title>
               <text>
                  <p>ADP3. File containing the coordinates and types of atoms belonging to the 3D motif named ADP3. It also includes atoms belonging to the ligand.</p>
               </text>
               <file name="1471-2105-8-321-S5.pdb">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S6">
               <title>
                  <p>Additional file 6</p>
               </title>
               <text>
                  <p>ADP4. File containing the coordinates and types of atoms belonging to the 3D motif named ADP4. It also includes atoms belonging to the ligand.</p>
               </text>
               <file name="1471-2105-8-321-S6.pdb">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S7">
               <title>
                  <p>Additional file 7</p>
               </title>
               <text>
                  <p>ADP5. File containing the coordinates and types of atoms belonging to the 3D motif named ADP5. It also includes atoms belonging to the ligand.</p>
               </text>
               <file name="1471-2105-8-321-S7.pdb">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S8">
               <title>
                  <p>Additional file 8</p>
               </title>
               <text>
                  <p>ADP6. File containing the coordinates and types of atoms belonging to the 3D motif named ADP6. It also includes atoms belonging to the ligand.</p>
               </text>
               <file name="1471-2105-8-321-S8.pdb">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S9">
               <title>
                  <p>Additional file 9</p>
               </title>
               <text>
                  <p>AMP0. File containing the coordinates and types of atoms belonging to the 3D motif named AMP0. It also includes atoms belonging to the ligand.</p>
               </text>
               <file name="1471-2105-8-321-S9.pdb">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S10">
               <title>
                  <p>Additional file 10</p>
               </title>
               <text>
                  <p>ATP0. File containing the coordinates and types of atoms belonging to the 3D motif named ATP0. It also includes atoms belonging to the ligand.</p>
               </text>
               <file name="1471-2105-8-321-S10.pdb">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S11">
               <title>
                  <p>Additional file 11</p>
               </title>
               <text>
                  <p>AxP0. File containing the coordinates and types of atoms belonging to the 3D motif named AxP0. It also includes atoms belonging to the ligand.</p>
               </text>
               <file name="1471-2105-8-321-S11.pdb">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S12">
               <title>
                  <p>Additional file 12</p>
               </title>
               <text>
                  <p>AxP1. File containing the coordinates and types of atoms belonging to the 3D motif named AxP1. It also includes atoms belonging to the ligand.</p>
               </text>
               <file name="1471-2105-8-321-S12.pdb">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S13">
               <title>
                  <p>Additional file 13</p>
               </title>
               <text>
                  <p>AxP2. File containing the coordinates and types of atoms belonging to the 3D motif named AxP2. It also includes atoms belonging to the ligand.</p>
               </text>
               <file name="1471-2105-8-321-S13.pdb">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S14">
               <title>
                  <p>Additional file 14</p>
               </title>
               <text>
                  <p>AxP3. File containing the coordinates and types of atoms belonging to the 3D motif named AxP3. It also includes atoms belonging to the ligand.</p>
               </text>
               <file name="1471-2105-8-321-S14.pdb">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Consensus information about each pattern</p>
               </caption>
               <tblbdy cols="11">
                  <r>
                     <c ca="center">
                        <p>
                           <b>Pattern</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Rep. PDB</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Ligand</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Cluster size</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Consensus atoms</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Function</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>EC</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>CATH</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>DALI</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>InterPro</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>PROSITE</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="11">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>ADP0</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1vom</p>
                     </c>
                     <c ca="center">
                        <p>ADP</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>11</p>
                     </c>
                     <c ca="left">
                        <p>Myosin</p>
                     </c>
                     <c ca="center">
                        <p>/</p>
                     </c>
                     <c ca="center">
                        <p>1.10.162.10</p>
                        <p>1.10.183.10</p>
                        <p>1.10.465.10</p>
                        <p>3.30.538.10</p>
                     </c>
                     <c ca="center">
                        <p>1237</p>
                     </c>
                     <c ca="center">
                        <p>001093</p>
                        <p>001609</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>ADP1</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1q0b</p>
                     </c>
                     <c ca="center">
                        <p>ADP</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="left">
                        <p>Kinesin</p>
                     </c>
                     <c ca="center">
                        <p>/</p>
                     </c>
                     <c ca="center">
                        <p>3.40.850.10</p>
                     </c>
                     <c ca="center">
                        <p>1236</p>
                     </c>
                     <c ca="center">
                        <p>001752</p>
                     </c>
                     <c ca="center">
                        <p>50067</p>
                        <p>00411</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>ADP2</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1b62</p>
                     </c>
                     <c ca="center">
                        <p>ADP</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>3.30.565.10</p>
                     </c>
                     <c ca="center">
                        <p>(846, 847, 848)</p>
                     </c>
                     <c ca="center">
                        <p>003594</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>ADP3</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1qf9</p>
                     </c>
                     <c ca="center">
                        <p>ADP</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>23</p>
                     </c>
                     <c ca="left">
                        <p>Transferase</p>
                     </c>
                     <c ca="center">
                        <p>2.7. (2.7.1., 2.7.4.)</p>
                     </c>
                     <c ca="center">
                        <p>3.40.50.300</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>ADP4</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1ehi</p>
                     </c>
                     <c ca="center">
                        <p>ADP</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>18</p>
                     </c>
                     <c ca="left">
                        <p>Ligase (60%) Transferase (40%)</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>3.30.</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>ADP5</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1njf</p>
                     </c>
                     <c ca="center">
                        <p>ADP</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>17</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>
                           <it>1.10.8.60</it>
                        </p>
                        <p>
                           <it>3.40.50.300</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1062</p>
                     </c>
                     <c ca="center">
                        <p>003593</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>ADP6</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1oxu</p>
                     </c>
                     <c ca="center">
                        <p>ADP</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="left">
                        <p>ABC transporter</p>
                     </c>
                     <c ca="center">
                        <p>/</p>
                     </c>
                     <c ca="center">
                        <p>3.40.50.300</p>
                     </c>
                     <c ca="center">
                        <p>1069</p>
                        <p>1071</p>
                     </c>
                     <c ca="center">
                        <p>003593</p>
                        <p>003439</p>
                     </c>
                     <c ca="center">
                        <p>00211</p>
                        <p>50893</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>AMP0</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1v26</p>
                     </c>
                     <c ca="center">
                        <p>AMP</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>27</p>
                     </c>
                     <c ca="left">
                        <p>Ligase</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>
                           <it>2.30.38.10</it>
                        </p>
                        <p>
                           <it>3.30.300.30</it>
                        </p>
                        <p>
                           <it>3.40.50.980</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>736</p>
                        <p>992</p>
                     </c>
                     <c ca="center">
                        <p>000873</p>
                     </c>
                     <c ca="center">
                        <p>00455</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>ATP0</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1y8q</p>
                     </c>
                     <c ca="center">
                        <p>ATP</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>71</p>
                     </c>
                     <c ca="left">
                        <p>Ligase</p>
                     </c>
                     <c ca="center">
                        <p>ND</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>3.40.50.70</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>851</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>000205</p>
                        <p>000594</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>AxP0</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>2a40</p>
                     </c>
                     <c ca="center">
                        <p>ADP</p>
                        <p>ATP</p>
                     </c>
                     <c ca="center">
                        <p>13</p>
                     </c>
                     <c ca="center">
                        <p>42</p>
                     </c>
                     <c ca="left">
                        <p>Actin (85%) Heat shock (15%)</p>
                     </c>
                     <c ca="center">
                        <p>/</p>
                     </c>
                     <c ca="center">
                        <p>3.30.420.40</p>
                        <p>3.90.640.10</p>
                     </c>
                     <c ca="center">
                        <p>1175</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>AxP1</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1j1c</p>
                     </c>
                     <c ca="center">
                        <p>ADP</p>
                        <p>ATP</p>
                     </c>
                     <c ca="center">
                        <p>20</p>
                     </c>
                     <c ca="center">
                        <p>45</p>
                     </c>
                     <c ca="left">
                        <p>Kinase</p>
                     </c>
                     <c ca="center">
                        <p>2.7.1.</p>
                     </c>
                     <c ca="center">
                        <p>1.10.510.10</p>
                        <p>3.30.200.20</p>
                     </c>
                     <c ca="center">
                        <p>593</p>
                     </c>
                     <c ca="center">
                        <p>000719</p>
                     </c>
                     <c ca="center">
                        <p>50011</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>AxP2</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1ses</p>
                     </c>
                     <c ca="center">
                        <p>AMP</p>
                        <p>ATP</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>51</p>
                     </c>
                     <c ca="left">
                        <p>Ligase</p>
                     </c>
                     <c ca="center">
                        <p>6.1.1.</p>
                     </c>
                     <c ca="center">
                        <p>3.30.930.10</p>
                     </c>
                     <c ca="center">
                        <p>1116</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>50862</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>AxP3</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1o97</p>
                     </c>
                     <c ca="center">
                        <p>AMP</p>
                        <p>ATP</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>42</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>3.40.50.620</p>
                     </c>
                     <c ca="center">
                        <p>(973 974 975)</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>This table lists the name of the pattern, its PDB representative (Rep.), the names of the ligands the pattern was created from, the size of the cluster, the number of atoms in the pattern, the annotated function, the EC (Enzyme Commission) number if relevant [35], the protein fold classifications according to CATH [3] and DALI [2], and the sequence motifs detected according to InterPro (IPR) [11] and PROSITE (PS) [10]. When no consensus value could be found but close alternatives were available, the values are shown in brackets. When data were not available through PDBsum, they were inferred where possible from homologues, and such values are shown in italics. When too little data were available to generate a meaningful consensus value, the code 'ND' is used.</p>
               </tblfn>
            </tbl>
            <p>Table <tblr tid="T2">2</tblr> shows that all 3D patterns except ADP4 are associated with well-defined structural patterns, as described by their CATH and DALI identifiers. This observation supports our similarity metric, which is based on the number of common atoms in the neighbourhood of a ligand, as a meaningful metric for protein structure comparison. Since 7 of the 13 3D patterns (ADP0, ADP1, ADP6, AMP0, ATP0, AxP1, AxP2) also combine a consensus function with sequence motifs, we can classify them as binding site 3D motifs. Furthermore, although neither ADP2 nor ADP5 is associated with a specific function, each is related to a sequence motif. They must therefore be linked to sub-functions performed through ADP binding, and consequently they can also be classified as 3D motifs. Finally, ADP3 can also be considered a 3D motif, since it has a clear function, reflected by its EC number, and belongs to a single CATH homologous superfamily. DALI classifies the transferases of ADP3 into two very different folds, which suggests that the local similarity of their binding sites is lost in DALI's global structure comparison.</p>
            <p>The 3 remaining 3D patterns (ADP4, AxP0, AxP3) require further analysis to decide whether they can be classified as 3D motifs. We therefore checked whether any of the 13 patterns (Table <tblr tid="T2">2</tblr>) correspond to structural templates generated from the annotated catalytic sites of CSA <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> (Table <tblr tid="T3">3</tblr>). Submitting the representatives to the CSA web server returned only three hits, each classified as either probable or highly probable. Since two of them did not target the adenine binding site, only one was relevant to this study. This CSA template matched the binding site of 1ses and corresponds very closely to the AxP2 pattern: 3 of the 5 residues in the CSA template are represented in that pattern.</p>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>3D patterns detected by other systems</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="center">
                        <p>
                           <b>Pattern</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Rep.</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>CSA 3D Template</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>SuMo similarity</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>PINTS similarity</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>ADP0</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1vom</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>
                           <b>YES</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>YES</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>ADP1</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1q0b</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>
                           <b>PROBABLE</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>ADP6</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1oxu</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>
                           <b>YES</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>ATP0</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1y8q</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>
                           <b>YES</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>AxP0</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>2a40</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>
                           <b>PROBABLE</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <b>AxP2</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1ses</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>YES</b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>This table lists the patterns that were identified by other systems able to detect 3D motifs (CSA, SuMo and PINTS).</p>
               </tblfn>
            </tbl>
            <p>Since CSA recognised only 1 of our 13 patterns, we investigated whether the others could be detected by other methods. The adenine binding sites of the representative proteins were submitted to the SuMo server for 3D searches of functional sites <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> and to the PINTS server to find 3D local structural patterns <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, which compared them to all ligand binding sites in the PDB <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. SuMo detected strong similarities only between the binding sites of the core proteins associated with the ADP0 pattern: 5 proteins were ranked very high and only 1 protein was not matched by the site. With PINTS, all core proteins of ADP0 and ATP0 ranked very highly against their cluster representative. A common pattern was also detected for ADP6, where only 1 protein out of 5 was not highly ranked. In addition, some evidence of binding site similarity was found for ADP1 and AxP0, since 4 out of 7 proteins for ADP1 and 7 out of 13 (all actins) for AxP0 were shown to share similar sites. Results from these three servers support the conclusion that our metric can generate meaningful patterns, most of which are 3D motifs. Moreover, the fact that only 6 of our 13 patterns could potentially have been detected by these systems suggests that our metric captures new and important binding site features.</p>
         </sec>
         <sec>
            <st>
               <p>Structural alignment and complex evolutionary links</p>
            </st>
            <p>Unlike all the other patterns, ADP4 does not correspond to a class with a common function, structure or sequence pattern. Analysis of the five proteins that make up its core shows that three are ligases (1ehi, 1e4e &amp; 1gsa) and two are transferases (1kjq &amp; 1iah). Structurally, however, the ATP-binding domains of 1ehi, 1e4e, 1gsa and 1kjq are classified by SCOP <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> in the ATP-grasp fold and the glutathione synthetase ATP-binding domain-like superfamily, and they share other structure and sequence patterns such as:</p>
            <p>CATH 3.30.1490.20, FSSP 1053, 1055 &amp; 1058, IPR011761 and PS50975</p>
            <p>The 1iah structure, however, is classified by SCOP as an MHCK/EF2 atypical kinase belonging to the protein kinase-like superfamily and fold. Its structure and sequence patterns, namely CATH 3.30.200.20, FSSP 601 &amp; 2482, IPR002111, IPR005821 and IPR004166, do not match those of the other ADP4 proteins.</p>
            <p>The VAST structure similarity server <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> does not detect any structural neighbour of 1iah within the cluster. Nor does a multiple sequence alignment of these five proteins with ClustalW <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, MUSCLE <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> or T-Coffee <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> reveal any sequence pattern shared by the cluster. However, their structural alignment constructed with our system highlights the pattern [EQ]-X-[ACVY] [MLV], which is complemented by two non-polar residues and a conserved lysine, [FI] [VLI] [K], located 36&#8211;72 residues upstream. Comparisons using either the MSDfold <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> or the CE_MC <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> structure similarity search engine do not detect this pattern.</p>
            <p>Using these constraints, the sequences of these structures can be aligned manually with Jalview <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>; the pattern is highlighted in Figure <figr fid="F2">2</figr>. This alignment reveals the remote similarity between 1iah and the other proteins of the ADP4 pattern. Figure <figr fid="F3">3</figr> shows the 3D pattern, and Figure <figr fid="F4">4</figr> shows the superimposition of the residues whose atoms belong to the pattern.</p>
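            <p>To make the sequence constraints of the ADP4 pattern concrete, the sketch below renders them as a Python regular expression and scans a one-letter protein sequence for matches. It is an illustration only and is not part of the system described here. The names ADP4_REGEX and find_adp4_pattern are hypothetical, and the sketch assumes that [FI] [VLI] [K] lies N-terminal to [EQ]-X-[ACVY] [MLV], with a gap of 36 to 72 residues between the conserved lysine and the [EQ] position. A sequence scan of this kind checks only the sequence constraints. In this work the pattern was identified from the structural alignment, not from sequence alone.</p>
            <p>
```python
import re

# Hypothetical regular-expression rendering of the ADP4 sequence pattern.
# Assumptions (illustrative, not taken from the original system):
#   * the conserved [FI][VLI]K segment lies N-terminal to [EQ]-X-[ACVY]-[MLV];
#   * "36-72 residues upstream" is read as a gap of 36 to 72 residues
#     between the conserved Lys and the [EQ] position.
ADP4_REGEX = re.compile(r"[FI][VLI]K.{36,72}[EQ].[ACVY][MLV]")


def find_adp4_pattern(sequence):
    """Return (start, end) spans of ADP4-like matches in a one-letter sequence.

    The gap quantifier is greedy, so if several [EQ]-X-[ACVY]-[MLV]
    segments fall within the allowed gap, the most C-terminal one is used.
    """
    return [m.span() for m in ADP4_REGEX.finditer(sequence.upper())]


if __name__ == "__main__":
    # Synthetic toy sequence, not a real protein: FIK, a 40-residue linker, EGAM.
    toy = "MS" + "FIK" + "A" * 40 + "EGAM" + "GG"
    print(find_adp4_pattern(toy))  # prints [(2, 49)]
```
            </p>
            <p>On the synthetic toy sequence, the match starts at the phenylalanine of FIK and ends after the methionine of EGAM. A gap shorter than 36 or longer than 72 residues yields no match.</p>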
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Alignment of the sequences using structural constraints</p>
               </caption>
               <text>
                  <p>Alignment of the sequences using structural constraints.</p>
               </text>
               <graphic file="1471-2105-8-321-2"/>
            </fig>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Common pattern associated with the ADP4 pattern</p>
               </caption>
               <text>
                  <p><b>Common pattern associated with the ADP4 pattern</b>. Superimposition of atoms from 1ehi, 1e4e, 1gsa, 1kjq &amp; 1iah used to generate the pattern. Wireframe shows the consensus atoms belonging to the adenine-based ligand. a) CPK colour scheme is used. b) Amino acid/Shapely colour scheme is used.</p>
               </text>
               <graphic file="1471-2105-8-321-3"/>
            </fig>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Superimposition of residues associated with the ADP4 pattern</p>
               </caption>
               <text>
                  <p><b>Superimposition of residues associated with the ADP4 pattern</b>. Superimposition of residues (wireframe representation and Amino acid/Shapely colour scheme) that have atoms belonging to the pattern. a) [FI] [VLI] [K] part of the pattern ([VLI] is not part of the structural pattern). b) [EQ]-X-[ACVY] [MLV] part of the pattern.</p>
               </text>
               <graphic file="1471-2105-8-321-4"/>
            </fig>
            <p>Given that the 1iah structure has a protein kinase-like fold, it is intriguing that our method clustered the 1iah ATP-binding site with structures from the ATP-grasp fold rather than with the other protein kinases clustered in AxP1 (Table <tblr tid="T2">2</tblr>). In all ADP4 and AxP1 proteins, ATP binds in the cleft between the N- and C-terminal lobes. A detailed structural analysis <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> revealed that 1iah has a chimeric structure in which the N-terminal domain is very similar to domains of classical protein kinases, whilst the C-terminal domain is similar to domains of the ATP-grasp fold. In that respect, the 1iah structure links the protein kinase and ATP-grasp folds, which explains why its ATP-binding site was clustered with those belonging to the ATP-grasp fold. Interestingly, the remote similarity between classical protein kinase folds and the ATP-grasp fold had been noted previously and explained by either convergent <abbrgrp><abbr bid="B24">24</abbr></abbrgrp> or divergent <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> evolution. We therefore believe that the detected pattern is meaningful.</p>
         </sec>
         <sec>
            <st>
               <p>Towards protein classification, function annotation &amp; putative site discovery</p>
            </st>
            <p>Possible applications of 3D patterns and motifs include protein classification, functional annotation and the discovery of putative binding sites. To evaluate the classification and annotation power of the patterns generated by our method, each of them was used to query all PDB proteins binding purine-based ligands. The first targets were all the AMP, ADP and ATP binding proteins. We then added proteins binding two closely related ligands, ANP and ACP. Finally, we looked at a rather different family of ligands: the three guanine-based ligands GMP, GDP and GTP. The total number of PDB entries per ligand and the search results are shown in Table <tblr tid="T4">4</tblr>.</p>
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>Total number of PDB entries per ligand and matches against the generated 3D patterns</p>
               </caption>
               <tblbdy cols="9">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>
                           <b>AMP</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>ADP</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>ATP</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>ANP</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>ACP</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>GMP</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>GDP</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>GTP</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="9">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PDB entries containing ligand</p>
                     </c>
                     <c ca="center">
                        <p>128</p>
                     </c>
                     <c ca="center">
                        <p>406</p>
                     </c>
                     <c ca="center">
                        <p>234</p>
                     </c>
                     <c ca="center">
                        <p>125</p>
                     </c>
                     <c ca="center">
                        <p>29</p>
                     </c>
                     <c ca="center">
                        <p>27</p>
                     </c>
                     <c ca="center">
                        <p>199</p>
                     </c>
                     <c ca="center">
                        <p>77</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Hits against the 13 patterns</p>
                     </c>
                     <c ca="center">
                        <p>14.8%</p>
                     </c>
                     <c ca="center">
                        <p>30.5%</p>
                     </c>
                     <c ca="center">
                        <p>35.5%</p>
                     </c>
                     <c ca="center">
                        <p>36.8%</p>
                     </c>
                     <c ca="center">
                        <p>44.8%</p>
                     </c>
                     <c ca="center">
                        <p>0.0%</p>
                     </c>
                     <c ca="center">
                        <p>0.5%</p>
                     </c>
                     <c ca="center">
                        <p>1.3%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PDB50 entries in valid clusters</p>
                     </c>
                     <c ca="center">
                        <p>16.0%</p>
                     </c>
                     <c ca="center">
                        <p>30.8%</p>
                     </c>
                     <c ca="center">
                        <p>23.3%</p>
                     </c>
                     <c ca="center">
                        <p>/</p>
                     </c>
                     <c ca="center">
                        <p>/</p>
                     </c>
                     <c ca="center">
                        <p>/</p>
                     </c>
                     <c ca="center">
                        <p>/</p>
                     </c>
                     <c ca="center">
                        <p>/</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>During the search procedure, proteins whose similarity was higher than the automatic similarity threshold were classified as positive hits (Table <tblr tid="T4">4</tblr>). For each of these proteins, its annotations (functions, EC numbers, fold classifications and sequence motifs) were compared manually with those of the pattern and/or of the proteins used to define it. If there was an exact match, the protein was labelled a true positive; otherwise it was labelled a false positive. Since this validation required human expertise, only the positive hits were analysed.</p>
            <p>Figure <figr fid="F5">5</figr> presents the results of the evaluation of the classification power of the 3D patterns. First, the size and composition of the training set (TS) of each pattern are given, where composition is defined by the types of ligand bound by its core proteins. Second, the number and composition of true positives (TP) are shown. Finally, false positives (FP), if any, are described.</p>
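            <p>The labelling rule described above can be sketched in Python as follows. The sketch is hypothetical, because in this work the annotation comparison was performed manually. The record type Site, the functions label_hits and false_positive_rate, and the rule that a hit counts as a true positive only when every annotation category defined for the pattern (for example EC number or CATH code) matches exactly are all illustrative assumptions. The false positive rate is computed here as the fraction of positive hits that are false, which is the only rate available when negatives are not analysed. That this is how the reported rate was computed is also an assumption.</p>
            <p>
```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical record for one binding site returned by a pattern search."""
    site_id: str
    similarity: float
    annotations: dict = field(default_factory=dict)


def label_hits(sites, threshold, pattern_annotations):
    """Split the positive hits for one pattern into true and false positives.

    Assumed rule (illustrative only; the original comparison was manual):
    a site is a positive hit if its similarity exceeds the threshold, and a
    positive hit is a true positive if every annotation category defined
    for the pattern has exactly the same value for the site.
    """
    true_pos, false_pos = [], []
    for site in sites:
        if site.similarity > threshold:  # positive hit; negatives are ignored
            matches = all(
                site.annotations.get(key) == value
                for key, value in pattern_annotations.items()
            )
            if matches:
                true_pos.append(site.site_id)
            else:
                false_pos.append(site.site_id)
    return true_pos, false_pos


def false_positive_rate(true_pos, false_pos):
    """Fraction of positive hits that are false positives (0.0 if no hits)."""
    n_hits = len(true_pos) + len(false_pos)
    return len(false_pos) / n_hits if n_hits else 0.0
```
            </p>
            <p>For example, with a toy pattern annotated by EC 2.7.1. and CATH 1.10.510.10, a site above the threshold sharing both annotations is a true positive. A site above the threshold with different annotations is a false positive, and a site below the threshold is not examined at all.</p>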
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Composition of Training Sets (TS), True Positives (TP) &amp; False Positives (FP) against all PDB proteins binding either adenine based or guanine based ligands</p>
               </caption>
               <text>
                  <p><b>Composition of Training Sets (TS), True Positives (TP) &amp; False Positives (FP) against all PDB proteins binding either adenine based or guanine based ligands</b>. The x-axis gives the number of binding sites present in each set. Their type, i.e. AMP, ADP, ATP, ANP, ACP, GMP, GDP or GTP binding, is represented by different colours (see legend).</p>
               </text>
               <graphic file="1471-2105-8-321-5"/>
            </fig>
            <p>Approximately 33% of ADP and ATP proteins and 15% of AMP proteins matched the 3D patterns, see Table <tblr tid="T4">4</tblr>. These rates correspond to the percentages of PDB50 binding sites that were originally present in valid clusters when the patterns were generated, see Table <tblr tid="T4">4</tblr>. The rate for ATP proteins is, however, higher than expected because their binding sites often match patterns generated from ADP clusters, see Figure <figr fid="F5">5</figr>. Since the physicochemical properties of ANP and ACP are very similar to those of ADP, proteins binding these ligands were very well classified by motifs generated independently of their binding sites. The high percentage of ACP binding sites matching the patterns is not significant because the ACP sample is particularly small (29 entries). As a whole, our 3D patterns matched 31% of proteins with adenine based ligands. Since guanine and adenine molecules are chemically very different, proteins with guanine based ligands were not expected to be matched by our adenine patterns. Indeed, only two hits were produced from more than 300 of these proteins. Finally, the false positive rate is very low (4.5%).</p>
            <p>These results were obtained using automatically generated similarity thresholds, which proved quite conservative. When these thresholds were instead set manually to maximise the number of proteins matched by each pattern, the proportion of adenine proteins matched increased to 38% while the false positive rate remained low (6.5%).</p>
            <p>Although some patterns, such as AMP0, AxP2, ADP6 and ATP0, mainly detect the very small number of proteins used to produce them, others, such as ADP2, ADP3, AxP0 and AxP1, match a large number of proteins with few false positives. These 3D patterns therefore show good potential for protein annotation.</p>
            <p>Although DALI is a better function predictor on this dataset (100% success rate), it relies on a very dense sampling of the known protein structure space. Indeed, annotating the 281 protein structures containing adenine based ligands required 114 different structure representatives, a number corresponding to 40.5% of the structures to be annotated. Since our 13 3D motifs contain a combined total of only 368 atom positions, there is a roughly 100-fold difference in the amount of stored data needed to produce these predictions. This supports our claim that our motifs capture the atom positions which are key to binding site activity.</p>
            <p>Since our method can be applied to any ligand containing a rigid structure, many other ligand families could be studied. For example, 3D motifs could be generated for haem and chlorophyll groups, monosaccharides (e.g. glucose, mannose, fructose, galactose and NAG) and other common rigid ligands such as PLP, FMN, PCA and MES, since they are present in many PDB entries.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>We have presented a new metric that permits the classification of proteins according to the atomic environment of their binding sites. The only constraint is that ligands should contain some rigid structure so that they can be superimposed. By studying proteins binding adenine based ligands, we have demonstrated that the cores of the generated clusters are biochemically meaningful and can be used to generate useful 3D motifs. We have shown that these motifs are effective for protein classification and annotation since their false positive rate is low. In addition, they are quite ligand specific. Since, in the case of ADP4, our method rediscovered a pattern revealing complex evolutionary links between two classes of proteins, we believe our technique could also be used to detect cases of convergent evolution.</p>
         <p>In future work we plan to develop a software tool which would permit fast and efficient parsing of 3D protein structures to detect putative binding sites corresponding to our atom based 3D motifs.</p>
      </sec>
      <sec>
         <st>
            <p>Method</p>
         </st>
         <sec>
            <st>
               <p>Terminology used in this work</p>
            </st>
            <p>Atom &#8211; in this work we only consider non-hydrogen atoms. Their coordinates are retrieved from PDB files.</p>
            <p>Ligand &#8211; molecule interacting with a protein either non-covalently or covalently (as a prosthetic group). In this work we only consider ligands containing rigid 3D structures such as aromatic rings.</p>
            <p>Ligand binding site &#8211; subset of protein atoms situated within 5.0 &#197; of at least one ligand atom. Since proteins often contain several binding sites for the same type of ligand, only one binding site per PDB entry is considered, in order not to introduce any bias during the binding site clustering process.</p>
            <p>3D pattern &#8211; consensus atom positions generated from superimposition of ligand binding sites.</p>
            <p>3D motif &#8211; 3D pattern associated with a biochemical function. In this work, biochemical function is defined by either a consensus functional annotation or a consensus sequence motif.</p>
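            <p>The atom and binding site definitions above can be illustrated with a short Python sketch that reads fixed-column PDB ATOM/HETATM records, discards hydrogens and keeps the protein atoms within 5.0 &#197; of one ligand copy. The function names are hypothetical; the sketch assumes the element symbol is present in columns 77-78, treats ATOM records as protein atoms, and identifies a single ligand copy by residue name, chain and residue number so that only one binding site per entry is extracted.</p>
```python
import math


def parse_heavy_atoms(pdb_lines):
    """Return non-hydrogen ATOM/HETATM records from PDB-format lines.

    Assumes the element symbol is present in columns 77-78.
    """
    atoms = []
    for line in pdb_lines:
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        element = line[76:78].strip().upper()
        if element in ("H", "D"):  # skip hydrogen and deuterium atoms
            continue
        atoms.append({
            "record": record,
            "name": line[12:16].strip(),
            "resname": line[17:20].strip(),
            "chain": line[21],
            "resseq": int(line[22:26]),
            "element": element,
            "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        })
    return atoms


def binding_site(atoms, ligand_key, cutoff=5.0):
    """Protein atoms within `cutoff` angstroms of at least one atom of one ligand copy.

    ligand_key: (resname, chain, resseq) identifying a single ligand instance.
    """
    ligand = [a for a in atoms
              if a["record"] == "HETATM"
              and (a["resname"], a["chain"], a["resseq"]) == ligand_key]
    protein = [a for a in atoms if a["record"] == "ATOM"]
    return [p for p in protein
            if any(cutoff >= math.dist(p["xyz"], q["xyz"]) for q in ligand)]
```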
         </sec>
         <sec>
            <st>
               <p>Outline of the methodology</p>
            </st>
            <p>Our method comprises the following steps:</p>
            <p>&#8226; generation of ligand-specific training sets of binding sites,</p>
            <p>&#8226; comparison of the binding sites of the training set proteins,</p>
            <p>&#8226; clustering of these proteins according to the results of their binding site comparison,</p>
            <p>&#8226; generation of patterns representing each of the clusters.</p>
            <p>Figure <figr fid="F6">6</figr> provides a detailed description of the methodology.</p>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Outline of the methodology used for the generation of 3D patterns and protein annotation</p>
               </caption>
               <text>
                  <p>Outline of the methodology used for the generation of 3D patterns and protein annotation.</p>
               </text>
               <graphic file="1471-2105-8-321-6"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Protein structure data sets</p>
            </st>
            <p>For the purpose of 3D pattern generation, we used PDB50%, a set of PDB protein structures trimmed so that no pair of proteins has sequence identity higher than 50% <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. PDB50% was chosen because it offers a good compromise between providing a sufficient number of PDB entries for each ligand of interest and preventing the dataset from being biased by close homologs. For each of the main adenine based ligands, namely AMP, ADP and ATP, a training set was constructed, comprising the ligand binding sites extracted from the subset of PDB50% entries that bind that ligand. The sizes of the training sets, generated on 6<sup>th </sup>December 2005, are presented in Table <tblr tid="T1">1</tblr>.</p>
            <p>For the purpose of classification evaluation we used a set of protein structures that bind purine based ligands, namely AMP, ADP, ATP, ANP, ACP, GMP, GDP and GTP, selected from the entire PDB set. The numbers of PDB entries for each of these ligands are presented in Table <tblr tid="T4">4</tblr>.</p>
         </sec>
         <sec>
            <st>
               <p>Binding site comparison</p>
            </st>
            <p>Binding sites of the training set were compared with one another pairwise. Binding sites were superimposed by applying the rigid transformation that best aligns the atoms of the rigid part of their ligands. An algorithm developed by Horn <abbrgrp><abbr bid="B26">26</abbr></abbrgrp> was implemented to determine the translation and rotation that align atoms in one coordinate system with the corresponding atoms in another, while minimising the RMSD between the two fitted sets of atoms. Subsequently, for each pairwise comparison, atoms that could not be paired within a given distance threshold with an atom of the same chemical type (carbon, oxygen, nitrogen or sulphur) were discarded; the remaining atoms form the pairwise consensus binding site.</p>
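            <p>A minimal Python/NumPy sketch of the two operations described above follows: Horn's closed-form quaternion solution for the rotation and translation that superimpose corresponding ligand atoms with minimal RMSD, then the pairing of same-element binding site atoms. The function names and the pairing strategy (greedy, closest candidate pairs first, one-to-one) are our assumptions; the paper only states that atoms are paired within a given threshold, so the threshold is left as a required parameter.</p>
```python
import numpy as np


def horn_superposition(moving, target):
    """Rotation R and translation t minimising the RMSD between R @ m + t and target.

    moving, target: (N, 3) arrays of corresponding points (e.g. rigid ligand
    ring atoms). Closed-form unit-quaternion solution after Horn (1987).
    """
    moving = np.asarray(moving, dtype=float)
    target = np.asarray(target, dtype=float)
    mc, tc = moving.mean(axis=0), target.mean(axis=0)
    # Cross-covariance: S[i, j] = sum over atoms of moving_i * target_j.
    S = (moving - mc).T @ (target - tc)
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    N = np.array([
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ])
    # The optimal quaternion is the eigenvector of N's largest eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(N)
    q0, qx, qy, qz = eigvecs[:, np.argmax(eigvals)]
    R = np.array([
        [q0*q0 + qx*qx - qy*qy - qz*qz, 2*(qx*qy - q0*qz), 2*(qx*qz + q0*qy)],
        [2*(qy*qx + q0*qz), q0*q0 - qx*qx + qy*qy - qz*qz, 2*(qy*qz - q0*qx)],
        [2*(qz*qx - q0*qy), 2*(qz*qy + q0*qx), q0*q0 - qx*qx - qy*qy + qz*qz],
    ])
    t = tc - R @ mc
    return R, t


def pair_atoms(site_a, site_b, threshold):
    """One-to-one pairs (i, j) of same-element atoms within `threshold`.

    site_a, site_b: lists of (element, (x, y, z)) in a common frame.
    Greedy matching, closest candidate pairs first; atoms left unpaired
    are the ones discarded from the consensus site.
    """
    candidates = []
    for i, (ea, xa) in enumerate(site_a):
        for j, (eb, xb) in enumerate(site_b):
            d = float(np.linalg.norm(np.subtract(xa, xb)))
            if ea == eb and threshold >= d:
                candidates.append((d, i, j))
    used_a, used_b, pairs = set(), set(), []
    for _, i, j in sorted(candidates):
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            pairs.append((i, j))
    return pairs
```
            <p>Applying R and t to all binding site atoms of the first protein places both sites in the frame of the second ligand before pairing.</p>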
            <p>Consensus sites were then used to estimate the similarity between the protein binding sites. The similarity of two sites A and B, S<sub>AB</sub>, was calculated using Kulczynski's metric:</p>
            <p>
               <display-formula>
                  <m:math name="1471-2105-8-321-i1" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>S</m:mi>
                              <m:mrow>
                                 <m:mi>A</m:mi>
                                 <m:mi>B</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:mo>=</m:mo>
                           <m:mn>0.5</m:mn>
                           <m:mrow>
                              <m:mo>(</m:mo>
                              <m:mrow>
                                 <m:mfrac>
                                    <m:mrow>
                                       <m:msub>
                                          <m:mi>C</m:mi>
                                          <m:mrow>
                                             <m:mi>A</m:mi>
                                             <m:mi>B</m:mi>
                                          </m:mrow>
                                       </m:msub>
                                    </m:mrow>
                                    <m:mrow>
                                       <m:msub>
                                          <m:mi>T</m:mi>
                                          <m:mi>A</m:mi>
                                       </m:msub>
                                    </m:mrow>
                                 </m:mfrac>
                                 <m:mo>+</m:mo>
                                 <m:mfrac>
                                    <m:mrow>
                                       <m:msub>
                                          <m:mi>C</m:mi>
                                          <m:mrow>
                                             <m:mi>A</m:mi>
                                             <m:mi>B</m:mi>
                                          </m:mrow>
                                       </m:msub>
                                    </m:mrow>
                                    <m:mrow>
                                       <m:msub>
                                          <m:mi>T</m:mi>
                                          <m:mi>B</m:mi>
                                       </m:msub>
                                    </m:mrow>
                                 </m:mfrac>
                              </m:mrow>
                              <m:mo>)</m:mo>
                           </m:mrow>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGtbWudaWgaaWcbaGaemyqaeKaemOqaieabeaakiabg2da9iabicdaWiabc6caUiabiwda1maabmaabaWaaSaaaeaacqWGdbWqdaWgaaWcbaGaemyqaeKaemOqaieabeaaaOqaaiabdsfaunaaBaaaleaacqWGbbqqaeqaaaaakiabgUcaRmaalaaabaGaem4qam0aaSbaaSqaaiabdgeabjabdkeacbqabaaakeaacqWGubavdaWgaaWcbaGaemOqaieabeaaaaaakiaawIcacaGLPaaaaaa@4224@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where C<sub>AB </sub>is the number of atoms within the pairwise consensus ligand binding site and T<sub>A </sub>(resp. T<sub>B</sub>) is the number of atoms in protein A (resp. B) within its ligand binding site.</p>
            <p>The resulting similarity matrix S, whose entries are the pairwise similarities S<sub>AB</sub>, was then used to cluster the set of proteins.</p>
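            <p>The similarity computation can be sketched in a few lines of Python. The sketch assumes that the counts T and C have already been obtained from the binding site comparison; the function names and the example counts are illustrative only. Diagonal entries are set to 1 because a site compared with itself pairs all of its atoms (C<sub>AA</sub> = T<sub>A</sub>).</p>
```python
def kulczynski(c_ab, t_a, t_b):
    """Kulczynski similarity S_AB = 0.5 * (C_AB / T_A + C_AB / T_B)."""
    return 0.5 * (c_ab / t_a + c_ab / t_b)


def similarity_matrix(site_sizes, consensus_sizes):
    """Symmetric matrix S from site sizes T and pairwise consensus sizes C.

    site_sizes: list with T_A for each binding site A.
    consensus_sizes: dict mapping an index pair (a, b) to C_AB.
    Pairs missing from the dict are given similarity 0.
    """
    n = len(site_sizes)
    # A site compared with itself pairs all its atoms, so S_AA = 1.
    S = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for (a, b), c_ab in consensus_sizes.items():
        S[a][b] = S[b][a] = kulczynski(c_ab, site_sizes[a], site_sizes[b])
    return S


print(kulczynski(30, 40, 60))  # 0.625
```
            <p>For example, two sites with 40 and 60 atoms sharing 30 consensus atoms give S = 0.5(30/40 + 30/60) = 0.625.</p>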
         </sec>
         <sec>
            <st>
               <p>Clustering of ligand binding sites</p>
            </st>
            <p>The clustering process has two stages: first, binding sites are clustered; secondly, the generated clusters are pruned to remove outliers. The first stage uses CLUTO <abbrgrp><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp>, a state-of-the-art, general-purpose, graph-partitioning based clustering algorithm, which has already been used in a variety of bioinformatics applications <abbrgrp><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr></abbrgrp>. Since CLUTO is a partitioning algorithm, its result depends on the number of partitions (k), and the final clusters may contain outliers. However, these potential limitations do not significantly affect our system. Since the aim of the clustering process is to generate tight protein clusters for subsequent 3D pattern generation, not all proteins need to belong to a cluster. Therefore, clusters are post-processed so that only the cores of the most compact clusters are kept (see next paragraph). Moreover, our method allows clusters to be merged at a later stage of the process. Therefore, the choice of k is not critical. In this study, k was set to 10% of the number of sites in the training set so that the clusters were large enough to generate meaningful consensus patterns while their internal similarity remained high. Cluster quality could be monitored since, for each cluster, CLUTO reports its average internal and external similarities and their associated standard deviations.</p>
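            <p>CLUTO itself is not reproduced here, but the choice of k and the kind of cluster-quality monitoring described above can be sketched in Python. The function names are hypothetical, the minimum of two clusters is our assumption for very small training sets, and the internal and external statistics are computed directly from S as average pairwise similarities within a cluster and between a cluster and the remaining sites; they may differ in detail from the values CLUTO reports.</p>
```python
from statistics import mean, pstdev


def choose_k(n_sites):
    """Number of partitions: 10% of the training-set size, at least 2 (assumed)."""
    return max(2, round(n_sites / 10))


def cluster_quality(S, labels):
    """Per-cluster mean and standard deviation of internal and external similarity.

    S: similarity matrix (list of lists); labels: cluster label of each site.
    Internal: similarities between distinct members of the cluster.
    External: similarities between members and all sites outside the cluster.
    """
    report = {}
    for c in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == c]
        others = [i for i, lab in enumerate(labels) if lab != c]
        internal = [S[i][j] for i in members for j in members if i != j]
        external = [S[i][j] for i in members for j in others]
        report[c] = {
            "internal": (mean(internal), pstdev(internal)) if internal else None,
            "external": (mean(external), pstdev(external)) if external else None,
        }
    return report
```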
            <p>The second stage is a pruning process, which removes outliers from each cluster. Since binding site relationships are expressed by a similarity matrix, outliers are detected by a distance-based technique <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>, in which outliers are defined as data points whose similarity to the data centroid is lower than a given threshold value. Since the clusters generated by our method are rather small (fewer than 100 members), outliers may shift the centroid significantly. Consequently, our method employs an iterative process, which implicitly recalculates the position of the centroid after each outlier is removed from a cluster.</p>
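            <p>Within a cluster of n binding sites, we calculate for each member, i, its average similarity, S<sub>i</sub>, to all the other members of the cluster. The binding site with the lowest average similarity is discarded if this value is below a given similarity threshold, S<sub>T</sub>; the average similarities of the remaining members are then recomputed and the step is repeated until no member falls below S<sub>T</sub>.</p>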
            <p>
               <display-formula>
                  <m:math name="1471-2105-8-321-i2" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>S</m:mi>
                              <m:mi>i</m:mi>
                           </m:msub>
                           <m:mo>=</m:mo>
                           <m:mfrac>
                              <m:mn>1</m:mn>
                              <m:mrow>
                                 <m:mi>n</m:mi>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mn>1</m:mn>
                              </m:mrow>
                           </m:mfrac>
                           <m:mstyle displaystyle="true">
                              <m:munderover>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mi>j</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mn>1</m:mn>
                                    <m:mo>,</m:mo>
                                    <m:mi>j</m:mi>
                                    <m:mo>&#8800;</m:mo>
                                    <m:mi>i</m:mi>
                                 </m:mrow>
                                 <m:mi>n</m:mi>
                              </m:munderover>
                              <m:mrow>
                                 <m:msub>
                                    <m:mi>S</m:mi>
                                    <m:mrow>
                                       <m:mi>i</m:mi>
                                       <m:mi>j</m:mi>
                                    </m:mrow>
                                 </m:msub>
                              </m:mrow>
                           </m:mstyle>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGtbWudaWgaaWcbaGaemyAaKgabeaakiabg2da9maalaaabaGaeGymaedabaGaemOBa4MaeyOeI0IaeGymaedaamaaqahabaGaem4uam1aaSbaaSqaaiabdMgaPjabdQgaQbqabaaabaGaemOAaOMaeyypa0JaeGymaeJaeiilaWIaemOAaOMaeyiyIKRaemyAaKgabaGaemOBa4ganiabggHiLdaaaa@4515@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>Site i is an outlier if <inline-formula><m:math name="1471-2105-8-321-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>S</m:mi><m:mi>i</m:mi></m:msub><m:mo>=</m:mo><m:msubsup><m:mo>min</m:mo><m:mrow><m:mi>j</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow><m:mi>n</m:mi></m:msubsup><m:mrow><m:mo>(</m:mo><m:mrow><m:msub><m:mi>S</m:mi><m:mi>j</m:mi></m:msub></m:mrow><m:mo>)</m:mo></m:mrow></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGtbWudaWgaaWcbaGaemyAaKgabeaakiabg2da9iabd2eanjabdMgaPjabd6gaUnaaDaaaleaacqWGQbGAcqGH9aqpcqaIXaqmaeaacqWGUbGBaaGcdaqadaqaaiabdofatnaaBaaaleaacqWGQbGAaeqaaaGccaGLOaGaayzkaaaaaa@3D8F@</m:annotation></m:semantics></m:math></inline-formula> and <it>S</it><sub><it>i </it></sub>&lt;<it>S</it><sub><it>T</it></sub></p>
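            <p>The pruning rule translates directly into an iterative loop; recomputing every S<sub>i</sub> after each removal is what implicitly moves the centroid. This Python sketch assumes S is a list of lists indexed by site; the function name is hypothetical, and stopping once a single site remains is our own safeguard, since the text does not specify a minimum cluster size.</p>
```python
def prune_outliers(S, members, s_t):
    """Iteratively discard the least similar member of a cluster.

    S: similarity matrix (list of lists); members: site indices of one cluster;
    s_t: similarity threshold S_T. At each pass, the member with the lowest
    average similarity S_i to the other members is removed if S_i is below s_t.
    """
    members = list(members)
    while len(members) > 1:
        n = len(members)
        avg = {i: sum(S[i][j] for j in members if j != i) / (n - 1)
               for i in members}
        worst = min(members, key=avg.get)
        if avg[worst] >= s_t:
            break  # no remaining outlier
        members.remove(worst)  # S_i values are recomputed on the next pass
    return members
```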
         </sec>
         <sec>
            <st>
               <p>Pattern generation</p>
            </st>
            <p>The general idea behind generating a 3D pattern for a given cluster is to superimpose all the binding sites of the cluster and derive a cluster consensus ligand binding site. The 3D pattern is then tested against all binding sites of the training set to evaluate whether it is representative of the associated cluster. This is done by ranking all binding sites according to their similarity to the pattern after superimposition of their ligands. The top hits should be the binding sites of the cluster members, i.e. true positives. If there are any false positives, the pattern is optimised so that the first false positive is ranked as low as possible. Finally, once a pattern has been generated for a cluster, its similarity to the last true positive is stored as a conservative threshold, which is used to decide automatically whether a binding site can be annotated with the properties associated with the cluster.</p>
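            <p>The pattern evaluation step can be sketched as a ranking over the training set. In this hypothetical Python helper, similarities maps each training-set site to its similarity with the pattern (assumed to be computed as in the binding site comparison step); it reports whether every cluster member ranks above every other site, the 0-based rank of the first false positive, and the conservative threshold, i.e. the similarity of the last true positive.</p>
```python
def evaluate_pattern(similarities, cluster):
    """Rank training-set sites by their similarity to a pattern.

    similarities: site id -> similarity to the pattern after ligand superimposition.
    cluster: ids of the cluster members, i.e. the expected true positives.
    Returns (discriminative, first_fp_rank, threshold), where ranks are 0-based,
    first_fp_rank is None if there is no false positive, and threshold is the
    similarity of the lowest-ranked true positive.
    """
    ranking = sorted(similarities, key=similarities.get, reverse=True)
    is_tp = [site in cluster for site in ranking]
    last_tp = max(r for r, tp in enumerate(is_tp) if tp)
    first_fp = next((r for r, tp in enumerate(is_tp) if not tp), None)
    discriminative = first_fp is None or first_fp > last_tp
    threshold = similarities[ranking[last_tp]]
    return discriminative, first_fp, threshold
```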
            <p>Consensus patterns for sequences and secondary structure elements are usually generated by a hierarchical process of pairwise comparisons <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. However, since atom positions take continuous values, pairwise comparisons are not transitive: an atom of site A may lie within the pairing threshold of an atom of site B, and that atom within the threshold of an atom of site C, while the atoms of A and C are further apart than the threshold. Therefore, a 3D pattern can only be generated after performing all pairwise superimpositions of the binding sites in a given cluster. After each comparison, only the consensus atoms of the pair of binding sites are kept, using the technique described in the 'Binding site comparison' section. At the end of the process, each binding site is composed only of atoms whose positions are similar across the whole cluster. The 3D pattern is then generated by averaging atom positions over all the binding sites.</p>
            <p>The generation of a pattern P is described below in pseudocode, where n is the number of binding sites, BS, in the cluster:</p>
            <p>For i = 1 to n-1</p>
            <p>&#160;&#160;&#160;For j = i+1 to n</p>
            <p>&#160;&#160;&#160;&#160;&#160;&#160;Superimpose(BSi, BSj)</p>
            <p>&#160;&#160;&#160;&#160;&#160;&#160;Generate consensus binding sites Ci &amp; Cj</p>
            <p>&#160;&#160;&#160;&#160;&#160;&#160;BSi = Ci</p>
            <p>&#160;&#160;&#160;&#160;&#160;&#160;BSj = Cj</p>
            <p>&#160;&#160;&#160;endFor</p>
            <p>endFor</p>
            <p>P = 0</p>
            <p>For i = 1 to n</p>
            <p>&#160;&#160;&#160;P = P + BSi/n</p>
            <p>endFor</p>
            <p>A more detailed description of the 3D pattern generation technique is given in <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>.</p>
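            <p>For illustration, the pseudocode can be rendered in Python as below. This is a simplified sketch, not the full technique described in the cited reference: it assumes the sites have already been superimposed into a common frame through their ligands, so Superimpose is implicit; it pairs atoms greedily within a threshold, as in the comparison step; and, because the pseudocode leaves atom correspondences implicit, it averages each atom of the first consensus site with its nearest same-element atom in every other site, dropping atoms that have no such partner within the threshold. The function names are hypothetical.</p>
```python
import math


def _paired_indices(a, b, threshold):
    """Indices kept in a and b after greedy one-to-one same-element pairing."""
    cands = sorted(
        (math.dist(xa, xb), i, j)
        for i, (ea, xa) in enumerate(a)
        for j, (eb, xb) in enumerate(b)
        if ea == eb and threshold >= math.dist(xa, xb)
    )
    used_a, used_b = set(), set()
    for _, i, j in cands:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
    return used_a, used_b


def generate_pattern(sites, threshold):
    """sites: binding sites as lists of (element, (x, y, z)), already superimposed."""
    sites = [list(s) for s in sites]
    n = len(sites)
    # All pairwise comparisons: each site keeps only its consensus atoms.
    for i in range(n - 1):
        for j in range(i + 1, n):
            keep_i, keep_j = _paired_indices(sites[i], sites[j], threshold)
            sites[i] = [atom for k, atom in enumerate(sites[i]) if k in keep_i]
            sites[j] = [atom for k, atom in enumerate(sites[j]) if k in keep_j]
    # Average positions: P = sum over sites of BSi / n.
    pattern = []
    for element, xyz in sites[0]:
        matches = [xyz]
        for other in sites[1:]:
            same = [p for e, p in other if e == element]
            if not same:
                break
            nearest = min(same, key=lambda p: math.dist(p, xyz))
            if math.dist(nearest, xyz) > threshold:
                break
            matches.append(nearest)
        else:  # a partner was found in every other site
            centroid = tuple(sum(c) / n for c in zip(*matches))
            pattern.append((element, centroid))
    return pattern
```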
            <p>We define a "good quality pattern" as a pattern containing the minimum number of atoms that allows the binding sites of the cluster to be discriminated from all the other binding sites of the training set. Although cluster consensus patterns contain only the atoms whose positions are shared by all the binding sites of the cluster, they may not be fully discriminative, i.e. at least one false positive may rank higher than the last true positive. In such cases, new, less compact patterns are produced to achieve better binding site discrimination: a new cluster pattern is generated by the process described above, excluding the binding site furthest from the cluster centroid. Although this binding site is not involved in pattern generation, the discrimination power of the pattern is still evaluated on all the binding sites of the cluster. This process is iterated until a fully discriminative pattern is generated or only two binding sites are left for pattern generation; in the latter case, the most representative pattern is retained. Although this process does not guarantee that the selected patterns are the best representation of a cluster, in practice they perform very well for binding site discrimination (see 'Results and discussion' section).</p>
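            <p>The refinement loop can be written as a small higher-order Python function. The callables build, is_discriminative and score are hypothetical stand-ins for pattern generation, the ranking test against all cluster members and the chosen measure of representativeness; ordering the sites once by their distance to the centroid of the full cluster is our assumption, as the text does not say whether the centroid is recomputed after each exclusion.</p>
```python
def refine_pattern(cluster, centroid_distance, build, is_discriminative, score):
    """Regenerate a cluster pattern from fewer sites until it discriminates.

    cluster: site ids; centroid_distance: site id -> distance to the cluster centroid.
    build(sites) returns a pattern; is_discriminative(pattern) checks the ranking
    against all cluster members; score(pattern) rates representativeness.
    """
    sites = sorted(cluster, key=centroid_distance)  # nearest to centroid first
    candidates = []
    while True:
        pattern = build(sites)
        if is_discriminative(pattern):
            return pattern
        candidates.append(pattern)
        if 2 >= len(sites):
            # Only two sites left: keep the most representative pattern seen.
            return max(candidates, key=score)
        sites = sites[:-1]  # drop the site furthest from the centroid
```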
            <p>Once patterns are generated for all clusters, they are superimposed in a pairwise fashion to calculate their similarity. The clusters associated with highly similar patterns are merged and new 3D patterns are generated for each of the new clusters.</p>
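            <p>The merging step can be sketched with a union-find structure in Python. The merge threshold is a hypothetical parameter, since the paper does not quantify "highly similar", and treating merges as transitive (if A merges with B and B with C, all three end up together) is likewise an assumption; new patterns would then be generated for each merged cluster as described above.</p>
```python
def merge_clusters(clusters, pattern_similarity, merge_threshold):
    """Merge clusters whose patterns are at least `merge_threshold` similar.

    clusters: list of member lists; pattern_similarity[i][j]: similarity of
    the patterns of clusters i and j.
    """
    parent = list(range(len(clusters)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            if pattern_similarity[i][j] >= merge_threshold:
                parent[find(j)] = find(i)
    merged = {}
    for i, members in enumerate(clusters):
        merged.setdefault(find(i), []).extend(members)
    return list(merged.values())
```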
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>DRG and JCN designed the clustering and pattern generation methodology. JCN implemented the methodology and processed data sets. PH and JCN performed data analysis. PH produced biological interpretations. All authors contributed to drafting the manuscript. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>SCOP: a structural classification of proteins database for the investigation of sequences and structures</p>
            </title>
            <aug>
               <au>
                  <snm>Murzin</snm>
                  <fnm>AG</fnm>
               </au>
               <au>
                  <snm>Brenner</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Hubbard</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Chothia</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1995</pubdate>
            <volume>247</volume>
            <fpage>536</fpage>
            <lpage>540</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1995.0159</pubid>
                  <pubid idtype="pmpid" link="fulltext">7723011</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Mapping the protein universe</p>
            </title>
            <aug>
               <au>
                  <snm>Holm</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Sander</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1996</pubdate>
            <volume>273</volume>
            <fpage>595</fpage>
            <lpage>603</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.273.5275.595</pubid>
                  <pubid idtype="pmpid" link="fulltext">8662544</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Assigning genomic sequences to CATH</p>
            </title>
            <aug>
               <au>
                  <snm>Pearl</snm>
                  <fnm>FMG</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Bray</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Sillitoe</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Todd</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Harrison</snm>
                  <fnm>AP</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Orengo</snm>
                  <fnm>CA</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2000</pubdate>
            <volume>28</volume>
            <issue>1</issue>
            <fpage>277</fpage>
            <lpage>282</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">102424</pubid>
                  <pubid idtype="pmpid" link="fulltext">10592246</pubid>
                  <pubid idtype="doi">10.1093/nar/28.1.277</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>E-MSD: an integrated data resource for bioinformatics</p>
            </title>
            <aug>
               <au>
                  <snm>Golovin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Oldfield</snm>
                  <fnm>TJ</fnm>
               </au>
               <au>
                  <snm>Tate</snm>
                  <fnm>JG</fnm>
               </au>
               <au>
                  <snm>Velankar</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Barton</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Boutselakis</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Dimitropoulos</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Fillon</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hussain</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ionides</snm>
                  <fnm>JMC</fnm>
               </au>
               <au>
                  <snm>John</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Keller</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Krissinel</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>McNeil</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Naim</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Newman</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Pajon</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Pineda</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Rachedi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Copeland</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Sitnov</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sobhany</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Suarez-Uruena</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Swaminathan</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Tagari</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Tromm</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Vranken</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Henrick</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>D211</fpage>
            <lpage>D216</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308812</pubid>
                  <pubid idtype="pmpid" link="fulltext">14681397</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh078</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>A model for statistical significance of local similarities in structure</p>
            </title>
            <aug>
               <au>
                  <snm>Stark</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sunyaev</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Russell</snm>
                  <fnm>RB</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2003</pubdate>
            <volume>326</volume>
            <fpage>1307</fpage>
            <lpage>1316</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0022-2836(03)00045-7</pubid>
                  <pubid idtype="pmpid" link="fulltext">12595245</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data</p>
            </title>
            <aug>
               <au>
                  <snm>Porter</snm>
                  <fnm>CT</fnm>
               </au>
               <au>
                  <snm>Bartlett</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>D129</fpage>
            <lpage>D133</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308762</pubid>
                  <pubid idtype="pmpid" link="fulltext">14681376</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh028</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Interactive motif and fold recognition in protein structures</p>
            </title>
            <aug>
               <au>
                  <snm>Madsen</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Kleywegt</snm>
                  <fnm>GJ</fnm>
               </au>
            </aug>
            <source>J Appl Cryst</source>
            <pubdate>2002</pubdate>
            <volume>35</volume>
            <fpage>137</fpage>
            <lpage>139</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1107/S0021889802000602</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>WebFEATURE: an interactive web tool for identifying and visualizing functional sites on macromolecular structures</p>
            </title>
            <aug>
               <au>
                  <snm>Liang</snm>
                  <fnm>MP</fnm>
               </au>
               <au>
                  <snm>Banatao</snm>
                  <fnm>DR</fnm>
               </au>
               <au>
                  <snm>Klein</snm>
                  <fnm>TE</fnm>
               </au>
               <au>
                  <snm>Brutlag</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Altman</snm>
                  <fnm>RB</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <issue>13</issue>
            <fpage>3324</fpage>
            <lpage>3327</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">168960</pubid>
                  <pubid idtype="pmpid" link="fulltext">12824318</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg553</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>A new bioinformatic approach to detect common 3D sites in protein structures</p>
            </title>
            <aug>
               <au>
                  <snm>Jambon</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Imberty</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Del&#233;age</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Geourjon</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>2003</pubdate>
            <volume>52</volume>
            <issue>2</issue>
            <fpage>137</fpage>
            <lpage>145</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/prot.10339</pubid>
                  <pubid idtype="pmpid" link="fulltext">12833538</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>PROSITE: a documented database using patterns and profiles as motif descriptors</p>
            </title>
            <aug>
               <au>
                  <snm>Sigrist</snm>
                  <fnm>CJA</fnm>
               </au>
               <au>
                  <snm>Cerutti</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Hulo</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Gattiker</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Falquet</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Pagni</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bairoch</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bucher</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Brief Bioinform</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <fpage>265</fpage>
            <lpage>274</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bib/3.3.265</pubid>
                  <pubid idtype="pmpid" link="fulltext">12230035</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>InterPro, progress and status in 2005</p>
            </title>
            <aug>
               <au>
                  <snm>Mulder</snm>
                  <fnm>NJ</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Attwood</snm>
                  <fnm>TK</fnm>
               </au>
               <au>
                  <snm>Bairoch</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bateman</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Binns</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Bradley</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Bucher</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Cerutti</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Copley</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Courcelle</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Das</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Fleischmann</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Gough</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Haft</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Harte</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Hulo</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Kahn</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Kanapin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Krestyaninova</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lonsdale</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Lopez</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Letunic</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Madera</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Maslen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>McDowall</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Mitchell</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Nikolskaya</snm>
                  <fnm>AN</fnm>
               </au>
               <au>
                  <snm>Orchard</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Pagni</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ponting</snm>
                  <fnm>CP</fnm>
               </au>
               <au>
                  <snm>Quevillon</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Selengut</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Sigrist</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Silventoinen</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Studholme</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Vaughan</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>CH</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <fpage>D201</fpage>
            <lpage>D205</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">540060</pubid>
                  <pubid idtype="pmpid" link="fulltext">15608177</pubid>
                  <pubid idtype="doi">10.1093/nar/gki106</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Protein sequence alignments: a strategy for the hierarchical analysis of residue conservation</p>
            </title>
            <aug>
               <au>
                  <snm>Livingstone</snm>
                  <fnm>CD</fnm>
               </au>
               <au>
                  <snm>Barton</snm>
                  <fnm>GJ</fnm>
               </au>
            </aug>
            <source>Comput Appl Biosci</source>
            <pubdate>1993</pubdate>
            <volume>9</volume>
            <issue>6</issue>
            <fpage>745</fpage>
            <lpage>756</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8143162</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Generation of 3D templates of active sites of proteins with rigid prosthetic groups</p>
            </title>
            <aug>
               <au>
                  <snm>Nebel</snm>
                  <fnm>JC</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <issue>10</issue>
            <fpage>1183</fpage>
            <lpage>1189</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btl040</pubid>
                  <pubid idtype="pmpid" link="fulltext">16473871</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>PDBsum more: new summaries and analyses of the known 3D structures of proteins and nucleic acids</p>
            </title>
            <aug>
               <au>
                  <snm>Laskowski</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Chistyakov</snm>
                  <fnm>VV</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <fpage>D266</fpage>
            <lpage>D268</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">539955</pubid>
                  <pubid idtype="pmpid" link="fulltext">15608193</pubid>
                  <pubid idtype="doi">10.1093/nar/gki001</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>The Protein Data Bank</p>
            </title>
            <aug>
               <au>
                  <snm>Berman</snm>
                  <fnm>HM</fnm>
               </au>
               <au>
                  <snm>Westbrook</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Feng</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Gilliland</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Bhat</snm>
                  <fnm>TN</fnm>
               </au>
               <au>
                  <snm>Weissig</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Shindyalov</snm>
                  <fnm>IN</fnm>
               </au>
               <au>
                  <snm>Bourne</snm>
                  <fnm>PE</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2000</pubdate>
            <volume>28</volume>
            <fpage>235</fpage>
            <lpage>242</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">102472</pubid>
                  <pubid idtype="pmpid" link="fulltext">10592235</pubid>
                  <pubid idtype="doi">10.1093/nar/28.1.235</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Surprising similarities in structure comparison</p>
            </title>
            <aug>
               <au>
                  <snm>Gibrat</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Madej</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Bryant</snm>
                  <fnm>SH</fnm>
               </au>
            </aug>
            <source>Curr Opin Struct Biol</source>
            <pubdate>1996</pubdate>
            <volume>6</volume>
            <issue>3</issue>
            <fpage>377</fpage>
            <lpage>385</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0959-440X(96)80058-3</pubid>
                  <pubid idtype="pmpid" link="fulltext">8804824</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>CLUSTALW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice</p>
            </title>
            <aug>
               <au>
                  <snm>Thompson</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Higgins</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Gibson</snm>
                  <fnm>TJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1994</pubdate>
            <volume>22</volume>
            <fpage>4673</fpage>
            <lpage>4680</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308517</pubid>
                  <pubid idtype="pmpid" link="fulltext">7984417</pubid>
                  <pubid idtype="doi">10.1093/nar/22.22.4673</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>MUSCLE: multiple sequence alignment with high accuracy and high throughput</p>
            </title>
            <aug>
               <au>
                  <snm>Edgar</snm>
                  <fnm>RC</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <issue>5</issue>
            <fpage>1792</fpage>
            <lpage>1797</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">390337</pubid>
                  <pubid idtype="pmpid" link="fulltext">15034147</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh340</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>T-Coffee: A novel method for fast and accurate multiple sequence alignment</p>
            </title>
            <aug>
               <au>
                  <snm>Notredame</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Higgins</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Heringa</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2000</pubdate>
            <volume>302</volume>
            <fpage>205</fpage>
            <lpage>217</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2000.4042</pubid>
                  <pubid idtype="pmpid" link="fulltext">10964570</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions</p>
            </title>
            <aug>
               <au>
                  <snm>Krissinel</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Henrick</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Acta Cryst D</source>
            <pubdate>2004</pubdate>
            <volume>60</volume>
            <issue>Pt 12 Pt 1</issue>
            <fpage>2256</fpage>
            <lpage>2268</lpage>
         </bibl>
         <bibl id="B21">
            <title>
               <p>CE-MC: A multiple protein structure alignment server</p>
            </title>
            <aug>
               <au>
                  <snm>Guda</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Lu</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Scheeff</snm>
                  <fnm>ED</fnm>
               </au>
               <au>
                  <snm>Bourne</snm>
                  <fnm>PE</fnm>
               </au>
               <au>
                  <snm>Shindyalov</snm>
                  <fnm>IN</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>W100</fpage>
            <lpage>W103</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">441602</pubid>
                  <pubid idtype="pmpid" link="fulltext">15215359</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh464</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>The Jalview Java Alignment Editor</p>
            </title>
            <aug>
               <au>
                  <snm>Clamp</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Cuff</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Searle</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Barton</snm>
                  <fnm>GJ</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <fpage>426</fpage>
            <lpage>427</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1093/bioinformatics/btg430</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Crystal Structure of the Atypical Protein Kinase Domain of a TRP Channel with Phosphotransferase Activity</p>
            </title>
            <aug>
               <au>
                  <snm>Yamaguchi</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Matsushita</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Nairn</snm>
                  <fnm>AC</fnm>
               </au>
               <au>
                  <snm>Kuriyan</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Mol Cell</source>
            <pubdate>2001</pubdate>
            <volume>7</volume>
            <fpage>1047</fpage>
            <lpage>1057</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S1097-2765(01)00256-8</pubid>
                  <pubid idtype="pmpid" link="fulltext">11389851</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Two unrelated families of ATP-dependent enzymes share extensive structural similarities about their cofactor binding sites</p>
            </title>
            <aug>
               <au>
                  <snm>Denessiouk</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Lehtonen</snm>
                  <fnm>JV</fnm>
               </au>
               <au>
                  <snm>Korpela</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Johnson</snm>
                  <fnm>MS</fnm>
               </au>
            </aug>
            <source>Protein Sci</source>
            <pubdate>1998</pubdate>
            <volume>7</volume>
            <fpage>1136</fpage>
            <lpage>1146</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9605318</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Phosphatidylinositol phosphate kinase: a link between Protein Kinase and Glutathione Synthase folds</p>
            </title>
            <aug>
               <au>
                  <snm>Grishin</snm>
                  <fnm>NV</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1999</pubdate>
            <volume>291</volume>
            <fpage>239</fpage>
            <lpage>247</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1999.2973</pubid>
                  <pubid idtype="pmpid" link="fulltext">10438618</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Closed-form solution of absolute orientation using unit quaternions</p>
            </title>
            <aug>
               <au>
                  <snm>Horn</snm>
                  <fnm>BKP</fnm>
               </au>
            </aug>
            <source>J Opt Soc Am A</source>
            <pubdate>1987</pubdate>
            <volume>4</volume>
            <fpage>629</fpage>
            <lpage>642</lpage>
         </bibl>
         <bibl id="B27">
            <title>
               <p>CLUTO: a clustering toolkit</p>
            </title>
            <aug>
               <au>
                  <snm>Karypis</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Technical Report 02-017</source>
            <publisher>Dept of Computer Science, University of Minnesota</publisher>
            <pubdate>2002</pubdate>
         </bibl>
         <bibl id="B28">
            <title>
               <p>wCLUTO: A Web-Enabled Clustering Toolkit</p>
            </title>
            <aug>
               <au>
                  <snm>Crow</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Retzel</snm>
                  <fnm>EF</fnm>
               </au>
            </aug>
            <source>Plant Physiol</source>
            <pubdate>2003</pubdate>
            <volume>133</volume>
            <issue>2</issue>
            <fpage>510</fpage>
            <lpage>516</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">523878</pubid>
                  <pubid idtype="pmpid" link="fulltext">14555780</pubid>
                  <pubid idtype="doi">10.1104/pp.103.024885</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Mining Phenotypes and Informative Genes from Gene Expression Data</p>
            </title>
            <aug>
               <au>
                  <snm>Tang</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Pei</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Proceedings of SIGKDD'03: August 24&#8211;27 2003</source>
            <pubdate>2003</pubdate>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Finding Functionally Related Genes by Local and Global Analysis of MEDLINE Abstracts</p>
            </title>
            <aug>
               <au>
                  <snm>Nakken</snm>
                  <fnm>S</fnm>
               </au>
               <etal/>
            </aug>
            <source>Proceedings of Search and Discovery in Bioinformatics Workshop: July 29th 2004, Sheffield</source>
            <pubdate>2004</pubdate>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Detection of evolutionarily stable fragments of cellular pathways by hierarchical clustering of phyletic patterns</p>
            </title>
            <aug>
               <au>
                  <snm>Glazko</snm>
                  <fnm>GV</fnm>
               </au>
               <au>
                  <snm>Mushegian</snm>
                  <fnm>AR</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <issue>5</issue>
            <fpage>R32</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">416468</pubid>
                  <pubid idtype="pmpid" link="fulltext">15128446</pubid>
                  <pubid idtype="doi">10.1186/gb-2004-5-5-r32</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Clustering of gene expression data using a local shape-based similarity measure</p>
            </title>
            <aug>
               <au>
                  <snm>Balasubramaniyan</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>H&#252;llermeier</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Weskamp</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>K&#228;mper</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>7</issue>
            <fpage>1069</fpage>
            <lpage>1077</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti095</pubid>
                  <pubid idtype="pmpid" link="fulltext">15513997</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Effective Pre-processing Strategies for Functional Clustering of a Protein-Protein Interactions Network</p>
            </title>
            <aug>
               <au>
                  <snm>Ucar</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Parthasarathy</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Asur</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Proceedings of the IEEE 5th Symposium on Bioinformatics &amp; Bioengineering (BIBE05): October 2005</source>
            <pubdate>2005</pubdate>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Distance-Based Outliers: Algorithms and Applications</p>
            </title>
            <aug>
               <au>
                  <snm>Knorr</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Ng</snm>
                  <fnm>RT</fnm>
               </au>
               <au>
                  <snm>Tucakov</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>VLDB Journal</source>
            <pubdate>2000</pubdate>
            <volume>8</volume>
            <fpage>237</fpage>
            <lpage>253</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s007780050006</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <aug>
               <au>
                  <cnm>NC-IUBMB</cnm>
               </au>
               <au>
                  <snm>Webb</snm>
                  <fnm>EC</fnm>
               </au>
            </aug>
            <source>Enzyme Nomenclature 1992</source>
            <publisher>San Diego: Academic Press</publisher>
            <pubdate>1992</pubdate>
         </bibl>
      </refgrp>
   </bm>
</art>
