<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-9-161</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Methodology article</dochead>
      <bibl>
         <title>
            <p>A simple and fast heuristic for protein structure comparison</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Pelta</snm>
               <mi>A</mi>
               <fnm>David</fnm>
               <insr iid="I1"/>
               <email>dpelta@decsai.ugr.es</email>
            </au>
            <au id="A2" ca="yes">
               <snm>Gonz&#225;lez</snm>
               <mi>R</mi>
               <fnm>Juan</fnm>
               <insr iid="I1"/>
               <email>jrgonzalez@decsai.ugr.es</email>
            </au>
            <au id="A3">
               <snm>Moreno Vega</snm>
               <fnm>Marcos</fnm>
               <insr iid="I2"/>
               <email>jmmoreno@ull.es</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Models of Decision and Optimization Research Group, Dept. of Computer Science and Artificial Intelligence, University of Granada, Spain</p>
            </ins>
            <ins id="I2">
               <p>Department of Statistics, Operations Research and Computation (DEIOC), University of La Laguna, Spain</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2008</pubdate>
         <volume>9</volume>
         <issue>1</issue>
         <fpage>161</fpage>
         <url>http://www.biomedcentral.com/1471-2105/9/161</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18366735</pubid>
               <pubid idtype="doi">10.1186/1471-2105-9-161</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>19</day>
               <month>3</month>
               <year>2007</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>25</day>
               <month>3</month>
               <year>2008</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>25</day>
               <month>3</month>
               <year>2008</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2008</year>
         <collab>Pelta et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Protein structure comparison is a key problem in bioinformatics. Several methods exist for comparing protein structures, and solving the Maximum Contact Map Overlap problem (MAX-CMO) is one of the available alternatives. Although this problem can be solved with exact algorithms, researchers need approximate algorithms that obtain good-quality solutions with fewer computational resources than exact ones.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We propose a variable neighborhood search metaheuristic for solving MAX-CMO. We analyze this strategy in two aspects: 1) from an optimization point of view, the strategy is tested on two different datasets, obtaining an error of 3.5% (over 2702 pairs) and 1.7% (over 161 pairs) with respect to the optimal values, thus yielding highly accurate solutions in a simpler and less expensive way than exact algorithms; 2) in terms of protein structure classification, we conduct experiments on three datasets and show that it is feasible to detect structural similarities at SCOP's family and CATH's architecture levels using normalized overlap values. Some limitations and the role of normalization are outlined for classification at SCOP's fold level.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>We designed, implemented and tested a new tool for solving MAX-CMO, based on a well-known metaheuristic technique. The good balance between solution quality and computational effort makes it a valuable tool. Moreover, to the best of our knowledge, this is the first time the MAX-CMO measure has been tested at SCOP's fold and CATH's architecture levels, with encouraging results.</p>
               <p>Software is available for download at <url>http://modo.ugr.es/jrgonzalez/msvns4maxcmo</url>.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>The comparison of the 3D structures of protein molecules is a challenging problem. The search for effective solution techniques is required because such tools aid scientists in the development of procedures for drug design, in the identification of new types of protein architecture, in the organization of the known universe of protein structures and could assist in the discovery of unexpected evolutionary and functional inter-relations among them.</p>
         <p>Moreover, good protein structure comparison techniques could also be used to evaluate <it>ab-initio, threading or homology modeling </it>structure predictions. The comparison of protein structures, and their subsequent classification according to similarity, has been described as a fundamental aspect of current research in Structural Genomics and Proteomics <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>.</p>
         <p>Several types of strategies and methodologies are applied to protein structure comparison, and an exhaustive review is beyond the scope of this work. As examples, we may cite the use of dynamic programming <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>, comparison of distance matrices <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, graph theory <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, geometric hashing <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, principal component correlation analysis <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>, local and global alignment <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, consensus shapes <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, consensus structures <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, Kolmogorov complexity <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, Fuzzy Contact Map Overlap <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>, and comparing proteins as paths in 3D <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. Readers interested in structural bioinformatics may refer to <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp> for up-to-date information.</p>
         <p>The Maximum Contact Map Overlap problem (MAX-CMO) is a mathematical model for comparing the similarity of two protein structures. Each protein is represented as a contact map: a matrix indicating which pairs of elements of interest are spatially close. An alignment is a correspondence between the elements (amino acid residues or atoms) of both proteins. The objective is to construct an alignment that maximizes the overlap, i.e. the number of contacts shared by both maps under that alignment.</p>
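         <p>To illustrate the objective described above, the following Python sketch counts the contacts shared by two contact maps under a given alignment. It is a minimal, hypothetical example and not the implementation used in this work. Several choices are assumptions made for the example:</p>
         <p>&#8226; a contact map is stored as a set of index pairs;</p>
         <p>&#8226; an alignment is stored as an ordered list of matched residue pairs;</p>
         <p>&#8226; the function names <it>is_valid_alignment</it> and <it>overlap</it> are hypothetical.</p>

```python
# Hypothetical sketch of the MAX-CMO objective: count the contacts shared by two
# contact maps under a given alignment. The data layout is an assumption, not the
# authors' file format: a contact map is a set of residue-index pairs (i, j) with
# i before j, and an alignment is a list of (i, k) pairs, in increasing order,
# matching residue i of protein 1 to residue k of protein 2.


def is_valid_alignment(alignment):
    """True if the alignment is one-to-one and order-preserving (no crossings)."""
    for (i1, k1), (i2, k2) in zip(alignment, alignment[1:]):
        if not (i2 > i1 and k2 > k1):
            return False
    return True


def overlap(contacts1, contacts2, alignment):
    """Number of contacts (i, j) of map 1 whose aligned images are contacts of map 2."""
    if not is_valid_alignment(alignment):
        raise ValueError("alignment must increase strictly in both proteins")
    partner = dict(alignment)  # residue of protein 1 -> residue of protein 2
    shared = 0
    for i, j in contacts1:
        # Order preservation guarantees partner[i] comes before partner[j].
        if i in partner and j in partner and (partner[i], partner[j]) in contacts2:
            shared += 1
    return shared


# Toy example with made-up maps (not taken from any dataset in this article).
map1 = {(0, 2), (1, 3), (0, 3)}
map2 = {(0, 2), (1, 4), (0, 4)}
print(overlap(map1, map2, [(0, 0), (1, 1), (2, 2), (3, 4)]))  # 3
print(overlap(map1, map2, [(0, 0), (1, 1), (2, 2), (3, 3)]))  # 1
```

         <p>Storing the second map as a set keeps each contact lookup at expected constant time. Evaluating one alignment is therefore linear in the number of contacts of the first map. A solver, exact or approximate, has to search over the space of such alignments.</p>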
         <p>In recent years, exact algorithms for solving MAX-CMO have been developed. Among them, we should cite the initial work in <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, later extended in <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr></abbrgrp>. More recently, <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp> presented another strategy for solving MAX-CMO optimally. However, several reasons justify the application of approximate algorithms to MAX-CMO:</p>
         <p>&#8226; the problem of maximizing the overlap between two contact maps is NP-hard <abbrgrp><abbr bid="B17">17</abbr><abbr bid="B23">23</abbr></abbrgrp>. There are therefore problem instances (i.e. particular pairs of contact maps) for which exact algorithms may fail to return a solution in reasonable time.</p>
         <p>&#8226; exact algorithms are expensive and hard to code. For example, they may involve (as in <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>) a local search strategy or even a genetic algorithm for obtaining lower bounds (with the corresponding parameter settings), or a linear programming solver for obtaining upper bounds. Moreover, if a running time limit is imposed, they may finish without returning any solution at all.</p>
         <p>&#8226; the availability of exact methods is limited. To the best of our knowledge, only the algorithm presented in <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> is available through the Internet <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. That service limits both the size of submitted problems (no more than 100 residues) and the CPU time allotted to solve them (a maximum of 10 minutes).</p>
         <p>&#8226; MAX-CMO aims to maximize a purely geometrical relation between graphs, so suboptimal solutions may also provide insight into the biological meaning of the alignment.</p>
         <p>&#8226; because of potential errors in determining 3D coordinates, the usefulness of exact solutions for protein pairs derived from possibly erroneous contact maps is debatable. As stated in <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>, the experimental errors in determining atomic Cartesian coordinates by X-ray crystallography or NMR may range from 0.01 to 1.27&#197;, which is close to the length of some covalent bonds.</p>
         <p>In this work, we pursue two objectives. Firstly, we propose a Variable Neighborhood Search (VNS) strategy for solving MAX-CMO and show that it obtains near-optimal results using reduced computational resources and time.</p>
         <p>Secondly, MAX-CMO has so far been used for clustering and classification only at SCOP's family level (on the so-called "Skolnick's dataset"). We therefore assess whether the (normalized) overlap values returned by our strategy offer a proper ranking of structural similarity at other structural levels.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <p>The proposed method is validated through two different computational experiments. First, we compare the VNS approach against the exact algorithm of Xie and Sahinidis <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> in pure optimization terms. Then we verify whether our VNS is able to obtain a proper ranking of a set of proteins at different levels of structural similarity.</p>
         <sec>
            <st>
               <p>Is VNS beneficial from an optimization point of view?</p>
            </st>
            <p>In the first computational experiment, we compare our VNS implementation against the results from <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. As a test bed, we use two datasets described in <abbrgrp><abbr bid="B26">26</abbr></abbrgrp> (see Table <tblr tid="T1">1</tblr> for details): a) Skolnick, with 40 proteins and 161 optimally solved pairs, and b) Lancia, with 269 proteins and 2702 optimally solved pairs.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Datasets' information. "Pairs" stands for the number of pairwise comparisons performed. The values for "Contacts" correspond to contact maps computed with a 7&#197; distance threshold.</p>
               </caption>
               <tblbdy cols="9">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3" ca="center">
                        <p>Residues</p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>Contacts</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Dataset</p>
                     </c>
                     <c ca="center">
                        <p>Proteins</p>
                     </c>
                     <c ca="center">
                        <p>Pairs</p>
                     </c>
                     <c ca="center">
                        <p>Min.</p>
                     </c>
                     <c ca="center">
                        <p>Avg.</p>
                     </c>
                     <c ca="center">
                        <p>Max.</p>
                     </c>
                     <c ca="center">
                        <p>Min.</p>
                     </c>
                     <c ca="center">
                        <p>Avg.</p>
                     </c>
                     <c ca="center">
                        <p>Max.</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="9">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Lancia</p>
                     </c>
                     <c ca="center">
                        <p>269</p>
                     </c>
                     <c ca="center">
                        <p>2702</p>
                     </c>
                     <c ca="center">
                        <p>44</p>
                     </c>
                     <c ca="center">
                        <p>57.07</p>
                     </c>
                     <c ca="center">
                        <p>68</p>
                     </c>
                     <c ca="center">
                        <p>33</p>
                     </c>
                     <c ca="center">
                        <p>95.91</p>
                     </c>
                     <c ca="center">
                        <p>137</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Skolnick</p>
                     </c>
                     <c ca="center">
                        <p>40</p>
                     </c>
                     <c ca="center">
                        <p>161</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>158.23</p>
                     </c>
                     <c ca="center">
                        <p>255</p>
                     </c>
                     <c ca="center">
                        <p>265</p>
                     </c>
                     <c ca="center">
                        <p>470.93</p>
                     </c>
                     <c ca="center">
                        <p>815</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Fischer</p>
                     </c>
                     <c ca="center">
                        <p>68</p>
                     </c>
                     <c ca="center">
                        <p>4624</p>
                     </c>
                     <c ca="center">
                        <p>56</p>
                     </c>
                     <c ca="center">
                        <p>211.16</p>
                     </c>
                     <c ca="center">
                        <p>581</p>
                     </c>
                     <c ca="center">
                        <p>147</p>
                     </c>
                     <c ca="center">
                        <p>636.66</p>
                     </c>
                     <c ca="center">
                        <p>1952</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Nh3D</p>
                     </c>
                     <c ca="center">
                        <p>806</p>
                     </c>
                     <c ca="center">
                        <p>58838</p>
                     </c>
                     <c ca="center">
                        <p>33</p>
                     </c>
                     <c ca="center">
                        <p>150.33</p>
                     </c>
                     <c ca="center">
                        <p>759</p>
                     </c>
                     <c ca="center">
                        <p>74</p>
                     </c>
                     <c ca="center">
                        <p>432.83</p>
                     </c>
                     <c ca="center">
                        <p>2438</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>For a fair comparison and for reproducibility, we use the contact map data files provided in <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. The maps are based on <it>C</it><sub><it>&#945; </it></sub>atoms, and the optimal overlap values were kindly provided by the authors of Ref. <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>.</p>
            <p>We limited the experiment to those cases (protein pairs) where the exact algorithm was able to find the optimum within ten days <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>.</p>
            <p>The experiments in <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> were conducted on three workstations, each with a 3.0 GHz CPU and 1.0 GB of RAM. Our experiments were run on a single workstation with a 2.2 GHz CPU (AMD Athlon 64 3500+) and 1 GB of RAM. Xie and Sahinidis have recently improved on the results of <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> in terms of the computing resources needed <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>.</p>
            <p>We define three versions of our Multistart VNS (MSVNS), corresponding to different parameter settings for one of the neighborhood structures (see details in the Methods section):</p>
            <p>&#8226; MSVNS1: One neighborhood of type <it>neighborhoodMove </it>and three neighborhoods of type <it>neighborhoodAdd </it>with window sizes of 5%, 10% and 15%, respectively.</p>
            <p>&#8226; MSVNS2: One neighborhood of type <it>neighborhoodMove </it>and three neighborhoods of type <it>neighborhoodAdd </it>with window sizes of 10%, 20% and 30%, respectively.</p>
            <p>&#8226; MSVNS3: One neighborhood of type <it>neighborhoodMove </it>and three neighborhoods of type <it>neighborhoodAdd </it>with window sizes of 10%, 30% and 50%, respectively.</p>
            <p>The strategy has a parameter that controls the number of internal restarts: when no improvement can be made from the incumbent solution, the search is restarted from a new, randomly generated solution. This parameter is set to 150. At the end of the execution, we measure the <it>error</it>(%) with respect to the optimum value. The results are shown in Tables <tblr tid="T2">2</tblr> and <tblr tid="T3">3</tblr>.</p>
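            <p>The multistart scheme described above can be outlined with a generic VNS loop. The Python sketch below is a textbook-style illustration under stated assumptions, not the MSVNS code. The callables <it>random_solution</it>, <it>shake</it> and <it>local_search</it> are hypothetical placeholders for problem-specific components. The toy OneMax problem (maximize the number of 1-bits) serves only to exercise the control flow. Its shake strengths reuse the MSVNS3 window sizes (10%, 30% and 50%) as fractions of bits to flip; this is an analogy, not a model of <it>neighborhoodAdd</it>.</p>

```python
import random

# Generic multistart Variable Neighborhood Search (maximization), written from the
# textbook scheme; it is NOT the authors' MSVNS implementation. The callables
# passed in are hypothetical placeholders for problem-specific components.


def multistart_vns(random_solution, shake, local_search, objective,
                   n_neighborhoods, restarts=150, seed=0):
    rng = random.Random(seed)
    best, best_value = None, float("-inf")
    for _ in range(restarts):
        # New random starting point, improved by local search.
        x = local_search(random_solution(rng))
        fx = objective(x)
        k = 0
        while k != n_neighborhoods:
            candidate = local_search(shake(x, k, rng))
            fc = objective(candidate)
            if fc > fx:
                # Improvement: move there and return to the first neighborhood.
                x, fx, k = candidate, fc, 0
            else:
                # No improvement: try the next (larger) neighborhood.
                k += 1
        # No neighborhood improves the incumbent: keep the best and restart.
        if fx > best_value:
            best, best_value = x, fx
    return best, best_value


# Toy usage on OneMax, only to exercise the loop (hypothetical example).
N_BITS = 20
WINDOWS = [0.10, 0.30, 0.50]  # shake strengths, by analogy with MSVNS3's windows


def random_bits(rng):
    return [rng.randint(0, 1) for _ in range(N_BITS)]


def flip_window(x, k, rng):
    # Flip a fraction WINDOWS[k] of the bits, chosen at random.
    y = list(x)
    for i in rng.sample(range(N_BITS), max(1, int(WINDOWS[k] * N_BITS))):
        y[i] = 1 - y[i]
    return y


def bit_flip_climb(x):
    # First-improvement hill climbing over single-bit flips.
    y = list(x)
    improved = True
    while improved:
        improved = False
        for i in range(N_BITS):
            z = y[:i] + [1 - y[i]] + y[i + 1:]
            if sum(z) > sum(y):
                y, improved = z, True
    return y


best, value = multistart_vns(random_bits, flip_window, bit_flip_climb, sum,
                             n_neighborhoods=len(WINDOWS), restarts=5)
print(value)  # 20
```

            <p>With 150 restarts, as in the experiments, the outer loop performs 150 descents from independent random solutions and returns the best one found. Each descent stops only when no neighborhood improves the incumbent solution.</p>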
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Results over 2702 pairs from Lancia's dataset. The error is measured with respect to the optimum value.</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3" ca="center">
                        <p>Error (%)</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Version</p>
                     </c>
                     <c ca="center">
                        <p>N</p>
                     </c>
                     <c ca="center">
                        <p>Avg.</p>
                     </c>
                     <c ca="center">
                        <p>SD</p>
                     </c>
                     <c ca="center">
                        <p>Median</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Total</p>
                     </c>
                     <c ca="center">
                        <p>MSVNS1</p>
                     </c>
                     <c ca="center">
                        <p>2702 (100%)</p>
                     </c>
                     <c ca="center">
                        <p>5.8765</p>
                     </c>
                     <c ca="center">
                        <p>7.12280</p>
                     </c>
                     <c ca="center">
                        <p>1.9049</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>MSVNS2</p>
                     </c>
                     <c ca="center">
                        <p>2702 (100%)</p>
                     </c>
                     <c ca="center">
                        <p>3.9959</p>
                     </c>
                     <c ca="center">
                        <p>5.60979</p>
                     </c>
                     <c ca="center">
                        <p>0.0000</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>MSVNS3</p>
                     </c>
                     <c ca="center">
                        <p>2702 (100%)</p>
                     </c>
                     <c ca="center">
                        <p>3.5671</p>
                     </c>
                     <c ca="center">
                        <p>5.21332</p>
                     </c>
                     <c ca="center">
                        <p>0.0000</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Optimally Solved</p>
                     </c>
                     <c ca="center">
                        <p>MSVNS1</p>
                     </c>
                     <c ca="center">
                        <p>1259 (46.60%)</p>
                     </c>
                     <c ca="center">
                        <p>0.0000</p>
                     </c>
                     <c ca="center">
                        <p>0.00000</p>
                     </c>
                     <c ca="center">
                        <p>0.0000</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>MSVNS2</p>
                     </c>
                     <c ca="center">
                        <p>1522 (56.33%)</p>
                     </c>
                     <c ca="center">
                        <p>0.0000</p>
                     </c>
                     <c ca="center">
                        <p>0.00000</p>
                     </c>
                     <c ca="center">
                        <p>0.0000</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>MSVNS3</p>
                     </c>
                     <c ca="center">
                        <p>1577 (58.37%)</p>
                     </c>
                     <c ca="center">
                        <p>0.0000</p>
                     </c>
                     <c ca="center">
                        <p>0.00000</p>
                     </c>
                     <c ca="center">
                        <p>0.0000</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Non-Optimally Solved</p>
                     </c>
                     <c ca="center">
                        <p>MSVNS1</p>
                     </c>
                     <c ca="center">
                        <p>1443 (53.40%)</p>
                     </c>
                     <c ca="center">
                        <p>11.0037</p>
                     </c>
                     <c ca="center">
                        <p>6.21068</p>
                     </c>
                     <c ca="center">
                        <p>11.1111</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>MSVNS2</p>
                     </c>
                     <c ca="center">
                        <p>1180 (43.67%)</p>
                     </c>
                     <c ca="center">
                        <p>9.1499</p>
                     </c>
                     <c ca="center">
                        <p>4.98958</p>
                     </c>
                     <c ca="center">
                        <p>9.0909</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>MSVNS3</p>
                     </c>
                     <c ca="center">
                        <p>1125 (41.63%)</p>
                     </c>
                     <c ca="center">
                        <p>8.5674</p>
                     </c>
                     <c ca="center">
                        <p>4.73640</p>
                     </c>
                     <c ca="center">
                        <p>8.3333</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
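            <p>Rows like those of Tables 2 and 3 can be computed from per-pair results with a few lines of Python. This is a hypothetical sketch resting on two assumptions, neither of which is stated explicitly in the text:</p>
            <p>&#8226; error(%) is the relative gap 100 (optimum - found) / optimum;</p>
            <p>&#8226; SD is the sample standard deviation.</p>
            <p>The example uses made-up values rather than data from the experiments.</p>

```python
import statistics

# Hypothetical sketch of how per-pair errors could be summarized into rows like
# those of Tables 2 and 3. Assumptions (not stated in the article text):
# error(%) = 100 * (optimum - found) / optimum, and SD is the sample SD.


def error_percent(found, optimum):
    """Relative gap to the optimum in percent; 0 means the optimum was reached."""
    return 100.0 * (optimum - found) / optimum


def summarise(errors):
    """(N, mean, SD, median) of a non-empty list of errors."""
    sd = statistics.stdev(errors) if len(errors) > 1 else 0.0
    return len(errors), statistics.mean(errors), sd, statistics.median(errors)


def table_rows(found_values, optimal_values):
    errors = [error_percent(f, o) for f, o in zip(found_values, optimal_values)]
    groups = {
        "Total": errors,
        "Optimally Solved": [e for e in errors if e == 0],
        "Non-Optimally Solved": [e for e in errors if e > 0],
    }
    # Skip empty groups, for which no statistics can be computed.
    return {name: summarise(group) for name, group in groups.items() if group}


# Made-up values for four pairs (not data from this article).
rows = table_rows(found_values=[10, 18, 8, 11], optimal_values=[10, 20, 8, 12])
for name, (n, avg, sd, median) in rows.items():
    print(f"{name}: N={n} Avg={avg:.4f} SD={sd:.5f} Median={median:.4f}")
```

            <p>Splitting the pairs into optimally and non-optimally solved groups explains why the median of the total error can be zero while its mean is not. This happens whenever more than half of the pairs reach the optimum.</p>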
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Results over 161 pairs from Skolnick's dataset. The error is measured with respect to the optimum value.</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3" ca="center">
                        <p>Error (%)</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Version</p>
                     </c>
                     <c ca="center">
                        <p>N</p>
                     </c>
                     <c ca="center">
                        <p>Avg.</p>
                     </c>
                     <c ca="center">
                        <p>SD</p>
                     </c>
                     <c ca="center">
                        <p>Median</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Total</p>
                     </c>
                     <c ca="center">
                        <p>MSVNS1</p>
                     </c>
                     <c ca="center">
                        <p>161 (100%)</p>
                     </c>
                     <c ca="center">
                        <p>7.3950</p>
                     </c>
                     <c ca="center">
                        <p>7.44111</p>
                     </c>
                     <c ca="center">
                        <p>5.5556</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>MSVNS2</p>
                     </c>
                     <c ca="center">
                        <p>161 (100%)</p>
                     </c>
                     <c ca="center">
                        <p>1.8235</p>
                     </c>
                     <c ca="center">
                        <p>2.50117</p>
                     </c>
                     <c ca="center">
                        <p>0.7375</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>MSVNS3</p>
                     </c>
                     <c ca="center">
                        <p>161 (100%)</p>
                     </c>
                     <c ca="center">
                        <p>1.6744</p>
                     </c>
                     <c ca="center">
                        <p>2.39488</p>
                     </c>
                     <c ca="center">
                        <p>0.4399</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Optimally Solved</p>
                     </c>
                     <c ca="center">
                        <p>MSVNS1</p>
                     </c>
                     <c ca="center">
                        <p>42 (26.09%)</p>
                     </c>
                     <c ca="center">
                        <p>0.0000</p>
                     </c>
                     <c ca="center">
                        <p>0.00000</p>
                     </c>
                     <c ca="center">
                        <p>0.0000</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>MSVNS2</p>
                     </c>
                     <c ca="center">
                        <p>62 (38.51%)</p>
                     </c>
                     <c ca="center">
                        <p>0.0000</p>
                     </c>
                     <c ca="center">
                        <p>0.00000</p>
                     </c>
                     <c ca="center">
                        <p>0.0000</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>MSVNS3</p>
                     </c>
                     <c ca="center">
                        <p>68 (42.24%)</p>
                     </c>
                     <c ca="center">
                        <p>0.0000</p>
                     </c>
                     <c ca="center">
                        <p>0.00000</p>
                     </c>
                     <c ca="center">
                        <p>0.0000</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Non-Optimally Solved</p>
                     </c>
                     <c ca="center">
                        <p>MSVNS1</p>
                     </c>
                     <c ca="center">
                        <p>119 (73.91%)</p>
                     </c>
                     <c ca="center">
                        <p>10.005</p>
                     </c>
                     <c ca="center">
                        <p>6.98169</p>
                     </c>
                     <c ca="center">
                        <p>9.5092</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>MSVNS2</p>
                     </c>
                     <c ca="center">
                        <p>99 (61.49%)</p>
                     </c>
                     <c ca="center">
                        <p>2.9655</p>
                     </c>
                     <c ca="center">
                        <p>2.60624</p>
                     </c>
                     <c ca="center">
                        <p>2.2124</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>MSVNS3</p>
                     </c>
                     <c ca="center">
                        <p>93 (57.76%)</p>
                     </c>
                     <c ca="center">
                        <p>2.8987</p>
                     </c>
                     <c ca="center">
                        <p>2.52732</p>
                     </c>
                     <c ca="center">
                        <p>2.3910</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>The main point to notice in both Tables is that the average error decreases as the window sizes increase. The best alternative is MSVNS3, with window sizes of 10&#8211;30&#8211;50, which yields an average error below 3.6% on Lancia's dataset (2702 pairs) and below 1.7% on Skolnick's dataset. Because the median values are much lower than the averages, which indicates a right-skewed error distribution, the Tables also report the number of pairs that were solved optimally and the number for which the optimum was not reached. On Lancia's dataset up to 60% of the pairs are solved optimally, while on Skolnick's dataset this percentage is around 40%. Again, the percentage of pairs not solved to optimality decreases as the window sizes increase.</p>
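            <p>The statistics reported in the Tables (N, Avg, SD and Median for all pairs, for optimally solved pairs and for non-optimally solved pairs) can be reproduced from per-pair results with a short Python sketch. It assumes that the error of a pair is the relative gap, in percent, between the exact MAX-CMO overlap and the overlap found by MSVNS, and that SD is the sample standard deviation; the function names (gap_percent, summarize, table_rows) and the example data are hypothetical.</p>

```python
from statistics import mean, median, stdev


def gap_percent(optimum, found):
    """Relative error (in %) of a heuristic overlap w.r.t. the exact MAX-CMO optimum.

    Assumes optimum is positive and found never exceeds it.
    """
    return 100.0 * (optimum - found) / optimum


def summarize(errors):
    """N, average, sample standard deviation and median of a list of errors."""
    n = len(errors)
    if n == 0:
        return {"N": 0}
    return {
        "N": n,
        "Avg": mean(errors),
        "SD": stdev(errors) if n > 1 else 0.0,
        "Median": median(errors),
    }


def table_rows(pairs):
    """pairs: list of (optimum, found) overlaps obtained with one MSVNS version."""
    errors = [gap_percent(opt, found) for opt, found in pairs]
    return {
        "Total": summarize(errors),
        # A pair counts as optimally solved when the heuristic hits the optimum.
        "Optimally Solved": summarize([e for e in errors if e == 0.0]),
        "Non-Optimally Solved": summarize([e for e in errors if e > 0.0]),
    }


# Hypothetical (optimum, found) overlaps for four protein pairs.
rows = table_rows([(100, 100), (80, 76), (50, 50), (120, 108)])
for group, stats in rows.items():
    print(group, stats)
```

            <p>Because optimally solved pairs contribute zero error, the Total average equals the non-optimal average multiplied by the fraction of non-optimally solved pairs; for MSVNS1 on Skolnick's dataset, 0.7391 &#215; 10.005 &#8776; 7.395, consistent with the reported total average.</p>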
            <p>It is also interesting to analyze the subset of pairs that were not solved optimally. Figure <figr fid="F1">1</figr> shows, for each of the three VNS versions, the distribution of these pairs on Lancia's dataset over five ranges of percentage error. Figure <figr fid="F2">2</figr> shows the same graph for Skolnick's dataset.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Distribution of gaps (error) from exact values (in %) in the set of non-optimally solved pairs for Lancia's dataset</p>
               </caption>
               <text>
                  <p>Distribution of gaps (error) from exact values (in %) in the set of non-optimally solved pairs for Lancia's dataset.</p>
               </text>
               <graphic file="1471-2105-9-161-1"/>
            </fig>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Distribution of gaps (error) from exact values (in %) in the set of non-optimally solved pairs for Skolnick's dataset</p>
               </caption>
               <text>
                  <p>Distribution of gaps (error) from exact values (in %) in the set of non-optimally solved pairs for Skolnick's dataset.</p>
               </text>
               <graphic file="1471-2105-9-161-2"/>
            </fig>
            <p>The quality of the results improves from MSVNS1 to MSVNS3, the version for which both the number of non-optimally solved pairs and the corresponding errors are smallest. On Lancia's dataset, the error is below 11% for more than half of these pairs in all three versions, and with MSVNS3, 87% of the non-optimally solved pairs have an error below 17%. On Skolnick's dataset, MSVNS3 obtains an error below 5% for 87% of the non-optimally solved pairs.</p>
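            <p>The distributions in Figures 1 and 2, and statements such as "87% of the non-optimal pairs have an error below 17%", amount to binning the errors of the non-optimally solved pairs into ranges and counting the fraction that falls below a threshold. The Python sketch below assumes errors are already expressed in percent; the five range boundaries used in the figures are not listed in the text, so the edges in the example, like the error values, are hypothetical.</p>

```python
from bisect import bisect_left


def gap_histogram(errors, edges):
    """Count errors (in %) per range.

    edges are increasing upper bounds: bin 0 holds errors up to edges[0],
    bin i holds errors above edges[i-1] and up to edges[i], and the last
    bin holds everything above edges[-1].
    """
    counts = [0] * (len(edges) + 1)
    for e in errors:
        # bisect_left returns the index of the first edge that is >= e.
        counts[bisect_left(edges, e)] += 1
    return counts


def fraction_below(errors, threshold):
    """Fraction of pairs whose error (in %) is strictly below threshold."""
    if not errors:
        return 0.0
    return sum(1 for e in errors if threshold > e) / len(errors)


# Hypothetical errors (in %) of six non-optimally solved pairs.
errors = [2.0, 4.5, 7.0, 12.0, 16.0, 30.0]
print(gap_histogram(errors, [5, 10, 15, 20]))  # [2, 1, 1, 1, 1]
print(fraction_below(errors, 17))
```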
            <p>Regarding the computational effort needed to achieve these results, Table <tblr tid="T4">4</tblr> reports the total wall-clock time for each MSVNS variant. Larger window sizes lead to longer times: as the window sizes grow, the number of potential pairings increases, and so does the execution time. Nevertheless, we consider the trade-off between solution quality and computational effort to be highly reasonable.</p>
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>Wall clock time required for each variant of VNS to solve all the pairs from each dataset. The number of pairs was 2702 for Lancia's dataset and 161 for Skolnick's one.</p>
               </caption>
               <tblbdy cols="3">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2" ca="center">
                        <p>Time</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Version</p>
                     </c>
                     <c ca="center">
                        <p>Lancia</p>
                     </c>
                     <c ca="center">
                        <p>Skolnick</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>MSVNS1</p>
                     </c>
                     <c ca="center">
                        <p>4 h 12 min</p>
                     </c>
                     <c ca="center">
                        <p>3 h 11 min</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>MSVNS2</p>
                     </c>
                     <c ca="center">
                        <p>4 h 43 min</p>
                     </c>
                     <c ca="center">
                        <p>4 h 48 min</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>MSVNS3</p>
                     </c>
                     <c ca="center">
                        <p>5 h 7 min</p>
                     </c>
                     <c ca="center">
                        <p>5 h 44 min</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>These results confirm that the proposed strategy is a useful tool for solving MAX-CMO to near optimality for almost all of the evaluated protein pairs.</p>
         </sec>
         <sec>
            <st>
               <p>0.2 Is VNS able to rank protein similarity properly?</p>
            </st>
            <p>Although the analysis from an optimization point of view is relevant, it is also interesting to check the quality of VNS as a protein structure classifier. In other words, we want to assess whether it is really necessary to solve MAX-CMO exactly in order to perform structure classification.</p>
            <p>Moreover, overlap values are not adequate <it>per se </it>for classification purposes, because they depend on the sizes of the proteins being compared. A normalization scheme should therefore be applied, and we show that the choice of scheme may play a crucial role in protein classification.</p>
            <p>There is no general agreement on how to normalize, but at least three alternatives are available:</p>
            <p>1. <it>norm</it>1(<it>P</it><sub><it>i</it></sub>, <it>P</it><sub><it>j</it></sub>) = <it>overlap</it>(<it>P</it><sub><it>i</it></sub>, <it>P</it><sub><it>j</it></sub>)<it>/min</it>(<it>contacts P</it><sub><it>i</it></sub>, <it>contacts P</it><sub><it>j</it></sub>)</p>
            <p>2. <it>norm</it>2(<it>P</it><sub><it>i</it></sub>, <it>P</it><sub><it>j</it></sub>) = 2 <it>* overlap</it>(<it>P</it><sub><it>i</it></sub>, <it>P</it><sub><it>j</it></sub>)/(<it>contacts P</it><sub><it>i </it></sub>+ <it>contacts P</it><sub><it>j</it></sub>)</p>
            <p>3. <inline-formula><m:math name="1471-2105-9-161-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>n</m:mi><m:mi>o</m:mi><m:mi>r</m:mi><m:mi>m</m:mi><m:mn>3</m:mn><m:mo stretchy="false">(</m:mo><m:msub><m:mi>P</m:mi><m:mi>i</m:mi></m:msub><m:mo>,</m:mo><m:msub><m:mi>P</m:mi><m:mi>j</m:mi></m:msub><m:mo stretchy="false">)</m:mo><m:mo>=</m:mo><m:mrow><m:mo>{</m:mo><m:mrow><m:mtable columnalign="left"><m:mtr columnalign="left"><m:mtd columnalign="left"><m:mn>0</m:mn></m:mtd><m:mtd columnalign="left"><m:mrow><m:mtext>if&#160;the&#160;contacts&#160;difference&#160;is&#160;greater&#160;than&#160;</m:mtext><m:mn>75</m:mn><m:mi>%</m:mi></m:mrow></m:mtd></m:mtr><m:mtr columnalign="left"><m:mtd columnalign="left"><m:mrow><m:mi>n</m:mi><m:mi>o</m:mi><m:mi>r</m:mi><m:mi>m</m:mi><m:mn>1</m:mn><m:mo stretchy="false">(</m:mo><m:msub><m:mi>P</m:mi><m:mi>i</m:mi></m:msub><m:mo>,</m:mo><m:msub><m:mi>P</m:mi><m:mi>j</m:mi></m:msub><m:mo stretchy="false">)</m:mo></m:mrow></m:mtd><m:mtd columnalign="left"><m:mrow><m:mtext>otherwise</m:mtext></m:mrow></m:mtd></m:mtr></m:mtable></m:mrow></m:mrow></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemOBa4Maem4Ba8MaemOCaiNaemyBa0MaeG4mamJaeiikaGIaemiuaa1aaSbaaSqaaiabdMgaPbqabaGccqGGSaalcqWGqbaudaWgaaWcbaGaemOAaOgabeaakiabcMcaPiabg2da9maaceaabaqbaeaabiGaaaqaaiabicdaWaqaaiabbMgaPjabbAgaMjabbccaGiabbsha0jabbIgaOjabbwgaLjabbccaGiabbogaJjabb+gaVjabb6gaUjabbsha0jabbggaHjabbogaJjabbsha0jabbohaZjabbccaGiabbsgaKjabbMgaPjabbAgaMjabbAgaMjabbwgaLjabbkhaYjabbwgaLjabb6gaUjabbogaJjabbwgaLjabbccaGiabbMgaPjabbohaZjabbccaGiabbEgaNjabbkhaYjabbwgaLjabbggaHjabbsha0jabbwgaLjabbkhaYjabbccaGiabbsha0jabbIgaOjabbggaHjabb6gaUjabbccaGiabiEda3iabiwda1iabcwcaLaqaaiabd6gaUjabd+gaVjabdkhaYjabd2gaTjabigdaXiabcIcaOiabdcfaqnaaBaaaleaacqWGPbqAaeqaaOGaeiilaWIaemiuaa1aaSbaaSqaaiabdQgaQbqabaGccqGGPaqkaeaacqqGVbWBcqqG0baDcqqGObaAcqqGLbqzcqqGYbGCcqqG3bWDcqqGPbqAcqqGZbWCcqqGLbqzaaaacaGL7baaaaa@919A@</m:annotation></m:semantics></m:math></inline-formula></p>
            <p>The first and second alternatives were proposed in <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> and <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>, respectively. Here, we propose the third alternative to avoid comparing two structures with completely different sizes.</p>
            <p>We performed three computational experiments to analyze our proposal. First, we made an all-against-all comparison on Skolnick's dataset to check whether clustering can discriminate among 5 SCOP families. Second, we tested the ability of the strategy to detect similarity at the SCOP fold level using Fischer's dataset <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>; in this experiment, a comparison with DaliLite is also performed. Finally, we made a set of queries over the Nh3D database <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> to evaluate the ability of nearest neighbor classification to detect similarity at the CATH architecture level. Here, comparisons are made against DaliLite and MatAlign.</p>
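            <p>The three normalizations can be written as small Python functions taking the overlap and the contact counts ci and cj of the two proteins. The text does not state what the 75% "contacts difference" in <it>norm</it>3 is measured relative to; the sketch assumes it is |ci - cj| divided by the larger contact count. The function names and example numbers are illustrative only.</p>

```python
def norm1(overlap, ci, cj):
    """Overlap divided by the smaller of the two contact counts."""
    return overlap / min(ci, cj)


def norm2(overlap, ci, cj):
    """Twice the overlap divided by the total number of contacts."""
    return 2.0 * overlap / (ci + cj)


def norm3(overlap, ci, cj, max_diff=0.75):
    """norm1, or 0 when the contact counts differ by more than max_diff.

    ASSUMPTION: the contacts difference is measured as |ci - cj| / max(ci, cj);
    the article does not state the reference quantity.
    """
    if abs(ci - cj) / max(ci, cj) > max_diff:
        return 0.0
    return norm1(overlap, ci, cj)


# Hypothetical overlap of 40 contacts between proteins with 50 and 100 contacts.
print(norm1(40, 50, 100), norm2(40, 50, 100), norm3(40, 50, 100))
# Sizes too different (80% difference under the assumed measure): norm3 is 0.
print(norm3(10, 20, 100))
```

            <p>Note that <it>norm</it>1 and <it>norm</it>3 coincide whenever the size filter does not trigger, and that <it>norm</it>2 never exceeds <it>norm</it>1, since the mean of two contact counts is at least their minimum.</p>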
            <sec>
               <st>
                  <p>0.2.1 Experiments with Skolnick's dataset</p>
               </st>
               <p>For this experiment we again use Skolnick's dataset from <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp>, which assigns to every protein a "cluster" label corresponding to its family in the SCOP database <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. We performed an all-against-all comparison among the proteins using the best (MSVNS3) and the worst (MSVNS1) versions of our strategy, in the optimization sense. We then built a similarity matrix for each MSVNS configuration from the overlap values normalized with <it>Norm1, Norm2, Norm3</it>. Finally, we applied single and average linkage hierarchical clustering, as implemented in the R software package <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>, to evaluate whether the original grouping can be recovered from the overlap values.</p>
               <p>Both MSVNS1 and MSVNS3 recover the original grouping perfectly, regardless of the normalization and clustering algorithm used. Figure <figr fid="F3">3</figr> shows examples in which single and average linkage clustering are applied to the similarity matrix normalized with <it>Norm1</it>. For visualization purposes, the class number is displayed to the right of the protein name.</p>
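               <p>The clustering step can be illustrated with a naive agglomerative procedure over a normalized similarity matrix. The authors used the hierarchical clustering implementation in R; the pure-Python sketch below is only a stand-in, and it assumes that distances are obtained as 1 - similarity, which the text does not specify. The function name agglomerate and the toy matrix are hypothetical.</p>

```python
def agglomerate(sim, n_clusters, linkage="average"):
    """Naive agglomerative hierarchical clustering on a similarity matrix.

    sim is a symmetric matrix of normalized overlaps. Distances are taken as
    1 - similarity (an assumption). linkage is "single" (closest members) or
    "average" (mean over all cross-cluster pairs). Merging stops when
    n_clusters clusters remain; each cluster is a sorted list of indices.
    """
    if linkage not in ("single", "average"):
        raise ValueError("linkage must be 'single' or 'average'")
    n_clusters = max(1, n_clusters)
    n = len(sim)
    dist = [[1.0 - sim[i][j] for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]

    def link(a, b):
        d = [dist[i][j] for i in a for j in b]
        return min(d) if linkage == "single" else sum(d) / len(d)

    while len(clusters) > n_clusters:
        # Find the two clusters with the smallest linkage distance.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = link(clusters[x], clusters[y])
                if best is None or best[0] > d:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


# Hypothetical normalized overlaps for four proteins from two families.
similarity = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.15, 0.2],
    [0.2, 0.15, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
print(agglomerate(similarity, 2, "single"))   # [[0, 1], [2, 3]]
print(agglomerate(similarity, 2, "average"))  # [[0, 1], [2, 3]]
```

               <p>Stopping the merges at the number of SCOP families (5 for Skolnick's dataset) and comparing the resulting groups with the "cluster" labels is one way to check whether the original grouping is recovered.</p>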
               <fig id="F3">
                  <title>
                     <p>Figure 3</p>
                  </title>
                  <caption>
                     <p>Hierarchical Clustering based on the normalized overlap values (using <it>Norm</it>1) among proteins in Skolnick's dataset</p>
                  </caption>
                  <text>
                     <p>Hierarchical Clustering based on the normalized overlap values (using <it>Norm</it>1) among proteins in Skolnick's dataset. The upper dendrograms (a, b) correspond to single linkage clustering and the lower ones (c, d) to average linkage clustering. Dendrograms on the left (a, c) correspond to MSVNS1 results, while those on the right (b, d) correspond to MSVNS3 results.</p>
                  </text>
                  <graphic file="1471-2105-9-161-3"/>
               </fig>
               <p>This study shows that, on this dataset, our strategy can replicate the results obtained with exact methods, using a simpler strategy and less computational effort. Moreover, this experiment confirms that correct classification can be performed using non-exact MAX-CMO values. Both findings are important results <it>per se</it>.</p>
            </sec>
            <sec>
               <st>
                  <p>0.2.2 Experiments on Fischer's dataset</p>
               </st>
               <p>We performed a second experiment using Fischer's dataset (described in Table II of <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>), which is composed of 68 proteins covering several classes and folds. Table <tblr tid="T1">1</tblr> provides information about protein sizes.</p>
               <p>An all-against-all comparison was performed using MSVNS2 and DaliLite <abbrgrp><abbr bid="B31">31</abbr></abbrgrp> with default parameters. In this case, the objective is to analyze the ability of the methods to recognize structural similarities at the SCOP fold level.</p>
               <p>Figure <figr fid="F4">4</figr> shows the global ROC curves at the fold level, while Table <tblr tid="T5">5</tblr> shows the corresponding areas under the curve (AUC), calculated with SPSS 14.03&#169;. DaliLite achieved the highest AUC values. However, when the AUC values are broken down by fold, as shown in Fig. <figr fid="F5">5</figr>, two notable points emerge. First, DaliLite is not the best method for some folds; second, each normalization detects certain types of folds well while failing on others. For example, the IG fold is best detected with <it>Norm</it>1, whereas this measure gives the lowest AUC value for the TIM-barrel fold.</p>
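               <p>The nonparametric (empirical) AUC equals the Mann-Whitney statistic: the probability that a randomly chosen positive pair scores higher than a randomly chosen negative pair, with ties counted as one half. A minimal Python sketch follows; it assumes that a pair is labeled positive when both proteins belong to the same SCOP fold and that the score is one of the normalized overlaps or a DaliLite z-score. The function name and example values are hypothetical.</p>

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic.

    scores: similarity values (higher means more similar).
    labels: True when the pair shares the same fold (positive), else False.
    Ties between a positive and a negative score count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical normalized overlaps for six pairs (three same-fold pairs).
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.4]
labels = [True, True, True, False, False, False]
print(auc(scores, labels))
```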
               <fig id="F4">
                  <title>
                     <p>Figure 4</p>
                  </title>
                  <caption>
                     <p>ROC curves for every measure on Fischer's dataset at fold level</p>
                  </caption>
                  <text>
                     <p>ROC curves for every measure on Fischer's dataset at fold level.</p>
                  </text>
                  <graphic file="1471-2105-9-161-4"/>
               </fig>
               <tbl id="T5">
                  <title>
                     <p>Table 5</p>
                  </title>
                  <caption>
                     <p>Area Under the Curve (AUC) for each measure over Fischer's dataset at fold level. An all against all comparison is performed among the 68 proteins in the dataset.</p>
                  </caption>
                  <tblbdy cols="6">
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c cspan="2" ca="center">
                           <p>Asymptotic 95% Conf. Interval</p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c cspan="2">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Variable</p>
                        </c>
                        <c ca="center">
                           <p>Area</p>
                        </c>
                        <c ca="center">
                           <p>Std. Error<sup>(<it>a</it>)</sup></p>
                        </c>
                        <c ca="center">
                           <p>Asymptotic Sig.<sup>(<it>b</it>)</sup></p>
                        </c>
                        <c ca="center">
                           <p>Lower Bound</p>
                        </c>
                        <c ca="center">
                           <p>Upper Bound</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p><it>norm</it>1</p>
                        </c>
                        <c ca="center">
                           <p>0.795</p>
                        </c>
                        <c ca="center">
                           <p>0.016</p>
                        </c>
                        <c ca="center">
                           <p>0.000</p>
                        </c>
                        <c ca="center">
                           <p>0.765</p>
                        </c>
                        <c ca="center">
                           <p>0.826</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p><it>norm</it>2</p>
                        </c>
                        <c ca="center">
                           <p>0.805</p>
                        </c>
                        <c ca="center">
                           <p>0.016</p>
                        </c>
                        <c ca="center">
                           <p>0.000</p>
                        </c>
                        <c ca="center">
                           <p>0.774</p>
                        </c>
                        <c ca="center">
                           <p>0.836</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p><it>norm</it>3</p>
                        </c>
                        <c ca="center">
                           <p>0.797</p>
                        </c>
                        <c ca="center">
                           <p>0.016</p>
                        </c>
                        <c ca="center">
                           <p>0.000</p>
                        </c>
                        <c ca="center">
                           <p>0.765</p>
                        </c>
                        <c ca="center">
                           <p>0.830</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>
                              <it>DaliLiteZScore</it>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.933</p>
                        </c>
                        <c ca="center">
                           <p>0.011</p>
                        </c>
                        <c ca="center">
                           <p>0.000</p>
                        </c>
                        <c ca="center">
                           <p>0.912</p>
                        </c>
                        <c ca="center">
                           <p>0.954</p>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p><sup>(<it>a</it>) </sup>Under the nonparametric assumption</p>
                     <p><sup>(<it>b</it>) </sup>Null hypothesis: true area = 0.5</p>
                  </tblfn>
               </tbl>
               <fig id="F5">
                  <title>
                     <p>Figure 5</p>
                  </title>
                  <caption>
                     <p>AUC values for every measure on Fischer's dataset for every type of fold</p>
                  </caption>
                  <text>
                     <p>AUC values for every measure on Fischer's dataset for every type of fold.</p>
                  </text>
                  <graphic file="1471-2105-9-161-5"/>
               </fig>
               <p>Moreover, breaking the results down by the class of the first protein in each pair again yields interesting results, shown in Fig. <figr fid="F6">6</figr>; Table <tblr tid="T6">6</tblr> shows the corresponding areas under the curve (AUC) at the class level. DaliLite obtained the highest AUC value in only two (a/b, b) of the 5 classes; in the other cases, the highest value is obtained by one of the normalizations of the overlap returned by MSVNS. Of the 68 &#215; 68 = 4624 pairwise comparisons, DaliLite detected no similarity for 2800 pairs (60.6%). If we also count the pairs with a z-score &lt; 1, this number grows to 3844 (83.1%).</p>
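               <p>Breaking the AUC down by the class of the first protein in each pair only requires grouping the pairs before computing the area. The Python sketch below repeats the Mann-Whitney AUC so that it is self-contained; the group labels, the scores and the rule that a pair is positive when both proteins share the same class are assumptions made for illustration.</p>

```python
from collections import defaultdict


def auc(scores, labels):
    """Mann-Whitney estimate of the ROC AUC; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def auc_by_group(pairs):
    """AUC per group of the first protein.

    pairs: iterable of (group_of_first_protein, score, is_positive).
    Groups without both positive and negative pairs are skipped,
    because the AUC is undefined for them.
    """
    grouped = defaultdict(lambda: ([], []))
    for group, score, positive in pairs:
        scores, labels = grouped[group]
        scores.append(score)
        labels.append(positive)
    return {
        group: auc(scores, labels)
        for group, (scores, labels) in grouped.items()
        if any(labels) and not all(labels)
    }


# Hypothetical pairs: (class of first protein, normalized overlap, same class?).
pairs = [
    ("a", 0.9, True), ("a", 0.2, False), ("a", 0.5, False),
    ("b", 0.3, True), ("b", 0.6, False), ("b", 0.3, False),
    ("c", 0.7, True),
]
print(auc_by_group(pairs))  # {'a': 1.0, 'b': 0.25}
```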
               <fig id="F6">
                  <title>
                     <p>Figure 6</p>
                  </title>
                  <caption>
                     <p>AUC values for every measure on Fischer's dataset for every class</p>
                  </caption>
                  <text>
                     <p>AUC values for every measure on Fischer's dataset for every class.</p>
                  </text>
                  <graphic file="1471-2105-9-161-6"/>
               </fig>
               <tbl id="T6">
                  <title>
                     <p>Table 6</p>
                  </title>
                  <caption>
                     <p>Area Under the Curve (AUC) for each measure over Fischer's dataset at class level. An all against all comparison is performed among the 68 proteins in the dataset.</p>
                  </caption>
                  <tblbdy cols="6">
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c cspan="2" ca="center">
                           <p>Asymptotic 95% Conf. Interval</p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c cspan="2">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Variable</p>
                        </c>
                        <c ca="center">
                           <p>Area</p>
                        </c>
                        <c ca="center">
                           <p>Std. Error<sup>(<it>a</it>)</sup></p>
                        </c>
                        <c ca="center">
                           <p>Asymptotic Sig.<sup>(<it>b</it>)</sup></p>
                        </c>
                        <c ca="center">
                           <p>Lower Bound</p>
                        </c>
                        <c ca="center">
                           <p>Upper Bound</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p><it>norm</it>1</p>
                        </c>
                        <c ca="center">
                           <p>0.601</p>
                        </c>
                        <c ca="center">
                           <p>0.010</p>
                        </c>
                        <c ca="center">
                           <p>0.000</p>
                        </c>
                        <c ca="center">
                           <p>0.582</p>
                        </c>
                        <c ca="center">
                           <p>0.621</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p><it>norm</it>2</p>
                        </c>
                        <c ca="center">
                           <p>0.678</p>
                        </c>
                        <c ca="center">
                           <p>0.009</p>
                        </c>
                        <c ca="center">
                           <p>0.000</p>
                        </c>
                        <c ca="center">
                           <p>0.661</p>
                        </c>
                        <c ca="center">
                           <p>0.696</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p><it>norm</it>3</p>
                        </c>
                        <c ca="center">
                           <p>0,637</p>
                        </c>
                        <c ca="center">
                           <p>0,009</p>
                        </c>
                        <c ca="center">
                           <p>0.000</p>
                        </c>
                        <c ca="center">
                           <p>0.619</p>
                        </c>
                        <c ca="center">
                           <p>0.656</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>
                              <it>DaliLiteZScore</it>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.772</p>
                        </c>
                        <c ca="center">
                           <p>0.009</p>
                        </c>
                        <c ca="center">
                           <p>0.000</p>
                        </c>
                        <c ca="center">
                           <p>0.755</p>
                        </c>
                        <c ca="center">
                           <p>0.789</p>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p><sup>(<it>a</it>) </sup>Under the nonparametric assumption</p>
                     <p><sup>(<it>b</it>) </sup>Null hypothesis: true area = 0.5</p>
                  </tblfn>
               </tbl>
            </sec>
            <sec>
               <st>
                  <p>0.2.3 Experiments on Nh3D database</p>
               </st>
               <p>The last test uses the Nh3D v3.0 dataset <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> of structurally dissimilar proteins. This dataset was compiled by selecting well-resolved representatives from the Topology level of the CATH database, which were then pruned to remove domains that may contain homologous elements, internal duplications and regions with high B-factors.</p>
               <p>Our aim is to check if MSVNS2 can properly classify structures at CATH's architecture level. The database has 806 topology representatives belonging to 40 architectures. Table <tblr tid="T1">1</tblr> provides information about protein sizes.</p>
               <p>For each architecture comprising at least 10 topologies, we selected the smallest, the largest and an average-sized structure in terms of number of residues and number of contacts, plus one more structure chosen at random. After removing duplicates, we obtained a set of 73 structures that constitutes the query set [see Additional File <supplr sid="S1">1</supplr>]. Each query is then compared against every structure in the database. Comparisons are also performed with DaliLite <abbrgrp><abbr bid="B31">31</abbr></abbrgrp> and with MatAlign, a recently proposed method also based on distance matrices that is claimed to outperform DaliLite and CE <abbrgrp><abbr bid="B32">32</abbr></abbrgrp> in certain cases <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. For DaliLite we use the z-score as the similarity measure, while for MatAlign we use the raw score. For MSVNS2, we performed the analysis with each of the three normalization schemes proposed above. The number of internal restarts for MSVNS2 was fixed at 10 to limit the execution time.</p>
               <suppl id="S1">
                  <title>
                     <p>Additional File 1</p>
                  </title>
                  <text>
                     <p>Query set. This file contains the structures that constitute the query set for the experiment on the Nh3D dataset. For each architecture (comprising at least 10 topologies) we selected the smallest, biggest and average structure in terms of residues and number of contacts, plus another one randomly selected. After removing duplicates, this set of 73 structures was obtained.</p>
                  </text>
                  <file name="1471-2105-9-161-S1.xls">
                     <p>Click here for file</p>
                  </file>
               </suppl>
               <p>Figure <figr fid="F7">7</figr> displays the ROC curve for every method, while Table <tblr tid="T7">7</tblr> reports the corresponding area under the curve (AUC) values. Again, normalization is a key factor, and different normalizations may lead to different conclusions. With <it>Norm</it>1, the AUC value (0.608) is higher than that of DaliLite (0.596), although their 95% confidence intervals overlap. The other two normalizations obtain lower values (0.582 for <it>Norm</it>2 and 0.591 for <it>Norm</it>3), and MatAlign obtains the lowest value (0.538). It is important to recall that <it>Norm</it>1, <it>Norm</it>2 and <it>Norm</it>3 are all based on the overlap value returned by our strategy.</p>
               <fig id="F7">
                  <title>
                     <p>Figure 7</p>
                  </title>
                  <caption>
                     <p>ROC curves for every measure on NH3D dataset at CATH's architecture level</p>
                  </caption>
                  <text>
                     <p>ROC curves for every measure on NH3D dataset at CATH's architecture level.</p>
                  </text>
                  <graphic file="1471-2105-9-161-7"/>
               </fig>
               <tbl id="T7">
                  <title>
                     <p>Table 7</p>
                  </title>
                  <caption>
                     <p>Area Under the Curve (AUC) for each measure over the NH3D database. The experiment consisted of 73 queries over 806 domains. The analysis is performed at CATH's architecture level.</p>
                  </caption>
                  <tblbdy cols="6">
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c cspan="2" ca="center">
                           <p>Asymptotic 95% Conf. Interval</p>
                        </c>
                     </r>
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c cspan="2">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Variable</p>
                        </c>
                        <c ca="center">
                           <p>Area</p>
                        </c>
                        <c ca="center">
                           <p>Std. Error<sup>(<it>a</it>)</sup></p>
                        </c>
                        <c ca="center">
                           <p>Asymptotic Sig.<sup>(<it>b</it>)</sup></p>
                        </c>
                        <c ca="center">
                           <p>Lower Bound</p>
                        </c>
                        <c ca="center">
                           <p>Upper Bound</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p><it>norm</it>1</p>
                        </c>
                        <c ca="center">
                           <p>0.608</p>
                        </c>
                        <c ca="center">
                           <p>0.005</p>
                        </c>
                        <c ca="center">
                           <p>0.000</p>
                        </c>
                        <c ca="center">
                           <p>0.599</p>
                        </c>
                        <c ca="center">
                           <p>0.618</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p><it>norm</it>2</p>
                        </c>
                        <c ca="center">
                           <p>0.582</p>
                        </c>
                        <c ca="center">
                           <p>0.005</p>
                        </c>
                        <c ca="center">
                           <p>0.000</p>
                        </c>
                        <c ca="center">
                           <p>0.572</p>
                        </c>
                        <c ca="center">
                           <p>0.591</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p><it>norm</it>3</p>
                        </c>
                        <c ca="center">
                           <p>0.591</p>
                        </c>
                        <c ca="center">
                           <p>0.005</p>
                        </c>
                        <c ca="center">
                           <p>0.000</p>
                        </c>
                        <c ca="center">
                           <p>0.581</p>
                        </c>
                        <c ca="center">
                           <p>0.601</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>
                              <it>DaliLiteZScore</it>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.596</p>
                        </c>
                        <c ca="center">
                           <p>0.005</p>
                        </c>
                        <c ca="center">
                           <p>0.000</p>
                        </c>
                        <c ca="center">
                           <p>0.586</p>
                        </c>
                        <c ca="center">
                           <p>0.607</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>
                              <it>ScoreMatAlign</it>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.538</p>
                        </c>
                        <c ca="center">
                           <p>0.005</p>
                        </c>
                        <c ca="center">
                           <p>0.000</p>
                        </c>
                        <c ca="center">
                           <p>0.528</p>
                        </c>
                        <c ca="center">
                           <p>0.548</p>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p><sup>(<it>a</it>) </sup>Under the nonparametric assumption</p>
                     <p><sup>(<it>b</it>) </sup>Null hypothesis: true area = 0.5</p>
                  </tblfn>
               </tbl>
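               <p>As a minimal sketch of the quantity reported in Table 7 (not of the statistical software used to produce the table), an AUC value can be computed directly from similarity scores and binary labels as the Mann-Whitney statistic: the probability that a randomly chosen positive pair scores higher than a randomly chosen negative pair, with ties counting one half. The function name <it>auc</it> and the labeling convention (1 when the query and the database structure share the same CATH architecture, 0 otherwise) are assumptions made for illustration.</p>

```python
def auc(scores, labels):
    """Area under the ROC curve computed as the Mann-Whitney statistic.

    scores: similarity values returned by a method for each (query, target) pair.
    labels: 1 if the pair counts as a positive (assumed here: same CATH
    architecture), 0 otherwise.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative pair")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # a tie counts one half
    return wins / (len(positives) * len(negatives))


# Toy example with made-up scores (not data from the experiments).
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

               <p>The double loop is quadratic in the number of pairs, so for tens of thousands of comparisons a rank-based formulation of the same statistic would be preferable. The double loop is used here only because it makes the definition explicit.</p>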
               <p>If we plot separate ROC curves according to the architecture of the query, several interesting behaviors emerge. Some examples are displayed in Figure <figr fid="F8">8</figr>, where clear differences among methods arise as a function of the query's architecture.</p>
               <fig id="F8">
                  <title>
                     <p>Figure 8</p>
                  </title>
                  <caption>
                     <p>Examples of ROC curves to show how dependent the strategies are on the architecture type of the query</p>
                  </caption>
                  <text>
                     <p>Examples of ROC curves to show how dependent the strategies are on the architecture type of the query.</p>
                  </text>
                  <graphic file="1471-2105-9-161-8"/>
               </fig>
               <p>Figure <figr fid="F9">9</figr> displays the corresponding AUC values for every architecture, excluding those where all methods achieved AUC = 1. If AUC values are taken as a measure of how hard it is to detect similarity, then this "hardness" clearly differs from one scoring scheme to another. The figure also shows that no single algorithm outperforms the others for every query architecture.</p>
               <fig id="F9">
                  <title>
                     <p>Figure 9</p>
                  </title>
                  <caption>
                     <p>AUC values for every measure and type of query's architecture on NH3D dataset</p>
                  </caption>
                  <text>
                     <p>AUC values for every measure and type of query's architecture on NH3D dataset. None of the methods outperforms the others for all architecture types.</p>
                  </text>
                  <graphic file="1471-2105-9-161-9"/>
               </fig>
               <p>It should be noted that, out of a total of 73 &#215; 806 = 58838 pairwise comparisons, DaliLite detected no similarity for 43833 pairs (74.5%), which led to several false negatives. For example, for two of the seven queries belonging to architecture 4.10, DaliLite failed to return the query itself as the most similar structure in the database. When the process is repeated with MSVNS, the query itself receives the highest similarity value in all seven cases.</p>
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion and Conclusion</p>
         </st>
         <p>In this work we tested a straightforward implementation of VNS for the MAX-CMO problem that obtains encouraging results.</p>
         <p>From an optimization point of view, MSVNS obtained overlap values that were very close to the optimal ones, using a simpler strategy and less computational effort than exact algorithms.</p>
         <p>We see at least three ways to further improve our method: a) trying more specialized neighborhood structures, b) better tuning the parameter values, and c) starting the search from heuristically generated solutions. We also plan to add a preprocessing step, as DaliLite does, to avoid comparing structures that are very different. Moreover, due to its speed and simplicity, VNS may also be considered for obtaining lower bounds in the context of exact algorithms. An important element in several bioinformatics problems is the relation between the optimum value of the objective function and the biological relevance of the corresponding solution. In protein structure comparison, we should remember that we are dealing with a mathematical model that captures some aspects of the biological problem, and that protein structure similarity can be measured in several ways; for example, as many as 37 measures are reviewed in <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. Moreover, besides obtaining the highest overlap values, it is also critical to develop strategies able to produce a proper similarity ranking of proteins.</p>
         <p>Our experiments showed that, at the levels of SCOP's family and CATH's architecture, the (normalized) overlap values seem to be good enough to capture similarity. At the level of SCOP's fold, several points should be considered. Although DaliLite outperformed MSVNS2, this does not imply that our method performed badly. More research is needed, especially on normalization, because, as we mentioned, different normalization schemes may make the method more or less effective at detecting particular kinds of folds. We should also recall that all the experiments were done using contact maps with a fixed threshold, and a different threshold value may be needed to detect similarity at the fold level. Whether a strategy based on the contact map model can match the performance of DaliLite at detecting similarity at the fold level remains an open question.</p>
         <p>Finally, we should mention that the method has been accepted for incorporation into the ProCKSI-Server <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. ProCKSI is a decision support system for Protein (Structure) Comparison, Knowledge, Similarity and Information that computes structure similarities using information theory measures. ProCKSI links to a variety of other sources and uses additional methods to rectify and augment its similarity findings. Our MSVNS was chosen as the method to solve MAX-CMO due to its speed and accuracy.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>0.3 The Maximum Contact Map Overlap Problem</p>
            </st>
            <p>The Maximum Contact Map Overlap problem (MAX-CMO) is a mathematical model for comparing the similarity of two protein structures. Under this model, each protein is represented as a contact map (a binary matrix) in which two residues are said to be in contact if their Euclidean distance in 3D is below a certain threshold &#8476;. Figure <figr fid="F10">10</figr> shows two alternative representations of a contact map.</p>
            <fig id="F10">
               <title>
                  <p>Figure 10</p>
               </title>
               <caption>
                  <p>Two contact map representations: as binary matrix (left) and as a graph (right)</p>
               </caption>
               <text>
                  <p>Two contact map representations: as binary matrix (left) and as a graph (right).</p>
               </text>
               <graphic file="1471-2105-9-161-10"/>
            </fig>
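            <p>The contact map definition above can be sketched in Python as follows. The sketch assumes that each residue is represented by a single 3D point (for instance its C&#945; atom) and that the threshold &#8476; is given in the same units as the coordinates. The contact map is stored as a set of residue pairs (<it>i</it>, <it>j</it>) with <it>i</it> smaller than <it>j</it>, which corresponds to the graph view of Figure 10, and <it>as_matrix</it> produces the binary-matrix view. Both function names are hypothetical, and <it>math.dist</it> requires Python 3.8 or later.</p>

```python
import math


def contact_map(coords, threshold):
    """Return the contact map as a set of pairs (i, j) with i smaller than j.

    coords: list of (x, y, z) tuples, one point per residue (assumed C-alpha).
    Two residues are in contact when their Euclidean distance is below threshold.
    """
    contacts = set()
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            if threshold > math.dist(coords[i], coords[j]):
                contacts.add((i, j))
    return contacts


def as_matrix(contacts, n):
    """Binary-matrix view of the same contact map (symmetric, zero diagonal)."""
    matrix = [[0] * n for _ in range(n)]
    for i, j in contacts:
        matrix[i][j] = matrix[j][i] = 1
    return matrix


# Four toy residues (made-up coordinates) and a threshold of 5.0.
cm = contact_map([(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (3.8, 3.0, 0)], 5.0)
print(sorted(cm))  # [(0, 1), (0, 3), (1, 2), (1, 3), (2, 3)]
```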
            <p>A solution for two contact maps is an alignment of residues, i.e., a correspondence between residues in the first contact map and residues in the second one. Aligned (paired) residues are considered to be equivalent. The pairings are not allowed to cross: if there is a pairing <it>i </it>&#8596; <it>j </it>that aligns residues <it>i </it>&#8712; <it>P</it><sub>1</sub> and <it>j </it>&#8712; <it>P</it><sub>2</sub>, then no other pairing of the form <it>a </it>&#8596; <it>b</it> with <it>a </it>&#8805; <it>i</it> and <it>b </it>&#8804; <it>j </it>may exist at the same time. Because this condition applies to every pairing, it also prevents any residue from being paired more than once.</p>
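            <p>The non-crossing condition can be checked with a short sketch. The function name <it>is_feasible</it> and the representation of an alignment as a list of (<it>i</it>, <it>j</it>) tuples are illustrative assumptions. Once the pairings are sorted by their residue in <it>P</it><sub>1</sub>, both coordinates must strictly increase: an equal value means that a residue is paired twice, and a decrease in <it>P</it><sub>2</sub> means that two pairings cross.</p>

```python
def is_feasible(alignment):
    """Return True if the alignment, a list of (i, j) pairings with i in P1
    and j in P2, is one-to-one and has no crossing pairings."""
    pairs = sorted(alignment)
    for (i1, j1), (i2, j2) in zip(pairs, pairs[1:]):
        # Consecutive pairings must advance in both proteins.
        if not (i2 > i1 and j2 > j1):
            return False
    return True


print(is_feasible([(2, 1), (3, 2), (5, 3), (7, 6)]))  # True (alignment of Figure 11)
print(is_feasible([(2, 3), (3, 1)]))                  # False (crossing pairings)
```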
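            <p>In MAX-CMO, the value of an alignment between two proteins is given by the number of cycles of length four. Each such cycle is formed by a contact between residues <it>i</it> and <it>k</it> in <it>P</it><sub>1</sub>, a contact between residues <it>j</it> and <it>l</it> in <it>P</it><sub>2</sub>, and the two pairings <it>i </it>&#8596; <it>j</it> and <it>k </it>&#8596; <it>l</it>. This number is called the overlap of the contact maps, and the goal is to maximize it: the larger the overlap, the more similar the two proteins.</p>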
            <p>An example appears in Fig. <figr fid="F11">11</figr>, where two contact maps are shown. Residues 2, 3, 5 and 7 in the upper protein (<it>P</it><sub>1</sub>) are paired with residues 1, 2, 3 and 6 in the lower one (<it>P</it><sub>2</sub>). The alignment is represented by dotted lines, while the proteins' contacts are shown as solid lines.</p>
            <fig id="F11">
               <title>
                  <p>Figure 11</p>
               </title>
               <caption>
                  <p>An example of an alignment between two schematic proteins</p>
               </caption>
               <text>
                  <p>An example of an alignment between two schematic proteins. The overlap value is 2 because two cycles of length four arise. The first cycle is composed of the arcs (2 &#8712; <it>P</it><sub>1</sub>, 5 &#8712; <it>P</it><sub>1</sub>), (5 &#8712; <it>P</it><sub>1</sub>, 3 &#8712; <it>P</it><sub>2</sub>), (3 &#8712; <it>P</it><sub>2</sub>, 1 &#8712; <it>P</it><sub>2</sub>) and (1 &#8712; <it>P</it><sub>2</sub>, 2 &#8712; <it>P</it><sub>1</sub>). The second cycle has the following four arcs: (3 &#8712; <it>P</it><sub>1</sub>, 7 &#8712; <it>P</it><sub>1</sub>), (7 &#8712; <it>P</it><sub>1</sub>, 6 &#8712; <it>P</it><sub>2</sub>), (6 &#8712; <it>P</it><sub>2</sub>, 2 &#8712; <it>P</it><sub>2</sub>) and (2 &#8712; <it>P</it><sub>2</sub>, 3 &#8712; <it>P</it><sub>1</sub>).</p>
               </text>
               <graphic file="1471-2105-9-161-11"/>
            </fig>
            <p>This particular alignment produces two cycles of length four, so its overlap value is 2. The first cycle is composed of the arcs (2 &#8712; <it>P</it><sub>1</sub>, 5 &#8712; <it>P</it><sub>1</sub>), (5 &#8712; <it>P</it><sub>1</sub>, 3 &#8712; <it>P</it><sub>2</sub>), (3 &#8712; <it>P</it><sub>2</sub>, 1 &#8712; <it>P</it><sub>2</sub>) and (1 &#8712; <it>P</it><sub>2</sub>, 2 &#8712; <it>P</it><sub>1</sub>); that is, the contact (2, 5) in <it>P</it><sub>1</sub> is matched by the contact (1, 3) in <it>P</it><sub>2</sub> through the pairings 2 &#8596; 1 and 5 &#8596; 3. The second cycle has the following four arcs: (3 &#8712; <it>P</it><sub>1</sub>, 7 &#8712; <it>P</it><sub>1</sub>), (7 &#8712; <it>P</it><sub>1</sub>, 6 &#8712; <it>P</it><sub>2</sub>), (6 &#8712; <it>P</it><sub>2</sub>, 2 &#8712; <it>P</it><sub>2</sub>) and (2 &#8712; <it>P</it><sub>2</sub>, 3 &#8712; <it>P</it><sub>1</sub>), matching the contact (3, 7) in <it>P</it><sub>1</sub> with the contact (2, 6) in <it>P</it><sub>2</sub> through the pairings 3 &#8596; 2 and 7 &#8596; 6.</p>
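            <p>A minimal sketch of the overlap computation follows. It assumes contact maps are stored as sets of residue pairs and an alignment as a list of (<it>i</it>, <it>j</it>) pairings, and the function name <it>overlap</it> is hypothetical. Each contact (<it>i</it>, <it>k</it>) of <it>P</it><sub>1</sub> whose two residues are both aligned contributes one cycle of length four when their partners are also in contact in <it>P</it><sub>2</sub>. The toy input contains only the contacts named in the Figure 11 example.</p>

```python
def overlap(alignment, contacts1, contacts2):
    """Number of cycles of length four induced by the alignment.

    alignment: list of (i, j) pairings, i in P1 and j in P2.
    contacts1, contacts2: sets of residue pairs in contact in P1 and P2.
    """
    partner = dict(alignment)  # residue of P1 -> aligned residue of P2
    total = 0
    for i, k in contacts1:
        if i in partner and k in partner:
            a, b = partner[i], partner[k]
            # Contacts are unordered, so check both orientations in P2.
            if (a, b) in contacts2 or (b, a) in contacts2:
                total += 1
    return total


# Figure 11 example: residues 2, 3, 5, 7 of P1 aligned to 1, 2, 3, 6 of P2.
alignment = [(2, 1), (3, 2), (5, 3), (7, 6)]
print(overlap(alignment, {(2, 5), (3, 7)}, {(1, 3), (2, 6)}))  # 2
```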
         </sec>
         <sec>
            <st>
               <p>0.4 MultiStart VNS metaheuristic</p>
            </st>
            <p>The Variable Neighborhood Search (VNS) metaheuristic was presented in <abbrgrp><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr></abbrgrp>. It is essentially a local search method that systematically changes the neighborhood structure used during the search.</p>
            <p>VNS for MAX-CMO aims to find good solutions by adding and removing pairings using different strategies. The scheme of our proposal is shown in Algorithm 1.</p>
            <p>
               <b>Algorithm 1 Our MultiStart VNS algorithm</b>
            </p>
            <p><b>procedure </b><it>MSVNS</it>()</p>
            <p><b>1: for </b>(<it>start </it>= 0; <it>start </it>&lt;= <it>numStarts</it>; <it>start</it>++ <b>) do</b></p>
            <p><b>2: &#160;&#160;&#160;Initialization: Select the set of neighborhood structures </b><it>N</it><sub><it>k</it></sub>, <b>for </b><it>k </it>= 1, ..., <it>k</it><sub><it>max</it></sub>, <b>that will be used in the search; Find an initial solution </b><it>x</it>; <b>Choose a stopping condition;</b></p>
            <p>
               <b>3: &#160;&#160;&#160;repeat</b>
            </p>
            <p><b>4: &#160;&#160;&#160;&#160;&#160;&#160;Set </b><it>k </it>&#8592; 1;</p>
            <p>
               <b>5: &#160;&#160;&#160;&#160;&#160;&#160;repeat</b>
            </p>
            <p><b>6: &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;(a) Shaking: Generate a point </b><it>x' </it><b>at random from the </b><it>k</it><sup><it>th </it></sup><b>neighborhood of </b><it>x</it>(<it>x' </it>&#8712; <it>N</it><sub><it>k </it></sub>(<it>x</it>))<b>;</b></p>
            <p>
               <b>7: &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;(b) Local search: Apply some local search method with </b>
               <it>x' </it>
               <b>as initial solution; Denote with </b>
               <it>x" </it>
               <b>the so obtained local optimum;</b>
            </p>
            <p><b>8: &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;(c) Move or not: If the local optimum </b><it>x" </it><b>is better than the incumbent </b><it>x</it>, <b>move there (</b><it>x </it>&#8592; <it>x"</it><b>), and continue the search with </b><it>N</it><sub>1</sub>(<it>k </it>&#8592; 1)<b>; otherwise, set </b><it>k </it>&#8592; <it>k </it>+ 1<b>;</b></p>
            <p><b>9: &#160;&#160;&#160;&#160;&#160;&#160;until </b><it>k </it>= <it>k</it><sub><it>max</it></sub></p>
            <p>
               <b>10: &#160;&#160;&#160;&#160;&#160;&#160;simplify (</b>
               <it>x</it>
               <b>);</b>
            </p>
            <p>
               <b>11: &#160;&#160;&#160;until stop condition is met</b>
            </p>
            <p>
               <b>12: end for</b>
            </p>
            <p>A basic VNS algorithm usually follows the pattern shown in lines 2&#8211;11 (excluding line 10). Our algorithm extends the basic VNS with an extra loop for restarts (line 1) and a simplification step (the <it>simplify </it>function at line 10).</p>
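            <p>The control flow of Algorithm 1 can be sketched in Python as a generic skeleton. The problem-specific parts (initial solution, shaking neighborhoods, local search, <it>simplify</it> and objective) are passed in as functions. This is an illustration under stated assumptions, not the authors' implementation. All names are hypothetical, and the outer loop performs <it>num_starts</it> restarts, whereas the inclusive bound in Algorithm 1 gives one more. One "iteration" of the stopping criterion is taken to be one pass through lines 4&#8211;10, and every listed neighborhood is tried before the inner loop ends.</p>

```python
import random


def msvns(initial_solution, neighborhoods, local_search, simplify, value,
          num_starts=10, max_iters=100, max_no_improve=20, seed=None):
    """Multi-start VNS skeleton (maximization).

    initial_solution(rng) -> solution
    neighborhoods: list of shaking functions N_k(solution, rng) -> solution
    local_search(solution) -> solution; simplify(solution) -> solution
    value(solution) -> number to maximize (the overlap for MAX-CMO)
    """
    rng = random.Random(seed)
    best = None
    for _ in range(num_starts):                      # line 1 (assumed count)
        x = initial_solution(rng)                    # line 2
        iters = no_improve = 0
        # line 11: stop after max_iters iterations or after
        # max_no_improve consecutive iterations without improvement
        while iters != max_iters and no_improve != max_no_improve:
            iters += 1
            before = value(x)
            k = 0                                    # line 4 (0-based k)
            while k != len(neighborhoods):           # lines 5-9
                x1 = neighborhoods[k](x, rng)        # (a) shaking
                x2 = local_search(x1)                # (b) local search
                if value(x2) > value(x):             # (c) move or not
                    x, k = x2, 0
                else:
                    k += 1
            x = simplify(x)                          # line 10
            no_improve = 0 if value(x) > before else no_improve + 1
        if best is None or value(x) > value(best):
            best = x
    return best


# Toy usage: integer solutions, objective -|x - 7|, shaking by +1 or -1.
result = msvns(lambda rng: 10,
               [lambda x, rng: x + 1, lambda x, rng: x - 1],
               local_search=lambda x: x, simplify=lambda x: x,
               value=lambda x: -abs(x - 7), num_starts=2, seed=0)
print(result)  # 7
```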
            <p>The <it>simplify </it>function prevents solutions from becoming saturated and gives the different neighborhoods more chances to explore the solution space successfully. It deletes useless pairings, i.e., those that do not participate in any cycle of length four. For example, in Figure <figr fid="F12">12</figr> the pairings 2 &#8596; 1 and 6 &#8596; 7 belong to a cycle (shown in blue), while the pairing 3 &#8596; 6 will be deleted by the <it>simplify </it>function. Two neighborhood structures are used in the "Shaking" part of the VNS algorithm: a neighborhood that randomly moves a pairing (<it>neighborhoodMove</it>) and a neighborhood that adds a random pairing to the alignment (<it>neighborhoodAdd</it>).</p>
            <fig id="F12">
               <title>
                  <p>Figure 12</p>
               </title>
               <caption>
                  <p>Example of the simplify procedure &#8211; Red pairing, from 6 &#8712; <it>P</it><sub>1 </sub>to 3 &#8712; <it>P</it><sub>2 </sub>will be removed</p>
               </caption>
               <text>
                  <p>Example of the simplify procedure &#8211; Red pairing, from 6 &#8712; <it>P</it><sub>1 </sub>to 3 &#8712; <it>P</it><sub>2 </sub>will be removed.</p>
               </text>
               <graphic file="1471-2105-9-161-12"/>
            </fig>
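            <p>Under the same illustrative representation, <it>simplify</it> can be sketched as follows (function and argument names are hypothetical). A pairing survives only if its residue in the first contact map belongs to some aligned contact whose partners are also in contact in the second map. Because the removed pairings take part in no cycle, the overlap value is unchanged. The toy input follows the textual description of Figure 12. It assumes that the cycle shown in blue is formed by a contact (2, 6) in the first map and a contact (1, 7) in the second.</p>

```python
def simplify(alignment, contacts1, contacts2):
    """Remove pairings that do not take part in any cycle of length four.

    alignment: list of (i, j) pairings; contacts1/contacts2: sets of residue
    pairs in contact in the first and second contact maps.
    """
    partner = dict(alignment)            # residue of map 1 -> residue of map 2
    in_cycle = set()
    for i, k in contacts1:
        if i in partner and k in partner:
            a, b = partner[i], partner[k]
            if (a, b) in contacts2 or (b, a) in contacts2:
                in_cycle.update((i, k))  # both pairings close this cycle
    return sorted((i, j) for i, j in alignment if i in in_cycle)


# Figure 12 (assumed contacts): 3-6 is not in any cycle and is removed.
print(simplify([(2, 1), (3, 6), (6, 7)], {(2, 6)}, {(1, 7)}))  # [(2, 1), (6, 7)]
```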
            <p>The function <it>neighborhoodMove </it>moves a pairing as follows:</p>
            <p>1. It randomly chooses a pairing <it>pairCM1 </it>&#8596; <it>pairCM2</it>, where <it>pairCM1 </it>is the residue in contact map 1 and <it>pairCM2 </it>is the residue in contact map 2.</p>
            <p>2. It then finds the nearest paired residues to the left (<it>pairCM1Left</it>) and to the right (<it>pairCM1Right</it>) of <it>pairCM1</it>, together with the residues of contact map 2 to which they are paired (<it>pairCM2Left </it>and <it>pairCM2Right</it>, respectively).</p>
            <p>3. Once these two intervals are determined, the original <it>pairCM1 </it>&#8596; <it>pairCM2 </it>pair is replaced by a <it>pairCM1</it>' &#8596; <it>pairCM2</it>' pair where <it>pairCM1</it>' &#8712; [<it>pairCM1Left + 1, pairCM1Right - 1</it>] and <it>pairCM2</it>' &#8712; [<it>pairCM2Left + 1, pairCM2Right - 1</it>].</p>
            <p>This move always preserves the feasibility of the solution, because the new pairing lies strictly between the neighboring pairings in both contact maps. An example is shown in Fig. <figr fid="F13">13</figr>, where the pairing 3 &#8596; 5 can be moved to a pairing between any residue from 3 to 5 in the first contact map and any residue from 4 to 6 in the second contact map. In the example, the pairing 4 &#8596; 4 is chosen and the original 3 &#8596; 5 pairing is removed.</p>
            <fig id="F13">
               <title>
                  <p>Figure 13</p>
               </title>
               <caption>
                  <p>Example of neighborhoodMove procedure</p>
               </caption>
               <text>
                  <p>Example of neighborhoodMove procedure. A pairing to move is chosen and a feasibility region is identified (a). An alternative pairing is selected from such region and the original pairing is replaced (b).</p>
               </text>
               <graphic file="1471-2105-9-161-13"/>
            </fig>
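            <p>The three steps of <it>neighborhoodMove</it> can be sketched as follows (a hypothetical implementation, not the authors' code). With the pairings sorted by their residue in contact map 1, the nearest paired neighbors of the chosen pairing are simply the previous and next entries of the list. When one side has no paired residue, the sketch assumes virtual boundaries just outside the contact maps (-1 and the map size). Residues are indexed from 0.</p>

```python
import random


def neighborhood_move(alignment, n1, n2, rng):
    """Move one random pairing inside the gap left by its paired neighbors.

    alignment: feasible list of (i, j) pairings; n1, n2: contact map sizes.
    Returns a new feasible alignment with the same number of pairings.
    """
    pairs = sorted(alignment)
    if not pairs:
        return pairs
    idx = rng.randrange(len(pairs))                        # step 1
    # Step 2: nearest paired neighbors, with (assumed) virtual boundaries.
    left1, left2 = pairs[idx - 1] if idx else (-1, -1)
    right1, right2 = pairs[idx + 1] if idx + 1 != len(pairs) else (n1, n2)
    # Step 3: both intervals contain the old pairing, so they are never empty.
    new_pair = (rng.randint(left1 + 1, right1 - 1),
                rng.randint(left2 + 1, right2 - 1))
    return pairs[:idx] + [new_pair] + pairs[idx + 1:]


# Figure 13-like setting: 3-5 lies between the pairings 2-3 and 6-7.
print(neighborhood_move([(2, 3), (3, 5), (6, 7)], 10, 10, random.Random(1)))
```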
            <p>The function <it>neighborhoodAdd </it>adds a random pairing to the solution, proceeding as follows:</p>
            <p>1. It chooses an unpaired residue (<it>pairCM1</it>) at random from contact map 1.</p>
            <p>2. The algorithm finds <it>pairCM2Left </it>and <it>pairCM2Right </it>in the same way as <it>neighborhoodMove </it>does.</p>
            <p>3. Instead of just pairing <it>pairCM1 </it>with a residue between <it>pairCM2Left </it>and <it>pairCM2Right</it>, the range of possible pairings is expanded by the size of a <it>window</it>. The new pairing will be <it>pairCM1 </it>&#8596; <it>pairCM2' </it>where <it>pairCM2</it>' &#8712; [<it>pairCM2Left - window/2, pairCM2Right </it>+ <it>window/</it>2]. Window sizes are expressed as a percentage of the size of the larger contact map; e.g., a 10% <it>window </it>for two contact maps of sizes 80 and 100 results in a window of size 10 (10% of 100, the larger size).</p>
            <p>4. Since the added pairing may result in an infeasible solution, all conflicting preexisting pairings are deleted. In this way, this neighborhood adds a pairing and may also delete portions of the solution, raising the chances of reconstructing them in a better way. We note that the larger the <it>window </it>parameter, the higher the chance of clearing parts of the solution (thus making room for new pairings).</p>
            <p>Figure <figr fid="F14">14</figr> shows an example where the random residue chosen to be paired is the fourth residue of contact map 1. The feasibility restrictions only allow it to be paired with residue 6 of contact map 2, giving the result shown in (a). This pairing increases the overlap value by 1, by creating a cycle with the pairings 2 &#8596; 3 and 4 &#8596; 6. The effect of using a window can be seen in (b), where it is possible to obtain the pairing 4 &#8596; 3. Feasibility is then restored by removing the pairings 2 &#8596; 3 and 3 &#8596; 5.</p>
            <fig id="F14">
               <title>
                  <p>Figure 14</p>
               </title>
               <caption>
                  <p>Example of neighborhoodAdd procedure</p>
               </caption>
               <text>
                  <p>Example of neighborhoodAdd procedure. Residue 4 &#8712; <it>P</it><sub>2 </sub>is chosen to be aligned. The resulting pairing may be feasible, as shown in (a), or infeasible, as in (b). In the latter case, feasibility would be restored by deleting the pairings (3 &#8712; <it>P</it><sub>1</sub>, 2 &#8712; <it>P</it><sub>2</sub>) and (5 &#8712; <it>P</it><sub>1</sub>, 3 &#8712; <it>P</it><sub>2</sub>).</p>
               </text>
               <graphic file="1471-2105-9-161-14"/>
            </fig>
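            <p>A corresponding sketch of <it>neighborhoodAdd</it> is given below. It is again hypothetical, and <it>window_pct</it> is the window size as a percentage of the larger contact map. The sketch rests on one interpretive assumption. It reads step 3 as widening the feasible gap [<it>pairCM2Left</it> + 1, <it>pairCM2Right</it> - 1] by <it>window</it>/2 on each side, clipped to valid residues. Under this reading, a zero window reproduces the single feasible choice of Figure 14(a). Conflicting pairings are then deleted as in step 4.</p>

```python
import bisect
import random


def neighborhood_add(alignment, n1, n2, window_pct, rng):
    """Pair a random unpaired residue of map 1 and delete conflicting pairings.

    alignment: list of (i, j) pairings; n1, n2: contact map sizes;
    window_pct: window size as a percentage of max(n1, n2).
    """
    pairs = sorted(alignment)
    paired1 = {i for i, _ in pairs}
    free = [i for i in range(n1) if i not in paired1]
    if not free:
        return pairs
    i = rng.choice(free)                                   # step 1
    # Step 2: partners of the nearest paired residues left/right of i.
    pos = bisect.bisect(pairs, (i, -1))
    left2 = pairs[pos - 1][1] if pos else -1
    right2 = pairs[pos][1] if pos != len(pairs) else n2
    # Step 3 (assumed reading): widen the feasible gap by window/2 per side.
    half = int(window_pct * max(n1, n2) / 100) // 2
    lo = max(0, left2 + 1 - half)
    hi = min(n2 - 1, right2 - 1 + half)
    if lo > hi:              # empty gap and no window: nothing can be added
        return pairs
    j = rng.randint(lo, hi)
    # Step 4: keep only pairings lying entirely before or after (i, j).
    kept = [(a, b) for a, b in pairs
            if (i > a and j > b) or (a > i and b > j)]
    return sorted(kept + [(i, j)])


# Figure 14-like setting: only residue 4 of map 1 is unpaired, no window.
base = [(0, 0), (1, 1), (2, 3), (3, 5), (5, 7)]
print(neighborhood_add(base, 6, 10, 0, random.Random(0)))
# [(0, 0), (1, 1), (2, 3), (3, 5), (4, 6), (5, 7)]
```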
            <p>These two neighborhood structures are used to define the neighborhoods of the main VNS's loop. In this work, we propose the following configuration:</p>
            <p>&#8226; k = 1: <it>neighborhoodMove</it>.</p>
            <p>&#8226; k = 2: <it>neighborhoodAdd </it>with a small <it>window </it>size.</p>
            <p>&#8226; k = 3: <it>neighborhoodAdd </it>with a medium <it>window </it>size.</p>
            <p>&#8226; k = 4: <it>neighborhoodAdd </it>with a big <it>window </it>size.</p>
            <p>To keep the basic VNS properties and ideas, the neighborhoods based on <it>neighborhoodAdd </it>are always chosen with increasing <it>window </it>sizes as <it>k </it>increases. For example, the strategy MSVNS3 uses <it>small </it>= 10%, <it>medium </it>= 20% and <it>big </it>= 30%. In this case the search starts with <it>neighborhoodMove</it>. When VNS cannot improve the overlap value, <it>k </it>is incremented and the search continues with <it>neighborhoodAdd </it>and <it>window </it>= 10%. If it still fails to improve, the process is repeated with <it>neighborhoodAdd </it>using <it>window </it>= 20%. If necessary, VNS then tries <it>neighborhoodAdd </it>with <it>window </it>= 30%, and when this last neighborhood cannot improve the solution, a restart is produced. The local search part of the algorithm uses a different neighborhood structure: it loops from the first to the last residue of contact map 1 and tries to pair each one with every feasible residue of contact map 2, adding the first pairing that improves the current solution (in a greedy-like fashion).</p>
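            <p>The greedy local search described in the previous paragraph can be sketched as a single pass over contact map 1. Names are hypothetical, and the sketch assumes that only residues that are still unpaired are tried. For each such residue, the candidate partners are the residues of contact map 2 that keep the alignment feasible, and the first candidate that increases the overlap is accepted. The overlap helper is repeated so that the block is self-contained.</p>

```python
import bisect


def overlap(alignment, contacts1, contacts2):
    """Cycles of length four: aligned contacts of map 1 also present in map 2."""
    partner = dict(alignment)
    return sum(1 for i, k in contacts1
               if i in partner and k in partner
               and ((partner[i], partner[k]) in contacts2
                    or (partner[k], partner[i]) in contacts2))


def greedy_local_search(alignment, n1, n2, contacts1, contacts2):
    """First-improvement pass over the residues of contact map 1."""
    pairs = sorted(alignment)
    current = overlap(pairs, contacts1, contacts2)
    for i in range(n1):
        if any(a == i for a, _ in pairs):
            continue                          # residue already paired
        pos = bisect.bisect(pairs, (i, -1))
        left2 = pairs[pos - 1][1] if pos else -1
        right2 = pairs[pos][1] if pos != len(pairs) else n2
        for j in range(left2 + 1, right2):    # every feasible partner of i
            candidate = pairs[:pos] + [(i, j)] + pairs[pos:]
            value = overlap(candidate, contacts1, contacts2)
            if value > current:               # accept the first improvement
                pairs, current = candidate, value
                break
    return pairs


# Starting from part of the Figure 11 alignment, the pass adds the pairing 3-2.
start = [(2, 1), (5, 3), (7, 6)]
print(greedy_local_search(start, 9, 9, {(2, 5), (3, 7)}, {(1, 3), (2, 6)}))
# [(2, 1), (3, 2), (5, 3), (7, 6)]
```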
            <p>Finally, each run of the VNS method stops after one hundred iterations or after twenty iterations without improvement, whichever comes first.</p>
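            <p>The neighborhood schedule and this stopping rule can be combined into a generic, maximizing VNS driver. This is an illustrative sketch, not the authors' code, and it makes three assumptions. First, one iteration is one shake followed by local search. Second, "without improvement" means no improvement of the best solution found so far. Third, the restart is modelled by a hypothetical restart callable that returns a new starting solution. The names vns, neighborhoods, local_search, value and restart are illustrative.</p>

```python
import random
from typing import Any, Callable, List

Solution = Any
Shake = Callable[[Solution, random.Random], Solution]


def vns(start: Solution,
        neighborhoods: List[Shake],
        local_search: Callable[[Solution], Solution],
        value: Callable[[Solution], float],
        restart: Callable[[random.Random], Solution],  # hypothetical restart generator
        max_iters: int = 100,
        max_no_improve: int = 20,
        seed: int = 0) -> Solution:
    """Maximizing VNS: shake in neighborhood k, apply local search, then move or escalate k."""
    rng = random.Random(seed)
    current = best = local_search(start)
    k = 0                      # 0-based: neighborhoods[0] corresponds to k = 1 in the text
    no_improve = 0
    for _ in range(max_iters):                           # at most one hundred iterations
        candidate = local_search(neighborhoods[k](current, rng))
        if value(candidate) > value(current):
            current, k = candidate, 0                    # improvement: back to the first neighborhood
        else:
            k += 1                                       # failure: next, larger neighborhood
            if k == len(neighborhoods):                  # even the largest one failed: restart
                current, k = local_search(restart(rng)), 0
        if value(current) > value(best):
            best, no_improve = current, 0
        else:
            no_improve += 1
            if no_improve >= max_no_improve:             # twenty iterations without improvement
                break
    return best
```

            <p>For MSVNS3, the neighborhood list would contain, in order, <it>neighborhoodMove</it> followed by <it>neighborhoodAdd</it> with <it>window</it> = 10%, 20% and 30%. For example: [move, partial(add, window=0.10), partial(add, window=0.20), partial(add, window=0.30)], where move and add are hypothetical shake functions and partial comes from functools.</p>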
         </sec>
         <sec>
            <st>
               <p>0.5 Time Comparisons</p>
            </st>
            <p>Ideally, to compare the execution times of a set of algorithms, all of them should be compiled and run in the same computational environment. Because the source code of the exact algorithms is not available, we rely on their published results. For DaliLite, we ran the program on our local machine and measured the times directly.</p>
            <sec>
               <st>
                  <p>0.5.1 Times for Exact Methods</p>
               </st>
               <p>In the approach presented in <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>, the authors had to set time limits ranging from 4 seconds to 10&#8211;30 minutes or even more. Their strategy required a day and a half to solve 1233 pairs from Lancia's dataset to optimality, three days to reach 1997 instances, and nine days to reach all 2702 instances on a single workstation. As shown in Tables <tblr tid="T2">2</tblr> and <tblr tid="T4">4</tblr>, our worst version, MSVNS1, needs approximately 4 hours to process the 2702 pairs and reaches the optimum for 1259 of them. Unfortunately, execution times for Skolnick's dataset are not reported.</p>
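               <p>The figures in the preceding paragraph can be converted into average times per pair. This is a rough sketch that uses only the quoted totals. The two methods ran on different hardware, and one is exact while the other is a heuristic, so the comparison is indicative only. The totals for <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> are cumulative, so the averages grow as harder instances are included.</p>

```python
DAY, HOUR = 86_400, 3_600  # seconds

# (total wall-clock seconds, number of pairs), as quoted in the text
reported = {
    "[22], first 1233 pairs": (1.5 * DAY, 1233),
    "[22], first 1997 pairs": (3 * DAY, 1997),
    "[22], all 2702 pairs": (9 * DAY, 2702),
    "MSVNS1, all 2702 pairs": (4 * HOUR, 2702),
}


def per_pair(seconds: float, pairs: int) -> float:
    """Average seconds spent per pair."""
    return seconds / pairs


for label, (seconds, pairs) in reported.items():
    print(f"{label}: {per_pair(seconds, pairs):.1f} s/pair")
# [22], first 1233 pairs: 105.1 s/pair
# [22], first 1997 pairs: 129.8 s/pair
# [22], all 2702 pairs: 287.8 s/pair
# MSVNS1, all 2702 pairs: 5.3 s/pair
```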
               <p>For the approach presented in <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>, the code is not available either, so we estimate the times for Lancia's dataset from the following passage of the paper:</p>
               <p indent="1"><it>We ran our methods on a set of 269 proteins with 64 to 72 residues and 80 to 140 contacts each, using a contact threshold of 5 &#197;. For the B &amp; C approach we set a maximum time limit of 1 hour or 15 nodes in the search tree per instance. The heuristics were applied at every node, and were limited to at most five minutes per node</it>...</p>
               <p>They then continue:</p>
               <p indent="1">
                  <it>The same 597 pairs were then compared using the LR approach. For each instance we allowed a maximum running time of 1 minute. For all instances, the upper bound computed by LR was at least as good as that computed by B &amp; C, however, most of the times bounds were equal. Note that we are not finding the best lagrangian multipliers and hence, in principle, our upper bound may be worse than U1. By using the LR we then compared,..., all 36046 pairs. To speed up the computation, we only explored the root node of the search tree and we did not apply the greedy heuristic. Note that running the B&amp;C on all these instances, with a time limit of 1 hour/problem would have taken about four years</it>
               </p>
               <p>From these passages we conclude that B&amp;C cannot be used in practice on datasets of this size, and that heuristics such as LR are needed. Consequently, a different heuristic might find higher overlap values. The comparison time per pair can be estimated at approximately 4.8 seconds (48 hours, i.e. a weekend, divided by 36046 pairs). However, the performance of the "greedy heuristic" is not reported.</p>
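               <p>Both the 4.8 second estimate and the "about four years" figure quoted from <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> can be checked with a few lines of arithmetic. This sketch assumes, as the text does, that the LR runs over all 36046 pairs took 48 hours. It also assumes that branch-and-cut would use its full 1-hour limit on every instance, so the second number is a worst-case bound.</p>

```python
HOUR = 3_600    # seconds
PAIRS = 36_046  # pairs compared with LR in [20]

# LR: all pairs in about 48 hours ("a weekend")
lr_seconds_per_pair = 48 * HOUR / PAIRS

# Branch-and-cut with a 1-hour limit per problem, assuming every instance hits the limit
bc_years = PAIRS * HOUR / (365 * 24 * HOUR)

print(f"LR: {lr_seconds_per_pair:.1f} s/pair")            # LR: 4.8 s/pair
print(f"Branch-and-cut worst case: {bc_years:.1f} years")  # Branch-and-cut worst case: 4.1 years
```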
               <p>Although our times are slightly longer than 4.8 s, our contact maps use a threshold of 7 &#197; rather than the 5 &#197; used in Lancia's paper, and therefore contain more contacts per map. It is not clear how LR execution times would be affected by such an increase in the number of contacts.</p>
               <p>Moreover, the search time of our method grows linearly with one parameter, the number of internal repeats, so it can easily be reduced by setting this parameter to a low value. Of course, the results may then degrade in terms of optimization, although in terms of classification the impact is not significant.</p>
            </sec>
            <sec>
               <st>
                  <p>0.5.2 Times for DaliLite</p>
               </st>
               <p>To compare the execution times of DaliLite and MSVNS, we performed a simple experiment. We retrieved the five largest proteins, in terms of number of contacts, available in the Nh3D database (1.10.645, 1.20.210, 3.20.70, 3.60.120, 3.90.1300). We then ran an all-against-all comparison to filter out the pairs for which DaliLite cannot detect enough similarity to proceed.</p>
               <p>We then ran DaliLite, MSVNS1, MSVNS2 and MSVNS3 once on each of the eight remaining pairs, on the same machine and under the same conditions. The results, reported in Table <tblr tid="T8">8</tblr>, show that all three MSVNS variants are faster than DaliLite on this set (357&#8211;613 s versus 758 s).</p>
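               <p>The protocol (all-against-all pairs, followed by one timed run per method and pair on the same machine) can be outlined as follows. This is a hedged sketch, not the scripts used for the experiment. The comparison functions are hypothetical stand-ins; in practice they would be wrappers that invoke each program, for example through Python's subprocess module. The DaliLite-based filtering step is not modelled.</p>

```python
import itertools
import time
from typing import Callable, Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def all_against_all(ids: Sequence[str]) -> List[Pair]:
    """Every unordered pair of distinct structures."""
    return list(itertools.combinations(ids, 2))


def time_methods(pairs: List[Pair],
                 methods: Dict[str, Callable[[str, str], object]]) -> Dict[str, float]:
    """Run each method once per pair, sequentially, and return its total wall-clock time."""
    totals: Dict[str, float] = {}
    for name, compare in methods.items():
        start = time.perf_counter()
        for a, b in pairs:
            compare(a, b)  # hypothetical wrapper around one structure comparison
        totals[name] = time.perf_counter() - start
    return totals


# The five Nh3D entries named in the text give ten candidate pairs.
ids = ["1.10.645", "1.20.210", "3.20.70", "3.60.120", "3.90.1300"]
pairs = all_against_all(ids)
print(len(pairs))  # 10
```

               <p>Of these ten candidate pairs, the filtering step described in the text left eight, and those eight were timed.</p>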
               <tbl id="T8">
                  <title>
                     <p>Table 8</p>
                  </title>
                  <caption>
                     <p>Execution times for MSVNS and DaliLite. Times correspond to the 8 pairwise comparisons among the 5 largest proteins in the Nh3D database. All runs were performed on the same desktop computer.</p>
                  </caption>
                  <tblbdy cols="2">
                     <r>
                        <c ca="center">
                           <p>Method</p>
                        </c>
                        <c ca="right">
                           <p>Time (s)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="2">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>MSVNS1</p>
                        </c>
                        <c ca="right">
                           <p>357</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>MSVNS2</p>
                        </c>
                        <c ca="right">
                           <p>508</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>MSVNS3</p>
                        </c>
                        <c ca="right">
                           <p>613</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>DaliLite</p>
                        </c>
                        <c ca="right">
                           <p>758</p>
                        </c>
                     </r>
                  </tblbdy>
               </tbl>
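               <p>The values in Table <tblr tid="T8">8</tblr> can be expressed as speed-up factors relative to DaliLite. This small sketch only divides the reported numbers. Because each method was run once, the factors carry run-to-run noise, and the modest margin for MSVNS3 should be read with that in mind.</p>

```python
# Times (s) reported in Table 8 for the same 8 pairs on the same machine.
table8 = {"MSVNS1": 357, "MSVNS2": 508, "MSVNS3": 613, "DaliLite": 758}

speedup = {m: table8["DaliLite"] / t for m, t in table8.items() if m != "DaliLite"}
for method, s in speedup.items():
    print(f"{method}: {s:.2f}x faster than DaliLite")
# MSVNS1: 2.12x faster than DaliLite
# MSVNS2: 1.49x faster than DaliLite
# MSVNS3: 1.24x faster than DaliLite
```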
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>JRG and JMM designed and implemented the VNS strategy based on the ideas of DP's work <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. DP designed the experiments and JRG ran them. JRG and DP analyzed the results and wrote the paper. All authors have read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>This work is supported by projects HeuriCosc (TIN2005-08404-C04-01) and HeuriCode (TIN2005-08404-C04-03), both from the Spanish Ministry of Education and Science.</p>
            <p>JRG acknowledges financial support from Project TIC2002-04242-C03-02.</p>
            <p>The authors thank N. Krasnogor and the ProCKSI project (BB/C511764/1) for their support.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Mapping the Protein Universe</p>
            </title>
            <aug>
               <au>
                  <snm>Holm</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Sander</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1996</pubdate>
            <volume>273</volume>
            <fpage>595</fpage>
            <lpage>602</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8662544</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Protein Structure Similarities</p>
            </title>
            <aug>
               <au>
                  <snm>Koehl</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Current Opinion in Structural Biology</source>
            <pubdate>2001</pubdate>
            <volume>11</volume>
            <fpage>348</fpage>
            <lpage>353</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11406386</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Protein Structure Comparison: Algorithms and Applications</p>
            </title>
            <aug>
               <au>
                  <snm>Lancia</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Istrail</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Protein Structure Analysis and Design, of Lecture Notes in Bioinformatics</source>
            <publisher>Springer-Verlag</publisher>
            <editor>Guerra C, Istrail S</editor>
            <pubdate>2006</pubdate>
            <volume>2666</volume>
            <fpage>1</fpage>
            <lpage>33</lpage>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Protein Structure Comparison using iterated double dynamic programming</p>
            </title>
            <aug>
               <au>
                  <snm>Taylor</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Protein Science</source>
            <pubdate>1999</pubdate>
            <volume>8</volume>
            <fpage>654</fpage>
            <lpage>665</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2144286</pubid>
                  <pubid idtype="pmpid" link="fulltext">10091668</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Protein Structure Comparison by Alignment of Distance Matrices</p>
            </title>
            <aug>
               <au>
                  <snm>Holm</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Sander</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Journal of Molecular Biology</source>
            <pubdate>1993</pubdate>
            <fpage>123</fpage>
            <lpage>138</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8377180</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Optimal Protein Structure Alignment Using Maximum Cliques</p>
            </title>
            <aug>
               <au>
                  <snm>Strickland</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Barnes</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Sokol</snm>
                  <fnm>JS</fnm>
               </au>
            </aug>
            <source>Operations Research</source>
            <pubdate>2005</pubdate>
            <volume>53</volume>
            <issue>3</issue>
            <fpage>389</fpage>
            <lpage>402</lpage>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Multiple Structural Alignment and Core Detection by Geometric Hashing</p>
            </title>
            <aug>
               <au>
                  <snm>Leibowitz</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Fligerman</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Nussinov</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Wolfson</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Procs of 7th Intern. Conference on Intelligent Systems for Molecular Biology ISMB 99</source>
            <publisher>AAAI Press</publisher>
            <pubdate>1999</pubdate>
            <fpage>169</fpage>
            <lpage>177</lpage>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Protein structure similarity from principle component correlation analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Xiaobo Zhou</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <issue>40</issue>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1386710</pubid>
                  <pubid idtype="pmpid" link="fulltext">16436213</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>LGA: a method for finding 3D similarities in protein structures</p>
            </title>
            <aug>
               <au>
                  <snm>Zemla</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Nucl Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <issue>13</issue>
            <fpage>3370</fpage>
            <lpage>3374</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">168977</pubid>
                  <pubid idtype="pmpid" link="fulltext">12824330</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Finding Consensus Shape for a Protein Family</p>
            </title>
            <aug>
               <au>
                  <snm>Chew</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Kedem</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Proceedings of the 18th ACM Symp. on Computational Geometry. Barcelona, Spain</source>
            <publisher>ACM Press, New York</publisher>
            <pubdate>2002</pubdate>
            <fpage>64</fpage>
            <lpage>73</lpage>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Search for Structural Similarity in Proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Leluk</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Konieczny</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Roterman</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>117</fpage>
            <lpage>124</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12499301</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Measuring the Similarity of Protein Structures by Means of the Universal Similarity Metric</p>
            </title>
            <aug>
               <au>
                  <snm>Krasnogor</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Pelta</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <issue>7</issue>
            <fpage>1015</fpage>
            <lpage>1021</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">14751983</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>A fuzzy sets based generalization of contact maps for the overlap of protein structures</p>
            </title>
            <aug>
               <au>
                  <snm>Pelta</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Krasnogor</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Bousono-Calzon</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Verdegay</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Hirst</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Burke</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Journal of Fuzzy Sets and Systems</source>
            <pubdate>2005</pubdate>
            <volume>152</volume>
            <fpage>103</fpage>
            <lpage>123</lpage>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Representing and comparing protein structures as paths in three-dimensional space</p>
            </title>
            <aug>
               <au>
                  <snm>Zhi</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Krishna</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Cao</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Pevzner</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Godzik</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>460</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1626488</pubid>
                  <pubid idtype="pmpid" link="fulltext">17052359</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <aug>
               <au>
                  <snm>Bourne</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Weissig</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Structural Bioinformatics</source>
            <publisher>Wiley-Liss, Inc</publisher>
            <pubdate>2003</pubdate>
         </bibl>
         <bibl id="B16">
            <aug>
               <au>
                  <snm>Eidhammer</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Jonassen</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>WR</fnm>
               </au>
            </aug>
            <source>Protein Bioinformatics: An Algorithmic Approach to Sequence and Structure Analysis</source>
            <publisher>Wiley</publisher>
            <pubdate>2003</pubdate>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Algorithmic Aspects of Protein Structure Similarity</p>
            </title>
            <aug>
               <au>
                  <snm>Goldman</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Istrail</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Papadimitriou</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Proceedings of the 40th Annual IEEE Symposium on Foundations of Computer Science</source>
            <pubdate>1999</pubdate>
            <fpage>512</fpage>
            <lpage>522</lpage>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Structural Alignment of Large-Size Proteins via Lagrangian Relaxation</p>
            </title>
            <aug>
               <au>
                  <snm>Caprara</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lancia</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Proceedings of the sixth annual international conference on Computational Biology</source>
            <publisher>ACM</publisher>
            <pubdate>2002</pubdate>
            <fpage>100</fpage>
            <lpage>108</lpage>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Self-Generating Metaheuristics in Bioinformatics: The Proteins Structure Comparison Case</p>
            </title>
            <aug>
               <au>
                  <snm>Krasnogor</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>Genetic Programming and Evolvable Machines</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <issue>2</issue>
         </bibl>
         <bibl id="B20">
            <title>
               <p>1001 Optimal PDB Structure Alignments: Integer Programming Methods for Finding the Maximum Contact Map Overlap</p>
            </title>
            <aug>
               <au>
                  <snm>Caprara</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Carr</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Istrail</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lancia</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Walenz</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Journal of Computational Biology</source>
            <pubdate>2004</pubdate>
            <volume>11</volume>
            <fpage>27</fpage>
            <lpage>52</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15072687</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>A Branch-and-reduce algorithm for the contact map overlap problem</p>
            </title>
            <aug>
               <au>
                  <snm>Xie</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Sahinidis</snm>
                  <fnm>NV</fnm>
               </au>
            </aug>
            <source>Proceedings of RECOMB of Lecture Notes in Bioinformatics, Springer</source>
            <pubdate>2006</pubdate>
            <volume>3909</volume>
            <fpage>516</fpage>
            <lpage>529</lpage>
         </bibl>
         <bibl id="B22">
            <title>
               <p>A Reduction-based exact algorithm for the contact map overlap problem</p>
            </title>
            <aug>
               <au>
                  <snm>Xie</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Sahinidis</snm>
                  <fnm>NV</fnm>
               </au>
            </aug>
            <source>Journal of Computational Biology</source>
            <pubdate>2007</pubdate>
            <volume>14</volume>
            <issue>5</issue>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">17683265</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Studies on the Theory and Design Space of Memetic Algorithms</p>
            </title>
            <aug>
               <au>
                  <snm>Krasnogor</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>Ph.D. Thesis, Faculty of Computing, Mathematics and Engineering, University of the West of England, Bristol, United Kingdom</source>
            <pubdate>2002</pubdate>
         </bibl>
         <bibl id="B24">
            <title>
               <p>CMOS Online Server for Protein Structure Alignment via Contact Map Overlap Maximization</p>
            </title>
            <url>http://archimedes.cheme.cmu.edu/group/biosoftware.html</url>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Structural quality assurance</p>
            </title>
            <aug>
               <au>
                  <snm>Laskowski</snm>
                  <fnm>RA</fnm>
               </au>
            </aug>
            <source>Structural Bioinformatics</source>
            <publisher>Wiley-Liss, Inc</publisher>
            <editor>Bourne P, Weissig H</editor>
            <pubdate>2003</pubdate>
         </bibl>
         <bibl id="B26">
            <title>
               <p>101 optimal PDB structure alignments: a branch-and-cut algorithm for the maximum contact map overlap problem</p>
            </title>
            <aug>
               <au>
                  <snm>Lancia</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Carr</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Walenz</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Istrail</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>RECOMB '01: Proceedings of the fifth annual international conference on Computational biology</source>
            <publisher>ACM Press, New York</publisher>
            <pubdate>2001</pubdate>
            <fpage>193</fpage>
            <lpage>202</lpage>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Assessing the performance of fold recognition methods by means of a comprehensive benchmark</p>
            </title>
            <aug>
               <au>
                  <snm>Fischer</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Elofsson</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Rice</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Eisenberg</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Pacific Symp on Biocomputing</source>
            <pubdate>1996</pubdate>
            <fpage>300</fpage>
            <lpage>318</lpage>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Nh3D: A reference dataset of non-homologous protein structures</p>
            </title>
            <aug>
               <au>
                  <snm>Thiruv</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Quon</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Saldanha</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Steipe</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>BMC Structural Biology</source>
            <pubdate>2005</pubdate>
            <volume>5</volume>
            <issue>12</issue>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1182382</pubid>
                  <pubid idtype="pmpid" link="fulltext">16011803</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>SCOP: a structural classification of proteins database for the investigation of sequences and structures</p>
            </title>
            <aug>
               <au>
                  <snm>Murzin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Brenner</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hubbard</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Chothia</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Journal of Molecular Biology</source>
            <pubdate>1995</pubdate>
            <volume>247</volume>
            <fpage>536</fpage>
            <lpage>540</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">7723011</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <aug>
               <au>
                  <cnm>R Development Core Team</cnm>
               </au>
            </aug>
            <source>R: A Language and Environment for Statistical Computing</source>
            <publisher>R Foundation for Statistical Computing, Vienna, Austria</publisher>
            <pubdate>2006</pubdate>
            <url>http://www.R-project.org</url>
         </bibl>
         <bibl id="B31">
            <title>
               <p>DaliLite workbench for protein structure comparison</p>
            </title>
            <aug>
               <au>
                  <snm>Holm</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Park</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <issue>6</issue>
            <fpage>566</fpage>
            <lpage>567</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10980157</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Protein structure alignment by incremental combinatorial extension (CE) of the optimal path</p>
            </title>
            <aug>
               <au>
                  <snm>Shindyalov</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Bourne</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Protein Engineering</source>
            <pubdate>1998</pubdate>
            <volume>11</volume>
            <issue>9</issue>
            <fpage>739</fpage>
            <lpage>747</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9796821</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>MatAlign: Precise Protein Structure Comparison by Matrix Alignment</p>
            </title>
            <aug>
               <au>
                  <snm>Aung</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Tan</snm>
                  <fnm>KL</fnm>
               </au>
            </aug>
            <source>Journal of Bioinformatics and Computational Biology</source>
            <pubdate>2006</pubdate>
            <volume>4</volume>
            <issue>6</issue>
            <fpage>1197</fpage>
            <lpage>1216</lpage>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Towards More Meaningful Hierarchical Classification of Amino Acid Scoring Matrices</p>
            </title>
            <aug>
               <au>
                  <snm>May</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Proteins: Structure, Function and Genetics</source>
            <pubdate>1999</pubdate>
            <volume>37</volume>
            <fpage>20</fpage>
            <lpage>29</lpage>
         </bibl>
         <bibl id="B35">
            <title>
               <p>ProCKSI: a decision support system for Protein (Structure) Comparison, Knowledge, Similarity and Information</p>
            </title>
            <aug>
               <au>
                  <snm>Barthel</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Hirst</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Blazewicz</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Burke</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Krasnogor</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>416</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2222653</pubid>
                  <pubid idtype="pmpid" link="fulltext">17963510</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Developments in