<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2148-4-33</ui>
   <ji>1471-2148</ji>
   <fm>
      <dochead>Methodology article</dochead>
      <bibl>
         <title>
            <p>Reconstruction of ancestral protein sequences and its applications</p>
         </title>
         <aug>
            <au id="A1" ce="yes">
               <snm>Cai</snm>
               <fnm>Wei</fnm>
               <insr iid="I2"/>
               <email>wcai@biochem.swmed.edu</email>
            </au>
            <au id="A2" ce="yes">
               <snm>Pei</snm>
               <fnm>Jimin</fnm>
               <insr iid="I2"/>
               <email>jpei@chop.swmed.edu</email>
            </au>
            <au id="A3" ca="yes">
               <snm>Grishin</snm>
               <mi>V</mi>
               <fnm>Nick</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>grishin@chop.swmed.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Howard Hughes Medical Institute, University of Texas Southwestern Medical Center at Dallas, 5323 Harry Hines Blvd., Dallas, TX. 75390-9050, USA</p>
            </ins>
            <ins id="I2">
               <p>Department of Biochemistry, University of Texas Southwestern Medical Center at Dallas, 5323 Harry Hines Blvd., Dallas, TX. 75390-9050, USA</p>
            </ins>
         </insg>
         <source>BMC Evolutionary Biology</source>
         <issn>1471-2148</issn>
         <pubdate>2004</pubdate>
         <volume>4</volume>
         <issue>1</issue>
         <fpage>33</fpage>
         <url>http://www.biomedcentral.com/1471-2148/4/33</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">15377393</pubid>
               <pubid idtype="doi">10.1186/1471-2148-4-33</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>04</day>
               <month>6</month>
               <year>2004</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>17</day>
               <month>9</month>
               <year>2004</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>17</day>
               <month>9</month>
               <year>2004</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2004</year>
         <collab>Cai et al; licensee BioMed Central Ltd.</collab>
         <note>This is an open-access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Modern-day proteins are the products of a long evolutionary history, selected as descendants of proteins from ancient life forms. <it>In silico </it>reconstruction of such ancestral protein sequences facilitates our understanding of evolutionary processes, protein classification and biological function. Additionally, reconstructed ancestral protein sequences could fill in sequence space, thus aiding remote homology inference.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We developed ANCESCON, a package for distance-based phylogenetic inference and reconstruction of ancestral protein sequences. It takes into account the observed variation of evolutionary rates between positions, which describes the evolution of protein families more precisely. To improve the accuracy of evolutionary distance estimation and ancestral sequence reconstruction, two approaches are proposed to estimate position-specific evolutionary rates. Comparisons show that at large evolutionary distances our method gives more accurate ancestral sequence reconstruction than PAML, PHYLIP and PAUP*. We apply the reconstructed ancestral sequences to homology inference and functional site prediction. We show that using hypothetical ancestors together with present-day sequences improves profile-based sequence similarity searches, and that ancestral sequence reconstruction methods can be used to predict positions with functional specificity.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusions</p>
               </st>
               <p>As a computational tool to reconstruct ancestral protein sequences from a given multiple sequence alignment, ANCESCON shows high accuracy in tests and aids the detection of remote homologs and the prediction of functional sites. ANCESCON is freely available for non-commercial use. Pre-compiled versions for several platforms can be downloaded from <url>ftp://iole.swmed.edu/pub/ANCESCON/</url>.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Present-day protein sequences can be used to reconstruct ancestral sequences based on a model of sequence evolution. Such knowledge about ancestral sequences is helpful for understanding the evolutionary processes as well as the functional aspects of a protein family. Existing methods of ancestral sequence reconstruction can be divided into two main categories: Maximum Parsimony (MP) methods <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp> and Maximum Likelihood (ML) methods <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp>. MP methods do not take into account biased substitution patterns between amino acids or different tree branch lengths, and cannot distinguish among equally parsimonious reconstructions <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. ML methods do not have these limitations and generally give more reliable results than MP methods <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. Yang et al. <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> first developed an ML method for ancestral sequence reconstruction. Yang <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> also made a distinction between "joint" reconstruction and "marginal" reconstruction. Joint reconstruction methods aim to find the most likely set of amino acids for all internal nodes at a site, which yields the maximum joint likelihood of the tree <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Marginal reconstruction compares the probabilities of different amino acids at an internal node at a site and selects the amino acid that yields the maximum likelihood for the tree at that site. Marginal reconstruction can also compute probabilities of all other amino acids for that node <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. Koshi and Goldstein <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> developed a fast dynamic programming algorithm for marginal reconstruction in the framework of Bayesian statistics, while Pupko et al. <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> proposed a fast algorithm for joint reconstruction. The computational complexity of both algorithms scales linearly with the number of sequences. Both marginal and joint reconstruction algorithms are implemented in our program.</p>
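         <p>Marginal reconstruction can be illustrated for the simplest possible case, a star tree with a single internal node, where marginal and joint reconstruction coincide. The following is a minimal sketch only: it assumes an equal-rates (Poisson-type) 20-state substitution model and uniform equilibrium frequencies, neither of which is the model used in ANCESCON, and the function names are hypothetical.</p>

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def transition_prob(a, b, t):
    """P(a -> b) after branch length t under an assumed equal-rates 20-state model."""
    decay = math.exp(-20.0 * t / 19.0)
    if a == b:
        return 1.0 / 20.0 + (19.0 / 20.0) * decay
    return 1.0 / 20.0 - (1.0 / 20.0) * decay

def marginal_posterior(leaves):
    """Posterior over the residue at the single internal node for one site.

    leaves: list of (observed residue, branch length) pairs.
    """
    prior = 1.0 / len(AMINO_ACIDS)  # assumed uniform equilibrium frequencies
    weights = {}
    for a in AMINO_ACIDS:
        w = prior
        for residue, t in leaves:
            w *= transition_prob(a, residue, t)
        weights[a] = w
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

# Hypothetical site: three leaves with their branch lengths to the root.
site = [("L", 0.2), ("L", 0.3), ("I", 0.1)]
posterior = marginal_posterior(site)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

         <p>The residue with the highest posterior is reported as the reconstruction, and the remaining entries of the posterior give the probabilities of all other amino acids at that node.</p>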
         <p>All reconstruction methods require a phylogenetic tree inferred from a given alignment. The quality of the tree is crucial for the reliability of reconstruction. A number of methods exist for phylogenetic inference, such as maximum likelihood <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>, distance-based <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> and parsimony <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> methods. Distance-based methods have the advantage of being simple and are able to handle large sets of sequences. They require evolutionary distances estimated for all sequence pairs. The most common method to infer phylogeny from distances is based on the neighbor-joining algorithm <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. Bruno et al. <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> introduced a distance-based phylogeny reconstruction method called "Weighbor", i.e. "weighted neighbor joining", which takes into account the fact that errors in distance estimates are larger for longer distances. Weighbor gives results similar to those of ML phylogeny reconstruction but is much faster. It also performs better than other methods such as BIONJ <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> and parsimony <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> with respect to the "long branches attract" and "long branch distracts" problems <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. Weighbor is used in our program for phylogenetic inference.</p>
         <p>Overwhelming evidence exists for substitution rate variation across sites <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>. For a protein family, rate heterogeneity reflects the selective pressure imposed by folding, stability and function. The gamma distribution is widely used to model rate variation among sites <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp> because of its simplicity. Nielsen <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> suggested a method for site-by-site estimation of rate factors by a Maximum Likelihood approach. Rate variation among sites was not taken into account in early work on ML reconstruction of ancestral sequences <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp>. Recently, Pupko et al. <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> introduced rate variation into joint reconstruction by a branch-and-bound algorithm, assuming a gamma distribution of rates among sites. In our package, two methods are proposed to estimate a rate factor for each site. The first is based on our observation that the substitution rate at a site is correlated with the conservation of the site: the more conserved the site is in a multiple sequence alignment, the smaller its substitution rate. This empirical method, the result of which we call Alignment-Based rate factors or <it>&#945;</it><sub><it>AB</it></sub>, relies only on a multiple sequence alignment and a general model of amino acid exchange. The second is a maximum likelihood method (<it>&#945;</it><sub><it>ML</it></sub>), which requires a tree. In our implementation, we incorporate <it>&#945;</it><sub><it>AB </it></sub>or <it>&#945;</it><sub><it>ML </it></sub>in the joint and marginal reconstruction algorithms <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp>. <it>&#945;</it><sub><it>AB </it></sub>is also used in the Maximum Likelihood estimation of evolutionary distances <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> for tree inference.</p>
         <p>We implement a method of evolutionary simulation that introduces site-specific rate variations in a natural way by imposing structural and functional constraints <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. We show by simulations that the reconstruction methods can give reasonable results and that the problem of evolutionary distance underestimation <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> is alleviated by considering rate variation across sites.</p>
         <p>Background (or equilibrium) amino acid frequencies (<it>&#960;</it>) are usually estimated from the target set of sequences or from large databases of protein families. Background amino acid frequencies estimated from a small dataset tend to be biased, while amino acid frequencies from large databases may not be suitable for the specific protein family under analysis. Here, we propose an ML method to optimize the amino acid frequency vector <it>&#960;</it>. The optimized <it>&#960; </it>vector can significantly improve the likelihood of an alignment.</p>
         <p>Information obtained from ancestral sequence reconstruction is used for two applications: homology detection and prediction of functional sites. For homology detection, ancestral sequences represent an enlargement of the sequence space around native sequences. We demonstrate that adding reconstructed ancestral sequences to a native alignment improves the detection of homologs in database searches.</p>
         <p>A number of methods have been developed to predict functional sites from amino acid sequences <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>. One simple way to infer functional sites is by positional conservation in a multiple sequence alignment <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. Lichtarge et al. <abbrgrp><abbr bid="B26">26</abbr></abbrgrp> proposed a method called evolutionary trace to predict functional sites by analyzing the conservation of sequence subgroups. Functional divergence during the evolutionary process can be reflected in the variation of amino acid usage across different functional subgroups. We propose a new approach that uses information from ancestral sequence reconstruction to identify sites that are well conserved within individual sub-trees but exhibit variability among different sub-trees. Using several examples, we show that these sites frequently contribute to the functional specificity of a protein family.</p>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <p>We developed a package (ANCESCON) to reconstruct ancestral protein sequences considering rate variation among sites. Rate factors can be estimated either by an empirical method or by a maximum likelihood method. Consideration of rate variation among sites not only improves evolutionary distance estimation, but also gives more accurate ancestral sequence reconstruction. Ancestral sequences are used to improve profile-based sequence similarity searches. We also propose a new approach to predict positions with functional specificity based on the reconstruction of ancestral sequences.</p>
         <sec>
            <st>
               <p>Observed <it>&#945;</it>, Alignment Based Rate Factor <it>&#945; </it>(<it>&#945;</it><sub><it>AB</it></sub>) and Rate Factor <it>&#945; </it>estimated by Maximum Likelihood (<it>&#945;</it><sub><it>ML</it></sub>)</p>
            </st>
            <p>Evolutionary simulations based on a <it>Z</it>-score model introduce rate variation across sites in a natural way by incorporating structural and functional constraints specific for a protein family <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. The simulation procedure is a Monte Carlo simulation of the amino acid substitution process. The fixation of substitutions is dictated by a simple scoring function, which is derived from the template structure and an alignment of its homologs. The number of substitutions occurring at each site can be recorded during the simulation process and the observed <it>&#945; </it>at a site equals the number of recorded substitutions at that site divided by the average substitution number for all sites. To reduce sampling variance, an average observed <it>&#945; </it>vector is calculated from 100 simulations.</p>
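            <p>The definition of the observed <it>&#945; </it>given above can be written out as a short sketch. The substitution counts below are hypothetical, and the helper names are illustrative rather than part of ANCESCON.</p>

```python
def observed_alpha(substitution_counts):
    """Per-site rate factors: count at a site divided by the mean count over sites."""
    mean_count = sum(substitution_counts) / len(substitution_counts)
    return [count / mean_count for count in substitution_counts]

def average_vectors(vectors):
    """Element-wise mean over repeated simulations."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

# Hypothetical substitution counts at four sites from two simulation runs.
runs = [[2, 0, 6, 4], [4, 2, 2, 8]]
alphas = [observed_alpha(run) for run in runs]
avg = average_vectors(alphas)
print(avg)
```

            <p>Because each per-run vector has a mean of 1 by construction, the averaged vector also has a mean of 1, so sites with values below 1 evolve more slowly than average and sites above 1 evolve faster.</p>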
            <p>For the alignment consisting of all the leaf node sequences generated by the simulation process, an <it>&#945;</it><sub><it>AB </it></sub>vector was calculated according to equation (11) (for details see Methods). An average <it>&#945;</it><sub><it>AB </it></sub>vector was derived from 100 simulations. The correlation coefficient between the average <it>&#945;</it><sub><it>AB </it></sub>vector and the average observed <it>&#945; </it>vector was high (data not shown). However, we found that for large observed <it>&#945; </it>values, the corresponding <it>&#945;</it><sub><it>AB </it></sub>values were smaller. A constant <it>&#946; </it>was introduced to correct this underestimation in equation (11).</p>
            <p>
               <graphic file="1471-2148-4-33-i1.gif"/>
            </p>
            <p>Here, <it>&#945;</it><sub><it>i </it></sub>is the Alignment-Based rate factor at site <it>i</it>, <it>K </it>is the number of sites in a given alignment, and <it>C</it><sub><it>i </it></sub>is the value assigned to site <it>i </it>(for details see Methods).</p>
            <p>We optimized the <it>&#946; </it>value by fitting the average <it>&#945;</it><sub><it>AB </it></sub>vector and the average observed <it>&#945; </it>vector to the line <it>y </it>= <it>x</it>. Alignments for three different protein families (trypsin, carboxypeptidase and PDZ domain) gave a good empirical estimate for <it>&#946; </it>of about 1.3. The relation between this corrected average <it>&#945;</it><sub><it>AB </it></sub>vector and the average observed <it>&#945; </it>vector is shown in Figure <figr fid="F1">1a</figr> for a typical example, the PDZ domain (correlation coefficient 0.973).</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>a) Correlation between average <it>&#945;</it><sub><it>AB </it></sub>and average observed <it>&#945;</it>. b) Correlation between average <it>&#945;</it><sub><it>ML </it></sub>and average observed <it>&#945;</it></p>
               </caption>
               <text>
                  <p><b>a) Correlation between average <it>&#945;</it><sub><it>AB </it></sub>and average observed <it>&#945;</it>. b) Correlation between average <it>&#945;</it><sub><it>ML </it></sub>and average observed <it>&#945;</it>. </b><it>&#945;</it><sub><it>AB </it></sub>is the Alignment-Based rate factor, which depends solely on the given alignment. <it>&#945;</it><sub><it>ML </it></sub>is the rate factor estimated by the maximum likelihood method, which requires an alignment and an evolutionary tree inferred from the alignment. The protein family used here is the PDZ domain.</p>
               </text>
               <graphic file="1471-2148-4-33-1"/>
            </fig>
            <p>We also estimated an <it>&#945;</it><sub><it>ML </it></sub>vector for each alignment generated from the simulation (for details see Methods). The average <it>&#945;</it><sub><it>ML </it></sub>vector shows good correlation with the average observed <it>&#945; </it>vector (Figure <figr fid="F1">1b</figr>) (correlation coefficient 0.945). <it>&#945;</it><sub><it>AB </it></sub>or <it>&#945;</it><sub><it>ML </it></sub>can be incorporated into the likelihood calculation in marginal or joint reconstruction. Table <tblr tid="T1">1</tblr> shows that the improvement in the logarithm likelihood of the alignment is significant when <it>&#945;</it><sub><it>AB </it></sub>or <it>&#945;</it><sub><it>ML </it></sub>is used.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Difference of logarithm likelihood and CPU time when using different <it>&#945; </it>vectors</p>
               </caption>
               <tblbdy cols="8">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p><it>&#945; </it>= 1.0</p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>
                           <it>&#945;</it>
                           <sub>
                              <it>AB</it>
                           </sub>
                        </p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>
                           <it>&#945;</it>
                           <sub>
                              <it>ML</it>
                           </sub>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>&#916;<it>l</it></p>
                     </c>
                     <c ca="center">
                        <p>P*</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>&#916;<it>l</it></p>
                     </c>
                     <c ca="center">
                        <p>P*</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Logarithm Likelihood</p>
                     </c>
                     <c ca="center">
                        <p>-5324.56</p>
                     </c>
                     <c ca="center">
                        <p>-5087.72</p>
                     </c>
                     <c ca="center">
                        <p>236.84</p>
                     </c>
                     <c ca="center">
                        <p>&lt;0.0001</p>
                     </c>
                     <c ca="center">
                        <p>-4987.27</p>
                     </c>
                     <c ca="center">
                        <p>337.29</p>
                     </c>
                     <c ca="center">
                        <p>&lt;0.0001</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CPU Time (s)<sup>+</sup></p>
                     </c>
                     <c ca="center">
                        <p>213</p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>213</p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>359</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>The alignment tested here is a subset of SH2 family. It includes 44 sequences and each sequence contains 83 amino acids (including gaps).</p>
                  <p>* The likelihood ratio test (LRT) [58] is used to test whether <it>&#945;</it><sub><it>AB </it></sub>and <it>&#945;</it><sub><it>ML </it></sub>are significantly different from <it>&#945; </it>= 1.0. The difference in the number of free parameters between either the <it>&#945;</it><sub><it>AB </it></sub>or the <it>&#945;</it><sub><it>ML </it></sub>model and the <it>&#945; </it>= 1.0 model is 82.</p>
                  <p><sup>+ </sup>CPU times were computed on a Dell PowerEdge 8450 server (700 MHz CPU, 8 GB RAM).</p>
               </tblfn>
            </tbl>
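            <p>The significance levels in Table 1 can be checked with a short sketch of the likelihood ratio test. It assumes the standard asymptotic result that twice the difference in logarithm likelihood follows a chi-square distribution with degrees of freedom equal to the difference in free parameters (82 here). The helper names are hypothetical, and the closed-form tail probability used below is valid only for an even number of degrees of freedom.</p>

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability, closed form valid for even df only."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even number of degrees of freedom")
    half = x / 2.0
    term = 1.0
    total = 1.0
    # Sum of (x/2)^i / i! for i = 0 .. df/2 - 1, built up term by term.
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def lrt_p_value(delta_log_likelihood, df):
    """Likelihood ratio test: 2 * delta(lnL) compared with chi-square(df)."""
    return chi2_sf_even_df(2.0 * delta_log_likelihood, df)

# Differences in logarithm likelihood taken from Table 1.
p_ab = lrt_p_value(236.84, 82)
p_ml = lrt_p_value(337.29, 82)
print(p_ab, p_ml)
```

            <p>Both values fall far below the 0.0001 threshold reported in the table. For an odd number of degrees of freedom, a general chi-square routine from a statistics library would be needed instead of this closed form.</p>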
            <p>Rate variation across sites can be modeled by assuming that the rate factors follow a certain type of statistical distribution. The gamma distribution <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B27">27</abbr></abbrgrp> and its discrete approximations <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> are frequently used for DNA or protein sequences. Rate variation for a protein family reflects different selective pressure at different sites to maintain structure and function. Fewer substitutions are expected to occur at more conserved sites. This hypothesis has prompted us to estimate rate factors (<it>&#945;</it><sub><it>AB</it></sub>) empirically, based on sequence conservation. <it>&#945;</it><sub><it>AB </it></sub>is compared with, and calibrated against, the observed <it>&#945; </it>as the standard. Our method of estimating <it>&#945;</it><sub><it>ML </it></sub>is similar to the one proposed by Nielsen <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. One problem with site-by-site rate factor estimation is the small sample size at each site, especially with a small alignment. We have used <it>&#945;</it><sub><it>AB </it></sub>to eliminate outliers with very large <it>&#945;</it><sub><it>ML </it></sub>estimates (for details see Methods).</p>
         </sec>
         <sec>
            <st>
               <p>Site-specific rate factors improve distance estimation</p>
            </st>
            <p>Evolutionary distances tend to be underestimated when rate homogeneity among sites is assumed <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. This was tested using the simulation with structural and functional constraints. For the arbitrarily selected tree shown in Figure <figr fid="F2">2</figr>, we obtained leaf node sequences in the simulation and estimated an evolutionary distance for each sequence pair by Maximum Likelihood, either incorporating <it>&#945;</it><sub><it>AB </it></sub>or setting <it>&#945; </it>equal to 1.0 (equation (16)). Evolutionary distances were severely underestimated (average underestimation: 0.894) without considering rate variation among sites (Figure <figr fid="F3">3a</figr>). Introducing <it>&#945;</it><sub><it>AB </it></sub>in the maximum likelihood method gave more accurate distance estimation (Figure <figr fid="F3">3b</figr>), although the distances were still underestimated, especially for small distances (average underestimation: 0.286). We believe that more accurate distances will give more accurate phylogeny reconstruction using "Weighbor" <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. Since a tree is required to estimate <it>&#945;</it><sub><it>ML</it></sub>, <it>&#945;</it><sub><it>ML </it></sub>is not incorporated in estimating evolutionary distance.</p>
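            <p>Why ignoring rate variation leads to underestimated distances can be seen in a deliberately simplified sketch. It assumes a Poisson model in which a site with rate factor <it>&#945; </it>differs between two sequences with probability 1 - exp(-<it>&#945;d</it>), which is not the substitution model or equation (16) used in ANCESCON. The rate factors are hypothetical.</p>

```python
import math

def homogeneous_distance(true_distance, rate_factors):
    """Distance recovered when rate variation is ignored (single-rate Poisson correction)."""
    # Expected fraction of differing sites, averaged over the site-specific rates.
    p_diff = sum(1.0 - math.exp(-a * true_distance) for a in rate_factors) / len(rate_factors)
    # Poisson correction applied to the pooled fraction as if all sites shared one rate.
    return -math.log(1.0 - p_diff)

rates = [0.1, 0.5, 1.0, 2.4]  # hypothetical rate factors with mean 1.0
for d in (0.5, 1.0, 2.0):
    print(d, round(homogeneous_distance(d, rates), 3))
```

            <p>By Jensen's inequality the recovered distance is always smaller than the true distance whenever the rate factors differ between sites, and the two agree only when all rate factors are equal.</p>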
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>The tree used to test ancestral sequence reconstruction</p>
               </caption>
               <text>
                  <p><b>The tree used to test ancestral sequence reconstruction. </b>This is an arbitrarily selected evolutionary tree. Evolutionary distances are shown to scale.</p>
               </text>
               <graphic file="1471-2148-4-33-2"/>
            </fig>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Comparison of pairwise distances between the rebuilt tree and original tree. a) distance estimation assuming no rate variation among sites; b) distance estimation with <it>&#945;</it><sub><it>AB</it></sub></p>
               </caption>
               <text>
                  <p><b>Comparison of pairwise distances between the rebuilt tree and original tree. a) distance estimation assuming no rate variation among sites; b) distance estimation with <it>&#945;</it><sub><it>AB</it></sub>. </b>The rebuilt tree is inferred from the alignment that is generated by evolutionary simulation performed on the original tree. The original tree is arbitrarily selected.</p>
               </text>
               <graphic file="1471-2148-4-33-3"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Optimization of equilibrium frequencies</p>
            </st>
            <p>A continuous minimization method based on simulated annealing was used to optimize the equilibrium frequency vector <it>&#960;</it>, with the logarithm likelihood of the alignment as the objective function. Our <it>&#960; </it>vector optimization program was tested on four alignments, which were taken from the SH2 and SH3 superfamilies in the Pfam database (version 7.3) <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. The two alignments from the SH2 superfamily have 44 and 87 sequences, respectively, and both are 83 amino acids long (including gaps). The two alignments from the SH3 superfamily have 39 and 94 sequences, respectively, and both are 57 amino acids long (including gaps). For each alignment, we ran the optimization three times, starting from different random initial points. The optimized <it>&#960; </it>vectors did not converge to exactly the same point, but they were highly correlated with each other (correlation always > 0.95) and the difference between the logarithm likelihood values was small (less than 0.1%). The logarithm likelihood of the alignment obtained with the optimized <it>&#960; </it>vector increased slightly, but significantly (Table <tblr tid="T2">2</tblr>), compared with the logarithm likelihood obtained with the <it>&#960; </it>vector calculated from the alignment.</p>
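            <p>The general idea of optimizing a frequency vector by simulated annealing can be sketched as follows. This is an illustration under stated assumptions, not the ANCESCON procedure: the objective is a toy multinomial logarithm likelihood of hypothetical residue counts rather than the likelihood of an alignment on a tree, and the proposal move, cooling schedule and function names are all choices made for this example.</p>

```python
import math
import random

def log_likelihood(pi, counts):
    """Toy objective: multinomial log-likelihood of residue counts given pi."""
    return sum(c * math.log(p) for c, p in zip(counts, pi))

def anneal_frequencies(counts, steps=20000, t_start=1.0, t_end=0.001, seed=0):
    rng = random.Random(seed)
    n = len(counts)
    pi = [1.0 / n] * n                          # start from uniform frequencies
    current = log_likelihood(pi, counts)
    best_pi, best = list(pi), current
    for step in range(steps):
        # Geometric cooling from t_start towards t_end.
        temperature = t_start * (t_end / t_start) ** (step / steps)
        i, j = rng.sample(range(n), 2)
        shift = rng.uniform(0.0, 0.5 * pi[j])   # keeps pi[j] positive; the sum stays 1
        candidate = list(pi)
        candidate[i] += shift
        candidate[j] -= shift
        proposed = log_likelihood(candidate, counts)
        # Metropolis rule: always accept improvements, sometimes accept losses.
        if proposed >= current or math.exp((proposed - current) / temperature) > rng.random():
            pi, current = candidate, proposed
            if current > best:
                best_pi, best = list(pi), current
    return best_pi, best

# Hypothetical counts for the 20 amino acids (all positive).
counts = [12, 3, 7, 9, 5, 11, 2, 8, 10, 14, 4, 6, 5, 6, 7, 9, 8, 10, 1, 3]
start = log_likelihood([1.0 / len(counts)] * len(counts), counts)
best_pi, best = anneal_frequencies(counts)
print(round(start, 2), round(best, 2))
```

            <p>A vector of 20 frequencies constrained to sum to 1 has 19 free parameters, which matches the difference in the number of free parameters quoted for the likelihood ratio test in Table 2.</p>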
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Difference of logarithm likelihood and CPU time with and without optimization of <it>&#960; </it>vector</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p><it>&#945;</it><sub><it>AB </it></sub>&amp; Calculated <it>&#960;</it></p>
                     </c>
                     <c ca="center">
                        <p><it>&#945;</it><sub><it>AB </it></sub>&amp; Optimized <it>&#960;</it></p>
                     </c>
                     <c ca="center">
                        <p>&#916;<it>l</it></p>
                     </c>
                     <c ca="center">
                        <p>P*</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Logarithm Likelihood</p>
                     </c>
                     <c ca="center">
                        <p>-5087.72</p>
                     </c>
                     <c ca="center">
                        <p>-5055.97</p>
                     </c>
                     <c ca="center">
                        <p>31.75</p>
                     </c>
                     <c ca="center">
                        <p>&lt;0.0001</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CPU Time (s)<sup>+</sup></p>
                     </c>
                     <c ca="center">
                        <p>213</p>
                     </c>
                     <c ca="center">
                        <p>14902</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>The alignment tested here is the same alignment used in Table 1. Calculated <it>&#960; </it>means frequency vector calculated from the alignment.</p>
                  <p>* The likelihood ratio test (LRT) [58] is used to test whether optimized <it>&#960; </it>is significantly different from calculated <it>&#960;</it>. The difference in number of free parameters between these two models is 19.</p>
                  <p><sup>+</sup>CPU times were measured on a Dell PowerEdge 8450 server (700 MHz CPU, 8 GB RAM).</p>
               </tblfn>
            </tbl>
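            <p>The P value reported in Table <tblr tid="T2">2</tblr> can be checked with a short calculation. The following is a minimal sketch, assuming the usual form of the likelihood ratio test, in which twice the log-likelihood difference is compared against a chi-square distribution with 19 degrees of freedom. The function name chi2_sf_odd_df is hypothetical, and the closed-form series it uses applies only to odd degrees of freedom.</p>

```python
import math


def chi2_sf_odd_df(x, df):
    """Upper-tail probability of a chi-square variable with odd df.

    Uses the closed-form series for odd degrees of freedom:
    Q = erfc(sqrt(x / 2)) + 2 * Z(chi) * sum_r chi^(2r-1) / (2r-1)!!
    where chi = sqrt(x), Z is the standard normal density and
    r runs from 1 to (df - 1) / 2.
    """
    if df % 2 != 1 or 1 > df:
        raise ValueError("this closed form only covers odd degrees of freedom")
    chi = math.sqrt(x)
    term = chi  # chi^(2r-1) / (2r-1)!! for r = 1
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)  # advance the term from r to r + 1
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    return math.erfc(chi / math.sqrt(2.0)) + 2.0 * density * total


# Log-likelihood values taken from Table 2.
log_l_calculated = -5087.72  # pi vector calculated from the alignment
log_l_optimized = -5055.97   # pi vector optimized
delta_l = log_l_optimized - log_l_calculated

# Assumption: the test statistic is twice the log-likelihood difference.
statistic = 2.0 * delta_l
p_value = chi2_sf_odd_df(statistic, 19)
```

            <p>Under these assumptions the tail probability falls well below 0.0001, in agreement with the entry in the table.</p>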
            <p>Optimization of the <it>&#960; </it>vector is time-consuming. For the SH2 alignment (44 sequences), reconstruction took 14,902 seconds with optimization of the <it>&#960; </it>vector and 213 seconds without it on a Dell PowerEdge 8450 server (700 MHz CPU, 8 GB RAM) (Table <tblr tid="T2">2</tblr>). By default, our program calculates the <it>&#960; </it>vector from the alignment; the user has the option to optimize the <it>&#960; </it>vector for ancestral sequence reconstruction.</p>
         </sec>
         <sec>
            <st>
               <p>Testing reconstruction</p>
            </st>
            <p>Two different simulation methods, described in Methods, were used to test the reliability of the reconstruction results. In the first simulation method, starting from a randomly generated root sequence, we simulated the evolutionary process along a given tree with a given rate matrix to obtain the leaf node sequences. This process was repeated 100 times for a given root sequence <it>R </it>to produce 100 alignments, each consisting of all leaf node sequences. For each of the 100 alignments, we used the marginal reconstruction method to obtain an amino acid probability vector for each site at the root. To reduce sampling variance, the amino acid probability vectors were averaged over the 100 simulation trials. At each site, the amino acid with the highest average probability was taken as the "reconstructed amino acid" at that site. Together, the "reconstructed amino acids" formed the reconstructed sequence <it>R'</it>. For the tree shown in Figure <figr fid="F2">2</figr>, there was no difference between <it>R </it>and <it>R'</it>, that is, the accuracy of reconstruction was 100%. For each individual simulation and its reconstruction, we also checked which amino acid had the highest probability in the reconstructed probability vector of the root. If it was the "reconstructed amino acid", the prediction for that simulation was counted as correct according to the average reconstructed results. The fraction of individual predictions that were correct in this sense was almost always higher than the average probability of the "reconstructed amino acid", suggesting that the average probability of the "reconstructed amino acid" gives a conservative (lower) estimate of the reconstruction reliability (Figure <figr fid="F4">4a</figr>).</p>
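            <p>The averaging procedure described above can be sketched as follows. This is an illustration only, not the ANCESCON implementation: the function names and the toy probability vectors are hypothetical, and each per-site vector is represented as a dictionary from amino acid to marginal probability.</p>

```python
def average_vectors(trial_vectors):
    """Average per-site probability vectors over all trials.

    trial_vectors[t][s] is a dict mapping each amino acid to its marginal
    probability at root site s in simulation trial t.
    """
    n_trials = len(trial_vectors)
    n_sites = len(trial_vectors[0])
    averaged = []
    for s in range(n_sites):
        site_avg = {}
        for trial in trial_vectors:
            for aa, p in trial[s].items():
                site_avg[aa] = site_avg.get(aa, 0.0) + p / n_trials
        averaged.append(site_avg)
    return averaged


def best_residue(vector):
    """Amino acid with the highest probability in one vector."""
    return max(vector, key=vector.get)


def summarize(trial_vectors):
    """Per site: (reconstructed amino acid, its average probability,
    fraction of individual trials whose top residue agrees with it)."""
    averaged = average_vectors(trial_vectors)
    summary = []
    for s, site_avg in enumerate(averaged):
        call = best_residue(site_avg)
        agree = sum(1 for trial in trial_vectors if best_residue(trial[s]) == call)
        summary.append((call, site_avg[call], agree / len(trial_vectors)))
    return summary


# Hypothetical toy input: 3 trials, 2 root sites, 2 candidate residues each.
toy_trials = [
    [{"A": 0.7, "G": 0.3}, {"L": 0.6, "I": 0.4}],
    [{"A": 0.6, "G": 0.4}, {"L": 0.4, "I": 0.6}],
    [{"A": 0.8, "G": 0.2}, {"L": 0.7, "I": 0.3}],
]
toy_summary = summarize(toy_trials)
```

            <p>In this toy input the second site illustrates the pattern discussed in the text: two of the three trials agree with the averaged call, while the average probability of that call is lower than two thirds.</p>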
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>a) Correlation between the average probability of "the reconstructed amino acid" and the fraction of correct predictions. b) Correlation between the fraction of correct predictions and average <it>&#945;</it><sub><it>AB </it></sub>at each site</p>
               </caption>
               <text>
                  <p><b>a) Correlation between the average probability of "the reconstructed amino acid" and the fraction of correct predictions. b) Correlation between the fraction of correct predictions and average <it>&#945;</it><sub><it>AB </it></sub>at each site. </b>The protein family used here is the PDZ domain. Red filled points are sites with incorrect reconstruction.</p>
               </text>
               <graphic file="1471-2148-4-33-4"/>
            </fig>
            <p>For the second simulation method, we introduced rate heterogeneity across sites with structural and functional constraints <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. For the same tree, the accuracy of reconstruction was about 90%. Sites with larger substitution rates are expected to have less reliable reconstructions. Figure <figr fid="F4">4b</figr> shows the relationship between the average <it>&#945;</it><sub><it>AB </it></sub>and the fraction of individual predictions that are correct according to the "reconstructed amino acid". Sites with incorrect "reconstructed amino acids" all have large <it>&#945;</it><sub><it>AB </it></sub>values. These values reflect the difficulty of reconstructing sites with large numbers of substitutions. The probabilities of the "reconstructed amino acids" are all small for sites with incorrect reconstructions (less than 0.15), suggesting that the information content of the reconstruction is low.</p>
            <p>The second simulation method was also used to test ANCESCON along with the reconstruction programs from PAML <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>, PHYLIP <abbrgrp><abbr bid="B31">31</abbr></abbrgrp> and PAUP* <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. All tree topologies used in the reconstruction tests were inferred from real alignments, and all original root sequences were taken from the PDB database <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. We used three types of alignment testing sets. The first testing set used the same tree topology but different root sequences to generate 100 alignments (for details see Methods). The second testing set used the same root sequence but different tree topologies. The third testing set randomly selected a root sequence and a tree topology to generate 100 alignments. After the 100 alignments were generated, we reconstructed the root sequence for each alignment and derived the consensus of the 100 reconstructed root sequences. Finally, the consensus root sequence was compared with the original root sequence to calculate the reconstruction accuracy, i.e. the fraction of correctly reconstructed sites in the root sequence. In addition, for the third testing set, a one-tailed paired t-test was used to compare ANCESCON with the other three methods. To make different tree topologies comparable, the trees were scaled so that the average distance from the root to all leaf nodes (<it>d</it><sub><it>a</it></sub>) was the same for all trees and equal to that of the pii1 tree (pii1 is a signal transduction protein) (<it>d</it><sub><it>a </it></sub>= 4.23). If <it>d</it><sub><it>a </it></sub>was too small (e.g. 0.5), the reconstruction accuracy was always close to 1 for all reconstruction methods used. The value <it>d</it><sub><it>a </it></sub>= 4.23 was large enough to generate sequences diverse enough to differentiate the four ancestral sequence reconstruction methods.</p>
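            <p>The evaluation steps in this paragraph (consensus of the reconstructed roots, fraction of correctly reconstructed sites, and tree scaling) can be sketched as follows. All function names and toy sequences are hypothetical. Two assumptions are made that the text does not state: ties in the consensus are broken in favour of the residue seen first, and trees are scaled by multiplying every branch length by one common factor.</p>

```python
from collections import Counter


def consensus_sequence(reconstructions):
    """Most common residue at each site across equal-length sequences.

    Assumption: ties go to the residue encountered first in the column.
    """
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reconstructions)
    )


def reconstruction_accuracy(consensus, original):
    """Fraction of sites at which the consensus matches the original root."""
    if len(consensus) != len(original):
        raise ValueError("sequences must have the same length")
    matches = sum(1 for a, b in zip(consensus, original) if a == b)
    return matches / len(original)


def scale_factor(root_to_leaf_distances, target=4.23):
    """Common multiplier for all branch lengths so that the average
    root-to-leaf distance equals the target (4.23 in the text)."""
    mean_distance = sum(root_to_leaf_distances) / len(root_to_leaf_distances)
    return target / mean_distance


# Hypothetical toy example: three reconstructed roots for one original root.
original_root = "MKVLA"
reconstructed_roots = ["MKVLA", "MRVLG", "MKILG"]
toy_consensus = consensus_sequence(reconstructed_roots)
toy_accuracy = reconstruction_accuracy(toy_consensus, original_root)

# Hypothetical root-to-leaf distances for a small tree.
toy_factor = scale_factor([1.0, 2.0, 3.0])
```

            <p>Because every root-to-leaf path is a sum of branch lengths, multiplying all branch lengths by the same factor multiplies the average root-to-leaf distance by that factor, which is why a single multiplier is sufficient in this sketch.</p>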
            <p>For ANCESCON, we used three parameter settings: site-specific rate factors estimated by the maximum likelihood method (<it>&#945;</it><sub><it>ML</it></sub>), alignment-based rate factors (<it>&#945;</it><sub><it>AB</it></sub>) and no rate factors (equal rates among sites). Different parameters were also tried for the reconstruction programs from PAML and PHYLIP to find their best reconstructions. For PAML, reconstruction was tested with the parameter <it>&#945; </it>(rate factor) estimated from the alignment and without <it>&#945;</it>. For PHYLIP, four parameter settings were tried, combining the use or omission of <it>&#945; </it>estimated from the alignment by PAML with the use or omission of the branch lengths supplied in the input tree topology. For PAUP*, default settings were used.</p>
            <p>Table <tblr tid="T3">3</tblr> compares the reconstruction accuracy of these four methods. The reconstruction accuracy of ANCESCON with <it>&#945;</it><sub><it>ML </it></sub>is higher than that of the other three methods in almost every test. The reconstruction accuracy of ANCESCON with <it>&#945;</it><sub><it>AB </it></sub>and without <it>&#945; </it>is comparable to that of PAML and PHYLIP and is much better than that of PAUP*. For the first testing set, the best average accuracy for ANCESCON is about 0.5, while the best average reconstruction accuracies for PAML, PHYLIP and PAUP* are 0.45, 0.39 and 0.32, respectively. Testing sets 2 and 3 produce similar results. Using the paired t-test on the third testing set, we show that ANCESCON with <it>&#945;</it><sub><it>ML </it></sub>gives significantly better reconstruction than the other three methods. Because the site-specific <it>&#945;</it><sub><it>ML </it></sub>is very close to the true mutation rate at a site (Figure <figr fid="F1">1b</figr>), using the site-specific <it>&#945;</it><sub><it>ML </it></sub>can improve our ability to reconstruct the amino acids of ancestral sequences correctly. These reconstruction tests suggest that ANCESCON may be a better tool for reconstructing ancestral sequences than PAML, PHYLIP and PAUP* when the given alignment contains more diverse sequences.</p>
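            <p>The paired comparison can be illustrated with a small calculation of the paired t statistic. Note the assumption: the paper applies the test to the third testing set, whereas this sketch uses the seven first-testing-set rows of Table <tblr tid="T3">3</tblr> (pii1 tree) for ANCESCON with <it>&#945;</it><sub><it>ML </it></sub>and PAML with <it>&#945;</it>, purely to show the arithmetic. The function name paired_t_statistic is hypothetical.</p>

```python
import math

# Accuracies from the first testing set in Table 3 (root sequences 1em2,
# 1g9o, 1rgg, 1sgt, 1zm2, 2a8v, 2ctb on the pii1 tree).
ancescon_ml = [0.45, 0.56, 0.60, 0.38, 0.33, 0.62, 0.53]
paml_alpha = [0.41, 0.53, 0.60, 0.33, 0.28, 0.56, 0.41]


def paired_t_statistic(first, second):
    """Return (t, degrees of freedom) for a paired t-test on two samples."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Unbiased sample variance of the paired differences.
    variance = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    standard_error = math.sqrt(variance / n)
    return mean_diff / standard_error, n - 1


t_value, dof = paired_t_statistic(ancescon_ml, paml_alpha)
mean_ancescon = sum(ancescon_ml) / len(ancescon_ml)
mean_paml = sum(paml_alpha) / len(paml_alpha)
```

            <p>The one-tailed probability is then the upper-tail area of a t distribution with the returned degrees of freedom beyond the returned statistic. The two means computed here reproduce the "Average accuracy" entries for these columns in the table.</p>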
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Ancestral sequence reconstruction accuracy by different programs</p>
               </caption>
               <tblbdy cols="13">
                  <r>
                     <c ca="center">
                        <p>Root sequence</p>
                     </c>
                     <c ca="center">
                        <p>Tree</p>
                     </c>
                     <c ca="center">
                        <p>Number of leaf nodes</p>
                     </c>
                     <c cspan="10" ca="center">
                        <p>Methods</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="10">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3" ca="center">
                        <p>ANCESCON</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>PAML</p>
                     </c>
                     <c cspan="4" ca="center">
                        <p>PHYLIP $</p>
                     </c>
                     <c ca="center">
                        <p>PAUP*</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="9">
                        <hr/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>
                           <it>&#945;</it>
                           <sub>
                              <it>ML</it>
                           </sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>&#945;</it>
                           <sub>
                              <it>AB</it>
                           </sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>-<it>&#945;</it></p>
                     </c>
                     <c ca="center">
                        <p>+<it>&#945;</it></p>
                     </c>
                     <c ca="center">
                        <p>-<it>&#945;</it></p>
                     </c>
                     <c ca="center">
                        <p>+L +<it>&#945;</it></p>
                     </c>
                     <c ca="center">
                        <p>-L +<it>&#945;</it></p>
                     </c>
                     <c ca="center">
                        <p>+L -<it>&#945;</it></p>
                     </c>
                     <c ca="center">
                        <p>-L -<it>&#945;</it></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c cspan="13">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>1em2</p>
                     </c>
                     <c ca="center">
                        <p>pii1</p>
                     </c>
                     <c ca="center">
                        <p>25</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.45</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.32</p>
                     </c>
                     <c ca="center">
                        <p>0.35</p>
                     </c>
                     <c ca="center">
                        <p>0.41</p>
                     </c>
                     <c ca="center">
                        <p>0.37</p>
                     </c>
                     <c ca="center">
                        <p>0.29</p>
                     </c>
                     <c ca="center">
                        <p>0.27</p>
                     </c>
                     <c ca="center">
                        <p>0.21</p>
                     </c>
                     <c ca="center">
                        <p>0.29</p>
                     </c>
                     <c ca="center">
                        <p>0.26</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>1g9o</p>
                     </c>
                     <c ca="center">
                        <p>pii1</p>
                     </c>
                     <c ca="center">
                        <p>25</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.56</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.46</p>
                     </c>
                     <c ca="center">
                        <p>0.47</p>
                     </c>
                     <c ca="center">
                        <p>0.53</p>
                     </c>
                     <c ca="center">
                        <p>0.53</p>
                     </c>
                     <c ca="center">
                        <p>0.51</p>
                     </c>
                     <c ca="center">
                        <p>0.54</p>
                     </c>
                     <c ca="center">
                        <p>0.40</p>
                     </c>
                     <c ca="center">
                        <p>0.51</p>
                     </c>
                     <c ca="center">
                        <p>0.47</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>1rgg</p>
                     </c>
                     <c ca="center">
                        <p>pii1</p>
                     </c>
                     <c ca="center">
                        <p>25</p>
                     </c>
                     <c ca="center">
                        <p>0.60</p>
                     </c>
                     <c ca="center">
                        <p>0.42</p>
                     </c>
                     <c ca="center">
                        <p>0.47</p>
                     </c>
                     <c ca="center">
                        <p>0.60</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.62</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.47</p>
                     </c>
                     <c ca="center">
                        <p>0.58</p>
                     </c>
                     <c ca="center">
                        <p>0.32</p>
                     </c>
                     <c ca="center">
                        <p>0.56</p>
                     </c>
                     <c ca="center">
                        <p>0.47</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>1sgt</p>
                     </c>
                     <c ca="center">
                        <p>pii1</p>
                     </c>
                     <c ca="center">
                        <p>25</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.38</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.34</p>
                     </c>
                     <c ca="center">
                        <p>0.33</p>
                     </c>
                     <c ca="center">
                        <p>0.33</p>
                     </c>
                     <c ca="center">
                        <p>0.32</p>
                     </c>
                     <c ca="center">
                        <p>0.32</p>
                     </c>
                     <c ca="center">
                        <p>0.33</p>
                     </c>
                     <c ca="center">
                        <p>0.27</p>
                     </c>
                     <c ca="center">
                        <p>0.33</p>
                     </c>
                     <c ca="center">
                        <p>0.32</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>1zm2</p>
                     </c>
                     <c ca="center">
                        <p>pii1</p>
                     </c>
                     <c ca="center">
                        <p>25</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.33</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.29</p>
                     </c>
                     <c ca="center">
                        <p>0.30</p>
                     </c>
                     <c ca="center">
                        <p>0.28</p>
                     </c>
                     <c ca="center">
                        <p>0.25</p>
                     </c>
                     <c ca="center">
                        <p>0.21</p>
                     </c>
                     <c ca="center">
                        <p>0.25</p>
                     </c>
                     <c ca="center">
                        <p>0.21</p>
                     </c>
                     <c ca="center">
                        <p>0.27</p>
                     </c>
                     <c ca="center">
                        <p>0.16</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>2a8v</p>
                     </c>
                     <c ca="center">
                        <p>pii1</p>
                     </c>
                     <c ca="center">
                        <p>25</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.62</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.45</p>
                     </c>
                     <c ca="center">
                        <p>0.42</p>
                     </c>
                     <c ca="center">
                        <p>0.56</p>
                     </c>
                     <c ca="center">
                        <p>0.55</p>
                     </c>
                     <c ca="center">
                        <p>0.44</p>
                     </c>
                     <c ca="center">
                        <p>0.46</p>
                     </c>
                     <c ca="center">
                        <p>0.28</p>
                     </c>
                     <c ca="center">
                        <p>0.50</p>
                     </c>
                     <c ca="center">
                        <p>0.36</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>2ctb</p>
                     </c>
                     <c ca="center">
                        <p>pii1</p>
                     </c>
                     <c ca="center">
                        <p>25</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.53</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.40</p>
                     </c>
                     <c ca="center">
                        <p>0.39</p>
                     </c>
                     <c ca="center">
                        <p>0.41</p>
                     </c>
                     <c ca="center">
                        <p>0.38</p>
                     </c>
                     <c ca="center">
                        <p>0.24</p>
                     </c>
                     <c ca="center">
                        <p>0.24</p>
                     </c>
                     <c ca="center">
                        <p>0.21</p>
                     </c>
                     <c ca="center">
                        <p>0.29</p>
                     </c>
                     <c ca="center">
                        <p>0.22</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3" ca="center">
                        <p>Average accuracy</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.496</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.383</p>
                     </c>
                     <c ca="center">
                        <p>0.390</p>
                     </c>
                     <c ca="center">
                        <p>0.446</p>
                     </c>
                     <c ca="center">
                        <p>0.431</p>
                     </c>
                     <c ca="center">
                        <p>0.354</p>
                     </c>
                     <c ca="center">
                        <p>0.381</p>
                     </c>
                     <c ca="center">
                        <p>0.271</p>
                     </c>
                     <c ca="center">
                        <p>0.393</p>
                     </c>
                     <c ca="center">
                        <p>0.323</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>2ctb</p>
                     </c>
                     <c ca="center">
                        <p>gef</p>
                     </c>
                     <c ca="center">
                        <p>27</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.54</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.37</p>
                     </c>
                     <c ca="center">
                        <p>0.38</p>
                     </c>
                     <c ca="center">
                        <p>0.35</p>
                     </c>
                     <c ca="center">
                        <p>0.35</p>
                     </c>
                     <c ca="center">
                        <p>0.29</p>
                     </c>
                     <c ca="center">
                        <p>0.17</p>
                     </c>
                     <c ca="center">
                        <p>0.24</p>
                     </c>
                     <c ca="center">
                        <p>0.22</p>
                     </c>
                     <c ca="center">
                        <p>0.22</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>2ctb</p>
                     </c>
                     <c ca="center">
                        <p>LacI</p>
                     </c>
                     <c ca="center">
                        <p>54</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.66</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.64</p>
                     </c>
                     <c ca="center">
                        <p>0.57</p>
                     </c>
                     <c ca="center">
                        <p>0.44</p>
                     </c>
                     <c ca="center">
                        <p>0.37</p>
                     </c>
                     <c ca="center">
                        <p>0.49</p>
                     </c>
                     <c ca="center">
                        <p>0.35</p>
                     </c>
                     <c ca="center">
                        <p>0.42</p>
                     </c>
                     <c ca="center">
                        <p>0.33</p>
                     </c>
                     <c ca="center">
                        <p>0.34</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>2ctb</p>
                     </c>
                     <c ca="center">
                        <p>pdz</p>
                     </c>
                     <c ca="center">
                        <p>39</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.54</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.41</p>
                     </c>
                     <c ca="center">
                        <p>0.42</p>
                     </c>
                     <c ca="center">
                        <p>0.44</p>
                     </c>
                     <c ca="center">
                        <p>0.39</p>
                     </c>
                     <c ca="center">
                        <p>0.22</p>
                     </c>
                     <c ca="center">
                        <p>0.34</p>
                     </c>
                     <c ca="center">
                        <p>0.18</p>
                     </c>
                     <c ca="center">
                        <p>0.32</p>
                     </c>
                     <c ca="center">
                        <p>0.22</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>2ctb</p>
                     </c>
                     <c ca="center">
                        <p>ph</p>
                     </c>
                     <c ca="center">
                        <p>30</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.79</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.74</p>
                     </c>
                     <c ca="center">
                        <p>0.75</p>
                     </c>
                     <c ca="center">
                        <p>0.53</p>
                     </c>
                     <c ca="center">
                        <p>0.55</p>
                     </c>
                     <c ca="center">
                        <p>0.45</p>
                     </c>
                     <c ca="center">
                        <p>0.25</p>
                     </c>
                     <c ca="center">
                        <p>0.43</p>
                     </c>
                     <c ca="center">
                        <p>0.37</p>
                     </c>
                     <c ca="center">
                        <p>0.32</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>2ctb</p>
                     </c>
                     <c ca="center">
                        <p>pii1</p>
                     </c>
                     <c ca="center">
                        <p>25</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.53</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.40</p>
                     </c>
                     <c ca="center">
                        <p>0.39</p>
                     </c>
                     <c ca="center">
                        <p>0.41</p>
                     </c>
                     <c ca="center">
                        <p>0.38</p>
                     </c>
                     <c ca="center">
                        <p>0.24</p>
                     </c>
                     <c ca="center">
                        <p>0.24</p>
                     </c>
                     <c ca="center">
                        <p>0.21</p>
                     </c>
                     <c ca="center">
                        <p>0.29</p>
                     </c>
                     <c ca="center">
                        <p>0.22</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>2ctb</p>
                     </c>
                     <c ca="center">
                        <p>ptb</p>
                     </c>
                     <c ca="center">
                        <p>29</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.58</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.39</p>
                     </c>
                     <c ca="center">
                        <p>0.43</p>
                     </c>
                     <c ca="center">
                        <p>0.39</p>
                     </c>
                     <c ca="center">
                        <p>0.38</p>
                     </c>
                     <c ca="center">
                        <p>0.29</p>
                     </c>
                     <c ca="center">
                        <p>0.23</p>
                     </c>
                     <c ca="center">
                        <p>0.26</p>
                     </c>
                     <c ca="center">
                        <p>0.24</p>
                     </c>
                     <c ca="center">
                        <p>0.23</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>2ctb</p>
                     </c>
                     <c ca="center">
                        <p>sh2</p>
                     </c>
                     <c ca="center">
                        <p>34</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.61</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.42</p>
                     </c>
                     <c ca="center">
                        <p>0.40</p>
                     </c>
                     <c ca="center">
                        <p>0.43</p>
                     </c>
                     <c ca="center">
                        <p>0.40</p>
                     </c>
                     <c ca="center">
                        <p>0.30</p>
                     </c>
                     <c ca="center">
                        <p>0.22</p>
                     </c>
                     <c ca="center">
                        <p>0.20</p>
                     </c>
                     <c ca="center">
                        <p>0.27</p>
                     </c>
                     <c ca="center">
                        <p>0.22</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>2ctb</p>
                     </c>
                     <c ca="center">
                        <p>sh3</p>
                     </c>
                     <c ca="center">
                        <p>43</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.83</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.82</p>
                     </c>
                     <c ca="center">
                        <p>0.80</p>
                     </c>
                     <c ca="center">
                        <p>0.62</p>
                     </c>
                     <c ca="center">
                        <p>0.55</p>
                     </c>
                     <c ca="center">
                        <p>0.69</p>
                     </c>
                     <c ca="center">
                        <p>0.45</p>
                     </c>
                     <c ca="center">
                        <p>0.66</p>
                     </c>
                     <c ca="center">
                        <p>0.46</p>
                     </c>
                     <c ca="center">
                        <p>0.54</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>2ctb</p>
                     </c>
                     <c ca="center">
                        <p>GST</p>
                     </c>
                     <c ca="center">
                        <p>140</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.76</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.73</p>
                     </c>
                     <c ca="center">
                        <p>0.73</p>
                     </c>
                     <c ca="center">
                        <p>@</p>
                     </c>
                     <c ca="center">
                        <p>@</p>
                     </c>
                     <c ca="center">
                        <p>#</p>
                     </c>
                     <c ca="center">
                        <p>#</p>
                     </c>
                     <c ca="center">
                        <p>0.47</p>
                     </c>
                     <c ca="center">
                        <p>0.38</p>
                     </c>
                     <c ca="center">
                        <p>0.33</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3" ca="center">
                        <p>Average accuracy<sup>&amp;</sup></p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.635</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.524</p>
                     </c>
                     <c ca="center">
                        <p>0.518</p>
                     </c>
                     <c ca="center">
                        <p>0.451</p>
                     </c>
                     <c ca="center">
                        <p>0.421</p>
                     </c>
                     <c ca="center">
                        <p>0.371</p>
                     </c>
                     <c ca="center">
                        <p>0.281</p>
                     </c>
                     <c ca="center">
                        <p>0.325</p>
                     </c>
                     <c ca="center">
                        <p>0.313</p>
                     </c>
                     <c ca="center">
                        <p>0.289</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>1em2</p>
                     </c>
                     <c ca="center">
                        <p>pdz</p>
                     </c>
                     <c ca="center">
                        <p>39</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.45</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.35</p>
                     </c>
                     <c ca="center">
                        <p>0.36</p>
                     </c>
                     <c ca="center">
                        <p>0.44</p>
                     </c>
                     <c ca="center">
                        <p>0.44</p>
                     </c>
                     <c ca="center">
                        <p>0.29</p>
                     </c>
                     <c ca="center">
                        <p>0.43</p>
                     </c>
                     <c ca="center">
                        <p>0.23</p>
                     </c>
                     <c ca="center">
                        <p>0.40</p>
                     </c>
                     <c ca="center">
                        <p>0.24</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>1g9o</p>
                     </c>
                     <c ca="center">
                        <p>pii1</p>
                     </c>
                     <c ca="center">
                        <p>25</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.56</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.46</p>
                     </c>
                     <c ca="center">
                        <p>0.47</p>
                     </c>
                     <c ca="center">
                        <p>0.53</p>
                     </c>
                     <c ca="center">
                        <p>0.53</p>
                     </c>
                     <c ca="center">
                        <p>0.51</p>
                     </c>
                     <c ca="center">
                        <p>0.54</p>
                     </c>
                     <c ca="center">
                        <p>0.40</p>
                     </c>
                     <c ca="center">
                        <p>0.51</p>
                     </c>
                     <c ca="center">
                        <p>0.47</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>1rgg</p>
                     </c>
                     <c ca="center">
                        <p>sh2</p>
                     </c>
                     <c ca="center">
                        <p>34</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.64</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.48</p>
                     </c>
                     <c ca="center">
                        <p>0.46</p>
                     </c>
                     <c ca="center">
                        <p>0.61</p>
                     </c>
                     <c ca="center">
                        <p>0.61</p>
                     </c>
                     <c ca="center">
                        <p>0.56</p>
                     </c>
                     <c ca="center">
                        <p>0.59</p>
                     </c>
                     <c ca="center">
                        <p>0.34</p>
                     </c>
                     <c ca="center">
                        <p>0.60</p>
                     </c>
                     <c ca="center">
                        <p>0.41</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>1sgt</p>
                     </c>
                     <c ca="center">
                        <p>gef</p>
                     </c>
                     <c ca="center">
                        <p>27</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.49</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.39</p>
                     </c>
                     <c ca="center">
                        <p>0.40</p>
                     </c>
                     <c ca="center">
                        <p>0.48</p>
                     </c>
                     <c ca="center">
                        <p>0.44</p>
                     </c>
                     <c ca="center">
                        <p>0.42</p>
                     </c>
                     <c ca="center">
                        <p>0.44</p>
                     </c>
                     <c ca="center">
                        <p>0.36</p>
                     </c>
                     <c ca="center">
                        <p>0.45</p>
                     </c>
                     <c ca="center">
                        <p>0.41</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>1zm2</p>
                     </c>
                     <c ca="center">
                        <p>ptb</p>
                     </c>
                     <c ca="center">
                        <p>29</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.66</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.47</p>
                     </c>
                     <c ca="center">
                        <p>0.48</p>
                     </c>
                     <c ca="center">
                        <p>0.57</p>
                     </c>
                     <c ca="center">
                        <p>0.57</p>
                     </c>
                     <c ca="center">
                        <p>0.53</p>
                     </c>
                     <c ca="center">
                        <p>0.51</p>
                     </c>
                     <c ca="center">
                        <p>0.32</p>
                     </c>
                     <c ca="center">
                        <p>0.52</p>
                     </c>
                     <c ca="center">
                        <p>0.41</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>2a8v</p>
                     </c>
                     <c ca="center">
                        <p>ph</p>
                     </c>
                     <c ca="center">
                        <p>30</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.81</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.78</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.81</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.71</p>
                     </c>
                     <c ca="center">
                        <p>0.74</p>
                     </c>
                     <c ca="center">
                        <p>0.60</p>
                     </c>
                     <c ca="center">
                        <p>0.61</p>
                     </c>
                     <c ca="center">
                        <p>0.50</p>
                     </c>
                     <c ca="center">
                        <p>0.65</p>
                     </c>
                     <c ca="center">
                        <p>0.50</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>2ctb</p>
                     </c>
                     <c ca="center">
                        <p>LacI</p>
                     </c>
                     <c ca="center">
                        <p>54</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.66</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.64</p>
                     </c>
                     <c ca="center">
                        <p>0.57</p>
                     </c>
                     <c ca="center">
                        <p>0.44</p>
                     </c>
                     <c ca="center">
                        <p>0.37</p>
                     </c>
                     <c ca="center">
                        <p>0.49</p>
                     </c>
                     <c ca="center">
                        <p>0.35</p>
                     </c>
                     <c ca="center">
                        <p>0.42</p>
                     </c>
                     <c ca="center">
                        <p>0.33</p>
                     </c>
                     <c ca="center">
                        <p>0.34</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3" ca="center">
                        <p>Average accuracy</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.610</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.510</p>
                     </c>
                     <c ca="center">
                        <p>0.507</p>
                     </c>
                     <c ca="center">
                        <p>0.540</p>
                     </c>
                     <c ca="center">
                        <p>0.529</p>
                     </c>
                     <c ca="center">
                        <p>0.486</p>
                     </c>
                     <c ca="center">
                        <p>0.496</p>
                     </c>
                     <c ca="center">
                        <p>0.367</p>
                     </c>
                     <c ca="center">
                        <p>0.494</p>
                     </c>
                     <c ca="center">
                        <p>0.397</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3" ca="center">
                        <p>Probability<sup>&#916;</sup></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>0.0026</p>
                     </c>
                     <c ca="center">
                        <p>0.0023</p>
                     </c>
                     <c ca="center">
                        <p>0.0248</p>
                     </c>
                     <c ca="center">
                        <p>0.0328</p>
                     </c>
                     <c ca="center">
                        <p>0.0007</p>
                     </c>
                     <c ca="center">
                        <p>0.0168</p>
                     </c>
                     <c ca="center">
                        <p>0.0001</p>
                     </c>
                     <c ca="center">
                        <p>0.0143</p>
                     </c>
                     <c ca="center">
                        <p>0.0005</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>All root sequences are taken from the PDB database, and the names listed in the table are PDB IDs.</p>
                  <p>Tree topologies for gef (guanine nucleotide exchange factor), LacI (PurR/LacI family of bacterial transcription factors), pdz, ph, pii1 (a signal transduction protein), ptb, sh2, sh3 and GST (glutathione S-transferase) are inferred from multiple sequence alignments chosen from the Pfam database (version 7.3).</p>
                  <p>All tree topologies are generated from real alignments, and the distances are rescaled to make the trees comparable.</p>
                  <p>The values in this table represent the accuracy of reconstruction, i.e. the fraction of correctly reconstructed sites for the root sequence. The best reconstruction accuracy in each test is shown in bold.</p>
                  <p><it>&#945;</it><sub><it>ML </it></sub>means that the site-specific rate factors were estimated by the maximum likelihood method.</p>
                  <p><it>&#945;</it><sub><it>AB </it></sub>means that the site-specific rate factors were estimated by our empirical equation based on the given alignment (for details see Methods).</p>
                  <p>-<it>&#945; </it>means that the rate factors were not considered in reconstruction.</p>
                  <p>+<it>&#945; </it>means that the rate factors were considered in reconstruction.</p>
                  <p>+L means that branch lengths of the input tree were used in reconstruction, while -L means that branch lengths were estimated by the reconstruction program itself.</p>
                  <p>@: the tree topology for GST had 140 leaf nodes, which were too many for PAML to run.</p>
                  <p><sup>$</sup>: rate factors estimated by PAML were used by PHYLIP in ancestral sequence reconstruction.</p>
                  <p>#: the tree topology for GST had 140 leaf nodes, which were too many for PAML to estimate rate factors.</p>
                  <p><sup>&amp;</sup>: GST is excluded from the calculation of the average.</p>
                  <p><sup>&#916;</sup>: the paired t-test [40] was used to estimate the one-tailed probability for the comparison between ANCESCON and the other three reconstruction methods.</p>
               </tblfn>
            </tbl>
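            <p>The one-tailed probabilities in the last row of the table can be checked with an ordinary paired t-test. The following is a minimal pure-Python sketch, not the authors' code: the function names are hypothetical, the upper-tail area is obtained by simple numerical integration of the t density rather than by a statistics library, and the two input lists are the bold column and the column immediately to its right for the seven rows from 1em2/pdz to 2ctb/LacI.</p>

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """Return the paired t statistic and degrees of freedom for two matched samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def t_density(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail(t, df, steps=2000):
    """One-tailed probability P(T >= t), using Simpson's rule on [0, |t|]."""
    t_abs = abs(t)
    h = t_abs / steps
    total = t_density(0.0, df) + t_density(t_abs, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    area = total * h / 3          # area under the density between 0 and |t|
    tail = 0.5 - area             # the density is symmetric around 0
    return tail if t >= 0 else 1.0 - tail


# Accuracies copied from the table (seven tests, GST not included).
bold_column = [0.45, 0.56, 0.64, 0.49, 0.66, 0.81, 0.66]
next_column = [0.35, 0.46, 0.48, 0.39, 0.47, 0.78, 0.64]

t_stat, df = paired_t(bold_column, next_column)
p_one_tailed = upper_tail(t_stat, df)
print(f"t = {t_stat:.2f}, df = {df}, one-tailed p = {p_one_tailed:.4f}")
```

            <p>With these seven pairs the sketch gives a probability close to the 0.0026 listed for that column, which suggests that the probabilities were computed from this set of tests. A statistics package with a paired t-test routine would serve equally well.</p>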
         </sec>
         <sec>
            <st>
               <p>Ancestral sequences used in homology detection</p>
            </st>
            <p>Alignments of thirty-eight OB (oligonucleotide/oligosaccharide binding)-fold <abbrgrp><abbr bid="B34">34</abbr></abbrgrp> proteins and ten other alignments (adenylyl kinase, gef, globin, pdz, ph, ptb, ras, sh2, sh3 and subtilase) from the Pfam database (version 7.3) <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> were chosen for homology detection tests.</p>
            <p>Given an alignment with <it>N </it>sequences, we used four different methods, "BEST", "SECOND BEST", "SHUFFLE" and "RANDOM", to generate an additional <it>N</it>-1 sequences (for details see Methods). For each combined alignment (2<it>N</it>-1 sequences), PSI-BLAST <abbrgrp><abbr bid="B35">35</abbr></abbrgrp> searches were performed starting from each sequence and seeded with the combined alignment (-B option in the program BLASTPGP, e-value cutoff 0.01), and all hits found were pooled together.</p>
            <p>The benchmark experiment was PSI-BLAST seeded with the native alignment (<it>N </it>sequences). For each of the four types of combined alignment, we checked for hits not found with the native alignment. New hits were verified as true positives or false positives by running PSI-BLAST or HMMER <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>, followed by manual inspection.</p>
            <p>Using the 48 native alignments, the benchmark found a total of 13973 hits. Compared to the benchmark, the "BEST" method detected 120 new homologs, and the "SECOND BEST", "SHUFFLE" and "RANDOM" methods found 69, 74 and 9 new homologs, respectively (Figure <figr fid="F5">5</figr>). Among those new homologs, the "BEST", "SECOND BEST", "SHUFFLE" and "RANDOM" methods had 3, 2, 6 and 3 false positives, respectively (Figure <figr fid="F5">5</figr>). In addition, the "BEST", "SECOND BEST", "SHUFFLE" and "RANDOM" methods missed 61, 1070, 60 and 7811 homologs, respectively, relative to the benchmark.</p>
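            <p>The bookkeeping behind these counts (pooling hits over all query sequences, then comparing each method with the benchmark) amounts to set unions and set differences. The sketch below illustrates this under stated assumptions: hits are represented by plain string identifiers, the function names are hypothetical, and the toy identifiers are invented for illustration and are not taken from the study.</p>

```python
def pool_hits(hits_per_query):
    """Union of hit identifiers over all PSI-BLAST query sequences of one alignment."""
    pooled = set()
    for hits in hits_per_query.values():
        pooled.update(hits)
    return pooled


def compare_to_benchmark(benchmark_hits, method_hits):
    """Return (new, missed): hits only the method found, and hits only the benchmark found."""
    new = method_hits - benchmark_hits
    missed = benchmark_hits - method_hits
    return new, missed


# Toy example with invented identifiers (not real database entries).
benchmark = pool_hits({
    "native_1": {"hitA", "hitB"},
    "native_2": {"hitB", "hitC"},
})
combined = pool_hits({
    "native_1": {"hitA"},
    "ancestor_1": {"hitC", "hitD"},
})

new_hits, missed_hits = compare_to_benchmark(benchmark, combined)
print(sorted(new_hits), sorted(missed_hits))
```

            <p>The new hits would still have to be classified as true or false positives, which in the study involved further PSI-BLAST or HMMER runs and manual inspection; that step is not modelled here.</p>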
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Comparison of "BEST", "SECOND BEST", "SHUFFLE" and "RANDOM" methods in the number of new homologs detected when compared with the benchmark experiment</p>
               </caption>
               <text>
                  <p><b>Comparison of "BEST", "SECOND BEST", "SHUFFLE" and "RANDOM" methods in the number of new homologs detected when compared with the benchmark experiment. </b>The methods are defined in the Methods section. The blue portion of each bar shows the number of true positives. The red portion of each bar shows the number of false positives.</p>
               </text>
               <graphic file="1471-2148-4-33-5"/>
            </fig>
            <p>Adding non-native sequences to the native alignment changes the sequence profile used in PSI-BLAST searches. Random sequences can dilute the position-specific amino acid exchange characteristics of native alignments, an effect that should not improve the profile. Indeed, few new homologs are found by the "RANDOM" method. However, sequences generated by shuffling each position of the native alignment have the same conservation properties as the native alignment, and the "SHUFFLE" method detects a total of 74 new homologs. Two effects may account for this finding. First, adding shuffled sequences to the native alignment can slightly change the estimates of pseudocount frequencies of amino acids and thus the position-specific scoring matrix <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. Second, the new version of the PSI-BLAST program uses composition-based statistics, in which e-value estimation depends on the composition of the query sequence <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. Each shuffled sequence has its own amino acid composition, which differs from that of the native sequences, and this difference can affect the e-values of hits. The "BEST" method detects the largest number of new homologs, suggesting that the reconstructed ancestral sequences resemble the native sequences. Ancestral sequences may be more similar to some remote homologs than the native sequences are. The "SECOND BEST" method detects fewer new homologs than the "BEST" method but more than the "RANDOM" method, suggesting that the second most probable amino acids in reconstruction can still reflect some properties of native sequences. Table <tblr tid="T4">4</tblr> shows homology detection results of OB-fold structures using reconstructed ancestral sequences.</p>
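            <p>The statement that shuffled sequences keep the conservation properties of the native alignment can be illustrated with a column-wise permutation. The sketch below is only an assumption about how such shuffling might be done; the actual "SHUFFLE" procedure, including how gaps are treated and how the <it>N</it>-1 sequences are selected, is defined in Methods. The function name and the toy alignment are hypothetical.</p>

```python
import random


def shuffle_columns(alignment, seed=0):
    """Permute the residues within each alignment column independently.

    Every column keeps exactly the same residue counts, so per-position
    conservation is unchanged, while each row becomes a new sequence.
    Gap characters are treated like any other symbol in this sketch.
    """
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*alignment)]
    for col in columns:
        rng.shuffle(col)
    return ["".join(row) for row in zip(*columns)]


# Toy alignment, invented for illustration.
native = ["MKVLA", "MRVLG", "MKILA", "MKV-A"]
shuffled = shuffle_columns(native)

for row in shuffled:
    print(row)
```

            <p>Because each row of the result mixes residues from different native sequences, its overall amino acid composition generally differs from that of any single native sequence, which is the property the composition-based statistics argument relies on. This sketch returns as many sequences as the input alignment contains; one of them could be dropped to obtain <it>N</it>-1 sequences.</p>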
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>Homology detection results of OB-fold structures using reconstructed ancestral sequences</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="left">
                        <p>SCOP Superfamily/family</p>
                     </c>
                     <c ca="left">
                        <p>PDB structure</p>
                     </c>
                     <c ca="center">
                        <p>New homologs</p>
                     </c>
                     <c ca="left">
                        <p>NCBI annotation</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Nucleic acid-binding proteins/ Anticodon-binding domain</p>
                     </c>
                     <c ca="left">
                        <p>1b7yB, 39&#8211;151</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1b8aA, 1&#8211;102</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1bbuA, 64&#8211;151</p>
                     </c>
                     <c ca="center">
                        <p>13431467</p>
                     </c>
                     <c ca="left">
                        <p>DNA polymerase II small subunit</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>15598836</p>
                     </c>
                     <c ca="left">
                        <p>DNA polymerase III, alpha chain</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1c0aA, 1&#8211;106</p>
                     </c>
                     <c ca="center">
                        <p>11261591</p>
                     </c>
                     <c ca="left">
                        <p>DNA polymerase III, alpha chain</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>11499379</p>
                     </c>
                     <c ca="left">
                        <p>conserved hypothetical protein</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>1169392</p>
                     </c>
                     <c ca="left">
                        <p>DNA polymerase III alpha subunit</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>118794</p>
                     </c>
                     <c ca="left">
                        <p>DNA polymerase III alpha subunit</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>13620707</p>
                     </c>
                     <c ca="left">
                        <p>putative DNA polymerase III, alpha chain</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>14194684</p>
                     </c>
                     <c ca="left">
                        <p>DNA polymerase III alpha subunit</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>14194702</p>
                     </c>
                     <c ca="left">
                        <p>DNA polymerase III alpha subunit</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>14195653</p>
                     </c>
                     <c ca="left">
                        <p>DNA polymerase III alpha subunit</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>14195659</p>
                     </c>
                     <c ca="left">
                        <p>DNA polymerase III alpha subunit</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>15594924</p>
                     </c>
                     <c ca="left">
                        <p>DNA polymerase III, subunit alpha</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>15598836</p>
                     </c>
                     <c ca="left">
                        <p>DNA polymerase III, alpha chain</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>15601899</p>
                     </c>
                     <c ca="left">
                        <p>DnaE</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>15642243</p>
                     </c>
                     <c ca="left">
                        <p>DNA polymerase III, alpha subunit</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>15669005</p>
                     </c>
                     <c ca="left">
                        <p>M. <it>jannaschii </it>predicted coding region MJ0818</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>15679404</p>
                     </c>
                     <c ca="left">
                        <p>DNA polymerase delta small subunit</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>3914611</p>
                     </c>
                     <c ca="left">
                        <p>ATP-dependent DNA helicase recG</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1cuk, 1&#8211;64</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1e1oA, 64&#8211;148</p>
                     </c>
                     <c ca="center">
                        <p>11261591</p>
                     </c>
                     <c ca="left">
                        <p>DNA polymerase III, alpha chain XF0204</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>14194684</p>
                     </c>
                     <c ca="left">
                        <p>DNA polymerase III alpha subunit</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1fguA, 181&#8211;298</p>
                     </c>
                     <c ca="center">
                        <p>15219507</p>
                     </c>
                     <c ca="left">
                        <p>hypothetical protein</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>15230563</p>
                     </c>
                     <c ca="left">
                        <p>putative protein</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>15790309</p>
                     </c>
                     <c ca="left">
                        <p>Vng1255c from <it>Halobacterium </it>sp.</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>6166145</p>
                     </c>
                     <c ca="left">
                        <p>DNA polymerase III alpha subunit</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>8778702</p>
                     </c>
                     <c ca="left">
                        <p>T1N15.20</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1fl0A</p>
                     </c>
                     <c ca="center">
                        <p>10957481</p>
                     </c>
                     <c ca="left">
                        <p>hypothetical protein</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1g51A, 1&#8211;104</p>
                     </c>
                     <c ca="center">
                        <p>14520587</p>
                     </c>
                     <c ca="left">
                        <p>hypothetical protein</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>14591565</p>
                     </c>
                     <c ca="left">
                        <p>hypothetical protein</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>15595886</p>
                     </c>
                     <c ca="left">
                        <p>hypothetical protein</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>3914638</p>
                     </c>
                     <c ca="left">
                        <p>ATP-dependent DNA helicase recG</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1otcB, 36&#8211;126</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1quqA, 62&#8211;152</p>
                     </c>
                     <c ca="center">
                        <p>15387767</p>
                     </c>
                     <c ca="left">
                        <p>probable replication protein a 28 Kd subunit</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1qvcA, 1&#8211;114</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Nucleic acid-binding proteins/Cold shock DNA-binding domain like</p>
                     </c>
                     <c ca="center">
                        <p>1a62, 48&#8211;125</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1ah9</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1bkb, 75&#8211;139</p>
                     </c>
                     <c ca="center">
                        <p>15790688</p>
                     </c>
                     <c ca="left">
                        <p>translation initiation factor eIF-5A; Eif5a</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1c9oA</p>
                     </c>
                     <c ca="center">
                        <p>6014735</p>
                     </c>
                     <c ca="left">
                        <p>Cold shock protein CspSt</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1csp</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1d7qA</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1mjc</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1rl2</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1sro</p>
                     </c>
                     <c ca="center">
                        <p>15671445</p>
                     </c>
                     <c ca="left">
                        <p>N utilization substance protein A</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>15794781</p>
                     </c>
                     <c ca="left">
                        <p>N utilisation substance protein A</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>15803711</p>
                     </c>
                     <c ca="left">
                        <p>transcription pausing; L factor</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>2eifA, 73&#8211;132</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Nucleic acid-binding proteins/DNA ligase, mRNA capping enzyme, domain2</p>
                     </c>
                     <c ca="left">
                        <p>1a0i, 241&#8211;349</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1dgsA, 315&#8211;400</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1ckmA, 238&#8211;302</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1fviA, 190&#8211;293</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Nucleic acid-binding proteins/Phage ssDNA-binding proteins</p>
                     </c>
                     <c ca="left">
                        <p>1gpc</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1gvp</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1pfs</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Nucleic acid-binding proteins/RNA polymerase subunit RBP8</p>
                     </c>
                     <c ca="left">
                        <p>1a1d</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Staphylococcal nuclease/Staphylococcal nuclease</p>
                     </c>
                     <c ca="left">
                        <p>1eyd</p>
                     </c>
                     <c ca="center">
                        <p>13422779</p>
                     </c>
                     <c ca="left">
                        <p>aldose 1-epimerase *</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Bacterial enterotoxins/Bacterial AB5 toxins, B units</p>
                     </c>
                     <c ca="left">
                        <p>1c4qA</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>1prtF</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Bacterial enterotoxins/Superantigen toxins</p>
                     </c>
                     <c ca="left">
                        <p>1an8, 19&#8211;94</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>TIMP-like/Tissue inhibitor of metalloproteases</p>
                     </c>
                     <c ca="left">
                        <p>1ueaB, 14&#8211;106</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Inorganic pyrophosphatase/Inorganic pyrophosphatase</p>
                     </c>
                     <c ca="left">
                        <p>2prd</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>MOP-like/BiMOP, duplicated molybdate-binding domain</p>
                     </c>
                     <c ca="left">
                        <p>1b9mA, 127&#8211;262</p>
                     </c>
                     <c ca="center">
                        <p>10639288</p>
                     </c>
                     <c ca="left">
                        <p>probable ATP-binding protein</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>10955070</p>
                     </c>
                     <c ca="left">
                        <p>AgtA</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>1175513</p>
                     </c>
                     <c ca="left">
                        <p>Putative ferric transport ATP-binding protein afuC</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>15598450</p>
                     </c>
                     <c ca="left">
                        <p>probable ATP-binding component of ABC transporter</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>3978166</p>
                     </c>
                     <c ca="left">
                        <p>ATPase FbpC</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>4895001</p>
                     </c>
                     <c ca="left">
                        <p>glucose ABC transporter ATPase *</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Histidine kinase CheA, C-terminal domain/Histidine kinase CheA, C-terminal domain</p>
                     </c>
                     <c ca="left">
                        <p>1b3qA, 540&#8211;671</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="left">
                        <p>-</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>* Putative false positives as assessed by manual inspection.</p>
               </tblfn>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Prediction of functional sites</p>
            </st>
            <p>Ten well-studied protein families (adenylyl kinase, gef, globin, pdz, ph, ptb, ras, sh2, sh3 and subtilase) from the Pfam database (version 7.3) <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> were selected to test the prediction of functional sites. To define functional sites, we considered residues falling within 5 &#197; of any ligand (e.g. AP5 for adenylyl kinase) to be functionally important. As a simple quantification of prediction accuracy, we counted the number of predicted sites that lie within 5 &#197; of the ligands and considered these sites to be true positives.</p>
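            <p>The 5 &#197; criterion above can be illustrated with a minimal sketch. It assumes that atomic coordinates have already been extracted from a structure file; the function name, the toy coordinates and the inclusive treatment of the cutoff are illustrative assumptions and are not taken from the original analysis.</p>

```python
import math

def residues_near_ligand(residue_atoms, ligand_atoms, cutoff=5.0):
    """Return sorted residue ids with any atom within `cutoff` angstroms of any ligand atom.

    residue_atoms: dict mapping residue id to a list of (x, y, z) tuples (hypothetical input format)
    ligand_atoms:  list of (x, y, z) tuples for the ligand
    """
    near = set()
    for res_id, atoms in residue_atoms.items():
        # A residue counts as functional if its closest atom pair is within the cutoff.
        # Treating the cutoff as inclusive is an assumption of this sketch.
        if any(cutoff >= math.dist(a, b) for a in atoms for b in ligand_atoms):
            near.add(res_id)
    return sorted(near)

# Toy coordinates (made up for illustration, not real structure data).
residues = {
    10: [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],  # closest atom is 4.5 angstroms from the ligand
    11: [(4.0, 0.0, 0.0)],                   # 2.0 angstroms from the ligand
    12: [(20.0, 0.0, 0.0)],                  # 14.0 angstroms from the ligand
}
ligand = [(6.0, 0.0, 0.0)]

print(residues_near_ligand(residues, ligand))  # prints [10, 11]
```

            <p>In practice the coordinates would come from the PDB entry used for each family, and all ligand atoms would be included in the second argument.</p>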
            <p>Our method is intended to identify sites with high similarity within individual sub-trees and high variation among sub-trees, since such sites are likely to contribute to functional specificity. Based on a tree partition and the reconstructions at the cutting nodes (for details, see Methods), we developed a measure called the specificity score (equation (27)). We expect both highly variable sites and highly conserved sites to score low under this measure. The ten top-ranking sites were selected as the predicted functional sites for each family. For comparison, we also implemented a simple conservation (SC) method <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>, the evolutionary trace (ET) method <abbrgrp><abbr bid="B26">26</abbr></abbrgrp> and the conservation difference (CD) method <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> on the 10 protein families. The results are shown in Table <tblr tid="T5">5</tblr>. The three comparison methods tend to include invariant or highly conserved sites among their predictions, whereas our method scores those sites low. Still, the number of true positives of our method is comparable to that of the other methods for several families. For some protein families, such as gef, pdz and subtilase, our method predicts no fewer functional residues than the other three methods.</p>
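            <p>The selection and counting step described above can be sketched as follows. The per-site scores here are invented placeholders and do not implement the specificity score of equation (27); the function names and the tie-breaking rule (lower alignment position first) are likewise assumptions made only for illustration.</p>

```python
def top_k_sites(scores, k=10):
    """Return the k site positions with the highest scores.

    scores: dict mapping site position to a numeric score (placeholder values).
    Ties are broken by lower position, which is an assumption of this sketch.
    """
    ranked = sorted(scores, key=lambda pos: (-scores[pos], pos))
    return ranked[:k]

def count_true_positives(predicted, functional):
    """Count predicted sites that belong to the set of known functional sites."""
    return len(set(predicted).intersection(functional))

# Made-up scores for six alignment positions.
scores = {1: 0.10, 2: 0.85, 3: 0.40, 4: 0.95, 5: 0.05, 6: 0.70}
# Made-up set of sites lying within the ligand distance cutoff.
functional = {2, 4, 5}

predicted = top_k_sites(scores, k=3)
print(predicted)                                    # prints [4, 2, 6]
print(count_true_positives(predicted, functional))  # prints 2
```

            <p>The same two functions can be applied to the scores from any of the compared methods, so that each method is evaluated against an identical set of functional sites.</p>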
            <tbl id="T5">
               <title>
                  <p>Table 5</p>
               </title>
               <caption>
                  <p>Comparison of the true hits among the top 10 predicted sites for ANCESCON, evolutionary trace (ET), simple conservation (SC), and conservation difference (CD) methods</p>
               </caption>
               <tblbdy cols="11">
                  <r>
                     <c ca="center">
                        <p>Protein Family</p>
                     </c>
                     <c ca="center">
                        <p>PDB ID<sup>#</sup></p>
                     </c>
                     <c ca="center">
                        <p>Ligand/substrate</p>
                     </c>
                     <c ca="center">
                        <p>Number of sites</p>
                     </c>
                     <c ca="center">
                        <p>*</p>
                     </c>
                     <c ca="center">
                        <p>**</p>
                     </c>
                     <c ca="center">
                        <p>***</p>
                     </c>
                     <c ca="center">
                        <p>ANCESCON</p>
                     </c>
                     <c ca="center">
                        <p>ET</p>
                     </c>
                     <c ca="center">
                        <p>SC</p>
                     </c>
                     <c ca="center">
                        <p>CD</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="11">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>adkinase</p>
                     </c>
                     <c ca="center">
                        <p>1aky</p>
                     </c>
                     <c ca="center">
                        <p>AP5</p>
                     </c>
                     <c ca="center">
                        <p>188</p>
                     </c>
                     <c ca="center">
                        <p>42</p>
                     </c>
                     <c ca="center">
                        <p>20</p>
                     </c>
                     <c ca="center">
                        <p>18</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>9.5</p>
                     </c>
                     <c ca="center">
                        <p>9.1</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>gef</p>
                     </c>
                     <c ca="center">
                        <p>1bkd</p>
                     </c>
                     <c ca="center">
                        <p>H-Ras</p>
                     </c>
                     <c ca="center">
                        <p>245</p>
                     </c>
                     <c ca="center">
                        <p>47</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>globin</p>
                     </c>
                     <c ca="center">
                        <p>1a6g</p>
                     </c>
                     <c ca="center">
                        <p>HEM</p>
                     </c>
                     <c ca="center">
                        <p>147</p>
                     </c>
                     <c ca="center">
                        <p>21</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>5.5</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>pdz</p>
                     </c>
                     <c ca="center">
                        <p>1be9</p>
                     </c>
                     <c ca="center">
                        <p>+</p>
                     </c>
                     <c ca="center">
                        <p>81</p>
                     </c>
                     <c ca="center">
                        <p>15</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>ph</p>
                     </c>
                     <c ca="center">
                        <p>1mai</p>
                     </c>
                     <c ca="center">
                        <p>I3P</p>
                     </c>
                     <c ca="center">
                        <p>109</p>
                     </c>
                     <c ca="center">
                        <p>11</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>ptb</p>
                     </c>
                     <c ca="center">
                        <p>1shc</p>
                     </c>
                     <c ca="center">
                        <p>PTR</p>
                     </c>
                     <c ca="center">
                        <p>157</p>
                     </c>
                     <c ca="center">
                        <p>27</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>9</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>ras</p>
                     </c>
                     <c ca="center">
                        <p>821p</p>
                     </c>
                     <c ca="center">
                        <p>GTN</p>
                     </c>
                     <c ca="center">
                        <p>185</p>
                     </c>
                     <c ca="center">
                        <p>29</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>9</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>5.6</p>
                     </c>
                     <c ca="center">
                        <p>8.7</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>sh2</p>
                     </c>
                     <c ca="center">
                        <p>1a09</p>
                     </c>
                     <c ca="center">
                        <p>ACE</p>
                     </c>
                     <c ca="center">
                        <p>83</p>
                     </c>
                     <c ca="center">
                        <p>17</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>sh3</p>
                     </c>
                     <c ca="center">
                        <p>1nlo</p>
                     </c>
                     <c ca="center">
                        <p>ACE</p>
                     </c>
                     <c ca="center">
                        <p>57</p>
                     </c>
                     <c ca="center">
                        <p>9</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>subtilase</p>
                     </c>
                     <c ca="center">
                        <p>1av7</p>
                     </c>
                     <c ca="center">
                        <p>SBL</p>
                     </c>
                     <c ca="center">
                        <p>278</p>
                     </c>
                     <c ca="center">
                        <p>22</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>4.6</p>
                     </c>
                     <c ca="center">
                        <p>3.8</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p><sup>#:</sup>Representative protein structure</p>
                  <p>*: Number of sites within 5 &#197; of the ligand or substrates</p>
                  <p>**: Number of invariant sites, which may contain gaps</p>
                  <p>***: Number of invariant sites within 5 &#197; of the ligand or substrates</p>
                  <p>+: C-terminal peptide of protein CRIPT</p>
               </tblfn>
            </tbl>
            <p>Figure <figr fid="F6">6</figr> maps our predictions onto the structure of the PDZ domain family. The ligand is shown in green and the functional residues predicted by our method are shown in red. Six of the predicted residues are within 5 &#197; of the peptide ligand, nine are around the ligand-binding area, and only one is distant from the ligand (Figure <figr fid="F6">6</figr>).</p>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Mapping top 10 predictions by ANCESCON to PDZ domain (PDB ID: 1be9) [50]</p>
               </caption>
               <text>
                  <p><b>Mapping top 10 predictions by ANCESCON to PDZ domain (PDB ID: 1be9) [50]. </b>Color scheme: the ligand is shown in green and the predicted functional residues are shown in red.</p>
               </text>
               <graphic file="1471-2148-4-33-6"/>
            </fig>
            <p>Another example is the family of adenylyl kinases. Our method identified 3 residues within 5 &#197; of the ligand, while the other 3 methods identified more such residues, most of which are in highly conserved positions such as the catalytic residues. Highly conserved residues, however, are not selected by our method because our measure is designed to emphasize sites that contribute to specificity. Figure <figr fid="F7">7</figr> shows the N-terminal part of the alignment of adenylyl kinases, with our predictions highlighted in red and orange. The evolutionary tree for the alignment is shown in Figure <figr fid="F8">8</figr>. The first cutting layer (for details see Methods) results in two well-separated sub-trees. Functional annotations suggest that they contain enzymes with different substrate preferences: adenylyl kinases and uridylate kinases, respectively. Three residues (27, 54 and 89) from our predictions (colored red in Figure <figr fid="F9">9</figr>) contribute to substrate-binding specificity, as has been noted before in structural studies of the UMP kinases <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. Figure <figr fid="F9">9</figr> highlights our predicted functional residues on the adenylyl kinase protein structure. Most of our predictions fall within the specificity pocket.</p>
            <fig id="F7">
               <title>
                  <p>Figure 7</p>
               </title>
               <caption>
                  <p>A partial alignment of the N-terminal part of adenylyl kinases</p>
               </caption>
               <text>
                  <p><b>A partial alignment of the N-terminal part of adenylyl kinases. </b>Sites colored in red are our predictions that are within 5 &#197; of the ligand. Sites colored in orange are our predictions that are more than 5 &#197; from the ligand.</p>
               </text>
               <graphic file="1471-2148-4-33-7"/>
            </fig>
            <fig id="F8">
               <title>
                  <p>Figure 8</p>
               </title>
               <caption>
                  <p>The evolutionary tree for the adenylyl kinase family generated by "Weighbor"</p>
               </caption>
               <text>
                  <p><b>The evolutionary tree for the adenylyl kinase family generated by "Weighbor". </b>The first cutting layer is shown. Evolutionary distances are shown to scale.</p>
               </text>
               <graphic file="1471-2148-4-33-8"/>
            </fig>
            <fig id="F9">
               <title>
                  <p>Figure 9</p>
               </title>
               <caption>
                  <p>Mapping top 10 predictions by ANCESCON to adenylyl kinase domain (PDB ID: 1aky) [47]</p>
               </caption>
               <text>
                  <p><b>Mapping top 10 predictions by ANCESCON to adenylyl kinase domain (PDB ID: 1aky) [47]. </b>Color scheme: the ligand is shown in green and the predicted functional residues are shown in red.</p>
               </text>
               <graphic file="1471-2148-4-33-9"/>
            </fig>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusions</p>
         </st>
         <p>We developed a package (ANCESCON) to reconstruct ancestral protein sequences that takes into account the variation of substitution rates among sites. Two methods were proposed to estimate site-specific evolutionary rates (<it>&#945;</it>), namely Alignment-Based rate factor (<it>&#945;</it><sub><it>AB</it></sub>) and rate factor <it>&#945; </it>estimated by maximum likelihood (<it>&#945;</it><sub><it>ML</it></sub>). Consideration of rate variation among sites can alleviate the underestimation of evolutionary distances. Accuracy of ancestral sequence reconstruction by our method is higher than that of PAML, PHYLIP and PAUP* when the given alignment contains more diverse sequences. We show that reconstructed ancestral sequences help to improve detection of distant homologs and prediction of functional sites with specificity.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Transition probability and likelihood calculations</p>
            </st>
            <p>For all models discussed in this paper, we assume that all sites in an alignment evolve independently and according to a homogeneous, stationary and time-reversible Markov process. The probability that amino acid <it>i </it>is replaced by amino acid <it>j </it>after a time interval <it>t </it>is <it>P</it><sub><it>ij</it></sub>(<it>t</it>). The transition probability matrix of the 20 amino acids is written as <b>P</b>(<it>t</it>), which can be calculated as</p>
            <p><b>P</b>(<it>t</it>) = exp(<b>Q</b><it>t</it>) &#160;&#160;&#160; (2)</p>
            <p>Here, <b>Q </b>is the rate matrix. The off-diagonal elements <it>q</it><sub><it>ij </it></sub>are the instantaneous rates of change from amino acid <it>i </it>to amino acid <it>j</it>, and the diagonal elements <it>q</it><sub><it>ii </it></sub>are set so that each row of the matrix sums to 0. <b>Q </b>can be calculated by:</p>
            <p><b>Q </b>= <b>S</b>* <it>diag</it>(<it>&#960;</it>) &#160;&#160;&#160; (3)</p>
            <p><b>S </b>is the matrix of amino acid exchangeability parameters <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. <it>&#960;</it><sub><it>i </it></sub>is the equilibrium frequency for amino acid <it>i</it>. Time reversibility implies that <b>S </b>is a symmetric matrix. In our program, the <b>S </b>matrix is taken from Whelan and Goldman <abbrgrp><abbr bid="B39">39</abbr></abbrgrp> and the default <it>&#960; </it>vector is estimated from the given alignment.</p>
            <p><b>Q </b>can be decomposed into eigenvalues (<it>&#955;</it><sub><it>i</it></sub>) and eigenvectors (<b><it>u</it></b><sub><it>i</it></sub>).</p>
            <p>
               <graphic file="1471-2148-4-33-i2.gif"/>
            </p>
            <p><b>U </b>= (<b><it>u</it></b><sub>1</sub>, ..., <b><it>u</it></b><sub>20</sub>) &#160;&#160;&#160; (5)</p>
            <p><it>P</it><sub><it>ij</it></sub>(<it>t</it>) can be calculated using the following equation,</p>
            <p>
               <graphic file="1471-2148-4-33-i3.gif"/>
            </p>
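            <p>As an illustration of equations (2) and (3), the following minimal Python sketch builds a rate matrix and exponentiates it. It is not the ANCESCON implementation: the 3-state alphabet, the values in S and PI, and the function names are hypothetical, and the matrix exponential is computed with a truncated Taylor series plus repeated squaring in place of the eigendecomposition of equations (4) to (6), purely to keep the example free of external libraries.</p>
            <p><![CDATA[
```python
# Toy 3-state alphabet standing in for the 20 amino acids (values are made up).
S = [[0.0, 1.0, 2.0],
     [1.0, 0.0, 0.5],
     [2.0, 0.5, 0.0]]          # symmetric exchangeability parameters
PI = [0.5, 0.3, 0.2]           # equilibrium frequencies, summing to 1


def rate_matrix(s, pi):
    """Equation (3): q_ij = s_ij * pi_j off the diagonal; each row sums to 0."""
    n = len(pi)
    q = [[s[i][j] * pi[j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        q[i][i] = -sum(q[i])   # diagonal is still 0.0 here, so this is the row sum
    return q


def matmul(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_matrix(q, t, squarings=10, terms=12):
    """Equation (2), P(t) = exp(Qt), by Taylor series plus repeated squaring."""
    n = len(q)
    step = t / 2 ** squarings
    small = [[q[i][j] * step for j in range(n)] for i in range(n)]
    p = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in p]
    for k in range(1, terms + 1):
        # term becomes (Q * step)^k / k!
        term = [[x / k for x in row] for row in matmul(term, small)]
        p = [[p[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        p = matmul(p, p)       # exp(A)^2 = exp(2A)
    return p


Q = rate_matrix(S, PI)
P = transition_matrix(Q, 1.0)
```
]]></p>
            <p>Under these assumptions each row of P sums to 1, and time reversibility appears as the detailed-balance relation PI[i] * P[i][j] = PI[j] * P[j][i].</p>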
            <p>The likelihood function <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> for an evolutionary tree T shown in Figure <figr fid="F10">10</figr> is:</p>
            <fig id="F10">
               <title>
                  <p>Figure 10</p>
               </title>
               <caption>
                  <p>An evolutionary tree topology</p>
               </caption>
               <text>
                  <p><b>An evolutionary tree topology. </b>Nodes C, D, E and F represent given protein sequences, while nodes A and B represent ancestral protein sequences, i.e. unknown sequences. <it>d</it><sub><it>YZ </it></sub>represents the evolutionary distance between nodes Y and Z.</p>
               </text>
               <graphic file="1471-2148-4-33-10"/>
            </fig>
            <p>
               <graphic file="1471-2148-4-33-i4.gif"/>
            </p>
            <p>Here, <graphic file="1471-2148-4-33-i5.gif"/> is the equilibrium frequency of the amino acid at the node A. <graphic file="1471-2148-4-33-i6.gif"/> is the transition probability from the amino acid at node A to the amino acid at node B after an evolutionary distance <it>d</it><sub><it>AB</it></sub>.</p>
            <p>Considering that each site <it>i </it>has a rate factor <it>&#945;</it><sub><it>i </it></sub><abbrgrp><abbr bid="B13">13</abbr><abbr bid="B18">18</abbr></abbrgrp>, we have:</p>
            <p>
               <graphic file="1471-2148-4-33-i7.gif"/>
            </p>
            <p><it>t </it>in equation (6) can be expressed as:</p>
            <p><it>t </it>= <it>&#945;</it>&#183;<it>d </it>&#160;&#160;&#160; (9)</p>
            <p><it>d </it>is the evolutionary distance and <it>&#945; </it>is the rate factor. The following restriction on the vector <it>&#945; </it>holds:</p>
            <p>
               <graphic file="1471-2148-4-33-i8.gif"/>
            </p>
            <p>Here, <it>K </it>is the number of sites.</p>
         </sec>
         <sec>
            <st>
               <p>Alignment-Based Rate Factor <it>&#945; </it>(<it>&#945;</it><sub><it>AB</it></sub>) and Rate factor <it>&#945; </it>estimated by Maximum Likelihood (<it>&#945;</it><sub><it>ML</it></sub>)</p>
            </st>
            <p>Our program supports two methods to estimate a rate factor for each site: Alignment-Based rate factor <it>&#945; </it>(<it>&#945;</it><sub><it>AB</it></sub>) and Maximum Likelihood-estimated rate factor <it>&#945; </it>(<it>&#945;</it><sub><it>ML</it></sub>).</p>
            <p>The estimation of <it>&#945;</it><sub><it>AB </it></sub>is empirical and based on the observation that the substitution rate at a site is correlated with the conservation of the site, which, in turn, is correlated with the average transition probability among the amino acids at that site. Conserved sites are dominated by highly similar amino acids and thus have high average transition probabilities among the amino acids. The algorithm to calculate <it>&#945;</it><sub><it>AB </it></sub>is as follows:</p>
            <p>1. Set <it>t </it>equal to 1.0 and use equation (6) to calculate a transition probability matrix <b>P </b>for the 20 amino acids. The equation <graphic file="1471-2148-4-33-i9.gif"/> is then used to compute a symmetric matrix <b>P'</b>.</p>
            <p>2. Calculate the average transition probability for each site and take the reciprocal: <graphic file="1471-2148-4-33-i10.gif"/>, where <graphic file="1471-2148-4-33-i11.gif"/> is the number of non-gapped amino acid pairs in site <it>i </it>and the denominator is the sum over the transition probabilities between all amino acid pairs (<it>j</it>,<it>k</it>) at a site <it>i</it>.</p>
            <p>3. For invariant sites, <it>C</it><sub><it>i </it></sub>is set to 0 to make it consistent with the Maximum Likelihood estimation.</p>
            <p>4. Equation (11) is used to calculate <it>&#945;</it><sub><it>AB</it></sub>, so that equation (10) holds.</p>
            <p>
               <graphic file="1471-2148-4-33-i12.gif"/>
            </p>
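            <p>The four steps above can be sketched in Python as follows. This is a hedged illustration, not the ANCESCON code: an equal-rates 20-state model (already symmetric, so no separate symmetrization step is needed) stands in for the Whelan and Goldman matrix, the final scaling is assumed to make the rate factors average to 1 as required by equation (10), and the names p_trans, alpha_ab and COLUMNS are hypothetical.</p>
            <p><![CDATA[
```python
import math
from itertools import combinations

K_STATES = 20
MU = K_STATES / (K_STATES - 1.0)   # one expected substitution per unit time


def p_trans(a, b, t):
    """Transition probability under the assumed equal-rates model."""
    decay = math.exp(-MU * t)
    if a == b:
        return 1.0 / K_STATES + (1.0 - 1.0 / K_STATES) * decay
    return (1.0 - decay) / K_STATES


def alpha_ab(columns, t=1.0):
    """Alignment-based rate factors, one per alignment column."""
    raw = []
    for col in columns:
        residues = [r for r in col if r != "-"]          # ignore gaps
        if len(set(residues)) <= 1:                      # step 3: invariant site
            raw.append(0.0)
            continue
        pairs = list(combinations(residues, 2))
        total = sum(p_trans(a, b, t) for a, b in pairs)  # step 2 denominator
        raw.append(len(pairs) / total)                   # reciprocal of the average
    norm = sum(raw)
    if norm == 0.0:                                      # every site invariant
        return raw
    scale = len(raw) / norm                              # step 4: mean becomes 1
    return [c * scale for c in raw]


COLUMNS = ["LLLL", "LLLV", "LVIA", "AC-D"]               # made-up alignment columns
ALPHA = alpha_ab(COLUMNS)
```
]]></p>
            <p>In this toy input the invariant column receives a rate factor of 0, the partly conserved column a value below 1, and the two fully variable columns the largest values.</p>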
            <p>If an evolutionary tree is assumed for the alignment, we can estimate the <it>&#945;</it><sub><it>ML </it></sub>factors by maximizing the likelihood (equation (8)) for each site:</p>
            <p>
               <graphic file="1471-2148-4-33-i13.gif"/>
            </p>
            <p>If some sites are highly variable, the <it>&#945;</it><sub><it>ML </it></sub>at those sites can be very large, as noted previously <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. We consider these rate factors to be outliers. For these sites, we have observed that the likelihood changes very little over a wide range of <it>&#945; </it>values. An empirical method is used to reduce the values of <it>&#945;</it><sub><it>ML </it></sub>outliers, guided by the <it>&#945;</it><sub><it>AB </it></sub>values. A <it>Z</it>-score of the ratio of <it>&#945;</it><sub><it>ML </it></sub>to <it>&#945;</it><sub><it>AB </it></sub>is calculated for each site except invariant sites:</p>
            <p>
               <graphic file="1471-2148-4-33-i14.gif"/>
            </p>
            <p>
               <graphic file="1471-2148-4-33-i15.gif"/>
            </p>
            <p>Here, <graphic file="1471-2148-4-33-i16.gif"/> is the ratio of <graphic file="1471-2148-4-33-i17.gif"/> to <graphic file="1471-2148-4-33-i18.gif"/> for site <it>i</it>; <graphic file="1471-2148-4-33-i19.gif"/> is the number of sites excluding the invariant sites. If <it>Z</it><sub><it>i </it></sub>is greater than 3, it is reduced to 3 by decreasing the value of <graphic file="1471-2148-4-33-i17.gif"/>. We repeat this procedure until no <it>Z</it><sub><it>i </it></sub>for any site <it>i </it>is greater than 3. After removing the outliers, we scale the <graphic file="1471-2148-4-33-i17.gif"/> values so that equation (10) holds.</p>
         </sec>
         <sec>
            <st>
               <p>Amino acid frequency vector <it>&#960; </it>optimization</p>
            </st>
            <p>Two methods are implemented to estimate the equilibrium frequency vector <it>&#960;</it>, one derived directly from the given alignment (Alignment-Based <it>&#960; </it>or <it>&#960;</it><sub><it>AB</it></sub>) and the other estimated by Maximum Likelihood (<it>&#960;</it><sub><it>ML</it></sub>). The likelihood for the entire alignment is a function of <it>&#960; </it>with 19 variables. A continuous minimization method by simulated annealing <abbrgrp><abbr bid="B40">40</abbr></abbrgrp> is used to optimize <it>&#960;</it>, with the log-likelihood of the alignment as the objective function. Simulated annealing is computationally intensive and is the major reason for the long CPU times given in Table <tblr tid="T2">2</tblr>.</p>
         </sec>
         <sec>
            <st>
               <p>Distance matrix calculation and tree inference</p>
            </st>
            <p>A Maximum Likelihood approach is used to estimate the evolutionary distances among sequences, with or without rate variation across sites. The log-likelihood for replacing one protein sequence (A) with another protein sequence (B) after an evolutionary distance <it>d </it>can be written as:</p>
            <p>
               <graphic file="1471-2148-4-33-i20.gif"/>
            </p>
            <p>Here, <graphic file="1471-2148-4-33-i21.gif"/> is the equilibrium frequency for the amino acid at site <it>j </it>in sequence A. <graphic file="1471-2148-4-33-i22.gif"/> is the transition probability from amino acid at site <it>j </it>in sequence A to amino acid at site <it>j </it>in sequence B after an evolutionary distance <it>&#945; </it><sub><it>j</it></sub>&#183;<it>d</it>. <it>&#945;</it><sub><it>j </it></sub>is 1 if all sites are assumed to evolve at the same rate; otherwise the <it>&#945;</it><sub><it>AB </it></sub>at site <it>j </it>is used for <it>&#945;</it><sub><it>j</it></sub>.</p>
            <p>An estimate of the evolutionary distance between two sequences is obtained by maximizing the likelihood function of equation (15):</p>
            <p>
               <graphic file="1471-2148-4-33-i23.gif"/>
            </p>
            <p>Equation (16) can be solved by the bisection root-finding method <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>.</p>
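            <p>A minimal Python sketch of this pairwise distance estimate is given here. It is an illustration under stated assumptions rather than the ANCESCON code: an equal-rates 20-state model with uniform frequencies replaces the Whelan and Goldman matrix, the derivative of the log-likelihood is written analytically for that model, and the names score and ml_distance and the search bounds are hypothetical.</p>
            <p><![CDATA[
```python
import math

K_STATES = 20
MU = K_STATES / (K_STATES - 1.0)   # one expected substitution per unit distance


def p_trans(a, b, t):
    """Transition probability under the assumed equal-rates model."""
    decay = math.exp(-MU * t)
    if a == b:
        return 1.0 / K_STATES + (1.0 - 1.0 / K_STATES) * decay
    return (1.0 - decay) / K_STATES


def score(seq_a, seq_b, d, alpha=None):
    """Derivative of the pairwise log-likelihood with respect to d."""
    total = 0.0
    for j, (a, b) in enumerate(zip(seq_a, seq_b)):
        rate = 1.0 if alpha is None else alpha[j]
        if a == "-" or b == "-" or rate == 0.0:
            continue                      # gapped or zero-rate sites add nothing
        decay = math.exp(-MU * rate * d)
        if a == b:
            slope = -(1.0 - 1.0 / K_STATES) * MU * decay
        else:
            slope = MU * decay / K_STATES
        total += rate * slope / p_trans(a, b, rate * d)
    return total


def ml_distance(seq_a, seq_b, alpha=None, lo=1e-6, hi=10.0, tol=1e-9):
    """Find the root of the score function by bisection."""
    if score(seq_a, seq_b, lo, alpha) <= 0.0:
        return 0.0                        # no observed differences
    if score(seq_a, seq_b, hi, alpha) >= 0.0:
        return hi                         # saturated pair: report the upper bound
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(seq_a, seq_b, mid, alpha) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


SEQ_A = "ACDEFGHIKL"
SEQ_B = "ACDEFGHIKV"                      # one difference in ten sites
D_HAT = ml_distance(SEQ_A, SEQ_B)
```
]]></p>
            <p>For this simple model with all rate factors equal to 1, the estimate can be checked against the closed form d = -(19/20) ln(1 - (20/19) p), where p is the fraction of differing sites.</p>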
            <p>After the distance matrix is calculated, the "Weighbor" method, i.e. weighted neighbor joining, is used to infer an evolutionary tree <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Ancestral sequence reconstruction</p>
            </st>
            <p>Two methods are implemented to reconstruct ancestral sequences. One is a marginal reconstruction method <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>, and the other is a joint reconstruction method <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Below are their brief descriptions.</p>
         </sec>
         <sec>
            <st>
               <p>The marginal reconstruction method <abbrgrp><abbr bid="B4">4</abbr></abbrgrp></p>
            </st>
            <p>We calculate <it>P</it>(<it>A</it><sub><it>r</it></sub>|{<it>A</it><sub><it>l</it></sub>},<it>T</it>), which is the conditional probability of amino acid <it>A</it><sub><it>r </it></sub>at the root, given the leaf node amino acid set {<it>A</it><sub><it>l</it></sub>} and a tree <it>T</it>. Since time reversibility is assumed, any internal node can serve as a root. Using Bayes' theorem, we have:</p>
            <p>
               <graphic file="1471-2148-4-33-i24.gif"/>
            </p>
            <p>Here, <it>P</it>(<it>A</it><sub><it>r</it></sub>) is used instead of <it>P</it>(<it>A</it><sub><it>r</it></sub>|<it>T</it>) because the frequency of the root amino acid <it>A</it><sub><it>r</it></sub>, i.e. <it>&#960;</it><sub><it>r</it></sub>, does not depend on tree <it>T</it>. <it>P</it>({<it>A</it><sub><it>l</it></sub>}|<it>A</it><sub><it>r</it></sub>,<it>T</it>) is the conditional probability of the known amino acids at the leaf nodes, given <it>T </it>and <it>A</it><sub><it>r</it></sub>. <it>P</it>({<it>A</it><sub><it>l</it></sub>}|<it>T</it>) does not depend on <it>A</it><sub><it>r</it></sub>, so it is calculated as a normalization constant that makes the <it>P</it>(<it>A</it><sub><it>r</it></sub>|{<it>A</it><sub><it>l</it></sub>},<it>T</it>) terms sum to 1 over all 20 possible values of <it>A</it><sub><it>r</it></sub>.</p>
            <p>For the tree in Figure <figr fid="F10">10</figr>, <it>P</it>({<it>A</it><sub><it>l</it></sub>}|<it>A</it><sub><it>r</it></sub>,<it>T</it>) can be expanded as:</p>
            <p>
               <graphic file="1471-2148-4-33-i25.gif"/>
            </p>
            <p>Here, <graphic file="1471-2148-4-33-i6.gif"/> is the transition probability from amino acid at node A to amino acid at node B after an evolutionary distance <it>d</it><sub><it>AB</it></sub>. Equation (18) can be calculated using a recursive method suggested by Felsenstein <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>.</p>
            <p>If rate factors are used in the reconstruction of the root sequence, we have:</p>
            <p>
               <graphic file="1471-2148-4-33-i26.gif"/>
            </p>
            <p>Here, <it>&#945;</it><sub><it>i </it></sub>could be either <it>&#945;</it><sub><it>AB </it></sub>or <it>&#945;</it><sub><it>ML </it></sub>at site <it>i</it>. <it>P</it>(<it>A</it><sub><it>C</it></sub>,<it>A</it><sub><it>D</it></sub>,<it>A</it><sub><it>E</it></sub>,<it>A</it><sub><it>F </it></sub>| <it>A</it><sub><it>A</it></sub>,<it>T</it>)<sub><it>i </it></sub>is the conditional probability <it>P</it>(<it>A</it><sub><it>C</it></sub>,<it>A</it><sub><it>D</it></sub>,<it>A</it><sub><it>E</it></sub>,<it>A</it><sub><it>F </it></sub>| <it>A</it><sub><it>A</it></sub>,<it>T</it>) at site <it>i</it>.</p>
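            <p>The marginal reconstruction at one site can be sketched in Python as follows. This is an illustration only: the four-leaf topology (root A with children B, E and F; B with children C and D) and its branch lengths are assumed for the example and may not match Figure 10, an equal-rates 20-state model with uniform frequencies replaces the Whelan and Goldman matrix, and the function names are hypothetical. The recursion follows the pruning idea of Felsenstein, and the site rate factor simply multiplies every branch length.</p>
            <p><![CDATA[
```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
K = len(ALPHABET)
MU = K / (K - 1.0)


def p_trans(a, b, t):
    """Transition probability under the assumed equal-rates model."""
    decay = math.exp(-MU * t)
    if a == b:
        return 1.0 / K + (1.0 - 1.0 / K) * decay
    return (1.0 - decay) / K


def conditional(node, site_rate=1.0):
    """Return {state: P(leaves below node | state at node)} by recursion.

    A leaf is a one-letter string; an internal node is a list of
    (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return {s: 1.0 if s == node else 0.0 for s in ALPHABET}
    result = {s: 1.0 for s in ALPHABET}
    for child, dist in node:
        below = conditional(child, site_rate)
        for s in ALPHABET:
            result[s] *= sum(p_trans(s, c, site_rate * dist) * below[c]
                             for c in ALPHABET)
    return result


def marginal_posterior(root, site_rate=1.0):
    """Posterior of each root state: prior times conditional, normalized."""
    cond = conditional(root, site_rate)
    joint = {s: (1.0 / K) * cond[s] for s in ALPHABET}   # uniform prior assumed
    norm = sum(joint.values())                           # P(leaves | T)
    return {s: v / norm for s, v in joint.items()}


NODE_B = [("L", 0.1), ("L", 0.15)]                       # leaves C and D
ROOT_A = [(NODE_B, 0.1), ("L", 0.2), ("V", 0.3)]         # B, leaf E, leaf F
POSTERIOR = marginal_posterior(ROOT_A)
BEST = max(POSTERIOR, key=POSTERIOR.get)
```
]]></p>
            <p>With three leaves carrying L and one carrying V on a longer branch, the most probable root state in this toy example is L.</p>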
         </sec>
         <sec>
            <st>
               <p>The joint reconstruction method <abbrgrp><abbr bid="B5">5</abbr></abbrgrp></p>
            </st>
            <p>The objective of a joint reconstruction method is to find the combination of amino acids for an internal node set {<it>A</it><sub><it>i</it></sub>} that maximizes the conditional probability of this amino acid combination, given the leaf node amino acid set {<it>A</it><sub><it>l</it></sub>} and a tree <it>T</it>, <it>P</it>({<it>A</it><sub><it>i</it></sub>}|{<it>A</it><sub><it>l</it></sub>},<it>T</it>). Using Bayes' theorem, we have:</p>
            <p>
               <graphic file="1471-2148-4-33-i27.gif"/>
            </p>
            <p>Because <it>P</it>({<it>A</it><sub><it>l</it></sub>}|<it>T</it>) is the same for all amino acid combinations at the internal node set {<it>A</it><sub><it>i</it></sub>}, the problem becomes finding the maximum of <it>P</it>({<it>A</it><sub><it>l</it></sub>}|{<it>A</it><sub><it>i</it></sub>},<it>T</it>) *<it>P</it>({<it>A</it><sub><it>i</it></sub>}).</p>
            <p>The details of a fast algorithm to solve equation (20) can be found in Pupko et al. <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. We also incorporated site-specific rate factors into this algorithm, in the same way as in equation (19).</p>
         </sec>
         <sec>
            <st>
               <p>Gaps</p>
            </st>
            <p>Due to difficulties with the probabilistic models of gaps, a simplified empirical approach is used to alleviate the problem. We assume that gaps are "supersede" letters. Gaps are considered for each site independently. If a leaf node has a gap instead of an amino acid at a site, this node will be deleted from the tree for this site. After dealing with leaves, we check all internal nodes for children. If an internal node has no children or only one child due to the leaf removal because of gaps, it will be removed from the tree and a gap will be assumed as its reconstructed state.</p>
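            <p>The per-site pruning rule can be sketched in Python as follows. This is a hedged illustration, not the ANCESCON code: the dictionary-based tree, the node names and the function prune_site are hypothetical, and reattaching a sole surviving child to the next surviving ancestor with the branch lengths added is an assumption that the text does not state.</p>
            <p><![CDATA[
```python
def prune_site(tree, root, column):
    """Prune gapped leaves for one alignment column.

    tree:   {internal_node: [(child, branch_length), ...]}; leaves are not keys.
    column: {leaf: residue or '-'} for the site being processed.
    Returns (pruned_tree, gapped_internal_nodes, new_root).
    """
    pruned = {}
    gapped = set()

    def visit(node):
        # Returns (surviving_node, extra_length) or None if nothing survives.
        if node not in tree:                           # leaf
            return None if column[node] == "-" else (node, 0.0)
        kept = []
        for child, length in tree[node]:
            sub = visit(child)
            if sub is not None:
                kept.append((sub[0], length + sub[1]))
        if len(kept) >= 2:                             # node still informative
            pruned[node] = kept
            return (node, 0.0)
        gapped.add(node)                               # reconstructed as a gap
        return kept[0] if kept else None               # pass a lone child upward

    top = visit(root)
    return pruned, gapped, (top[0] if top else None)


# Hypothetical four-leaf tree: A is the root, B is the parent of C and D.
TREE = {"A": [("B", 0.1), ("E", 0.2), ("F", 0.3)],
        "B": [("C", 0.1), ("D", 0.15)]}
COLUMN = {"C": "L", "D": "-", "E": "L", "F": "V"}
PRUNED, GAPPED, NEW_ROOT = prune_site(TREE, "A", COLUMN)
```
]]></p>
            <p>In this example leaf D carries a gap, so node B is left with a single child and is assigned a gap, while leaf C is attached to the root A.</p>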
         </sec>
         <sec>
            <st>
               <p>Simulations of evolutionary process</p>
            </st>
            <p>Two methods of simulating the amino acid substitution process were used to test the reliability of reconstruction, rate factors and evolutionary distance estimation. The first simulation method was based on a homogeneous time-reversible Markov model. The parameters from Whelan and Goldman <abbrgrp><abbr bid="B39">39</abbr></abbrgrp> were chosen for our model, including the equilibrium frequency vector <it>&#960; </it>and the <b>S </b>matrix. Given the length of a branch from a parent node to one of its child nodes and the amino acid at the parent node, we simulated the substitution process to generate an amino acid for the child node based on the transition probabilities calculated using equation (6). For the arbitrarily selected tree shown in Figure <figr fid="F2">2</figr>, we first generated a random sequence of 100 amino acids as the root sequence based on the amino acid frequencies from Whelan and Goldman <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. We then simulated the random substitution process to obtain all leaf node sequences. This simulation was repeated 100 times. The resulting 100 alignments were used to test the reliability of the reconstruction result. In this simulation, each site evolved independently according to the same tree topology and branch lengths, so there was no rate heterogeneity across sites.</p>
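            <p>One branch of the first simulation method can be sketched in Python as follows. This is an illustration under stated assumptions, not the simulation code used in the study: an equal-rates 20-state model with uniform frequencies replaces the Whelan and Goldman parameters, the branch length of 0.2 and the random seed are arbitrary, and the function names are hypothetical.</p>
            <p><![CDATA[
```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
K = len(ALPHABET)
MU = K / (K - 1.0)


def p_trans(a, b, t):
    """Transition probability under the assumed equal-rates model."""
    decay = math.exp(-MU * t)
    if a == b:
        return 1.0 / K + (1.0 - 1.0 / K) * decay
    return (1.0 - decay) / K


def evolve_site(parent, branch_length, rng):
    """Draw the child residue from the row of P(t) for the parent residue."""
    weights = [p_trans(parent, child, branch_length) for child in ALPHABET]
    return rng.choices(ALPHABET, weights=weights, k=1)[0]


def evolve_sequence(parent_seq, branch_length, rng):
    """Evolve every site independently along one branch."""
    return "".join(evolve_site(a, branch_length, rng) for a in parent_seq)


rng = random.Random(0)                                   # fixed seed for repeatability
ROOT_SEQ = "".join(rng.choices(ALPHABET, k=100))         # uniform frequencies assumed
CHILD_SEQ = evolve_sequence(ROOT_SEQ, 0.2, rng)
N_DIFF = sum(1 for a, b in zip(ROOT_SEQ, CHILD_SEQ) if a != b)
```
]]></p>
            <p>Applying evolve_sequence recursively from the root to every child node of a tree would yield one simulated alignment at the leaves; repeating the procedure gives replicate alignments.</p>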
            <p>The second simulation method, based on a <it>Z</it>-score model, introduced rate variation across sites by using structural and functional information for a specific protein family <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. We selected three protein families for the <it>Z</it>-score simulations under structural and functional constraints: pdz domain (Protein Data Bank (PDB) ID: 1g9o) <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>, trypsin (PDB ID: 1sgt) <abbrgrp><abbr bid="B42">42</abbr></abbrgrp> and carboxypeptidase A (PDB ID: 2ctb) <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>. Given a rooted tree, the native sequence with known structure was used as the root sequence. Simulations were run along the tree to generate a sequence at every internal node and leaf node. If the evolutionary distance from a parent node to a child node was <it>d</it>, the child sequence was obtained after <it>l</it>*<it>d </it>accepted substitutions starting from the parent sequence, where <it>l </it>is the protein sequence length. Simulations of the substitution process were repeated 100 times. For each site, the number of accepted substitutions was recorded and averaged over the 100 simulations. Rate factors (observed <it>&#945;</it>), representing site mutability, were calculated from these average substitution numbers and scaled so that the average rate factor is 1 (equation (10)). The 100 simulated alignments were used to test the rate factor estimators (<it>&#945;</it><sub><it>AB </it></sub>and <it>&#945;</it><sub><it>ML</it></sub>), the distance calculation methods and the ancestral sequence reconstruction.</p>
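            <p>The scaling of observed rate factors can be illustrated as follows. The sketch assumes that the scaling amounts to dividing each site's average substitution count by the mean over all sites, which is the simplest scaling that makes the rate factors average to 1; equation (10) is not reproduced here, and the function name is hypothetical.</p>
            <p>
```python
def rate_factors(substitution_counts):
    """Observed rate factors from per-site accepted substitution counts.

    substitution_counts: one list per simulation, one count per site.
    Returns one rate factor per site; the factors average to 1.
    """
    n_sims = len(substitution_counts)
    n_sites = len(substitution_counts[0])
    # Average number of accepted substitutions at each site.
    mean_per_site = [sum(run[m] for run in substitution_counts) / n_sims
                     for m in range(n_sites)]
    # Assumed scaling: divide by the mean over sites.
    grand_mean = sum(mean_per_site) / n_sites
    return [x / grand_mean for x in mean_per_site]
```
            </p>
            <p>A site that never accepts a substitution receives a rate factor of 0, and a site that accepts twice the average number receives a rate factor of 2.</p>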
         </sec>
         <sec>
            <st>
               <p>Homology detection</p>
            </st>
            <sec>
               <st>
                  <p>Testing dataset</p>
               </st>
               <p>Thirty-eight OB (Oligonucleotide/oligosaccharide binding)-fold <abbrgrp><abbr bid="B34">34</abbr></abbrgrp> proteins with known structures were selected for the homology detection test. The OB-fold has a 5-stranded <it>&#946;</it>-barrel structure. In the SCOP (Structural Classification of Proteins) database (version 1.55) <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>, there are 7 OB-fold superfamilies, of which the superfamily of nucleic acid binding proteins is the most populated. The diversity of many OB-fold homologs extends beyond detection by automatic PSI-BLAST searches. Multiple sequence alignments of native sequences were obtained from PSI-BLAST searches starting from the 38 OB-fold sequences with known structures. We also selected 10 alignments (adenylyl kinase, gef, globin, pdz, ph, ptb, ras, sh2, sh3 and subtilase) from the Pfam database (version 7.3) <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> for the homology detection test.</p>
            </sec>
            <sec>
               <st>
                  <p>Four different methods</p>
               </st>
               <p>For each alignment with <it>N </it>sequences, ancestral sequences for the <it>N</it>-1 internal nodes were reconstructed. The idea is to test whether adding more sequences to a native alignment can help homology detection. Four types of combined alignments were generated by adding different sets of <it>N</it>-1 sequences to the native alignment. In the first case, the added sequence for each internal node consisted of the amino acid with the largest probability at each position. In the second case, the added sequences were made up of the amino acids with the second largest probability. In the third case, we shuffled the native alignment at each position while keeping the gap pattern of the native alignment, and added <it>N</it>-1 sequences resulting from the shuffling to the native alignment. In the fourth case, <it>N</it>-1 random sequences were generated with the overall amino acid frequencies of the native alignment. These four methods are named "BEST", "SECOND BEST", "SHUFFLE" and "RANDOM", respectively.</p>
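               <p>The four kinds of added sequences can be sketched as below. The function names and data layout (one probability dictionary per site for each internal node, alignments as lists of equal-length strings) are illustrative assumptions, and the sketch leaves open how <it>N</it>-1 of the <it>N</it> shuffled sequences are chosen, which the text does not specify.</p>
               <p>
```python
import random

GAP = "-"

def ranked_sequence(site_probs, rank):
    """Sequence built from the rank-th most probable residue at each site.

    site_probs: list of dicts {amino acid: probability}, one per site.
    rank 0 corresponds to BEST and rank 1 to SECOND BEST.
    """
    seq = []
    for probs in site_probs:
        ordered = sorted(probs, key=probs.get, reverse=True)
        seq.append(ordered[rank])
    return "".join(seq)

def shuffle_columns(alignment, rng):
    """SHUFFLE: permute residues within each column, leaving gaps in place."""
    rows = [list(s) for s in alignment]
    for col in range(len(rows[0])):
        filled = [r for r in range(len(rows)) if rows[r][col] != GAP]
        residues = [rows[r][col] for r in filled]
        rng.shuffle(residues)
        for r, aa in zip(filled, residues):
            rows[r][col] = aa
    return ["".join(r) for r in rows]

def random_sequences(alignment, count, rng):
    """RANDOM: draw residues from the overall frequencies of the alignment."""
    pool = [aa for s in alignment for aa in s if aa != GAP]
    length = len(alignment[0])
    return ["".join(rng.choice(pool) for _ in range(length))
            for _ in range(count)]
```
               </p>
               <p>Drawing from the pooled residues reproduces the overall amino acid frequencies in expectation; column shuffling additionally preserves the composition of every position, which is what distinguishes SHUFFLE from RANDOM.</p>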
            </sec>
         </sec>
         <sec>
            <st>
               <p>Prediction of functional sites</p>
            </st>
            <p>Our objective is to find sites that are well conserved within each sub-tree, but show high variability between different sub-trees. These sites are likely to contribute to functional specificity <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B45">45</abbr><abbr bid="B46">46</abbr></abbrgrp>.</p>
            <sec>
               <st>
                  <p>Sequence datasets</p>
               </st>
               <p>Multiple sequence alignments of ten protein families were chosen from the Pfam database (version 7.3) <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. These families are: adenylyl kinase (adkinase) (representing structure PDB ID: 1aky; its ligand or substrate: AP5) <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>, guanine nucleotide exchange factor (gef) (1bkd; H-Ras) <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>, globin (1a6g; HEM) <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>, pdz domain (1be9; C-terminal peptide of protein CRIPT) <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>, ph domain (1mai; I3P) <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>, ptb domain (1shc; PTR) <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>, ras (821p; GTN) <abbrgrp><abbr bid="B53">53</abbr></abbrgrp>, sh2 domain (1a09; ACE) <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>, sh3 domain (1nlo; ACE) <abbrgrp><abbr bid="B55">55</abbr></abbrgrp> and subtilase (1av7; SBL) <abbrgrp><abbr bid="B56">56</abbr></abbrgrp>. Most of these alignments contain many sequences. We pruned and clustered the sequences in each alignment according to the length and diversity. Representative sequences were kept and used for tree inference and ancestral sequence reconstruction. This procedure was done in three steps: 1) removing fragments, 2) single-linkage clustering and 3) complete-linkage clustering, as described below.</p>
               <p>1. Each family has a template sequence with known structure. Sequences with amino acids at fewer than 75% of the non-gapped positions of the template sequence were considered to be fragments and discarded.</p>
               <p>2. A sequence identity matrix was calculated for the remaining sequences. Single-linkage clustering was performed at a sequence identity threshold of 0.8 to form sequence groups. For each group, we kept the longest sequence as a representative and discarded the other members. This step reduced redundancy in the dataset.</p>
               <p>3. An average sequence identity was calculated for the remaining sequences. We used this average identity as the threshold for complete-linkage clustering to form new sequence groups. The four groups containing the most sequences were chosen to form our new alignment. Any group with the same number of sequences as the fourth group was also included in the new alignment. The purpose of this step is to keep the major sequence subgroups of a family while leaving out highly divergent sequences that might be deleterious for tree inference.</p>
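               <p>Steps 1 and 2 of this procedure can be sketched as follows; step 3 (complete-linkage clustering) is omitted for brevity. The helper names are hypothetical, and two details are assumptions rather than statements of the original procedure: sequence identity is computed over columns where both sequences have a residue, and pairs are linked when their identity is at or above the threshold.</p>
               <p>
```python
GAP = "-"

def coverage(seq, template):
    """Fraction of the template's non-gapped columns where seq has a residue."""
    cols = [i for i, c in enumerate(template) if c != GAP]
    return sum(seq[i] != GAP for i in cols) / len(cols)

def identity(a, b):
    """Fraction of identical residues over columns where both are non-gapped."""
    pairs = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    return sum(x == y for x, y in pairs) / len(pairs) if pairs else 0.0

def single_linkage(seqs, threshold=0.8):
    """Group sequences connected by any chain of pairs at/above the threshold."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if identity(seqs[i], seqs[j]) >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def representatives(seqs, template, min_cov=0.75, threshold=0.8):
    """Drop fragments, then keep the longest sequence of each cluster."""
    kept = [s for s in seqs if coverage(s, template) >= min_cov]
    reps = []
    for group in single_linkage(kept, threshold):
        members = [kept[i] for i in group]
        # "Longest" is taken as the largest number of non-gap characters.
        reps.append(max(members, key=lambda s: len(s) - s.count(GAP)))
    return reps
```
               </p>
               <p>Single linkage merges two groups as soon as one pair of sequences passes the threshold, so it removes near-duplicates aggressively; complete linkage, used in step 3, requires every pair in a group to pass and therefore yields tighter groups.</p>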
            </sec>
            <sec>
               <st>
                  <p>Rooting</p>
               </st>
               <p>The "Weighbor" method gives an unrooted tree. For our purpose of predicting functional sites, we need to find a point on the tree that serves as the root. We used a least-squares modification of the midpoint rooting procedure to define the root <abbrgrp><abbr bid="B57">57</abbr></abbrgrp>.</p>
            </sec>
            <sec>
               <st>
                  <p>Tree partitioning</p>
               </st>
               <p>The tree was partitioned into sub-trees at several levels, and the amino acid usages within each sub-tree and among the sub-trees were compared. For this partitioning, we "cut" the tree into a fixed number of equal-distanced layers, using the midpoint as the root (Figure <figr fid="F11">11</figr>). Several criteria were tried for selecting the distance between adjacent layers. Empirically, we found that a simple partition of the tree into 5 layers usually gave the best results. If the average distance from the root to all leaf nodes is <it>d</it><sub><it>r</it></sub>, then the distance between adjacent layers is <it>d</it><sub><it>r</it></sub>/5 (Figure <figr fid="F11">11</figr>). Each place of a "cut" between the layers corresponds to a certain ancestral sequence. We term the location of a "cut" a "cutting" node. The marginal reconstruction method was used to reconstruct amino acid probability vectors for all the cutting nodes (Figure <figr fid="F11">11</figr>). The reconstructed probability vector of a cutting node reflects the amino acid usages of the sub-tree under it.</p>
               <fig id="F11">
                  <title>
                     <p>Figure 11</p>
                  </title>
                  <caption>
                     <p>An example showing the different cutting layers in a rooted tree</p>
                  </caption>
                  <text>
                     <p><b>An example showing the different cutting layers in a rooted tree. </b><it>d</it><sub><it>r </it></sub>is the average distance from the root to all leaf nodes. Nodes <it>i </it>and <it>j </it>are neighboring cutting nodes.</p>
                  </text>
                  <graphic file="1471-2148-4-33-11"/>
               </fig>
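               <p>Locating the cutting nodes can be sketched as below, assuming a rooted tree stored as a dictionary of children plus the length of the branch above each node (both hypothetical representations). A cut of layer <it>K </it>is placed on every branch that crosses the depth <it>K</it>*<it>d</it><sub><it>r</it></sub>/5 measured from the root.</p>
               <p>
```python
def node_depths(root, children, branch_length):
    """Distance from the root to every node."""
    depth = {root: 0.0}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            depth[child] = depth[node] + branch_length[child]
            stack.append(child)
    return depth

def cutting_nodes(root, children, branch_length, n_layers=5):
    """Return {K: [(child, offset), ...]} for K = 1..n_layers.

    Each cut lies on the branch above `child`, `offset` away from the
    parent end of that branch.
    """
    depth = node_depths(root, children, branch_length)
    leaves = [n for n in depth if n not in children]
    # d_r: average distance from the root to all leaf nodes.
    d_r = sum(depth[n] for n in leaves) / len(leaves)
    parent = {c: p for p, kids in children.items() for c in kids}
    cuts = {}
    for k in range(1, n_layers + 1):
        level = k * d_r / n_layers
        cuts[k] = [(c, level - depth[parent[c]]) for c in parent
                   if depth[c] >= level > depth[parent[c]]]
    return cuts
```
               </p>
               <p>Because <it>d</it><sub><it>r </it></sub>is an average, leaves that lie closer to the root than a given layer simply contribute no cutting node to that layer.</p>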
            </sec>
            <sec>
               <st>
                  <p>Calculating specificity score for each site</p>
               </st>
               <p>We use {<it>L</it><sub><it>K</it></sub>} to represent the set of cutting nodes for layer <it>L</it><sub><it>K</it></sub>, <it>K </it>= 0, 1, ..., 5. {<it>L</it><sub>0</sub>} contains only the root, <it>L</it><sub>1 </sub>is the layer closest to the root, and so on.</p>
               <p>A dissimilarity score between any neighboring cutting node pair is calculated. The definition of a neighboring cutting node pair (<it>i</it>, <it>j</it>) (Figure <figr fid="F11">11</figr>) is:</p>
               <p>1. <it>i </it>&#8712; {<it>L</it><sub><it>K</it></sub>}</p>
               <p>2. <it>j </it>&#8712; {<it>L</it><sub><it>K</it>+1</sub>}</p>
               <p>3. Node <it>i </it>is an ancestor of node <it>j </it>(all points on the path from <it>j </it>to root node are ancestors of node <it>j</it>), so that the distance between <it>i </it>and <it>j </it>is exactly <it>d</it><sub><it>r</it></sub><it>/5</it>. Each cutting node has only one ancestral cutting node neighbor.</p>
               <p>The dissimilarity score for cutting node <it>j </it>and its ancestral cutting node neighbor <it>i, </it>i.e. <it>anc</it>(<it>j</it>), at site <it>m </it>is defined as:</p>
               <p>
                  <graphic file="1471-2148-4-33-i28.gif"/>
               </p>
               <p><graphic file="1471-2148-4-33-i29.gif"/> and <graphic file="1471-2148-4-33-i30.gif"/> are the reconstructed probabilities of amino acid <it>A </it>at cutting node <it>j </it>and its ancestral cutting node neighbor <it>i</it>(<it>anc</it>(<it>j</it>)), respectively.</p>
               <p>Let <graphic file="1471-2148-4-33-i31.gif"/>, <it>K </it>= 1,...,5 &#160;&#160;&#160; (22)</p>
               <p>Here, <graphic file="1471-2148-4-33-i32.gif"/> is the average dissimilarity score for layer <it>K</it>, and <it>N</it><sub><it>K </it></sub>is the number of cutting nodes in layer <it>K</it>.</p>
               <p>The specificity score is defined as:</p>
               <p>
                  <graphic file="1471-2148-4-33-i33.gif"/>
               </p>
               <p><graphic file="1471-2148-4-33-i34.gif"/> reflects the difference of amino acid compositions among the major sub-trees defined by the first layer. <graphic file="1471-2148-4-33-i35.gif"/> to <graphic file="1471-2148-4-33-i36.gif"/> reflect the average difference of amino acid compositions within each sub-tree. If the amino acids are highly conserved within each sub-tree but show variability among the sub-trees, <graphic file="1471-2148-4-33-i35.gif"/> to <graphic file="1471-2148-4-33-i36.gif"/> are small and <graphic file="1471-2148-4-33-i34.gif"/> is large, leading to a large value of <it>S</it><sub><it>m</it></sub>. We set <it>S</it><sub><it>m </it></sub>to 0 for invariant sites. We sort the sites by their specificity scores and choose the 10 top scoring sites as our predicted functional sites. Those predicted functional sites that lie within 5 &#197; from the ligand(s) are considered to be true positives.</p>
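               <p>The final selection and evaluation step can be sketched as follows, given specificity scores that have already been computed. The sketch assumes that "within 5 &#197;" is judged by the minimum distance between any atom of the site and any ligand atom, which the text does not spell out; the function names and the coordinate layout are hypothetical.</p>
               <p>
```python
import math

def top_sites(scores, n=10):
    """Indices of the n highest-scoring sites (ties keep site order)."""
    order = sorted(range(len(scores)), key=lambda m: scores[m], reverse=True)
    return order[:n]

def near_ligand(site_atoms, ligand_atoms, cutoff=5.0):
    """True if any site atom lies within `cutoff` angstroms of a ligand atom."""
    return any(cutoff >= math.dist(a, b)
               for a in site_atoms for b in ligand_atoms)

def count_true_positives(scores, atoms_by_site, ligand_atoms, n=10, cutoff=5.0):
    """Number of top-n predicted sites that are close to the ligand."""
    return sum(near_ligand(atoms_by_site[m], ligand_atoms, cutoff)
               for m in top_sites(scores, n))
```
               </p>
               <p>Here atoms_by_site maps each alignment site to the (x, y, z) coordinates of the corresponding residue in the representative structure.</p>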
            </sec>
            <sec>
               <st>
                  <p>Comparison with other methods</p>
               </st>
               <p>We compared our method with three other methods for prediction of functional sites. The first method (Simple Conservation or SC) is based on sequence conservation: highly conserved sites are considered to be functional. For each family, we sorted the sites by positional conservation <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> and chose the 10 top-ranking sites as the predictions. Sites may be tied. For example, if 5 sites were tied at the tenth conservation value and only one of them was within 5&#197; of the ligand(s), then its contribution to the total number of "correct predictions" was 1/5. The second method is the evolutionary trace (ET) method <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>, which partitions a sequence identity dendrogram into sub-trees at varying sequence identity thresholds. Sites that are invariant within each individual sub-tree are picked as functional sites. A higher identity threshold gives rise to more sub-trees and, since conserved sites are more frequent in smaller sub-trees, leads to more predicted sites. ET analysis was performed from a low identity threshold to higher thresholds until the number of predicted sites was 10 or just above 10 (in the case of ties). Ties were resolved in the same way as for the simple conservation method. The third method (conservation difference or CD) is based on the conservation differences between a native alignment and an alignment derived from the <it>Z</it>-score sequence design <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. The basic idea is to distinguish sites conserved for structural stability from sites conserved for function. Since the pairwise potential in the <it>Z</it>-score design tends to weaken the conservation caused by function, functionally conserved sites tend to have a large conservation difference between the native alignment and the alignment of designed sequences. We chose the 10 top-ranking sites sorted by conservation difference as the predictions by CD.</p>
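               <p>The tie rule in the example above can be sketched as below. The text only gives the case of one remaining slot shared by 5 tied sites; extending it so that tied sites share however many slots remain is our assumption, and the function name is hypothetical.</p>
               <p>
```python
def tie_aware_hits(values, is_hit, n=10):
    """Number of correct predictions among the top n sites, with equal
    credit shared among sites tied at the cutoff value.

    values: ranking value per site (higher ranks first)
    is_hit: per-site booleans, True if the site is near the ligand
    """
    order = sorted(range(len(values)), key=lambda m: values[m], reverse=True)
    if n >= len(order):
        return float(sum(is_hit))
    cutoff = values[order[n - 1]]            # value of the n-th ranked site
    above = [m for m in order if values[m] > cutoff]
    tied = [m for m in order if values[m] == cutoff]
    slots = n - len(above)                   # places left for the tied sites
    hits_above = sum(is_hit[m] for m in above)
    hits_tied = sum(is_hit[m] for m in tied)
    # Assumption: tied sites share the remaining slots equally.
    return hits_above + slots * hits_tied / len(tied)
```
               </p>
               <p>With 9 sites ranked above the cutoff, 5 sites tied at the tenth value and one of the tied sites near the ligand, the tied group contributes 1 * 1 / 5 = 0.2 correct predictions, matching the example in the text.</p>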
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>NVG conceived and initiated the study. All authors took part in developing methods and designing experiments. WC wrote the source code and JP analyzed the data. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank Lisa Kinch, James Wrabl and Hua Cheng for their useful comments. This work was supported by the NIH grant GM67165 to NVG.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Toward defining the course of evolution: minimum change for a specific tree topology</p>
            </title>
            <aug>
               <au>
                  <snm>Fitch</snm>
                  <fnm>WM</fnm>
               </au>
            </aug>
            <source>Syst Zool</source>
            <pubdate>1971</pubdate>
            <volume>20</volume>
            <fpage>406</fpage>
            <lpage>416</lpage>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Minimum evolution fits to a given tree</p>
            </title>
            <aug>
               <au>
                  <snm>Hartigan</snm>
                  <fnm>JA</fnm>
               </au>
            </aug>
            <source>Biometrics</source>
            <pubdate>1973</pubdate>
            <volume>29</volume>
            <fpage>53</fpage>
            <lpage>65</lpage>
         </bibl>
         <bibl id="B3">
            <title>
               <p>A new method of inference of ancestral nucleotide and amino acid sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Kumar</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Nei</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1995</pubdate>
            <volume>141</volume>
            <fpage>1641</fpage>
            <lpage>1650</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8601501</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Probabilistic reconstruction of ancestral protein sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Koshi</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Goldstein</snm>
                  <fnm>RA</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1996</pubdate>
            <volume>42</volume>
            <fpage>313</fpage>
            <lpage>320</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8919883</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>A fast algorithm for joint reconstruction of ancestral amino acid sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Pupko</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Pe'er</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Shamir</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Graur</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2000</pubdate>
            <volume>17</volume>
            <fpage>890</fpage>
            <lpage>896</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10833195</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Accuracies of ancestral amino acid sequences inferred by the parsimony, likelihood, and distance methods</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Nei</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1997</pubdate>
            <volume>44 Suppl 1</volume>
            <fpage>S139</fpage>
            <lpage>S146</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9071022</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>PAML: a phylogenetic analysis by maximum likelihood. Version 2.0e</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Pennsylvania State University, University Park</source>
            <pubdate>1995</pubdate>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Evolutionary trees from DNA sequences: a maximum likelihood approach</p>
            </title>
            <aug>
               <au>
                  <snm>Felsenstein</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1981</pubdate>
            <volume>17</volume>
            <fpage>368</fpage>
            <lpage>376</lpage>
            <xrefbib>
               <pubid idtype="pmpid">7288891</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>The neighbor-joining method: a new method for reconstructing phylogenetic trees</p>
            </title>
            <aug>
               <au>
                  <snm>Saitou</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Nei</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1987</pubdate>
            <volume>4</volume>
            <fpage>406</fpage>
            <lpage>425</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">3447015</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Weighted neighbor joining: a likelihood-based approach to distance-based phylogeny reconstruction</p>
            </title>
            <aug>
               <au>
                  <snm>Bruno</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Socci</snm>
                  <fnm>ND</fnm>
               </au>
               <au>
                  <snm>Halpern</snm>
                  <fnm>AL</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2000</pubdate>
            <volume>17</volume>
            <fpage>189</fpage>
            <lpage>197</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10666718</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data</p>
            </title>
            <aug>
               <au>
                  <snm>Gascuel</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1997</pubdate>
            <volume>14</volume>
            <fpage>685</fpage>
            <lpage>695</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9254330</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Fitting discrete probability distributions to evolutionary events</p>
            </title>
            <aug>
               <au>
                  <snm>Uzzell</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Corbin</snm>
                  <fnm>KW</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1971</pubdate>
            <volume>172</volume>
            <fpage>1089</fpage>
            <lpage>1096</lpage>
            <xrefbib>
               <pubid idtype="pmpid">5574514</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1993</pubdate>
            <volume>10</volume>
            <fpage>1396</fpage>
            <lpage>1401</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8277861</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Construction of phylogenetic trees</p>
            </title>
            <aug>
               <au>
                  <snm>Fitch</snm>
                  <fnm>WM</fnm>
               </au>
               <au>
                  <snm>Margoliash</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1967</pubdate>
            <volume>155</volume>
            <fpage>279</fpage>
            <lpage>284</lpage>
            <xrefbib>
               <pubid idtype="pmpid">5334057</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>The estimate of total nucleotide substitutions from pairwise differences is biased</p>
            </title>
            <aug>
               <au>
                  <snm>Fitch</snm>
                  <fnm>WM</fnm>
               </au>
            </aug>
            <source>Philos Trans R Soc Lond B Biol Sci</source>
            <pubdate>1986</pubdate>
            <volume>312</volume>
            <fpage>317</fpage>
            <lpage>324</lpage>
            <xrefbib>
               <pubid idtype="pmpid">2870524</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>A simple method for estimating the parameter of substitution rate variation among sites</p>
            </title>
            <aug>
               <au>
                  <snm>Gu</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1997</pubdate>
            <volume>14</volume>
            <fpage>1106</fpage>
            <lpage>1113</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9364768</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Taking variation of evolutionary rates between sites into account in inferring phylogenies</p>
            </title>
            <aug>
               <au>
                  <snm>Felsenstein</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>2001</pubdate>
            <volume>53</volume>
            <fpage>447</fpage>
            <lpage>455</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s002390010234</pubid>
                  <pubid idtype="pmpid" link="fulltext">11675604</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Site-by-site estimation of the rate of substitution and the correlation of rates in mitochondrial DNA</p>
            </title>
            <aug>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Syst Biol</source>
            <pubdate>1997</pubdate>
            <volume>46</volume>
            <fpage>346</fpage>
            <lpage>353</lpage>
            <xrefbib>
               <pubid idtype="pmpid">11975345</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>A branch-and-bound algorithm for the inference of ancestral amino-acid sequences when the replacement rate varies among sites: Application to the evolution of five gene families</p>
            </title>
            <aug>
               <au>
                  <snm>Pupko</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Pe'er</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Hasegawa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Graur</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Friedman</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <fpage>1116</fpage>
            <lpage>1123</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/18.8.1116</pubid>
                  <pubid idtype="pmpid" link="fulltext">12176835</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Modeling amino acid replacement</p>
            </title>
            <aug>
               <au>
                  <snm>Muller</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Vingron</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>2000</pubdate>
            <volume>7</volume>
            <fpage>761</fpage>
            <lpage>776</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1089/10665270050514918</pubid>
                  <pubid idtype="pmpid" link="fulltext">11382360</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Using protein design for homology detection and active site searches</p>
            </title>
            <aug>
               <au>
                  <snm>Pei</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Dokholyan</snm>
                  <fnm>NV</fnm>
               </au>
               <au>
                  <snm>Shakhnovich</snm>
                  <fnm>EI</fnm>
               </au>
               <au>
                  <snm>Grishin</snm>
                  <fnm>NV</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>2003</pubdate>
            <volume>100</volume>
            <fpage>11361</fpage>
            <lpage>11366</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">208762</pubid>
                  <pubid idtype="pmpid" link="fulltext">12975528</pubid>
                  <pubid idtype="doi">10.1073/pnas.2034878100</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Evaluation of Several Methods for Estimating Phylogenetic Trees When Substitution Rates Differ over Nucleotide Sites</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1995</pubdate>
            <volume>40</volume>
            <fpage>689</fpage>
            <lpage>697</lpage>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Searching for functional sites in protein structures</p>
            </title>
            <aug>
               <au>
                  <snm>Jones</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Curr Opin Chem Biol</source>
            <pubdate>2004</pubdate>
            <volume>8</volume>
            <fpage>3</fpage>
            <lpage>7</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.cbpa.2003.11.001</pubid>
                  <pubid idtype="pmpid" link="fulltext">15036149</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Evolutionary predictions of binding surfaces and interactions</p>
            </title>
            <aug>
               <au>
                  <snm>Lichtarge</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Sowa</snm>
                  <fnm>ME</fnm>
               </au>
            </aug>
            <source>Curr Opin Struct Biol</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>21</fpage>
            <lpage>27</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0959-440X(02)00284-1</pubid>
                  <pubid idtype="pmpid" link="fulltext">11839485</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>AL2CO: calculation of positional conservation in a protein sequence alignment</p>
            </title>
            <aug>
               <au>
                  <snm>Pei</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Grishin</snm>
                  <fnm>NV</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <fpage>700</fpage>
            <lpage>712</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/17.8.700</pubid>
                  <pubid idtype="pmpid" link="fulltext">11524371</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>An evolutionary trace method defines binding surfaces common to protein families</p>
            </title>
            <aug>
               <au>
                  <snm>Lichtarge</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Bourne</snm>
                  <fnm>HR</fnm>
               </au>
               <au>
                  <snm>Cohen</snm>
                  <fnm>FE</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1996</pubdate>
            <volume>257</volume>
            <fpage>342</fpage>
            <lpage>358</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1996.0167</pubid>
                  <pubid idtype="pmpid" link="fulltext">8609628</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees</p>
            </title>
            <aug>
               <au>
                  <snm>Tamura</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Nei</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1993</pubdate>
            <volume>10</volume>
            <fpage>512</fpage>
            <lpage>526</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8336541</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1994</pubdate>
            <volume>39</volume>
            <fpage>306</fpage>
            <lpage>314</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/BF00178256</pubid>
                  <pubid idtype="pmpid">7932792</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>The Pfam protein families database</p>
            </title>
            <aug>
               <au>
                  <snm>Bateman</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Cerruti</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Etwiller</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Griffiths-Jones</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Howe</snm>
                  <fnm>KL</fnm>
               </au>
               <au>
                  <snm>Marshall</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sonnhammer</snm>
                  <fnm>EL</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>276</fpage>
            <lpage>280</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">99071</pubid>
                  <pubid idtype="pmpid" link="fulltext">11752314</pubid>
                  <pubid idtype="doi">10.1093/nar/30.1.276</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>PAML: a program package for phylogenetic analysis by maximum likelihood</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Comput Appl Biosci</source>
            <pubdate>1997</pubdate>
            <volume>13</volume>
            <fpage>555</fpage>
            <lpage>556</lpage>
            <xrefbib>
               <pubid idtype="pmpid">9367129</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>PHYLIP (Phylogeny Inference Package), version 3.6b</p>
            </title>
            <aug>
               <au>
                  <snm>Felsenstein</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Department of Genetics, University of Washington, Seattle</source>
            <pubdate>2004</pubdate>
         </bibl>
         <bibl id="B32">
            <title>
               <p>PAUP*: Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4.</p>
            </title>
            <aug>
               <au>
                  <snm>Swofford</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <source>Sinauer Associates, Sunderland, Massachusetts</source>
            <pubdate>2002</pubdate>
         </bibl>
         <bibl id="B33">
            <title>
               <p>The Protein Data Bank</p>
            </title>
            <aug>
               <au>
                  <snm>Berman</snm>
                  <fnm>HM</fnm>
               </au>
               <au>
                  <snm>Westbrook</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Feng</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Gilliland</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Bhat</snm>
                  <fnm>TN</fnm>
               </au>
               <au>
                  <snm>Weissig</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Shindyalov</snm>
                  <fnm>IN</fnm>
               </au>
               <au>
                  <snm>Bourne</snm>
                  <fnm>PE</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2000</pubdate>
            <volume>28</volume>
            <fpage>235</fpage>
            <lpage>242</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">102472</pubid>
                  <pubid idtype="pmpid" link="fulltext">10592235</pubid>
                  <pubid idtype="doi">10.1093/nar/28.1.235</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>OB(oligonucleotide/oligosaccharide binding)-fold: common structural and functional solution for non-homologous sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Murzin</snm>
                  <fnm>AG</fnm>
               </au>
            </aug>
            <source>EMBO J</source>
            <pubdate>1993</pubdate>
            <volume>12</volume>
            <fpage>861</fpage>
            <lpage>867</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8458342</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Madden</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Schaffer</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <fpage>3389</fpage>
            <lpage>3402</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">146917</pubid>
                  <pubid idtype="pmpid" link="fulltext">9254694</pubid>
                  <pubid idtype="doi">10.1093/nar/25.17.3389</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Profile hidden Markov models</p>
            </title>
            <aug>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>1998</pubdate>
            <volume>14</volume>
            <fpage>755</fpage>
            <lpage>763</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/14.9.755</pubid>
                  <pubid idtype="pmpid" link="fulltext">9918945</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements</p>
            </title>
            <aug>
               <au>
                  <snm>Schaffer</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Aravind</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Madden</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Shavirin</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Spouge</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Wolf</snm>
                  <fnm>YI</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <fpage>2994</fpage>
            <lpage>3005</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">55814</pubid>
                  <pubid idtype="pmpid" link="fulltext">11452024</pubid>
                  <pubid idtype="doi">10.1093/nar/29.14.2994</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Substrate specificity and assembly of the catalytic center derived from two structures of ligated uridylate kinase</p>
            </title>
            <aug>
               <au>
                  <snm>Muller-Dieckmann</snm>
                  <fnm>HJ</fnm>
               </au>
               <au>
                  <snm>Schulz</snm>
                  <fnm>GE</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1995</pubdate>
            <volume>246</volume>
            <fpage>522</fpage>
            <lpage>530</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1994.0104</pubid>
                  <pubid idtype="pmpid" link="fulltext">7877173</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach</p>
            </title>
            <aug>
               <au>
                  <snm>Whelan</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Goldman</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2001</pubdate>
            <volume>18</volume>
            <fpage>691</fpage>
            <lpage>699</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11319253</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Numerical Recipes in C: The Art of Scientific Computing</p>
            </title>
            <aug>
               <au>
                  <snm>Press</snm>
                  <fnm>WH</fnm>
               </au>
               <au>
                  <snm>Teukolsky</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Vetterling</snm>
                  <fnm>WT</fnm>
               </au>
               <au>
                  <snm>Flannery</snm>
                  <fnm>BP</fnm>
               </au>
            </aug>
            <pubdate>1992</pubdate>
            <fpage>451</fpage>
            <lpage>455</lpage>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Crystal structure of the PDZ1 domain of human Na(+)/H(+) exchanger regulatory factor provides insights into the mechanism of carboxyl-terminal leucine recognition by class I PDZ domains</p>
            </title>
            <aug>
               <au>
                  <snm>Karthikeyan</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Leung</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Birrane</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Webster</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Ladias</snm>
                  <fnm>JA</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2001</pubdate>
            <volume>308</volume>
            <fpage>963</fpage>
            <lpage>973</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2001.4634</pubid>
                  <pubid idtype="pmpid" link="fulltext">11352585</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Refined crystal structure of Streptomyces griseus trypsin at 1.7 Å resolution</p>
            </title>
            <aug>
               <au>
                  <snm>Read</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>James</snm>
                  <fnm>MN</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1988</pubdate>
            <volume>200</volume>
            <fpage>523</fpage>
            <lpage>551</lpage>
            <xrefbib>
               <pubid idtype="pmpid">3135412</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>High-resolution structure of the complex between carboxypeptidase A and L-phenyl lactate</p>
            </title>
            <aug>
               <au>
                  <snm>Teplyakov</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Acta Crystallogr D Biol Crystallogr</source>
            <pubdate>1993</pubdate>
            <volume>49</volume>
            <fpage>534</fpage>
            <lpage>540</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1107/S0907444993007267</pubid>
                  <pubid idtype="pmpid" link="fulltext">15299490</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>SCOP: a structural classification of proteins database for the investigation of sequences and structures</p>
            </title>
            <aug>
               <au>
                  <snm>Murzin</snm>
                  <fnm>AG</fnm>
               </au>
               <au>
                  <snm>Brenner</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Hubbard</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Chothia</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1995</pubdate>
            <volume>247</volume>
            <fpage>536</fpage>
            <lpage>540</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1995.0159</pubid>
                  <pubid idtype="pmpid" link="fulltext">7723011</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Analysis and prediction of functional sub-types from protein sequence alignments</p>
            </title>
            <aug>
               <au>
                  <snm>Hannenhalli</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Russell</snm>
                  <fnm>RB</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2000</pubdate>
            <volume>303</volume>
            <fpage>61</fpage>
            <lpage>76</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2000.4036</pubid>
                  <pubid idtype="pmpid" link="fulltext">11021970</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Using orthologous and paralogous proteins to identify specificity-determining residues in bacterial transcription factors</p>
            </title>
            <aug>
               <au>
                  <snm>Mirny</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Gelfand</snm>
                  <fnm>MS</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2002</pubdate>
            <volume>321</volume>
            <fpage>7</fpage>
            <lpage>20</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0022-2836(02)00587-9</pubid>
                  <pubid idtype="pmpid" link="fulltext">12139929</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>High-resolution structures of adenylate kinase from yeast ligated with inhibitor Ap5A, showing the pathway of phosphoryl transfer</p>
            </title>
            <aug>
               <au>
                  <snm>Abele</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Schulz</snm>
                  <fnm>GE</fnm>
               </au>
            </aug>
            <source>Protein Sci</source>
            <pubdate>1995</pubdate>
            <volume>4</volume>
            <fpage>1262</fpage>
            <lpage>1271</lpage>
            <xrefbib>
               <pubid idtype="pmpid">7670369</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>The structural basis of the activation of Ras by Sos</p>
            </title>
            <aug>
               <au>
                  <snm>Boriack-Sjodin</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Margarit</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Bar-Sagi</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Kuriyan</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1998</pubdate>
            <volume>394</volume>
            <fpage>337</fpage>
            <lpage>343</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/28548</pubid>
                  <pubid idtype="pmpid" link="fulltext">9690470</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Crystal structures of myoglobin-ligand complexes at near-atomic resolution</p>
            </title>
            <aug>
               <au>
                  <snm>Vojtechovsky</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Chu</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Berendzen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Sweet</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Schlichting</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Biophys J</source>
            <pubdate>1999</pubdate>
            <volume>77</volume>
            <fpage>2153</fpage>
            <lpage>2174</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10512835</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>Crystal structures of a complexed and peptide-free membrane protein-binding domain: molecular basis of peptide recognition by PDZ</p>
            </title>
            <aug>
               <au>
                  <snm>Doyle</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lewis</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Sheng</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>MacKinnon</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1996</pubdate>
            <volume>85</volume>
            <fpage>1067</fpage>
            <lpage>1076</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0092-8674(00)81307-0</pubid>
                  <pubid idtype="pmpid">8674113</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>Structure of the high affinity complex of inositol trisphosphate with a phospholipase C pleckstrin homology domain</p>
            </title>
            <aug>
               <au>
                  <snm>Ferguson</snm>
                  <fnm>KM</fnm>
               </au>
               <au>
                  <snm>Lemmon</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Schlessinger</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Sigler</snm>
                  <fnm>PB</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1995</pubdate>
            <volume>83</volume>
            <fpage>1037</fpage>
            <lpage>1046</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0092-8674(95)90219-8</pubid>
                  <pubid idtype="pmpid">8521504</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>Structure and ligand recognition of the phosphotyrosine binding domain of Shc</p>
            </title>
            <aug>
               <au>
                  <snm>Zhou</snm>
                  <fnm>MM</fnm>
               </au>
               <au>
                  <snm>Ravichandran</snm>
                  <fnm>KS</fnm>
               </au>
               <au>
                  <snm>Olejniczak</snm>
                  <fnm>EF</fnm>
               </au>
               <au>
                  <snm>Petros</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Meadows</snm>
                  <fnm>RP</fnm>
               </au>
               <au>
                  <snm>Sattler</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Harlan</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Wade</snm>
                  <fnm>WS</fnm>
               </au>
               <au>
                  <snm>Burakoff</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Fesik</snm>
                  <fnm>SW</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1995</pubdate>
            <volume>378</volume>
            <fpage>584</fpage>
            <lpage>592</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/378584a0</pubid>
                  <pubid idtype="pmpid" link="fulltext">8524391</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p>Three-dimensional structures and properties of a transforming and a nontransforming glycine-12 mutant of p21H-ras</p>
            </title>
            <aug>
               <au>
                  <snm>Franken</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Scheidig</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Krengel</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Rensland</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Lautwein</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Geyer</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Scheffzek</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Goody</snm>
                  <fnm>RS</fnm>
               </au>
               <au>
                  <snm>Kalbitzer</snm>
                  <fnm>HR</fnm>
               </au>
               <au>
                  <snm>Pai</snm>
                  <fnm>EF</fnm>
               </au>
               <au>
                  <snm>Wittinghofer</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Biochemistry</source>
            <pubdate>1993</pubdate>
            <volume>32</volume>
            <fpage>8411</fpage>
            <lpage>8420</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8357792</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B54">
            <title>
               <p>Peptide ligands of pp60(c-src) SH2 domains: a thermodynamic and structural study</p>
            </title>
            <aug>
               <au>
                  <snm>Charifson</snm>
                  <fnm>PS</fnm>
               </au>
               <au>
                  <snm>Shewchuk</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Rocque</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Hummel</snm>
                  <fnm>CW</fnm>
               </au>
               <au>
                  <snm>Jordan</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Mohr</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Pacofsky</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Peel</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Rodriguez</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sternbach</snm>
                  <fnm>DD</fnm>
               </au>
               <au>
                  <snm>Consler</snm>
                  <fnm>TG</fnm>
               </au>
            </aug>
            <source>Biochemistry</source>
            <pubdate>1997</pubdate>
            <volume>36</volume>
            <fpage>6283</fpage>
            <lpage>6293</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/bi970019n</pubid>
                  <pubid idtype="pmpid" link="fulltext">9174343</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B55">
            <title>
               <p>Molecular basis for the binding of SH3 ligands with non-peptide elements identified by combinatorial synthesis</p>
            </title>
            <aug>
               <au>
                  <snm>Feng</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kapoor</snm>
                  <fnm>TM</fnm>
               </au>
               <au>
                  <snm>Shirai</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Combs</snm>
                  <fnm>AP</fnm>
               </au>
               <au>
                  <snm>Schreiber</snm>
                  <fnm>SL</fnm>
               </au>
            </aug>
            <source>Chem Biol</source>
            <pubdate>1996</pubdate>
            <volume>3</volume>
            <fpage>661</fpage>
            <lpage>670</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S1074-5521(96)90134-9</pubid>
                  <pubid idtype="pmpid">8807900</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B56">
            <title>
               <p>Differences in binding modes of enantiomers of 1-acetamido boronic acid based protease inhibitors: crystal structures of gamma-chymotrypsin and subtilisin Carlsberg complexes</p>
            </title>
            <aug>
               <au>
                  <snm>Stoll</snm>
                  <fnm>VS</fnm>
               </au>
               <au>
                  <snm>Eger</snm>
                  <fnm>BT</fnm>
               </au>
               <au>
                  <snm>Hynes</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>Martichonok</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>JB</fnm>
               </au>
               <au>
                  <snm>Pai</snm>
                  <fnm>EF</fnm>
               </au>
            </aug>
            <source>Biochemistry</source>
            <pubdate>1998</pubdate>
            <volume>37</volume>
            <fpage>451</fpage>
            <lpage>462</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/bi971166o</pubid>
                  <pubid idtype="pmpid" link="fulltext">9425066</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B57">
            <title>
               <p>Evolution of aminoacyl-tRNA synthetases--analysis of unique domain architectures and phylogenetic trees reveals a complex history of horizontal gene transfer events</p>
            </title>
            <aug>
               <au>
                  <snm>Wolf</snm>
                  <fnm>YI</fnm>
               </au>
               <au>
                  <snm>Aravind</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Grishin</snm>
                  <fnm>NV</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>1999</pubdate>
            <volume>9</volume>
            <fpage>689</fpage>
            <lpage>710</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10447505</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B58">
            <title>
               <p>Statistical tests of models of DNA substitution</p>
            </title>
            <aug>
               <au>
                  <snm>Goldman</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1993</pubdate>
            <volume>36</volume>
            <fpage>182</fpage>
            <lpage>198</lpage>
            <xrefbib>
               <pubid idtype="pmpid">7679448</pubid>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
