<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2148-6-29</ui>
   <ji>1471-2148</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Keane</snm>
               <mi>M</mi>
               <fnm>Thomas</fnm>
               <insr iid="I1"/>
               <email>thomaskeane@gmail.com</email>
            </au>
            <au id="A2">
               <snm>Creevey</snm>
               <mi>J</mi>
               <fnm>Christopher</fnm>
               <insr iid="I2"/>
               <email>chris.creevey@gmail.com</email>
            </au>
            <au id="A3">
               <snm>Pentony</snm>
               <mi>M</mi>
               <fnm>Melissa</fnm>
               <insr iid="I3"/>
               <email>M.Pentony@cs.ucl.ac.uk</email>
            </au>
            <au id="A4">
               <snm>Naughton</snm>
               <mi>J</mi>
               <fnm>Thomas</fnm>
               <insr iid="I4"/>
               <email>tom.naughton@nuim.ie</email>
            </au>
            <au id="A5">
               <snm>McInerney</snm>
               <mi>O</mi>
               <fnm>James</fnm>
               <insr iid="I1"/>
               <email>james.o.mcinerney@nuim.ie</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Bioinformatics Laboratory, Department of Biology, National University of Ireland, Maynooth, Co. Kildare, Ireland</p>
            </ins>
            <ins id="I2">
               <p>Bork Group, EMBL Heidelberg, Heidelberg, Germany</p>
            </ins>
            <ins id="I3">
               <p>Department of Computer Science, University College London, Gower Street, London, UK</p>
            </ins>
            <ins id="I4">
               <p>Department of Computer Science, National University of Ireland, Maynooth, Co. Kildare, Ireland</p>
            </ins>
         </insg>
         <source>BMC Evolutionary Biology</source>
         <issn>1471-2148</issn>
         <pubdate>2006</pubdate>
         <volume>6</volume>
         <issue>1</issue>
         <fpage>29</fpage>
         <url>http://www.biomedcentral.com/1471-2148/6/29</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">16563161</pubid>
               <pubid idtype="doi">10.1186/1471-2148-6-29</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>19</day>
               <month>12</month>
               <year>2005</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>24</day>
               <month>3</month>
               <year>2006</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>24</day>
               <month>3</month>
               <year>2006</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2006</year>
         <collab>Keane et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>In recent years, model-based approaches such as maximum likelihood have become the methods of choice for constructing phylogenies. A number of authors have shown the importance of using adequate substitution models in order to produce accurate phylogenies. In the past, many empirical models of amino acid substitution have been derived using a variety of methods and protein datasets. These matrices are normally used as surrogates instead of estimating a maximum likelihood model from the dataset being examined. With few exceptions, selection between alternative matrices has been carried out in an ad hoc manner.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We first highlight the potential dangers of choosing protein models arbitrarily with an empirical example in which a single alignment produces two topologically different and strongly supported phylogenies under two different, arbitrarily chosen amino acid substitution models. We demonstrate that, in simple simulations, statistical methods of model selection are robust and likely to be useful for protein model selection. We have investigated patterns of amino acid substitution among homologous sequences from the three Domains of life, and our results show that no single amino acid matrix is optimal for all genes in any of the datasets. Perhaps most interestingly, we demonstrate that for two large datasets derived from the proteobacteria and archaea, one of the most favored models in both datasets is a model that was originally derived from retroviral Pol proteins.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>This demonstrates that choosing protein models based on their source or method of construction may not be appropriate.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>For a number of years, phylogeny reconstruction has been treated as a problem of statistical inference. One of the most popular methods of inferring phylogenetic relationships is maximum likelihood (ML). A frequently cited advantage of ML over parsimony-based methods is that it allows different models of evolution to be used depending on the dataset being examined. Knowing the process of evolution, and being able to construct realistic models of it, is therefore the foundation for inferring accurate phylogenetic relationships among species. Currently, one of the major challenges in phylogenetics is to model the process of nucleotide or amino acid substitution accurately and to choose among the available models in order to infer accurate phylogenies. Felsenstein <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> was the first to show in simulations that overly simplistic models that underestimate multiple substitutions can result in inconsistent phylogeny estimation in certain situations (referred to as the 'Felsenstein zone'). Further simulation studies have shown that, even with ML analysis, underestimation of nucleotide substitutions (as assumed by simpler models) leads to long-branch attraction and inconsistency in the Felsenstein zone <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>. These results have also been replicated in real datasets, where the use of inadequate models can lead to long-branch attraction <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp>. However, it was also shown in simulations that violations of the model can favour the true tree in certain situations (often referred to as the 'Farris zone' or 'inverse-Felsenstein zone') <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. 
It was later shown in simulations and real data that this happens in only a very limited number of cases, and that in general overly simplistic models should be avoided <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>. It is also well established that the accurate estimation of node support is strongly affected by the use of simplistic models in simulated and real datasets <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>. Unless we can be certain that a dataset falls into one of the special cases mentioned above, the use of overly simplistic (or incorrect) substitution models can bias our phylogenies.</p>
         <p>Almost all models of amino acid replacement assume that all amino acid sites evolve independently according to the same Markov process. The Markov process is assumed to be stationary and homogeneous, so that rates of substitution are constant across time. Each protein substitution model consists of a 20 &#215; 20 instantaneous rate matrix, which includes the set of amino acid frequencies (<it>&#960;</it><sub><it>i</it></sub>) observed in the dataset originally used to generate the model. The (<it>&#960;</it><sub><it>i</it></sub>) values represent the equilibrium or stationary frequencies of the 20 amino acids, and the matrices are often modified to use the observed frequencies of the dataset being examined instead. Models that take the observed amino acid frequencies into account are often denoted by the '+F' suffix <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. If we assume that the substitution process is reversible, the number of free parameters is reduced from 399 to 189. However, the computational burden of optimising all 189 free parameters of the instantaneous rate matrix for large datasets, together with the risk of overfitting these parameters when analysing small datasets, has meant that most phylogeny programs rely on empirical models of protein evolution. Dayhoff et al. <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> were the first to develop a general model of amino acid substitution, using the limited amount of sequence data available at the time. Since then, several additional models have been developed from other datasets and with different techniques, such as maximum parsimony and maximum likelihood <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp>.</p>
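         <p>The structure of such a matrix can be illustrated with a minimal Python sketch. It assumes the standard parameterisation of a reversible model, in which each off-diagonal rate is the product of a symmetric exchangeability s_ij and the equilibrium frequency of the target amino acid, with the diagonal set so that rows sum to zero. The function names and the three-letter toy alphabet are hypothetical and do not come from any published matrix; substituting frequencies counted from the alignment mimics the '+F' option.</p>
```python
from typing import List, Sequence

Matrix = List[List[float]]


def build_rate_matrix(exchange: Sequence[Sequence[float]],
                      freqs: Sequence[float]) -> Matrix:
    """Return a time-reversible instantaneous rate matrix Q.

    exchange: symmetric matrix of exchangeabilities s_ij (diagonal ignored).
    freqs:    equilibrium frequencies pi_j, which must sum to 1.
    Off-diagonal rates are Q_ij = s_ij * pi_j, so pi_i * Q_ij == pi_j * Q_ji.
    """
    n = len(freqs)
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("frequencies must sum to 1")
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                if exchange[i][j] != exchange[j][i]:
                    raise ValueError("exchangeabilities must be symmetric")
                q[i][j] = exchange[i][j] * freqs[j]
        # The diagonal makes each row sum to zero.
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    # Rescale so the expected number of substitutions per unit time is 1.
    mean_rate = -sum(freqs[i] * q[i][i] for i in range(n))
    return [[x / mean_rate for x in row] for row in q]


def observed_frequencies(seqs: Sequence[str], alphabet: str) -> List[float]:
    """Frequencies counted from an alignment; gaps and unknown characters are skipped."""
    counts = {a: 0 for a in alphabet}
    for s in seqs:
        for ch in s:
            if ch in counts:
                counts[ch] += 1
    total = sum(counts.values())
    return [counts[a] / total for a in alphabet]


# Toy three-state example (hypothetical values, not a real amino acid matrix).
alphabet = "ACD"
s = [[0.0, 1.0, 2.0],
     [1.0, 0.0, 3.0],
     [2.0, 3.0, 0.0]]
pi_obs = observed_frequencies(["AACD", "ACDD"], alphabet)  # '+F'-style frequencies
Q = build_rate_matrix(s, pi_obs)
```
         <p>In a real analysis the exchangeabilities would be taken from a published empirical matrix and the alphabet would contain all 20 amino acids; only the frequency vector changes between the plain and '+F' variants.</p>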
         <p>There has been a great deal of research into techniques for performing model selection on nucleotide data <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. In the past, three measures have been used to select the best-fit substitution model. The hierarchical likelihood ratio test (hLRT) selects the best-fit model by performing a series of pairwise likelihood ratio tests, navigating a tree-shaped hierarchy of models until a final model is reached <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. However, the hLRT is only suitable for nested models, in which one model is a special case of another, and it is therefore generally applied only to nucleotide model selection. For example, the F81 model <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> is a special case of the HKY model <abbrgrp><abbr bid="B26">26</abbr></abbrgrp> in which the transition and transversion rates are set to be equal. Because the different amino acid matrices have no free parameters of their own, it is not possible to define a hierarchy comparable to that used for nucleotide models. The Akaike information criterion (AIC) <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> and Bayesian information criterion (BIC) <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> belong to a different class of model selection measures, which compare all candidate models simultaneously using a measure of fit that penalises the number of parameters. Although these measures have been used for many years in nucleotide model selection, only recently have programs such as MODELGENERATOR <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> and ProtTest <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> been developed specifically to perform statistical analyses of the complete set of available amino acid substitution models.</p>
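         <p>The two classes of measure can be contrasted in a short Python sketch: a likelihood ratio test for two nested models that differ by one free parameter (as F81 and HKY differ by the transition/transversion ratio), and a ranking of non-nested candidates by AIC and BIC. The log-likelihoods, parameter counts, and the use of alignment length as the BIC sample size are illustrative assumptions, not results from this study, and the function names are hypothetical.</p>
```python
import math
from typing import Dict, Tuple


def aic(log_l: float, k: int) -> float:
    """Akaike information criterion: -2 lnL + 2k."""
    return -2.0 * log_l + 2.0 * k


def bic(log_l: float, k: int, n: int) -> float:
    """Bayesian information criterion: -2 lnL + k ln(n)."""
    return -2.0 * log_l + k * math.log(n)


def lrt_one_df(log_l_null: float, log_l_alt: float) -> Tuple[float, float]:
    """LRT for nested models differing by one free parameter.

    The statistic 2(lnL_alt - lnL_null) is compared with a chi-square
    distribution with 1 degree of freedom, whose upper tail is erfc(sqrt(x/2)).
    """
    stat = max(2.0 * (log_l_alt - log_l_null), 0.0)
    return stat, math.erfc(math.sqrt(stat / 2.0))


def select_best(fits: Dict[str, Tuple[float, int]], n_sites: int) -> Tuple[str, str]:
    """Return the (AIC-best, BIC-best) model names; fits maps name -> (lnL, k)."""
    best_aic = min(fits, key=lambda m: aic(fits[m][0], fits[m][1]))
    best_bic = min(fits, key=lambda m: bic(fits[m][0], fits[m][1], n_sites))
    return best_aic, best_bic


# Hypothetical fits: name -> (log-likelihood, number of free parameters).
fits = {"WAG": (-5000.0, 37), "WAG+G": (-4950.0, 38), "WAG+I+G": (-4949.5, 39)}
print(select_best(fits, n_sites=500))  # ('WAG+G', 'WAG+G')
```
         <p>Unlike the hLRT, AIC and BIC do not require the candidates to be nested, which is why they can be applied directly to the fixed empirical amino acid matrices.</p>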
         <p>Until now, many phylogenetic analyses of multiple datasets from a fixed set of taxa have assumed a single substitution model for all sets of homologs (e.g. <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr></abbrgrp>). We argue that if a single amino acid matrix is assumed to be the best-fit matrix for all genes in a dataset, the analysis may encounter the model-misspecification problems described above and produce suboptimal phylogenies. We have performed experiments on real datasets to determine whether it is correct to build phylogenies from different genes using the same amino acid matrix. This issue has not been examined before, and our results show large differences in best-fit protein models within all of the datasets analysed. The results presented in this paper raise the question of whether full protein model selection analyses should be performed prior to any amino acid phylogeny estimation.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <p>To investigate the potentially harmful effects of a non-statistical approach to choosing protein models, we built two phylogenies with two arbitrarily selected protein models from a single gene family alignment of 7 taxa (3580 characters in length) taken from the dataset of Philip et al. <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. Figure <figr fid="F1">1</figr> shows that, depending on the arbitrarily chosen protein substitution model, two different tree topologies can be obtained, both with equally high bootstrap support. In a situation like this, the high bootstrap support for each tree might suggest that either of these alternative trees is optimal. However, bootstrap values can be misleading (because bootstrapping is performed using the same model that was used to construct the phylogeny) and cannot be used to infer anything about the suitability of the substitution model <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. Therefore, without detailed prior knowledge of the phylogenetic relationships, or without first determining the best-fit model from the set of available models, it is difficult to determine the optimal ML tree.</p>
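         <p>The point about bootstrapping can be made concrete. Each nonparametric bootstrap replicate is an alignment of the same size built by sampling columns with replacement, and every replicate is then analysed under the same substitution model, so a misspecified model can return the same incorrect clade again and again. The minimal Python sketch below shows only the resampling step and how clade support is the percentage of replicate trees containing that clade; tree building is omitted, and the function names and clade sets are hypothetical.</p>
```python
import random
from typing import FrozenSet, List, Sequence, Set


def bootstrap_alignment(aln: Sequence[str], rng: random.Random) -> List[str]:
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    n_sites = len(aln[0])
    if any(len(s) != n_sites for s in aln):
        raise ValueError("all sequences must have the same length")
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[c] for c in cols) for seq in aln]


def clade_support(clade: FrozenSet[str],
                  replicate_clades: Sequence[Set[FrozenSet[str]]]) -> float:
    """Percentage of replicate trees (given as sets of clades) containing `clade`."""
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)


rng = random.Random(42)
replicate = bootstrap_alignment(["MKVL", "MRVL", "LKIV"], rng)
# Hypothetical clade sets from four replicate trees over taxa A, B, C.
trees = [{frozenset("AB")}, {frozenset("AB")}, {frozenset("AC")}, {frozenset("AB")}]
print(clade_support(frozenset("AB"), trees))  # 75.0
```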
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>Alternative Trees</p>
            </caption>
            <text>
               <p><b>Alternative Trees</b>. Two different trees (with bootstrap support values based on 100 replicates) constructed from a single gene family [34] with different protein models using PhyML v2.4.4 [53]. Tree (a) was produced using the MtREV matrix [15] and Tree (b) was produced using the WAG matrix [18].</p>
            </text>
            <graphic file="1471-2148-6-29-1"/>
         </fig>
         <p>The likelihood is calculated as the probability of obtaining the data (the multiple sequence alignment) given the model of evolution (the substitution model and phylogeny). Ideally, we would use the true tree when performing model selection, as this would remove any conflicting signal arising from an incorrect base tree. However, on real datasets the true tree is unknown, so we must use an approximation of it during model selection in order to estimate the model parameters <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. In the following sections, we present simulations investigating how the base tree, alignment length, among-site rate variation (ASRV) parameters, and amino acid frequencies of amino acid alignments affect the selection of protein models. For all of the simulations, we generated our alignments on the same 20-taxon clock-like tree (see Figure <figr fid="F2">2</figr>) used by Posada and Crandall <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. To test the accuracy of the selection method, we also analysed a number of previously published real datasets for which the pattern of amino acid substitution is expected, given the source of the data, to follow a particular matrix. We then examined the extent of amino acid matrix variation among three large sets of orthologs, one from each of the Domains of Life.</p>
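         <p>The definition of the likelihood can be illustrated on the simplest possible tree: two aligned sequences separated by a single branch of length t. The Python sketch below assumes the equal-rates, equal-frequencies (Poisson) model for 20 states, chosen only because its transition probabilities have a closed form; empirical matrices such as JTT or WAG would instead require the matrix exponential of their rate matrix. The function names are hypothetical. The branch length found by numerically maximising the likelihood is checked against the analytical estimate for this model.</p>
```python
import math

N_STATES = 20  # number of amino acids


def transition_prob(same: bool, t: float, n: int = N_STATES) -> float:
    """P_ij(t) under the equal-rates model scaled to one substitution per unit time."""
    decay = math.exp(-n * t / (n - 1))
    if same:
        return 1.0 / n + (n - 1.0) / n * decay
    return 1.0 / n - decay / n


def pair_log_likelihood(seq1: str, seq2: str, t: float, n: int = N_STATES) -> float:
    """ln L = sum over sites of ln(pi_x * P_xy(t)), with pi_x = 1/n."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(math.log(1.0 / n) + math.log(transition_prob(a == b, t, n))
               for a, b in zip(seq1, seq2))


def ml_branch_length(seq1: str, seq2: str, lo: float = 1e-6, hi: float = 10.0,
                     iters: int = 200) -> float:
    """Golden-section search for the t that maximises the log-likelihood."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - g * (b - a), a + g * (b - a)
        if pair_log_likelihood(seq1, seq2, c) > pair_log_likelihood(seq1, seq2, d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2.0


def analytical_branch_length(p: float, n: int = N_STATES) -> float:
    """Closed-form ML estimate given the proportion p of differing sites."""
    return -(n - 1.0) / n * math.log(1.0 - n * p / (n - 1.0))


s1, s2 = "AAAAAAAACD", "AAAAAAAAEF"  # 2 of 10 sites differ, so p = 0.2
print(ml_branch_length(s1, s2), analytical_branch_length(0.2))
```
         <p>With more than two taxa the same per-site quantity is summed over all possible ancestral states on the tree, which is why the base tree used during model selection affects the parameter estimates.</p>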
         <fig id="F2">
            <title>
               <p>Figure 2</p>
            </title>
            <caption>
               <p>Base Tree</p>
            </caption>
            <text>
               <p><b>Base Tree</b>. The true tree used to generate all of the simulated alignments.</p>
            </text>
            <graphic file="1471-2148-6-29-2"/>
         </fig>
         <sec>
            <st>
               <p>Base tree sensitivity</p>
            </st>
            <p>The results of the simulations using different base trees (the true tree, a random tree, and an NJ-JTT tree) for the model selection procedure are presented in Table <tblr tid="T1">1</tblr>. The most important observation from the table is that recovery rates are markedly reduced for almost all models when a random tree is used, compared with either the true tree or the NJ tree. One notable exception is that many of the +I+G alignments display slightly higher recovery rates with a random tree than with either the true tree or the NJ tree. Further analysis of our results (not shown) suggests that this is because, when a random tree is used, the model selection procedure tends to favour over-parameterised models, which is consistent with the findings of a previous study on nucleotide sequences <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. There is very little difference between using the true tree and an NJ tree, which confirms previous findings for nucleotides that a relatively good tree is sufficient for estimating accurate model parameters <abbrgrp><abbr bid="B35">35</abbr><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr></abbrgrp>. It is also notable that the correct amino acid matrix was selected in almost every case (data not shown) regardless of the base tree, indicating that the only source of uncertainty in these simulations is the choice of ASRV model.</p>
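            <p>Each entry in Table 1 is a recovery count: the number of simulated replicates in which a criterion selected the generating model. A minimal Python sketch of that tabulation follows, together with a check for the over-parameterisation tendency described above. For illustration only, it assumes that '+I' and '+G' each add one free parameter and '+F' adds 19 (the free equilibrium frequencies); the model picks are hypothetical and are not data from this study.</p>
```python
from typing import Dict, List

# Assumed extra free parameters contributed by each model suffix.
SUFFIX_COST = {"I": 1, "G": 1, "F": 19}


def extra_params(model: str) -> int:
    """Free parameters added to a fixed empirical matrix, e.g. 'JTT+I+G' -> 2."""
    return sum(SUFFIX_COST[s] for s in model.split("+")[1:])


def recovery_counts(true_model: str, picks: Dict[str, List[str]]) -> Dict[str, int]:
    """Number of replicates in which each criterion selected the true model."""
    return {crit: sum(1 for m in chosen if m == true_model)
            for crit, chosen in picks.items()}


def count_overparameterised(true_model: str, chosen: List[str]) -> int:
    """Number of picks with more free parameters than the generating model."""
    k_true = extra_params(true_model)
    return sum(1 for m in chosen if extra_params(m) > k_true)


# Hypothetical picks from four replicates simulated under WAG+G.
picks = {"AIC": ["WAG+G", "WAG+I+G", "WAG+G", "WAG"],
         "BIC": ["WAG+G", "WAG+G", "WAG", "WAG+G"]}
print(recovery_counts("WAG+G", picks))  # {'AIC': 2, 'BIC': 3}
print(count_overparameterised("WAG+G", picks["AIC"]))  # 1
```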
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Base Tree Simulations. Results for simulated datasets (sequence length 500 characters) when a random tree, the NJ-JTT tree, or the true tree is used as the base tree for the model selection procedure. Each entry is the number of times, out of 100 replicates, that the correct model was selected by each measure.</p>
               </caption>
               <tblbdy cols="10">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3" ca="center">
                        <p>Random</p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>NJ-JTT</p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>True</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="10">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Model</p>
                     </c>
                     <c ca="center">
                        <p>AIC<sub>1</sub></p>
                     </c>
                     <c ca="center">
                        <p>AIC<sub>2</sub></p>
                     </c>
                     <c ca="center">
                        <p>BIC</p>
                     </c>
                     <c ca="center">
                        <p>AIC<sub>1</sub></p>
                     </c>
                     <c ca="center">
                        <p>AIC<sub>2</sub></p>
                     </c>
                     <c ca="center">
                        <p>BIC</p>
                     </c>
                     <c ca="center">
                        <p>AIC<sub>1</sub></p>
                     </c>
                     <c ca="center">
                        <p>AIC<sub>2</sub></p>
                     </c>
                     <c ca="center">
                        <p>BIC</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="10">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Blosum</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>91</p>
                     </c>
                     <c ca="center">
                        <p>98</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>84</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Blosum+I</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Blosum+G</p>
                     </c>
                     <c ca="center">
                        <p>58</p>
                     </c>
                     <c ca="center">
                        <p>65</p>
                     </c>
                     <c ca="center">
                        <p>67</p>
                     </c>
                     <c ca="center">
                        <p>75</p>
                     </c>
                     <c ca="center">
                        <p>83</p>
                     </c>
                     <c ca="center">
                        <p>87</p>
                     </c>
                     <c ca="center">
                        <p>75</p>
                     </c>
                     <c ca="center">
                        <p>84</p>
                     </c>
                     <c ca="center">
                        <p>87</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Blosum+I+G</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                     <c ca="center">
                        <p>86</p>
                     </c>
                     <c ca="center">
                        <p>85</p>
                     </c>
                     <c ca="center">
                        <p>90</p>
                     </c>
                     <c ca="center">
                        <p>88</p>
                     </c>
                     <c ca="center">
                        <p>87</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                     <c ca="center">
                        <p>85</p>
                     </c>
                     <c ca="center">
                        <p>85</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CPREV</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>92</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>93</p>
                     </c>
                     <c ca="center">
                        <p>98</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CPREV+I</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>98</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CPREV+G</p>
                     </c>
                     <c ca="center">
                        <p>80</p>
                     </c>
                     <c ca="center">
                        <p>83</p>
                     </c>
                     <c ca="center">
                        <p>85</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                     <c ca="center">
                        <p>90</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                     <c ca="center">
                        <p>90</p>
                     </c>
                     <c ca="center">
                        <p>90</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CPREV+I+G</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                     <c ca="center">
                        <p>91</p>
                     </c>
                     <c ca="center">
                        <p>91</p>
                     </c>
                     <c ca="center">
                        <p>80</p>
                     </c>
                     <c ca="center">
                        <p>75</p>
                     </c>
                     <c ca="center">
                        <p>73</p>
                     </c>
                     <c ca="center">
                        <p>80</p>
                     </c>
                     <c ca="center">
                        <p>75</p>
                     </c>
                     <c ca="center">
                        <p>73</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Dayhoff</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                     <c ca="center">
                        <p>98</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Dayhoff+I</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>98</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Dayhoff+G</p>
                     </c>
                     <c ca="center">
                        <p>68</p>
                     </c>
                     <c ca="center">
                        <p>72</p>
                     </c>
                     <c ca="center">
                        <p>74</p>
                     </c>
                     <c ca="center">
                        <p>77</p>
                     </c>
                     <c ca="center">
                        <p>86</p>
                     </c>
                     <c ca="center">
                        <p>90</p>
                     </c>
                     <c ca="center">
                        <p>79</p>
                     </c>
                     <c ca="center">
                        <p>88</p>
                     </c>
                     <c ca="center">
                        <p>91</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Dayhoff+I+G</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                     <c ca="center">
                        <p>93</p>
                     </c>
                     <c ca="center">
                        <p>93</p>
                     </c>
                     <c ca="center">
                        <p>82</p>
                     </c>
                     <c ca="center">
                        <p>74</p>
                     </c>
                     <c ca="center">
                        <p>74</p>
                     </c>
                     <c ca="center">
                        <p>84</p>
                     </c>
                     <c ca="center">
                        <p>74</p>
                     </c>
                     <c ca="center">
                        <p>72</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>JTT</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>JTT+I</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>JTT+G</p>
                     </c>
                     <c ca="center">
                        <p>54</p>
                     </c>
                     <c ca="center">
                        <p>59</p>
                     </c>
                     <c ca="center">
                        <p>62</p>
                     </c>
                     <c ca="center">
                        <p>78</p>
                     </c>
                     <c ca="center">
                        <p>85</p>
                     </c>
                     <c ca="center">
                        <p>86</p>
                     </c>
                     <c ca="center">
                        <p>81</p>
                     </c>
                     <c ca="center">
                        <p>87</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>JTT+I+G</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                     <c ca="center">
                        <p>93</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                     <c ca="center">
                        <p>85</p>
                     </c>
                     <c ca="center">
                        <p>82</p>
                     </c>
                     <c ca="center">
                        <p>92</p>
                     </c>
                     <c ca="center">
                        <p>87</p>
                     </c>
                     <c ca="center">
                        <p>84</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>MtREV</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>85</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>MtREV+I</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>92</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>MtREV+G</p>
                     </c>
                     <c ca="center">
                        <p>80</p>
                     </c>
                     <c ca="center">
                        <p>87</p>
                     </c>
                     <c ca="center">
                        <p>87</p>
                     </c>
                     <c ca="center">
                        <p>92</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>93</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>MtREV+I+G</p>
                     </c>
                     <c ca="center">
                        <p>86</p>
                     </c>
                     <c ca="center">
                        <p>85</p>
                     </c>
                     <c ca="center">
                        <p>84</p>
                     </c>
                     <c ca="center">
                        <p>68</p>
                     </c>
                     <c ca="center">
                        <p>65</p>
                     </c>
                     <c ca="center">
                        <p>61</p>
                     </c>
                     <c ca="center">
                        <p>70</p>
                     </c>
                     <c ca="center">
                        <p>63</p>
                     </c>
                     <c ca="center">
                        <p>63</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>WAG</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>88</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>WAG+I</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>98</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>WAG+G</p>
                     </c>
                     <c ca="center">
                        <p>74</p>
                     </c>
                     <c ca="center">
                        <p>79</p>
                     </c>
                     <c ca="center">
                        <p>79</p>
                     </c>
                     <c ca="center">
                        <p>83</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                     <c ca="center">
                        <p>83</p>
                     </c>
                     <c ca="center">
                        <p>88</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>WAG+I+G</p>
                     </c>
                     <c ca="center">
                        <p>90</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                     <c ca="center">
                        <p>87</p>
                     </c>
                     <c ca="center">
                        <p>79</p>
                     </c>
                     <c ca="center">
                        <p>73</p>
                     </c>
                     <c ca="center">
                        <p>69</p>
                     </c>
                     <c ca="center">
                        <p>78</p>
                     </c>
                     <c ca="center">
                        <p>73</p>
                     </c>
                     <c ca="center">
                        <p>71</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>We next examined how the models selected using likelihood values from the quick NJ-JTT base tree differ from those selected using fully optimised ML phylogenies inferred under each individual model (see Methods). The models selected by a full ML tree search under each available model and by the quicker NJ-JTT method differed in fewer than 10% of alignments (see Table <tblr tid="T2">2</tblr>). For the proteobacteria dataset, the NJ-JTT model selection procedure differed from the full ML analysis in no more than 7% of cases, and most of these differences involved the same amino acid rate matrix with different ASRV parameters. Agreement between the two procedures was closer still for the archaea dataset, where their model predictions differed in no more than 5% of cases. The vertebrate dataset showed a similar pattern, with the NJ-JTT model selection procedure differing from the full ML analysis in no more than 9% of cases. Table <tblr tid="T2">2</tblr> shows that in the majority of cases where different models were selected, the same amino acid matrix was chosen and the difference lay only in the selection of the optimal ASRV parameters.</p>
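            <p>Both procedures rank the candidate models by applying the same information criteria to maximised log-likelihoods; they differ only in the tree on which those likelihoods are computed. A minimal Python sketch of that ranking step follows. It assumes that AIC<sub>1</sub> is the standard AIC, that AIC<sub>2</sub> is the second-order (small-sample) AICc, and that BIC uses the alignment length as its sample size; the function names, parameter counts and log-likelihoods are hypothetical and are not taken from this study.</p>
```python
import math
from typing import Dict, Tuple

# Hypothetical candidate set: model name -> (maximised log-likelihood, free parameters)
Candidates = Dict[str, Tuple[float, int]]


def aic1(lnl: float, k: int, n: int) -> float:
    """Standard AIC; n is unused but kept for a uniform signature."""
    return -2.0 * lnl + 2.0 * k


def aic2(lnl: float, k: int, n: int) -> float:
    """Second-order AIC (AICc), assumed here to be AIC2; requires n > k + 1."""
    return aic1(lnl, k, n) + 2.0 * k * (k + 1) / (n - k - 1)


def bic(lnl: float, k: int, n: int) -> float:
    """BIC, taking the alignment length n as the sample size."""
    return -2.0 * lnl + k * math.log(n)


def select_models(candidates: Candidates, n: int) -> Dict[str, str]:
    """Return the model with the lowest score under each criterion."""
    criteria = {"AIC1": aic1, "AIC2": aic2, "BIC": bic}
    return {
        name: min(candidates, key=lambda m: score(*candidates[m], n))
        for name, score in criteria.items()
    }


# Hypothetical values for one 100-site alignment (not data from this study)
example = {
    "JTT+G": (-2500.0, 16),
    "JTT+I+G": (-2498.0, 17),
    "WAG+G": (-2510.0, 16),
}
print(select_models(example, n=100))
# {'AIC1': 'JTT+I+G', 'AIC2': 'JTT+I+G', 'BIC': 'JTT+G'}
```
            <p>With these made-up values, the extra parameter of JTT+I+G improves -2 ln L by 4 units. That is enough for both AIC variants, but not for BIC, whose per-parameter penalty at 100 sites is ln(100), about 4.6.</p>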
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Full ML Comparison. A comparison of the models selected from the likelihood values obtained from a full ML tree search under each model with those selected from the likelihood values computed on the default NJ-JTT base tree. The column 'Identical' gives the number of times (out of 100 alignments) that both procedures selected the same model. The column 'Rate' gives the number of cases in which the same amino acid matrix but a different ASRV was selected. The column 'Matrix' gives the number of cases in which a different amino acid matrix was selected.</p>
               </caption>
               <tblbdy cols="10">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3" ca="center">
                        <p>AIC<sub>1</sub></p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>AIC<sub>2</sub></p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>BIC</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="10">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Dataset</p>
                     </c>
                     <c ca="center">
                        <p>Identical</p>
                     </c>
                     <c ca="center">
                        <p>Rate</p>
                     </c>
                     <c ca="center">
                        <p>Matrix</p>
                     </c>
                     <c ca="center">
                        <p>Identical</p>
                     </c>
                     <c ca="center">
                        <p>Rate</p>
                     </c>
                     <c ca="center">
                        <p>Matrix</p>
                     </c>
                     <c ca="center">
                        <p>Identical</p>
                     </c>
                     <c ca="center">
                        <p>Rate</p>
                     </c>
                     <c ca="center">
                        <p>Matrix</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="10">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Proteobacteria</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>93</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Archaea</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Vertebrate</p>
                     </c>
                     <c ca="center">
                        <p>91</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
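            <p>The three categories in Table <tblr tid="T2">2</tblr> can be tallied mechanically by splitting each selected model name into its amino acid matrix and its ASRV suffix. The Python sketch below assumes model names follow the 'MATRIX+I+G' form used in the tables; the function names and the example selections are hypothetical.</p>
```python
from collections import Counter
from typing import Iterable, Tuple


def split_model(name: str) -> Tuple[str, str]:
    """Split e.g. 'JTT+I+G' into ('JTT', '+I+G'); a bare matrix gets suffix ''."""
    matrix, _, asrv = name.partition("+")
    return matrix, ("+" + asrv if asrv else "")


def compare_selections(quick: Iterable[str], full: Iterable[str]) -> Counter:
    """Tally agreement between two model-selection runs over the same alignments.

    'Identical': same model; 'Rate': same matrix but different ASRV;
    'Matrix': different amino acid matrix.
    """
    quick, full = list(quick), list(full)
    if len(quick) != len(full):
        raise ValueError("both runs must cover the same alignments")
    tally = Counter({"Identical": 0, "Rate": 0, "Matrix": 0})
    for a, b in zip(quick, full):
        if a == b:
            tally["Identical"] += 1
        elif split_model(a)[0] == split_model(b)[0]:
            tally["Rate"] += 1  # same matrix, different ASRV
        else:
            tally["Matrix"] += 1  # different amino acid matrix
    return tally


# Hypothetical selections for four alignments (not data from this study)
nj_jtt = ["JTT+G", "WAG+I+G", "MtREV", "Dayhoff+G"]
full_ml = ["JTT+G", "WAG+G", "MtREV", "JTT+G"]
print(dict(compare_selections(nj_jtt, full_ml)))
# {'Identical': 2, 'Rate': 1, 'Matrix': 1}
```
            <p>Because each dataset contains 100 alignments, the raw counts returned by such a tally can be read directly as percentages.</p>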
         </sec>
         <sec>
            <st>
               <p>Sequence length</p>
            </st>
            <p>One factor believed to affect the results of nucleotide model selection is sequence length <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. To investigate what effect, if any, sequence length has on amino acid model selection, we performed the model selection procedure on alignments of varying length. Table <tblr tid="T3">3</tblr> shows the recovery rates of each of the three measures (AIC<sub>1</sub>, AIC<sub>2</sub>, and BIC) for the three alignment lengths (100, 500, and 1000 characters). As expected, recovery rates are generally higher for the longer alignments than for the shorter ones. One noticeable feature of the 100-character dataset is that, for all matrices, the correct model was recovered far less often when the simulation model included +I+G ASRV. Further examination of the results shows that this is almost always because the procedure selected the +G version of the model: at short sequence lengths the difference in likelihood between the +I+G and +G models is small, and not large enough for the measures to prefer the more heavily parameterised +I+G models. In these cases, we observed that the <it>&#945; </it>parameter of the gamma distribution is generally estimated to be below 0.2, so that the gamma distribution itself accommodates the invariable sites. When the sequence length is increased to 1000 characters, the difference between the likelihoods of the +G and +I+G models grows large enough for the model selection measures to prefer the +I+G models. As the BIC takes into account sample size (sequence length), it is not affected to the same extent by this phenomenon (see Table <tblr tid="T3">3</tblr>). Again, the correct amino acid substitution matrix was selected in almost every case regardless of sequence length (data not shown).</p>
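            <p>The observation that a +G model with a small <it>&#945; </it>can stand in for +I+G follows from the shape of the gamma distribution of rates. With mean rate 1 and a small shape <it>&#945;</it>, a large share of sites is assigned rates close to zero, which overlaps with what an invariable-sites class would capture. The Python sketch below is an illustration, not the estimation procedure used in this study. It computes the expected fraction of sites evolving below a chosen fraction of the mean rate, using a power-series evaluation of the regularized lower incomplete gamma function. The 1% cut-off and the function names are arbitrary illustrative choices.</p>
```python
import math


def regularized_lower_gamma(s: float, x: float, tol: float = 1e-14) -> float:
    """Regularized lower incomplete gamma P(s, x) from its power series.

    The series converges for any x but is only quick for small x,
    which is the regime used here.
    """
    if x == 0.0:
        return 0.0
    term = total = 1.0 / s
    for n in range(1, 10_000):
        term *= x / (s + n)
        total += term
        if tol * total >= term:  # remaining terms are negligible
            break
    return math.exp(s * math.log(x) - x - math.lgamma(s)) * total


def fraction_slow_sites(alpha: float, cutoff: float = 0.01) -> float:
    """Share of sites whose rate is below cutoff times the mean rate.

    Rates are gamma distributed with shape alpha and mean 1 (rate
    parameter alpha), the usual parameterisation in +G models.
    """
    return regularized_lower_gamma(alpha, alpha * cutoff)


for alpha in (0.2, 0.5, 1.0, 2.0):
    print(f"alpha = {alpha}: {fraction_slow_sites(alpha):.3f} of sites below 1% of the mean rate")
```
            <p>With these settings, <it>&#945; </it>= 0.2 places roughly 31% of sites below 1% of the mean rate, compared with about 1% of sites at <it>&#945; </it>= 1. This is consistent with a small <it>&#945; </it>absorbing much of what a proportion of invariable sites would otherwise model.</p>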
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Alignment Length Simulations. Results of the simulated datasets for alignments of 100, 500, and 1000 characters in length. Each entry is the number of times out of 100 replicates the correct model was selected by each measure (using the default NJ-JTT base tree).</p>
               </caption>
               <tblbdy cols="10">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3" ca="center">
                        <p>100</p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>500</p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>1000</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="10">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Model</p>
                     </c>
                     <c ca="center">
                        <p>AIC<sub>1</sub></p>
                     </c>
                     <c ca="center">
                        <p>AIC<sub>2</sub></p>
                     </c>
                     <c ca="center">
                        <p>BIC</p>
                     </c>
                     <c ca="center">
                        <p>AIC<sub>1</sub></p>
                     </c>
                     <c ca="center">
                        <p>AIC<sub>2</sub></p>
                     </c>
                     <c ca="center">
                        <p>BIC</p>
                     </c>
                     <c ca="center">
                        <p>AIC<sub>1</sub></p>
                     </c>
                     <c ca="center">
                        <p>AIC<sub>2</sub></p>
                     </c>
                     <c ca="center">
                        <p>BIC</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="10">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Blosum</p>
                     </c>
                     <c ca="center">
                        <p>86</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>91</p>
                     </c>
                     <c ca="center">
                        <p>98</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Blosum+I</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>98</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Blosum+G</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>75</p>
                     </c>
                     <c ca="center">
                        <p>83</p>
                     </c>
                     <c ca="center">
                        <p>87</p>
                     </c>
                     <c ca="center">
                        <p>79</p>
                     </c>
                     <c ca="center">
                        <p>85</p>
                     </c>
                     <c ca="center">
                        <p>88</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Blosum+I+G</p>
                     </c>
                     <c ca="center">
                        <p>44</p>
                     </c>
                     <c ca="center">
                        <p>30</p>
                     </c>
                     <c ca="center">
                        <p>30</p>
                     </c>
                     <c ca="center">
                        <p>90</p>
                     </c>
                     <c ca="center">
                        <p>88</p>
                     </c>
                     <c ca="center">
                        <p>87</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CPREV</p>
                     </c>
                     <c ca="center">
                        <p>92</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>92</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CPREV+I</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>98</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CPREV+G</p>
                     </c>
                     <c ca="center">
                        <p>87</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>98</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                     <c ca="center">
                        <p>90</p>
                     </c>
                     <c ca="center">
                        <p>91</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CPREV+I+G</p>
                     </c>
                     <c ca="center">
                        <p>51</p>
                     </c>
                     <c ca="center">
                        <p>37</p>
                     </c>
                     <c ca="center">
                        <p>37</p>
                     </c>
                     <c ca="center">
                        <p>80</p>
                     </c>
                     <c ca="center">
                        <p>75</p>
                     </c>
                     <c ca="center">
                        <p>73</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Dayhoff</p>
                     </c>
                     <c ca="center">
                        <p>92</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>93</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Dayhoff+I</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>98</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Dayhoff+G</p>
                     </c>
                     <c ca="center">
                        <p>83</p>
                     </c>
                     <c ca="center">
                        <p>93</p>
                     </c>
                     <c ca="center">
                        <p>93</p>
                     </c>
                     <c ca="center">
                        <p>77</p>
                     </c>
                     <c ca="center">
                        <p>86</p>
                     </c>
                     <c ca="center">
                        <p>90</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Dayhoff+I+G</p>
                     </c>
                     <c ca="center">
                        <p>54</p>
                     </c>
                     <c ca="center">
                        <p>35</p>
                     </c>
                     <c ca="center">
                        <p>38</p>
                     </c>
                     <c ca="center">
                        <p>82</p>
                     </c>
                     <c ca="center">
                        <p>74</p>
                     </c>
                     <c ca="center">
                        <p>74</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>92</p>
                     </c>
                     <c ca="center">
                        <p>91</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>JTT</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>98</p>
                     </c>
                     <c ca="center">
                        <p>98</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>93</p>
                     </c>
                     <c ca="center">
                        <p>98</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>JTT+I</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>98</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>JTT+G</p>
                     </c>
                     <c ca="center">
                        <p>87</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                     <c ca="center">
                        <p>78</p>
                     </c>
                     <c ca="center">
                        <p>85</p>
                     </c>
                     <c ca="center">
                        <p>86</p>
                     </c>
                     <c ca="center">
                        <p>91</p>
                     </c>
                     <c ca="center">
                        <p>91</p>
                     </c>
                     <c ca="center">
                        <p>93</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>JTT+I+G</p>
                     </c>
                     <c ca="center">
                        <p>48</p>
                     </c>
                     <c ca="center">
                        <p>36</p>
                     </c>
                     <c ca="center">
                        <p>40</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                     <c ca="center">
                        <p>85</p>
                     </c>
                     <c ca="center">
                        <p>82</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>MtREV</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>98</p>
                     </c>
                     <c ca="center">
                        <p>98</p>
                     </c>
                     <c ca="center">
                        <p>85</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>91</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>MtREV+I</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>92</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>MtREV+G</p>
                     </c>
                     <c ca="center">
                        <p>86</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>92</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>92</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>MtREV+I+G</p>
                     </c>
                     <c ca="center">
                        <p>29</p>
                     </c>
                     <c ca="center">
                        <p>17</p>
                     </c>
                     <c ca="center">
                        <p>17</p>
                     </c>
                     <c ca="center">
                        <p>68</p>
                     </c>
                     <c ca="center">
                        <p>65</p>
                     </c>
                     <c ca="center">
                        <p>61</p>
                     </c>
                     <c ca="center">
                        <p>87</p>
                     </c>
                     <c ca="center">
                        <p>85</p>
                     </c>
                     <c ca="center">
                        <p>83</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>WAG</p>
                     </c>
                     <c ca="center">
                        <p>91</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                     <c ca="center">
                        <p>88</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>98</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>WAG+I</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>98</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>WAG+G</p>
                     </c>
                     <c ca="center">
                        <p>85</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>93</p>
                     </c>
                     <c ca="center">
                        <p>83</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                     <c ca="center">
                        <p>86</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>WAG+I+G</p>
                     </c>
                     <c ca="center">
                        <p>50</p>
                     </c>
                     <c ca="center">
                        <p>34</p>
                     </c>
                     <c ca="center">
                        <p>36</p>
                     </c>
                     <c ca="center">
                        <p>79</p>
                     </c>
                     <c ca="center">
                        <p>73</p>
                     </c>
                     <c ca="center">
                        <p>69</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Among-site rate variation parameters</p>
            </st>
            <p>ASRV parameters can vary greatly in real datasets, so it is important to investigate whether the model selection procedure is affected by varying them. Table <tblr tid="T4">4</tblr> shows the results of the simulations in which the gamma shape parameter was varied. The most noticeable trend in the table is the reduction in recovery rates for the +G simulations at higher values of <it>&#945;</it>. Closer examination of the results shows that this is because the model selection measures incorrectly selected the +I+G ASRV where the true ASRV is +G. The BIC is the least affected measure, as it imposes a much higher penalty for each additional parameter than either of the AIC metrics. We therefore attribute this reduction in accuracy to a property of the AIC. The same effect is mirrored by better results in the +I+G simulations as the value of <it>&#945; </it>increases. This increase in accuracy is consistent with our earlier result, in which the gamma parameter was significantly reduced in order to incorporate invariable sites at the short sequence length. At high values of <it>&#945;</it>, such as 1.0 or 2.0, the gamma distribution places little weight on near-zero rates, so the separate invariable-sites parameter is explicitly required by the model to account for the proportion of invariable sites. As in the previous tables, the correct amino acid matrix was selected in almost every case.</p>
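            <p>The penalty argument above can be illustrated with a short worked example. The Python sketch below is not the software used in this study. It assumes that AIC<sub>1</sub> is the standard AIC, that AIC<sub>2</sub> is the second-order (small-sample) AIC, often written AICc, and that the sample size <it>n</it> is the number of alignment columns (500, as in Table <tblr tid="T4">4</tblr>). The log-likelihoods and parameter counts are invented for illustration and are not results from these simulations. The functions <it>aic1</it>, <it>aic2</it>, <it>bic</it> and <it>select</it> are hypothetical names.</p>

```python
import math

# Illustrative sketch, not the study's implementation. Assumptions:
# AIC1 = standard AIC, AIC2 = second-order AIC (AICc), and n = number
# of alignment columns. Lower scores are better under all three.


def aic1(lnl: float, k: int) -> float:
    """Standard AIC: -2 lnL + 2k, so each extra parameter costs 2 units."""
    return -2.0 * lnl + 2.0 * k


def aic2(lnl: float, k: int, n: int) -> float:
    """Second-order AIC (AICc): AIC plus a small-sample correction term."""
    if k + 1 >= n:
        raise ValueError("AICc requires n - k - 1 to be positive")
    return aic1(lnl, k) + 2.0 * k * (k + 1) / (n - k - 1)


def bic(lnl: float, k: int, n: int) -> float:
    """BIC: -2 lnL + k ln(n), so each extra parameter costs ln(n) units."""
    return -2.0 * lnl + k * math.log(n)


def select(models: dict[str, tuple[float, int]], n: int) -> dict[str, str]:
    """Return the lowest-scoring model name under each criterion.

    `models` maps a model name to (log-likelihood, number of free parameters).
    """
    criteria = {
        "AIC1": lambda lnl, k: aic1(lnl, k),
        "AIC2": lambda lnl, k: aic2(lnl, k, n),
        "BIC": lambda lnl, k: bic(lnl, k, n),
    }
    return {
        name: min(models, key=lambda m: score(*models[m]))
        for name, score in criteria.items()
    }


# Invented numbers for illustration only: with 500 sites, +I+G gains
# 1.5 log-likelihood units over +G at the cost of one extra parameter.
candidates = {"WAG+G": (-10000.0, 20), "WAG+I+G": (-9998.5, 21)}
print(select(candidates, n=500))
# prints {'AIC1': 'WAG+I+G', 'AIC2': 'WAG+I+G', 'BIC': 'WAG+G'}
```

            <p>With these assumed values the likelihood gain is worth 3.0 units of -2 lnL. That exceeds the cost of one extra parameter under AIC<sub>1</sub> (2 units) and AIC<sub>2</sub> (about 2.18 units here), but not under the BIC (ln 500, about 6.21 units), so only the BIC retains +G. This is the same direction of effect as the +G rows of Table <tblr tid="T4">4</tblr>.</p>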
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>Gamma Distribution Simulations. Results of simulations in which the <it>&#945; </it>parameter of the gamma distribution was set to 0.5, 1.0 or 2.0. The sequence length was kept constant at 500 characters and the proportion of invariable sites was 0.2. Each entry is the number of times out of 100 replicates that the correct model was selected.</p>
               </caption>
               <tblbdy cols="10">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3" ca="center">
                        <p><it>&#945; </it>= 0.5</p>
                     </c>
                     <c cspan="3" ca="center">
                        <p><it>&#945; </it>= 1.0</p>
                     </c>
                     <c cspan="3" ca="center">
                        <p><it>&#945; </it>= 2.0</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="10">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Model</p>
                     </c>
                     <c ca="center">
                        <p>AIC<sub>1</sub></p>
                     </c>
                     <c ca="center">
                        <p>AIC<sub>2</sub></p>
                     </c>
                     <c ca="center">
                        <p>BIC</p>
                     </c>
                     <c ca="center">
                        <p>AIC<sub>1</sub></p>
                     </c>
                     <c ca="center">
                        <p>AIC<sub>2</sub></p>
                     </c>
                     <c ca="center">
                        <p>BIC</p>
                     </c>
                     <c ca="center">
                        <p>AIC<sub>1</sub></p>
                     </c>
                     <c ca="center">
                        <p>AIC<sub>2</sub></p>
                     </c>
                     <c ca="center">
                        <p>BIC</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="10">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>BLOSUM62+G</p>
                     </c>
                     <c ca="center">
                        <p>75</p>
                     </c>
                     <c ca="center">
                        <p>83</p>
                     </c>
                     <c ca="center">
                        <p>87</p>
                     </c>
                     <c ca="center">
                        <p>32</p>
                     </c>
                     <c ca="center">
                        <p>62</p>
                     </c>
                     <c ca="center">
                        <p>69</p>
                     </c>
                     <c ca="center">
                        <p>36</p>
                     </c>
                     <c ca="center">
                        <p>68</p>
                     </c>
                     <c ca="center">
                        <p>74</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>BLOSUM62+I+G</p>
                     </c>
                     <c ca="center">
                        <p>90</p>
                     </c>
                     <c ca="center">
                        <p>88</p>
                     </c>
                     <c ca="center">
                        <p>87</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>93</p>
                     </c>
                     <c ca="center">
                        <p>92</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CPREV+G</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                     <c ca="center">
                        <p>90</p>
                     </c>
                     <c ca="center">
                        <p>39</p>
                     </c>
                     <c ca="center">
                        <p>72</p>
                     </c>
                     <c ca="center">
                        <p>77</p>
                     </c>
                     <c ca="center">
                        <p>39</p>
                     </c>
                     <c ca="center">
                        <p>65</p>
                     </c>
                     <c ca="center">
                        <p>79</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CPREV+I+G</p>
                     </c>
                     <c ca="center">
                        <p>80</p>
                     </c>
                     <c ca="center">
                        <p>75</p>
                     </c>
                     <c ca="center">
                        <p>73</p>
                     </c>
                     <c ca="center">
                        <p>93</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Dayhoff+G</p>
                     </c>
                     <c ca="center">
                        <p>77</p>
                     </c>
                     <c ca="center">
                        <p>86</p>
                     </c>
                     <c ca="center">
                        <p>90</p>
                     </c>
                     <c ca="center">
                        <p>33</p>
                     </c>
                     <c ca="center">
                        <p>36</p>
                     </c>
                     <c ca="center">
                        <p>74</p>
                     </c>
                     <c ca="center">
                        <p>38</p>
                     </c>
                     <c ca="center">
                        <p>60</p>
                     </c>
                     <c ca="center">
                        <p>66</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Dayhoff+I+G</p>
                     </c>
                     <c ca="center">
                        <p>82</p>
                     </c>
                     <c ca="center">
                        <p>74</p>
                     </c>
                     <c ca="center">
                        <p>74</p>
                     </c>
                     <c ca="center">
                        <p>98</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>92</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>JTT+G</p>
                     </c>
                     <c ca="center">
                        <p>78</p>
                     </c>
                     <c ca="center">
                        <p>85</p>
                     </c>
                     <c ca="center">
                        <p>86</p>
                     </c>
                     <c ca="center">
                        <p>43</p>
                     </c>
                     <c ca="center">
                        <p>71</p>
                     </c>
                     <c ca="center">
                        <p>76</p>
                     </c>
                     <c ca="center">
                        <p>25</p>
                     </c>
                     <c ca="center">
                        <p>54</p>
                     </c>
                     <c ca="center">
                        <p>63</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>JTT+I+G</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                     <c ca="center">
                        <p>85</p>
                     </c>
                     <c ca="center">
                        <p>82</p>
                     </c>
                     <c ca="center">
                        <p>98</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>MtREV+G</p>
                     </c>
                     <c ca="center">
                        <p>92</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>46</p>
                     </c>
                     <c ca="center">
                        <p>72</p>
                     </c>
                     <c ca="center">
                        <p>76</p>
                     </c>
                     <c ca="center">
                        <p>51</p>
                     </c>
                     <c ca="center">
                        <p>75</p>
                     </c>
                     <c ca="center">
                        <p>84</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>MtREV+I+G</p>
                     </c>
                     <c ca="center">
                        <p>68</p>
                     </c>
                     <c ca="center">
                        <p>65</p>
                     </c>
                     <c ca="center">
                        <p>61</p>
                     </c>
                     <c ca="center">
                        <p>90</p>
                     </c>
                     <c ca="center">
                        <p>85</p>
                     </c>
                     <c ca="center">
                        <p>83</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>WAG+G</p>
                     </c>
                     <c ca="center">
                        <p>83</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                     <c ca="center">
                        <p>35</p>
                     </c>
                     <c ca="center">
                        <p>70</p>
                     </c>
                     <c ca="center">
                        <p>76</p>
                     </c>
                     <c ca="center">
                        <p>32</p>
                     </c>
                     <c ca="center">
                        <p>70</p>
                     </c>
                     <c ca="center">
                        <p>79</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>WAG+I+G</p>
                     </c>
                     <c ca="center">
                        <p>79</p>
                     </c>
                     <c ca="center">
                        <p>73</p>
                     </c>
                     <c ca="center">
                        <p>69</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>91</p>
                     </c>
                     <c ca="center">
                        <p>90</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Dayhoff+I+G</p>
                     </c>
                     <c ca="center">
                        <p>54</p>
                     </c>
                     <c ca="center">
                        <p>35</p>
                     </c>
                     <c ca="center">
                        <p>38</p>
                     </c>
                     <c ca="center">
                        <p>82</p>
                     </c>
                     <c ca="center">
                        <p>74</p>
                     </c>
                     <c ca="center">
                        <p>74</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>92</p>
                     </c>
                     <c ca="center">
                        <p>91</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>JTT</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>98</p>
                     </c>
                     <c ca="center">
                        <p>98</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>93</p>
                     </c>
                     <c ca="center">
                        <p>98</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>JTT+I</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>98</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>JTT+G</p>
                     </c>
                     <c ca="center">
                        <p>87</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                     <c ca="center">
                        <p>78</p>
                     </c>
                     <c ca="center">
                        <p>85</p>
                     </c>
                     <c ca="center">
                        <p>86</p>
                     </c>
                     <c ca="center">
                        <p>91</p>
                     </c>
                     <c ca="center">
                        <p>91</p>
                     </c>
                     <c ca="center">
                        <p>93</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>JTT+I+G</p>
                     </c>
                     <c ca="center">
                        <p>48</p>
                     </c>
                     <c ca="center">
                        <p>36</p>
                     </c>
                     <c ca="center">
                        <p>40</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                     <c ca="center">
                        <p>85</p>
                     </c>
                     <c ca="center">
                        <p>82</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>MtREV</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>98</p>
                     </c>
                     <c ca="center">
                        <p>98</p>
                     </c>
                     <c ca="center">
                        <p>85</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>91</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>MtREV+I</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>92</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>MtREV+G</p>
                     </c>
                     <c ca="center">
                        <p>86</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>92</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>92</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>MtREV+I+G</p>
                     </c>
                     <c ca="center">
                        <p>29</p>
                     </c>
                     <c ca="center">
                        <p>17</p>
                     </c>
                     <c ca="center">
                        <p>17</p>
                     </c>
                     <c ca="center">
                        <p>68</p>
                     </c>
                     <c ca="center">
                        <p>65</p>
                     </c>
                     <c ca="center">
                        <p>61</p>
                     </c>
                     <c ca="center">
                        <p>87</p>
                     </c>
                     <c ca="center">
                        <p>85</p>
                     </c>
                     <c ca="center">
                        <p>83</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>WAG</p>
                     </c>
                     <c ca="center">
                        <p>91</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                     <c ca="center">
                        <p>88</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>98</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>WAG+I</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>98</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>WAG+G</p>
                     </c>
                     <c ca="center">
                        <p>85</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>93</p>
                     </c>
                     <c ca="center">
                        <p>83</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                     <c ca="center">
                        <p>86</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>WAG+I+G</p>
                     </c>
                     <c ca="center">
                        <p>50</p>
                     </c>
                     <c ca="center">
                        <p>34</p>
                     </c>
                     <c ca="center">
                        <p>36</p>
                     </c>
                     <c ca="center">
                        <p>79</p>
                     </c>
                     <c ca="center">
                        <p>73</p>
                     </c>
                     <c ca="center">
                        <p>69</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Amino acid frequency perturbation</p>
            </st>
            <p>Each of the protein substitution models consists of an instantaneous rate matrix (Q), which incorporates a set of amino acid frequencies (<it>&#960;</it><sub><it>i</it></sub>) obtained from the dataset originally used to derive the model. If we instead use the observed amino acid frequencies of the dataset being examined (denoted by the '+F' suffix), each model gains 19 extra free parameters (20 frequencies constrained to sum to one). We investigated what effect altering the amino acid frequencies would have on the model selection procedure, and whether the corresponding '+F' versions of the models would be selected. A robust model selection procedure should still identify the correct amino acid matrix despite the variation in amino acid frequencies. Table <tblr tid="T5">5</tblr> shows that, when the original amino acid frequencies of a model were randomly perturbed (by up to 10%), all of the model selection measures showed a clear tendency to select the corresponding '+F' version of that model over the original model. Recovery rates were high across all models: AIC<sub>2</sub> and BIC selected the correct model in at least 80 of 100 replicates in every case, whereas AIC<sub>1</sub> produced the lowest rates, falling to 67 for JTT+I+F, 71 for Blosum+I+F and 75 for JTT+G+F. Thus, when the amino acid frequencies of a dataset deviate from the default frequencies of a particular model, the selection measures tend to favour the '+F' version of that same model.</p>
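            <p>To make the cost of the '+F' option concrete, the Python sketch below uses the standard textbook forms of the criteria, AIC = 2<it>k</it> - 2 ln <it>L</it> and BIC = <it>k</it> ln <it>n</it> - 2 ln <it>L</it>, and computes the smallest log-likelihood gain a '+F' model needs before it is preferred over its base model. It is a minimal sketch under stated assumptions: it does not reproduce the specific AIC<sub>1</sub> and AIC<sub>2</sub> variants used in this study, it takes the number of alignment sites as the BIC sample size, and the log-likelihoods and base parameter count in the usage block are hypothetical.</p>

```python
import math

# '+F' replaces a model's fixed frequencies with the observed ones:
# 20 amino acid frequencies constrained to sum to 1 leave 19 free values.
N_AMINO_ACIDS = 20
EXTRA_F_PARAMS = N_AMINO_ACIDS - 1


def aic(log_l: float, k: int) -> float:
    """Standard Akaike Information Criterion: 2k - 2 lnL (lower is better)."""
    return 2 * k - 2 * log_l


def bic(log_l: float, k: int, n: int) -> float:
    """Standard Bayesian Information Criterion: k ln(n) - 2 lnL (lower is better).

    Assumption: n is the number of alignment sites.
    """
    return k * math.log(n) - 2 * log_l


def min_gain_for_plus_f(n_sites: int) -> dict:
    """Smallest lnL improvement for which the '+F' model scores better.

    AIC: 2(k + 19) - 2 lnL_F is below 2k - 2 lnL_base  iff  gain exceeds 19.
    BIC: (k + 19) ln n - 2 lnL_F is below k ln n - 2 lnL_base
         iff  gain exceeds 19 ln(n) / 2.
    """
    return {
        "AIC": float(EXTRA_F_PARAMS),
        "BIC": EXTRA_F_PARAMS * math.log(n_sites) / 2.0,
    }


if __name__ == "__main__":
    n_sites = 500                              # alignment length used in Table 5
    k_base = 10                                # hypothetical base-model parameter count
    lnl_base, lnl_plus_f = -10000.0, -9970.0   # hypothetical log-likelihoods
    k_plus_f = k_base + EXTRA_F_PARAMS

    print(min_gain_for_plus_f(n_sites))
    # A negative difference means the '+F' model has the better (lower) score.
    print(aic(lnl_plus_f, k_plus_f) - aic(lnl_base, k_base))  # -22.0
    print(round(bic(lnl_plus_f, k_plus_f, n_sites)
                - bic(lnl_base, k_base, n_sites), 1))          # 58.1
```

            <p>With 500 sites, the '+F' model must improve ln <it>L</it> by more than 19 units to be preferred under AIC, but by roughly 59 units under BIC, so the hypothetical gain of 30 units is sufficient for AIC but not for BIC.</p>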
            <tbl id="T5">
               <title>
                  <p>Table 5</p>
               </title>
               <caption>
                  <p>Amino Acid Frequency Simulations. Results for the simulated datasets in which the original amino acid frequencies were randomly perturbed by up to 10% from their original values; the alignment length is 500 characters. Each entry gives the number of times, out of 100 replicates, that the correct model was selected by each measure.</p>
               </caption>
               <tblbdy cols="8">
                  <r>
                     <c ca="center">
                        <p>Model</p>
                     </c>
                     <c ca="center">
                        <p>AIC<sub>1</sub></p>
                     </c>
                     <c ca="center">
                        <p>AIC<sub>2</sub></p>
                     </c>
                     <c ca="center">
                        <p>BIC</p>
                     </c>
                     <c ca="center">
                        <p>Model</p>
                     </c>
                     <c ca="center">
                        <p>AIC<sub>1</sub></p>
                     </c>
                     <c ca="center">
                        <p>AIC<sub>2</sub></p>
                     </c>
                     <c ca="center">
                        <p>BIC</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Blosum+F</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>JTT+F</p>
                     </c>
                     <c ca="center">
                        <p>93</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Blosum+I+F</p>
                     </c>
                     <c ca="center">
                        <p>71</p>
                     </c>
                     <c ca="center">
                        <p>91</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>JTT+I+F</p>
                     </c>
                     <c ca="center">
                        <p>67</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                     <c ca="center">
                        <p>94</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Blosum+G+F</p>
                     </c>
                     <c ca="center">
                        <p>86</p>
                     </c>
                     <c ca="center">
                        <p>93</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                     <c ca="center">
                        <p>JTT+G+F</p>
                     </c>
                     <c ca="center">
                        <p>75</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                     <c ca="center">
                        <p>92</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Blosum+I+G+F</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                     <c ca="center">
                        <p>JTT+I+G+F</p>
                     </c>
                     <c ca="center">
                        <p>98</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CPREV+F</p>
                     </c>
                     <c ca="center">
                        <p>92</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>MtREV+F</p>
                     </c>
                     <c ca="center">
                        <p>93</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CPREV+I+F</p>
                     </c>
                     <c ca="center">
                        <p>87</p>
                     </c>
                     <c ca="center">
                        <p>98</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>MtREV+I+F</p>
                     </c>
                     <c ca="center">
                        <p>86</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CPREV+G+F</p>
                     </c>
                     <c ca="center">
                        <p>93</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>MtREV+G+F</p>
                     </c>
                     <c ca="center">
                        <p>86</p>
                     </c>
                     <c ca="center">
                        <p>93</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CPREV+I+G+F</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                     <c ca="center">
                        <p>87</p>
                     </c>
                     <c ca="center">
                        <p>84</p>
                     </c>
                     <c ca="center">
                        <p>MtREV+I+G+F</p>
                     </c>
                     <c ca="center">
                        <p>85</p>
                     </c>
                     <c ca="center">
                        <p>82</p>
                     </c>
                     <c ca="center">
                        <p>80</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Dayhoff+F</p>
                     </c>
                     <c ca="center">
                        <p>93</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>WAG+F</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Dayhoff+I+F</p>
                     </c>
                     <c ca="center">
                        <p>91</p>
                     </c>
                     <c ca="center">
                        <p>98</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>WAG+I+F</p>
                     </c>
                     <c ca="center">
                        <p>82</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Dayhoff+G+F</p>
                     </c>
                     <c ca="center">
                        <p>86</p>
                     </c>
                     <c ca="center">
                        <p>93</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                     <c ca="center">
                        <p>WAG+G+F</p>
                     </c>
                     <c ca="center">
                        <p>88</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Dayhoff+I+G+F</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                     <c ca="center">
                        <p>WAG+I+G+F</p>
                     </c>
                     <c ca="center">
                        <p>90</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
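            <p>The caption of Table 5 states only that frequencies were perturbed by up to 10%. The sketch below shows one plausible way to do this, assuming each frequency is multiplied by an independent uniform factor in [0.9, 1.1] and the vector is then renormalised to sum to one. The function name, the uniform scheme and the placeholder equal frequencies are assumptions for illustration, not the procedure used in this study.</p>

```python
import random
from typing import List, Optional


def perturb_frequencies(freqs: List[float],
                        max_rel_change: float = 0.10,
                        rng: Optional[random.Random] = None) -> List[float]:
    """Hypothetical perturbation scheme (an assumption, not the study's code).

    Each frequency is scaled by a random factor drawn uniformly from
    [1 - max_rel_change, 1 + max_rel_change]; the result is then
    renormalised so the frequencies again sum to 1.
    """
    rng = rng or random.Random()
    scaled = [f * (1.0 + rng.uniform(-max_rel_change, max_rel_change))
              for f in freqs]
    total = sum(scaled)
    return [s / total for s in scaled]


if __name__ == "__main__":
    # Placeholder: 20 equal frequencies, NOT the stationary frequencies
    # of any published matrix such as JTT or WAG.
    base = [1.0 / 20] * 20
    new = perturb_frequencies(base, rng=random.Random(42))
    print(round(sum(new), 10))  # 1.0
    # Largest relative change actually applied to any single frequency.
    print(max(abs(n - b) / b for n, b in zip(new, base)))
```

            <p>Because of the renormalisation step, an individual frequency can end up slightly more than 10% away from its starting value; a stricter reading of the caption would need a different scheme, such as rejecting draws that exceed the bound.</p>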
         </sec>
         <sec>
            <st>
               <p>Expected model selections</p>
            </st>
            <p>Some amino acid substitution matrices were developed specifically for use with certain types of dataset. For example, the MtREV <abbrgrp><abbr bid="B15">15</abbr></abbrgrp> and MtMam <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> models were derived from different mitochondrial datasets, and the RtREV matrix was developed specifically for retroviral and reverse transcriptase datasets <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. Indeed, the authors of RtREV showed that it consistently produced higher likelihoods than other matrices, such as JTT and WAG, on specific datasets. Consequently, these matrices would be expected to perform well during model selection when applied to datasets of similar origin to those used to develop them. Table <tblr tid="T6">6</tblr> outlines the results of the model selection procedure for a number of datasets whose likely substitution process is known from previous work. The 'Expected' column gives the amino acid matrix anticipated from previously published information on each alignment. In each dataset, the selected models show a noticeable bias towards some form of the expected amino acid matrix.</p>
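            <p>Because the 'Expected' column names only the base matrix, "some form of" a matrix here means any model built on it, with or without +I, +G or +F. The small Python helper below, a hypothetical illustration rather than software used in this study, makes that comparison explicit by stripping those suffixes before matching; the usage example applies it to the mtCDNApri row of Table 6.</p>

```python
def base_matrix(model_name: str) -> str:
    """Return the substitution matrix name with any '+I', '+G' or '+F'
    suffixes removed, e.g. 'MtMam+I+G' becomes 'MtMam'."""
    return model_name.split("+", 1)[0]


def matches_expected(selected: str, expected: str) -> bool:
    """True if the selected model uses the expected matrix in any form."""
    return base_matrix(selected) == expected


if __name__ == "__main__":
    # Selections for the mtCDNApri dataset, as reported in Table 6.
    expected = "MtMam"
    selections = {"AIC1": "MtMam+I+G", "AIC2": "MtMam+G", "BIC": "MtMam+G"}
    for measure, model in selections.items():
        print(measure, model, matches_expected(model, expected))  # ... True
```
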
            <tbl id="T6">
               <title>
                  <p>Table 6</p>
               </title>
               <caption>
                  <p>Real Dataset Analysis. Results of the model selection on the specialised datasets (see the references for full descriptions of the individual datasets). Amino acid matrix expectations are based on previously published information about the sequences ([19, 54, 55] and LANL [56]).</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c ca="center">
                        <p>Dataset</p>
                     </c>
                     <c ca="center">
                        <p>Source</p>
                     </c>
                     <c ca="center">
                        <p>Expected</p>
                     </c>
                     <c ca="center">
                        <p>AIC<sub>1</sub></p>
                     </c>
                     <c ca="center">
                        <p>AIC<sub>2</sub></p>
                     </c>
                     <c ca="center">
                        <p>BIC</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>mtCDNApri</p>
                     </c>
                     <c ca="center">
                        <p>Yang [54]</p>
                     </c>
                     <c ca="center">
                        <p>MtMam</p>
                     </c>
                     <c ca="center">
                        <p>MtMam+I+G</p>
                     </c>
                     <c ca="center">
                        <p>MtMam+G</p>
                     </c>
                     <c ca="center">
                        <p>MtMam+G</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>mtCDNAape</p>
                     </c>
                     <c ca="center">
                        <p>Yang [54]</p>
                     </c>
                     <c ca="center">
                        <p>MtMam</p>
                     </c>
                     <c ca="center">
                        <p>MtMam+F</p>
                     </c>
                     <c ca="center">
                        <p>MtMam+F</p>
                     </c>
                     <c ca="center">
                        <p>MtMam+F</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>70pep_nogap</p>
                     </c>
                     <c ca="center">
                        <p>Reyes <it>et al</it>. [55]</p>
                     </c>
                     <c ca="center">
                        <p>MtMam</p>
                     </c>
                     <c ca="center">
                        <p>MtMam+I+G+F</p>
                     </c>
                     <c ca="center">
                        <p>MtMam+I+G</p>
                     </c>
                     <c ca="center">
                        <p>MtMam+I+G</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>BETA</p>
                     </c>
                     <c ca="center">
                        <p>Dimmic <it>et al</it>. [19]</p>
                     </c>
                     <c ca="center">
                        <p>RtREV</p>
                     </c>
                     <c ca="center">
                        <p>RtREV+G+F</p>
                     </c>
                     <c ca="center">
                        <p>RtREV+G</p>
                     </c>
                     <c ca="center">
                        <p>RtREV+G</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>ENDO</p>
                     </c>
                     <c ca="center">
                        <p>Dimmic <it>et al</it>. [19]</p>
                     </c>
                     <c ca="center">
                        <p>RtREV</p>
                     </c>
                     <c ca="center">
                        <p>RtREV+I+G+F</p>
                     </c>
                     <c ca="center">
                        <p>RtREV+I+G+F</p>
                     </c>
                     <c ca="center">
                        <p>RtREV+I+G+F</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>GAGGAM</p>
                     </c>
                     <c ca="center">
                        <p>Dimmic <it>et al</it>. [19]</p>
                     </c>
                     <c ca="center">
                        <p>JTT</p>
                     </c>
                     <c ca="center">
                        <p>JTT+G+F</p>
                     </c>
                     <c ca="center">
                        <p>JTT+G+F</p>
                     </c>
                     <c ca="center">
                        <p>JTT+G+F</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>GAGHIV</p>
                     </c>
                     <c ca="center">
                        <p>Dimmic <it>et al</it>. [19]</p>
                     </c>
                     <c ca="center">
                        <p>JTT</p>
                     </c>
                     <c ca="center">
                        <p>JTT+G+F</p>
                     </c>
                     <c ca="center">
                        <p>JTT+G+F</p>
                     </c>
                     <c ca="center">
                        <p>JTT+G+F</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>GAMMA</p>
                     </c>
                     <c ca="center">
                        <p>Dimmic <it>et al</it>. [19]</p>
                     </c>
                     <c ca="center">
                        <p>RtREV</p>
                     </c>
                     <c ca="center">
                        <p>CPREV+G+F</p>
                     </c>
                     <c ca="center">
                        <p>RtREV+G</p>
                     </c>
                     <c ca="center">
                        <p>RtREV+G</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>LENTI</p>
                     </c>
                     <c ca="center">
                        <p>Dimmic <it>et al</it>. [19]</p>
                     </c>
                     <c ca="center">
                        <p>RtREV</p>
                     </c>
                     <c ca="center">
                        <p>RtREV+I+G+F</p>
                     </c>
                     <c ca="center">
                        <p>RtREV+I+G+F</p>
                     </c>
                     <c ca="center">
                        <p>RtREV+I+G+F</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>SPUMA</p>
                     </c>
                     <c ca="center">
                        <p>Dimmic <it>et al</it>. [19]</p>
                     </c>
                     <c ca="center">
                        <p>RtREV</p>
                     </c>
                     <c ca="center">
                        <p>RtREV+G</p>
                     </c>
                     <c ca="center">
                        <p>RtREV+G</p>
                     </c>
                     <c ca="center">
                        <p>RtREV+G</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>NONLTR</p>
                     </c>
                     <c ca="center">
                        <p>Dimmic <it>et al</it>. [19]</p>
                     </c>
                     <c ca="center">
                        <p>RtREV</p>
                     </c>
                     <c ca="center">
                        <p>RtREV+I+G+F</p>
                     </c>
                     <c ca="center">
                        <p>RtREV+I+G+F</p>
                     </c>
                     <c ca="center">
                        <p>RtREV+I+G+F</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>SIVPOLPRO</p>
                     </c>
                     <c ca="center">
                        <p>LANL</p>
                     </c>
                     <c ca="center">
                        <p>RtREV</p>
                     </c>
                     <c ca="center">
                        <p>RtREV+G+F</p>
                     </c>
                     <c ca="center">
                        <p>RtREV+G+F</p>
                     </c>
                     <c ca="center">
                        <p>RtREV+G</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
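            <p>To illustrate how the information criteria in Table 6 can disagree, the following sketch ranks candidate models by AIC and BIC using their log-likelihoods. It is a minimal sketch and not the implementation used in this study. It assumes the standard definitions AIC = -2 ln L + 2k and BIC = -2 ln L + k ln(n), takes n to be the number of alignment sites, and uses the parameter counts stated in its comments. The AIC<sub>2</sub> variant is not reproduced here. The function names and the log-likelihoods in the example are hypothetical.</p>

```python
import math

# Assumed extra free parameters per model suffix: +I adds the proportion of
# invariable sites, +G the gamma shape parameter, and +F 19 free amino acid
# frequencies (20 frequencies summing to one). Branch lengths are shared by
# all candidates evaluated on the same tree, so they shift every score by the
# same amount and are omitted here.
EXTRA_PARAMS = {"I": 1, "G": 1, "F": 19}


def free_params(model):
    """Count the extra free parameters of a name such as 'RtREV+I+G+F'."""
    return sum(EXTRA_PARAMS[suffix] for suffix in model.split("+")[1:])


def aic(lnl, k):
    return -2.0 * lnl + 2.0 * k


def bic(lnl, k, n_sites):
    return -2.0 * lnl + k * math.log(n_sites)


def best_models(loglikes, n_sites):
    """Return the (AIC-best, BIC-best) model names; lower scores are better."""
    by_aic = min(loglikes, key=lambda m: aic(loglikes[m], free_params(m)))
    by_bic = min(loglikes, key=lambda m: bic(loglikes[m], free_params(m), n_sites))
    return by_aic, by_bic


# Hypothetical log-likelihoods for one 500-site alignment.
example = {"WAG+G": -10000.0, "RtREV+G": -9950.0, "RtREV+G+F": -9930.0}
print(best_models(example, n_sites=500))  # ('RtREV+G+F', 'RtREV+G')
```

            <p>In this toy case, adding +F gains 20 log-likelihood units (40 units of -2 ln L). That gain outweighs the AIC penalty of 38 for the 19 extra frequencies but not the BIC penalty of about 118. The result loosely mirrors rows such as BETA in Table 6, where AIC<sub>1</sub> selects a '+F' model and BIC does not.</p>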
         </sec>
         <sec>
            <st>
               <p>Model variation among multi-gene datasets</p>
            </st>
            <p>Figure <figr fid="F3">3</figr> shows that, for the 2135 proteobacteria orthologs, the WAG matrix was selected for approximately 46% of the genes and the RtREV matrix for the second largest proportion (21%). A wide range of other models best described the remaining 33%. The vertebrate dataset (Figure <figr fid="F4">4</figr>) displays a different pattern from the bacterial dataset: the JTT matrix accounts for 57% of the best-fit models and the WAG matrix for a much smaller proportion (19%). In the archaea dataset (Figure <figr fid="F5">5</figr>), the dominant substitution matrix is RtREV (33%), followed closely by WAG (29%), with the remaining genes fitting a variety of other models. Interestingly, Blosum62, the default scoring matrix used by ClustalW to align the sequences, rarely featured among the optimal matrices. This suggests that the scoring matrix used during alignment does not noticeably bias the selection of the optimal substitution matrix. Overall, no single model emerges as the best model for all genes in any of the datasets.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Proteobacteria Dataset</p>
               </caption>
               <text>
                  <p><b>Proteobacteria Dataset</b>. Breakdown of the best-fit protein models selected for the proteobacteria dataset.</p>
               </text>
               <graphic file="1471-2148-6-29-3"/>
            </fig>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Vertebrate Dataset</p>
               </caption>
               <text>
                  <p><b>Vertebrate Dataset</b>. Breakdown of the best-fit protein models selected for the vertebrate dataset.</p>
               </text>
               <graphic file="1471-2148-6-29-4"/>
            </fig>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Archaea Dataset</p>
               </caption>
               <text>
                  <p><b>Archaea Dataset</b>. Breakdown of the best-fit protein models selected for the archaea dataset.</p>
               </text>
               <graphic file="1471-2148-6-29-5"/>
            </fig>
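            <p>The percentages quoted above are tallies over the per-gene best-fit models. The following minimal sketch shows such a tally. It assumes, as the wording of the text suggests, that all rate-heterogeneity and frequency variants of a matrix (for example WAG+G and WAG+I+G+F) are pooled under the base matrix name. The function name and the input list are hypothetical.</p>

```python
from collections import Counter


def matrix_percentages(best_fit_models):
    """Percentage of genes whose best-fit model uses each base matrix.

    Suffixes such as +I, +G and +F are stripped, so 'WAG+G' and
    'WAG+I+G+F' both count towards WAG.
    """
    if not best_fit_models:
        return {}
    counts = Counter(model.split("+")[0] for model in best_fit_models)
    total = len(best_fit_models)
    return {name: 100.0 * n / total for name, n in counts.most_common()}


# Hypothetical best-fit models for four genes.
genes = ["WAG+G", "WAG+I+G+F", "RtREV+G", "JTT"]
print(matrix_percentages(genes))  # {'WAG': 50.0, 'RtREV': 25.0, 'JTT': 25.0}
```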
         </sec>
         <sec>
            <st>
               <p>Model selection and tree accuracy</p>
            </st>
            <p>Table <tblr tid="T7">7</tblr> shows what happened when we generated simulated alignments with one particular model and then built phylogenies using each of the other available models. The Robinson-Foulds (RF) distances <abbrgrp><abbr bid="B40">40</abbr></abbrgrp> from the true tree were either equal to or greater than those obtained when the phylogenies were built using the model that generated the alignments. These results show that, in simulations, the choice of protein model has a definite effect on the topology of the inferred tree.</p>
            <tbl id="T7">
               <title>
                  <p>Table 7</p>
               </title>
               <caption>
                  <p>Tree Accuracy Simulations. Results of the simulated tree accuracy test where alignments were generated with a particular model and then phylogenies were built using all of the other available models. Each entry is the average scaled Robinson-Foulds (RF) distance [40] over the trees inferred using the alternative models. This test was repeated 10 times for each model and the values in brackets are the RF distances from the true tree when phylogenies were inferred using the model that generated the alignment. Phyml [53] was used to build all trees.</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="center">
                        <p>Model</p>
                     </c>
                     <c ca="center">
                        <p>RF Distance</p>
                     </c>
                     <c ca="center">
                        <p>Model</p>
                     </c>
                     <c ca="center">
                        <p>RF Distance</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Blosum</p>
                     </c>
                     <c ca="center">
                        <p>0.03 (0.03)</p>
                     </c>
                     <c ca="center">
                        <p>JTT</p>
                     </c>
                     <c ca="center">
                        <p>0.05 (0.05)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Blosum+I</p>
                     </c>
                     <c ca="center">
                        <p>0.02 (0.02)</p>
                     </c>
                     <c ca="center">
                        <p>JTT+I</p>
                     </c>
                     <c ca="center">
                        <p>0.05 (0.04)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Blosum+G</p>
                     </c>
                     <c ca="center">
                        <p>0.08 (0.06)</p>
                     </c>
                     <c ca="center">
                        <p>JTT+G</p>
                     </c>
                     <c ca="center">
                        <p>0.04 (0.03)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Blosum+I+G</p>
                     </c>
                     <c ca="center">
                        <p>0.05 (0.05)</p>
                     </c>
                     <c ca="center">
                        <p>JTT+I+G</p>
                     </c>
                     <c ca="center">
                        <p>0.12 (0.11)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CPREV</p>
                     </c>
                     <c ca="center">
                        <p>0.05 (0.04)</p>
                     </c>
                     <c ca="center">
                        <p>MtREV</p>
                     </c>
                     <c ca="center">
                        <p>0.06 (0.05)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CPREV+I</p>
                     </c>
                     <c ca="center">
                        <p>0.09 (0.04)</p>
                     </c>
                     <c ca="center">
                        <p>MtREV+I</p>
                     </c>
                     <c ca="center">
                        <p>0.08 (0.08)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CPREV+G</p>
                     </c>
                     <c ca="center">
                        <p>0.06 (0.05)</p>
                     </c>
                     <c ca="center">
                        <p>MtREV+G</p>
                     </c>
                     <c ca="center">
                        <p>0.07 (0.06)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CPREV+I+G</p>
                     </c>
                     <c ca="center">
                        <p>0.07 (0.06)</p>
                     </c>
                     <c ca="center">
                        <p>MtREV+I+G</p>
                     </c>
                     <c ca="center">
                        <p>0.12 (0.10)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Dayhoff</p>
                     </c>
                     <c ca="center">
                        <p>0.07 (0.07)</p>
                     </c>
                     <c ca="center">
                        <p>WAG</p>
                     </c>
                     <c ca="center">
                        <p>0.02 (0.02)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Dayhoff+I</p>
                     </c>
                     <c ca="center">
                        <p>0.06 (0.05)</p>
                     </c>
                     <c ca="center">
                        <p>WAG+I</p>
                     </c>
                     <c ca="center">
                        <p>0.04 (0.04)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Dayhoff+G</p>
                     </c>
                     <c ca="center">
                        <p>0.06 (0.06)</p>
                     </c>
                     <c ca="center">
                        <p>WAG+G</p>
                     </c>
                     <c ca="center">
                        <p>0.10 (0.10)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Dayhoff+I+G</p>
                     </c>
                     <c ca="center">
                        <p>0.05 (0.04)</p>
                     </c>
                     <c ca="center">
                        <p>WAG+I+G</p>
                     </c>
                     <c ca="center">
                        <p>0.04 (0.04)</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
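            <p>The scaled RF distance counts the bipartitions (splits) found in one tree but not in the other. The following sketch computes it for two unrooted trees on the same taxa. It is an illustration, not the code used in this study. It assumes that trees are written as nested Python tuples of taxon names rather than parsed from Newick output. It also assumes that the distance is scaled by its maximum, 2(n - 3), for fully resolved trees with n taxa. All function names are hypothetical.</p>

```python
def collect_clades(tree, clades):
    """Append the leaf set below every node of a nested-tuple tree."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(collect_clades(child, clades) for child in tree))
    clades.append(leaves)
    return leaves


def splits(tree):
    """Return the taxa and the non-trivial splits of an unrooted tree.

    Each split is stored as the side that excludes a fixed reference taxon,
    so where the tuple notation places the root does not matter.
    """
    clades = []
    taxa = collect_clades(tree, clades)
    ref = min(taxa)
    result = set()
    for clade in clades:
        side = taxa - clade if ref in clade else clade
        # Keep only splits with at least two taxa on each side.
        if len(side) >= 2 and len(taxa) - len(side) >= 2:
            result.add(side)
    return taxa, result


def scaled_rf(tree_a, tree_b):
    """RF distance divided by its maximum, 2(n - 3), for binary trees."""
    taxa_a, splits_a = splits(tree_a)
    taxa_b, splits_b = splits(tree_b)
    if taxa_a != taxa_b:
        raise ValueError("both trees must have the same taxa")
    n = len(taxa_a)
    if n >= 4:
        return len(splits_a ^ splits_b) / (2 * (n - 3))
    raise ValueError("at least four taxa are needed")


t1 = (("A", "B"), "C", ("D", "E"))
t2 = (("A", "C"), "B", ("D", "E"))
t3 = (("A", "B"), ("C", ("D", "E")))  # t1 drawn with a different root
print(scaled_rf(t1, t2))  # 0.5
print(scaled_rf(t1, t3))  # 0.0
```

            <p>Storing each split as the side without a reference taxon makes two rooted drawings of the same unrooted tree, such as t1 and t3, compare as identical.</p>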
            <p>In real datasets the true tree is unknown, so it is impossible to know with certainty whether an inferred tree is the true tree. One possible indication of whether the choice of model improves the inferred phylogenies is the level of congruence among trees inferred from a large dataset of orthologs. If the optimal models yield more accurate trees, congruence among the trees should increase when those models are used. We took our proteobacteria dataset (2135 orthologs) and built phylogenies using each fixed amino acid matrix, and also using the optimal protein model for each alignment. Table <tblr tid="T8">8</tblr> shows the resulting distances for the proteobacteria dataset. The median RF distance obtained with the optimal models (0.22) was lower than that obtained with 19 of the 24 fixed models. It was equal to that of two fixed models (Dayhoff+G and Dayhoff+I+G) and higher than that of three (Dayhoff, Dayhoff+I and WAG). The mean RF distance obtained with the optimal models (0.34) was equal to or lower than that of every fixed model, although the differences are small.</p>
            <tbl id="T8">
               <title>
                  <p>Table 8</p>
               </title>
               <caption>
                  <p>Proteobacteria Tree Accuracy Analysis. The scaled Robinson-Foulds (RF) distances [40] among the trees built from the Proteobacteria dataset when a single fixed model was used for every alignment. The values reported are the median and mean distances over all pairwise comparisons of trees. When the optimal model for each alignment was used instead, the median was 0.22 and the mean was 0.34. Phyml [53] was used to build all trees.</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c ca="center">
                        <p>Model</p>
                     </c>
                     <c ca="center">
                        <p>Median RF</p>
                     </c>
                     <c ca="center">
                        <p>Mean RF</p>
                     </c>
                     <c ca="center">
                        <p>Model</p>
                     </c>
                     <c ca="center">
                        <p>Median RF</p>
                     </c>
                     <c ca="center">
                        <p>Mean RF</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Blosum</p>
                     </c>
                     <c ca="center">
                        <p>0.23</p>
                     </c>
                     <c ca="center">
                        <p>0.35</p>
                     </c>
                     <c ca="center">
                        <p>JTT</p>
                     </c>
                     <c ca="center">
                        <p>0.23</p>
                     </c>
                     <c ca="center">
                        <p>0.34</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Blosum+I</p>
                     </c>
                     <c ca="center">
                        <p>0.25</p>
                     </c>
                     <c ca="center">
                        <p>0.35</p>
                     </c>
                     <c ca="center">
                        <p>JTT+I</p>
                     </c>
                     <c ca="center">
                        <p>0.25</p>
                     </c>
                     <c ca="center">
                        <p>0.35</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Blosum+G</p>
                     </c>
                     <c ca="center">
                        <p>0.25</p>
                     </c>
                     <c ca="center">
                        <p>0.35</p>
                     </c>
                     <c ca="center">
                        <p>JTT+G</p>
                     </c>
                     <c ca="center">
                        <p>0.25</p>
                     </c>
                     <c ca="center">
                        <p>0.35</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Blosum+I+G</p>
                     </c>
                     <c ca="center">
                        <p>0.25</p>
                     </c>
                     <c ca="center">
                        <p>0.35</p>
                     </c>
                     <c ca="center">
                        <p>JTT+I+G</p>
                     </c>
                     <c ca="center">
                        <p>0.25</p>
                     </c>
                     <c ca="center">
                        <p>0.35</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CPREV</p>
                     </c>
                     <c ca="center">
                        <p>0.24</p>
                     </c>
                     <c ca="center">
                        <p>0.35</p>
                     </c>
                     <c ca="center">
                        <p>MtREV</p>
                     </c>
                     <c ca="center">
                        <p>0.25</p>
                     </c>
                     <c ca="center">
                        <p>0.35</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CPREV+I</p>
                     </c>
                     <c ca="center">
                        <p>0.25</p>
                     </c>
                     <c ca="center">
                        <p>0.35</p>
                     </c>
                     <c ca="center">
                        <p>MtREV+I</p>
                     </c>
                     <c ca="center">
                        <p>0.25</p>
                     </c>
                     <c ca="center">
                        <p>0.35</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CPREV+G</p>
                     </c>
                     <c ca="center">
                        <p>0.25</p>
                     </c>
                     <c ca="center">
                        <p>0.35</p>
                     </c>
                     <c ca="center">
                        <p>MtREV+G</p>
                     </c>
                     <c ca="center">
                        <p>0.25</p>
                     </c>
                     <c ca="center">
                        <p>0.35</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>CPREV+I+G</p>
                     </c>
                     <c ca="center">
                        <p>0.25</p>
                     </c>
                     <c ca="center">
                        <p>0.35</p>
                     </c>
                     <c ca="center">
                        <p>MtREV+I+G</p>
                     </c>
                     <c ca="center">
                        <p>0.25</p>
                     </c>
                     <c ca="center">
                        <p>0.35</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Dayhoff</p>
                     </c>
                     <c ca="center">
                        <p>0.20</p>
                     </c>
                     <c ca="center">
                        <p>0.34</p>
                     </c>
                     <c ca="center">
                        <p>WAG</p>
                     </c>
                     <c ca="center">
                        <p>0.21</p>
                     </c>
                     <c ca="center">
                        <p>0.34</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Dayhoff+I</p>
                     </c>
                     <c ca="center">
                        <p>0.21</p>
                     </c>
                     <c ca="center">
                        <p>0.34</p>
                     </c>
                     <c ca="center">
                        <p>WAG+I</p>
                     </c>
                     <c ca="center">
                        <p>0.23</p>
                     </c>
                     <c ca="center">
                        <p>0.35</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Dayhoff+G</p>
                     </c>
                     <c ca="center">
                        <p>0.22</p>
                     </c>
                     <c ca="center">
                        <p>0.34</p>
                     </c>
                     <c ca="center">
                        <p>WAG+G</p>
                     </c>
                     <c ca="center">
                        <p>0.25</p>
                     </c>
                     <c ca="center">
                        <p>0.35</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Dayhoff+I+G</p>
                     </c>
                     <c ca="center">
                        <p>0.22</p>
                     </c>
                     <c ca="center">
                        <p>0.34</p>
                     </c>
                     <c ca="center">
                        <p>WAG+I+G</p>
                     </c>
                     <c ca="center">
                        <p>0.25</p>
                     </c>
                     <c ca="center">
                        <p>0.35</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
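            <p>The congruence measure in Table 8 summarizes tree-to-tree distances over all pairs of trees. The following minimal sketch shows that summary step. It assumes that every unordered pair of distinct trees is compared once; for a symmetric distance such as RF, counting both orders would leave the median and mean unchanged. The sketch takes the tree distance as a parameter. The toy trees are written directly as hypothetical sets of split labels, and they and the distance function are illustrative only.</p>

```python
from itertools import combinations
from statistics import mean, median


def congruence_summary(trees, distance):
    """Median and mean of distance(a, b) over every unordered pair of trees."""
    dists = [distance(a, b) for a, b in combinations(trees, 2)]
    if not dists:
        raise ValueError("at least two trees are required")
    return median(dists), mean(dists)


# Hypothetical example: three five-taxon trees given directly as sets of
# non-trivial splits, compared with the RF distance scaled by 2(n - 3).
N_TAXA = 5
trees = [{"AB|CDE", "DE|ABC"}, {"AB|CDE", "DE|ABC"}, {"AC|BDE", "DE|ABC"}]


def toy_rf(a, b):
    return len(a ^ b) / (2 * (N_TAXA - 3))


med, avg = congruence_summary(trees, toy_rf)
print(med, round(avg, 3))  # 0.5 0.333
```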
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>We have studied the influence of various factors on protein model selection. Our simulations have confirmed previous work showing that model selection performs accurately when an approximate tree is used. One of the most interesting results from the real datasets is that a full ML analysis selected a different matrix from the one selected by the quick NJ-JTT method in less than 9% of cases. This further strengthens the recent results presented by Sullivan <it>et al</it>. <abbrgrp><abbr bid="B38">38</abbr></abbrgrp> showing that successive-approximation approaches to phylogeny estimation do not suffer any significant loss in accuracy. Our simulations have also shown that protein model selection is less sensitive than nucleotide model selection to differences in sequence length. Recovery rates remain relatively constant across sequence lengths. The only exception is at short sequence lengths, where the small differences in likelihood values can result in an overly simple model being selected (+G instead of +I+G). We have also shown that when amino acid frequencies deviate from the default frequencies of each model, there is a clear trend towards selection of the '+F' version of each model. This phenomenon was also observed in the real dataset analysis presented in Table <tblr tid="T6">6</tblr>, where many of the real datasets were best described by '+F' versions of the expected models. One constant trend across all of our simulations is that both measures select the correct amino acid matrix close to 100% of the time. This holds regardless of factors such as base tree accuracy, sequence length, ASRV variance, or amino acid frequencies.</p>
         <p>It should be emphasized that many of the current models of amino acid or nucleotide substitution make unrealistic assumptions such as reversibility, stationarity of amino acid composition, and homogeneous substitution rates. However, much work is currently under way to develop methods that relax many of these restrictions <abbrgrp><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr><abbr bid="B43">43</abbr></abbrgrp>. While the focus of this work has been to demonstrate the usefulness of protein model selection, it must be stated that model selection measures can only indicate which of a given set of models best fits the data; they give no indication of how close that model is to reality.</p>
         <p>We have highlighted an example where two highly supported and topologically different phylogenies were produced from the same alignment using two arbitrarily selected amino acid substitution matrices (see Fig. <figr fid="F1">1</figr>). The likelihood values of the two trees are -30722 for the MtREV tree and -28996 for the WAG tree, and the extremely high bootstrap support values indicate that the observed trees are not due to stochastic error (e.g. the tree search getting stuck in a local optimum). To further rule out stochastic error, we evaluated each tree under the other matrix: the MtREV tree has a likelihood of -29288 under the WAG matrix, and the WAG tree has a likelihood of -30959 under the MtREV matrix, confirming that the two matrices favor different tree topologies. A tree constructed using the optimal model for this alignment (RtREV+I+G+F) agrees with the WAG tree. At first glance, our example may seem somewhat unrealistic, as we have used a mitochondrial model to construct a tree from nuclear genes. However, as we have shown, a model derived from retroviral Pol proteins is frequently among the best models for proteobacterial and archaeal genes (22% and 33% of the time, respectively). Therefore, <it>ad hoc </it>model selection, even when justified by arguments about the origin of the model (nuclear versus organelle, or some such), is still <it>ad hoc</it>. The maximum likelihood principle suggests using the best of the available models, and in some cases the best-performing model can be surprising.</p>
         <p>The results of our cross-domain substitution model analysis are interesting, as there are noticeable differences in the groups of models selected for each dataset, and no single matrix emerges as the best for any of the datasets. This diversity of selected matrices should not come as a great surprise, as it would seem intuitively unreasonable to assume that a very large group of independently evolving gene families from a fixed taxon set followed an identical amino acid substitution pattern. Perhaps one of the most significant findings is that the RtREV matrix <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> features so prominently in both the proteobacteria and archaea datasets. The WAG matrix <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, by comparison, was derived from a globular protein dataset and was shown to produce generally higher likelihood values than the JTT matrix for the dataset from which it was derived. This seemed to indicate that choosing a matrix based upon the method or the data used to derive it might be a good idea. However, our finding that for so many alignments from cellular life the best matrix was derived from viral sequences is surprising, and it implies that <it>ad hoc </it>arguments for choice of matrix may not be reasonable.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>In this study, we have analysed the ability of the AIC and the BIC to select the appropriate evolutionary model in cases where the model is known. We have shown that both methods are suitable for this purpose. We have also shown that none of the currently available models is universally preferred for all alignments and that there is considerable variation in the substitution process across protein families. What we have not attempted to show is that for any given alignment the selected model is the actual model that gave rise to the observed data. However, on the basis of our results we can speculate on the appropriateness of the models. Considering that a viral model is one of the most preferred models for these cellular sequences, perhaps none of the models are really capturing the data. The models are homogeneous across the tree and this is likely to be a simplification. Therefore, even though we have produced a robust method of model selection, it is likely that the models themselves need to be improved.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <p>The AIC is a popular model selection measure that attempts to strike a balance between the goodness-of-fit and complexity of a model. The AIC is calculated by</p>
         <p><it>AIC</it><sub>1 </sub>= -2 ln <it>L</it><sub><it>i </it></sub>+ 2<it>N</it><sub><it>i</it></sub>, &#160;&#160;&#160; (1)</p>
         <p>where <it>N</it><sub><it>i </it></sub>is the number of free parameters in model <it>i </it>and <it>L</it><sub><it>i </it></sub>is the likelihood value of model <it>i</it>. Posada and Crandall <abbrgrp><abbr bid="B36">36</abbr></abbrgrp> presented evidence that the more empirically tuned AIC<sub>2 </sub>can sometimes be more accurate at identifying the correct nucleotide substitution model. It is calculated by replacing the 2N<sub><it>i </it></sub>term with 5N<sub><it>i</it></sub>, thus penalising more complex models further. The BIC is another model selection measure; choosing the model with the minimum BIC approximates choosing the model with the maximum posterior probability (assuming equal prior probabilities for the models). It is calculated from</p>
         <p><it>BIC </it>= -2 ln <it>L</it><sub><it>i </it></sub>+ <it>N</it><sub><it>i </it></sub>ln <it>n</it>, &#160;&#160;&#160; (2)</p>
         <p>where <it>n </it>is the sample size (sequence length). The AIC and BIC select the best model by choosing the model with the minimum criterion value. The main difference between the three measures is that the AIC<sub>2 </sub>and BIC tend to select simpler models than the AIC<sub>1</sub>, because they penalise additional model parameters more heavily <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>. If the highest-ranking models for a given dataset all include the same ASRV parameters and have the same number of free parameters, their penalty terms are identical, and the AIC and BIC essentially reduce to a ranking by likelihood.</p>
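         <p>As an illustration of equations (1) and (2), the following minimal Python sketch scores candidate models under AIC<sub>1</sub>, AIC<sub>2</sub> and BIC and picks the minimum for each. It is not the MODELGENERATOR implementation: the function names and the likelihoods and parameter counts in the example are hypothetical, and how free parameters are counted (for example, whether branch lengths are included) is left to the caller as an assumption.</p>

```python
import math


def aic1(lnL: float, k: int) -> float:
    """AIC_1 = -2 ln L + 2k, as in equation (1)."""
    return -2.0 * lnL + 2.0 * k


def aic2(lnL: float, k: int) -> float:
    """AIC_2: the 2k penalty of AIC_1 replaced by 5k."""
    return -2.0 * lnL + 5.0 * k


def bic(lnL: float, k: int, n: int) -> float:
    """BIC = -2 ln L + k ln n, as in equation (2); n is the alignment length."""
    return -2.0 * lnL + k * math.log(n)


def select_models(fits: dict, n: int) -> dict:
    """Return, for each criterion, the model with the minimum score.

    fits maps a model name to (lnL, k), where k is the number of free
    parameters counted for that model (counting rule left to the caller).
    """
    criteria = {
        "AIC1": lambda lnL, k: aic1(lnL, k),
        "AIC2": lambda lnL, k: aic2(lnL, k),
        "BIC": lambda lnL, k: bic(lnL, k, n),
    }
    return {
        name: min(fits, key=lambda model: score(*fits[model]))
        for name, score in criteria.items()
    }


# Hypothetical values: +F adds 19 free frequency parameters.
fits = {"WAG+G": (-1000.0, 1), "WAG+G+F": (-975.0, 20)}
print(select_models(fits, n=500))
# {'AIC1': 'WAG+G+F', 'AIC2': 'WAG+G', 'BIC': 'WAG+G'}
```

         <p>With these hypothetical values, the 19 extra frequency parameters of the +F model are worth their 25 log-likelihood units under AIC<sub>1</sub> but not under the heavier AIC<sub>2</sub> and BIC penalties.</p>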
         <p>We have recently developed a protein model selection program called MODELGENERATOR <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. It first constructs a neighbor-joining (NJ) tree using an arbitrary model (by default Jukes-Cantor <abbrgrp><abbr bid="B45">45</abbr></abbrgrp> for nucleotides and JTT <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> for proteins) to obtain an initial base tree on which models are compared. For each model examined, the branch lengths of the tree and the model parameters are optimised independently using the PAL library <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>. The program supports 10 amino acid matrices and 14 nucleotide models, each combined with a proportion of invariable sites (+I), gamma-distributed ASRV (+G), both (+I+G), or neither, and, for amino acids, with or without the observed amino acid frequencies (+F). When all matrix and ASRV combinations are considered, a total of 56 nucleotide and 80 protein models can be derived. In the following subsections, we outline how we investigated the effects of various properties of amino acid alignments on the three non-nested model selection measures (AIC<sub>1</sub>, AIC<sub>2</sub>, BIC) when applied to protein model selection. For all of the simulations, we used the same 20-taxon clocklike tree as Posada and Crandall <abbrgrp><abbr bid="B36">36</abbr></abbrgrp> and the program Seq-Gen v1.3.2 <abbrgrp><abbr bid="B47">47</abbr></abbrgrp> to generate all simulated protein alignments. The presence or absence of a molecular clock in the base tree has been shown to have a negligible effect on the model selection procedure <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. For all of the simulated and real data tests described below, MODELGENERATOR was not constrained <it>a priori</it>, and the full set of 80 protein models was examined in every run.</p>
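         <p>The model counts quoted above follow from combining each matrix with every rate variant. The Python sketch below enumerates these combinations; the matrix names are hypothetical placeholders rather than MODELGENERATOR's actual lists, and the naming scheme simply follows the 'RtREV+I+G+F' style used in this article.</p>

```python
from itertools import product

# Rate variants: none, invariable sites, gamma, and both.
RATE_VARIANTS = ["", "+I", "+G", "+I+G"]


def enumerate_models(matrices, allow_f):
    """Combine every matrix with every rate variant (and with +F if allowed)."""
    freq_variants = ["", "+F"] if allow_f else [""]
    return [m + r + f for m, r, f in product(matrices, RATE_VARIANTS, freq_variants)]


# Hypothetical placeholder names standing in for the real matrix lists.
protein_matrices = [f"AA{i}" for i in range(1, 11)]   # 10 amino acid matrices
nucleotide_models = [f"NT{i}" for i in range(1, 15)]  # 14 nucleotide models

protein = enumerate_models(protein_matrices, allow_f=True)
nucleotide = enumerate_models(nucleotide_models, allow_f=False)
print(len(protein), len(nucleotide))  # 80 56
```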
         <sec>
            <st>
               <p>Base tree sensitivity</p>
            </st>
            <p>To assess the sensitivity of protein model selection to the accuracy of the base tree, we generated 2400 alignments of 500 characters each, 100 with each of the protein models available in Seq-Gen, fixing the proportion of invariable sites at 0.2 and the <it>&#945; </it>parameter of the gamma distribution at 0.5 where appropriate. We then performed model selection using three different base trees: the true tree, an NJ-JTT tree, and a randomly generated tree.</p>
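            <p>A minimal sketch of how the 24 generating models behind the 2400 alignments could be scripted follows. It assumes that the protein models of Seq-Gen v1.3.2 are JTT, WAG, PAM, BLOSUM, MTREV and CPREV (excluding the user-defined GENERAL model), each combined with the four rate variants, and that the Seq-Gen options -m (model), -l (length), -n (number of datasets), -i (proportion of invariable sites) and -a (gamma shape) are used; these should be checked against the Seq-Gen documentation. The script only builds argument lists and does not run Seq-Gen.</p>

```python
from itertools import product

# Assumed Seq-Gen 1.3.x amino acid model names (GENERAL excluded).
SEQGEN_MATRICES = ["JTT", "WAG", "PAM", "BLOSUM", "MTREV", "CPREV"]
RATE_VARIANTS = ["", "+I", "+G", "+I+G"]


def seqgen_args(matrix, rates, length=500, replicates=100, pinv=0.2, alpha=0.5):
    """Build an argument list for one generating model.

    -i (invariable sites) and -a (gamma shape) are only added when the
    rate variant calls for them, i.e. 'where appropriate'.
    """
    args = ["seq-gen", f"-m{matrix}", f"-l{length}", f"-n{replicates}"]
    if "+I" in rates:
        args.append(f"-i{pinv}")
    if "+G" in rates:
        args.append(f"-a{alpha}")
    return args


jobs = {m + r: seqgen_args(m, r) for m, r in product(SEQGEN_MATRICES, RATE_VARIANTS)}
print(len(jobs), len(jobs) * 100)  # 24 2400
print(jobs["WAG+I+G"])
```

            <p>In an actual run, the 20-taxon tree would be supplied to Seq-Gen on standard input, once per argument list.</p>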
            <p>To further investigate the effect of using a distance-based tree for comparison rather than the fully optimised ML tree of each model, we obtained three real datasets, one from each of the Domains of life. The first dataset consists of 2135 gene families obtained from 25 complete proteobacterial genomes. The homologs were identified by performing all-against-all BLAST searches <abbrgrp><abbr bid="B48">48</abbr></abbrgrp> of the 25 complete genomes with an e-value cutoff of 10<sup>-7</sup>. The sequences were aligned using ClustalW 1.81 <abbrgrp><abbr bid="B49">49</abbr></abbrgrp> with default parameters, and the alignments were manually edited to remove badly aligned regions and large gapped regions. The second dataset consists of the amino acid sequences of 16 archaeal genomes retrieved from the COGENT database <abbrgrp><abbr bid="B50">50</abbr></abbrgrp> and one (<it>Haloarcula marismortui</it>) from the NCBI. We identified gene families in which every member of the family identified all other members of the family during database searches (with an e-value cutoff of 10<sup>-7</sup>). Each of these families consisted of between 4 and 16 taxa and was aligned using ClustalW 1.81 with default settings <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>. The final dataset is a previously published set of 118 vertebrate gene families, including representatives of all the major vertebrate groups, obtained from the HOVERGEN database <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>, with each alignment consisting of between 4 and 58 taxa <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>. Because full ML analyses have excessive execution times, we limited this comparison to 100 randomly chosen alignments from each dataset. For each of these alignments, we used Phyml <abbrgrp><abbr bid="B53">53</abbr></abbrgrp> to construct fully optimised ML phylogenies with each of the available protein models and recorded the final likelihood of each phylogeny. From these likelihood values we determined the best-fit model and compared it with the model selected by the NJ-JTT model selection procedure.</p>
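            <p>The comparison between the full ML and NJ-JTT selections reduces to counting alignments whose best-fit matrix differs. The Python sketch below is a hypothetical illustration of that bookkeeping: likelihoods and parameter counts are supplied by the caller (in the study they came from Phyml), AIC<sub>1</sub> is used as an example criterion, and only the matrix name, with the +I, +G and +F suffixes stripped, is compared.</p>

```python
def aic1(lnL, k):
    """Example criterion: AIC_1 = -2 ln L + 2k."""
    return -2.0 * lnL + 2.0 * k


def best_model(fits, score):
    """Return the model name with the lowest criterion score."""
    return min(fits, key=lambda m: score(*fits[m]))


def matrix_of(model):
    """Strip rate and frequency suffixes, e.g. 'RtREV+I+G+F' -> 'RtREV'."""
    return model.split("+")[0]


def disagreement_rate(ml_fits, nj_selected, score=aic1):
    """Fraction of alignments whose best matrix under full ML differs from NJ-JTT.

    ml_fits maps alignment -> {model: (lnL, k)} from fully optimised ML trees;
    nj_selected maps alignment -> model chosen with the NJ-JTT base tree.
    """
    differ = sum(
        matrix_of(best_model(fits, score)) != matrix_of(nj_selected[aln])
        for aln, fits in ml_fits.items()
    )
    return differ / len(ml_fits)


# Hypothetical numbers for two alignments.
ml_fits = {
    "aln1": {"WAG+G": (-1000.0, 1), "JTT+G": (-1005.0, 1)},
    "aln2": {"WAG+G": (-2000.0, 1), "RtREV+G": (-1990.0, 1)},
}
nj_selected = {"aln1": "WAG+I+G", "aln2": "JTT+G"}
print(disagreement_rate(ml_fits, nj_selected))  # 0.5
```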
         </sec>
         <sec>
            <st>
               <p>Sequence length</p>
            </st>
            <p>We generated 100 replicate alignments with each of the protein models available in Seq-Gen at lengths of 100, 500, and 1000 characters. For these tests, we fixed the proportion of invariable sites at 0.2 and the <it>&#945; </it>parameter of the gamma distribution at 0.5 where appropriate. We performed model selection using MODELGENERATOR and recorded the model selected by each of the three measures (AIC<sub>1</sub>, AIC<sub>2</sub>, BIC).</p>
         </sec>
         <sec>
            <st>
               <p>Rate-distribution parameters</p>
            </st>
            <p>To investigate the possible effect of varying ASRV parameters, we generated simulated datasets (100 replicate alignments per model) with a fixed sequence length of 500 characters, setting the <it>&#945; </it>parameter of the gamma ASRV to 0.5, 1.0, or 2.0 (corresponding to high, medium, and low rate heterogeneity, respectively).</p>
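            <p>Why a smaller <it>&#945; </it>means stronger rate heterogeneity can be seen from the gamma distribution itself: with shape <it>&#945; </it>and scale 1/<it>&#945;</it>, site rates have mean 1 and variance 1/<it>&#945;</it>, so their coefficient of variation is 1/sqrt(<it>&#945;</it>). The Python sketch below computes this and checks it against simulated rates. It uses a continuous gamma, whereas phylogenetic programs typically use a discrete approximation.</p>

```python
import math
import random


def gamma_rate_cv(alpha):
    """Coefficient of variation of site rates drawn from a mean-one gamma.

    Shape alpha and scale 1/alpha give mean 1 and variance 1/alpha,
    so CV = 1/sqrt(alpha): the smaller alpha, the stronger the heterogeneity.
    """
    return 1.0 / math.sqrt(alpha)


def sampled_cv(alpha, draws=100_000, seed=1):
    """Empirical CV from simulated rates, as a sanity check on the formula."""
    rng = random.Random(seed)
    rates = [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(draws)]
    mean = sum(rates) / draws
    var = sum((r - mean) ** 2 for r in rates) / draws
    return math.sqrt(var) / mean


for alpha in (0.5, 1.0, 2.0):
    print(alpha, round(gamma_rate_cv(alpha), 3), round(sampled_cv(alpha), 2))
# Analytic CVs: 1.414 (high), 1.0 (medium), 0.707 (low heterogeneity)
```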
         </sec>
         <sec>
            <st>
               <p>Amino acid frequency perturbation</p>
            </st>
            <p>To create the simulated '+F' alignments, we took the original amino acid frequencies of each model and randomly perturbed each individual frequency by up to 10% of its original value (ensuring that the new frequencies still summed to 1.0). We then used Seq-Gen to generate an alignment with the new amino acid frequencies under the substitution process of the individual model (see the algorithm in Figure <figr fid="F6">6</figr>). In this way we generated a total of 2400 alignments of 500 characters (100 for each of the protein models available in Seq-Gen), so that each alignment has a unique set of amino acid frequencies. This more closely mimics real datasets, in which the amino acid frequencies may differ substantially from those of the corresponding best-fit model.</p>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Pseudo Code</p>
               </caption>
               <text>
                  <p><b>Pseudo Code</b>. The algorithm used to generate the simulated +F alignments can be described in pseudocode as follows. The function random returns a random number greater than the first argument and less than the second argument.</p>
               </text>
               <graphic file="1471-2148-6-29-6"/>
            </fig>
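            <p>Because Figure 6 is available only as an image, the following Python sketch reconstructs the perturbation step from the text under stated assumptions: each frequency is multiplied by a random factor between 0.9 and 1.1, with random.uniform standing in for the random function of Figure 6, and the result is renormalised to sum to 1.0. Renormalisation is our assumption about how the sum was kept at 1.0, and it can move an individual frequency slightly beyond the nominal 10% bound.</p>

```python
import random


def perturb_frequencies(freqs, max_change=0.10, rng=random):
    """Perturb each frequency by a random relative change of up to
    +/- max_change, then renormalise so the frequencies sum to 1.0.

    Assumption: renormalisation is how the sum is kept at 1.0; it may push
    an individual frequency slightly past the nominal 10% bound.
    """
    perturbed = [f * rng.uniform(1.0 - max_change, 1.0 + max_change) for f in freqs]
    total = sum(perturbed)
    return [f / total for f in perturbed]


# Example with uniform base frequencies over the 20 amino acids.
base = [0.05] * 20
new = perturb_frequencies(base, rng=random.Random(42))
print(round(sum(new), 10))  # 1.0
print(min(new), max(new))
```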
         </sec>
         <sec>
            <st>
               <p>Expected model selection</p>
            </st>
            <p>We obtained the two primate mitochondrial datasets included as example datasets in Paml 3.14 <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>, namely the files mtCDNApri.aa and mtCDNAape.nuc (the latter translated from nucleotides to amino acids). We also obtained the amino acid sequences from a recent study examining congruence among mammalian mitochondrial and nuclear genes <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>, the complete test dataset used in the creation and testing of the RtREV matrix <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, and a Pol alignment from the 2003 HIV and SIV alignments database <abbrgrp><abbr bid="B56">56</abbr></abbrgrp>. We performed model selection on each of these datasets.</p>
         </sec>
         <sec>
            <st>
               <p>Model variation among empirical datasets</p>
            </st>
            <p>For this test, we used the full set of sequences from the three real datasets, one from each Domain of life (as described above). We performed model selection for each alignment in the datasets in order to assess the extent of model differences among the gene families.</p>
         </sec>
         <sec>
            <st>
               <p>Model selection and tree accuracy</p>
            </st>
            <p>To test the effect of <it>ad hoc </it>model selection on tree accuracy, we performed an analysis on both simulated and real data. In the simulated analysis, we generated alignments of 500 characters using all of the amino acid matrices and rate distributions, setting the proportion of invariable sites to 0.3 and the <it>&#945; </it>parameter of the gamma distribution to 0.5 where appropriate. The base tree in Figure <figr fid="F2">2</figr> was used to generate all alignments. We then analysed each alignment using all of the possible models except the one used to generate it. We repeated this test 10 times for each model and computed the average scaled RF distance <abbrgrp><abbr bid="B40">40</abbr></abbrgrp> from all the inferred trees to the true tree for each model. We then built phylogenies using the same model that was used to generate each alignment and also reported their RF distance to the true tree.</p>
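            <p>The scaled Robinson-Foulds (RF) distance counts the bipartitions found in one tree but not the other, divided by the maximum possible count. The following Python sketch is a minimal, hypothetical implementation for unrooted trees written as nested tuples over the same taxon set; dividing by 2(n - 3), the maximum for two fully resolved unrooted trees of n taxa, is an assumption about how the distance was scaled.</p>

```python
def splits(tree):
    """Non-trivial bipartitions of an unrooted tree given as nested tuples.

    Each bipartition is stored as the side that excludes a fixed reference
    taxon, so the same split read from either side compares equal.
    """
    found = []

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.append(below)
        return below

    taxa = walk(tree)
    ref = min(taxa)
    result = set()
    for below in found:
        side = taxa - below if ref in below else below
        if len(side) >= 2 and len(taxa) - len(side) >= 2:
            result.add(side)
    return taxa, result


def scaled_rf(tree1, tree2):
    """RF distance divided by 2(n - 3), the maximum for binary unrooted trees."""
    taxa1, s1 = splits(tree1)
    taxa2, s2 = splits(tree2)
    if taxa1 != taxa2:
        raise ValueError("trees must share the same taxon set")
    return len(s1 ^ s2) / (2 * (len(taxa1) - 3))


true_tree = (("A", "B"), ("C", "D"), ("E", "F"))
inferred = (("A", "C"), ("B", "D"), ("E", "F"))
print(round(scaled_rf(true_tree, inferred), 3))  # 0.667
```

            <p>Averaging such values over the inferred trees for a model gives an average scaled RF distance of the kind reported above.</p>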
            <p>To analyse the effect of <it>ad hoc </it>model selection on tree accuracy in real data, we took the proteobacteria dataset (2135 orthologs) and built phylogenies using each of the available models as a fixed model for the entire dataset. We recorded the median and mean of the all-against-all RF distances between the trees using Clann v3.0.3 <abbrgrp><abbr bid="B57">57</abbr></abbrgrp>. For each pair of trees, Clann prunes the trees so that only their common taxa remain and then computes the scaled RF distance.</p>
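            <p>The pruning step described for Clann can be sketched as follows: restricting every bipartition of each tree to the shared taxa gives the bipartitions of the pruned tree, after which the scaled RF distance is computed as usual. This Python sketch is not Clann's code; the nested-tuple tree format, the 2(n - 3) scaling and the choice to skip pairs sharing fewer than four taxa are assumptions.</p>

```python
import statistics
from itertools import combinations


def clades(tree):
    """Return (all taxa, list of taxon sets below each internal node)."""
    found = []

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.append(below)
        return below

    return walk(tree), found


def pruned_splits(tree, common):
    """Bipartitions of the tree after pruning it to the taxa in common."""
    _, found = clades(tree)
    ref = min(common)
    result = set()
    for below in found:
        side = below.intersection(common)
        if ref in side:
            side = common - side
        if len(side) >= 2 and len(common) - len(side) >= 2:
            result.add(side)
    return result


def scaled_rf_pruned(tree1, tree2):
    """Scaled RF on the shared taxa; None if fewer than four are shared."""
    common = clades(tree1)[0].intersection(clades(tree2)[0])
    if 4 > len(common):
        return None
    diff = pruned_splits(tree1, common) ^ pruned_splits(tree2, common)
    return len(diff) / (2 * (len(common) - 3))


def summarise(trees):
    """Median and mean of all-against-all scaled RF distances."""
    dists = []
    for t1, t2 in combinations(trees, 2):
        d = scaled_rf_pruned(t1, t2)
        if d is not None:
            dists.append(d)
    return statistics.median(dists), statistics.mean(dists)


trees = [
    (("A", "B"), ("C", "D"), ("E", "F")),
    (("A", "C"), ("B", "D"), ("E", "G")),
    (("A", "B"), ("C", "D"), ("E", "G")),
]
median, mean = summarise(trees)
print(round(median, 3), round(mean, 3))  # 0.667 0.556
```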
         </sec>
         <sec>
            <st>
               <p>Supplementary data</p>
            </st>
            <p>All of the simulated and real alignments mentioned in the paper are available for download from <abbrgrp><abbr bid="B58">58</abbr></abbrgrp>.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>TMK and JOM initially formulated the idea for the manuscript. CJC and MMP provided some of the real datasets used in the analyses. TMK developed the software, performed the experiments, and drafted the manuscript. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank Peter Foster and Zheng Yang for providing valuable comments on the manuscript. We wish to acknowledge James Cotton and Rod Page for making their vertebrate dataset available for our analysis. We thank Davide Pisani and Jennifer Commins for helping create the figures. We would like to acknowledge the financial support of the Irish Research Council for Science, Engineering and Technology (IRCSET). The authors wish to acknowledge the SFI/HEA Irish Centre for High-End Computing (ICHEC) for the provision of computational facilities and support.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Cases in which parsimony and compatibility methods will be positively misleading</p>
            </title>
            <aug>
               <au>
                  <snm>Felsenstein</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Syst Zool</source>
            <pubdate>1978</pubdate>
            <volume>27</volume>
            <fpage>401</fpage>
            <lpage>410</lpage>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Success of maximum likelihood phylogeny inference in the four-taxon case</p>
            </title>
            <aug>
               <au>
                  <snm>Gaut</snm>
                  <fnm>BS</fnm>
               </au>
               <au>
                  <snm>Lewis</snm>
                  <fnm>PO</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1995</pubdate>
            <volume>12</volume>
            <fpage>152</fpage>
            <lpage>162</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">7877489</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Should we use model-based methods for phylogenetic inference when we know assumptions about among-site rate variation and nucleotide substitution pattern are violated?</p>
            </title>
            <aug>
               <au>
                  <snm>Sullivan</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Swofford</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <source>Syst Biol</source>
            <pubdate>2001</pubdate>
            <volume>50</volume>
            <fpage>723</fpage>
            <lpage>729</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1080/106351501753328848</pubid>
                  <pubid idtype="pmpid" link="fulltext">12116942</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Should we be worried about long-branch attraction in real data sets? Investigations using metazoan 18S rDNA</p>
            </title>
            <aug>
               <au>
                  <snm>Anderson</snm>
                  <fnm>FE</fnm>
               </au>
               <au>
                  <snm>Swofford</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <source>Mol Phylogenet Evol</source>
            <pubdate>2004</pubdate>
            <volume>33</volume>
            <fpage>440</fpage>
            <lpage>451</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.ympev.2004.06.015</pubid>
                  <pubid idtype="pmpid" link="fulltext">15336677</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Are guinea pigs rodents? The importance of adequate models in molecular phylogenetics</p>
            </title>
            <aug>
               <au>
                  <snm>Sullivan</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Swofford</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <source>J Mamm Evol</source>
            <pubdate>1997</pubdate>
            <volume>4</volume>
            <fpage>477</fpage>
            <lpage>486</lpage>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Success of parsimony in the four-taxon case: long-branch repulsion by likelihood in the Farris zone</p>
            </title>
            <aug>
               <au>
                  <snm>Siddall</snm>
                  <fnm>ME</fnm>
               </au>
            </aug>
            <source>Cladistics</source>
            <pubdate>1998</pubdate>
            <volume>14</volume>
            <fpage>209</fpage>
            <lpage>220</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1111/j.1096-0031.1998.tb00334.x</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Bias in phylogenetic estimation and its relevance to the choice between parsimony and likelihood methods</p>
            </title>
            <aug>
               <au>
                  <snm>Swofford</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Waddell</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>Huelsenbeck</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Foster</snm>
                  <fnm>PG</fnm>
               </au>
               <au>
                  <snm>Lewis</snm>
                  <fnm>PO</fnm>
               </au>
               <au>
                  <snm>Rogers</snm>
                  <fnm>JS</fnm>
               </au>
            </aug>
            <source>Syst Biol</source>
            <pubdate>2001</pubdate>
            <volume>50</volume>
            <fpage>525</fpage>
            <lpage>539</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1080/106351501750435086</pubid>
                  <pubid idtype="pmpid" link="fulltext">12116651</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Topological bias and inconsistency of maximum likelihood using wrong models</p>
            </title>
            <aug>
               <au>
                  <snm>Bruno</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Halpern</snm>
                  <fnm>AL</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1999</pubdate>
            <volume>16</volume>
            <fpage>564</fpage>
            <lpage>566</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10331281</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>The effects of nucleotide substitution model assumptions on estimates of non-parametric bootstrap support</p>
            </title>
            <aug>
               <au>
                  <snm>Buckley</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Cunningham</snm>
                  <fnm>CW</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2002</pubdate>
            <volume>19</volume>
            <fpage>394</fpage>
            <lpage>405</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11919280</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Model misspecification and probabilistic tests of topology: Evidence from empirical data sets</p>
            </title>
            <aug>
               <au>
                  <snm>Buckley</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Cunningham</snm>
                  <fnm>CW</fnm>
               </au>
            </aug>
            <source>Syst Biol</source>
            <pubdate>2002</pubdate>
            <volume>51</volume>
            <fpage>509</fpage>
            <lpage>523</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1080/10635150290069922</pubid>
                  <pubid idtype="pmpid" link="fulltext">12079647</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Phylogenetic relationships among eutherian orders estimated from inferred sequences of mitochondrial proteins: instability of a tree based on a single gene</p>
            </title>
            <aug>
               <au>
                  <snm>Cao</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Adachi</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Janke</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Paabo</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hasegawa</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1994</pubdate>
            <volume>39</volume>
            <issue>5</issue>
            <fpage>519</fpage>
            <lpage>552</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/BF00173421</pubid>
                  <pubid idtype="pmpid">7807540</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>A model of evolutionary change in proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Dayhoff</snm>
                  <fnm>MO</fnm>
               </au>
               <au>
                  <snm>Eck</snm>
                  <fnm>RV</fnm>
               </au>
               <au>
                  <snm>Park</snm>
                  <fnm>CM</fnm>
               </au>
            </aug>
            <source>Atlas of Protein Sequence and Structure</source>
            <publisher>Washington D.C.: National Biomedical Research Foundation</publisher>
            <editor>Dayhoff MO</editor>
            <pubdate>1972</pubdate>
            <fpage>89</fpage>
            <lpage>99</lpage>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Amino acid substitution matrices from protein blocks</p>
            </title>
            <aug>
               <au>
                  <snm>Henikoff</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Henikoff</snm>
                  <fnm>JG</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1992</pubdate>
            <volume>89</volume>
            <fpage>10915</fpage>
            <lpage>10919</lpage>
         </bibl>
         <bibl id="B14">
            <title>
               <p>The rapid generation of mutation data matrices from protein sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Jones</snm>
                  <fnm>DT</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>WR</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Comput Appl Biosci</source>
            <pubdate>1992</pubdate>
            <volume>8</volume>
            <fpage>275</fpage>
            <lpage>282</lpage>
            <xrefbib>
               <pubid idtype="pmpid">1633570</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Model of amino acid substitution in proteins encoded by mitochondrial DNA</p>
            </title>
            <aug>
               <au>
                  <snm>Adachi</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hasegawa</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1996</pubdate>
            <volume>42</volume>
            <fpage>459</fpage>
            <lpage>468</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/PL00013324</pubid>
                  <pubid idtype="pmpid" link="fulltext">8642615</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Models of amino acid substitution and applications to mitochondrial protein evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hasegawa</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1998</pubdate>
            <volume>15</volume>
            <fpage>1600</fpage>
            <lpage>1611</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9866196</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Modeling amino acid replacement</p>
            </title>
            <aug>
               <au>
                  <snm>Muller</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Vingron</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>2000</pubdate>
            <volume>7</volume>
            <fpage>761</fpage>
            <lpage>776</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1089/10665270050514918</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach</p>
            </title>
            <aug>
               <au>
                  <snm>Whelan</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Goldman</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2001</pubdate>
            <volume>18</volume>
            <fpage>691</fpage>
            <lpage>699</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11319253</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>rtREV: a substitution matrix for inference of retrovirus and reverse transcriptase phylogeny</p>
            </title>
            <aug>
               <au>
                  <snm>Dimmic</snm>
                  <fnm>MW</fnm>
               </au>
               <au>
                  <snm>Rest</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Mindell</snm>
                  <fnm>DP</fnm>
               </au>
               <au>
                  <snm>Goldstein</snm>
                  <fnm>RA</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>2002</pubdate>
            <volume>55</volume>
            <fpage>65</fpage>
            <lpage>73</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s00239-001-2304-y</pubid>
                  <pubid idtype="pmpid" link="fulltext">12165843</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Estimating amino acid substitution models: a comparison of Dayhoff's estimator, the resolvent approach and a maximum likelihood method</p>
            </title>
            <aug>
               <au>
                  <snm>Muller</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Spang</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Vingron</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2002</pubdate>
            <volume>19</volume>
            <fpage>8</fpage>
            <lpage>13</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11752185</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>A transition probability model for amino acid substitutions from blocks</p>
            </title>
            <aug>
               <au>
                  <snm>Veerassamy</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Tillier</snm>
                  <fnm>ERM</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>2003</pubdate>
            <volume>10</volume>
            <fpage>997</fpage>
            <lpage>1010</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1089/106652703322756195</pubid>
                  <pubid idtype="pmpid" link="fulltext">14980022</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Different versions of the Dayhoff rate matrix</p>
            </title>
            <aug>
               <au>
                  <snm>Kosiol</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Goldman</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2005</pubdate>
            <volume>22</volume>
            <fpage>193</fpage>
            <lpage>199</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msi005</pubid>
                  <pubid idtype="pmpid" link="fulltext">15483331</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Model selection and model averaging in phylogenetics: advantages of the AIC and Bayesian approaches over likelihood ratio tests</p>
            </title>
            <aug>
               <au>
                  <snm>Posada</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Buckley</snm>
                  <fnm>TR</fnm>
               </au>
            </aug>
            <source>Syst Biol</source>
            <pubdate>2004</pubdate>
            <volume>53</volume>
            <fpage>793</fpage>
            <lpage>808</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1080/10635150490522304</pubid>
                  <pubid idtype="pmpid" link="fulltext">15545256</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Phylogeny estimation and hypothesis testing using maximum likelihood</p>
            </title>
            <aug>
               <au>
                  <snm>Cao</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Adachi</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Janke</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Paabo</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hasegawa</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Annu Rev Ecol Syst</source>
            <pubdate>1997</pubdate>
            <volume>28</volume>
            <fpage>437</fpage>
            <lpage>466</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1146/annurev.ecolsys.28.1.437</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Evolutionary trees from DNA sequences: a maximum likelihood approach</p>
            </title>
            <aug>
               <au>
                  <snm>Felsenstein</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1981</pubdate>
            <volume>17</volume>
            <fpage>368</fpage>
            <lpage>376</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/BF01734359</pubid>
                  <pubid idtype="pmpid">7288891</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Dating the human-ape splitting by a molecular clock of mitochondrial DNA</p>
            </title>
            <aug>
               <au>
                  <snm>Hasegawa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kishino</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Yano</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1985</pubdate>
            <volume>22</volume>
            <fpage>160</fpage>
            <lpage>174</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/BF02101694</pubid>
                  <pubid idtype="pmpid">3934395</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>A new look at the statistical model identification</p>
            </title>
            <aug>
               <au>
                  <snm>Akaike</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>IEEE Trans Aut Control</source>
            <pubdate>1974</pubdate>
            <volume>19</volume>
            <fpage>716</fpage>
            <lpage>723</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1109/TAC.1974.1100705</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Estimating the dimension of a model</p>
            </title>
            <aug>
               <au>
                  <snm>Schwarz</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Ann Stat</source>
            <pubdate>1978</pubdate>
            <volume>6</volume>
            <fpage>461</fpage>
            <lpage>464</lpage>
         </bibl>
         <bibl id="B29">
            <title>
               <p>MODELGENERATOR download</p>
            </title>
            <url>http://bioinf.nuim.ie/software/modelgenerator/</url>
         </bibl>
         <bibl id="B30">
            <title>
               <p>ProtTest: selection of best-fit models of protein evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Abascal</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Zardoya</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Posada</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>9</issue>
            <fpage>2104</fpage>
            <lpage>2105</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti263</pubid>
                  <pubid idtype="pmpid" link="fulltext">15647292</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Archaeal phylogeny based on proteins of the transcription and translation machineries: tackling the Methanopyrus kandleri paradox</p>
            </title>
            <aug>
               <au>
                  <snm>Brochier</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Forterre</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Gribaldo</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>R17</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">395767</pubid>
                  <pubid idtype="pmpid" link="fulltext">15003120</pubid>
                  <pubid idtype="doi">10.1186/gb-2004-5-3-r17</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Chloroplast phylogeny indicates that bryophytes are monophyletic</p>
            </title>
            <aug>
               <au>
                  <snm>Nishiyama</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Wolf</snm>
                  <fnm>PG</fnm>
               </au>
               <au>
                  <snm>Kugita</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sinclair</snm>
                  <fnm>RB</fnm>
               </au>
               <au>
                  <snm>Sugita</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sugiura</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Wakasugi</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Yamada</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Yoshinaga</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Yamaguchi</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ueda</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Hasebe</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2004</pubdate>
            <volume>21</volume>
            <issue>10</issue>
            <fpage>1813</fpage>
            <lpage>1819</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msh203</pubid>
                  <pubid idtype="pmpid" link="fulltext">15240838</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Phylogenomics of eukaryotes: impact of missing data on large alignments</p>
            </title>
            <aug>
               <au>
                  <snm>Philippe</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Snell</snm>
                  <fnm>EA</fnm>
               </au>
               <au>
                  <snm>Bapteste</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Lopez</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Holland</snm>
                  <fnm>PWH</fnm>
               </au>
               <au>
                  <snm>Casane</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2004</pubdate>
            <volume>21</volume>
            <fpage>1740</fpage>
            <lpage>1752</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msh182</pubid>
                  <pubid idtype="pmpid" link="fulltext">15175415</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>The Opisthokonta and the Ecdysozoa may not be clades: stronger support for the grouping of plant and animal than for animal and fungi and stronger support for the Coelomata than Ecdysozoa</p>
            </title>
            <aug>
               <au>
                  <snm>Philip</snm>
                  <fnm>GK</fnm>
               </au>
               <au>
                  <snm>Creevey</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>McInerney</snm>
                  <fnm>JO</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2005</pubdate>
            <volume>22</volume>
            <fpage>1175</fpage>
            <lpage>1184</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msi102</pubid>
                  <pubid idtype="pmpid" link="fulltext">15703245</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Comparison of models for nucleotide substitution used in maximum likelihood phylogenetic estimation</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Goldman</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Friday</snm>
                  <fnm>AE</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1994</pubdate>
            <volume>11</volume>
            <fpage>316</fpage>
            <lpage>324</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8170371</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Selecting the best-fit model of nucleotide substitution</p>
            </title>
            <aug>
               <au>
                  <snm>Posada</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Crandall</snm>
                  <fnm>KA</fnm>
               </au>
            </aug>
            <source>Syst Biol</source>
            <pubdate>2001</pubdate>
            <volume>50</volume>
            <issue>4</issue>
            <fpage>580</fpage>
            <lpage>601</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1080/106351501750435121</pubid>
                  <pubid idtype="pmpid" link="fulltext">12116655</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>The effect of topology on estimates of among-site rate variation</p>
            </title>
            <aug>
               <au>
                  <snm>Sullivan</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Holsinger</snm>
                  <fnm>KE</fnm>
               </au>
               <au>
                  <snm>Simon</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1996</pubdate>
            <volume>42</volume>
            <fpage>308</fpage>
            <lpage>312</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/BF02198857</pubid>
                  <pubid idtype="pmpid" link="fulltext">8919882</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Evaluating the performance of a successive-approximations approach to parameter optimization in maximum-likelihood phylogeny estimation</p>
            </title>
            <aug>
               <au>
                  <snm>Sullivan</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Abdo</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Joyce</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Swofford</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2005</pubdate>
            <volume>22</volume>
            <issue>6</issue>
            <fpage>1386</fpage>
            <lpage>1392</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msi129</pubid>
                  <pubid idtype="pmpid" link="fulltext">15758203</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Accounting for uncertainty in the tree topology has little effect on the decision-theoretic approach to model selection in phylogeny estimation</p>
            </title>
            <aug>
               <au>
                  <snm>Abdo</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Minin</snm>
                  <fnm>VN</fnm>
               </au>
               <au>
                  <snm>Joyce</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Sullivan</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2005</pubdate>
            <volume>22</volume>
            <fpage>691</fpage>
            <lpage>703</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msi050</pubid>
                  <pubid idtype="pmpid" link="fulltext">15548751</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Comparison of phylogenetic trees</p>
            </title>
            <aug>
               <au>
                  <snm>Robinson</snm>
                  <fnm>DF</fnm>
               </au>
               <au>
                  <snm>Foulds</snm>
                  <fnm>LR</fnm>
               </au>
            </aug>
            <source>Math Biosci</source>
            <pubdate>1981</pubdate>
            <volume>53</volume>
            <fpage>131</fpage>
            <lpage>147</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/0025-5564(81)90043-2</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Estimating absolute rates of molecular evolution and divergence times: a penalized likelihood approach</p>
            </title>
            <aug>
               <au>
                  <snm>Sanderson</snm>
                  <fnm>MJ</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2002</pubdate>
            <volume>19</volume>
            <fpage>101</fpage>
            <lpage>109</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11752195</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Modeling compositional heterogeneity</p>
            </title>
            <aug>
               <au>
                  <snm>Foster</snm>
                  <fnm>PG</fnm>
               </au>
            </aug>
            <source>Syst Biol</source>
            <pubdate>2004</pubdate>
            <volume>53</volume>
            <fpage>485</fpage>
            <lpage>495</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1080/10635150490445779</pubid>
                  <pubid idtype="pmpid" link="fulltext">15503675</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Accuracy of rate estimation using relaxed-clock models with a critical focus on the early metazoan radiation</p>
            </title>
            <aug>
               <au>
                  <snm>Ho</snm>
                  <fnm>SYW</fnm>
               </au>
               <au>
                  <snm>Phillips</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Drummond</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Cooper</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2005</pubdate>
            <volume>22</volume>
            <fpage>1355</fpage>
            <lpage>1363</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msi125</pubid>
                  <pubid idtype="pmpid" link="fulltext">15758207</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Why likelihood?</p>
            </title>
            <aug>
               <au>
                  <snm>Forster</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Sober</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Likelihood and Evidence</source>
            <publisher>Chicago: University of Chicago Press</publisher>
            <editor>Taper M, Lele S</editor>
            <pubdate>1972</pubdate>
            <fpage>89</fpage>
            <lpage>99</lpage>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Evolution of protein molecules</p>
            </title>
            <aug>
               <au>
                  <snm>Jukes</snm>
                  <fnm>TH</fnm>
               </au>
               <au>
                  <snm>Cantor</snm>
                  <fnm>CR</fnm>
               </au>
            </aug>
            <source>Mammalian protein metabolism</source>
            <publisher>New York: Academic Press</publisher>
            <editor>Munro HN</editor>
            <pubdate>1969</pubdate>
            <fpage>21</fpage>
            <lpage>132</lpage>
         </bibl>
         <bibl id="B46">
            <title>
               <p>PAL: An object-oriented programming library for molecular evolution and phylogenetics</p>
            </title>
            <aug>
               <au>
                  <snm>Drummond</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Strimmer</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <fpage>662</fpage>
            <lpage>663</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/17.7.662</pubid>
                  <pubid idtype="pmpid" link="fulltext">11448888</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees</p>
            </title>
            <aug>
               <au>
                  <snm>Rambaut</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Grassly</snm>
                  <fnm>NC</fnm>
               </au>
            </aug>
            <source>Comput Appl Biosci</source>
            <pubdate>1997</pubdate>
            <volume>13</volume>
            <fpage>235</fpage>
            <lpage>238</lpage>
            <xrefbib>
               <pubid idtype="pmpid">9183526</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Basic local alignment search tool</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Gish</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>EW</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1990</pubdate>
            <volume>215</volume>
            <fpage>403</fpage>
            <lpage>410</lpage>
         </bibl>
         <bibl id="B49">
            <title>
               <p>CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice</p>
            </title>
            <aug>
               <au>
                  <snm>Thompson</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Higgins</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Gibson</snm>
                  <fnm>TJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1994</pubdate>
            <volume>22</volume>
            <fpage>4673</fpage>
            <lpage>4680</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308517</pubid>
                  <pubid idtype="pmpid">7984417</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>COGENT: a flexible data environment for computational genomics</p>
            </title>
            <aug>
               <au>
                  <snm>Janssen</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>Enright</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Audit</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Cases</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Goldovsky</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Harte</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Kunin</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Ouzounis</snm>
                  <fnm>CA</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>1451</fpage>
            <lpage>1452</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg161</pubid>
                  <pubid idtype="pmpid" link="fulltext">12874064</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>HOVERGEN: a database of homologous vertebrate genes</p>
            </title>
            <aug>
               <au>
                  <snm>Duret</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Mouchiroud</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Gouy</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1994</pubdate>
            <volume>22</volume>
            <fpage>2360</fpage>
            <lpage>2365</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">523695</pubid>
                  <pubid idtype="pmpid">8036164</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>Going nuclear: gene family evolution and vertebrate phylogeny reconciled</p>
            </title>
            <aug>
               <au>
                  <snm>Cotton</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Page</snm>
                  <fnm>RDM</fnm>
               </au>
            </aug>
            <source>Proc R Soc Lond B Biol Sci</source>
            <pubdate>2002</pubdate>
            <volume>269</volume>
            <fpage>1555</fpage>
            <lpage>1561</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1098/rspb.2002.2074</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p>A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood</p>
            </title>
            <aug>
               <au>
                  <snm>Guindon</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gascuel</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>Syst Biol</source>
            <pubdate>2003</pubdate>
            <volume>52</volume>
            <fpage>696</fpage>
            <lpage>704</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1080/10635150390235520</pubid>
                  <pubid idtype="pmpid" link="fulltext">14530136</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B54">
            <title>
               <p>PAML: a program package for phylogenetic analysis by maximum likelihood</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Comput Appl Biosci</source>
            <pubdate>1997</pubdate>
            <volume>13</volume>
            <fpage>555</fpage>
            <lpage>556</lpage>
            <xrefbib>
               <pubid idtype="pmpid">9367129</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B55">
            <title>
               <p>Congruent mammalian trees from mitochondrial and nuclear genes using Bayesian methods</p>
            </title>
            <aug>
               <au>
                  <snm>Reyes</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gissi</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Catzeflis</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Nevo</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Pesole</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Saccone</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2004</pubdate>
            <volume>21</volume>
            <fpage>397</fpage>
            <lpage>403</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msh033</pubid>
                  <pubid idtype="pmpid" link="fulltext">14660685</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B56">
            <title>
               <p>HIV and SIV Database at Los Alamos National Laboratory</p>
            </title>
            <url>http://hiv-web.lanl.gov</url>
         </bibl>
         <bibl id="B57">
            <title>
               <p>Clann: investigating phylogenetic information through supertree analyses</p>
            </title>
            <aug>
               <au>
                  <snm>Creevey</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>McInerney</snm>
                  <fnm>JO</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>3</issue>
            <fpage>390</fpage>
            <lpage>392</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti020</pubid>
                  <pubid idtype="pmpid" link="fulltext">15374874</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B58">
            <title>
               <p>Supplementary Data</p>
            </title>
            <url>http://bioinf.nuim.ie/supplementary/models/</url>
         </bibl>
      </refgrp>
   </bm>
</art>
