<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2148-8-276</ui>
   <ji>1471-2148</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Relationships of gag-pol diversity between <it>Ty3/Gypsy </it>and <it>Retroviridae </it>LTR retroelements and the three kings hypothesis</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Llorens</snm>
               <fnm>Carlos</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>carlos.llorens@uv.es</email>
            </au>
            <au id="A2">
               <snm>Fares</snm>
               <mi>A</mi>
               <fnm>Mario</fnm>
               <insr iid="I3"/>
               <email>faresm@tcd.ie</email>
            </au>
            <au id="A3">
               <snm>Moya</snm>
               <fnm>Andres</fnm>
               <insr iid="I1"/>
               <insr iid="I4"/>
               <email>andres.moya@uv.es</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Institut Cavanilles de Biodiversitat i Biolog&#237;a Evolutiva, Universitat de Val&#232;ncia, Pol&#237;gono de la coma S/N, Paterna, Valencia, Spain</p>
            </ins>
            <ins id="I2">
               <p>Biotechvana, Parc Cientific, Universitat de Valencia, Paterna, Lab 16D Pol&#237;gono de la coma S/N, Paterna, Valencia, Spain</p>
            </ins>
            <ins id="I3">
               <p>Department of Genetics, University of Dublin, Trinity College Dublin, Dublin 2, Ireland</p>
            </ins>
            <ins id="I4">
               <p>CIBER de Epidemiolog&#237;a y Salud P&#250;blica (CIBERESP), Spain</p>
            </ins>
         </insg>
         <source>BMC Evolutionary Biology</source>
         <issn>1471-2148</issn>
         <pubdate>2008</pubdate>
         <volume>8</volume>
         <issue>1</issue>
         <fpage>276</fpage>
         <url>http://www.biomedcentral.com/1471-2148/8/276</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18842133</pubid>
               <pubid idtype="doi">10.1186/1471-2148-8-276</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>23</day>
               <month>3</month>
               <year>2008</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>08</day>
               <month>10</month>
               <year>2008</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>08</day>
               <month>10</month>
               <year>2008</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2008</year>
         <collab>Llorens et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>The origin of vertebrate retroviruses (<it>Retroviridae</it>) is yet to be thoroughly investigated, but due to their similarity and identical gag-pol (and env) genome structure, it is accepted that they evolved from <it>Ty3/Gypsy </it>LTR retroelements, the retrotransposons and retroviruses of plants, fungi and animals. These 2 groups of LTR retroelements code for 3 proteins rarely studied due to their high variability &#8211; the gag polyprotein, the protease and the GPY/F module. In relation to the 3 previously proposed <it>Retroviridae </it>classes I, II and III, investigation of these proteins uncovers important insights regarding the ancient history of <it>Ty3/Gypsy </it>and <it>Retroviridae </it>LTR retroelements.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We performed a comprehensive study of 120 non-redundant <it>Ty3/Gypsy </it>and <it>Retroviridae </it>LTR retroelements. The phylogenetic reconstruction inferred from the concatenated analysis of the gag and pol polyproteins shows a robust phylogenetic signal regarding the clustering of OTUs. Evaluation of the gag and pol polyproteins separately yields discordant information. While the pol signal supports the traditional perspective (2 monophyletic groups), the gag polyprotein describes an alternative scenario in which each <it>Retroviridae </it>class can be distantly related to one or more <it>Ty3/Gypsy </it>lineages. We investigated this evidence in more depth through comparative analyses based on the gag polyprotein, the protease and the GPY/F module. Our results indicate that, contrary to the traditional monophyletic view of the origin of vertebrate retroviruses, the <it>Retroviridae </it>class I is a molecular fossil, preserving features that were probably predominant among <it>Ty3/Gypsy </it>ancestors predating the split of plants, fungi and animals. In contrast, classes II and III maintain other phenotypes that emerged more recently during <it>Ty3/Gypsy </it>evolution.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>The 3 <it>Retroviridae </it>classes I, II and III exhibit phenotypic differences that delineate a network never before reported between <it>Ty3/Gypsy </it>and <it>Retroviridae </it>LTR retroelements. This new scenario reveals that the diversity of vertebrate retroviruses recurs polyphyletically within <it>Ty3/Gypsy </it>evolution, i.e. it is older than previously thought. The simplest hypothesis to explain this finding is that classes I, II and III trace back to at least 3 <it>Ty3/Gypsy </it>ancestors that emerged at different evolutionary times prior to the protostome-deuterostome divergence. We have called this "the three kings hypothesis" concerning the origin of vertebrate retroviruses.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Attention was first drawn to the <it>Retroviridae </it>when HTLV-1 was characterized as pathogenic in humans <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>. They further increased in significance with the discovery of HIV-1, the retrovirus responsible for AIDS in humans <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. These 2 retroviruses represent only a small part of <it>Retroviridae </it>diversity, which can be divided into seven genera: <it>Alpha-, Beta-, Gamma-, Delta-, Epsilon-, Spumaretroviridae </it>and <it>Lentiviridae </it>(according to ICTV classification <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>). Based on their strategy of transmission, the <it>Retroviridae </it>can also be classified as endogenous retroviruses, when they enter the germ lines of hosts and are vertically transmitted, or as exogenous retroviruses, when they are transmitted horizontally from one host to another via infection. The most recent trends in <it>Retroviridae </it>taxonomy <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp> group endogenous and exogenous retroviruses into 3 major classes designated as I, II and III. Both classifications are complementary, as class I comprises gamma- and epsilonretroviruses; class II includes lentiviruses, delta-, alpha- and betaretroviruses; and class III groups spumaretroviruses with ERV-L retroelements. The ancient history of the <it>Retroviridae </it>is yet to be thoroughly investigated, but due to their similarity and identical gag-pol (and env) genome structure, it is usually assumed that they evolved from the <it>Ty3/Gypsy </it>LTR retroelements of plants, fungi and animals <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. 
The traditional view, suggested by pol polyprotein domains such as the RT <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>, RNAse H <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>, and INT <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B16">16</abbr></abbrgrp> used to resolve the phylogeny, delineates a common <it>Ty3/Gypsy </it>origin for all vertebrate retroviruses. Nevertheless, little is known about this scenario because RT, RNAse H and INT analyses appear unable to agree on a precise, well-supported <it>Ty3/Gypsy </it>root for the <it>Retroviridae</it>. In an attempt to shed light on this topic, we investigated 120 non-redundant <it>Ty3/Gypsy </it>and <it>Retroviridae </it>taxa based on the phylogenetic analysis of both gag and pol polyproteins. Our results revealed conflicting phylogenetic signals between these 2 polyproteins. From that point, we aimed to investigate this evidence in more depth through comparative analyses based on 3 independent proteins rarely considered by prior studies due to their variability &#8211; the gag polyprotein, the PR and the GPY/F module. Our study reveals taxonomic differences among the 3 <it>Retroviridae </it>classes, and an evolutionary network that distantly relates each class to one or more <it>Ty3/Gypsy </it>lineages. This observation appears to be at odds with the traditional monophyletic view suggested by prior approaches to determining the origin of vertebrate retroviruses, but requires further study. In light of this new perspective, we introduce here a new hypothesis for debate and further evaluation. Our hypothesis argues that classes I, II and III probably trace back to at least 3 independent <it>Ty3/Gypsy </it>ancestors. We call this the <it>three kings hypothesis</it>.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Consistency of lineages but conflicting phylogenetic signals between gag and pol polyproteins in the <it>Ty3/Gypsy </it>and <it>Retroviridae </it>evolutionary history</p>
            </st>
            <p>In a prior study <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, we used the inferred phylogenetic reconstruction of <it>Ty3/Gypsy </it>and <it>Retroviridae </it>LTR retroelements based on both gag and pol polyproteins as the criterion to create phylogenetically informative HMM profiles <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. Figure <figr fid="F1">1A</figr> shows a radial version of this tree, which clearly supports the usually accepted monophyly of the <it>Ty3/Gypsy </it>and <it>Retroviridae </it>groups and all their assumed lineages (clades, genera and classes) <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr></abbrgrp>. This view of the origin of <it>Retroviridae </it>indicates that these retroviruses had a common origin, e.g. a <it>Ty3/Gypsy </it>LTR retrotransposon (for more information on this topic, see <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> and references therein). Interestingly, the inferred gag-pol tree suggests a putative <it>Retroviridae </it>root in the <it>Ty3/Gypsy </it>evolutionary history, which according to this new analysis is close to the <it>Micropia/Mdg3 </it>clade <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> and other <it>Ty3/Gypsy </it>lineages described in bilateria genomes. This perspective suggests that the first <it>Retroviridae </it>ancestor emerged before or during the split between protostomes and deuterostomes together with several <it>Ty3/Gypsy </it>lineages, which apparently have distant counterparts (<it>Athila </it>and <it>Tat </it>clades <abbrgrp><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr></abbrgrp>) in the genomes of plants. 
Taking into account that the <it>Retroviridae </it>are true viruses capable of escaping their hosts, this scenario might also be traced back to an ancient horizontal transfer from protostomes to vertebrates and the colonization of the vertebrate genomes by these viral agents from that point on. However, these two alternatives, whilst equally exciting perspectives, should be re-evaluated based on the separate analysis of gag and pol polyproteins. The phylogenetic analysis of the pol polyprotein (Figure <figr fid="F1">1B</figr>) is consistent with the gag-pol tree in its grouping of the taxa into clusters. In fact, the bootstrap robustness of the different clades and genera reported by the gag-pol tree comes from the strong pol phylogenetic signal. This means that the pol signal is the essential analytical substrate responsible for the current view on the evolutionary history and taxonomy of <it>Ty3/Gypsy </it>and <it>Retroviridae </it>LTR retroelements. However, the pol signal does not support the <it>Retroviridae </it>root suggested by the gag-pol tree, and does not reveal a well-supported alternative link between the <it>Ty3/Gypsy </it>and <it>Retroviridae </it>groups. The pol tree is consistent with the gag-pol tree in delineating a scenario of emergence for vertebrate retroviruses preceding the protostome-deuterostome split. However, the root suggested by the pol tree falls close to errantiviruses, the canonical <it>Ty3/Gypsy </it>retroviruses of flies <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp>. Indirectly, this indicates that whatever the relationship between the <it>Micropia/Mdg3 </it>clade and the <it>Retroviridae</it>, the relationship depends on the gag polyprotein. 
Consistent with this, the independent phylogenetic analysis of the gag polyprotein (Figure <figr fid="F1">1C</figr>) groups the <it>Retroviridae </it>class II with the <it>Micropia/Mdg3 </it>clade and other <it>Ty3/Gypsy </it>lineages described in bilateria genomes. The gag phylogeny also reveals that the <it>Ty3/Gypsy </it>origin of vertebrate retroviruses is anything but straightforward. This tree also clusters gammaretroviruses (class I) with the <it>Athila/Tat </it>clades of plants, and suggests proximity between the <it>Retroviridae </it>class III and errantiviruses, and other <it>Ty3/Gypsy </it>lineages. In other words, the gag signal fails to support the monophyly of either the <it>Ty3/Gypsy </it>or the <it>Retroviridae </it>group and suggests an alternative scenario. That is, based on gag and depending on the class, it follows that the <it>Retroviridae </it>code for different gags, each having one or more distant counterparts among <it>Ty3/Gypsy </it>LTR retroelements.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Phylogenetic analyses.</p>
               </caption>
               <text>
                  <p><b>Phylogenetic analyses</b>. A) <it>Ty3/Gypsy </it>and <it>Retroviridae </it>phylogeny inferred based on the concatenated analysis of both gag and pol polyproteins. This tree is robust as gag and pol signals complement and correct each other. It also supports with significant bootstrap values the 2 groups of LTR retroelements and all their accepted lineages (clades, genera and classes). An extended version of this tree providing names, lineages, hosts, and GenBank accessions of all retroelement taxa used is provided as Additional file <supplr sid="S1">1</supplr> accompanying this paper (see the Section "Sequences and databases" in Methods). Decomposition of the gag-pol tree and analysis of its two components separately reveals similar phylogenetic signal but conflicting evolutionary perspectives. B) The phylogenetic signal of the pol polyprotein is robust and therefore responsible for the current known taxonomy and classification of <it>Ty3/Gypsy </it>and <it>Retroviridae </it>LTR retroelements into lineages. C) The gag signal supports the clades, genera and classes described in each group, but does not support the 2 groups. The gag tree outlines an alternative scenario that may relate each <it>Retroviridae </it>class to one or more <it>Ty3/Gypsy </it>lineages.</p>
               </text>
               <graphic file="1471-2148-8-276-1"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p><it>Retroviridae </it>differentiation into classes outlines phenotypic differences in the gag polyprotein that distantly relate each class with one or more <it>Ty3/Gypsy </it>lineages</p>
            </st>
            <p>Phylogenetic analyses based on gag are rarely reported, due to the fast rate of evolution of this polyprotein. However, the alignment from which we inferred the gag tree was manually constructed and its accuracy tested by comparative analyses. We compared all gag sequences with each other using the NCBI BLAST search <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> available at GyDB. Comparisons revealed that gag sequences belonging to a <it>Ty3/Gypsy </it>or <it>Retroviridae </it>clade, genus or class are usually more similar to their lineage counterparts than to other gag sequences (data not shown). This analysis also revealed a core of similarity that is common to all <it>Ty3/Gypsy </it>and <it>Retroviridae </it>gags. This core spans the CA-NC region and its most conserved traits appear to be the MHR at CA <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>, and the zinc finger Cys-X2-Cys-X4-His-X4-Cys (CCHC) array at NC <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. Evaluation of this core shows that the <it>Retroviridae </it>code for 3 different types of gag, each exhibiting a particular amino acid architecture that depends on the class differentiation. While the 2 <it>Retroviridae </it>classes I and II appear to be related according to BLAST analyses (data not shown), they present greater divergence based on several phenotypic features preserved depending on the class (Figure <figr fid="F2">2A</figr> and <figr fid="F2">2B</figr>). Class III is extremely dissimilar to classes I and II based on gag, but preserves several features at the C-terminus that might be distantly related or equivalent to those of class I (Figure <figr fid="F2">2A</figr> and <figr fid="F2">2C</figr>). The most prominent difference between the 3 classes, though not the only one, is the variability in the number of CCHC arrays at NC. Class I NCs usually show one CCHC array, class II NCs exhibit two, and class III gags have no CCHC arrays at their C-terminus. 
BLAST analyses also revealed that the <it>Ty3/Gypsy </it>lineages related to classes I and II by the gag tree display greater similarity to different <it>Retroviridae </it>taxa belonging to these 2 classes than to other <it>Ty3/Gypsy </it>lineages. As an example, Tables <tblr tid="T1">1</tblr> and <tblr tid="T2">2</tblr> summarize the top similarity hits obtained from 4 comparisons conducted using 2 <it>Micropia/Mdg3 </it>and 2 <it>Tat </it>gag sequences as queries. All BLAST analyses were supported by additional sequence comparisons between the different gag queries and the collection of HMM profiles, available at GyDB via the HMM server (data not shown). Additionally, we provide qualitative evidence of this relationship through alignment comparisons. Figure <figr fid="F3">3</figr> shows a multiple alignment revealing domain similarity between gammaretroviruses (i.e. class I) and the <it>Athila </it>and <it>Tat </it>clades of plants. Figure <figr fid="F4">4A</figr> demonstrates that the <it>Micropia/Mdg3 </it>clade and other bilateria <it>Ty3/Gypsy </it>lineages, such as the <it>Mag </it>clade, code for gags following a CA-NC architecture similar to that of class II lentiviral gags. Gag similarities between class III and other <it>Ty3/Gypsy </it>or <it>Retroviridae </it>lineages are not supported by BLAST analyses. However, Figure <figr fid="F4">4B</figr> shows a multiple alignment between spumaretroviruses and errantiviruses, which according to the qualitative domain similarity merits further attention.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Hits of BLASTp similarity between Micropia/Mdg3 and other Ty3/Gypsy and Retroviridae gags</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c cspan="3" ca="center">
                        <p>Query: Micropia gag</p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>Query: Mdg3 gag</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Element</p>
                     </c>
                     <c ca="left">
                        <p>Score</p>
                     </c>
                     <c ca="left">
                        <p>E-value</p>
                     </c>
                     <c ca="left">
                        <p>Element</p>
                     </c>
                     <c ca="left">
                        <p>Score</p>
                     </c>
                     <c ca="left">
                        <p>E-value</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>*EIAV</p>
                     </c>
                     <c ca="left">
                        <p>51.2</p>
                     </c>
                     <c ca="left">
                        <p>1e-08</p>
                     </c>
                     <c ca="left">
                        <p>*HIV-2</p>
                     </c>
                     <c ca="left">
                        <p>45.4</p>
                     </c>
                     <c ca="left">
                        <p>9e-07</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>*SA-OMVV</p>
                     </c>
                     <c ca="left">
                        <p>43.1</p>
                     </c>
                     <c ca="left">
                        <p>3e-06</p>
                     </c>
                     <c ca="left">
                        <p>*SIVMAC</p>
                     </c>
                     <c ca="left">
                        <p>44.7</p>
                     </c>
                     <c ca="left">
                        <p>2e-06</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Beetle1</p>
                     </c>
                     <c ca="left">
                        <p>42.4</p>
                     </c>
                     <c ca="left">
                        <p>5e-06</p>
                     </c>
                     <c ca="left">
                        <p>*SIVMND</p>
                     </c>
                     <c ca="left">
                        <p>43.9</p>
                     </c>
                     <c ca="left">
                        <p>3e-06</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>*HIV-2</p>
                     </c>
                     <c ca="left">
                        <p>42.0</p>
                     </c>
                     <c ca="left">
                        <p>7e-06</p>
                     </c>
                     <c ca="left">
                        <p>*HIV-1</p>
                     </c>
                     <c ca="left">
                        <p>42.4</p>
                     </c>
                     <c ca="left">
                        <p>8e-06</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Pyggy</p>
                     </c>
                     <c ca="left">
                        <p>40.0</p>
                     </c>
                     <c ca="left">
                        <p>3e-05</p>
                     </c>
                     <c ca="left">
                        <p>*HTLV-2</p>
                     </c>
                     <c ca="left">
                        <p>38.9</p>
                     </c>
                     <c ca="left">
                        <p>9e-05</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>*FIV</p>
                     </c>
                     <c ca="left">
                        <p>40.0</p>
                     </c>
                     <c ca="left">
                        <p>3e-05</p>
                     </c>
                     <c ca="left">
                        <p>*STcLV2PP1664</p>
                     </c>
                     <c ca="left">
                        <p>38.1</p>
                     </c>
                     <c ca="left">
                        <p>1e-04</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Real</p>
                     </c>
                     <c ca="left">
                        <p>38.5</p>
                     </c>
                     <c ca="left">
                        <p>7e-05</p>
                     </c>
                     <c ca="left">
                        <p>Legolas</p>
                     </c>
                     <c ca="left">
                        <p>37.0</p>
                     </c>
                     <c ca="left">
                        <p>3e-04</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Skippy</p>
                     </c>
                     <c ca="left">
                        <p>38.1</p>
                     </c>
                     <c ca="left">
                        <p>1e-04</p>
                     </c>
                     <c ca="left">
                        <p>*EIAV</p>
                     </c>
                     <c ca="left">
                        <p>36.6</p>
                     </c>
                     <c ca="left">
                        <p>4e-04</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>*CAEV</p>
                     </c>
                     <c ca="left">
                        <p>38.1</p>
                     </c>
                     <c ca="left">
                        <p>1e-04</p>
                     </c>
                     <c ca="left">
                        <p>*FIV</p>
                     </c>
                     <c ca="left">
                        <p>36.2</p>
                     </c>
                     <c ca="left">
                        <p>6e-04</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>*SIVMAC</p>
                     </c>
                     <c ca="left">
                        <p>37.7</p>
                     </c>
                     <c ca="left">
                        <p>1e-04</p>
                     </c>
                     <c ca="left">
                        <p>*BIV</p>
                     </c>
                     <c ca="left">
                        <p>35.8</p>
                     </c>
                     <c ca="left">
                        <p>7e-04</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>SURL</p>
                     </c>
                     <c ca="left">
                        <p>37.4</p>
                     </c>
                     <c ca="left">
                        <p>2e-04</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Cer4</p>
                     </c>
                     <c ca="left">
                        <p>37.4</p>
                     </c>
                     <c ca="left">
                        <p>2e-04</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>*RCHO-K1</p>
                     </c>
                     <c ca="left">
                        <p>36.2</p>
                     </c>
                     <c ca="left">
                        <p>4e-04</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>We summarize only the most significant (top) similarity hits obtained with each search. <it>Retroviridae </it>gags belonging to class II are indicated with asterisks.</p>
               </tblfn>
            </tbl>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Hits of BLASTp similarity between Tat and other Ty3/Gypsy and Retroviridae gags</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c cspan="3" ca="center">
                        <p>Query: Retrosor1 gag</p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>Query: Tat4-1 gag</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Element</p>
                     </c>
                     <c ca="left">
                        <p>Score</p>
                     </c>
                     <c ca="left">
                        <p>E-value</p>
                     </c>
                     <c ca="left">
                        <p>Element</p>
                     </c>
                     <c ca="left">
                        <p>Score</p>
                     </c>
                     <c ca="left">
                        <p>E-value</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Diaspora</p>
                     </c>
                     <c ca="left">
                        <p>50.4</p>
                     </c>
                     <c ca="left">
                        <p>5e-08</p>
                     </c>
                     <c ca="left">
                        <p>*KoRV</p>
                     </c>
                     <c ca="left">
                        <p>34.3</p>
                     </c>
                     <c ca="left">
                        <p>0.003</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Calypso5-1</p>
                     </c>
                     <c ca="left">
                        <p>42.4</p>
                     </c>
                     <c ca="left">
                        <p>1e-05</p>
                     </c>
                     <c ca="left">
                        <p>*GALV</p>
                     </c>
                     <c ca="left">
                        <p>33.5</p>
                     </c>
                     <c ca="left">
                        <p>0.005</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Ulysses</p>
                     </c>
                     <c ca="left">
                        <p>40.0</p>
                     </c>
                     <c ca="left">
                        <p>6e-05</p>
                     </c>
                     <c ca="left">
                        <p>*HERV-K10</p>
                     </c>
                     <c ca="left">
                        <p>32.0</p>
                     </c>
                     <c ca="left">
                        <p>0.015</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>*GALV</p>
                     </c>
                     <c ca="left">
                        <p>39.7</p>
                     </c>
                     <c ca="left">
                        <p>8e-05</p>
                     </c>
                     <c ca="left">
                        <p>*PERV-MSL</p>
                     </c>
                     <c ca="left">
                        <p>30.8</p>
                     </c>
                     <c ca="left">
                        <p>0.033</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>*KoRV</p>
                     </c>
                     <c ca="left">
                        <p>39.3</p>
                     </c>
                     <c ca="left">
                        <p>1e-04</p>
                     </c>
                     <c ca="left">
                        <p>*SRV-1</p>
                     </c>
                     <c ca="left">
                        <p>29.6</p>
                     </c>
                     <c ca="left">
                        <p>0.074</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>*MdEV</p>
                     </c>
                     <c ca="left">
                        <p>38.1</p>
                     </c>
                     <c ca="left">
                        <p>2e-04</p>
                     </c>
                     <c ca="left">
                        <p>*MPMV</p>
                     </c>
                     <c ca="left">
                        <p>29.6</p>
                     </c>
                     <c ca="left">
                        <p>0.074</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>*PERV-MSL</p>
                     </c>
                     <c ca="left">
                        <p>37.7</p>
                     </c>
                     <c ca="left">
                        <p>3e-04</p>
                     </c>
                     <c ca="left">
                        <p>*MuLV</p>
                     </c>
                     <c ca="left">
                        <p>29.6</p>
                     </c>
                     <c ca="left">
                        <p>0.074</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Cer3</p>
                     </c>
                     <c ca="left">
                        <p>36.2</p>
                     </c>
                     <c ca="left">
                        <p>0.001</p>
                     </c>
                     <c ca="left">
                        <p>*SERV</p>
                     </c>
                     <c ca="left">
                        <p>29.3</p>
                     </c>
                     <c ca="left">
                        <p>0.097</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Cyclops-2</p>
                     </c>
                     <c ca="left">
                        <p>36.2</p>
                     </c>
                     <c ca="left">
                        <p>0.001</p>
                     </c>
                     <c ca="left">
                        <p>*JSRV</p>
                     </c>
                     <c ca="left">
                        <p>28.9</p>
                     </c>
                     <c ca="left">
                        <p>0.13</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>*MuLV</p>
                     </c>
                     <c ca="left">
                        <p>34.3</p>
                     </c>
                     <c ca="left">
                        <p>0.003</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Sushi-ichi</p>
                     </c>
                     <c ca="left">
                        <p>32.3</p>
                     </c>
                     <c ca="left">
                        <p>0.013</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>*BAEVM</p>
                     </c>
                     <c ca="left">
                        <p>28.9</p>
                     </c>
                     <c ca="left">
                        <p>0.15</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>We only summarize the most significant (top) hits of similarity obtained with each search. Retroviridae gags belonging to class I are indicated with asterisks.</p>
               </tblfn>
            </tbl>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Phenotypic capsid-nucleocapsid differences of the gag polyprotein based on the three classes.</p>
               </caption>
               <text>
                  <p><b>Phenotypic capsid-nucleocapsid differences of the gag polyprotein based on the three classes</b>. <it>Retroviridae </it>differentiation into 3 previously proposed classes suggests that vertebrate retroviruses code for 3 different gag polyproteins, based on the CA-NC region. A) Sequence logo describing the CA-NC region coded by all gamma- and epsilonretroviruses (class I) used in this study. Class I gag exhibits several features (underlined in the Figure), the presence of a single CCHC array at NC being the most prominent. B) Sequence logo describing the class II CA-NC region, built on an alignment including lentiviral (HIV-1, HIV-2, SIVMAC, VMV, SA-OMVV and CAEV), betaretroviral (MPMV, SERV and SRV-1), alpharetroviral (LPDV and RSV), and deltaretroviral (HTLV-1, HTLV-2 and BLV) sequences. The amino acid architecture of class II gag is similar to that of class I but displays important differences. Note, for instance, how the C-terminus of class II gag is based on a trait we call "NAN-C-C-KA-P" followed by 2 CCHC arrays at NC. C) Sequence logo constructed from all class III gags used. Class III gag has a CA trait extremely dissimilar from those of classes I and II. On the other hand, the equivalent NC trait of class III is rich in residues with physicochemical properties similar to those displayed in class I, but has no CCHC arrays.</p>
               </text>
               <graphic file="1471-2148-8-276-2"/>
            </fig>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Gag comparison between class I and <it>Athila/Tat </it>LTR retroelements of plants.</p>
               </caption>
               <text>
                  <p><b>Gag comparison between class I and <it>Athila/Tat </it>LTR retroelements of plants</b>. The <it>Retroviridae </it>differentiation into the 3 classes reveals that, based on the CA-NC region, class I gammaretroviruses and <it>Athila/Tat </it>LTR retroelements of plants are more similar than previously supposed. Among other features in common (underlined and named following the nomenclature of Figure 2), both Athila/Tat and class I gags are characterized by the presence of a single CCHC array at NC. Note, however, that Tat NCs exhibit a CHHC motif in place of the canonical CCHC array.</p>
               </text>
               <graphic file="1471-2148-8-276-3"/>
            </fig>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Gag comparisons between classes II and III and bilateria <it>Ty3/Gypsy </it>LTR retroelements.</p>
               </caption>
               <text>
                  <p><b>Gag comparisons between classes II and III and bilateria <it>Ty3/Gypsy </it>LTR retroelements</b>. Based on the CA-NC region, classes II and III have counterparts among the <it>Ty3/Gypsy </it>LTR retroelements described in bilateria organisms. A) Multiple alignments of <it>Micropia/Mdg3 </it>and <it>Mag </it>clades and class II lentiviruses (the phenotypic features in common following the nomenclature of Figure 2 are underlined and named). Note the particular NC similarity based on the common presence of two CCHC arrays plus an additional trait displaying the trace of a NAN-C-C motif. B) Multiple alignment showing domain similarity between spumaretroviruses (class III) and several <it>Ty3/Gypsy </it>errantiviruses based on gag. Neither errantiviral nor spumaretroviral gags have CCHC arrays at their C-terminus.</p>
               </text>
               <graphic file="1471-2148-8-276-4"/>
            </fig>
            <p>Comparative analyses confirm phenotypic features in the gag polyprotein that distantly relate each <it>Retroviridae </it>class with one or more of the <it>Ty3/Gypsy </it>lineages evaluated. The similarity spans the CA-NC core, and the most prominent feature in common is the variability in the number of CCHC arrays per NC. With very few exceptions, the <it>Athila/Tat </it>elements of plants code for NCs exhibiting one CCHC array, <it>Micropia/Mdg3 </it>and <it>Mag </it>elements code for NCs usually exhibiting 2 arrays (except <it>Mag </it>elements of <it>C. elegans</it>), and errantiviral gags have no CCHC arrays at their C-terminus. This indicates that the number of CCHC arrays per NC is evolutionarily preserved depending on the <it>Ty3/Gypsy </it>lineage and the <it>Retroviridae </it>class, and that this phenotype is an excellent indicator of taxonomy and evolution. For simplicity's sake, we do not discuss all <it>Ty3/Gypsy </it>cases. We discuss only one example, the most interesting instance of using this indicator &#8211; the chromodomain-containing <it>Ty3/Gypsy </it>LTR retrotransposons <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> called chromoviruses <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. Chromoviruses are the most ancient branch of <it>Ty3/Gypsy </it>LTR retroelements, as they have been described in the genomes of plants, fungi and vertebrates (for more extensive information about chromoviruses, see <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr></abbrgrp>). It is noteworthy that all <it>Ty3/Gypsy </it>LTR retroelements of plants can be divided into 2 major branches &#8211; chromoviruses and <it>Athila/Tat </it>&#8211; and that chromoviruses appear to be the only branch of <it>Ty3/Gypsy </it>LTR retroelements capable of colonizing the genomes of fungi. 
A prior study <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> reported that this branch of <it>Ty3/Gypsy </it>LTR retroelements displays similarity to gammaretroviruses based on CA-NC, which we confirm. However, we have also found that chromoviruses show similarities to class II in addition to a number of <it>Ty3/Gypsy </it>lineages (for this reason chromoviruses fall at an intermediate position in the gag phylogeny). With rare exceptions, NCs coded by chromoviruses bear one CCHC array (data not shown). In contrast, the different <it>Ty3/Gypsy </it>lineages described in bilateria organisms show greater variability in the number of CCHC arrays at NC than their <it>Ty3/Gypsy </it>counterparts of plants and fungi (i.e. chromoviruses and the <it>Athila/Tat </it>branch). Gag evidence thus relates class I to the most likely CA-NC phenotype of <it>Ty3/Gypsy </it>ancestors predating the split between plants and the opisthokonts (fungi and animals), and classes II and III to other CA-NC phenotypes more frequently observed among the <it>Ty3/Gypsy </it>LTR retroelements of protostomes and deuterostomes.</p>
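            <p>As an illustration of how the number of CCHC arrays per NC could be tallied from sequence data, the following minimal Python sketch counts non-overlapping matches of a zinc-knuckle pattern. The C-X2-C-X4-H-X4-C spacing, the names CCHC_ARRAY and count_cchc_arrays, and the toy sequences are assumptions introduced here for illustration only; they are not taken from the alignments used in this study, and real NCs may deviate from this spacing (for example, a CHHC variant would not be matched).</p>

```python
import re

# Assumed spacing for a single CCHC array: C-X2-C-X4-H-X4-C.
# This spacing is an illustrative assumption, not a definition from the study.
CCHC_ARRAY = re.compile(r"C.{2}C.{4}H.{4}C")


def count_cchc_arrays(nc_sequence):
    """Return the number of non-overlapping CCHC arrays in an NC sequence."""
    return len(CCHC_ARRAY.findall(nc_sequence.upper()))


# Toy sequences (hypothetical, built only to exercise the pattern).
toy_nc = {
    "one_array": "KRGQCFNCGKEGHIARNCRAPR",
    "two_arrays": "KRGQCFNCGKEGHIARNCRAPRKKGCWKCGKEGHQMKDCTERQ",
    "no_array": "GRGGRGGNNRRGGSQPRGRGG",
}

# Map each toy sequence name to its CCHC array count.
counts = {name: count_cchc_arrays(seq) for name, seq in toy_nc.items()}

if __name__ == "__main__":
    for name, n in counts.items():
        print(name, n)
```

            <p>On these toy inputs the sketch separates sequences with one array, two arrays and no array. Applying the same idea to real data would require curated NC boundaries and, most likely, lineage-specific patterns.</p>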
         </sec>
         <sec>
            <st>
               <p>Retroviridae differentiation into classes reveals three protease isoforms based on flap motif polymorphisms, which are common to <it>Ty3/Gypsy </it>and <it>Retroviridae </it>LTR retroelements</p>
            </st>
            <p>Through phylogenetic analyses, we have shown that the pol signal is primarily responsible for the branching of <it>Ty3/Gypsy </it>and <it>Retroviridae </it>LTR retroelements into 2 monophyletic groups. That is the usual evolutionary perspective based on the RT and other pol polyprotein domains. We have also shown that the gag signal discloses an alternative scenario wherein each <it>Retroviridae </it>class can be related to one or more <it>Ty3/Gypsy </it>lineages. An in-depth examination of gag diversity through comparative analyses has revealed the phenotypic variations involved in this differential similarity. Gag evidence is thus well supported. An interesting question is whether this evidence should be considered a convergence due to the fast rate of evolution of the gag polyprotein, or whether it is due to an ancient divergence. Certainly, the most robust components of the pol polyprotein &#8211; the RT, RNAse H and INT &#8211; usually support the traditional perspective originally delineated by RT analyses <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. However, the strong signal from these 3 proteins disguises the particular perspective provided by another pol protein domain &#8211; the PR. Non-redundant studies focusing on Ty3/Gypsy and Retroviridae PRs are rarely reported, as this enzyme presents the same analytical difficulty as gag due to its fast rate of evolution. Despite this, it is well known that LTR retroelement PRs in general are aspartic peptidases belonging to clan AA (following the MEROPS Database classification <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>). Within clan AA, Retroviridae PRs are divided into 2 protein families, retropepsins (family A2) and spumaretropepsins (family A9). Family A2 groups all PRs coded by classes I and II and family A9 collects the PRs coded by spumaretroviruses (class III). 
This classification persists because retropepsins and spumaretropepsins are strongly dissimilar to each other and do not group on a single branch in any analysis (data not shown). On the other hand, Ty3/Gypsy PRs are extremely variable and little is known about them. The MEROPS Database at least classifies many Ty3/Gypsy examples within family A2 because these PRs display great similarity to retropepsins. However, not all Ty3/Gypsy PRs are similar to retropepsins, just as not all Retroviridae PRs are retropepsins. Because no study has evaluated the relationships between Ty3/Gypsy and Retroviridae PRs, we investigated this topic, taking into consideration the differentiation of the 2 groups of LTR retroelements into lineages. It is worth remembering that while gag and pol signals are in disagreement over the taxonomical groups, they do support the differentiation into clades, genera and classes of <it>Ty3/Gypsy </it>and <it>Retroviridae </it>LTR retroelements.</p>
            <p>Prior research using structure-based alignments and structural comparisons based on HIV-1 PR and other retropepsins has revealed that LTR retroelement PRs dimerize in their active form (for a more extensive review on this topic, see <abbrgrp><abbr bid="B35">35</abbr></abbrgrp> and references therein). Each lobe of the PR dimer carries a structural feature called the flap, which is a &#946;-hairpin loop that covers the active site and has 2 flexible alternating forms, closed and semi-open (see Figure <figr fid="F5">5</figr>). We have extensively studied not only Ty3/Gypsy and Retroviridae PRs but also other clan AA PRs (data not shown). Interestingly, the <it>Retroviridae </it>differentiation into classes reveals 3 PR isoforms, each preserving a particular flap motif. Class II PRs usually harbor a GIGG amino acid sequence motif (Figure <figr fid="F5">5A</figr>), which at the tertiary structure level constitutes the flap in HIV-1 PR and other class II PRs (see <abbrgrp><abbr bid="B35">35</abbr></abbrgrp> and references therein). In contrast, class I PRs were found to preserve a GATG variant of this motif (Figure <figr fid="F5">5B</figr>), and within class III, spumaretroviral PRs preserve a TIHG variant of the same sequence motif (Figure <figr fid="F5">5C</figr>). <it>Ty3/Gypsy </it>LTR retroelements also code for a variety of isoforms, which evolutionarily preserve a particular flap motif state depending on the lineage, in the same manner as classes I, II and III. A number of these states are very similar but not identical to that preserved by class I. A multiple alignment of gammaretroviruses (class I) and several <it>Ty3/Gypsy </it>lineages based on PR is shown in Figure <figr fid="F6">6A</figr>. In its consensus form, this variant delineates a GANG motif recognizable by the predominance of an alanine (or a hydrophobic residue) and an aspartate/asparagine/threonine at the second and third positions of the motif, respectively. 
The GANG variant is widespread among the PRs coded by <it>Ty3/Gypsy </it>LTR retroelements of plants, fungi and animals. This variant also predominates in the PRs coded by caulimoviruses of plants and <it>Ty1/Copia </it>LTR retroelements, and in two datasets of prokaryotic PRs related to clan AA (data not shown). Therefore, the GANG variant appears to be the most likely ancestral state of the flap of the PRs coded by <it>Ty3/Gypsy </it>ancestors predating the split between plants and the opisthokonts. Consistent with gag evidence, the GIGG and TIHG variants exhibited by class II and III PRs are rarely observed among <it>Ty3/Gypsy </it>LTR retroelements of plants and fungi. In plants, only <it>Tat </it>clade elements code for PRs presenting a poorly preserved flap motif, which might be loosely related to the GIGG variant (data not shown). As <it>Athila </it>clade elements (the sibling of the <it>Tat </it>clade in plants) code for GANG PRs, we may assume that the PR flap motif transits from one state to another. Among the <it>Ty3/Gypsy </it>lineages of fungi, only the TF1-2 clade codes for GIGG PRs, a variant more frequently observed among <it>Ty3/Gypsy </it>LTR retroelements of protostomes. In contrast, the GIGG variant carried by the PRs coded by the <it>Micropia/Mdg3 </it>clade and other <it>Ty3/Gypsy </it>lineages is almost identical to that of <it>Retroviridae </it>class II (Figure <figr fid="F6">6B</figr>). The TIHG variant is absent from the Ty3/Gypsy PRs of plants. In fungi, only a putative chromoviral lineage called the <it>Ty3 </it>clade (see <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> and references therein) codes for PRs harboring a highly diverged motif that in its consensus form can be distantly related to the TIHG variant (data not shown). In contrast, a number of <it>Ty3/Gypsy </it>errantiviruses code for PRs carrying a TIHG motif identical to that of class III spumaretroviruses (Figure <figr fid="F6">6C</figr>). 
Finally, investigating other sequences not considered in this study, we also found that the <it>Gmr-1 </it>clade <abbrgrp><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr></abbrgrp>, a <it>Ty3/Gypsy </it>lineage recently described in deuterostomes, also codes for TIHG PRs (data not shown). The PR scenario is thus consistent with gag in suggesting that <it>Retroviridae </it>class I is most likely related to the phenotype of <it>Ty3/Gypsy </it>ancestors predating the split between plants and the opisthokonts. In contrast, classes II and III should be more properly related to <it>Ty3/Gypsy </it>lineages whose ancestors probably emerged before or during the transition of bilateria organisms into protostomes and deuterostomes.</p>
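            <p>The flap motif states described above can be summarized as a small decision rule. The Python sketch below is only an illustration under stated assumptions: the function name classify_flap, the returned labels, and the residue set used for "hydrophobic" (A, V, I, L, M, F) are hypothetical choices made here, and the rule operates on an already extracted four-residue motif rather than on a full PR alignment.</p>

```python
# Residues accepted at positions 2 and 3 of the GANG consensus.
# The hydrophobic set is an illustrative assumption; the text only states
# "an alanine (or a hydrophobic residue)".
HYDROPHOBIC = set("AVILMF")
GANG_THIRD = set("DNT")  # aspartate / asparagine / threonine


def classify_flap(motif):
    """Assign a four-residue flap motif to one of the described variants."""
    motif = motif.upper()
    if len(motif) != 4:
        raise ValueError("expected a four-residue flap motif")
    if motif == "GIGG":
        return "GIGG variant (class II)"
    if motif == "TIHG":
        return "TIHG variant (class III)"
    is_gang = (
        motif[0] == "G"
        and motif[1] in HYDROPHOBIC
        and motif[2] in GANG_THIRD
        and motif[3] == "G"
    )
    if is_gang:
        return "GANG consensus (class I-like)"
    return "unclassified"


if __name__ == "__main__":
    for example in ("GATG", "GANG", "GIGG", "TIHG", "AAAA"):
        print(example, classify_flap(example))
```

            <p>Under this rule the class I GATG motif falls within the GANG consensus, whereas GIGG and TIHG are reported as separate variants. Poorly preserved or highly diverged motifs, such as those mentioned for the Tat and Ty3 clades, would be returned as unclassified.</p>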
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Retroviridae protease isoforms.</p>
               </caption>
               <text>
                  <p><b>Retroviridae protease isoforms</b>. Retroviridae PRs dimerize in their active form and each lobe of this enzyme usually has a structural flap (the two &#946;-hairpin loops enclosed in a circle covering the catalytic DT/SG dyad). <it>Retroviridae </it>differentiation into the 3 classes reveals 3 different isoforms of the same enzyme, each exhibiting a particular flap motif. A) Sequence logo describing class II PRs; at the sequence level, the flap of this PR corresponds to a GIGG amino acid motif (boxed). B) Sequence logo describing class I PRs; this variant preserves a GATG motif at the same flap sequence position. C) Sequence logo built from class III PRs, revealing a TIHG motif at this position. To improve the visualization of the amino acid architecture, we used the HFV and SFV-1 sequences (see Methods) plus FFV (Genbank accession <ext-link ext-link-type="gen" ext-link-id="CAA70075">CAA70075</ext-link>), FSV (<ext-link ext-link-type="gen" ext-link-id="AAC58531">AAC58531</ext-link>), SFV-3 (<ext-link ext-link-type="gen" ext-link-id="AAA47796">AAA47796</ext-link>), and EFV (<ext-link ext-link-type="gen" ext-link-id="AAF64414">AAF64414</ext-link>) to build the logo.</p>
               </text>
               <graphic file="1471-2148-8-276-5"/>
            </fig>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Protease comparisons between <it>Ty3/Gypsy </it>and <it>Retroviridae </it>LTR retroelements.</p>
               </caption>
               <text>
                  <p><b>Protease comparisons between <it>Ty3/Gypsy </it>and <it>Retroviridae </it>LTR retroelements</b>. Each Retroviridae PR isoform has one or more distant counterparts found among the variety of PR isoforms coded by <it>Ty3/Gypsy </it>LTR retroelements. A) Multiple alignment of class I and several <it>Ty3/Gypsy </it>lineages. This comparison reveals a similar, but not identical, flap sequence motif that in consensus defines an idealized GANG motif (logo above). B) Multiple alignment showing how the <it>Micropia/Mdg3 </it>clade and other <it>Ty3/Gypsy </it>LTR retroelements code for PRs harboring a GIGG flap variant almost identical to that of class II PRs. C) Multiple alignment between spumaretroviruses and errantiviruses showing how these 2 lineages commonly code for PRs bearing the TIHG variant.</p>
               </text>
               <graphic file="1471-2148-8-276-6"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Retroviridae class I is a molecular fossil preserving GPY/F module phenotypes that probably were predominant among <it>Ty3/Gypsy </it>ancestors predating the split between plants, fungi and animals</p>
            </st>
            <p>As already shown, the gag polyprotein and the PR depict a new scenario as an alternative to the traditional monophyletic insight (2 groups of LTR retroelements) suggested by prior RT, RNAse H and INT analyses. To understand the two opposing scenarios, we performed phylogenetic analyses based on the RT, RNAse H and INT and found consistency with the traditional perspective of 2 separate LTR retroelement groups using the RT and RNAse H (<abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>). Analysis of the INT revealed different perspectives depending on whether the NJ or the parsimony method was used (see Methods). While the NJ method supports the 2 LTR retroelement groups, the parsimony method splits the <it>Retroviridae </it>into 2 branches not supported by bootstrap (data not shown). This is because our model of INT alignment covers the 3 sub-domains described in the amino acid architecture of a conventional INT domain. The traditional core used for inferring INT phylogenies is common to all INTs in general, and includes 2 of these sub-domains: the conserved zinc finger "HHCC" binding motif <abbrgrp><abbr bid="B38">38</abbr></abbrgrp> at the N-terminus, and the central sub-domain containing the conserved D-D-E trait <abbrgrp><abbr bid="B39">39</abbr><abbr bid="B40">40</abbr></abbrgrp>. The C-terminal sub-domain of all INTs is usually excluded from analysis because it is less preserved than the other 2 sub-domains. In Ty3/Gypsy and Retroviridae INTs, this sub-domain is known to be a small trait called the GPY/F module, which was probably recruited modularly during evolution <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. The module name refers to the strongly preserved GPY/F amino acid motif <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, which will be referred to as the canonical motif throughout the rest of this paper. 
Indeed, the GPY/F module appears to be responsible for the signal discrepancy in phylogenetic analyses (the INT parsimony tree inferred without this module is in agreement with the NJ analysis; data not shown). We therefore investigated the GPY/F module in relation to the 3 <it>Retroviridae </it>classes. Seen from this viewpoint, the module shows a number of protein isoforms based on GPY/F motif polymorphisms. With rare exceptions, the modules of class I INTs preserve the canonical motif, while the modules of classes II and III exhibit other variants (Figure <figr fid="F7">7A</figr>). Here, classes II and III do not make an intrinsic phenotypic distinction, each genus exhibiting a particular variant of the motif within these 2 classes. The modules coded by <it>Ty3/Gypsy </it>LTR retroelements delineate a similar perspective. As shown in Figure <figr fid="F7">7B</figr>, while the canonical motif predominates in the modules of <it>Ty3/Gypsy </it>elements of plants and fungi, the modules of bilateria <it>Ty3/Gypsy </it>LTR retroelements are rich in motif polymorphisms (canonical motif included). This indicates that <it>Ty3/Gypsy </it>LTR retroelements described in bilateria organisms exhibit greater GPY/F motif variability than their <it>Ty3/Gypsy </it>counterparts of plants and fungi, and strongly suggests a number of transitions from the canonical motif toward other states during evolution. This scenario is not completely consistent with gag and PR perspectives; for instance, while Micropia/Mdg3 modules preserve the canonical motif, the different <it>Retroviridae </it>genera belonging to class II exhibit different motif polymorphisms. 
Nevertheless, the GPY/F module relates <it>Retroviridae </it>class I to <it>Ty3/Gypsy </it>LTR retroelements of plants and fungi through the common preservation of the canonical motif, while classes II and III can be related to bilateria <it>Ty3/Gypsy </it>LTR retroelements by an increase in motif variability. In fact, the whole module of class I INTs appears to be more similar to those preserved by the INTs of chromoviruses (Figure <figr fid="F8">8A</figr>) and <it>Athila </it>and <it>Tat </it>clades (Figure <figr fid="F8">8B</figr>) than to those of classes II and III. Alignment between classes I and II reveals a dramatic loss of sequence information by class II during evolution (Figure <figr fid="F8">8C</figr>). The module carried by spumaretroviral INTs is similar to that of class I, but they greatly differ in the motif (Figure <figr fid="F8">8D</figr>). That is, spumaretroviral modules lost the GPY/F motif, substituting it with a highly diverged KT/SP motif. Again, this outlines an intriguing parallelism between spumaretroviruses and <it>Ty3/Gypsy </it>errantiviruses because the modules of these 2 LTR retroelement lineages are qualitatively similar (Figure <figr fid="F8">8D</figr>). Moreover, <it>Ty3/Gypsy </it>errantiviruses also lost their GPY/F motif during evolution. Therefore, whatever the INT function involving the GPY/F module coded by <it>Retroviridae </it>class I, this class appears to be a molecular fossil preserving GPY/F module phenotypes that were predominant among <it>Ty3/Gypsy </it>ancestors predating the split between plants, fungi and animals. In contrast, <it>Retroviridae </it>classes II and III maintain a number of module isoforms that emerged more recently during evolution.</p>
            <fig id="F7">
               <title>
                  <p>Figure 7</p>
               </title>
               <caption>
                  <p>GPY/F motif transitions.</p>
               </caption>
               <text>
                  <p><b>GPY/F motif transitions</b>. The amino acid motif that gives its name to the GPY/F module at INT of <it>Ty3/Gypsy </it>and <it>Retroviridae </it>LTR retroelements is polymorphic. While the modules of <it>Retroviridae </it>class I and <it>Ty3/Gypsy </it>elements of plants and fungi usually preserve the canonical GPY/F motif, classes II and III and bilateria <it>Ty3/Gypsy </it>LTR retroelements display a number of module isoforms based on that motif.</p>
               </text>
               <graphic file="1471-2148-8-276-7"/>
            </fig>
            <fig id="F8">
               <title>
                  <p>Figure 8</p>
               </title>
               <caption>
                  <p>GPY/F module comparisons.</p>
               </caption>
               <text>
                  <p><b>GPY/F module comparisons</b>. Based on the GPY/F module, <it>Retroviridae </it>class I appears to be more similar to <it>Ty3/Gypsy </it>LTR retroelements of plants and fungi than to other <it>Retroviridae </it>classes. A) Multiple alignment of gammaretroviruses (class I) and chromoviruses. B) Multiple alignment of gammaretroviruses and <it>Athila/Tat </it>elements. C) Multiple alignment of gammaretroviruses and <it>Retroviridae </it>class II. This alignment reveals important differences between the two classes, as well as the evolutionary transition of the canonical GPY/F motif towards other motif states. D) Based on the GPY/F module, spumaretroviruses (class III) are similar to class I, but also to <it>Ty3/Gypsy </it>errantiviruses. In fact, both spumaretroviral and errantiviral modules lost the canonical motif during evolution.</p>
               </text>
               <graphic file="1471-2148-8-276-8"/>
            </fig>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <sec>
            <st>
               <p>Retroviridae differentiation into the 3 classes I, II and III unravels phenotypic aspects of vertebrate retroviruses, which are probably related with their ancient <it>Ty3/Gypsy </it>origins</p>
            </st>
            <p>Phylogenetic analysis based on all concatenated gag and pol products coded by <it>Ty3/Gypsy </it>and <it>Retroviridae </it>LTR retroelements shows the robustness of their phylogenetic signal regarding the clustering of OTUs <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr></abbrgrp>. We used the parsimony method to infer this phylogeny, but the clustering of OTUs is independent of the method of phylogenetic reconstruction used (see Methods). The gag-pol analysis also divides <it>Ty3/Gypsy </it>and <it>Retroviridae </it>LTR retroelements into 2 separate branches, as suggested by original approaches on this topic <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B41">41</abbr></abbrgrp>. We do not disagree with this classification for 2 reasons: first, the strong phylogenetic signal of RT, RNAseH, and INT cannot be dismissed; and second, the <it>Retroviridae </it>(except gammaretroviruses) can be distinguished from <it>Ty3/Gypsy </it>LTR retroelements by features such as the presence of accessory genes. Nevertheless, the current <it>Ty3/Gypsy </it>and <it>Retroviridae </it>classification only exposes the modern evolutionary history of these 2 groups of retroelements (we have shown how their ancient history is not straightforward). Due to the wide distribution of <it>Ty3/Gypsy </it>elements in eukaryotes, the usual means of transference of a canonical <it>Ty3/Gypsy </it>LTR retrotransposon is probably vertical. 
However, the viral nature of a true <it>Ty3/Gypsy </it>or <it>Retroviridae </it>exogenous retrovirus resides in its capability of horizontal transference from one host to another via infection. Moreover, the incidence of mechanisms such as gene recruitment, genome rearrangement, recombination and chimerism in LTR retroelement evolution presents difficulties in identifying the true natural history of <it>Ty3/Gypsy </it>and <it>Retroviridae </it>LTR retroelements. This suggests that the most realistic (not yet proposed) model for describing <it>Ty3/Gypsy </it>and <it>Retroviridae </it>evolution alternates gradual and modular evolution, and combines vertical and horizontal means of transference.</p>
            <p>The traditional argument supporting the <it>Ty3/Gypsy </it>origins of vertebrate retroviruses is their similarity in sequence and genome structure <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>. The question is, however, what genetic material is more informative for exploring the relationships between these two (and other) groups of LTR retroelements: highly variable traits such as gag and PR, or strongly preserved substrates such as the RT, RNAseH and INT? Certainly, RT, RNAseH and INT are an excellent means of classifying <it>Ty3/Gypsy </it>and <it>Retroviridae </it>LTR retroelements into lineages. However, phylogenetic analyses based on RT, RNAseH and INT are not exact enough to resolve the ancient evolutionary history of these 2 groups. This is because the inferred phylogeny based on these proteins does not necessarily coincide with the true natural history of the full-length retroelement genome. Here, the advantage of using the gag-pol alignment to infer the phylogeny is the increase in statistical power of the analysis, allowing the opportunity to correct the single gene tree discrepancies. This analytical strategy is useful but has limitations for which solutions remain elusive; the inferred tree can accumulate systematic errors due to the use of concatenated information. We have shown how the gag-pol tree suggests a <it>Ty3/Gypsy </it>root in the origins of vertebrate retroviruses that is close to the <it>Micropia/Mdg3 </it>clade. However, evaluation of gag and pol polyproteins separately yields discordant information. Here, while pol phylogeny supports the traditional perspective (2 retroelement groups), gag phylogeny describes a new scenario that appears to be informative with respect to the ancient patterns of diversity of <it>Ty3/Gypsy </it>and <it>Retroviridae </it>LTR retroelements. Certainly, the phylogenetic signal of the gag polyprotein has several limitations due to its fast evolution. 
To overcome these limitations we investigated other protein domains and used different methodologies to evaluate the significance of the new scenario. The most important feature here is that, for the first time in the scientific literature, we have carried out a non-redundant study of three independent proteins, an analysis that has rarely been attempted before because of its difficulty.</p>
            <p>Our investigation conclusively reveals that the taxonomical differentiation into the 3 <it>Retroviridae </it>classes I, II and III discloses 3 different gag and PR products, and that each product has one or more distant <it>Ty3/Gypsy </it>counterparts. The analysis of the GPY/F module is partially consistent with this result and shows that the similarity of class I to <it>Ty3/Gypsy </it>LTR retroelements of plants and fungi is significant. Our results thus support an ancient scenario of polyphyly involving the 3 <it>Retroviridae </it>classes and different <it>Ty3/Gypsy </it>lineages. Here, we stress that the identification of the <it>Retroviridae </it>classes is not a conclusion but an assumption based on previous studies <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>. Notwithstanding, we cannot argue for the existence of a direct ancestor between each class and any particular <it>Ty3/Gypsy </it>lineage. Classes I and II are sufficiently similar to corroborate their accepted evolutionary relationship, and it can also be assumed that <it>Ty3/Gypsy </it>and <it>Retroviridae </it>phylogeny is incomplete (sequencing projects are continuously disclosing new lineages). Despite this, simple convergence of each class towards different <it>Ty3/Gypsy </it>lineages, based on 3 independent protein products, is not a parsimonious explanation. Moreover, while class III spumaretroviruses are dissimilar to classes I and II, our results reveal that they in turn display an intriguing domain similarity to errantiviruses that ought to be followed up. Hence we think that the class differentiation probably unravels certain aspects of vertebrate retroviruses related to their ancient <it>Ty3/Gypsy </it>origins. Instead of a single root to this new scenario, we show how an ancient evolutionary network between the 2 groups can exist, with its most interesting aspect being its polyphyly. 
(The <it>Ty3/Gypsy </it>lineages related to each class do not constitute a monophyletic branch in any phylogeny). Therefore, our approach strongly suggests that class I is a molecular fossil that emerged quite soon in <it>Ty3/Gypsy </it>evolution, while classes II and III emerged later, together with the ancestors of <it>Ty3/Gypsy </it>LTR retroelements described in protostomes.</p>
         </sec>
         <sec>
            <st>
               <p>Introducing the Three Kings Hypothesis: A new principle for debate and further evaluation about the subject of the <it>Ty3/Gypsy </it>origins of vertebrate retroviruses</p>
            </st>
            <p>The evolutionary network identified by classes I, II, III is inconsistent with the idea of a unique <it>Retroviridae </it>ancestor. It follows that various scenarios may either support or disprove such a network. Assuming this network exists, the most likely scenario relates <it>Ty3/Gypsy </it>elements of plants and fungi with the <it>Retroviridae </it>class I. This scenario assumes the existence of a distant evolutionary relationship between the lineages or an ancient horizontal transfer of chromoviruses from fungi (or plants) to vertebrates. Indeed, chromoviruses are the most ancient lineage of <it>Ty3/Gypsy </it>LTR retrotransposons. They are rich in genetic variability, and are also present in the genome of many vertebrates <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr></abbrgrp>. In both cases, the most likely explanation for the relationship between class I and <it>Athila/Tat </it>retroviruses and retrotransposons of plants is that chromoviruses and class I are related, an argument suggested by a previous study <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. Nevertheless, chromoviruses of vertebrate organisms are usually more similar to their chromoviral counterparts of fungi than to those of plants. Therefore the chromoviral scenario does not explain why class I and <it>Athila/Tat </it>elements of plants are similar to each other based on gag. On the other hand, chromoviruses have not yet been described in protostomes, echinoderms and urochordates; furthermore it remains unclear whether chromoviruses were inexorably driven to extinction in these organisms or were horizontally transmitted from plants/fungi to vertebrates. Consequently, the chromoviral scenario does not clarify why classes II and III and the <it>Ty3/Gypsy </it>lineages of protostomes share sequence similarities and phenotypic features rarely found among the <it>Ty3/Gypsy </it>lineages of plants and fungi. 
With this in mind, a new theoretical principle is posited here for debate and further research. The simplest hypothesis is that classes I, II and III probably evolved from at least 3 <it>Ty3/Gypsy </it>ancestors and emerged at different evolutionary times prior to the split between protostomes and deuterostomes (<it>the three kings hypothesis</it>). Several points involved in the background of this hypothesis should be emphasized. First, we include the words "at least" to acknowledge the three classes but do not dismiss the possibility of more <it>Ty3/Gypsy </it>ancestors in the evolutionary history of the <it>Retroviridae</it>. Second, "different times of emergence" suggests, but does not necessarily mean, independent origins. Class II may in fact be directly related to class I, but the emergence of class II seems more recent and in parallel with the emergence of the ancestors of several <it>Ty3/Gypsy </it>lineages, such as the <it>Micropia/Mdg3 </it>clade (or others). Class III spumaretroviruses delineate an identical perspective with <it>Ty3/Gypsy </it>errantiviruses. Third, we use the term "polyphyletic" because the <it>Ty3/Gypsy </it>lineages related to each class do not constitute a monophyletic branch in any phylogeny. Moreover, viral evolution is always a polyphyletic challenge involving ecological parameters such as host populations, environment, vectors, mechanisms of transmission, etc.</p>
         </sec>
         <sec>
            <st>
               <p>The polyphyletic recurrence of vertebrate retroviruses into the evolutionary performance of <it>Ty3/Gypsy </it>LTR retroelements</p>
            </st>
            <p>We have described how the different gags, PRs and GPY/F modules evaluated show a variability that is preserved, depending on the <it>Ty3/Gypsy </it>lineage and <it>Retroviridae </it>class (or genus). While class I can be related to <it>Ty3/Gypsy </it>elements of plants and fungi, classes II and III preserve phenotypic features typically observed among <it>Ty3/Gypsy </it>elements of protostomes. That is the evolutionary perspective provided by the protein product of 3 independent coding regions. We have discussed this evidence but have not yet interpreted why the diversity and phylogeny of <it>Ty3/Gypsy </it>and <it>Retroviridae </it>LTR retroelements are so different regarding the different gag or pol substrates. In general, the action of viruses and mobile genetic elements is important in host evolution <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B42">42</abbr><abbr bid="B43">43</abbr><abbr bid="B44">44</abbr><abbr bid="B45">45</abbr><abbr bid="B46">46</abbr><abbr bid="B47">47</abbr></abbrgrp> because they are vectors of evolution and potential inducers of diseases and genetic disorders, such as chromosome rearrangements and inversions <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>. However, if the action of viruses and mobile genetic elements can influence host evolution, it is reasonable that host evolution could also constrain the evolution of these genetic agents. We thus speculate on the possibility of selective influences imposed on <it>Retroviridae </it>genes such as the <it>rt, rnase h </it>and <it>int </it>(and other regions) to optimize essential functions, such as retrotranscription and integration (according to the complexity of the new genome environment provided by vertebrate organisms). This probably involves gradual evolution but also a number of molecular mechanisms, such as gene recruitment and recombination to generate variability and new effective genetic combinations. 
Here, it is important to keep in mind that, with the exception of gammaretroviruses and a few others, the <it>Retroviridae </it>usually incorporate accessory genes, usually needed to adjust diverse aspects of their replication and infectivity (these features appear to be specific to retroviruses infecting vertebrate organisms). On the other hand, a prior study <abbrgrp><abbr bid="B15">15</abbr></abbrgrp> supports a putative chimeric origin of the <it>Retroviridae </it>RNAse H domain and the modular acquisition of the GPY/F module by <it>Ty3/Gypsy </it>and <it>Retroviridae </it>INTs <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. Moreover, D-type betaretroviruses probably are viral hybrids between a B-type betaretrovirus and a C-type gammaretrovirus <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B17">17</abbr><abbr bid="B49">49</abbr></abbrgrp>. Finally, a number of studies reveal how recombination is a mechanism frequently embraced by HIV evolution to generate variability. Two studies reveal, for instance, how recombination of M subtypes has resulted in the generation of multiple circulating recombinant forms consisting of mosaic HIV-1 lineages <abbrgrp><abbr bid="B50">50</abbr><abbr bid="B51">51</abbr></abbrgrp>.</p>
            <p>Regarding coding regions such as <it>gag, pr </it>and <it>gpy/f </it>module, we think that these traits reveal features and aspects involving different evolutionary strategies, but which are intrinsic and taxonomically related with ancient events of retroelement speciation and divergence. This argument finds an important evolutionary marker in the variability in the number of CCHC arrays at NC and the different PR and GPY/F module isoforms. Indeed, the CCHC array at NC is involved in virion assembly, RNA packaging, reverse transcription and integration processes <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>. On the other hand, the flap lies over the PR active site and conveys specificity to the enzyme by carrying important substrate-binding functions (for more information on this topic, see <abbrgrp><abbr bid="B35">35</abbr><abbr bid="B53">53</abbr><abbr bid="B54">54</abbr></abbrgrp>). Finally, while the GPY/F module is now under investigation, the C-terminal end of the INT appears to be important in the integration of the retroelement into the host genome <abbrgrp><abbr bid="B55">55</abbr><abbr bid="B56">56</abbr></abbrgrp>. The variability of these three regions probably reveals different evolutionary strategies of speciation and divergence, which can be assumed to be older than previously supposed, since it occurs not only in the <it>Retroviridae </it>group, but also in all <it>Ty3/Gypsy </it>LTR retroelements of plants, fungi and animals. Here, the <it>three kings hypothesis </it>and its testing (in one sense or another) do not affect the evidence we have presented. That is, classes I, II and III taxonomically code for 3 gag, PR and GPY/F products that have one or more distant counterparts among <it>Ty3/Gypsy </it>LTR retroelements. However, the most interesting aspect of the gag-PR-GPY/F variability is that it appears to be constrained by the bio-distribution of <it>Ty3/Gypsy </it>LTR retroelements. 
In turn, the diversity patterns of the <it>Retroviridae </it>based on these regions appear to be recurrent in the evolutionary performance of <it>Ty3/Gypsy </it>LTR retroelements, the most interesting aspect of which is that they seem polyphyletic. Therefore the evolutionary network between <it>Ty3/Gypsy </it>and <it>Retroviridae </it>LTR retroelements is informative regarding an ancestral history, which is in some respects similar to those models of evolution indistinctly described by population genetics and quasi-species theory (for more details see <abbrgrp><abbr bid="B57">57</abbr></abbrgrp>). This means that further analysis of the evolutionary network we disclose in this study must address different parameters such as bio-distribution, host populations, environment, vectors and mechanisms of transmission, etc. With this aim, our hypothesis makes possible a first evaluation of this new scenario, which we present in a forthcoming manuscript (submitted for publication). In this approach, we use the number of CCHC arrays at NC and the different PR and GPY/F module isoforms as evolutionary markers to trace the network. This is done by superimposing not only <it>Ty3/Gypsy </it>and <it>Retroviridae </it>LTR retroelements, but also other LTR retroelement groups over their host bio-distribution.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p><it>Retroviridae </it>classes I, II and III exhibit phenotypic differences that delineate a network never before reported between <it>Ty3/Gypsy </it>and <it>Retroviridae </it>LTR retroelements. This new scenario reveals how the diversity of vertebrate retroviruses is polyphyletically recurrent in <it>Ty3/Gypsy </it>evolution, i.e. older than previously thought. The simplest hypothesis to explain this finding is that classes I, II and III trace back to at least 3 <it>Ty3/Gypsy </it>ancestors that emerged at different evolutionary times prior to the protostome-deuterostome divergence. We have called this "the three kings hypothesis" concerning the origin of vertebrate retroviruses.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Sequences and databases</p>
            </st>
            <p>This work is part of the GyDB Project <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, an ongoing database launched with the aim of phylogenetically analyzing and classifying mobile genetic elements based on their diversity and evolutionary profile. In the first iteration, we consider the <it>Ty3/Gypsy </it>and <it>Retroviridae </it>LTR retroelements of eukaryotes. We have investigated 120 non-redundant full-length <it>Ty3/Gypsy </it>and <it>Retroviridae </it>genomes collected from NCBI <abbrgrp><abbr bid="B58">58</abbr></abbrgrp>. An extended version of the gag-pol tree, summarizing the names, taxonomy, hosts, and Genbank accessions of all retroelement taxa used to perform this analysis, is available online as Additional file <supplr sid="S1">1</supplr> accompanying this paper. By clicking the name of each OTU in this tree, the user can browse the GyDB and locate a file providing information on the OTU selected, including a link to the Genbank accession of the requested element at NCBI. The gag-pol tree can also be found online in the Section Phylogenies at GyDB <abbrgrp><abbr bid="B59">59</abbr></abbrgrp>.</p>
            <suppl id="S1">
               <title>
                  <p>Additional File 1</p>
               </title>
               <text>
                  <p><b>Expanding gag-pol phylogeny.</b> Expanded version of the gag-pol tree illustrated in Figure <figr fid="F1">1</figr>, inferred based on the 120 <it>Ty3/Gypsy </it>and <it>Retroviridae </it>LTR retroelements used in this study. The tree includes information about the names, Genbank accessions and hosts of all LTR retroelement taxa used. By clicking the name of each OTU, the user can locate a file at GyDB providing information on the sequence selected, including a link to its Genbank accession at NCBI.</p>
               </text>
               <file name="1471-2148-8-276-S1.zip">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>Multiple alignments and comparative analyses</p>
            </st>
            <p>In general, all <it>Ty3/Gypsy </it>and <it>Retroviridae </it>LTR retroelements have 2 polyproteins in common: gag and pol. Gag is composed of 3 domains (MA, CA and NC), while pol usually carries 4 domains (PR, RT, RNAse H and INT). Note however that PR can be coded separately or in frame with gag and other protein domains. We have used and analyzed a gag-pol multiple alignment ~1700 residues in size, constructed based on the concatenation of the CA, NC, PR, RT, RNAseH and INT cores. The gag-pol alignment is freely accessible within the GyDB collection deposited at Biotechvana Bioinformatics <abbrgrp><abbr bid="B60">60</abbr></abbrgrp>. The alignment is available in 6 formats at the following URL <abbrgrp><abbr bid="B61">61</abbr></abbrgrp>. We have also analyzed the gag and pol polyproteins separately by dividing the gag-pol alignment into 2 independent alignments, CA-NC and PR-RT-RNAseH-INT, to perform phylogenetic or comparative analyses.</p>
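            <p>The concatenation and splitting described above can be sketched as follows. This is only an illustration, not the procedure used to build the GyDB alignments: the OTU names (<it>elemA</it>, <it>elemB</it>), the toy sequences and the function name are hypothetical, and each domain core is assumed to be already aligned with the same set of OTUs.</p>

```python
# Domain order follows the text: gag = CA-NC, pol = PR-RT-RNAseH-INT.
GAG_DOMAINS = ("CA", "NC")
POL_DOMAINS = ("PR", "RT", "RNAseH", "INT")


def concatenate(domain_alignments, domains):
    """Join the aligned domain cores of every OTU in the given domain order."""
    otus = list(domain_alignments[domains[0]])
    return {
        otu: "".join(domain_alignments[d][otu] for d in domains)
        for otu in otus
    }


# Toy aligned cores (invented residues, for illustration only).
toy = {
    "CA": {"elemA": "MK-L", "elemB": "MKAL"},
    "NC": {"elemA": "CCHC", "elemB": "CC-C"},
    "PR": {"elemA": "DTGA", "elemB": "DSGA"},
    "RT": {"elemA": "YVDD", "elemB": "YMDD"},
    "RNAseH": {"elemA": "DAS-", "elemB": "DASG"},
    "INT": {"elemA": "GPYF", "elemB": "GPFF"},
}

gag_pol = concatenate(toy, GAG_DOMAINS + POL_DOMAINS)  # full concatenation
gag = concatenate(toy, GAG_DOMAINS)                    # CA-NC only
pol = concatenate(toy, POL_DOMAINS)                    # PR-RT-RNAseH-INT only
```

            <p>Because the domain order is fixed, the gag and pol alignments are simply the two consecutive blocks of the gag-pol alignment.</p>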
            <p>Alignments were compared using GENEDOC editor <abbrgrp><abbr bid="B62">62</abbr></abbrgrp> in shaded mode and the following groups of amino acid similarity: [T,S small nucleophile amino acids], [K,R,H basic amino acids], [D,E,N,Q acidic amino acids and related amides], and [L,I,V,M,A,G,P,F,Y,W hydrophobic amino acids]. Similarities between gag sequences were correlated using different gag queries to the CORES database available via the NCBI BLAST search <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> at GyDB, using BLASTp search mode. BLAST databases available at GyDB are non-redundant, small and include only <it>Ty3/Gypsy </it>and <it>Retroviridae </it>or related sequences, allowing flexible comparisons between both distantly and closely related sequences with homologous known functions.</p>
            <p>Comparative analyses based on sequence logos involved CheckAlign 1.0 <abbrgrp><abbr bid="B63">63</abbr></abbrgrp> in Shannon's algorithm mode <abbrgrp><abbr bid="B64">64</abbr></abbrgrp> and correction factor. Sequence logo methodology was originally introduced by Schneider et al. <abbrgrp><abbr bid="B65">65</abbr><abbr bid="B66">66</abbr></abbrgrp> to display consensus sequences for DNA and protein alignments. Later, Schneider dismissed the term "consensus" <abbrgrp><abbr bid="B67">67</abbr></abbrgrp>, arguing that a logo provides more information than the consensus sequence of a protein or DNA alignment. While this can be controversial because there are many ways to obtain or describe a consensus sequence, logo methodology being one of them, we agree with the original author's proposal to use the term "sequence logo" as suggested on his website <abbrgrp><abbr bid="B68">68</abbr></abbrgrp>. We employ the term "sequence logo" to describe the output reported by this analysis, and refer to the protein information underlying the shape of the logos constructed from our alignments as "amino acidic architecture". This term may be useful to describe, with a single expression, consensus, core and amino acid patterns. CheckAlign directly builds the logo from an ungapped alignment using the conventional methodology <abbrgrp><abbr bid="B65">65</abbr><abbr bid="B66">66</abbr></abbrgrp>. Here, the maximum uncertainty by position in a protein alignment is log<sub>2 </sub>20 = 4.3. In the case of gapped alignments, CheckAlign automatically builds the logo, taking the gap as another amino acid species. Here, the tool considers the maximum uncertainty by position to be log<sub>2 </sub>21 = 4.4 for protein alignments (for more details about CheckAlign see <abbrgrp><abbr bid="B63">63</abbr></abbrgrp>).</p>
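            <p>The per-position values quoted above follow from the Shannon formulation, in which the information at a position is the maximum uncertainty minus the observed uncertainty. The sketch below is a minimal illustration under our own assumptions: the function name is hypothetical, it does not reproduce CheckAlign's code, and it omits the small-sample correction factor mentioned above.</p>

```python
import math
from collections import Counter


def column_information(column, gapped=False):
    """Information (bits) of one protein alignment column.

    gapped=False: gaps are ignored and the maximum is log2(20).
    gapped=True:  the gap counts as a 21st symbol and the maximum is log2(21).
    No small-sample correction is applied (assumption of this sketch).
    """
    symbols = [c for c in column if gapped or c != "-"]
    total = len(symbols)
    if total == 0:
        return 0.0  # a column made only of gaps carries no information
    counts = Counter(symbols)
    # Observed Shannon uncertainty of the column.
    uncertainty = -sum(
        (n / total) * math.log2(n / total) for n in counts.values()
    )
    maximum = math.log2(21 if gapped else 20)
    return maximum - uncertainty


print(round(column_information("GGGG"), 1))               # 4.3
print(round(column_information("GGGG", gapped=True), 1))  # 4.4
```

            <p>A fully conserved column reaches the maximum, whereas a column in which all 20 amino acids are equally frequent yields a value of zero.</p>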
            <p>The 3D structure of the HIV-1 PR <abbrgrp><abbr bid="B69">69</abbr></abbrgrp> was modeled using SWISS-PDBViewer 3.7 SP5 <abbrgrp><abbr bid="B70">70</abbr></abbrgrp>, and PDB file 1A30 as input. The PDB file was downloaded from RCSB Protein Data Bank <abbrgrp><abbr bid="B71">71</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Phylogenetic analyses</p>
            </st>
            <p>Phylogenetic reconstructions of <it>Ty3/Gypsy </it>and <it>Retroviridae </it>LTR retroelements inferred from gag-pol, pol and gag alignments employed the PHYLIP 3.6 package <abbrgrp><abbr bid="B72">72</abbr></abbrgrp>. We first generated 100 bootstrap replicates of each alignment using SEQBOOT. Second, we used the protein sequence parsimony method of Felsenstein, based on the approaches of Eck and Dayhoff <abbrgrp><abbr bid="B73">73</abbr></abbrgrp> and Fitch <abbrgrp><abbr bid="B74">74</abbr></abbrgrp>, to perform the analyses. Here, the bootstrap file was used as an input to PROTPARS and the input order was randomized using the following parameters: random number seed = 5 and number of times to jumble = 5. Third, CONSENSE was used to obtain an MRC tree <abbrgrp><abbr bid="B75">75</abbr></abbrgrp> using the tree file generated by PROTPARS as an input. As the MRC tree usually consists of all clusters that occur >50% of the time, we took consensus values >55 as a bootstrap reference. Bootstrap values were used to scale the trees.</p>
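            <p>The two resampling ideas behind this workflow, drawing alignment columns with replacement and retaining the clusters found in more than 50% of the replicate trees, can be illustrated as follows. This is a conceptual sketch only and not PHYLIP code: the function names, the seed and the toy trees are hypothetical, the tree-building step (PROTPARS in the text) is not implemented, and each tree is represented simply as a set of clades.</p>

```python
import random
from collections import Counter


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement, keeping the length."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[i] for i in cols)
        for name, seq in alignment.items()
    }


def majority_rule_clades(trees, threshold=0.5):
    """Return clades present in more than `threshold` of the trees.

    Each tree is a set of clades; each clade is a frozenset of OTU names.
    Values are support percentages.
    """
    counts = Counter(clade for tree in trees for clade in set(tree))
    n = len(trees)
    return {
        clade: 100 * c / n
        for clade, c in counts.items()
        if c / n > threshold
    }


rng = random.Random(2008)  # arbitrary seed, for reproducibility only
toy_alignment = {"otu1": "ACDE", "otu2": "ACDF", "otu3": "GCDF"}
replicate = bootstrap_replicate(toy_alignment, rng)

# Three toy replicate trees: two of them contain the clade (otu1, otu2).
toy_trees = [
    {frozenset({"otu1", "otu2"})},
    {frozenset({"otu1", "otu2"})},
    {frozenset({"otu2", "otu3"})},
]
support = majority_rule_clades(toy_trees)
```

            <p>In this toy case only the clade joining otu1 and otu2 is retained, because it occurs in 2 of the 3 trees.</p>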
            <p>We also tested the NJ method <abbrgrp><abbr bid="B76">76</abbr></abbrgrp> using different models of distances implemented in PROTDIST. Here, it is important to keep in mind that the overall efficiency of the different methods of phylogenetic reconstruction in building the true tree varies with substitution rate, transition-transversion ratio, and sequence divergence <abbrgrp><abbr bid="B77">77</abbr><abbr bid="B78">78</abbr></abbrgrp>. With the particular material we studied, parsimony and NJ trees support the clustering of OTUs into clades and genera in gag-pol and pol analyses, and they are consistent in not supporting the monophyly of each group in gag analyses. However, parsimony phylogenies proved more consistent with comparative analyses than NJ trees when inferring phylogenies including or evaluating the gag and/or PR proteins. Parsimony analyses also reported better bootstrapping and were more consistent with the three <it>Retroviridae </it>classes than NJ analyses (NJ trees only support classes I and II).</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p>(AIDS): Acquired Immune Deficiency Syndrome; (BLV): Bovine Leukemia Virus; (CA): Capsid; (CAEV): Caprine Arthritis Encephalitis Virus; (EFV): Equine Foamy Virus; (FeLV): Feline Leukemia Virus; (FFV): Feline Foamy Virus; (FSV): Feline Syncytial Virus; (ICTV): International Committee on Taxonomy of Viruses; (GyDB): Gypsy Database; (HIV): Human Immunodeficiency Virus; (HFV): Human Foamy Virus; (HTLV): Human T-cell Leukemia Virus; (HMM profile): Hidden Markov Model; (INT): Integrase; (LTR): Long terminal repeat; (LPDV): Lymphoproliferative Disease Virus; (MHR): Major homology region; (VMV): Maedi Visna Virus; (MRC): Majority-rule consensus; (MPMV): Mason-Pfizer Monkey Virus; (MA): Matrix; (MMTV): Mouse Mammary Tumor Virus; (NCBI): National Center for Biotechnology Information; (NJ): Neighbor joining; (NC): Nucleocapsid; (OTU): Operational taxonomic unit; (SA-OMVV): Ovine Maedi Visna Virus; (PR): Protease; (RCSB): Research Collaboratory for Structural Bioinformatics; (RT): Reverse transcriptase; (RNAse H): Ribonuclease H; (RSV): Rous Sarcoma Virus; (SCSIE): Servei Central de Suport a la Investigaci&#243; Experimental; (SERV): Simian Endogenous Retrovirus of Mandrill; (SIVMAC): Simian Immunodeficiency Retrovirus of Macaques; (SFV): Simian Foamy Virus; (SRV): Simian Retrovirus; (3D structure): Three-dimensional structure.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>CL and AM conceived and designed the study. CL performed the analyses and CL and MAF wrote the paper.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank Javier Ortiz and Isaac Fernandez of the SCSIE at University of Valencia for technical support, and the 2 anonymous referees for their useful comments for improving the original manuscript. The GyDB project was awarded the NOVA 2006 by IMPIVA and Conselleria d'Empresa, Universitat i C&#236;encia of Valencia. The research has been partly supported by grants IMCBTA/2005/45, IMIDTD/2006/158 and IMIDTD/2007/33 from IMPIVA, by grant BFU2005-00503 from MEC to AM, and by financial grant 17092008 from ENISA (Empresa Nacional de Innovacion SA) to Biotechvana. Funding to pay the Open Access publication charges for this article was provided by University of Valencia.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Detection and isolation of type C retrovirus particles from fresh and cultured lymphocytes of a patient with cutaneous T-cell lymphoma</p>
            </title>
            <aug>
               <au>
                  <snm>Poiesz</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Ruscetti</snm>
                  <fnm>FW</fnm>
               </au>
               <au>
                  <snm>Gazdar</snm>
                  <fnm>AF</fnm>
               </au>
               <au>
                  <snm>Bunn</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Minna</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Gallo</snm>
                  <fnm>RC</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1980</pubdate>
            <volume>77</volume>
            <fpage>7415</fpage>
            <lpage>7419</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">350514</pubid>
                  <pubid idtype="pmpid">6261256</pubid>
                  <pubid idtype="doi">10.1073/pnas.77.12.7415</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Isolation and characterization of retrovirus from cell lines of human adult T-cell leukemia and its implication in the disease</p>
            </title>
            <aug>
               <au>
                  <snm>Yoshida</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Miyoshi</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Hinuma</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1982</pubdate>
            <volume>79</volume>
            <fpage>2031</fpage>
            <lpage>2035</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">346116</pubid>
                  <pubid idtype="pmpid" link="fulltext">6979048</pubid>
                  <pubid idtype="doi">10.1073/pnas.79.6.2031</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Isolation of a T-lymphotropic retrovirus from a patient at risk for acquired immune deficiency syndrome (AIDS)</p>
            </title>
            <aug>
               <au>
                  <snm>Barre-Sinoussi</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Chermann</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Rey</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Nugeyre</snm>
                  <fnm>MT</fnm>
               </au>
               <au>
                  <snm>Chamaret</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gruest</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Dauguet</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Axler-Blin</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Vezinet-Brun</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Rouzioux</snm>
                  <fnm>C</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>1983</pubdate>
            <volume>220</volume>
            <fpage>868</fpage>
            <lpage>871</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.6189183</pubid>
                  <pubid idtype="pmpid" link="fulltext">6189183</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Frequent detection and isolation of cytopathic retroviruses (HTLV-III) from patients with AIDS and at risk for AIDS</p>
            </title>
            <aug>
               <au>
                  <snm>Gallo</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>Salahuddin</snm>
                  <fnm>SZ</fnm>
               </au>
               <au>
                  <snm>Popovic</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Shearer</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Kaplan</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Haynes</snm>
                  <fnm>BF</fnm>
               </au>
               <au>
                  <snm>Palker</snm>
                  <fnm>TJ</fnm>
               </au>
               <au>
                  <snm>Redfield</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Oleske</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Safai</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1984</pubdate>
            <volume>224</volume>
            <fpage>500</fpage>
            <lpage>503</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.6200936</pubid>
                  <pubid idtype="pmpid" link="fulltext">6200936</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Virus Taxonomy: the classification and nomenclature of viruses</p>
            </title>
            <aug>
               <au>
                  <snm>Van Regenmortel</snm>
                  <fnm>MHV</fnm>
               </au>
               <au>
                  <snm>Fauquet</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Bishop</snm>
                  <fnm>DHL</fnm>
               </au>
               <au>
                  <snm>Carstens</snm>
                  <fnm>EB</fnm>
               </au>
               <au>
                  <snm>Estes</snm>
                  <fnm>MK</fnm>
               </au>
               <au>
                  <snm>Lemon</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Maniloff</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Mayo</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>McGeoch</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Pringle</snm>
                  <fnm>CR</fnm>
               </au>
               <au>
                  <snm>Wickner</snm>
                  <fnm>RB</fnm>
               </au>
            </aug>
            <publisher>San Diego, California</publisher>
            <pubdate>2000</pubdate>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Initial sequencing and analysis of the human genome</p>
            </title>
            <aug>
               <au>
                  <cnm>International Human Genome Sequencing Consortium</cnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2001</pubdate>
            <volume>409</volume>
            <fpage>860</fpage>
            <lpage>921</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35057062</pubid>
                  <pubid idtype="pmpid" link="fulltext">11237011</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Initial sequencing and comparative analysis of the mouse genome</p>
            </title>
            <aug>
               <au>
                  <cnm>Mouse Genome Sequencing Consortium</cnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>420</volume>
            <fpage>520</fpage>
            <lpage>562</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature01262</pubid>
                  <pubid idtype="pmpid" link="fulltext">12466850</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Endogenous Human Retroviruses</p>
            </title>
            <aug>
               <au>
                  <snm>Wilkinson</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Mager</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Leong</snm>
                  <fnm>JA</fnm>
               </au>
            </aug>
            <source>The Retroviridae</source>
            <publisher>New York, N.Y.: Plenum Press, Inc</publisher>
            <editor>Levy JA</editor>
            <pubdate>1994</pubdate>
            <volume>II</volume>
            <fpage>465</fpage>
            <lpage>535</lpage>
         </bibl>
         <bibl id="B9">
            <title>
               <p>The evolution, distribution and diversity of endogenous retroviruses</p>
            </title>
            <aug>
               <au>
                  <snm>Gifford</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Tristem</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Virus Genes</source>
            <pubdate>2003</pubdate>
            <volume>26</volume>
            <fpage>291</fpage>
            <lpage>315</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1023/A:1024455415443</pubid>
                  <pubid idtype="pmpid" link="fulltext">12876457</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Evolution and distribution of class II-related endogenous retroviruses</p>
            </title>
            <aug>
               <au>
                  <snm>Gifford</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Kabat</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Martin</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lynch</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Tristem</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Virol</source>
            <pubdate>2005</pubdate>
            <volume>79</volume>
            <fpage>6478</fpage>
            <lpage>6486</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1091674</pubid>
                  <pubid idtype="pmpid" link="fulltext">15858031</pubid>
                  <pubid idtype="doi">10.1128/JVI.79.10.6478-6486.2005</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Origin and Evolution of retrotransposons</p>
            </title>
            <aug>
               <au>
                  <snm>Eickbush</snm>
                  <fnm>TH</fnm>
               </au>
               <au>
                  <snm>Malik</snm>
                  <fnm>HS</fnm>
               </au>
            </aug>
            <source>Mobile DNA II</source>
            <publisher>Washington, DC: ASM Press</publisher>
            <editor>Craig NL, Craigie R, Gellert M, Lambowitz AM</editor>
            <pubdate>2002</pubdate>
            <fpage>1111</fpage>
            <lpage>1144</lpage>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Origin and evolution of retroelements based upon their reverse transcriptase sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Xiong</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Eickbush</snm>
                  <fnm>TH</fnm>
               </au>
            </aug>
            <source>EMBO J</source>
            <pubdate>1990</pubdate>
            <volume>9</volume>
            <fpage>3353</fpage>
            <lpage>3362</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">552073</pubid>
                  <pubid idtype="pmpid">1698615</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Ty3/Gypsy retrotransposons: description of new Arabidopsis thaliana elements and evolutionary perspectives derived from comparative genomic data</p>
            </title>
            <aug>
               <au>
                  <snm>Marin</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Llorens</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2000</pubdate>
            <volume>17</volume>
            <fpage>1040</fpage>
            <lpage>1049</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10889217</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Modular evolution of the integrase domain in the Ty3/Gypsy class of LTR retrotransposons</p>
            </title>
            <aug>
               <au>
                  <snm>Malik</snm>
                  <fnm>HS</fnm>
               </au>
               <au>
                  <snm>Eickbush</snm>
                  <fnm>TH</fnm>
               </au>
            </aug>
            <source>J Virol</source>
            <pubdate>1999</pubdate>
            <volume>73</volume>
            <fpage>5186</fpage>
            <lpage>5190</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">112568</pubid>
                  <pubid idtype="pmpid" link="fulltext">10233986</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Phylogenetic analysis of ribonuclease H domains suggests a late, chimeric origin of LTR retrotransposable elements and retroviruses</p>
            </title>
            <aug>
               <au>
                  <snm>Malik</snm>
                  <fnm>HS</fnm>
               </au>
               <au>
                  <snm>Eickbush</snm>
                  <fnm>TH</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2001</pubdate>
            <volume>11</volume>
            <fpage>1187</fpage>
            <lpage>1197</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.185101</pubid>
                  <pubid idtype="pmpid" link="fulltext">11435400</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>A mammalian gene evolved from the integrase domain of an LTR retrotransposon</p>
            </title>
            <aug>
               <au>
                  <snm>Llorens</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Marin</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2001</pubdate>
            <volume>18</volume>
            <fpage>1597</fpage>
            <lpage>1600</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11470852</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>The Gypsy Database (GyDB) of Mobile Genetic Elements</p>
            </title>
            <aug>
               <au>
                  <snm>Llorens</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Futami</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Bezemer</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Moya</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2008</pubdate>
            <volume>36</volume>
            <fpage>D38</fpage>
            <lpage>D46</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pubmed">17895280</pubid>
                  <pubid idtype="doi">10.1093/nar/gkm697</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Profile hidden Markov models</p>
            </title>
            <aug>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>1998</pubdate>
            <volume>14</volume>
            <fpage>755</fpage>
            <lpage>763</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/14.9.755</pubid>
                  <pubid idtype="pmpid" link="fulltext">9918945</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Potential retroviruses in plants: Tat1 is related to a group of Arabidopsis thaliana Ty3/gypsy retrotransposons that encode envelope-like proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Wright</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Voytas</snm>
                  <fnm>DF</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1998</pubdate>
            <volume>149</volume>
            <fpage>703</fpage>
            <lpage>715</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1460185</pubid>
                  <pubid idtype="pmpid" link="fulltext">9611185</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Athila4 of Arabidopsis and Calypso of soybean define a lineage of endogenous plant retroviruses</p>
            </title>
            <aug>
               <au>
                  <snm>Wright</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Voytas</snm>
                  <fnm>DF</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>122</fpage>
            <lpage>131</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">155253</pubid>
                  <pubid idtype="pmpid" link="fulltext">11779837</pubid>
                  <pubid idtype="doi">10.1101/gr.196001</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>CsRn1, a novel active retrotransposon in a parasitic trematode, Clonorchis sinensis, discloses a new phylogenetic clade of Ty3/gypsy-like LTR retrotransposons</p>
            </title>
            <aug>
               <au>
                  <snm>Bae</snm>
                  <fnm>YA</fnm>
               </au>
               <au>
                  <snm>Moon</snm>
                  <fnm>SY</fnm>
               </au>
               <au>
                  <snm>Kong</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Cho</snm>
                  <fnm>SY</fnm>
               </au>
               <au>
                  <snm>Rhyu</snm>
                  <fnm>MG</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2001</pubdate>
            <volume>18</volume>
            <fpage>1474</fpage>
            <lpage>1483</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11470838</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Genomic analysis of Caenorhabditis elegans reveals ancient families of retroviral-like elements</p>
            </title>
            <aug>
               <au>
                  <snm>Bowen</snm>
                  <fnm>NJ</fnm>
               </au>
               <au>
                  <snm>McDonald</snm>
                  <fnm>JF</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>1999</pubdate>
            <volume>9</volume>
            <fpage>924</fpage>
            <lpage>935</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.9.10.924</pubid>
                  <pubid idtype="pmpid" link="fulltext">10523521</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Evolutionary genomics of chromoviruses in eukaryotes</p>
            </title>
            <aug>
               <au>
                  <snm>Gorinsek</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Gubensek</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Kordis</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2004</pubdate>
            <volume>21</volume>
            <fpage>781</fpage>
            <lpage>798</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msh057</pubid>
                  <pubid idtype="pmpid" link="fulltext">14739248</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Active gypsy/Ty3 retrotransposons or retroviruses in Caenorhabditis elegans</p>
            </title>
            <aug>
               <au>
                  <snm>Britten</snm>
                  <fnm>RJ</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1995</pubdate>
            <volume>92</volume>
            <fpage>599</fpage>
            <lpage>601</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">42789</pubid>
                  <pubid idtype="pmpid" link="fulltext">7530364</pubid>
                  <pubid idtype="doi">10.1073/pnas.92.2.599</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Evolutionary history of Cer elements and their impact on the C. elegans genome</p>
            </title>
            <aug>
               <au>
                  <snm>Ganko</snm>
                  <fnm>EW</fnm>
               </au>
               <au>
                  <snm>Fielman</snm>
                  <fnm>KT</fnm>
               </au>
               <au>
                  <snm>McDonald</snm>
                  <fnm>JF</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2001</pubdate>
            <volume>11</volume>
            <fpage>2066</fpage>
            <lpage>2074</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">311226</pubid>
                  <pubid idtype="pmpid" link="fulltext">11731497</pubid>
                  <pubid idtype="doi">10.1101/gr.196201</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Virus taxonomy, The Universal System of Virus Taxonomy, updated to include the new proposals ratified by the International Committee on Taxonomy of Viruses during 1998</p>
            </title>
            <aug>
               <au>
                  <snm>Pringle</snm>
                  <fnm>CR</fnm>
               </au>
            </aug>
            <source>Arch Virol</source>
            <pubdate>1999</pubdate>
            <volume>144</volume>
            <fpage>421</fpage>
            <lpage>429</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s007050050515</pubid>
                  <pubid idtype="pmpid" link="fulltext">10470265</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Metaviridae</p>
            </title>
            <aug>
               <au>
                  <snm>Boeke</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Eickbush</snm>
                  <fnm>TH</fnm>
               </au>
               <au>
                  <snm>Sandmeyer</snm>
                  <fnm>SB</fnm>
               </au>
               <au>
                  <snm>Voytas</snm>
                  <fnm>DF</fnm>
               </au>
            </aug>
            <source>Virus Taxonomy: ICTV VIIth report</source>
            <publisher>Springer-Verlag, New York</publisher>
            <pubdate>1999</pubdate>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Classification of reverse transcribing elements: a discussion document</p>
            </title>
            <aug>
               <au>
                  <snm>Hull</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Arch Virol</source>
            <pubdate>1999</pubdate>
            <volume>144</volume>
            <fpage>209</fpage>
            <lpage>214</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s007050050498</pubid>
                  <pubid idtype="pmpid" link="fulltext">10076522</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Madden</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Schaffer</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <fpage>3389</fpage>
            <lpage>3402</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">146917</pubid>
                  <pubid idtype="pmpid" link="fulltext">9254694</pubid>
                  <pubid idtype="doi">10.1093/nar/25.17.3389</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Pyret, a Ty3/Gypsy retrotransposon in Magnaporthe grisea contains an extra domain between the nucleocapsid and protease domains</p>
            </title>
            <aug>
               <au>
                  <snm>Nakayashiki</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Matsuo</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Chuma</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Ikeda</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Betsuyaku</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kusaba</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Tosa</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Mayama</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <fpage>4106</fpage>
            <lpage>4113</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">60222</pubid>
                  <pubid idtype="pmpid" link="fulltext">11600699</pubid>
                  <pubid idtype="doi">10.1093/nar/29.20.4106</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>A retroviral Cys-Xaa2-Cys-Xaa4-His-Xaa4-Cys peptide binds metal ions: spectroscopic studies and a proposed three-dimensional structure</p>
            </title>
            <aug>
               <au>
                  <snm>Green</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Berg</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1989</pubdate>
            <volume>86</volume>
            <fpage>4047</fpage>
            <lpage>4051</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">287385</pubid>
                  <pubid idtype="pmpid" link="fulltext">2786206</pubid>
                  <pubid idtype="doi">10.1073/pnas.86.11.4047</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Phylogenomic analysis of chromoviruses</p>
            </title>
            <aug>
               <au>
                  <snm>Gorinsek</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Gubensek</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Kordis</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Cytogenet Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>110</volume>
            <fpage>543</fpage>
            <lpage>552</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1159/000084987</pubid>
                  <pubid idtype="pmpid" link="fulltext">16093707</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>A genomic perspective on the chromodomain-containing retrotransposons: Chromoviruses</p>
            </title>
            <aug>
               <au>
                  <snm>Kordis</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>2005</pubdate>
            <volume>347</volume>
            <fpage>161</fpage>
            <lpage>173</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.gene.2004.12.017</pubid>
                  <pubid idtype="pmpid" link="fulltext">15777633</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>MEROPS: the peptidase database</p>
            </title>
            <aug>
               <au>
                  <snm>Rawlings</snm>
                  <fnm>ND</fnm>
               </au>
               <au>
                  <snm>Tolle</snm>
                  <fnm>DP</fnm>
               </au>
               <au>
                  <snm>Barrett</snm>
                  <fnm>AJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>D160</fpage>
            <lpage>D164</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308805</pubid>
                  <pubid idtype="pmpid" link="fulltext">14681384</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh071</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Structural and biochemical studies of retroviral proteases</p>
            </title>
            <aug>
               <au>
                  <snm>Wlodawer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gustchina</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Biochim Biophys Acta</source>
            <pubdate>2000</pubdate>
            <volume>1477</volume>
            <fpage>16</fpage>
            <lpage>34</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10708846</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>An unusual vertebrate LTR retrotransposon from the cod Gadus morhua</p>
            </title>
            <aug>
               <au>
                  <snm>Butler</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Goodwin</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Poulter</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2001</pubdate>
            <volume>18</volume>
            <fpage>443</fpage>
            <lpage>447</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11230547</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>A group of deuterostome Ty3/gypsy-like retrotransposons with Ty1/copia-like pol-domain orders</p>
            </title>
            <aug>
               <au>
                  <snm>Goodwin</snm>
                  <fnm>TJ</fnm>
               </au>
               <au>
                  <snm>Poulter</snm>
                  <fnm>RT</fnm>
               </au>
            </aug>
            <source>Mol Genet Genomics</source>
            <pubdate>2002</pubdate>
            <volume>267</volume>
            <fpage>481</fpage>
            <lpage>491</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s00438-002-0679-0</pubid>
                  <pubid idtype="pmpid" link="fulltext">12111555</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Solution structure of the DNA binding domain of HIV-1 integrase</p>
            </title>
            <aug>
               <au>
                  <snm>Lodi</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>Ernst</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Kuszewski</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hickman</snm>
                  <fnm>AB</fnm>
               </au>
               <au>
                  <snm>Engelman</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Craigie</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Clore</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Gronenborn</snm>
                  <fnm>AM</fnm>
               </au>
            </aug>
            <source>Biochemistry</source>
            <pubdate>1995</pubdate>
            <volume>34</volume>
            <fpage>9826</fpage>
            <lpage>9833</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/bi00031a002</pubid>
                  <pubid idtype="pmpid">7632683</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Bacterial transposases and retroviral integrases</p>
            </title>
            <aug>
               <au>
                  <snm>Polard</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Chandler</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Mol Microbiol</source>
            <pubdate>1995</pubdate>
            <volume>15</volume>
            <fpage>13</fpage>
            <lpage>23</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1365-2958.1995.tb02217.x</pubid>
                  <pubid idtype="pmpid">7752887</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Retroviral integrase domains: DNA binding and the recognition of LTR sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Khan</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Mack</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Katz</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Kulkosky</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Skalka</snm>
                  <fnm>AM</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1991</pubdate>
            <volume>19</volume>
            <fpage>851</fpage>
            <lpage>860</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">333721</pubid>
                  <pubid idtype="pmpid" link="fulltext">1850126</pubid>
                  <pubid idtype="doi">10.1093/nar/19.4.851</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Origin and evolutionary relationships of LTR retroelements</p>
            </title>
            <aug>
               <au>
                  <snm>Eickbush</snm>
                  <fnm>TH</fnm>
               </au>
            </aug>
            <source>The Evolutionary Biology of Viruses</source>
            <publisher>New York: Raven</publisher>
            <editor>Morse SS</editor>
            <pubdate>1994</pubdate>
            <fpage>121</fpage>
            <lpage>157</lpage>
         </bibl>
         <bibl id="B42">
            <title>
               <p>The origins of genome complexity</p>
            </title>
            <aug>
               <au>
                  <snm>Lynch</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Conery</snm>
                  <fnm>JS</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2003</pubdate>
            <volume>302</volume>
            <fpage>1401</fpage>
            <lpage>1404</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1089370</pubid>
                  <pubid idtype="pmpid" link="fulltext">14631042</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Evidence for the contribution of LTR retrotransposons to C. elegans gene evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Ganko</snm>
                  <fnm>EW</fnm>
               </au>
               <au>
                  <snm>Bhattacharjee</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Schliekelman</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>McDonald</snm>
                  <fnm>JF</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2003</pubdate>
            <volume>20</volume>
            <fpage>1925</fpage>
            <lpage>1931</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msg200</pubid>
                  <pubid idtype="pmpid" link="fulltext">12885961</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Transposable elements as a source of genetic innovation: expression and evolution of a family of retrotransposon-derived neogenes in mammals</p>
            </title>
            <aug>
               <au>
                  <snm>Brandt</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Schrauth</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Veith</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Froschauer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Haneke</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Schultheis</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Gessler</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Leimeister</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Volff</snm>
                  <fnm>JN</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>2005</pubdate>
            <volume>345</volume>
            <fpage>101</fpage>
            <lpage>111</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.gene.2004.11.022</pubid>
                  <pubid idtype="pmpid" link="fulltext">15716091</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Repetitive sequences in complex genomes: structure and evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Jurka</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kapitonov</snm>
                  <fnm>VV</fnm>
               </au>
               <au>
                  <snm>Kohany</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Jurka</snm>
                  <fnm>MV</fnm>
               </au>
            </aug>
            <source>Annu Rev Genomics Hum Genet</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>241</fpage>
            <lpage>259</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.genom.8.080706.092416</pubid>
                  <pubid idtype="pmpid" link="fulltext">17506661</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Turning junk into gold: domestication of transposable elements and the creation of new genes in eukaryotes</p>
            </title>
            <aug>
               <au>
                  <snm>Volff</snm>
                  <fnm>JN</fnm>
               </au>
            </aug>
            <source>Bioessays</source>
            <pubdate>2006</pubdate>
            <volume>28</volume>
            <fpage>913</fpage>
            <lpage>922</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/bies.20452</pubid>
                  <pubid idtype="pmpid" link="fulltext">16937363</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Mobile elements: drivers of genome evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Kazazian</snm>
                  <fnm>HH</fnm>
                  <suf>Jr</suf>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2004</pubdate>
            <volume>303</volume>
            <fpage>1626</fpage>
            <lpage>1632</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1089670</pubid>
                  <pubid idtype="pmpid" link="fulltext">15016989</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Selfish genetic elements and speciation</p>
            </title>
            <aug>
               <au>
                  <snm>Hurst</snm>
                  <fnm>GDD</fnm>
               </au>
               <au>
                  <snm>Schilthuizen</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Heredity</source>
            <pubdate>1998</pubdate>
            <volume>80</volume>
            <fpage>2</fpage>
            <lpage>8</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1046/j.1365-2540.1998.00337.x</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Nucleotide sequence of Mason-Pfizer monkey virus: an immunosuppressive D-type retrovirus</p>
            </title>
            <aug>
               <au>
                  <snm>Sonigo</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Barker</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Hunter</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Wain-Hobson</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1986</pubdate>
            <volume>45</volume>
            <fpage>375</fpage>
            <lpage>385</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0092-8674(86)90323-5</pubid>
                  <pubid idtype="pmpid" link="fulltext">2421920</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>Travel and the spread of HIV-1 genetic variants</p>
            </title>
            <aug>
               <au>
                  <snm>Perrin</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Kaiser</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Yerly</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Lancet Infect Dis</source>
            <pubdate>2003</pubdate>
            <volume>3</volume>
            <fpage>22</fpage>
            <lpage>27</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S1473-3099(03)00484-5</pubid>
                  <pubid idtype="pmpid" link="fulltext">12505029</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>The causes and consequences of HIV evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Rambaut</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Posada</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Crandall</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Holmes</snm>
                  <fnm>EC</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>52</fpage>
            <lpage>61</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrg1246</pubid>
                  <pubid idtype="pmpid" link="fulltext">14708016</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>6th International Symposium on Retroviral Nucleocapsid</p>
            </title>
            <aug>
               <au>
                  <snm>Berkhout</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Gorelick</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Summers</snm>
                  <fnm>MF</fnm>
               </au>
               <au>
                  <snm>Mely</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Darlix</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Retrovirology</source>
            <pubdate>2008</pubdate>
            <volume>5</volume>
            <fpage>21</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2276516</pubid>
                  <pubid idtype="pmpid" link="fulltext">18298807</pubid>
                  <pubid idtype="doi">10.1186/1742-4690-5-21</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p>Evolutionarily conserved functional mechanics across pepsin-like and retroviral aspartic proteases</p>
            </title>
            <aug>
               <au>
                  <snm>Cascella</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Micheletti</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Rothlisberger</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Carloni</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>J Am Chem Soc</source>
            <pubdate>2005</pubdate>
            <volume>127</volume>
            <fpage>3734</fpage>
            <lpage>3742</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/ja044608+</pubid>
                  <pubid idtype="pmpid" link="fulltext">15771507</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B54">
            <title>
               <p>HIV-1 protease flaps spontaneously close to the correct structure in simulations following manual placement of an inhibitor into the open state</p>
            </title>
            <aug>
               <au>
                  <snm>Hornak</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Okur</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Rizzo</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>Simmerling</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>J Am Chem Soc</source>
            <pubdate>2006</pubdate>
            <volume>128</volume>
            <fpage>2812</fpage>
            <lpage>2813</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2555982</pubid>
                  <pubid idtype="pmpid" link="fulltext">16506755</pubid>
                  <pubid idtype="doi">10.1021/ja058211x</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B55">
            <title>
               <p>High-frequency homologous recombination in plants mediated by zinc-finger nucleases</p>
            </title>
            <aug>
               <au>
                  <snm>Wright</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Townsend</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Winfrey</snm>
                  <fnm>RJ</fnm>
                  <suf>Jr</suf>
               </au>
               <au>
                  <snm>Irwin</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Rajagopal</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lonosky</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hall</snm>
                  <fnm>BD</fnm>
               </au>
               <au>
                  <snm>Jondle</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Voytas</snm>
                  <fnm>DF</fnm>
               </au>
            </aug>
            <source>Plant J</source>
            <pubdate>2005</pubdate>
            <volume>44</volume>
            <fpage>693</fpage>
            <lpage>705</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1365-313X.2005.02551.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">16262717</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B56">
            <title>
               <p>A long terminal repeat retrotransposon of fission yeast has strong preferences for specific sites of insertion</p>
            </title>
            <aug>
               <au>
                  <snm>Singleton</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Levin</snm>
                  <fnm>HL</fnm>
               </au>
            </aug>
            <source>Eukaryot Cell</source>
            <pubdate>2002</pubdate>
            <volume>1</volume>
            <fpage>44</fpage>
            <lpage>55</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">118054</pubid>
                  <pubid idtype="pmpid" link="fulltext">12455970</pubid>
                  <pubid idtype="doi">10.1128/EC.01.1.44-55.2002</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B57">
            <title>
               <p>Quasispecies theory in the context of population genetics</p>
            </title>
            <aug>
               <au>
                  <snm>Wilke</snm>
                  <fnm>CO</fnm>
               </au>
            </aug>
            <source>BMC Evol Biol</source>
            <pubdate>2005</pubdate>
            <volume>5</volume>
            <fpage>44</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1208876</pubid>
                  <pubid idtype="pmpid" link="fulltext">16107214</pubid>
                  <pubid idtype="doi">10.1186/1471-2148-5-44</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B58">
            <title>
               <p>National Center for Biotechnology Information</p>
            </title>
            <url>http://www.ncbi.nlm.nih.gov</url>
         </bibl>
         <bibl id="B59">
            <title>
               <p>Gag-pol tree</p>
            </title>
            <url>http://gydb.uv.es/gydb/phylogeny.php?tree=gagpol</url>
         </bibl>
         <bibl id="B60">
            <title>
               <p>The GyDB collection: Ty3/Gypsy and Retroviridae LTR retroelements and related nonviral proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Llorens</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Futami</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Moya</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Biotechvana Bioinformatics</source>
            <publisher>CR: GyDB Collection</publisher>
            <pubdate>2008</pubdate>
         </bibl>
         <bibl id="B61">
            <title>
               <p>Gag-pol multiple alignment URL</p>
            </title>
            <url>http://gydb.uv.es/biotechvana/collection/alignment.php?alignment=GAGPOL_retroelement&amp;format=htm</url>
         </bibl>
         <bibl id="B62">
            <title>
               <p>Genedoc</p>
            </title>
            <url>http://www.nrbsc.org/gfx/genedoc/index.html</url>
         </bibl>
         <bibl id="B63">
            <title>
               <p>The CheckAlign logo-maker application in analyses of both gapped and ungapped DNA and protein alignments</p>
            </title>
            <aug>
               <au>
                  <snm>Llorens</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Futami</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Vicente-Ripolles</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Moya</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Biotechvana Bioinformatics</source>
            <publisher>SOFT: CheckAlign</publisher>
            <pubdate>2008</pubdate>
         </bibl>
         <bibl id="B64">
            <title>
               <p>The mathematical theory of communication. 1963</p>
            </title>
            <aug>
               <au>
                  <snm>Shannon</snm>
                  <fnm>CE</fnm>
               </au>
            </aug>
            <source>MD Comput</source>
            <pubdate>1997</pubdate>
            <volume>14</volume>
            <fpage>306</fpage>
            <lpage>317</lpage>
            <xrefbib>
               <pubid idtype="pmpid">9230594</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B65">
            <title>
               <p>Sequence logos: a new way to display consensus sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Schneider</snm>
                  <fnm>TD</fnm>
               </au>
               <au>
                  <snm>Stephens</snm>
                  <fnm>RM</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1990</pubdate>
            <volume>18</volume>
            <fpage>6097</fpage>
            <lpage>6100</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">332411</pubid>
                  <pubid idtype="pmpid" link="fulltext">2172928</pubid>
                  <pubid idtype="doi">10.1093/nar/18.20.6097</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B66">
            <title>
               <p>Information content of binding sites on nucleotide sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Schneider</snm>
                  <fnm>TD</fnm>
               </au>
               <au>
                  <snm>Stormo</snm>
                  <fnm>GD</fnm>
               </au>
               <au>
                  <snm>Gold</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Ehrenfeucht</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1986</pubdate>
            <volume>188</volume>
            <fpage>415</fpage>
            <lpage>431</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0022-2836(86)90165-8</pubid>
                  <pubid idtype="pmpid" link="fulltext">3525846</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B67">
            <title>
               <p>Consensus sequence Zen</p>
            </title>
            <aug>
               <au>
                  <snm>Schneider</snm>
                  <fnm>TD</fnm>
               </au>
            </aug>
            <source>Appl Bioinformatics</source>
            <pubdate>2002</pubdate>
            <volume>1</volume>
            <fpage>111</fpage>
            <lpage>119</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1852464</pubid>
                  <pubid idtype="pmpid">15130839</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B68">
            <title>
               <p>Tom Schneider Web site</p>
            </title>
            <url>http://www-lecb.ncifcrf.gov/~toms/</url>
         </bibl>
         <bibl id="B69">
            <title>
               <p>Hydrophilic peptides derived from the transframe region of Gag-Pol inhibit the HIV-1 protease</p>
            </title>
            <aug>
               <au>
                  <snm>Louis</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Dyda</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Nashed</snm>
                  <fnm>NT</fnm>
               </au>
               <au>
                  <snm>Kimmel</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Davies</snm>
                  <fnm>DR</fnm>
               </au>
            </aug>
            <source>Biochemistry</source>
            <pubdate>1998</pubdate>
            <volume>37</volume>
            <fpage>2105</fpage>
            <lpage>2110</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/bi972059x</pubid>
                  <pubid idtype="pmpid" link="fulltext">9485357</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B70">
            <title>
               <p>SWISS-MODEL: An automated protein homology-modeling server</p>
            </title>
            <aug>
               <au>
                  <snm>Schwede</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Kopp</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Guex</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Peitsch</snm>
                  <fnm>MC</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <fpage>3381</fpage>
            <lpage>3385</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">168927</pubid>
                  <pubid idtype="pmpid" link="fulltext">12824332</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg520</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B71">
            <title>
               <p>RCSB Protein Data Bank</p>
            </title>
            <url>http://www.rcsb.org/pdb/home/home.do</url>
         </bibl>
         <bibl id="B72">
            <title>
               <p>PHYLIP package of programs for inferring phylogenies. Version 3.6a3</p>
            </title>
            <url>http://evolution.genetics.washington.edu/phylip.html</url>
         </bibl>
         <bibl id="B73">
            <title>
               <p>Atlas of Protein Sequence and Structure</p>
            </title>
            <aug>
               <au>
                  <snm>Eck</snm>
                  <fnm>RV</fnm>
               </au>
               <au>
                  <snm>Dayhoff</snm>
                  <fnm>MO</fnm>
               </au>
            </aug>
            <publisher>National Biomedical Research Foundation, Silver Spring, Maryland</publisher>
            <pubdate>1966</pubdate>
         </bibl>
         <bibl id="B74">
            <title>
               <p>Toward defining the course of evolution: minimum change for a specific tree topology</p>
            </title>
            <aug>
               <au>
                  <snm>Fitch</snm>
                  <fnm>WM</fnm>
               </au>
            </aug>
            <source>Systematic Zoology</source>
            <pubdate>1971</pubdate>
            <volume>20</volume>
            <fpage>406</fpage>
            <lpage>416</lpage>
            <xrefbib>
               <pubid idtype="doi">10.2307/2412116</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B75">
            <title>
               <p>Consensus n-trees</p>
            </title>
            <aug>
               <au>
                  <snm>Margush</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>McMorris</snm>
                  <fnm>FR</fnm>
               </au>
            </aug>
            <source>Bull Math Biol</source>
            <pubdate>1981</pubdate>
            <volume>43</volume>
            <fpage>239</fpage>
            <lpage>244</lpage>
         </bibl>
         <bibl id="B76">
            <title>
               <p>The neighbor-joining method: a new method for reconstructing phylogenetic trees</p>
            </title>
            <aug>
               <au>
                  <snm>Saitou</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Nei</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1987</pubdate>
            <volume>4</volume>
            <fpage>406</fpage>
            <lpage>425</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">3447015</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B77">
            <title>
               <p>Phylogenetic inference, DNA sequence analysis, and the future of molecular systematics</p>
            </title>
            <aug>
               <au>
                  <snm>Miyamoto</snm>
                  <fnm>MM</fnm>
               </au>
               <au>
                  <snm>Cracraft</snm>
                  <fnm>JL</fnm>
               </au>
            </aug>
            <publisher>Oxford University Press, Oxford, England</publisher>
            <pubdate>1991</pubdate>
         </bibl>
         <bibl id="B78">
            <title>
               <p>Molecular evolution and phylogenetics</p>
            </title>
            <aug>
               <au>
                  <snm>Nei</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kumar</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <publisher>Oxford University Press, Oxford, England</publisher>
            <pubdate>2000</pubdate>
         </bibl>
      </refgrp>
   </bm>
</art>
